From avilella at gmail.com Sat Jan 2 03:57:28 2010
From: avilella at gmail.com (Albert Vilella)
Date: Sat, 2 Jan 2010 08:57:28 +0000
Subject: [Bioperl-l] Downloading from dbEST by taxon range
Message-ID: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>

Hi all and happy 2010 for those that follow the Gregorian calendar,

A question that is a bit in between bioperl and NCBI. I would like to use
bioperl to download sequences from dbEST. For that, my idea is to use
Bio::DB::GenBank and get the sequences by gi id.

Now, I want my script to download sequences for a given NCBI taxonomy clade.

For example, if I want to download all fish (clupeocephala) sequences in dbEST,
I can browse them on the dbEST webpage using "clupeocephala[taxonomy]",
so I am thinking there should be a way to do it programmatically.

How can I query NCBI dbEST through bioperl to give me the list of GI ids I am
looking for given a taxon id?

Thanks in advance,

Albert.

From jason at bioperl.org Sat Jan 2 11:35:22 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 2 Jan 2010 08:35:22 -0800
Subject: [Bioperl-l] Downloading from dbEST by taxon range
In-Reply-To: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>
References: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>
Message-ID:

Did you try Bio::DB::Query::GenBank ?
You'd want to use -db => 'nucest' and then you just put in an Entrez query
as per the example. You can include dates in the query so you can do
updates to your locally retrieved data in a script that runs periodically.

-jason
On Jan 2, 2010, at 12:57 AM, Albert Vilella wrote:

> Hi all and happy 2010 for those that follow the Gregorian calendar,
>
> A question that is a bit in between bioperl and NCBI. I would like to use
> bioperl to download sequences from dbEST. For that, my idea is to use
> Bio::DB::GenBank and get the sequences by gi id.
>
> Now, I want my script to download sequences for a given NCBI
> taxonomy clade.
>
> For example, if I want to download all fish (clupeocephala)
> sequences in dbEST,
> I can browse them on the dbEST webpage using "clupeocephala[taxonomy]",
> so I am thinking there should be a way to do it programmatically.
>
> How can I query NCBI dbEST through bioperl to give me the list of GI
> ids I am looking for given a taxon id?
>
> Thanks in advance,
>
> Albert.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/

From avilella at gmail.com Sun Jan 3 04:08:33 2010
From: avilella at gmail.com (Albert Vilella)
Date: Sun, 3 Jan 2010 09:08:33 +0000
Subject: [Bioperl-l] Downloading from dbEST by taxon range
In-Reply-To:
References: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>
Message-ID: <358f4d651001030108p6a92fb27k5fa39be6bebb3a9c@mail.gmail.com>

Thanks Jason!

For the sake of completeness, here is the script I needed:

---------------------
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::DB::Query::GenBank;
use Bio::DB::GenBank;
use Getopt::Long;

my $keyword_type = 'EST';
my $outdir       = '.';
my $taxon_name   = undef;
my $db_type      = 'nucest';

GetOptions('keyword_type:s' => \$keyword_type,
           't|taxon_name:s' => \$taxon_name,
           'db_type:s'      => \$db_type,
           'outdir:s'       => \$outdir);

# A taxon name is required to build the query
die "usage: $0 -t TAXON_NAME [--keyword_type EST] [--db_type nucest] [--outdir DIR]\n"
    unless defined $taxon_name && length $taxon_name;

# Entrez query, e.g. "Clupeocephala[Organism] AND EST[Keyword]"
my $query_string = $taxon_name . "[Organism] AND " . $keyword_type . "[Keyword]";
my $db = Bio::DB::Query::GenBank->new(-db      => $db_type,
                                      -query   => $query_string,
                                      -mindate => '2007',
                                      -maxdate => '2010');

# Output file name: taxon name with spaces replaced by underscores
(my $taxon_name_string = $taxon_name) =~ s/ /_/g;
my $outfile = $outdir . "/" . $taxon_name_string . "." . $db_type . ".fasta";
my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'fasta');

# Number of records matching the query
print $db->count, "\n";

my $gb     = Bio::DB::GenBank->new();
my $stream = $gb->get_Stream_by_query($db);
while (my $seq = $stream->next_seq) {
    # Keep only reads longer than 800 bases
    next unless (length($seq->seq) > 800);
    $out->write_seq($seq);
}
$out->close;
---------------------

On Sat, Jan 2, 2010 at 4:35 PM, Jason Stajich wrote:
> Did you try Bio::DB::Query::GenBank ?
> You'd want to use -db => 'nucest' and then you just put in an Entrez query
> as per the example. You can include dates in the query so you can do
> updates to your locally retrieved data in a script that runs periodically.
>
> -jason
> On Jan 2, 2010, at 12:57 AM, Albert Vilella wrote:
>
>> Hi all and happy 2010 for those that follow the Gregorian calendar,
>>
>> A question that is a bit in between bioperl and NCBI. I would like to use
>> bioperl to download sequences from dbEST. For that, my idea is to use
>> Bio::DB::GenBank and get the sequences by gi id.
>>
>> Now, I want my script to download sequences for a given NCBI taxonomy
>> clade.
>>
>> For example, if I want to download all fish (clupeocephala) sequences in
>> dbEST,
>> I can browse them on the dbEST webpage using "clupeocephala[taxonomy]",
>> so I am thinking there should be a way to do it programmatically.
>>
>> How can I query NCBI dbEST through bioperl to give me the list of GI ids I
>> am looking for given a taxon id?
>>
>> Thanks in advance,
>>
>> Albert.
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > > From Jean-Marc.Frigerio at pierroton.inra.fr Mon Jan 4 09:12:18 2010 From: Jean-Marc.Frigerio at pierroton.inra.fr (Jean-Marc Frigerio INRA) Date: Mon, 04 Jan 2010 15:12:18 +0100 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? In-Reply-To: References: Message-ID: <4B41F742.2030209@pierroton.inra.fr> > Message: 1 > Date: Thu, 31 Dec 2009 11:26:45 +1800 > From: Peng Yu > Subject: [Bioperl-l] How to read in the whole fasta file in the > memory? > To: bioperl-l at lists.open-bio.org > Message-ID: > <366c6f340912300926k5af5cc88nc3c3babda541fd1 at mail.gmail.com> > Content-Type: text/plain; charset=ISO-8859-1 > > With Bio::SeqIO, I can only read in the records in a fasta file one by > one. This is preferable if there are many records in a file. > > But I also want to read all the records in. I could use a while loop > to read all records in. But could somebody let me know if there is a > function in bioperl that can read in all the record at once and return > me an object? > > http://www.bioperl.org/wiki/HOWTO:SeqIO > > > ------------------------------ > > Message: 2 > Date: Wed, 30 Dec 2009 13:04:53 -0500 > From: Sean Davis > Subject: Re: [Bioperl-l] How to read in the whole fasta file in the > memory? > To: Peng Yu > Cc: "bioperl-l at lists.open-bio.org" > Message-ID: > <264855a00912301004t396e0d4fwf9d291c5d82c3fb9 at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu wrote: >> With Bio::SeqIO, I can only read in the records in a fasta file one by >> one. This is preferable if there are many records in a file. >> >> But I also want to read all the records in. I could use a while loop >> to read all records in. 
But could somebody let me know if there is a >> function in bioperl that can read in all the record at once and return >> me an object? > > In perl, you can use an array to store the records. You could also > use a hash if you have reasonable keys for the entries. > > Sean > > >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > ------------------------------ > > Message: 3 > Date: Wed, 30 Dec 2009 11:58:54 -0800 > From: Jason Stajich > Subject: Re: [Bioperl-l] How to read in the whole fasta file in the > memory? > To: Peng Yu > Cc: BioPerl List > Message-ID: <3550F192-111F-48A7-B1B7-113FFFAC105B at bioperl.org> > Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes > > or use a database object so you can retrieve sequences that have a > particular id. See Bio::DB::Fasta > On Dec 30, 2009, at 10:04 AM, Sean Davis wrote: > >> On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu wrote: >>> With Bio::SeqIO, I can only read in the records in a fasta file one >>> by >>> one. This is preferable if there are many records in a file. >>> >>> But I also want to read all the records in. I could use a while loop >>> to read all records in. But could somebody let me know if there is a >>> function in bioperl that can read in all the record at once and >>> return >>> me an object? >> In perl, you can use an array to store the records. You could also >> use a hash if you have reasonable keys for the entries. 
>>
>> Sean
>>
>>
>>> http://www.bioperl.org/wiki/HOWTO:SeqIO
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
>
>
>
> ------------------------------
>
> Message: 4
> Date: Wed, 30 Dec 2009 16:20:31 -0500
> From: "Mark A. Jensen"
> Subject: Re: [Bioperl-l] How to read in the whole fasta file in the
> memory?
> To: "Peng Yu" ,
> Message-ID: <2646F627E6D14AADB412A6E6B51E24DA at NewLife>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
> reply-type=original
>
> I think you might want Bio::AlignIO:
>
> $alnio = Bio::AlignIO->new(-file=> 'my.fas' );
> $aln = $alnio->next_aln;
> @seqs = $aln->each_seq;
>
> MAJ
> ----- Original Message -----
> From: "Peng Yu"
> To:
> Sent: Wednesday, December 30, 2009 12:26 PM
> Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
>
>
>> With Bio::SeqIO, I can only read in the records in a fasta file one by
>> one. This is preferable if there are many records in a file.
>>
>> But I also want to read all the records in. I could use a while loop
>> to read all records in. But could somebody let me know if there is a
>> function in bioperl that can read in all the records at once and return
>> me an object?
>> >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l Hi, I wrote and currently use a module I named Bio::SeqIO::multifasta, which is basically a copy of Bio::SeqIO::fasta plus a few methods: get_by_id(), get_by_order(), first_seq() and previous_seq() It would need review, validation etc. Do I submit it to Bugzilla ? -- jmf From jason at bioperl.org Mon Jan 4 11:03:45 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 4 Jan 2010 08:03:45 -0800 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? In-Reply-To: <4B41F742.2030209@pierroton.inra.fr> References: <4B41F742.2030209@pierroton.inra.fr> Message-ID: <16D7C8C1-E4BE-406F-9D60-379876178CAB@bioperl.org> We typically think of SeqIO as parsing a stream of data, not being reliant on it being a file which is what these methods would be implying I think. Sounds a lot like a database - does Bio::DB::Fasta not provide some of the functionality you need by these methods? I realize there isn't a by_order() but the get_by_id() is implemented to allow random access. -jason > > Hi, > > I wrote and currently use a module I named Bio::SeqIO::multifasta, > which is basically a copy of Bio::SeqIO::fasta plus a few methods: > get_by_id(), get_by_order(), first_seq() and previous_seq() > > It would need review, validation etc. Do I submit it to Bugzilla ? 
> > -- jmf > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From avilella at gmail.com Mon Jan 4 15:00:24 2010 From: avilella at gmail.com (Albert Vilella) Date: Mon, 4 Jan 2010 20:00:24 +0000 Subject: [Bioperl-l] indexed fastq files Message-ID: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com> Hi all, What is the best way to index fastq files, so that once clustered, I can provide a list of seq_ids and get them back in fastq format from the indexed db? Cheers, Albert. From cjfields at illinois.edu Mon Jan 4 16:59:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 4 Jan 2010 15:59:50 -0600 Subject: [Bioperl-l] indexed fastq files In-Reply-To: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com> References: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com> Message-ID: <07EBA105-6A34-490C-B0B9-4772DF386CBA@illinois.edu> Bio::Index::Fastq, maybe? To tell the truth, I haven't tried it since we refactored FASTQ parsing, so let us know if it doesn't work. chris On Jan 4, 2010, at 2:00 PM, Albert Vilella wrote: > Hi all, > > What is the best way to index fastq files, so that once clustered, I > can provide a list of seq_ids and get > them back in fastq format from the indexed db? > > Cheers, > > Albert. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Jan 4 22:54:03 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 4 Jan 2010 21:54:03 -0600 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? 
In-Reply-To: <4B41F742.2030209@pierroton.inra.fr> References: <4B41F742.2030209@pierroton.inra.fr> Message-ID: <1BAE5508-0DB7-41B4-92E3-49256582131F@illinois.edu> Jean-Marc, You can do that, yes. Just curious, but have you looked at the various flat file indexing modules for FASTA? Bio::DB::Fasta and Bio::Index::Fasta are commonly used and allow lookups by primary ID (and I think in some cases secondary IDs). chris On Jan 4, 2010, at 8:12 AM, Jean-Marc Frigerio INRA wrote: > ... > > Hi, > > I wrote and currently use a module I named Bio::SeqIO::multifasta, which is basically a copy of Bio::SeqIO::fasta plus a few methods: > get_by_id(), get_by_order(), first_seq() and previous_seq() > > It would need review, validation etc. Do I submit it to Bugzilla ? > > -- jmf > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From fs5 at sanger.ac.uk Wed Jan 6 17:16:13 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Wed, 06 Jan 2010 22:16:13 +0000 Subject: [Bioperl-l] Bio::DB::Sam strange behaviour for read pairs Message-ID: <4B450BAD.3050807@sanger.ac.uk> I'm trying to extract paired reads from a BAM file that span a given region. I would then like to get the two read ends of the sequenced clone that spans the region. I use Bio::DB::Sam->get_features_by_location for this and it does give me the correct read pairs as a region match but it doesn't give me both read pairs in all cases. 
Here is the script:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::Sam;

my $usage = "usage: $0 BAMFILE CHROMOSOME STARTPOS ENDPOS\n" ;
my ($bam_file,$chrom,$start,$end) = @ARGV ;
die $usage unless $bam_file && $chrom && $start && $end;

my $bam = Bio::DB::Sam->new(-bam => $bam_file);

# Fetch read pairs overlapping the query region
my @pairs = $bam->get_features_by_location(
    -type   => 'read_pair',
    -seq_id => $chrom,
    -start  => $start,
    -end    => $end);

print "region: $chrom:$start..$end\n" ;
foreach my $pair (@pairs) {
    print " pair: id: ".$pair->id.", start".$pair->start.', end:'.$pair->end."\n";
    # The two mates are expected as sub-features of the pair
    my ($first_mate,$second_mate) = $pair->get_SeqFeatures;
    print " first_mate: start:".$first_mate->start.', end:'.$first_mate->end."\n";
    if ($second_mate){
        print " second_mate: start:".$second_mate->start.', end:'.$second_mate->end."\n";
    } else {
        print " no second mate\n";
    }
}

And here are the matching pairs that it produces with one of my files
for the region tal12:22479..29232:

region: tal12:22479..29232
 pair: id: tal-2446c08, start17496, end:29423
 first_mate: start:28540, end:29423
 no second mate
 pair: id: tal-2463d10, start23534, end:31363
 first_mate: start:23534, end:24448
 no second mate
 pair: id: tal-2371c09, start20860, end:28230
 first_mate: start:27604, end:28230
 no second mate
 pair: id: tal-2440b06, start19232, end:27099
 first_mate: start:26025, end:27099
 no second mate
 pair: id: tal-2327g09, start18909, end:26129
 first_mate: start:25354, end:26129
 no second mate
 pair: id: tal-2381b05, start25658, end:35054
 first_mate: start:25658, end:26295
 no second mate
 pair: id: tal-2377c11, start20898, end:28230
 first_mate: start:27473, end:28230
 no second mate
 pair: id: tal-2426e12, start21975, end:27562
 first_mate: start:21975, end:23008
 second_mate: start:26396, end:27562
 pair: id: tal-2365h10, start22843, end:31944
 first_mate: start:22843, end:23184
 no second mate
 pair: id: tal-2388h09, start19016, end:28238
 first_mate: start:27475, end:28238
 no second mate

So it finds a lot of pairs that span the region and the start/end from the pair is also
correct but it only gives me both individual mates in one case: pair: id: tal-2426e12, start21975, end:27562 first_mate: start:21975, end:23008 second_mate: start:26396, end:27562 In this case, both pairs are actually inside the query region (at least partially) whereas in the other cases, one of the mates is not inside, e.g. this one: pair: id: tal-2388h09, start19016, end:28238 first_mate: start:27475, end:28238 no second mate > get this read pair from the BAM file: $ samtools view clones.bam | grep tal-2388h09 tal-2388h09 99 tal12 19016 205 36H9M1D14M1D664M1D16M1D21M1D28M1D15M1D10M1D12M1D7M1D8M1D5M = 27475 9223 CTTTGGATGAAATAGTTTTTAAATAATACTTATTAAATATTAAATATATAACACATAAATAAGTATTGATGCAAATTTTAAAGTATTATAGAAAACTAGGTTTGATTATATTGTTATACTGTACTTTAAGAGGAGAGAGATAAGATATCTTTGCTCTTTTAATATATAAATTTAGATAAATATTCGTTAAATTTTCTACATAGTTATTTTTTATCTTATATATTATACTGCTATAGTTATCAATGTATATACATTCAAATAATTTATTAAAAATTCTATATTATATTAATTCTATGATAAAATAATCCTGTTTGTGATTTAAAAAATGATGATTCAATAAAAACTAATAATATAATACGAGTTAATATGGAATAATAAAATGGCATTTAACATGAATTTAGTCTTTAACCTTTTCTTTGTTTGTCAAGTTTTTTAAAACATAAAACCACACATTTCAAAATGGATTTTTAGCAAATATATAAAAATTATACATTTATAATGTATTGTTATGCGTCTTTTCGATAGAATCAATATTTAATTATATGAAGTTTCCACAATAAAATAATATTTAATATTATTTATTAGTAGAGTATTTGATTATATATATAGGCATATAATAATAACTCTAGTTCTATCTACCATATTATTTATAATTATTATAACAAAATGTGATATGAAATTTTATTATATACTTATATTATTTTTTTAACTATTTTAAAATATATTTATTTATACCTCAAAACTATAAAATTGAAATTATTAATAATAATCTAATATATACCTTTATAAAAATAAACGTATAAACTAAT ><:4/+1+*)+4>BEH=9-,,66IIIIIIIIEDA>>>>A at 
DDFFIHHHHHITIIIIIHIIHHHHHHIYYYYYTTTYDDDHDDDDDDDIIINNTNHHHHHIYYYYIIIIIINNNNTTIIIIIIIIIIITTNTTTTTYYYTTTTTTYYNNNNNNLLLLLLLLLLLNNNNNNTTTTTTTTTTTTTNNTNNNTTTYYTLLLLLLTTTTTTTTYTTTNNNNNNTTTTTTTNNTTNNTTTTTTTTTTYYTTTTTTTNNNNNNTTTTTTTYYTTTTTTYYYTTTTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTNNINTTYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYTTTTTOOOIFFFIFIIOICC>>II@>>>>>>C>>>>>>CIBECCCHIIOOOOOOOOTTTIIFDDEIQQA:55839AA>99>@IIIIII>>::;;I;>>CC>>>>>@III<::=>AAA<>>>>I>:>>99:>842225006824855;5>68844//.//00:>::338:99<:/-+*-./0)((((+00+..,++(((+-()(*((((()*)***))3)''')*..+*++((*1++--+*''''((+/)*42.((***)+,+('*'''*((''''((,'%%''''''''( AS:i:614 MS:i:50 tal-2388h09 147 tal12 27475 205 1H764M40H = 19016 -9223 ATTAAATCGGTATCGCCAACACAATGAGTATAATCATTGTCAAATATGCGTTTGTAAGTATATTCATTGTCACATTCACGTTTGTAAGTATATTCATTGTCACATTCACGTTTGTAAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTAATTTTTGTGAGTATAATCATTGGAACG 
(((0))*,-1-../2((())03---03266300271+*.-0-*''''+*,+/+))*-05330+)..4>7=77273911**((+20+03688633:93036<8;::5:<99379>>::>>>:57:<:7--)))1435::333228>::>II>::>A>>3/.958677AA=AA:>:==IIII8338<>A>>>>IIIIIIIIYYYYYKKYYYMIFFFFEIIIMI::4..8AIIC>9>=EIQQQMCAAAAAACIIIIAICIIIOOYTIIIMOQQMIIIIC>>AAABCCCCCEAI>C>>IQQIIIIIIIIIIKKYYYYYYYYYYYYYYYYYYYYYTIIIIIIYYYYTNINNNTYYYYYYYYYYYYYTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYYYSSYYYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTTTTTTTTTTTTTYYYYYTTTTTTYYYTTNNNNTTYYYYYYYYYTTTTLLTTNNTTTTYTTTTTTYYYYYYYYYYYTTOOKKKLKOOTYYYYYYYYYYYYYYYYTNNNNNNNNNTTTNYNNNNTNNNNTTYYYYYYYYTTNNNNTTYNNNNNITTTTTYYYYYYYYYYTTNNIIIIIDIIIIHTNNNNTTYYYYTNNNIIIIIITTTINIIIINNNNTTTYYYYIHHHDDHHDDIHDDGDFFFTIIINTTYYYYTTTTHHHHCCIIIHIHHHHCAI9:++**1168>ACCIIDDDDDDI>>>>>?NNN AS:i:688 MS:i:50 So the read in the first line starts before the start of the query region and is not accessible via $pair->get_SeqFeatures although this is a valid pair. Am I doing something wrong, is this the desired behaviour or is it a bug? Thanks for your help! -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
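
[Editorial sketch] A quick sanity check on the two SAM records quoted in the message above. The plain-Perl snippet below decodes the two flags (99 and 147) and derives each mate's reference span from POS and CIGAR, assuming only the standard SAM flag bits and CIGAR semantics. It does not use Bio::DB::Sam and makes no claim about how that module assembles read pairs; the helper names (flag_names, ref_end, overlaps) are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard SAM flag bits (per the SAM specification), in bit order.
my @FLAG_BITS = (
    [ 0x1,  'paired' ],
    [ 0x2,  'proper_pair' ],
    [ 0x4,  'unmapped' ],
    [ 0x8,  'mate_unmapped' ],
    [ 0x10, 'reverse' ],
    [ 0x20, 'mate_reverse' ],
    [ 0x40, 'first_in_pair' ],
    [ 0x80, 'second_in_pair' ],
);

# Hypothetical helper: names of the bits set in a SAM flag.
sub flag_names {
    my ($flag) = @_;
    return map { $_->[1] } grep { $flag & $_->[0] } @FLAG_BITS;
}

# Hypothetical helper: last reference base covered by an alignment.
# Only the M, D, N, = and X operations consume reference positions.
sub ref_end {
    my ($pos, $cigar) = @_;
    my $ref_len = 0;
    while ($cigar =~ /(\d+)([MIDNSHP=X])/g) {
        my ($n, $op) = ($1, $2);    # copy before the next regex resets $1/$2
        $ref_len += $n if $op =~ /[MDN=X]/;
    }
    return $pos + $ref_len - 1;
}

# Hypothetical helper: do two closed intervals share at least one base?
sub overlaps {
    my ($s1, $e1, $s2, $e2) = @_;
    return ($s1 <= $e2 && $s2 <= $e1) ? 1 : 0;
}

my ($q_start, $q_end) = (22479, 29232);    # query region on tal12
my @reads = (
    # [ FLAG, POS, CIGAR ] copied from the samtools output above
    [ 99,  19016, '36H9M1D14M1D664M1D16M1D21M1D28M1D15M1D10M1D12M1D7M1D8M1D5M' ],
    [ 147, 27475, '1H764M40H' ],
);

for my $read (@reads) {
    my ($flag, $pos, $cigar) = @$read;
    my $end = ref_end($pos, $cigar);
    printf "flag %3d (%s): %d..%d %s the query region\n",
        $flag, join(',', flag_names($flag)), $pos, $end,
        overlaps($pos, $end, $q_start, $q_end) ? 'overlaps' : 'is outside';
}
```

Under those assumptions, the flag-99 mate covers 19016..19835 and lies entirely outside 22479..29232, while the flag-147 mate covers 27475..28238, which matches the first_mate coordinates printed by the script. That is consistent with only the overlapping mate being returned, though whether this is intended depends on how Bio::DB::Sam builds its read_pair features.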
From hlapp at drycafe.net Thu Jan 7 11:55:00 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 7 Jan 2010 11:55:00 -0500
Subject: [Bioperl-l] Data missing into Annotation object using Bio::SeqIO (Genbank)
In-Reply-To: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
References: <4B28EB44.3080006@pasteur.fr> <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
Message-ID: <240F198A-83FA-4304-ACA8-80A702A68D8C@drycafe.net>

I don't know to what extent this was followed up on further and I
guess it's too long ago to be of much help, but if it hasn't been
mentioned before I wanted to point out
Bio::SeqFeature::AnnotationAdaptor, which integrates tag/value
annotation and Bio::Annotation annotation into one
AnnotationCollection, so it doesn't matter whether something is
attached as a tag or as an annotation object.

-hilmar

On Dec 16, 2009, at 10:09 AM, Chris Fields wrote:

> Emmanuel,
>
> The previous behavior in the 1.5.x series was to store feature tags
> as Bio::Annotation. The way this was implemented was considered
> unsatisfactory for various reasons, so we reverted back to using
> simple tag-value pairs as the default. You can get at the data this
> way (from the Feature/Annotation HOWTO):
>
> for my $feat_object ($seq_object->get_SeqFeatures) {
>     print "primary tag: ", $feat_object->primary_tag, "\n";
>     for my $tag ($feat_object->get_all_tags) {
>         print " tag: ", $tag, "\n";
>         for my $value ($feat_object->get_tag_values($tag)) {
>             print " value: ", $value, "\n";
>         }
>     }
> }
>
> You can also convert all the tag-value data into a
> Bio::Annotation::Collection using the
> Bio::SeqFeature::AnnotationAdaptor, but this is completely optional.
>
> chris
>
> On Dec 16, 2009, at 8:14 AM, Emmanuel Quevillon wrote:
>
>> Hi,
>>
>> I wrote a small Genbank parser a few months ago, before BioPerl
>> release 1.6.0.
>> I tried to use my code once again but now the output of my parser
>> is empty.
>> It looks like Annotation from seqfeatures is not filled anymore. >> >> Here is the code I used previously: >> >> while(my $seq = $streamer->next_seq()){ >> >> #We only want to retrieve CDS features... >> foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq- >> >get_SeqFeatures()){ >> print $ofh join("#", >> $feat->annotation()- >> >get_Annotations('locus_tag'), # Acc num >> $feat->annotation()->get_Annotations('gene') >> ? $feat->annotation()- >> >get_Annotations('gene') # Gene name >> : $feat->annotation()- >> >get_Annotations('locus_tag'), >> $feat->annotation()- >> >get_Annotations('product'), # Description >> ),"\n"; >> } >> } >> >> $feat is a Bio::SeqFeature::Generic object >> >> If I print Dumper($feat->annotation()) here is the output : >> >> $VAR1 = bless( { >> '_typemap' => bless( { >> '_type' => { >> 'comment' => >> 'Bio::Annotation::Comment', >> 'reference' => >> 'Bio::Annotation::Reference', >> 'dblink' => >> 'Bio::Annotation::DBLink' >> } >> }, >> 'Bio::Annotation::TypeManager' ), >> '_annotation' => {} >> }, 'Bio::Annotation::Collection' ); >> >> Have some changes been made into the way annotation object is >> populated? 
>>
>> Thanks for any clue and sorry if my question looks stupid
>>
>> Regards
>>
>> Emmanuel
>>
>> --
>> -------------------------
>> Emmanuel Quevillon
>> Biological Software and Databases Group
>> Institut Pasteur
>> +33 1 44 38 95 98
>> tuco at_ pasteur dot fr
>> -------------------------

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From rtbio.2009 at gmail.com Fri Jan 8 10:00:21 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Fri, 8 Jan 2010 16:00:21 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
Message-ID:

Hello all,

I was trying remote BLAST using BioPerl. My input data is a Trypanosoma
brucei sequence in FASTA format. When I submit it to BLAST using the step

$r=$factory->submit_blast($input)

it does not return anything, which I checked by debugging the code. It is
not blasting my input sequence even though I specified all the parameters.
I paste the code below.

Please help me solve this problem. It is very urgent.

Regards
Roopa.

#!/usr/bin/perl #path for extra camel module use lib "/srv/www/htdocs/rain/RNAi/"; use Roopablast; use Bio::SearchIO; use Bio::Search::Result::BlastResult; use Bio::Perl; use Bio::Tools::Run::RemoteBlast; use Bio::Seq; use Bio::SeqIO; use Bio::DB::GenBank; $serverpath = "/srv/www/htdocs/rain/RNAi"; $serverurl = "http://141.84.66.66/rain/RNAi"; $outfile = $serverpath."/rnairesult_".time().".html"; $nuc = $serverpath."/nuc".time().".txt"; $debugfile = $serverpath."/debug_".time().".txt"; $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; my $outstring =""; &parse_form; print "Content-type: text/html\n\n"; print "\n"; print "RNAi Result"; print " \n"; print "\n"; print "\n"; print " Your results will appear here
"; print " Please be patient, runtime can be up to 5 minutes
"; print " This page will automatically reload in 30 seconds. Roopa"; print "\n"; print "\n"; defined(my $pid = fork) or die "Can't fork: $!"; exit if $pid; open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; open(OUTFILE, '>',$outfile); print OUTFILE "\n RNAi Result \n \n \n Your results will appear here
Please be patient, runtime can be up to 5 minutes wait wait wait......
This page will automatically reload in 30 seconds Roopa
\n \n"; close(OUTFILE); @compseqs = blastcode($in{'Inputseq'}); $in{'Inputseq'} =~ s/>.*$//m; $in{'Inputseq'} =~ s/[^TAGC]//gim; $in{'Inputseq'} =~ tr/actg/ACTG/; @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, $in{'Threshold'}); sub blastcode { $inpu1= $_[0]; #$organ= $_[1]; open(NUC,'>',$nuc); print NUC $inpu1; close(NUC); my $prog = 'blastn'; my $db = 'refseq_rna'; my $e_val= '1e-10'; my $organism= 'Trypanosoma Brucei'; $gb = new Bio::DB::GenBank; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO', '-Organism' => $organism ); # open(OUTFILE,'>',$debugfile); # print OUTFILE @params; # close(OUTFILE); my $factory = Bio::Tools::Run::RemoteBlast->new(@params); #change a paramter $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]'; #change a paramter # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; my $v = 1; #$v is just to turn on and off the messages my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => 'Trypanosoma Brucei' ); while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. open(OUTFILE,'>',$debugfile); print OUTFILE $input; close(OUTFILE); my $r = $factory->submit_blast($input); #The program stops here it does not return any value and it does not enter the While loop,Please help me in this regard.# open(OUTFILE,'>',$debugfile); print OUTFILE $r; close(OUTFILE); print STDERR "waiting...." 
if($v>0); while ( my @rids = $factory->each_rid ) { open(OUTFILE,'>',$debugfile); print OUTFILE "while entered"; close(OUTFILE); foreach my $rid ( @rids ) { open(OUTFILE,'>',$debugfile); print OUTFILE "foreach entered"; close(OUTFILE); my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } open(OUTFILE,'>',$debugfile); print OUTFILE "if entered"; close(OUTFILE); print STDERR "." if ( $v > 0 ); sleep 5; } else { open(OUTFILE,'>',$debugfile); print OUTFILE "else entered"; close(OUTFILE); my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $result->next_hit(); close(BLASTDEBUGFILE); my $filename = $serverpath."/blastdata_".time().$result->query_name()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $organism; close(BLASTDEBUGFILE); # open(OUTFILE,'>',$outfile); # print OUTFILE "Test2 $result->database_name()"; # close(OUTFILE); #$hit = $result->next_hit; #open(new,'>',$debugfile); #print $hit; #close(new); while ( my $hit = $result->next_hit ) { next unless ( $v > 0); # open(OUTFILE,'>',$debugfile); # print OUTFILE "$hit in while hits"; # close(OUTFILE); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string push(@seqs,$dna); } } } } } #open(OUTFILE,'>',$debugfile); #print OUTFILE $seqs[0]; #close(OUTFILE); return(@seqs); } open(OUTFILE, '>',$outfile) || die ; print OUTFILE "\n RNAi Result \n \n

Inputsequence:
"; for ($i=0; $i\n"; } } print OUTFILE "

"; $z=@compseqs; for($k=1;$k<$z;$k++) { print OUTFILE "

Compare Sequence:
"; for ($i=0; $i\n"; } } print OUTFILE "

"; } print OUTFILE "

Window:
$in{'Windowsize'}

Threshold:
$in{'Threshold'}

"; my $j=0; for ($i=0; $i{similar}<=$in{'Threshold'}){ $j=$in{'Windowsize'}; } $height=$out[$i]->{similar}*5; } if ($j>0) { print OUTFILE ""; $outstring .= "".substr ($in{'Inputseq'}, $i, 1).""; $j--; } else { print OUTFILE ""; $outstring .= "".substr ($in{'Inputseq'}, $i, 1).""; } if ( ($i+1)%10==0){ $outstring .= " "; } if ( ($i+1)%60==0){ $outstring .= "
\n"; } if ( ($i+1)%800==0){ print OUTFILE "

\n"; } } print OUTFILE "

$outstring"; #foreach (@out) { #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} matchs

"; #if ($_->{similar}<=$in{'Threshold'}){ # } #} print OUTFILE "\n\n"; close OUTFILE; #nameprint(); sub parse_form { local ($buffer, @pairs, $pair, $name, $value); # Read in text $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/; if ($ENV{'REQUEST_METHOD'} eq "POST") { read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); } else { $buffer = $ENV{'QUERY_STRING'}; } @pairs = split(/&/, $buffer); foreach $pair (@pairs) { ($name, $value) = split(/=/, $pair); $value =~ tr/+/ /; $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; $in{$name} = $value; } } From maj at fortinbras.us Fri Jan 8 10:36:41 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 8 Jan 2010 10:36:41 -0500 Subject: [Bioperl-l] Regarding blast in Bioperl In-Reply-To: References: Message-ID: Hi Roopa-- I got your code to work with the following changes: +# the input should be a valid FASTA file... ... open(NUC,'>',$nuc); +print NUC ">seq (need a name line for valid fasta)\n"; print NUC $inpu1, "\n"; close(NUC); ... +# you can set these header parms in the call itself... - my $factory = Bio::Tools::Run::RemoteBlast->new(@params); + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => ''Trypanosoma Brucei[ORGN]'); #change a paramter +# commented this out... +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]'; MAJ ----- Original Message ----- From: "Roopa Raghuveer" To: Sent: Friday, January 08, 2010 10:00 AM Subject: [Bioperl-l] Regarding blast in Bioperl > Hello all, > > I was trying Remote blast using Bioperl. My input data is a Trypanosoma > brucei sequence in Fasta format. When I was trying to submit to BLAST using > the step > $r=$factory->submit_blast($input) > It was not returning anything which I checked by debugging the code. It is > not blasting my input sequence even though I mentioned all the parameters.I > would paste the code below. > > Please help me in solving put this problem. It is very urgent. > > Regards > Roopa. 

From julian.onions at gmail.com Fri Jan 8 11:53:50 2010
From: julian.onions at gmail.com (Julian Onions)
Date: Fri, 8 Jan 2010 16:53:50 +0000
Subject: [Bioperl-l] Cladogram construction
Message-ID:

Does anyone have any sample code for building cladograms based on Pars (one of
Phylip tools) type format (or any other format actually)
I've got something sort of working but I get no weights on the tree -
everything appears as nan. I'd also like to set one of the species to be an
outgroup.
This is the closest sample I've found so far.

#!/usr/bin/perl -w
use strict;
use Bio::AlignIO;
use Bio::Tree::DistanceFactory;
use Bio::Align::ProteinStatistics;
use Bio::TreeIO;
use Bio::Tree::Draw::Cladogram;

my $alnfile = shift @ARGV || die "need a file to run";
my $input = Bio::AlignIO->new(-format => 'fasta', -file => $alnfile);
if( my $aln = $input->next_aln ) {
    my $dfactory = Bio::Tree::DistanceFactory->new(-method => 'NJ');
    my $stats = Bio::Align::ProteinStatistics->new;
    my $distmat = $stats->distance(-align => $aln, -method => 'Kimura');
    my $treeout = Bio::TreeIO->new(-format => 'newick');
    my $tree = $dfactory->make_tree($distmat);
    $treeout->write_tree($tree);
    my $obj1 = Bio::Tree::Draw::Cladogram->new(-tree => $tree, -compact => 0);
    $obj1->print(-file => "tree.eps");
} else {
    die "could not find any alignments in the file $alnfile";
}

Pars input looks like

3 4
Robin 101
Blackbird 100
Sparrow 100

Thanks,
Julian.

From rtbio.2009 at gmail.com Sat Jan 9 11:57:09 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Sat, 9 Jan 2010 17:57:09 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To:
References:
Message-ID:

Hello all,

Thanks a lot for your reply, Mark. It worked with Trypanosoma brucei as the
organism parameter, but when I tried to take the Organism parameter from the
user it did not work, i.e., I was unable to get the target sequences.
Please help me in this regard.
My code is

#!/usr/bin/perl

#path for extra camel module
use lib "/srv/www/htdocs/rain/RNAi/";
use Roopablast;

use Bio::SearchIO;
use Bio::Search::Result::BlastResult;
use Bio::Perl;
use Bio::Tools::Run::RemoteBlast;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

$serverpath = "/srv/www/htdocs/rain/RNAi";
$serverurl = "http://141.84.66.66/rain/RNAi";
$outfile = $serverpath."/rnairesult_".time().".html";
$nuc = $serverpath."/nuc".time().".txt";
$debugfile = $serverpath."/debug_".time().".txt";
$blastdebugfile = $serverpath."/blastdebug_".time().".txt";

my $outstring ="";

&parse_form;

print "Content-type: text/html\n\n";
print "\n";
print "RNAi Result";
print " \n";
print "\n";
print "\n";
print " Your results will appear here
";
print " Please be patient, runtime can be up to 5 minutes
";
print " This page will automatically reload in 30 seconds. Roopa";
print "\n";
print "\n";

defined(my $pid = fork) or die "Can't fork: $!";
exit if $pid;
open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";

open(OUTFILE, '>',$outfile);

print OUTFILE "\n
RNAi Result
\n
\n
\n
Your results will appear here
Please be patient, runtime can be up to 5 minutes wait wait wait......
This page will automatically reload in 30 seconds Roopa
\n
\n";

close(OUTFILE);

@compseqs = blastcode($in{'Inputseq'},$in{'Organism'});

$in{'Inputseq'} =~ s/>.*$//m;
$in{'Inputseq'} =~ s/[^TAGC]//gim;
$in{'Inputseq'} =~ tr/actg/ACTG/;

@out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, $in{'Threshold'});

sub blastcode
{
    $inpu1= $_[0];
    $organ= $_[1];

    open(NUC,'>',$nuc);
    print NUC $inpu1,"\n";
    close(NUC);

    my $prog = 'blastn';
    my $db = 'refseq_rna';
    my $e_val= '1e-10';
    my $organism= $organ;

    $gb = new Bio::DB::GenBank;

    my @params = ( '-prog' => $prog,
                   '-data' => $db,
                   '-expect' => $e_val,
                   '-readmethod' => 'SearchIO',
                   '-Organism' => $organism );

    open(OUTFILE,'>',$debugfile);
    print OUTFILE $inpu1;
    close(OUTFILE);

    my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => '$organ[ORGN]');

    #my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

    #change a paramter
    #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';

    #change a paramter
    # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';

    my $v = 1;
    #$v is just to turn on and off the messages

    my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => $organ );

    while (my $input = $str->next_seq())
    {
        #Blast a sequence against a database:
        #Alternatively, you could pass in a file with many
        #sequences rather than loop through sequence one at a time
        #Remove the loop starting 'while (my $input = $str->next_seq())'
        #and swap the two lines below for an example of that.

        #open(OUTFILE,'>',$debugfile);
        # print OUTFILE $input;
        #close(OUTFILE);

        my $r = $factory->submit_blast($input);

        open(OUTFILE,'>',$debugfile);
        # print OUTFILE $r;
        close(OUTFILE);

        print STDERR "waiting...." if($v>0);

        while ( my @rids = $factory->each_rid ) {
            # open(OUTFILE,'>',$debugfile);
            # print OUTFILE "while entered";
            # close(OUTFILE);
            foreach my $rid ( @rids ) {

                # open(OUTFILE,'>',$debugfile);
                # print OUTFILE "foreach entered";
                # close(OUTFILE);

                my $rc = $factory->retrieve_blast($rid);

                if( !ref($rc) )
                {
                    if( $rc < 0 )
                    {
                        $factory->remove_rid($rid);
                    }
                    open(OUTFILE,'>',$debugfile);
                    # print OUTFILE "if entered";
                    close(OUTFILE);
                    print STDERR "." if ( $v > 0 );
                    sleep 5;
                }
                else {
                    # open(OUTFILE,'>',$debugfile);
                    # print OUTFILE "else entered";
                    # close(OUTFILE);

                    my $result = $rc->next_result();
                    #save the output
                    $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

                    open(BLASTDEBUGFILE,'>',$blastdebugfile);
                    print BLASTDEBUGFILE $result->next_hit();
                    close(BLASTDEBUGFILE);

                    my $filename = $serverpath."/blastdata_".time().$result->query_name()."\.out";

                    # open(DEBUGFILE,'>',$debugfile);
                    # open(new,'>',$filename);
                    # @arra=;
                    # print DEBUGFILE @arra;
                    # close(DEBUGFILE);
                    # close(new);

                    $factory->save_output($filename);

                    # open(BLASTDEBUGFILE,'>',$debugfile);
                    # print BLASTDEBUGFILE "Hello $rid";
                    # close(BLASTDEBUGFILE);

                    $factory->remove_rid($rid);

                    open(BLASTDEBUGFILE,'>',$blastdebugfile);
                    print BLASTDEBUGFILE $organism;
                    close(BLASTDEBUGFILE);

                    # open(OUTFILE,'>',$outfile);
                    # print OUTFILE "Test2 $result->database_name()";
                    # close(OUTFILE);

                    #$hit = $result->next_hit;
                    #open(new,'>',$debugfile);
                    #print $hit;
                    #close(new);

                    while ( my $hit = $result->next_hit ) {

                        next unless ( $v > 0);

                        # open(OUTFILE,'>',$debugfile);
                        # print OUTFILE "$hit in while hits";
                        # close(OUTFILE);

                        my $sequ = $gb->get_Seq_by_version($hit->name);
                        my $dna = $sequ->seq(); # get the sequence as a string
                        push(@seqs,$dna);
                    }
                }
            }
        }
    }

    #open(OUTFILE,'>',$debugfile);
    #print OUTFILE $seqs[0];
    #close(OUTFILE);

    return(@seqs);

}

Regards,
Roopa.

On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen wrote:

> Hi Roopa--
>
> I got your code to work with the following changes:
>
> +# the input should be a valid FASTA file...
> ...
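
The quoted advice that the sequence file needs a name line to be valid FASTA could be handled in a small helper before the file is handed to Bio::SeqIO. This is only a sketch under assumptions: `ensure_fasta` and the default id `query` are made-up names (not BioPerl functions), and it assumes the form field holds either raw sequence text or a single FASTA record.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the text as a FASTA
# record, adding a ">id" name line only when the input does not have one.
sub ensure_fasta {
    my ($text, $default_id) = @_;
    $default_id = 'query' unless defined $default_id && length $default_id;

    $text =~ s/\r\n?/\n/g;    # normalise line endings coming from a web form
    $text =~ s/\A\s+//;       # drop leading blank lines and spaces
    $text =~ s/\s+\z//;       # drop trailing whitespace

    # Only prepend a name line when the text does not already start with one.
    $text = ">$default_id\n$text" unless $text =~ /\A>/;

    return "$text\n";
}

print ensure_fasta("ACGTACGT\n");           # raw sequence gets a name line
print ensure_fasta(">my_seq\nACGTACGT");    # existing name line is kept
```

The result could then be written to the temporary file in place of the raw form value, so that the FASTA parser always sees a name line.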

From maj at fortinbras.us Sat Jan 9 13:05:41 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 9 Jan 2010 13:05:41 -0500
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To:
References:
Message-ID: <4C2E8133F916495B876628EF3E8FCBB2@NewLife>

I see it immediately (from making same bug many times) :

 my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
- '$organ[ORGN]');
+"$organ[ORGN]");

MAJ
----- Original Message -----
From: "Roopa Raghuveer"
To: "Mark A. Jensen"
Cc:
Sent: Saturday, January 09, 2010 11:57 AM
Subject: Re: [Bioperl-l] Regarding blast in Bioperl

> Hello all,
>
> Thanks alot for your reply Mark. It was working for Trypanosoma brucei as
> the organism parameter,but when I tried to use the Organism parameter from
> the user,it was not working i.e., I was unable to get the target sequences.
> Please help me in this regard.
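
A note on the quoting fix discussed in this thread, offered as a sketch rather than a definitive answer. Single quotes never interpolate, so '$organ[ORGN]' is sent to NCBI literally. As far as I know, inside double quotes Perl reads "$organ[ORGN]" as an element of an array @organ rather than as $organ followed by a literal [ORGN], so it may be safer to escape the bracket or to concatenate the pieces. The helper name `entrez_organism_query` below is hypothetical and not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): build an Entrez organism filter
# from a user-supplied organism name.
sub entrez_organism_query {
    my ($organism) = @_;
    # Plain concatenation leaves no doubt about where the variable name ends.
    return $organism . '[ORGN]';
}

my $organ = 'Trypanosoma brucei';

# Single quotes: no interpolation, the variable name itself is kept as text.
my $single = '$organ[ORGN]';

# Double quotes with the bracket escaped: $organ is interpolated and the
# bracket stays a literal character.
my $escaped = "$organ\[ORGN]";

# Concatenation through the helper gives the same string.
my $joined = entrez_organism_query($organ);

print "$_\n" for $single, $escaped, $joined;
```

Either of the last two strings could then be passed as the -ENTREZ_QUERY value in the RemoteBlast constructor call shown earlier in the thread.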
My code is > > #!/usr/bin/perl > > #path for extra camel module > use lib "/srv/www/htdocs/rain/RNAi/"; > use Roopablast; > > > use Bio::SearchIO; > use Bio::Search::Result::BlastResult; > use Bio::Perl; > use Bio::Tools::Run::RemoteBlast; > use Bio::Seq; > use Bio::SeqIO; > use Bio::DB::GenBank; > > $serverpath = "/srv/www/htdocs/rain/RNAi"; > $serverurl = "http://141.84.66.66/rain/RNAi"; > $outfile = $serverpath."/rnairesult_".time().".html"; > $nuc = $serverpath."/nuc".time().".txt"; > $debugfile = $serverpath."/debug_".time().".txt"; > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > my $outstring =""; > > &parse_form; > > print "Content-type: text/html\n\n"; > print "\n"; > print "RNAi Result"; > print " URL=$serverurl/rnairesult_".time().".html\"> \n"; > print "\n"; > print "\n"; > print " Your results will appear href=$serverurl/rnairesult_".time().".html>here
"; > print " Please be patient, runtime can be up to 5 minutes
"; > print " This page will automatically reload in 30 seconds. Roopa"; > print "\n"; > print "\n"; > > defined(my $pid = fork) or die "Can't fork: $!"; > exit if $pid; > open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; > open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; > open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; > > open(OUTFILE, '>',$outfile); > > print OUTFILE "\n > RNAi Result > URL=$serverurl//rnairesult_".time().".html\"> \n > > \n > \n > Your results will appear href=$serverurl/rnairesult_".time().".html>here
> Please be patient, runtime can be up to 5 minutes wait wait wait......
> This page will automatically reload in 30 seconds Roopa
> \n > \n"; > > close(OUTFILE); > > > @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); > > $in{'Inputseq'} =~ s/>.*$//m; > $in{'Inputseq'} =~ s/[^TAGC]//gim; > $in{'Inputseq'} =~ tr/actg/ACTG/; > > @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, > $in{'Threshold'}); > > > sub blastcode > { > > $inpu1= $_[0]; > > $organ= $_[1]; > > open(NUC,'>',$nuc); > print NUC $inpu1,"\n"; > close(NUC); > > my $prog = 'blastn'; > my $db = 'refseq_rna'; > my $e_val= '1e-10'; > my $organism= $organ; > > $gb = new Bio::DB::GenBank; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO', > '-Organism' => $organism ); > > open(OUTFILE,'>',$debugfile); > print OUTFILE $inpu1; > close(OUTFILE); > > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => > '$organ[ORGN]'); > > #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > #change a paramter > > #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma > Brucei[ORGN]'; > > #change a paramter > # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; > > my $v = 1; > #$v is just to turn on and off the messages > > my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , > '-organism' => $organ ); > > > while (my $input = $str->next_seq()) > { > #Blast a sequence against a database: > #Alternatively, you could pass in a file with many > #sequences rather than loop through sequence one at a time > #Remove the loop starting 'while (my $input = $str->next_seq())' > #and swap the two lines below for an example of that. > > #open(OUTFILE,'>',$debugfile); > # print OUTFILE $input; > #close(OUTFILE); > > > my $r = $factory->submit_blast($input); > > open(OUTFILE,'>',$debugfile); > # print OUTFILE $r; > close(OUTFILE); > > print STDERR "waiting...." 
if($v>0); > > while ( my @rids = $factory->each_rid ) { > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "while entered"; > # close(OUTFILE); > foreach my $rid ( @rids ) { > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "foreach entered"; > # close(OUTFILE); > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > open(OUTFILE,'>',$debugfile); > # print OUTFILE "if entered"; > close(OUTFILE); > print STDERR "." if ( $v > 0 ); > sleep 5; > } > else { > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "else entered"; > # close(OUTFILE); > > my $result = $rc->next_result(); > #save the output > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $result->next_hit(); > close(BLASTDEBUGFILE); > > my $filename = > $serverpath."/blastdata_".time().$result->query_name()."\.out"; > > # open(DEBUGFILE,'>',$debugfile); > # open(new,'>',$filename); > # @arra=; > # print DEBUGFILE @arra; > # close(DEBUGFILE); > # close(new); > > $factory->save_output($filename); > # open(BLASTDEBUGFILE,'>',$debugfile); > # print BLASTDEBUGFILE "Hello $rid"; > # close(BLASTDEBUGFILE); > > $factory->remove_rid($rid); > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $organism; > close(BLASTDEBUGFILE); > > # open(OUTFILE,'>',$outfile); > # print OUTFILE "Test2 $result->database_name()"; > # close(OUTFILE); > > #$hit = $result->next_hit; > #open(new,'>',$debugfile); > #print $hit; > #close(new); > > while ( my $hit = $result->next_hit ) { > > next unless ( $v > 0); > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "$hit in while hits"; > # close(OUTFILE); > > my $sequ = $gb->get_Seq_by_version($hit->name); > my $dna = $sequ->seq(); # get the sequence as a string > push(@seqs,$dna); > } > } > } > } > } > > #open(OUTFILE,'>',$debugfile); > #print OUTFILE $seqs[0]; > #close(OUTFILE); > > return(@seqs); > > } > > Regards, > 
Roopa. > > > On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen wrote: > >> Hi Roopa-- >> >> I got your code to work with the following changes: >> >> +# the input should be a valid FASTA file... >> ... >> open(NUC,'>',$nuc); >> +print NUC ">seq (need a name line for valid fasta)\n"; >> print NUC $inpu1, "\n"; >> close(NUC); >> ... >> >> +# you can set these header parms in the call itself... >> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => >> ''Trypanosoma Brucei[ORGN]'); >> >> #change a paramter >> +# commented this out... >> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >> Brucei[ORGN]'; >> >> MAJ >> ----- Original Message ----- From: "Roopa Raghuveer" > > >> To: >> Sent: Friday, January 08, 2010 10:00 AM >> Subject: [Bioperl-l] Regarding blast in Bioperl >> >> >> Hello all, >>> >>> I was trying Remote blast using Bioperl. My input data is a Trypanosoma >>> brucei sequence in Fasta format. When I was trying to submit to BLAST >>> using >>> the step >>> $r=$factory->submit_blast($input) >>> It was not returning anything which I checked by debugging the code. It is >>> not blasting my input sequence even though I mentioned all the >>> parameters.I >>> would paste the code below. >>> >>> Please help me in solving put this problem. It is very urgent. >>> >>> Regards >>> Roopa. 
>>> >>> #!/usr/bin/perl >>> >>> #path for extra camel module >>> use lib "/srv/www/htdocs/rain/RNAi/"; >>> use Roopablast; >>> >>> >>> use Bio::SearchIO; >>> use Bio::Search::Result::BlastResult; >>> use Bio::Perl; >>> use Bio::Tools::Run::RemoteBlast; >>> use Bio::Seq; >>> use Bio::SeqIO; >>> use Bio::DB::GenBank; >>> >>> $serverpath = "/srv/www/htdocs/rain/RNAi"; >>> $serverurl = "http://141.84.66.66/rain/RNAi"; >>> $outfile = $serverpath."/rnairesult_".time().".html"; >>> $nuc = $serverpath."/nuc".time().".txt"; >>> $debugfile = $serverpath."/debug_".time().".txt"; >>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>> >>> my $outstring =""; >>> >>> &parse_form; >>> >>> print "Content-type: text/html\n\n"; >>> print "\n"; >>> print "RNAi Result"; >>> print ">> URL=$serverurl/rnairesult_".time().".html\"> \n"; >>> print "\n"; >>> print "\n"; >>> print " Your results will appear >> href=$serverurl/rnairesult_".time().".html>here
"; >>> print " Please be patient, runtime can be up to 5 minutes
"; >>> print " This page will automatically reload in 30 seconds. Roopa"; >>> print "\n"; >>> print "\n"; >>> >>> defined(my $pid = fork) or die "Can't fork: $!"; >>> exit if $pid; >>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >>> >>> >>> >>> open(OUTFILE, '>',$outfile); >>> >>> print OUTFILE "\n >>> RNAi Result >>> >> URL=$serverurl//rnairesult_".time().".html\"> \n >>> >>> \n >>> \n >>> Your results will appear >> href=$serverurl/rnairesult_".time().".html>here
>>> Please be patient, runtime can be up to 5 minutes wait wait >>> wait......
>>> This page will automatically reload in 30 seconds Roopa
>>> \n >>> \n"; >>> >>> close(OUTFILE); >>> >>> >>> @compseqs = blastcode($in{'Inputseq'}); >>> >>> $in{'Inputseq'} =~ s/>.*$//m; >>> $in{'Inputseq'} =~ s/[^TAGC]//gim; >>> $in{'Inputseq'} =~ tr/actg/ACTG/; >>> >>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >>> $in{'Threshold'}); >>> >>> >>> sub blastcode >>> { >>> >>> $inpu1= $_[0]; >>> >>> #$organ= $_[1]; >>> >>> open(NUC,'>',$nuc); >>> print NUC $inpu1; >>> close(NUC); >>> >>> my $prog = 'blastn'; >>> my $db = 'refseq_rna'; >>> my $e_val= '1e-10'; >>> my $organism= 'Trypanosoma Brucei'; >>> >>> $gb = new Bio::DB::GenBank; >>> >>> my @params = ( '-prog' => $prog, >>> '-data' => $db, >>> '-expect' => $e_val, >>> '-readmethod' => 'SearchIO', >>> '-Organism' => $organism ); >>> >>> # open(OUTFILE,'>',$debugfile); >>> # print OUTFILE @params; >>> # close(OUTFILE); >>> >>> >>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>> >>> #change a paramter >>> >>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>> Brucei[ORGN]'; >>> >>> #change a paramter >>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; >>> >>> my $v = 1; >>> #$v is just to turn on and off the messages >>> >>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >>> '-organism' => 'Trypanosoma Brucei' ); >>> >>> >>> while (my $input = $str->next_seq()) >>> { >>> #Blast a sequence against a database: >>> #Alternatively, you could pass in a file with many >>> #sequences rather than loop through sequence one at a time >>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>> #and swap the two lines below for an example of that. 
>>> >>> open(OUTFILE,'>',$debugfile); >>> print OUTFILE $input; >>> close(OUTFILE); >>> >>> >>> my $r = $factory->submit_blast($input); #The program stops here it >>> does not return any value and it does not enter the While loop,Please help >>> me in this regard.# >>> open(OUTFILE,'>',$debugfile); >>> print OUTFILE $r; >>> close(OUTFILE); >>> >>> >>> print STDERR "waiting...." if($v>0); >>> >>> while ( my @rids = $factory->each_rid ) { >>> open(OUTFILE,'>',$debugfile); >>> print OUTFILE "while entered"; >>> close(OUTFILE); >>> foreach my $rid ( @rids ) { >>> >>> open(OUTFILE,'>',$debugfile); >>> print OUTFILE "foreach entered"; >>> close(OUTFILE); >>> >>> my $rc = $factory->retrieve_blast($rid); >>> >>> if( !ref($rc) ) >>> { >>> if( $rc < 0 ) >>> { >>> $factory->remove_rid($rid); >>> } >>> open(OUTFILE,'>',$debugfile); >>> print OUTFILE "if entered"; >>> close(OUTFILE); >>> print STDERR "." if ( $v > 0 ); >>> sleep 5; >>> } >>> else { >>> open(OUTFILE,'>',$debugfile); >>> print OUTFILE "else entered"; >>> close(OUTFILE); >>> >>> my $result = $rc->next_result(); >>> #save the output >>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>> >>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>> print BLASTDEBUGFILE $result->next_hit(); >>> close(BLASTDEBUGFILE); >>> >>> my $filename = >>> $serverpath."/blastdata_".time().$result->query_name()."\.out"; >>> >>> # open(DEBUGFILE,'>',$debugfile); >>> # open(new,'>',$filename); >>> # @arra=; >>> # print DEBUGFILE @arra; >>> # close(DEBUGFILE); >>> # close(new); >>> >>> $factory->save_output($filename); >>> >>> # open(BLASTDEBUGFILE,'>',$debugfile); >>> # print BLASTDEBUGFILE "Hello $rid"; >>> # close(BLASTDEBUGFILE); >>> >>> $factory->remove_rid($rid); >>> >>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>> print BLASTDEBUGFILE $organism; >>> close(BLASTDEBUGFILE); >>> >>> # open(OUTFILE,'>',$outfile); >>> # print OUTFILE "Test2 $result->database_name()"; >>> # close(OUTFILE); >>> >>> #$hit = $result->next_hit; >>> 
#open(new,'>',$debugfile); >>> #print $hit; >>> #close(new); >>> >>> while ( my $hit = $result->next_hit ) { >>> >>> next unless ( $v > 0); >>> >>> # open(OUTFILE,'>',$debugfile); >>> # print OUTFILE "$hit in while hits"; >>> # close(OUTFILE); >>> >>> my $sequ = $gb->get_Seq_by_version($hit->name); >>> my $dna = $sequ->seq(); # get the sequence as a string >>> push(@seqs,$dna); >>> } >>> } >>> } >>> } >>> } >>> >>> #open(OUTFILE,'>',$debugfile); >>> #print OUTFILE $seqs[0]; >>> #close(OUTFILE); >>> >>> return(@seqs); >>> >>> } >>> >>> open(OUTFILE, '>',$outfile) || die ; >>> >>> print OUTFILE "\n >>> RNAi Result >>> \n >>> \n >>>

>>> Inputsequence:
"; >>> >>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) { >>> >>> print OUTFILE substr ($in{'Inputseq'}, $i, 1); >>> >>> if ( ($i+1)%10==0){ >>> print OUTFILE " "; >>> } >>> if ( ($i+1)%60==0){ >>> print OUTFILE "
\n"; >>> } >>> } >>> >>> >>> >>> print OUTFILE "

"; >>> >>> $z=@compseqs; >>> >>> for($k=1;$k<$z;$k++) { >>> print OUTFILE "

Compare >>> Sequence:
"; >>> >>> for ($i=0; $i<length ($compseqs[$k]); $i++) { >>> >>> print OUTFILE substr ($compseqs[$k], $i, 1); >>> >>> if ( ($i+1)%10==0){ >>> print OUTFILE " "; >>> } >>> if ( ($i+1)%60==0){ >>> print OUTFILE "
\n"; >>> } >>> } >>> print OUTFILE "

"; >>> } >>> >>> print OUTFILE "

>>> Window:
$in{'Windowsize'} >>>

>>>

>>> Threshold:
$in{'Threshold'} >>>

"; >>> my $j=0; >>> >>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) { >>> >>> if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){ >>> if ($out[$i]->{similar}<=$in{'Threshold'}){ >>> $j=$in{'Windowsize'}; >>> } >>> $height=$out[$i]->{similar}*5; >>> } >>> >>> if ($j>0) { >>> print OUTFILE ">> height=\"5\">"; >>> $outstring .= "".substr ($in{'Inputseq'}, $i, >>> 1).""; >>> $j--; >>> } >>> else { >>> print OUTFILE ">> height=\"5\">"; >>> $outstring .= "".substr ($in{'Inputseq'}, $i, >>> 1).""; >>> } >>> >>> if ( ($i+1)%10==0){ >>> $outstring .= " "; >>> } >>> if ( ($i+1)%60==0){ >>> $outstring .= "
\n"; >>> >>> } >>> if ( ($i+1)%800==0){ >>> print OUTFILE "

\n"; >>> >>> } >>> } >>> >>> print OUTFILE "

>> set\">$outstring"; >>> >>> #foreach (@out) { >>> #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} matchs

"; >>> #if ($_->{similar}<=$in{'Threshold'}){ >>> >>> # } >>> #} >>> >>> print OUTFILE "\n\n"; >>> >>> close OUTFILE; >>> >>> #nameprint(); >>> >>> sub parse_form { >>> local ($buffer, @pairs, $pair, $name, $value); >>> # Read in text >>> $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/; >>> if ($ENV{'REQUEST_METHOD'} eq "POST") >>> { >>> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); >>> } >>> else >>> { >>> $buffer = $ENV{'QUERY_STRING'}; >>> } >>> @pairs = split(/&/, $buffer); >>> foreach $pair (@pairs) >>> { >>> ($name, $value) = split(/=/, $pair); >>> $value =~ tr/+/ /; >>> $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; >>> $in{$name} = $value; >>> } >>> } >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From robert.bradbury at gmail.com Sat Jan 9 14:52:53 2010 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Sat, 9 Jan 2010 14:52:53 -0500 Subject: [Bioperl-l] Regarding blast in Bioperl In-Reply-To: References: Message-ID: Roopa, Mark is correct, you have to be very careful of single vs. double quotes in Perl. As I understand it, double-quoted strings are interpolated while single-quoted strings are taken literally. I tried to run your script (with fixes), but without the supporting files it appears to be impossible. What I am curious about is what it is trying to do. I was particularly intrigued by some apparent efforts to parse BLAST results into color-enhanced HTML, and without thinking about the code in detail it seems easier to simply ask: what are you trying to do? 
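On the single- versus double-quote point above, here is a minimal sketch of how Perl treats the two kinds of string when building an Entrez query term like the one in Roopa's script. It is not taken from the thread, and the variable name $organism is made up for illustration. Note that inside double quotes a variable followed directly by "[" would be read as an array element, so the bracket has to be escaped or concatenated.

```perl
use strict;
use warnings;

# Hypothetical variable, for illustration only.
my $organism = 'Trypanosoma brucei';

# Single quotes: taken literally, no variable interpolation.
my $literal = '$organism[ORGN]';

# Double quotes interpolate $organism; the backslash keeps Perl from
# reading "$organism[...]" as an element of an array @organism.
my $interpolated = "$organism\[ORGN]";

# Concatenation sidesteps the ambiguity altogether.
my $concatenated = $organism . '[ORGN]';

print "$literal\n";       # prints: $organism[ORGN]
print "$interpolated\n";  # prints: Trypanosoma brucei[ORGN]
print "$concatenated\n";  # prints: Trypanosoma brucei[ORGN]
```

This is why the commented-out line in the script that assigns '$input2[ORGN]' in single quotes would send the literal text "$input2[ORGN]" rather than the organism name.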
I find "classical" BLAST results particularly tedious and long for BLAST results that display concise information as the NCBI HomoloGene cross-species comparisons do. Unfortunately NCBI has deemed their methods (I have asked them) "too complex to disclose" (for a person comfortable in dealing with assembly language, or even gate level electronics, "too complex" is a very relative concept). One has the option of using NCBI with a limited number of species but good display methodologies or Ensembl with many more species but less desirable display methodologies (phylogenetic tree derived from cross species comparisons). And for the WRN protein which may play a key role in aging (through the activity of its exonuclease domain mutating DNA sequences and inducing microdeletions and microinsertions) this gets important because it appears that the *C. elegans* genome is missing the exonuclease domain (so it may be useless from the perspective of studying aging), and the other 4 nematode species which have been sequenced aren't even in either the NCBI or the Ensembl databases. Needless to say, if we manage in the near future, given the drop in sequencing costs, to sequence the nematodes which are freeze/thaw tolerant (which induces DSB that have to be repaired) those genomes will be unlikely to be in the NCBI/Ensembl databases either. So there is a requirement for the user to develop the ability to mix and match public and obscure databases in creative ways to provide easy to interpret information. Robert Bradbury From robert.bradbury at gmail.com Sat Jan 9 15:27:54 2010 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Sat, 9 Jan 2010 15:27:54 -0500 Subject: [Bioperl-l] Ensembl problems Message-ID: I am trying to get the examples provided by EMBL/Ensembl to work and am encountering problems. For example, about 1/3 of the way through the Compara API tutorial [1] there is what is supposed to be a completely functional script. It does not work. 
This is in contrast to some of the earlier simple scripts (listing the species in Ensembl etc.) which do work on my machine (so I have all the libraries and so on installed correctly). Very poor form to document scripts which do not function on a properly set-up system. I have modified my invocation of the script slightly: Align.pl --set_of_species \ "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis familiaris:Sus scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae" which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" on an undefined value at ./Align.pl line 132." (Align.pl is my slightly modified example of the Compara Tutorial code.) As these are slightly modified Perl scripts from the documentation, the line numbers may be variable. I can print out the genome_dbs, and it gives me a list of genome names (hash tables), though that appears to be problematic in the Align.pl script, in spite of the fact that just before that call I dumped "genome_dbs" and got back some 25 hash tables (expected). I believe this occurs whether one is comparing "human:mouse" or the more complex species set I have outlined above. Has anyone else attempted to run the code documented in the Ensembl API Tutorial? Any suggestions as to what direction to go in would be appreciated -- when one is trying to copy code out of a tutorial and it fails it's kind of hard to know where to go. There do appear to be some problems in the specifications of a Compara version/database and there don't appear to be a lot of resources informing one of what resources are currently available. Robert 1. 
http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html From ak at ebi.ac.uk Sat Jan 9 17:01:21 2010 From: ak at ebi.ac.uk (Andreas =?iso-8859-1?B?S+Ro5HJp?=) Date: Sat, 9 Jan 2010 22:01:21 +0000 Subject: [Bioperl-l] Ensembl problems In-Reply-To: References: Message-ID: <20100109220121.GA9521@quux.windows.ebi.ac.uk> On Sat, Jan 09, 2010 at 03:27:54PM -0500, Robert Bradbury wrote: > I am trying to get the examples provided by EMBL/Ensembl to work and am > encountering problems. Hi Robert, The ensembl-dev list is the appropriate forum for this type of questions as it has nothing to do with bioperl. There is also the Ensembl helpdesk. If you send your problem to I'm sure that it will be picked up by the appropriate people (I do myself not know enough about the Compara API to be able to diagnose this problem straight away I'm afraid). Be sure to submit a minimal script that still exhibit the problem, and information about what version of the APIs you're using (we will assume that you're not mixing newer version of the API with older databases or vice versa). We are generally very happy to have bugs in documentation or code pointed out to us, and will correct errors as we are made aware of them. Kind regards, Andreas > For example, about 1/3 of the way through the Compara API tutorial [1] there > is what is supposed to be a completely functional script. It does not > work. This is in contrast to some of the earlier simple scripts (listing > the species in Ensmbl etc.) which do work on my machine, so I have all the > libraries do dah installed correctly). > > Very poor form to document scripts which do not function on a properly setup > system. 
> > I have modified my invocation of the script slightly: > Align.pl --set_of_species \ > "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur > garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta > africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis familiaris:Sus > scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus > tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia > belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae" > > which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" on > an undefined value at ./Align.pl line 132." (Align.pl is my slightly > modified example of the Compara Tutoraial code.) > As these are slightly modified perl scripts from the documantation, the line > numbers may be variable. > > I can print out the genome_dbs, and it gives me a list of genome names (hash > tables) though it appears that is problematic in the Align.pl script. > in spite of the fact that just previously to that call I dumped "genome_dbs" > and got back some 25 hash tables (expected). I believe this occurs whether > one is comparing "human:mouse" or the more complex species set I have > outlined above. > > > > Has anyone else attempted to run the code documented in the Ensembl API > Tutorial? > Any suggestions as to what direction to go in would be appreciated -- when > one is trying to copy code out of a tutorial and it fails its kind of hard > to know where to go.) > > There do appear to be some problems in the specifications of a Compara > version/database and there don't appear to be a lot of resources informing > one of what resources are currently available. > > Robert > > > 1. 
http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Andreas K?h?ri, Ensembl Software Developer European Bioinformatics Institute (EMBL-EBI) Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, United Kingdom From cjfields at illinois.edu Sat Jan 9 17:01:19 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 9 Jan 2010 16:01:19 -0600 Subject: [Bioperl-l] Ensembl problems In-Reply-To: References: Message-ID: <743C998D-BBB5-4832-BA25-24D7D7288F78@illinois.edu> Robert, Ensembl errors probably should be redirected to the ensembl mail list. I can't speak to the problems with it (they appear specific to the Ensembl tool set). chris On Jan 9, 2010, at 2:27 PM, Robert Bradbury wrote: > I am trying to get the examples provided by EMBL/Ensembl to work and am > encountering problems. > > For example, about 1/3 of the way through the Compara API tutorial [1] there > is what is supposed to be a completely functional script. It does not > work. This is in contrast to some of the earlier simple scripts (listing > the species in Ensmbl etc.) which do work on my machine, so I have all the > libraries do dah installed correctly). > > Very poor form to document scripts which do not function on a properly setup > system. 
> > I have modified my invocation of the script slightly: > Align.pl --set_of_species \ > "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur > garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta > africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis familiaris:Sus > scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus > tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia > belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae" > > which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" on > an undefined value at ./Align.pl line 132." (Align.pl is my slightly > modified example of the Compara Tutoraial code.) > As these are slightly modified perl scripts from the documantation, the line > numbers may be variable. > > I can print out the genome_dbs, and it gives me a list of genome names (hash > tables) though it appears that is problematic in the Align.pl script. > in spite of the fact that just previously to that call I dumped "genome_dbs" > and got back some 25 hash tables (expected). I believe this occurs whether > one is comparing "human:mouse" or the more complex species set I have > outlined above. > > > > Has anyone else attempted to run the code documented in the Ensembl API > Tutorial? > Any suggestions as to what direction to go in would be appreciated -- when > one is trying to copy code out of a tutorial and it fails its kind of hard > to know where to go.) > > There do appear to be some problems in the specifications of a Compara > version/database and there don't appear to be a lot of resources informing > one of what resources are currently available. > > Robert > > > 1. 
http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From robert.bradbury at gmail.com Sun Jan 10 14:47:00 2010 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Sun, 10 Jan 2010 14:47:00 -0500 Subject: [Bioperl-l] Ensembl problems In-Reply-To: <20100109220121.GA9521@quux.windows.ebi.ac.uk> References: <20100109220121.GA9521@quux.windows.ebi.ac.uk> Message-ID: As it turns out, the example from the file I cited (the Compara API tutorial) does work. The code that I started with may have been from an "MS-WORD" document distributed with the documentation (which could quite well be out-of-date). But even the corrected code does not work for various uncommon comparisons between species (which they may not have archived in Ensembl). I also don't understand enough about the functions yet to know whether they are comparing the same regions from the same chromosomes that just happen to be identical or whether they are comparing the same region with a homologous region on a different chromosome (i.e. conserved genes). I'm going to have to dig into this some more to figure out what is going on. Thanks for the pointers, I'll refer future questions to the Ensembl list/help-desk. However, if anyone knows Ensembl very well, the database has in it some of these interspecies comparisons already. They are accessed when one does a phylogeny tree for specific genes (and generally for highly conserved genes you will get a tree that includes nearly all 50 species in the database). As I don't think they are computed on-the-fly, the information must be precomputed and stored someplace in the database. I would very much like to know how to access this information. 
Thanks, Robert On 1/9/10, Andreas K?h?ri wrote: > On Sat, Jan 09, 2010 at 03:27:54PM -0500, Robert Bradbury wrote: >> I am trying to get the examples provided by EMBL/Ensembl to work and am >> encountering problems. > > Hi Robert, > > The ensembl-dev list is the appropriate forum for this type of questions > as it has nothing to do with bioperl. > > There is also the Ensembl helpdesk. If you send your problem to > I'm sure that it will be picked up by the > appropriate people (I do myself not know enough about the Compara API to > be able to diagnose this problem straight away I'm afraid). > > Be sure to submit a minimal script that still exhibit the problem, and > information about what version of the APIs you're using (we will assume > that you're not mixing newer version of the API with older databases or > vice versa). > > We are generally very happy to have bugs in documentation or code > pointed out to us, and will correct errors as we are made aware of them. > > > Kind regards, > Andreas > >> For example, about 1/3 of the way through the Compara API tutorial [1] >> there >> is what is supposed to be a completely functional script. It does not >> work. This is in contrast to some of the earlier simple scripts (listing >> the species in Ensmbl etc.) which do work on my machine, so I have all >> the >> libraries do dah installed correctly). >> >> Very poor form to document scripts which do not function on a properly >> setup >> system. 
>> >> I have modified my invocation of the script slightly: >> Align.pl --set_of_species \ >> "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur >> garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta >> africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis >> familiaris:Sus >> scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus >> tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia >> belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae" >> >> which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" >> on >> an undefined value at ./Align.pl line 132." (Align.pl is my slightly >> modified example of the Compara Tutoraial code.) >> As these are slightly modified perl scripts from the documantation, the >> line >> numbers may be variable. >> >> I can print out the genome_dbs, and it gives me a list of genome names >> (hash >> tables) though it appears that is problematic in the Align.pl script. >> in spite of the fact that just previously to that call I dumped >> "genome_dbs" >> and got back some 25 hash tables (expected). I believe this occurs >> whether >> one is comparing "human:mouse" or the more complex species set I have >> outlined above. >> >> >> >> Has anyone else attempted to run the code documented in the Ensembl API >> Tutorial? >> Any suggestions as to what direction to go in would be appreciated -- when >> one is trying to copy code out of a tutorial and it fails its kind of hard >> to know where to go.) >> >> There do appear to be some problems in the specifications of a Compara >> version/database and there don't appear to be a lot of resources informing >> one of what resources are currently available. >> >> Robert >> >> >> 1. 
http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > Andreas K?h?ri, Ensembl Software Developer > European Bioinformatics Institute (EMBL-EBI) > Wellcome Trust Genome Campus, Hinxton > Cambridge CB10 1SD, United Kingdom > From Russell.Smithies at agresearch.co.nz Sun Jan 10 15:34:39 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Mon, 11 Jan 2010 09:34:39 +1300 Subject: [Bioperl-l] how to retrieve organism name from accession number? In-Reply-To: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> An alternate non-BioPerly way (that may be faster given NCBI's flakiness lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and do lookups. In that same dir, taxdump.tar.gz contains a file called names.dmp which lists taxids and descriptions (and synonyms) If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I could do this: my $taxid = $gi_taxid_nucl{$accession}; my $org_name = $names{$taxid}; --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > Sent: Saturday, 26 December 2009 4:52 p.m. > To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > number? 
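Russell's hash-lookup suggestion could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes gi_taxid_nucl.dmp holds tab-separated "gi, taxid" pairs and that names.dmp fields are separated by tab-pipe-tab with a trailing tab-pipe. The sub names load_gi_taxid and load_scientific_names and the sample records (including the GI numbers) are made up. Note that the dump is keyed by GI number, so accessions would first have to be mapped to GIs.

```perl
use strict;
use warnings;

# Build a GI -> taxid hash from gi_taxid_nucl.dmp-style lines ("gi<TAB>taxid").
sub load_gi_taxid {
    my ($fh) = @_;
    my %taxid_for;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $gi && defined $taxid;
        $taxid_for{$gi} = $taxid;
    }
    return \%taxid_for;
}

# Build a taxid -> scientific name hash from names.dmp-style lines.
# Only rows whose name class is "scientific name" are kept, so
# synonyms and common names are skipped.
sub load_scientific_names {
    my ($fh) = @_;
    my %name_for;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;    # drop the trailing field terminator
        my ($taxid, $name, undef, $class) = split /\t\|\t/, $line;
        next unless defined $class && $class eq 'scientific name';
        $name_for{$taxid} = $name;
    }
    return \%name_for;
}

# Tiny in-memory stand-ins for the real files (made-up GI numbers).
my $gi_data    = "111\t9606\n222\t10090\n";
my $names_data = join '',
    "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n",
    "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n",
    "10090\t|\tMus musculus\t|\t\t|\tscientific name\t|\n";

open my $gi_fh,    '<', \$gi_data    or die "Cannot open GI data: $!";
open my $names_fh, '<', \$names_data or die "Cannot open names data: $!";

my $taxid_for = load_gi_taxid($gi_fh);
my $name_for  = load_scientific_names($names_fh);

for my $gi (111, 222) {
    my $taxid = $taxid_for->{$gi};
    print "$gi\t$taxid\t$name_for->{$taxid}\n";
}
```

With the real files, the two in-memory handles would be replaced by handles opened on the unpacked dump files. The full GI table is large, so holding it all in one hash may need a lot of memory.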
> > Bhakti, > The following example (using EUtilities) may serve your purpose: > > use Bio::DB::EUtilities; > > my (%taxa, @taxa); > my (%names, %idmap); > > # these are protein ids; nuc ids will work by changing -dbfrom => > 'nucleotide', > # (probably) > > my @ids = qw(1621261 89318838 68536103 20807972 730439); > > my $factory = Bio::DB::EUtilities->new(-eutil => 'elink', > -db => 'taxonomy', > -dbfrom => 'protein', > -correspondence => 1, > -id => \@ids); > > # iterate through the LinkSet objects > while (my $ds = $factory->next_LinkSet) { > $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0] > } > > @taxa = @taxa{@ids}; > > $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', > -db => 'taxonomy', > -id => \@taxa ); > > while (local $_ = $factory->next_DocSum) { > $names{($_->get_contents_by_name('TaxId'))[0]} = > ($_->get_contents_by_name('ScientificName'))[0]; > } > > foreach (@ids) { > $idmap{$_} = $names{$taxa{$_}}; > } > > # %idmap is > # 1621261 => 'Mycobacterium tuberculosis H37Rv' > # 20807972 => 'Thermoanaerobacter tengcongensis MB4' > # 68536103 => 'Corynebacterium jeikeium K411' > # 730439 => 'Bacillus caldolyticus' > # 89318838 => undef (this record has been removed from the db) > > 1; > > You probably will need to break up your 30000 into chunks > (say, 1000-3000 each), and do the above on each chunk with a > > sleep 3; > > or so separating the queries. > MAJ > ----- Original Message ----- > From: "Bhakti Dwivedi" > To: > Sent: Friday, December 25, 2009 9:46 PM > Subject: [Bioperl-l] how to retrieve organism name from accession number? > > > > Hi, > > > > Does anyone know how to retrieve the "Source" or the "Species name" > given > > the accession number using Bioperl. I have these 30,000 accession > numbers > > for which I need to get the source organisms. Any kind of help will be > > appreciated. 
> > > > Thanks > > > > BD > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at illinois.edu Sun Jan 10 15:49:40 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 10 Jan 2010 14:49:40 -0600 Subject: [Bioperl-l] how to retrieve organism name from accession number? In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> Message-ID: One could also use Bio::DB::Taxonomy, which indexes the same files or (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the details). chris On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote: > An alternate non-BioPerly way (that may be faster given NCBI's flakiness lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and do lookups. 
> In that same dir, taxdump.tar.gz contains a file called names.dmp which lists taxids and descriptions (and synonyms) > > If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I could do this: > > my $taxid = $gi_taxid_nucl{$accession}; > my $org_name = $names{$taxid}; > > --Russell > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen >> Sent: Saturday, 26 December 2009 4:52 p.m. >> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession >> number? >> >> Bhakti, >> The following example (using EUtilities) may serve your purpose: >> >> use Bio::DB::EUtilities; >> >> my (%taxa, @taxa); >> my (%names, %idmap); >> >> # these are protein ids; nuc ids will work by changing -dbfrom => >> 'nucleotide', >> # (probably) >> >> my @ids = qw(1621261 89318838 68536103 20807972 730439); >> >> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink', >> -db => 'taxonomy', >> -dbfrom => 'protein', >> -correspondence => 1, >> -id => \@ids); >> >> # iterate through the LinkSet objects >> while (my $ds = $factory->next_LinkSet) { >> $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0] >> } >> >> @taxa = @taxa{@ids}; >> >> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', >> -db => 'taxonomy', >> -id => \@taxa ); >> >> while (local $_ = $factory->next_DocSum) { >> $names{($_->get_contents_by_name('TaxId'))[0]} = >> ($_->get_contents_by_name('ScientificName'))[0]; >> } >> >> foreach (@ids) { >> $idmap{$_} = $names{$taxa{$_}}; >> } >> >> # %idmap is >> # 1621261 => 'Mycobacterium tuberculosis H37Rv' >> # 20807972 => 'Thermoanaerobacter tengcongensis MB4' >> # 68536103 => 'Corynebacterium jeikeium K411' >> # 730439 => 'Bacillus caldolyticus' >> # 89318838 => undef (this record has been removed from the db) >> >> 1; >> >> You probably will need to break up your 30000 into chunks >> (say, 
1000-3000 each), and do the above on each chunk with a >> >> sleep 3; >> >> or so separating the queries. >> MAJ >> ----- Original Message ----- >> From: "Bhakti Dwivedi" >> To: >> Sent: Friday, December 25, 2009 9:46 PM >> Subject: [Bioperl-l] how to retrieve organism name from accession number? >> >> >>> Hi, >>> >>> Does anyone know how to retrieve the "Source" or the "Species name" >> given >>> the accession number using Bioperl. I have these 30,000 accession >> numbers >>> for which I need to get the source organisms. Any kind of help will be >>> appreciated. >>> >>> Thanks >>> >>> BD >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Sun Jan 10 16:05:06 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Mon, 11 Jan 2010 10:05:06 +1300 Subject: [Bioperl-l] how to retrieve organism name from accession number? In-Reply-To: References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> I've started to go off eUtils recently (not BioPerl's fault) as I've often been finding that with large queries, chunks of the resulting data is missing. For example, before Xmas I was creating species-specific databases by using eUtils to get a list of GI numbers back for a taxid, then retrieving the fasta sequences in chunks of 500. Very regularly, in the middle of the fasta there would be a message about resource unavailable eg. >test_sequence_1 TACGATCATCGCTResource UnavailableTACGACTCTGCT >test_sequence_2 TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT Often this wasn't detected until formatdb complained about invalid characters. Inquiries to NCBI as to why this was happening and what to do about it returned stupid answers ("do each sequence manually thru the web interface", or "use eUtils"). As we have a nice fast network connection, I now prefer to download very large gzip files (i.e. all of refseq) and extract what I need. I can't help but think that NCBI could solve a lot of problems if they gzipped the output from eUtils queries - it's something I've requested regularly for the last 5 years or so!! --Russell > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Monday, 11 January 2010 9:50 a.m. 
> To: Smithies, Russell > Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org' > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > number? > > One could also use Bio::DB::Taxonomy, which indexes the same files or > (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the > details). > > chris From avilella at gmail.com Sun Jan 10 16:05:13 2010 From: avilella at gmail.com (Albert Vilella) Date: Sun, 10 Jan 2010 21:05:13 +0000 Subject: [Bioperl-l] Ensembl problems In-Reply-To: References: <20100109220121.GA9521@quux.windows.ebi.ac.uk> Message-ID: <358f4d651001101305q1b75cfe3q558a245ab1ab1238@mail.gmail.com> > However, if anyone knows Ensembl very well, the database has in it > some of these interspecies comparisons already. 
They are accessed > when one does a phylogeny tree for specific genes (and generally for > highly conserved genes you will get a tree that includes nearly all 50 > species in the database). As I don't think they are computed > on-the-fly, the information must be precomputed and stored someplace > in the database. I would very much like to know how to access this > information. Yes, they are. You can access the data programmatically by installing the ensembl and ensembl-compara Perl APIs. There are a few example scripts for the GeneTrees: ensembl-compara/scripts/examples/homology*.pl Cheers, Albert. > Thanks, > Robert > > > > > On 1/9/10, Andreas Kähäri wrote: >> On Sat, Jan 09, 2010 at 03:27:54PM -0500, Robert Bradbury wrote: >>> I am trying to get the examples provided by EMBL/Ensembl to work and am >>> encountering problems. >> >> Hi Robert, >> >> The ensembl-dev list is the appropriate forum for this type of question >> as it has nothing to do with bioperl. >> >> There is also the Ensembl helpdesk. If you send your problem to >> I'm sure that it will be picked up by the >> appropriate people (I do myself not know enough about the Compara API to >> be able to diagnose this problem straight away I'm afraid). >> >> Be sure to submit a minimal script that still exhibits the problem, and >> information about what version of the APIs you're using (we will assume >> that you're not mixing newer version of the API with older databases or >> vice versa). >> >> We are generally very happy to have bugs in documentation or code >> pointed out to us, and will correct errors as we are made aware of them. >> >> >> Kind regards, >> Andreas >> >>> For example, about 1/3 of the way through the Compara API tutorial [1] >>> there >>> is what is supposed to be a completely functional script. It does not >>> work. This is in contrast to some of the earlier simple scripts (listing >>> the species in Ensembl etc.) 
which do work on my machine, so I have all >>> the >>> libraries do dah installed correctly). >>> >>> Very poor form to document scripts which do not function on a properly >>> setup >>> system. >>> >>> I have modified my invocation of the script slightly: >>> Align.pl --set_of_species \ >>> "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur >>> garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta >>> africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis >>> familiaris:Sus >>> scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus >>> tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia >>> belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae" >>> >>> which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" >>> on >>> an undefined value at ./Align.pl line 132." (Align.pl is my slightly >>> modified example of the Compara Tutorial code.) >>> As these are slightly modified perl scripts from the documentation, the >>> line >>> numbers may be variable. >>> >>> I can print out the genome_dbs, and it gives me a list of genome names >>> (hash >>> tables) though it appears that is problematic in the Align.pl script. >>> in spite of the fact that just previously to that call I dumped >>> "genome_dbs" >>> and got back some 25 hash tables (expected). I believe this occurs >>> whether >>> one is comparing "human:mouse" or the more complex species set I have >>> outlined above. >>> >>> >>> >>> Has anyone else attempted to run the code documented in the Ensembl API >>> Tutorial? >>> Any suggestions as to what direction to go in would be appreciated -- when >>> one is trying to copy code out of a tutorial and it fails it's kind of hard >>> to know where to go.) >>> >>> There do appear to be some problems in the specifications of a Compara >>> version/database and there don't appear to be a lot of resources informing >>> one of what resources are currently available. >>> >>> Robert >>> >>> >>> 1. 
http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> -- >> Andreas Kähäri, Ensembl Software Developer >> European Bioinformatics Institute (EMBL-EBI) >> Wellcome Trust Genome Campus, Hinxton >> Cambridge CB10 1SD, United Kingdom >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From alessandra.bilardi at gmail.com Sun Jan 10 18:21:12 2010 From: alessandra.bilardi at gmail.com (Alessandra) Date: Mon, 11 Jan 2010 00:21:12 +0100 Subject: [Bioperl-l] GBrowse.org project In-Reply-To: References: Message-ID: Hi all, I'm Alessandra and I run GBrowse.org. GBrowse.org is a resource for using and setting up GBrowse genome browsers. The site provides one location where biologists and bioinformaticians can find: 1. Genome browser web sites for any organism that has them. If a species has a genome browser anywhere on the web, then we aim to link to it. 2. Links to sequence and annotation files that are available online. 3. Links to genome browser configuration files, when available 4. An FTP site containing genome annotation and configuration files for each annotated genome that does not have its own web site. GBrowse.org emphasizes the GBrowse genome browser in its organization, but also links to sites that use other browser packages such as UCSC, Ensembl, and JBrowse. Also, we are currently conducting a survey seeking input on future project direction. Please take a few minutes now to provide your feedback. Survey link: http://gbrowse.org/survey/index.php?sid=64264&lang=en GBrowse.org introduction link: http://gmod.org/wiki/August_2009_GMOD_Meeting#GBrowse.org Thank you for your help, Alessandra Bilardi. 
http://gbrowse.org/ CRIBI Genomics, University of Padua http://genomics.cribi.unipd.it/ From cjfields at illinois.edu Sun Jan 10 22:04:13 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 10 Jan 2010 21:04:13 -0600 Subject: [Bioperl-l] GMOD BioPerl Meeting Message-ID: <7D72ECC2-E856-4C09-B67A-62AFFB59B377@illinois.edu> Just a quick reminder that we're having a BioPerl satellite meeting after the PAG Conference (just prior to the GMOD Meeting). The meeting is this Wednesday, Jan. 13, starting at 11:30am, at the Best Western Seven Seas in San Diego. I will update the relevant BioPerl and GMOD pages with more details as they become available. At the moment, we will be meeting in the hotel lobby prior to starting the meeting and possible hackathon. http://www.bioperl.org/wiki/GMOD_2010_Meeting http://gmod.org/wiki/January_2010_GMOD_Meeting#Satellite_Meetings Thanks! chris From bernd.jagla at pasteur.fr Mon Jan 11 05:11:16 2010 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Mon, 11 Jan 2010 11:11:16 +0100 Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java Message-ID: <6D85585C10F94E25898249D2D7CAC0D7@zillumina> Hi, First off, I am not sure if this is supposed to be addressed to the Bioperl or Gbrowse mailing list, so apologies if this is the wrong list and please let me know. I am writing a program in Java that needs to access genome annotation data. Since I am using Gbrowse already I was thinking that I could combine both approaches making life eventually easier for me. I am mainly interested in getting a gene/feature name for a given position. The position is stored in the feature table and through linking typelist, locationlist, (maybe sequence), and feature I can get all the information I need. Unfortunately it seems that the feature name is stored in the object blob of the feature table. 
That is a bit suspicious to me because I don't understand why searching for a name can be so fast if it is not indexed through mysql when searching using GBrowse. So my question is how do I parse the Bio::DB::SeqFeature object in JAVA correctly to get the name of the feature and possibly also any further information. Any suggestions are greatly appreciated. Maybe there is a better solution than parsing Perl code with Java? Thanks a lot, Bernd From biopython at maubp.freeserve.co.uk Mon Jan 11 05:48:52 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 11 Jan 2010 10:48:52 +0000 Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java In-Reply-To: <6D85585C10F94E25898249D2D7CAC0D7@zillumina> References: <6D85585C10F94E25898249D2D7CAC0D7@zillumina> Message-ID: <320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com> On Mon, Jan 11, 2010 at 10:11 AM, Bernd Jagla wrote: > Hi, > > First off, I am not sure if this is supposed to be addressed to the Bioperl > or Gbrowse mailing list, so apologies if this is the wrong list and please > let me know. > > I am writing a program in Java that needs to access genome annotation data. > Since I am using Gbrowse already I was thinking that I could combine both > approaches making life eventually easier for me. I am mainly interested in > getting a gene/feature name for a given position. The position is stored in > the feature table and through linking typelist, locationlist, (maybe > sequence), and feature I can get all the information I need. Unfortunately > it seems that the feature name is stored in the object blob of the feature > table. How are you storing the data in Gbrowse? There are several back ends, and this will make a big difference for accessing the raw data. One option would be to use Gbrowse with BioSQL as the backend. You can then use BioJava (or BioPerl, or BioPython, etc) to access the database. 
The only downside is Gbrowse isn't working 100% on top of BioSQL right now (I'd like to see this fixed, but I don't know Perl). There is an open bug on this [ gmod-Bugs-2168597 ]. Peter From bernd.jagla at pasteur.fr Mon Jan 11 05:53:20 2010 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Mon, 11 Jan 2010 11:53:20 +0100 Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java In-Reply-To: <320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com> References: <6D85585C10F94E25898249D2D7CAC0D7@zillumina> <320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com> Message-ID: <9056164A8A744A77B6CD1E8E4E20B104@zillumina> I am using bp_seqfeature_load.pl to load my features. That is using Bio::DB::SeqFeature::Store and MySql as a backend... That's all I understood... B > -----Original Message----- > From: p.j.a.cock at googlemail.com [mailto:p.j.a.cock at googlemail.com] On > Behalf Of Peter > Sent: Monday, January 11, 2010 11:49 AM > To: Bernd Jagla > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java > > On Mon, Jan 11, 2010 at 10:11 AM, Bernd Jagla > wrote: > > Hi, > > > > First off, I am not sure if this is supposed to be addressed to the > Bioperl > > or Gbrowse mailing list, so apologies if this is the wrong list and > please > > let me know. > > > > I am writing a program in Java that needs to access genome annotation > data. > > Since I am using Gbrowse already I was thinking that I could combine > both > > approaches making life eventually easier for me. I am mainly interested > in > > getting a gene/feature name for a given position. The position is stored > in > > the feature table and through linking typelist, locationlist, (maybe > > sequence), and feature I can get all the information I need. > Unfortunately > > it seems that the feature name is stored in the object blob of the > feature > > table. > > How are you storing the data in Gbrowse? 
There are several back ends, > and this will make a big difference for accessing the raw data. > > One option would be to use Gbrowse with BioSQL as the backend. > You can then use BioJava (or BioPerl, or BioPython, etc) to access the > database. The only downside is Gbrowse isn't working 100% on top > of BioSQL right now (I'd like to see this fixed, but I don't know Perl). > There is an open bug on this [ gmod-Bugs-2168597 ]. > > Peter From awitney at sgul.ac.uk Mon Jan 11 07:21:07 2010 From: awitney at sgul.ac.uk (Adam Witney) Date: Mon, 11 Jan 2010 12:21:07 +0000 Subject: [Bioperl-l] create Bio::AlignIO object from hash Message-ID: Hi, I am writing a script to automate the running of Phylip Pars. In the process i have to create a Bio::AlignIO object from a set of data that i have in a hash. I could write the hash data into a phylip file and then load the Bio::AlignIO from that file, but i wondered if i could skip the writing and then reading of a temporary file ? thanks for any help adam From roy.chaudhuri at gmail.com Mon Jan 11 08:54:25 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 11 Jan 2010 13:54:25 +0000 Subject: [Bioperl-l] create Bio::AlignIO object from hash In-Reply-To: <4B4B2A51.9040602@gmail.com> References: <4B4B2A51.9040602@gmail.com> Message-ID: <4B4B2D91.70906@gmail.com> Actually, I guess some sample code would be more helpful: use Bio::LocatableSeq; use Bio::SimpleAlign; use Bio::AlignIO; my $seq1=Bio::LocatableSeq->new(-id=>'one', -seq=>'AT-CG', -start=>1, -end=>4); my $seq2=Bio::LocatableSeq->new(-id=>'two', -seq=>'A--CG', -start=>1, -end=>3); my $seq3=Bio::LocatableSeq->new(-id=>'three', -seq=>'ATTCG', -start=>1, -end=>5); my $aln=Bio::SimpleAlign->new(-seqs=>[$seq1,$seq2,$seq3]); Bio::AlignIO->new(-format=>'phylip')->write_aln($aln); Cheers, Roy. 
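For the case in the original question, where the aligned data already sits in a hash, the same idea might be sketched roughly as below. This is only an illustration under assumptions: the hash layout (sequence id mapped to an already-gapped sequence string) and the helper name hash_to_alignment are made up for the example, and the gap characters are assumed to be '-' or '.'. It relies on Bio::SimpleAlign's add_seq method and computes -end from the ungapped length, so no temporary alignment file has to be read back in.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::AlignIO;

# Hypothetical input: sequence id => gapped, already-aligned sequence string.
my %aligned = (
    one   => 'AT-CG',
    two   => 'A--CG',
    three => 'ATTCG',
);

# Build a Bio::SimpleAlign from an id => gapped-sequence hash reference.
# (Helper name is illustrative, not part of BioPerl.)
sub hash_to_alignment {
    my ($seqs) = @_;
    my $aln = Bio::SimpleAlign->new();
    for my $id (sort keys %$seqs) {
        my $gapped = $seqs->{$id};
        # -end counts residues only, so strip gap characters before measuring.
        (my $ungapped = $gapped) =~ s/[-.]//g;
        $aln->add_seq(
            Bio::LocatableSeq->new(
                -id    => $id,
                -seq   => $gapped,
                -start => 1,
                -end   => length $ungapped,
            )
        );
    }
    return $aln;
}

my $aln = hash_to_alignment(\%aligned);

# Write the alignment in PHYLIP format to STDOUT.
Bio::AlignIO->new(-format => 'phylip', -fh => \*STDOUT)->write_aln($aln);
```

Sorting the keys just makes the sequence order reproducible; if the order in the PHYLIP output matters, an explicit list of ids would be the safer choice.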
On 11/01/2010 13:40, Roy Chaudhuri wrote: > Hi Adam, > > I'm guessing you actually want to create a Bio::SimpleAlign object > (representing an alignment), rather than a Bio::AlignIO object (which is > just for reading/writing alignment files). Bio::SimpleAlign has a > documented new method that allows you to construct an alignment from > Bio::LocatableSeq objects, which are similar to Bio::Seq objects but > include gaps and start/end coordinates to describe their relationship to > other sequences in the alignment. > > Roy. > > On 11/01/2010 12:21, Adam Witney wrote: >> Hi, >> >> I am writing a script to automate the running of Phylip Pars. In the >> process i have to create a Bio::AlignIO object from a set of data >> that i have in a hash. >> >> I could write the hash data into a phylip file and then load the >> Bio::AlignIO from that file, but i wondered if i could skip the >> writing and then reading of a temporary file ? >> >> thanks for any help >> >> adam _______________________________________________ Bioperl-l >> mailing list Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From roy.chaudhuri at gmail.com Mon Jan 11 08:40:33 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 11 Jan 2010 13:40:33 +0000 Subject: [Bioperl-l] create Bio::AlignIO object from hash In-Reply-To: References: Message-ID: <4B4B2A51.9040602@gmail.com> Hi Adam, I'm guessing you actually want to create a Bio::SimpleAlign object (representing an alignment), rather than a Bio::AlignIO object (which is just for reading/writing alignment files). Bio::SimpleAlign has a documented new method that allows you to construct an alignment from Bio::LocatableSeq objects, which are similar to Bio::Seq objects but include gaps and start/end coordinates to describe their relationship to other sequences in the alignment. Roy. On 11/01/2010 12:21, Adam Witney wrote: > Hi, > > I am writing a script to automate the running of Phylip Pars. 
In the > process i have to create a Bio::AlignIO object from a set of data > that i have in a hash. > > I could write the hash data into a phylip file and then load the > Bio::AlignIO from that file, but i wondered if i could skip the > writing and then reading of a temporary file ? > > thanks for any help > > adam _______________________________________________ Bioperl-l > mailing list Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Mon Jan 11 09:16:45 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 11 Jan 2010 14:16:45 +0000 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records Message-ID: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> Hi, I'm running bioperl-live from SVN, just updated to revision 16648. $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' 1.0069 I am trying to get Bio::SeqIO to convert a multiple record EMBL file into GenBank format, piping the data via stdin/stdout using the following trivial Perl script: #!/usr/bin/env perl use Bio::SeqIO; my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl'); my $out = Bio::SeqIO->new(-format => 'genbank'); while (my $seq = $in->next_seq) { $out->write_seq($seq) }; This only seems to find the first EMBL record in my example files. For example, this simple file has just two contig records: http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl This is just the first two records taken from a much larger EMBL file rel_con_hum_01_r102.dat downloaded and uncompressed from: ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz Trying both these examples as input, BioPerl just gives a single GenBank record as output (the first EMBL entry in the input). Is this a BioPerl bug, or am I missing something? Peter From maj at fortinbras.us Mon Jan 11 10:04:00 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Mon, 11 Jan 2010 10:04:00 -0500 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> Message-ID: Hi Peter, I found the issue-- there are no SQ lines in the data, and having them is a key stop condition in the parser (line 438 embl.pm). We evidently need to be more liberal in what we accept, even as we are strict in what we emit. Could you make a bug report? thanks for the heads-up-- MAJ ----- Original Message ----- From: "Peter" To: "bioperl-l list" Sent: Monday, January 11, 2010 9:16 AM Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records > Hi, > > I'm running bioperl-live from SVN, just updated to revision 16648. > > $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' > 1.0069 > > I am trying to get Bio::SeqIO to convert a multiple record EMBL > file into GenBank format, piping the data via stdin/stdout using > the following trivial Perl script: > > #!/usr/bin/env perl > use Bio::SeqIO; > my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl'); > my $out = Bio::SeqIO->new(-format => 'genbank'); > while (my $seq = $in->next_seq) { $out->write_seq($seq) }; > > This only seems to find the first EMBL record in my example > files. For example, this simple file has just two contig records: > http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl > > This is just the first two records taken from a much larger EMBL file > rel_con_hum_01_r102.dat downloaded and uncompressed from: > ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz > > Trying both these examples as input, BioPerl just gives a single > GenBank record as output (the first EMBL entry in the input). > > Is this a BioPerl bug, or am I missing something? 
> > Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From biopython at maubp.freeserve.co.uk Mon Jan 11 10:17:37 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 11 Jan 2010 15:17:37 +0000 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> Message-ID: <320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com> On Mon, Jan 11, 2010 at 3:04 PM, Mark A. Jensen wrote: > > Hi Peter, I found the issue-- there are no SQ lines in the data, and having > them is a key stop condition in the parser (line 438 embl.pm). > We evidently need to be more liberal in what we accept, even as we are > strict in what we emit. Could you make a bug report? > thanks for the heads-up-- > MAJ Done: http://bugzilla.open-bio.org/show_bug.cgi?id=2982 These are EMBL contig records, so they don't have SQ lines, but instead CO lines. Peter From cjfields at illinois.edu Mon Jan 11 10:24:24 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 11 Jan 2010 09:24:24 -0600 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: <320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com> References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> <320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com> Message-ID: On Jan 11, 2010, at 9:17 AM, Peter wrote: > On Mon, Jan 11, 2010 at 3:04 PM, Mark A. Jensen wrote: >> >> Hi Peter, I found the issue-- there are no SQ lines in the data, and having >> them is a key stop condition in the parser (line 438 embl.pm). >> We evidently need to be more liberal in what we accept, even as we are >> strict in what we emit. Could you make a bug report? 
>> thanks for the heads-up-- >> MAJ > > Done: http://bugzilla.open-bio.org/show_bug.cgi?id=2982 > > These are EMBL contig records, so they don't have SQ lines, > but instead CO lines. > > Peter Peter, Just curious, but have you tried the experimental EMBL parser 'embldriver'? I don't think it's bound to the same strictures, but I may be mistaken. chris From cjfields at illinois.edu Mon Jan 11 10:23:00 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 11 Jan 2010 09:23:00 -0600 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> Message-ID: <0D0D9DB5-56FA-414E-8D1D-3FE18198F7EC@illinois.edu> Just saw that mark responded, so if possible submit a bug. We may be doing a mini-hackathon this Wednesday, so we can probably tackle it in the process (possibly along with a few other pressing issues). chris On Jan 11, 2010, at 8:16 AM, Peter wrote: > Hi, > > I'm running bioperl-live from SVN, just updated to revision 16648. > > $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' > 1.0069 > > I am trying to get Bio::SeqIO to convert a multiple record EMBL > file into GenBank format, piping the data via stdin/stdout using > the following trivial Perl script: > > #!/usr/bin/env perl > use Bio::SeqIO; > my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl'); > my $out = Bio::SeqIO->new(-format => 'genbank'); > while (my $seq = $in->next_seq) { $out->write_seq($seq) }; > > This only seems to find the first EMBL record in my example > files. 
For example, this simple file has just two contig records: > http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl > > This is just the first two records taken from a much larger EMBL file > rel_con_hum_01_r102.dat downloaded and uncompressed from: > ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz > > Trying both these examples as input, BioPerl just gives a single > GenBank record as output (the first EMBL entry in the input). > > Is this a BioPerl bug, or am I missing something? > > Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Mon Jan 11 10:55:26 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 11 Jan 2010 15:55:26 +0000 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> Message-ID: <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com> On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf wrote: > > These entries form the CON data class, see: > http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14 > and they don't contain any sequence information. I know - GenBank files have a similar system with CONTIG lines instead of sequences. I was expecting BioPerl to be able to convert these EMBL files with CO lines into GenBank files with CONTIG lines. > If you take the 'expanded' entries from > ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz > your script will work. That's a useful tip - thanks. 
Peter From hrh at fmi.ch Mon Jan 11 10:42:22 2010 From: hrh at fmi.ch (Hotz, Hans-Rudolf) Date: Mon, 11 Jan 2010 16:42:22 +0100 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> Message-ID: On 1/11/10 3:16 PM, "Peter" wrote: > Hi, > > I'm running bioperl-live from SVN, just updated to revision 16648. > > $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' > 1.0069 > > I am trying to get Bio::SeqIO to convert a multiple record EMBL > file into GenBank format, piping the data via stdin/stdout using > the following trivial Perl script: > > #!/usr/bin/env perl > use Bio::SeqIO; > my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl'); > my $out = Bio::SeqIO->new(-format => 'genbank'); > while (my $seq = $in->next_seq) { $out->write_seq($seq) }; > > This only seems to find the first EMBL record in my example > files. For example, this simple file has just two contig records: > http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl > > This is just the first two records taken from a much larger EMBL file > rel_con_hum_01_r102.dat downloaded and uncompressed from: > ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz These entries form the CON data class, see: http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14 and they don't contain any sequence information. If you take the 'expanded' entries from ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r 102.dat.gz your script will work. Hans > Trying both these examples as input, BioPerl just gives a single > GenBank record as output (the first EMBL entry in the input). > > Is this a BioPerl bug, or am I missing something? 
> > Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From awitney at sgul.ac.uk Mon Jan 11 11:27:15 2010 From: awitney at sgul.ac.uk (Adam Witney) Date: Mon, 11 Jan 2010 16:27:15 +0000 Subject: [Bioperl-l] create Bio::AlignIO object from hash In-Reply-To: <4B4B2D91.70906@gmail.com> References: <4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com> Message-ID: Ah excellent, thanks Roy. I was indeed thinking about it the wrong way. In the process of writing this i have created a Bio::Tools::Run::Phylo::Phylip::Pars class which is essentially just a modified copy of ProtPars. I have also fixed a few typos and possible bugs in Bio/Tools/Run/Phylo/Phylip/Base.pm Bio/Tools/Run/Phylo/Phylip/PhylipConf.pm Bio/AlignIO/phylip.pm Bio/Tools/Run/Alignment/Clustalw.pm I am of course happy to send these back in to the project... how would i best do this? Cheers adam On 11 Jan 2010, at 13:54, Roy Chaudhuri wrote: > Actually, I guess some sample code would be more helpful: > > use Bio::LocatableSeq; > use Bio::SimpleAlign; > use Bio::AlignIO; > my $seq1=Bio::LocatableSeq->new(-id=>'one', -seq=>'AT-CG', -start=>1, -end=>4); > my $seq2=Bio::LocatableSeq->new(-id=>'two', -seq=>'A--CG', -start=>1, -end=>3); > my $seq3=Bio::LocatableSeq->new(-id=>'three', -seq=>'ATTCG', -start=>1, -end=>5); > my $aln=Bio::SimpleAlign->new(-seqs=>[$seq1,$seq2,$seq3]); > Bio::AlignIO->new(-format=>'phylip')->write_aln($aln); > > Cheers, > Roy. > > > On 11/01/2010 13:40, Roy Chaudhuri wrote: >> Hi Adam, >> >> I'm guessing you actually want to create a Bio::SimpleAlign object >> (representing an alignment), rather than a Bio::AlignIO object (which is >> just for reading/writing alignment files). 
Bio::SimpleAlign has a >> documented new method that allows you to construct an alignment from >> Bio::LocatableSeq objects, which are similar to Bio::Seq objects but >> include gaps and start/end coordinates to describe their relationship to >> other sequences in the alignment. >> >> Roy. >> >> On 11/01/2010 12:21, Adam Witney wrote: >>> Hi, >>> >>> I am writing a script to automate the running of Phylip Pars. In the >>> process i have to create a Bio::AlignIO object from a set of data >>> that i have in a hash. >>> >>> I could write the hash data into a phylip file and then load the >>> Bio::AlignIO from that file, but i wondered if i could skip the >>> writing and then reading of a temporary file ? >>> >>> thanks for any help >>> >>> adam _______________________________________________ Bioperl-l >>> mailing list Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From Russell.Smithies at agresearch.co.nz Mon Jan 11 22:41:02 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 12 Jan 2010 16:41:02 +1300 Subject: [Bioperl-l] BioPerl version? In-Reply-To: References: <4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz> Probably a dumb question but how do I find the version of an existing BioPerl installation without resorting to reading thru Bio/Root/Version.pm ? --Russell ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. 
If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at illinois.edu Mon Jan 11 22:59:44 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 11 Jan 2010 21:59:44 -0600 Subject: [Bioperl-l] BioPerl version? In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz> References: <4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz> Message-ID: <795BD926-4AE9-4478-AAD5-E36558350745@illinois.edu> Not dumb, but a frequently asked one: that's a FAQ question ;> http://www.bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' chris On Jan 11, 2010, at 9:41 PM, Smithies, Russell wrote: > Probably a dumb question but how do I find the version of an existing BioPerl installation without resorting to reading thru Bio/Root/Version.pm ? > > --Russell > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Jan 12 11:02:02 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 Jan 2010 10:02:02 -0600 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com> References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com> Message-ID: On Jan 11, 2010, at 9:55 AM, Peter wrote: > On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf wrote: >> >> These entries form the CON data class, see: >> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14 >> and they don't contain any sequence information. > > I know - GenBank files have a similar system with CONTIG > lines instead of sequences. I was expecting BioPerl to be > able to convert these EMBL files with CO lines into GenBank > files with CONTIG lines. IIRC the contig information for GenBank is stored in annotation. We can try to ensure the data is carried over to EMBL properly. >> If you take the 'expanded' entries from >> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz >> your script will work. > > That's a useful tip - thanks. > > Peter NCBI's eutil option 'gbwithparts' is similar (always retrieves the sequence). 
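A minimal sketch of requesting 'gbwithparts' through Bio::DB::EUtilities is given here for reference. It assumes the efetch interface of the current EUtilities tools; the sub name, the e-mail address and the default output file name are placeholders for illustration and are not taken from this thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::DB::EUtilities;

# Fetch one nucleotide record as a GenBank flat file with the sequence included.
# fetch_gbwithparts, $accession and $outfile are hypothetical names.
sub fetch_gbwithparts {
    my ($accession, $outfile) = @_;

    my $factory = Bio::DB::EUtilities->new(
        -eutil   => 'efetch',
        -db      => 'nucleotide',
        -id      => [$accession],
        -rettype => 'gbwithparts',
        -email   => 'you@example.org',   # placeholder; use your own address
    );

    # Write the raw response straight to disk rather than holding it in memory.
    $factory->get_Response(-file => $outfile);
    return $outfile;
}

# Only contact NCBI when an accession is given on the command line.
if (@ARGV) {
    my $accession = shift @ARGV;
    my $outfile   = shift @ARGV || "$accession.gb";
    print "Saved ", fetch_gbwithparts($accession, $outfile), "\n";
}
```

The saved file can then be read with Bio::SeqIO using -format => 'genbank', in the same way as the conversion script earlier in this thread.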
chris From biopython at maubp.freeserve.co.uk Tue Jan 12 11:19:32 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 12 Jan 2010 16:19:32 +0000 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com> Message-ID: <320fb6e01001120819u50e73fa8k9bde8aa1abdf942d@mail.gmail.com> On Tue, Jan 12, 2010 at 4:02 PM, Chris Fields wrote: > On Jan 11, 2010, at 9:55 AM, Peter wrote: > >> On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf wrote: >>> >>> These entries form the CON data class, see: >>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14 >>> and they don't contain any sequence information. >> >> I know - GenBank files have a similar system with CONTIG >> lines instead of sequences. I was expecting BioPerl to be >> able to convert these EMBL files with CO lines into GenBank >> files with CONTIG lines. > > IIRC the contig information for GenBank is stored in annotation. > We can try to ensure the data is carried over to EMBL properly. For contig records (where there is no sequence) I think we just need to map the GenBank CONTIG lines to the EMBL CO lines, and vice versa. At least, that's what Biopython now does (trunk code, not yet released). >>> If you take the 'expanded' entries from >>> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz >>> your script will work. >> >> That's a useful tip - thanks. >> >> Peter > > NCBI's eutil option 'gbwithparts' is similar (always retrieves the sequence). Indeed. This is a useful work around for when a parser couldn't cope with the contig version of a GenBank file for some reason, e.g. http://bugzilla.open-bio.org/show_bug.cgi?id=2745 Peter From maj at fortinbras.us Tue Jan 12 12:33:30 2010 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Tue, 12 Jan 2010 12:33:30 -0500
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service
Message-ID: <231A8D9473704E7697F7A486A0CDA86A@NewLife>

Hi All--

The beta of Bio::DB::SoapEUtilities is now available in the
bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web
service. The system is fully WSDL based, and all eutils are
available. The best thing (IMHO) are the result adaptors, which
provide conversion and iteration of SOAP results into BioPerl
objects. Take a look:

 use Bio::DB::EUtilities;
 my $fac = Bio::DB::EUtilities->new(); # step 1
 my $seqio = $fac->esearch(
     -db => 'nucleotide',
     -term => 'HIV1 and CCR5 and Brazil'
 )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
 # yes, it's already done the efetch under the hood...
 while ( my $seq = $seqio->next_seq ) { # step 4
     # do something with $seq, a Bio::Seq object...
 }

or this:

 my $links = $fac->elink( -db => 'protein',
                          -dbfrom => 'nucleotide',
                          -id => \@nucids )->run( -auto_adapt => 1 );

 # maybe more than one associated id...
 my @prot_0 = $links->id_map( $nucids[0] );

 while ( my $ls = $links->next_linkset ) {
     @ids = $ls->ids;
     @submitted_ids = $ls->submitted_ids;
     # etc.
 }

and much, much more. See

http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service

and of course, the POD, for all the details, including
download/installation. Tests in bioperl-run/t.

cheers,
MAJ

-- No new dependencies were added or animals mistreated
-- during the making of these modules.

From sheldon.mckay at gmail.com Tue Jan 12 13:02:53 2010
From: sheldon.mckay at gmail.com (Sheldon McKay)
Date: Tue, 12 Jan 2010 10:02:53 -0800
Subject: [Bioperl-l] code.open-bio.org timing out?
Message-ID:

Hi all,

I keep timing out trying to do an svn checkout of bioperl-live from
code.open-bio.org. Any suggestions?
Thanks,
Sheldon

----
Sheldon McKay, PhD
Lead, iPlant Tree of Life Engagement Team; Research Investigator
Cold Spring Harbor Laboratory
http://mckay.cshl.edu
Google Voice: (203) 701-9204

On Tue, Nov 3, 2009 at 9:09 AM, Aaron Mackey wrote:
> [ajm6q at lc4 bioperl-live]$ svn update
> svn: Decompression of svndiff data failed
>
> I'll admit to not having svn updated in a while; a clean, anonymous svn co
> failed with the same message:
>
> [...]
> A    bioperl-live/Bio/Structure/StructureI.pm
> A    bioperl-live/Bio/Structure/IO
> svn: Decompression of svndiff data failed
>
> -Aaron
>
> P.S. I used this command:
> svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live

From biopython at maubp.freeserve.co.uk Tue Jan 12 13:12:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 12 Jan 2010 18:12:46 +0000
Subject: [Bioperl-l] code.open-bio.org timing out?
In-Reply-To:
References:
Message-ID: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com>

On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay wrote:
> Hi all,
>
> I keep timing out trying to do an svn checkout of bioperl-live from
> code.open-bio.org. Any suggestions?
>
> Thanks,
> Sheldon

The OBF team know about this (it's being discussed on root-l),
hopefully they'll have it fixed before too long.

Peter

From cjfields at illinois.edu Tue Jan 12 13:18:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 12 Jan 2010 12:18:45 -0600
Subject: [Bioperl-l] code.open-bio.org timing out?
In-Reply-To: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com>
References: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com>
Message-ID: <8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu>

On Jan 12, 2010, at 12:12 PM, Peter wrote:

> On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay wrote:
>> Hi all,
>>
>> I keep timing out trying to do an svn checkout of bioperl-live from
>> code.open-bio.org. Any suggestions?
>>
>> Thanks,
>> Sheldon
>
> The OBF team know about this (it's being discussed on root-l),
> hopefully they'll have it fixed before too long.
>
> Peter

We probably need to set up some automatic syncing of our read-only
code.google.com repo as a backup. Jason had originally set that up,
hopefully he'll respond.

chris

From jason at bioperl.org Tue Jan 12 13:27:55 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 12 Jan 2010 10:27:55 -0800
Subject: [Bioperl-l] code.open-bio.org timing out?
In-Reply-To: <8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu>
References: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com> <8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu>
Message-ID:

Hi - I had set up the Google Code sync, but then came the unfortunate
realization that the revision numbers are shared between the wiki and
the code SVN (all one repo). So when I added a wiki page on the site I
threw off the numbering, and as far as I could figure out it was no
longer possible to sync without resetting it. I haven't gone back to
that.

Sorry - I wasn't sure whether we had figured out what we wanted to do
for repositories, so I sort of stopped worrying about it.

-jason

On Jan 12, 2010, at 10:18 AM, Chris Fields wrote:

> On Jan 12, 2010, at 12:12 PM, Peter wrote:
>
>> On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay wrote:
>>> Hi all,
>>>
>>> I keep timing out trying to do an svn checkout of bioperl-live from
>>> code.open-bio.org. Any suggestions?
>>>
>>> Thanks,
>>> Sheldon
>>
>> The OBF team know about this (it's being discussed on root-l),
>> hopefully they'll have it fixed before too long.
>>
>> Peter
>
> We probably need to set up some automatic syncing of our read-only
> code.google.com repo as a backup. Jason had originally set that up,
> hopefully he'll respond.
>
> chris

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/

From virajj at gmail.com Wed Jan 6 13:20:39 2010
From: virajj at gmail.com (Vijayaraj Nagarajan)
Date: Wed, 6 Jan 2010 13:20:39 -0500
Subject: [Bioperl-l] targetp request
Message-ID: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>

Hi,

I am trying to use TargetP in BioPerl.
The documentation at the BioPerl site is a bit confusing to me...

I would appreciate it if you could give a very small example of how to use
"Bio::Tools::TargetP" to predict the localization of a protein sequence that
I have stored as a string.

Thanks,
Vijay

From cjfields at illinois.edu Tue Jan 12 18:36:53 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 12 Jan 2010 17:36:53 -0600
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service
In-Reply-To: <231A8D9473704E7697F7A486A0CDA86A@NewLife>
References: <231A8D9473704E7697F7A486A0CDA86A@NewLife>
Message-ID:

Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's
Bio::DB::SoapEUtilities)? Otherwise this would be a serious namespace
and API conflict with the current EUtilities tools.

chris

On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote:

> Hi All--
>
> The beta of Bio::DB::SoapEUtilities is now available in the
> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web
> service. The system is fully WSDL based, and all eutils are
> available.
The best thing (IMHO) are the result adaptors, which > provide conversion and iteration of SOAP results into BioPerl > objects. Schau, mal: > > use Bio::DB::EUtilities; > my $fac = Bio::DB::EUtilities->new(); # step 1 > my $seqio = $fac->esearch( > -db => 'nucleotide', > -term => 'HIV1 and CCR5 and Brazil' > )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3 > # yes, it's already done the efetch under the hood... > while ( my $seq = $seqio->next_seq ) { # step 4 > # do something with $seq, a Bio::Seq object... > } > > or this: > > my $links = $fac->elink( -db => 'protein', > -dbfrom => 'nucleotide', > -id => \@nucids )->run( -auto_adapt => 1 ); > > # maybe more than one associated id... > my @prot_0 = $links->id_map( $nucids[0] ); > > while ( my $ls = $links->next_linkset ) { > @ids = $ls->ids; > @submitted_ids = $ls->submitted_ids; > # etc. > } > > and much, much more. See > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service > > and of course, the POD, for all the details, including > download/installation. Tests in bioperl-run/t. > > cheers, > MAJ > > -- No new dependencies were added or animals mistreated > -- during the making of these modules. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Jan 12 19:22:10 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 Jan 2010 18:22:10 -0600 Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service In-Reply-To: <5AD210CB0C444A57881BBDD34DE99149@NewLife> References: <231A8D9473704E7697F7A486A0CDA86A@NewLife> <5AD210CB0C444A57881BBDD34DE99149@NewLife> Message-ID: Okay, just making sure (I was getting a bit paranoid). Great work on the SOAP interface, BTW! chris On Jan 12, 2010, at 6:08 PM, Mark A. Jensen wrote: > Um, yeah. > ----- Original Message ----- From: "Chris Fields" > To: "Mark A. 
Jensen" > Cc: "BioPerl List" > Sent: Tuesday, January 12, 2010 6:36 PM > Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service > > > Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's Bio::DB::SoapEUtilities)? Otherwise this would be a serious namespace and API conflict with the current EUtilities tools. > > chris > > On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote: > >> Hi All-- >> >> The beta of Bio::DB::SoapEUtilities is now available in the >> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web >> service. The system is fully WSDL based, and all eutils are >> available. The best thing (IMHO) are the result adaptors, which >> provide conversion and iteration of SOAP results into BioPerl >> objects. Schau, mal: >> >> use Bio::DB::EUtilities; >> my $fac = Bio::DB::EUtilities->new(); # step 1 >> my $seqio = $fac->esearch( >> -db => 'nucleotide', >> -term => 'HIV1 and CCR5 and Brazil' >> )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3 >> # yes, it's already done the efetch under the hood... >> while ( my $seq = $seqio->next_seq ) { # step 4 >> # do something with $seq, a Bio::Seq object... >> } >> >> or this: >> >> my $links = $fac->elink( -db => 'protein', >> -dbfrom => 'nucleotide', >> -id => \@nucids )->run( -auto_adapt => 1 ); >> >> # maybe more than one associated id... >> my @prot_0 = $links->id_map( $nucids[0] ); >> >> while ( my $ls = $links->next_linkset ) { >> @ids = $ls->ids; >> @submitted_ids = $ls->submitted_ids; >> # etc. >> } >> >> and much, much more. See >> >> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service >> >> and of course, the POD, for all the details, including >> download/installation. Tests in bioperl-run/t. >> >> cheers, >> MAJ >> >> -- No new dependencies were added or animals mistreated >> -- during the making of these modules. 
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Tue Jan 12 19:08:12 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 12 Jan 2010 19:08:12 -0500 Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service In-Reply-To: References: <231A8D9473704E7697F7A486A0CDA86A@NewLife> Message-ID: <5AD210CB0C444A57881BBDD34DE99149@NewLife> Um, yeah. ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "BioPerl List" Sent: Tuesday, January 12, 2010 6:36 PM Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's Bio::DB::SoapEUtilities)? Otherwise this would be a serious namespace and API conflict with the current EUtilities tools. chris On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote: > Hi All-- > > The beta of Bio::DB::SoapEUtilities is now available in the > bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web > service. The system is fully WSDL based, and all eutils are > available. The best thing (IMHO) are the result adaptors, which > provide conversion and iteration of SOAP results into BioPerl > objects. Schau, mal: > > use Bio::DB::EUtilities; > my $fac = Bio::DB::EUtilities->new(); # step 1 > my $seqio = $fac->esearch( > -db => 'nucleotide', > -term => 'HIV1 and CCR5 and Brazil' > )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3 > # yes, it's already done the efetch under the hood... > while ( my $seq = $seqio->next_seq ) { # step 4 > # do something with $seq, a Bio::Seq object... > } > > or this: > > my $links = $fac->elink( -db => 'protein', > -dbfrom => 'nucleotide', > -id => \@nucids )->run( -auto_adapt => 1 ); > > # maybe more than one associated id... 
> my @prot_0 = $links->id_map( $nucids[0] );
>
> while ( my $ls = $links->next_linkset ) {
>     @ids = $ls->ids;
>     @submitted_ids = $ls->submitted_ids;
>     # etc.
> }
>
> and much, much more. See
>
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service
>
> and of course, the POD, for all the details, including
> download/installation. Tests in bioperl-run/t.
>
> cheers,
> MAJ
>
> -- No new dependencies were added or animals mistreated
> -- during the making of these modules.

From maj at fortinbras.us Tue Jan 12 20:09:28 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 12 Jan 2010 20:09:28 -0500
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP webservice
In-Reply-To: <5AD210CB0C444A57881BBDD34DE99149@NewLife>
References: <231A8D9473704E7697F7A486A0CDA86A@NewLife> <5AD210CB0C444A57881BBDD34DE99149@NewLife>
Message-ID:

corrected:

 use Bio::DB::SoapEUtilities;
 my $fac = Bio::DB::SoapEUtilities->new(); # step 1
 my $seqio = $fac->esearch(
     -db => 'nucleotide',
     -term => 'HIV1 and CCR5 and Brazil'
 )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
 # yes, it's already done the efetch under the hood...
 while ( my $seq = $seqio->next_seq ) { # step 4
     # do something with $seq, a Bio::Seq object...
 }

----- Original Message -----
From: "Mark A. Jensen"
To: "Chris Fields"
Cc: "BioPerl List"
Sent: Tuesday, January 12, 2010 7:08 PM
Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP webservice

> Um, yeah.
> ----- Original Message -----
> From: "Chris Fields"
> To: "Mark A. Jensen"
> Cc: "BioPerl List"
> Sent: Tuesday, January 12, 2010 6:36 PM
> Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web
> service
>
>
> Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's
> Bio::DB::SoapEUtilities)?
Otherwise this would be a serious namespace and API > conflict with the current EUtilities tools. > > chris > > On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote: > >> Hi All-- >> >> The beta of Bio::DB::SoapEUtilities is now available in the >> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web >> service. The system is fully WSDL based, and all eutils are >> available. The best thing (IMHO) are the result adaptors, which >> provide conversion and iteration of SOAP results into BioPerl >> objects. Schau, mal: >> >> use Bio::DB::EUtilities; >> my $fac = Bio::DB::EUtilities->new(); # step 1 >> my $seqio = $fac->esearch( >> -db => 'nucleotide', >> -term => 'HIV1 and CCR5 and Brazil' >> )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3 >> # yes, it's already done the efetch under the hood... >> while ( my $seq = $seqio->next_seq ) { # step 4 >> # do something with $seq, a Bio::Seq object... >> } >> >> or this: >> >> my $links = $fac->elink( -db => 'protein', >> -dbfrom => 'nucleotide', >> -id => \@nucids )->run( -auto_adapt => 1 ); >> >> # maybe more than one associated id... >> my @prot_0 = $links->id_map( $nucids[0] ); >> >> while ( my $ls = $links->next_linkset ) { >> @ids = $ls->ids; >> @submitted_ids = $ls->submitted_ids; >> # etc. >> } >> >> and much, much more. See >> >> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service >> >> and of course, the POD, for all the details, including >> download/installation. Tests in bioperl-run/t. >> >> cheers, >> MAJ >> >> -- No new dependencies were added or animals mistreated >> -- during the making of these modules. 
From tuco at pasteur.fr Wed Jan 13 05:24:34 2010
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Wed, 13 Jan 2010 11:24:34 +0100
Subject: [Bioperl-l] targetp request
In-Reply-To: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>
References: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>
Message-ID: <4B4D9F62.5010306@pasteur.fr>

On 1/6/10 7:20 PM, Vijayaraj Nagarajan wrote:
> Hi,
>
> I am trying to use targetP in bioperl.
> the documentation at the bioperl site is a bit confusing to me...
>
> I would appreciate if you could give a very small example, as to how to use
> "Bio::Tools::TargetP" to predict the localization of a protein sequence that
> i have stored as a string.
>
> Thanks,
> Vijay

Dear Vijay,

Bio::Tools::TargetP is not intended to run targetp on a sequence; it
reads and parses the results of a targetp run.

From the Pod doc:

DESCRIPTION
    TargetP modules will provides parsed informations about protein
    localization. It reads in a targetp output file. It parses the
    results, and returns a Bio::SeqFeature::Generic object for each
    sequences found to have a subcellular localization

So to analyze your sequence, you'll first need to run targetp on your
sequence file to create a targetp output file. Then use the
Bio::Tools::TargetP module to parse that file and extract only the
information you need, as shown in the SYNOPSIS of the module's Pod
documentation.
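As a rough sketch of the parse step described above, assuming targetp has already been run separately and its report saved to a file: the sub name and the report path passed on the command line are hypothetical, and the loop prints whatever tags the parser attaches rather than assuming particular tag names.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::Tools::TargetP;

# Print every prediction found in a saved targetp report.
# print_targetp_predictions and $report are illustrative names only.
sub print_targetp_predictions {
    my ($report) = @_;
    my $parser = Bio::Tools::TargetP->new(-file => $report);
    my $count  = 0;

    # Each prediction comes back as a Bio::SeqFeature::Generic object.
    while (my $feat = $parser->next_prediction) {
        $count++;
        my $name = $feat->display_name;
        print "Sequence: ", (defined $name ? $name : '(unnamed)'), "\n";

        # Dump the attached tags without assuming what they are called.
        my @tags = $feat->get_all_tags;
        for my $tag (sort @tags) {
            my @values = $feat->get_tag_values($tag);
            print "  $tag: @values\n";
        }
    }
    return $count;
}

# Parse only when a report file is named on the command line.
print_targetp_predictions($ARGV[0]) if @ARGV;
```

Printing all tags first is a convenient way to see which values a given targetp version produces before picking out the ones of interest.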
HTH Regards Emmanuel From roy.chaudhuri at gmail.com Wed Jan 13 07:52:58 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Wed, 13 Jan 2010 12:52:58 +0000 Subject: [Bioperl-l] create Bio::AlignIO object from hash In-Reply-To: References: <4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com> Message-ID: <4B4DC22A.8080701@gmail.com> Upload them to Bugzilla as patches, and one of the devs will review your changes and incorporate them into bioperl-live: http://www.bioperl.org/wiki/HOWTO:SubmitPatch Roy. On 11/01/2010 16:27, Adam Witney wrote: > > Ah excellent, thanks Roy. I was indeed thinking about it the wrong > way. > > In the process of writing this i have created a > > Bio::Tools::Run::Phylo::Phylip::Pars class > > which is essentially just a modified copy of ProtPars. I have also > fixed a few typos and possible bugs in > > Bio/Tools/Run/Phylo/Phylip/Base.pm > Bio/Tools/Run/Phylo/Phylip/PhylipConf.pm Bio/AlignIO/phylip.pm > Bio/Tools/Run/Alignment/Clustalw.pm > > I am of course happy to send these back in to the project... how > would i best do this? > > Cheers > > adam > > > On 11 Jan 2010, at 13:54, Roy Chaudhuri wrote: > >> Actually, I guess some sample code would be more helpful: >> >> use Bio::LocatableSeq; use Bio::SimpleAlign; use Bio::AlignIO; my >> $seq1=Bio::LocatableSeq->new(-id=>'one', -seq=>'AT-CG', -start=>1, >> -end=>4); my $seq2=Bio::LocatableSeq->new(-id=>'two', >> -seq=>'A--CG', -start=>1, -end=>3); my >> $seq3=Bio::LocatableSeq->new(-id=>'three', -seq=>'ATTCG', >> -start=>1, -end=>5); my >> $aln=Bio::SimpleAlign->new(-seqs=>[$seq1,$seq2,$seq3]); >> Bio::AlignIO->new(-format=>'phylip')->write_aln($aln); >> >> Cheers, Roy. >> >> >> On 11/01/2010 13:40, Roy Chaudhuri wrote: >>> Hi Adam, >>> >>> I'm guessing you actually want to create a Bio::SimpleAlign >>> object (representing an alignment), rather than a Bio::AlignIO >>> object (which is just for reading/writing alignment files). 
>>> Bio::SimpleAlign has a documented new method that allows you to
>>> construct an alignment from Bio::LocatableSeq objects, which are
>>> similar to Bio::Seq objects but include gaps and start/end
>>> coordinates to describe their relationship to other sequences in
>>> the alignment.
>>>
>>> Roy.
>>>
>>> On 11/01/2010 12:21, Adam Witney wrote:
>>>> Hi,
>>>>
>>>> I am writing a script to automate the running of Phylip Pars.
>>>> In the process i have to create a Bio::AlignIO object from a
>>>> set of data that i have in a hash.
>>>>
>>>> I could write the hash data into a phylip file and then load
>>>> the Bio::AlignIO from that file, but i wondered if i could skip
>>>> the writing and then reading of a temporary file ?
>>>>
>>>> thanks for any help
>>>>
>>>> adam
>>>
>>
>

From marcelo011982 at gmail.com Wed Jan 13 13:12:04 2010
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Wed, 13 Jan 2010 16:12:04 -0200
Subject: [Bioperl-l] Blast to Clustalw Format
Message-ID: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>

Hi,

I have a simple BLAST result, such as one from blastn.
Is there a script in BioPerl to convert such a result to ClustalW format
(.aln)?

Thanks for any help.

From Kevin.M.Brown at asu.edu Wed Jan 13 13:01:42 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 13 Jan 2010 11:01:42 -0700
Subject: [Bioperl-l] targetp request
In-Reply-To: <4B4D9F62.5010306@pasteur.fr>
References: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com> <4B4D9F62.5010306@pasteur.fr>
Message-ID: <1A4207F8295607498283FE9E93B775B4067C133E@EX02.asurite.ad.asu.edu>

Sounds like this module might be in the wrong place then. Sounds more
like a SeqIO or AlignIO module, heheh. Also looks like the docs might
need to be cleaned up a bit for English readability (at least that
initial sentence).
Kevin Brown Center for Innovations in Medicine Biodesign Institute Arizona State University > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Emmanuel Quevillon > Sent: Wednesday, January 13, 2010 3:25 AM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] targetp request > > On 1/6/10 7:20 PM, Vijayaraj Nagarajan wrote: > > Hi, > > > > I am trying to use targetP in bioperl. > > the documentation at the bioperl site is a bit confusing to me... > > > > I would appreciate if you could give a very small example, > as to how to use > > "Bio::Tools::TargetP" to predict the localization of a > protein sequence that > > i have stored as a string. > > > > Thanks, > > Vijay > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Dear Vivay, > > Bio::Tools::TargetP is not intended to run targetp on a > sequence but to > read and parse results from targetp run. > > From the Pod doc : > > DESCRIPTION > TargetP modules will provides parsed informations > about protein > localization. It > reads in a targetp output file. It parses the results, and > returns a > Bio::SeqFeature::Generic object for each sequences > found to have > a subcellular > localization > > > So to analyze your sequence, you'll first need to run targetp on your > sequence file to create a targetp result output file. Then use > Bio::Tools::TargetP module to parse this result file and get only > informations you want/need from the result to be display as > shown in the > SYNOPSIS of the Pod documentation of the module. 
> > HTH > > Regards > > Emmanuel > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Wed Jan 13 13:44:36 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 13 Jan 2010 13:44:36 -0500 Subject: [Bioperl-l] Blast to Clustalw Format In-Reply-To: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com> References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com> Message-ID: Marcelo- Yes-- look at the code snip at http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO combined with the snip at http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods (using -format => 'clustalw') cheers MAJ ----- Original Message ----- From: "Marcelo Iwata" To: Sent: Wednesday, January 13, 2010 1:12 PM Subject: [Bioperl-l] Blast to Clustalw Format > Hi.. > I have an simple Blast result, such as blastn. > Is there an scrip to transform such result to Clustalw format in Bioperl > ?(.aln) > > Thanx for any help. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From dan.kortschak at adelaide.edu.au Wed Jan 13 23:26:46 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Thu, 14 Jan 2010 14:56:46 +1030 Subject: [Bioperl-l] not able to use Bio::Root::IO method Message-ID: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> Hi All, I'm having a stupid problem that for some reason I just can't figure out. I'm putting together a B:A:IO:bowtie module to wrap around the B:A:IO:sam module so bowtie output can be used as an assembly start point. For some reason that is escaping me I can't create tempfiles! 

What should be the relevant code in the module:

    package Bio::Assembly::IO::bowtie;
    use strict;
    use warnings;

    # Object preamble - inherits from Bio::Root::Root

    use Bio::SeqIO;
    use Bio::Tools::Run::Samtools;
    use Bio::Assembly::IO;
    use Carp;
    use Bio::Root::Root;
    use Bio::Root::IO;
    use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );

and the line (there are a couple of others that are likely to fail in
the same way, but I've not got that far)

    my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );

Which dies with:

    Can't locate object method "io" via package "Bio::Assembly::IO::bowtie"
    at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175.

Relevant environment vars:

    DB<10> x @ISA
    0  'Bio::Root::Root'
    1  'Bio::Root::IO'
    2  'Bio::Assembly::IO'

    DB<11> x $self
    0  Bio::Assembly::IO::bowtie=HASH(0x2d226d8)
       '_no_head' => undef
       '_no_sq' => undef
       '_root_verbose' => 0

Can someone suggest what I'm missing?

cheers
Dan

From maj at fortinbras.us Thu Jan 14 00:11:01 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 00:11:01 -0500
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <84196F01FF584C64A79B89FECE2DD86F@NewLife>

Hey Dan-- what does your constructor look like? I wonder if something's
getting lost in new() and _initialize() chaining spaghetti-
MAJ
----- Original Message -----
From: "Dan Kortschak"
To:
Sent: Wednesday, January 13, 2010 11:26 PM
Subject: [Bioperl-l] not able to use Bio::Root::IO method

> Hi All,
>
> I'm having a stupid problem that for some reason I just can't figure
> out. I'm putting together a B:A:IO:bowtie module to wrap around the
> B:A:IO:sam module so bowtie output can be used as an assembly start
> point.
>
> For some reason that is escaping me I can't create tempfiles!
>
> What should be the relevant code in the module:
>
> package Bio::Assembly::IO::bowtie;
> use strict;
> use warnings;
>
> # Object preamble - inherits from Bio::Root::Root
>
> use Bio::SeqIO;
> use Bio::Tools::Run::Samtools;
> use Bio::Assembly::IO;
> use Carp;
> use Bio::Root::Root;
> use Bio::Root::IO;
> use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );
>
> and the line (there are a couple of others that are likely to fail in
> the same way, but I've not got that far)
>
> my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir =>
> $self->tempdir(), -suffix => '.sam' );
>
> Which dies with:
> Can't locate object method "io" via package "Bio::Assembly::IO::bowtie"
> at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175.
>
> Relevant environment vars:
> DB<10> x @ISA
> 0  'Bio::Root::Root'
> 1  'Bio::Root::IO'
> 2  'Bio::Assembly::IO'
>
> DB<11> x $self
> 0  Bio::Assembly::IO::bowtie=HASH(0x2d226d8)
>    '_no_head' => undef
>    '_no_sq' => undef
>    '_root_verbose' => 0
>
> Can someone suggest what I'm missing?
>
> cheers
> Dan

From dan.kortschak at adelaide.edu.au Thu Jan 14 00:35:35 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 16:05:35 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <84196F01FF584C64A79B89FECE2DD86F@NewLife>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
    <84196F01FF584C64A79B89FECE2DD86F@NewLife>
Message-ID: <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>

Thanks Mark, I'm not sure about that since @ISA still includes
Bio::Root::IO when it's at the call, but it might be.

cheers
Dan

Here is the entirety of the code (it's reasonably short):

package Bio::Assembly::IO::bowtie;
use strict;
use warnings;

# Object preamble - inherits from Bio::Root::Root

use Bio::SeqIO;
use Bio::Tools::Run::Samtools;
use Bio::Assembly::IO;
use Carp;
use Bio::Root::Root;
use Bio::Root::IO;
use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );

our $HD = "\@HD\tVN:1.0\tSO:unsorted\n";
our $PG = "\@PG\tID=Bowtie\n";

our $HAVE_IO_UNCOMPRESS;
BEGIN {
    # check requirements
    unless ( eval "require Bio::Tools::Run::Bowtie;") {
        Bio::Root::Root->throw("Bio::Tools::Run::Bowtie is not available - cannot extract refdb from index.");
    }
    unless ( eval "require IO::Uncompress::Gunzip; \$HAVE_IO_UNCOMPRESS = 1") {
        Bio::Root::Root->warn("IO::Uncompress::Gunzip is not available; you'll have to do your decompression by hand.");
    }
}

sub new {
    my $class = shift;
    my @args = @_;
    my $self = $class->SUPER::new(@args);
    my ($file, $index, $no_head, $no_sq) = $self->_rearrange([qw(FILE INDEX NO_HEAD NO_SQ)], @args);
    $file =~ s/^<//;
    $self->{'_no_head'} = $no_head;
    $self->{'_no_sq'} = $no_sq;
    # get the sequence so samtools can work with it
    my $inspector = Bio::Tools::Run::Bowtie->new( -command => 'inspect' );
    my $refdb = $inspector->run($index);
    my $bam_file = $self->_make_bam($self->_bowtie_to_sam($file, $refdb));
    my $sam = Bio::Assembly::IO->new( -file => "<$bam_file", -refdb => $refdb , -format => 'sam' );
    return $sam;
}

sub _bowtie_to_sam {
    my ($self, $file, $refdb) = @_;

    $self->throw("'$file' does not exist or is not readable.")
        unless ( -e $file && -r $file );
    my $guesser = Bio::Tools::GuessSeqFormat->new(-file=>$file);
    $self->throw("'$file' is not a bowtie formatted file.") unless $guesser->guess =~ m/^bowtie$/;

    my %SQ;
    my $mapq = 255;
    my $in_pair;
    my @mate_line;
    my $mlen;

    if ($file =~ m/\.gz[^.]*$/) {
        unless ($HAVE_IO_UNCOMPRESS) {
            croak( "IO::Uncompress::Gunzip not available, can't expand '$_'" );
        }
        my ($tfh, $tf) = $self->io->tempfile;
        my $z = IO::Uncompress::Gunzip->new($_);
        while (<$z>) { print $tfh $_ }
        close $tfh;
        $file = $tf;
    }

    open(my $fh, $file) or
        $self->throw("Can not open '$file' for reading: $!");

    # create temp file for working
    my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );

    while ($fh) {
        chomp;
        my ($qname,$strand,$rname,$pos,$seq,$qual,$m,$details)=split("\t",$_);
        $SQ{$rname} = 1;

        my $paired_f =    ($qname =~ m#/[12]#) ? 0x03 : 0;
        my $strand_f =    ($strand eq '-') ? 0x10 : 0;
        my $op_strand_f = ($strand eq '+' && $paired_f) ? 0x20 : 0;
        my $first_f =     ($qname =~ m#/1#) ? 0x40 : 0;
        my $second_f =    ($qname =~ m#/2#) ? 0x80 : 0;
        my $flag = $paired_f | $strand_f | $op_strand_f | $first_f | $second_f;

        $pos++;
        my $len = length $seq;
        die unless $len == length $qual;
        my $cigar = $len.'M';
        my @detail = split(',',$details);
        my $dist = 'NM:i:'.scalar @detail;

        my @mismatch;
        my $last_pos = 0;
        for (@detail) {
            m/(\d+):(\w)>\w/;
            my $err = ($1-$last_pos);
            $last_pos = $1+1;
            push @mismatch,($err,$2);
        }
        push @mismatch, $len-$last_pos;
        @mismatch = reverse @mismatch if $strand eq '-';
        my $mismatch = join('',('MD:Z:',@mismatch));

        if ($paired_f) {
            my $mrnm = '=';
            if ($in_pair) {
                my $mpos = $mate_line[3];
                $mate_line[7] = $pos;
                my $isize = $mpos-$pos-$len;
                $mate_line[8] = -$isize;
                print $sam_tmp_h join("\t",@mate_line),"\n";
                print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n";
                $in_pair = 0;
            } else {
                $mlen = $len;
                @mate_line = ($qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, undef, undef, $seq, $qual, $mismatch, $dist);
                $in_pair = 1;
            }
        } else {
            my $mrnm = '*';
            my $mpos = 0;
            my $isize = 0;
            print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n";
        }
    }

    close($fh);
    $sam_tmp_h->close;

    return $sam_tmp_f if $self->{'_no_head'};

    my ($samh, $samf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );

    # print header
    print $samh $HD;

    # print sequence dictionary
    unless ($self->{'_no_sq'}) {
        my $db = Bio::SeqIO->new( -file => $refdb, -format => 'fasta' );
        while ( my $seq = $db->next_seq() ) {
            $SQ{$seq->id} = $seq->length if $SQ{$seq->id};
        }

        map { print $samh join("\t", ('@SQ', "SN:$_", "LN:$SQ{$_}")), "\n" } keys %SQ;
    }

    # print program
    print $samh $PG;

    open($sam_tmp_h, $sam_tmp_f) or
        $self->throw("Can not open '$sam_tmp_f' for reading: $!");

    print $samh $_ while ($sam_tmp_h);

    close($sam_tmp_h);
    $samh->close;

    return $samf;
}

sub _make_bam {
    my ($self, $file) = @_;

    $self->throw("'$file' does not exist or is not readable")
        unless ( -e $file && -r $file );

    # make a sorted bam file from a sam file input
    my ($bamh, $bamf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.bam' );
    my ($srth, $srtf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.srt' );
    $_->close for ($bamh, $srth);

    my $samt = Bio::Tools::Run::Samtools->new( -command => 'view',
                                               -sam_input => 1,
                                               -bam_output => 1 );

    $samt->run( -bam => $file, -out => $bamf );

    $samt = Bio::Tools::Run::Samtools->new( -command => 'sort' );

    $samt->run( -bam => $bamf, -pfx => $srtf);

    return $srtf.'.bam'
}

1;

On Thu, 2010-01-14 at 00:11 -0500, Mark A. Jensen wrote:
> Hey Dan-- what does your constructor look like? I wonder if
> something's getting lost in new() and _initialize() chaining
> spaghetti- MAJ
>

From dan.kortschak at adelaide.edu.au Thu Jan 14 00:35:48 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 16:05:48 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To:
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
    <84196F01FF584C64A79B89FECE2DD86F@NewLife>
    <1263446261.8016.50.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au>

I've had a bit of a play with that, but no luck.

Dan

On Thu, 2010-01-14 at 00:26 -0500, Mark A.
Jensen wrote:
> I've found that rearranging the items in the 'use base' array can
> sometimes recover lost methods. I don't know enough of the arcana to
> know why it works. (Sometimes, java starts looking pretty good from
> here...)
>

From maj at fortinbras.us Thu Jan 14 00:38:00 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 00:38:00 -0500
Subject: [Bioperl-l] Fw: not able to use Bio::Root::IO method
Message-ID: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife>

up to list
----- Original Message -----
From: "Mark A. Jensen"
To: "Dan Kortschak"
Sent: Thursday, January 14, 2010 12:36 AM
Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method

> Aha-- check out the pod for Bio::Root::IO:
>
> "This module provides methods that will usually be needed for any sort
> of file- or stream-related input/output, e.g., keeping track of a file
> handle, transient printing and reading from the file handle, a close
> method, automatically closing the handle on garbage collection, etc.
>
> To use this for your own code you will either want to inherit from
> this module, or instantiate an object for every file or stream you are
> dealing with. In the first case this module will most likely not be
> the first class off which your class inherits; therefore you need to
> call _initialize_io() with the named parameters in order to set file
> handle, open file, etc automatically."
>
> I think you're wanting a call to $self->_initialize_io(). (There is no
> io() method explicitly defined in any of the base classes.)
> MAJ
> ----- Original Message -----
> From: "Dan Kortschak"
> To:
> Sent: Wednesday, January 13, 2010 11:26 PM
> Subject: [Bioperl-l] not able to use Bio::Root::IO method
>
>> Hi All,
>>
>> I'm having a stupid problem that for some reason I just can't figure
>> out. I'm putting together a B:A:IO:bowtie module to wrap around the
>> B:A:IO:sam module so bowtie output can be used as an assembly start
>> point.
>>
>> For some reason that is escaping me I can't create tempfiles!
>>
>> What should be the relevant code in the module:
>>
>> package Bio::Assembly::IO::bowtie;
>> use strict;
>> use warnings;
>>
>> # Object preamble - inherits from Bio::Root::Root
>>
>> use Bio::SeqIO;
>> use Bio::Tools::Run::Samtools;
>> use Bio::Assembly::IO;
>> use Carp;
>> use Bio::Root::Root;
>> use Bio::Root::IO;
>> use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );
>>
>> and the line (there are a couple of others that are likely to fail in
>> the same way, but I've not got that far)
>>
>> my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir =>
>> $self->tempdir(), -suffix => '.sam' );
>>
>> Which dies with:
>> Can't locate object method "io" via package "Bio::Assembly::IO::bowtie"
>> at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175.
>>
>> Relevant environment vars:
>> DB<10> x @ISA
>> 0  'Bio::Root::Root'
>> 1  'Bio::Root::IO'
>> 2  'Bio::Assembly::IO'
>>
>> DB<11> x $self
>> 0  Bio::Assembly::IO::bowtie=HASH(0x2d226d8)
>>    '_no_head' => undef
>>    '_no_sq' => undef
>>    '_root_verbose' => 0
>>
>> Can someone suggest what I'm missing?
>>
>> cheers
>> Dan

From maj at fortinbras.us Thu Jan 14 00:50:11 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 00:50:11 -0500
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
    <84196F01FF584C64A79B89FECE2DD86F@NewLife>
    <1263446261.8016.50.camel@zoidberg.mbs.adelaide.edu.au>
    <1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <82BFF47099684EF496DB3875D39DCA14@NewLife>

For the benefit of the list, I categorically deny ever making the
statement about java below....
MAJ
----- Original Message -----
From: "Dan Kortschak"
To: "Mark A. Jensen"
Cc:
Sent: Thursday, January 14, 2010 12:35 AM
Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method

> I've had a bit of a play with that, but no luck.
>
> Dan
>
> On Thu, 2010-01-14 at 00:26 -0500, Mark A. Jensen wrote:
>> I've found that rearranging the items in the 'use base' array can
>> sometimes recover lost methods. I don't know enough of the arcana to
>> know why it works. (Sometimes, java starts looking pretty good from
>> here...)
>>
>

From cjfields at illinois.edu Thu Jan 14 02:23:41 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 14 Jan 2010 01:23:41 -0600
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
    <84196F01FF584C64A79B89FECE2DD86F@NewLife>
    <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID:

You can remove the separate 'use' directives for modules that are
declared with 'use base' (they will be imported then). Also,
Bio::Root::IO inherits from Bio::Root::Root, and Bio::Assembly::IO
should inherit from Bio::Root::IO, so the only base module you should
need is Bio::Assembly::IO. It's possible having all three is confusing
the interpreter.

chris

On Jan 13, 2010, at 11:35 PM, Dan Kortschak wrote:

> Thanks Mark, I'm not sure about that since @ISA still includes
> Bio::Root::IO when it's at the call, but it might be.
> > cheers > Dan > > Here is the entirety of the code (it reasonably short): > > package Bio::Assembly::IO::bowtie; > use strict; > use warnings; > > # Object preamble - inherits from Bio::Root::Root > > use Bio::SeqIO; > use Bio::Tools::Run::Samtools; > use Bio::Assembly::IO; > use Carp; > use Bio::Root::Root; > use Bio::Root::IO; > use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO ); > > our $HD = "\@HD\tVN:1.0\tSO:unsorted\n"; > our $PG = "\@PG\tID=Bowtie\n"; > > our $HAVE_IO_UNCOMPRESS; > BEGIN { > # check requirements > unless ( eval "require Bio::Tools::Run::Bowtie;") { > Bio::Root::Root->throw("Bio::Tools::Run::Bowtie is not available - cannot extract refdb from index."); > } > unless ( eval "require IO::Uncompress::Gunzip; \$HAVE_IO_UNCOMPRESS = 1") { > Bio::Root::Root->warn("IO::Uncompress::Gunzip is not available; you'll have to do your decompression by hand."); > } > } > > sub new { > my $class = shift; > my @args = @_; > my $self = $class->SUPER::new(@args); > my ($file, $index, $no_head, $no_sq) = $self->_rearrange([qw(FILE INDEX NO_HEAD NO_SQ)], @args); > $file =~ s/^ $self->{'_no_head'} = $no_head; > $self->{'_no_sq'} = $no_sq; > # get the sequence so samtools can work with it > my $inspector = Bio::Tools::Run::Bowtie->new( -command => 'inspect' ); > my $refdb = $inspector->run($index); > my $bam_file = $self->_make_bam($self->_bowtie_to_sam($file, $refdb)); > my $sam = Bio::Assembly::IO->new( -file => "<$bam_file", -refdb => $refdb , -format => 'sam' ); > return $sam; > } > > sub _bowtie_to_sam { > my ($self, $file, $refdb) = @_; > > $self->throw("'$file' does not exist or is not readable.") > unless ( -e $file && -r $file ); > my $guesser = Bio::Tools::GuessSeqFormat->new(-file=>$file); > $self->throw("'$file' is not a bowtie formatted file.") unless $guesser->guess =~ m/^bowtie$/; > > my %SQ; > my $mapq = 255; > my $in_pair; > my @mate_line; > my $mlen; > > if ($file =~ m/\.gz[^.]*$/) { > unless ($HAVE_IO_UNCOMPRESS) { > croak( 
"IO::Uncompress::Gunzip not available, can't expand '$_'" ); > } > my ($tfh, $tf) = $self->io->tempfile; > my $z = IO::Uncompress::Gunzip->new($_); > while (<$z>) { print $tfh $_ } > close $tfh; > $file = $tf; > } > > open(my $fh, $file) or > $self->throw("Can not open '$file' for reading: $!"); > > # create temp file for working > my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' ); > > while ($fh) { > chomp; > my ($qname,$strand,$rname,$pos,$seq,$qual,$m,$details)=split("\t",$_); > $SQ{$rname} = 1; > > my $paired_f = ($qname =~ m#/[12]#) ? 0x03 : 0; > my $strand_f = ($strand eq '-') ? 0x10 : 0; > my $op_strand_f = ($strand eq '+' && $paired_f) ? 0x20 : 0; > my $first_f = ($qname =~ m#/1#) ? 0x40 : 0; > my $second_f = ($qname =~ m#/2#) ? 0x80 : 0; > my $flag = $paired_f | $strand_f | $op_strand_f | $first_f | $second_f; > > $pos++; > my $len = length $seq; > die unless $len == length $qual; > my $cigar = $len.'M'; > my @detail = split(',',$details); > my $dist = 'NM:i:'.scalar @detail; > > my @mismatch; > my $last_pos = 0; > for (@detail) { > m/(\d+):(\w)>\w/; > my $err = ($1-$last_pos); > $last_pos = $1+1; > push @mismatch,($err,$2); > } > push @mismatch, $len-$last_pos; > @mismatch = reverse @mismatch if $strand eq '-'; > my $mismatch = join('',('MD:Z:', at mismatch)); > > if ($paired_f) { > my $mrnm = '='; > if ($in_pair) { > my $mpos = $mate_line[3]; > $mate_line[7] = $pos; > my $isize = $mpos-$pos-$len; > $mate_line[8] = -$isize; > print $sam_tmp_h join("\t", at mate_line),"\n"; > print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n"; > $in_pair = 0; > } else { > $mlen = $len; > @mate_line = ($qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, undef, undef, $seq, $qual, $mismatch, $dist); > $in_pair = 1; > } > } else { > my $mrnm = '*'; > my $mpos = 0; > my $isize = 0; > print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, 
$mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n"; > } > } > > close($fh); > $sam_tmp_h->close; > > return $sam_tmp_f if $self->{'_no_head'}; > > my ($samh, $samf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' ); > > # print header > print $samh $HD; > > # print sequence dictionary > unless ($self->{'_no_sq'}) { > my $db = Bio::SeqIO->new( -file => $refdb, -format => 'fasta' ); > while ( my $seq = $db->next_seq() ) { > $SQ{$seq->id} = $seq->length if $SQ{$seq->id}; > } > > map { print $samh join("\t", ('@SQ', "SN:$_", "LN:$SQ{$_}")), "\n" } keys %SQ; > } > > # print program > print $samh $PG; > > open($sam_tmp_h, $sam_tmp_f) or > $self->throw("Can not open '$sam_tmp_f' for reading: $!"); > > print $samh $_ while ($sam_tmp_h); > > close($sam_tmp_h); > $samh->close; > > return $samf; > } > > sub _make_bam { > my ($self, $file) = @_; > > $self->throw("'$file' does not exist or is not readable") > unless ( -e $file && -r $file ); > > # make a sorted bam file from a sam file input > my ($bamh, $bamf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.bam' ); > my ($srth, $srtf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.srt' ); > $_->close for ($bamh, $srth); > > my $samt = Bio::Tools::Run::Samtools->new( -command => 'view', > -sam_input => 1, > -bam_output => 1 ); > > $samt->run( -bam => $file, -out => $bamf ); > > $samt = Bio::Tools::Run::Samtools->new( -command => 'sort' ); > > $samt->run( -bam => $bamf, -pfx => $srtf); > > return $srtf.'.bam' > } > > 1; > > > On Thu, 2010-01-14 at 00:11 -0500, Mark A. Jensen wrote: >> Hey Dan-- what does your constructor look like? 
>> I wonder if something's getting lost in new() and _initialize()
>> chaining spaghetti- MAJ
>>
>

From cjfields at illinois.edu Thu Jan 14 02:25:05 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 14 Jan 2010 01:25:05 -0600
Subject: [Bioperl-l] Fw: not able to use Bio::Root::IO method
In-Reply-To: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife>
References: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife>
Message-ID: <1DB926E1-9C6F-4B96-8D7E-28317DD7DE42@illinois.edu>

Yes, that's true. The call to an io() is a Bio::Tools::Run::WrapperBase
thing (the io() is a Bio::Root::IO instance).

chris

On Jan 13, 2010, at 11:38 PM, Mark A. Jensen wrote:

> up to list
> ----- Original Message ----- From: "Mark A. Jensen"
> To: "Dan Kortschak"
> Sent: Thursday, January 14, 2010 12:36 AM
> Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method
>
>> Aha-- check out the pod for Bio::Root::IO:
>> "This module provides methods that will usually be needed for any sort
>> of file- or stream-related input/output, e.g., keeping track of a file
>> handle, transient printing and reading from the file handle, a close
>> method, automatically closing the handle on garbage collection, etc.
>> To use this for your own code you will either want to inherit from
>> this module, or instantiate an object for every file or stream you are
>> dealing with. In the first case this module will most likely not be
>> the first class off which your class inherits; therefore you need to
>> call _initialize_io() with the named parameters in order to set file
>> handle, open file, etc automatically."
>> I think you're wanting a call to $self->_initialize_io(). (There is no
>> io() method explicitly defined in any of the base classes.)
>> MAJ
>> ----- Original Message ----- From: "Dan Kortschak"
>> To:
>> Sent: Wednesday, January 13, 2010 11:26 PM
>> Subject: [Bioperl-l] not able to use Bio::Root::IO method
>>> Hi All,
>>> I'm having a stupid problem that for some reason I just can't figure
>>> out. I'm putting together a B:A:IO:bowtie module to wrap around the
>>> B:A:IO:sam module so bowtie output can be used as an assembly start
>>> point.
>>> For some reason that is escaping me I can't create tempfiles!
>>> What should be the relevant code in the module:
>>> package Bio::Assembly::IO::bowtie;
>>> use strict;
>>> use warnings;
>>> # Object preamble - inherits from Bio::Root::Root
>>> use Bio::SeqIO;
>>> use Bio::Tools::Run::Samtools;
>>> use Bio::Assembly::IO;
>>> use Carp;
>>> use Bio::Root::Root;
>>> use Bio::Root::IO;
>>> use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );
>>> and the line (there are a couple of others that are likely to fail in
>>> the same way, but I've not got that far)
>>> my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir =>
>>> $self->tempdir(), -suffix => '.sam' );
>>> Which dies with:
>>> Can't locate object method "io" via package "Bio::Assembly::IO::bowtie"
>>> at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175.
>>> Relevant environment vars:
>>> DB<10> x @ISA
>>> 0  'Bio::Root::Root'
>>> 1  'Bio::Root::IO'
>>> 2  'Bio::Assembly::IO'
>>> DB<11> x $self
>>> 0  Bio::Assembly::IO::bowtie=HASH(0x2d226d8)
>>>    '_no_head' => undef
>>>    '_no_sq' => undef
>>>    '_root_verbose' => 0
>>> Can someone suggest what I'm missing?
>>> cheers
>>> Dan
>>>
>

From dan.kortschak at adelaide.edu.au Thu Jan 14 02:59:20 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 18:29:20 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To:
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
    <84196F01FF584C64A79B89FECE2DD86F@NewLife>
    <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <1263455960.4630.3.camel@epistle>

Thanks Chris,

I've done that, and since the inheritance is direct (rather than being a
constructed attribute in the object hash) the calls are $obj->temp*
rather than the $obj->io->temp* that I was using.

It works now and is much clearer having gotten rid of many of the
declarations.

cheers
Dan

On Thu, 2010-01-14 at 01:23 -0600, Chris Fields wrote:
> You can remove separate 'use' directives if they are declared with
> 'use base' (they will be imported then). Also, Bio::Root::IO inherits
> Bio::Root::Root, and Bio::Assembly::IO should inherit from
> Bio::Root::IO, so the only base module you should need is
> Bio::Assembly::IO. It's possible having all three is confusing the
> interpreter.
>
> chris

From marcelo011982 at gmail.com Thu Jan 14 08:44:25 2010
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Thu, 14 Jan 2010 11:44:25 -0200
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To:
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
Message-ID: <1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>

Thanks Mark.
I think most of you already know this, but I'll post it for new users:

#!/usr/bin/perl -w

use strict;
use Bio::SearchIO;
use Bio::AlignIO;

my $in = new Bio::SearchIO(-format => 'blast',
                           -file   => '../../fontes/exemplos/blat/teste2/output.blast');
my $aln;
my $alnIO;
$alnIO = Bio::AlignIO->new(-format => "clustalw", -file => ">hsp.aln");
while ( my $result = $in->next_result ) {
    ## $result is a Bio::Search::Result::ResultI compliant object
    while ( my $hit = $result->next_hit ) {
        ## $hit is a Bio::Search::Hit::HitI compliant object
        while ( my $hsp = $hit->next_hsp ) {
            ## $hsp is a Bio::Search::HSP::HSPI compliant object
            $aln = $hsp->get_aln;
            $alnIO->write_aln($aln);
        }
    }
}

On Wed, Jan 13, 2010 at 4:44 PM, Mark A. Jensen wrote:

> Marcelo-
> Yes-- look at the code snip at
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
> combined with the snip at
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> (using -format => 'clustalw')
> cheers MAJ
> ----- Original Message ----- From: "Marcelo Iwata" <
> marcelo011982 at gmail.com>
> To:
> Sent: Wednesday, January 13, 2010 1:12 PM
> Subject: [Bioperl-l] Blast to Clustalw Format
>
>> Hi..
>> I have a simple Blast result, such as blastn.
>> Is there a script to transform such a result to Clustalw format (.aln) in Bioperl?
>>
>> Thanx for any help.
>>

From marcelo011982 at gmail.com Thu Jan 14 08:46:21 2010
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Thu, 14 Jan 2010 11:46:21 -0200
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To: <1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
    <1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>
Message-ID: <1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com>

Sorry, the correct code is:

#!/usr/bin/perl -w

use strict;
use Bio::SearchIO;
use Bio::AlignIO;

my $in = new Bio::SearchIO(-format => 'blast',
                           -file   => '../../fontes/exemplos/blat/teste2/output.blast');
my $aln;
my $alnIO;
$alnIO = Bio::AlignIO->new(-format => "clustalw", -file => ">hsp.aln");
while ( my $result = $in->next_result ) {
    ## $result is a Bio::Search::Result::ResultI compliant object
    while ( my $hit = $result->next_hit ) {
        ## $hit is a Bio::Search::Hit::HitI compliant object
        while ( my $hsp = $hit->next_hsp ) {
            ## $hsp is a Bio::Search::HSP::HSPI compliant object
            $aln = $hsp->get_aln;
            $alnIO->write_aln($aln);
        }
    }
}

On Thu, Jan 14, 2010 at 11:44 AM, Marcelo Iwata wrote:

> Thanks Mark.
> I think that most of you already know it.
> But , i'll put it for new users: > > > #!/usr/bin/perl -w > > use strict; > use Bio::SearchIO; > use Bio::AlignIO; > > my $in = new Bio::SearchIO(-format => 'blast', > -file => ' > ../../fontes/exemplos/blat/teste2/output.blast '); > my $aln; > my $alnIO; > $alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln"); > while ( my $result = $in->next_result ) { > ## $result is a Bio::Search::Result::ResultI compliant object > while ( my $hit = $result->next_hit ) { > ## $hit is a Bio::Search::Hit::HitI compliant object > while ( my $hsp = $hit->next_hsp ) { > ## $hsp is a Bio::Search::HSP::HSPI compliant object > $aln = $hsp->get_aln; > $alnIO->write_aln($aln); > > > } > } > } > > > On Wed, Jan 13, 2010 at 4:44 PM, Mark A. Jensen wrote: > >> Marcelo- >> Yes-- look at the code snip at >> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO >> combined with the snip at >> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods >> (using -format => 'clustalw') >> cheers MAJ >> ----- Original Message ----- From: "Marcelo Iwata" < >> marcelo011982 at gmail.com> >> To: >> Sent: Wednesday, January 13, 2010 1:12 PM >> Subject: [Bioperl-l] Blast to Clustalw Format >> >> >> Hi.. >>> I have an simple Blast result, such as blastn. >>> Is there an scrip to transform such result to Clustalw format in >>> Bioperl >>> ?(.aln) >>> >>> Thanx for any help. >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> > From maj at fortinbras.us Thu Jan 14 08:54:31 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Thu, 14 Jan 2010 08:54:31 -0500 Subject: [Bioperl-l] Blast to Clustalw Format In-Reply-To: <1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com> References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com><1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com> <1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com> Message-ID: <1B8891488AA746F49BCAAB531FBE4D0B@NewLife> Thanks Marcelo-- code snips always appreciated! MAJ ----- Original Message ----- From: "Marcelo Iwata" To: Sent: Thursday, January 14, 2010 8:46 AM Subject: Re: [Bioperl-l] Blast to Clustalw Format > Sorry , the correct code is: > > > > #!/usr/bin/perl -w > > use strict; > use Bio::SearchIO; > use Bio::AlignIO; > > my $in = new Bio::SearchIO(-format => 'blast', > -file => ' > ../../fontes/exemplos/blat/teste2/output.blast '); > my $aln; > my $alnIO; > $alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln"); > while ( my $result = $in->next_result ) { > ## $result is a Bio::Search::Result::ResultI compliant object > while ( my $hit = $result->next_hit ) { > ## $hit is a Bio::Search::Hit::HitI compliant object > while ( my $hsp = $hit->next_hsp ) { > ## $hsp is a Bio::Search::HSP::HSPI compliant object > $aln = $hsp->get_aln; > $alnIO->write_aln($aln); > > } > } > } > > > On Thu, Jan 14, 2010 at 11:44 AM, Marcelo Iwata > wrote: > >> Thanks Mark. >> I think that most of you already know it. 
>> But , i'll put it for new users: >> >> >> #!/usr/bin/perl -w >> >> use strict; >> use Bio::SearchIO; >> use Bio::AlignIO; >> >> my $in = new Bio::SearchIO(-format => 'blast', >> -file => ' >> ../../fontes/exemplos/blat/teste2/output.blast '); >> my $aln; >> my $alnIO; >> $alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln"); >> while ( my $result = $in->next_result ) { >> ## $result is a Bio::Search::Result::ResultI compliant object >> while ( my $hit = $result->next_hit ) { >> ## $hit is a Bio::Search::Hit::HitI compliant object >> while ( my $hsp = $hit->next_hsp ) { >> ## $hsp is a Bio::Search::HSP::HSPI compliant object >> $aln = $hsp->get_aln; >> $alnIO->write_aln($aln); >> >> >> } >> } >> } >> >> >> On Wed, Jan 13, 2010 at 4:44 PM, Mark A. Jensen wrote: >> >>> Marcelo- >>> Yes-- look at the code snip at >>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO >>> combined with the snip at >>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods >>> (using -format => 'clustalw') >>> cheers MAJ >>> ----- Original Message ----- From: "Marcelo Iwata" < >>> marcelo011982 at gmail.com> >>> To: >>> Sent: Wednesday, January 13, 2010 1:12 PM >>> Subject: [Bioperl-l] Blast to Clustalw Format >>> >>> >>> Hi.. >>>> I have an simple Blast result, such as blastn. >>>> Is there an scrip to transform such result to Clustalw format in >>>> Bioperl >>>> ?(.aln) >>>> >>>> Thanx for any help. 
>>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From sidd.basu at gmail.com Thu Jan 14 14:15:04 2010 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Thu, 14 Jan 2010 13:15:04 -0600 Subject: [Bioperl-l] reading blast report Message-ID: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com> Hi, I have a script that reads a tblastn report(13000 records) and loads in a chado database(Bio::Chado::Schema module), however the machine runs of memory. I am trying to figure out other than loading the database stuff if it the reading of SearchIO module could consume a lot of memory. So, when i am reading a blast file and getting the result object .... while (my $result = $searchio->next_result) * Does the searchio object loads a huge chunk of file in the memory or for each iteration it only reads a part of the result. * Does doing an index on blast report and then reading from it be much faster and why. And is there any way i could iterate through each record in the index, will that be helpful. -siddhartha From jason at bioperl.org Thu Jan 14 14:53:29 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 14 Jan 2010 11:53:29 -0800 Subject: [Bioperl-l] reading blast report In-Reply-To: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com> References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com> Message-ID: <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org> What aspects of the report are you loading? You might consider the blast report as tab-delimited (-m 8 format) if you only are interested in start/end positions and scores of ailgnments which is a simpler and reduced dataset that has lower memory footprint by the parser. 
Searchio (default) -format => blast - you can try the BLAST -format => blast_pull instead which lazy parses to create objects and will reduce memory consumption. -jason On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote: > Hi, > I have a script that reads a tblastn report(13000 records) and loads > in > a chado database(Bio::Chado::Schema module), however the machine > runs of memory. I am trying to figure > out other than loading the database stuff > if it the reading of SearchIO module could consume a lot of memory. > So, > when i am reading a blast file and getting the result object .... > > while (my $result = $searchio->next_result) > > * Does the searchio object loads a huge chunk of file in the memory or > for each iteration it only reads a part of the result. > > * Does doing an index on blast report and then reading from it be much > faster and why. And is there any way i could iterate through each > record in the index, will that be helpful. > > -siddhartha > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From sidd.basu at gmail.com Thu Jan 14 15:15:45 2010 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Thu, 14 Jan 2010 14:15:45 -0600 Subject: [Bioperl-l] Re: reading blast report In-Reply-To: <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org> References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com> <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org> Message-ID: <4b4f7b74.5744f10a.7087.4813@mx.google.com> On Thu, 14 Jan 2010, Jason Stajich wrote: > What aspects of the report are you loading? You might consider the blast > report as tab-delimited (-m 8 format) if you only are interested in > start/end positions and scores of ailgnments which is a simpler and reduced > dataset that has lower memory footprint by the parser. 
I think this would be the better approach, since I am mostly interested in start/end/score data only. > > Searchio (default) -format => blast - you can try the BLAST -format => > blast_pull instead which lazy parses to create objects and will reduce > memory consumption. That's another good option. But just out of curiosity: does the regular blast parser load the entire file into memory when the output consists of multiple Results concatenated together into a single file? Could anybody clarify? thanks, -siddhartha > > -jason > On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote: > > > Hi, > > I have a script that reads a tblastn report(13000 records) and loads in > > a chado database(Bio::Chado::Schema module), however the machine runs of > > memory. I am trying to figure > > out other than loading the database stuff > > if it the reading of SearchIO module could consume a lot of memory. So, > > when i am reading a blast file and getting the result object .... > > > > while (my $result = $searchio->next_result) > > > > * Does the searchio object loads a huge chunk of file in the memory or > > for each iteration it only reads a part of the result. > > > > * Does doing an index on blast report and then reading from it be much > > faster and why. And is there any way i could iterate through each > > record in the index, will that be helpful.
> > > > -siddhartha > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > From jason at bioperl.org Thu Jan 14 16:28:29 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 14 Jan 2010 13:28:29 -0800 Subject: [Bioperl-l] reading blast report In-Reply-To: <4b4f7b74.5744f10a.7087.4813@mx.google.com> References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com> <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org> <4b4f7b74.5744f10a.7087.4813@mx.google.com> Message-ID: On Jan 14, 2010, at 12:15 PM, Siddhartha Basu wrote: > On Thu, 14 Jan 2010, Jason Stajich wrote: > >> What aspects of the report are you loading? You might consider the >> blast >> report as tab-delimited (-m 8 format) if you only are interested in >> start/end positions and scores of ailgnments which is a simpler and >> reduced >> dataset that has lower memory footprint by the parser. > > I think this would be a better approach i am mostly interested in > start/end/score data only. > >> >> Searchio (default) -format => blast - you can try the BLAST -format >> => >> blast_pull instead which lazy parses to create objects and will >> reduce >> memory consumption. > > It's another good option though. But just out of curosity, so the > regular blast parser do load the entire file in the memory consider > the > output consist of multiple Results concatenated together into a > single file. Could anybody clarify. > > thanks, > -siddhartha Each result is parsed (1 result per query) and all the hits and HSPs are parsed and brought into memory with the standard (non-pull) approach. The SearchIO iterates at the level of result - that is why you call next_result which parses each one at a time. 
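Since each next_result call builds all hits and HSPs for one query as objects, the tab-delimited route mentioned earlier in this thread can sidestep object creation entirely when only start/end/score values are needed. What follows is a minimal sketch, not BioPerl code: it assumes the usual 12-column -m 8 layout (query, subject, % identity, alignment length, mismatches, gap opens, q.start, q.end, s.start, s.end, e-value, bit score), and the helper name parse_m8_line and the sample rows are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): parse one line of tabular
# BLAST output (-m 8) into a hash of the fields discussed in this thread.
# Column order assumed: query, subject, %identity, alignment length,
# mismatches, gap opens, q.start, q.end, s.start, s.end, e-value, bit score.
sub parse_m8_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/ || $line !~ /\S/;   # skip comment and blank lines
    my @f = split /\t/, $line;
    return unless @f >= 12;                      # skip malformed rows
    return {
        query   => $f[0],
        subject => $f[1],
        q_start => $f[6],
        q_end   => $f[7],
        s_start => $f[8],
        s_end   => $f[9],
        evalue  => $f[10],
        bits    => $f[11],
    };
}

# Made-up sample rows standing in for a real report file.
my @rows = (
    [qw(query1 contigA 98.50 200 3 0 1 200 501 700 1e-100 380)],
    [qw(query1 contigB 75.00 120 30 0 10 129 40 159 2e-20 95.1)],
);
my $sample = '';
$sample .= join("\t", @$_) . "\n" for @rows;

# Read the "report" one line at a time through an in-memory filehandle.
open my $fh, '<', \$sample or die "cannot open sample: $!";
while (my $line = <$fh>) {
    my $hsp = parse_m8_line($line) or next;
    # Only this one row is held in memory; a database insert would go here.
    print join("\t", @{$hsp}{qw(query subject q_start q_end s_start s_end bits)}), "\n";
}
close $fh;
```

For a real report, the in-memory filehandle would be replaced by an open on the -m 8 output file; memory use then stays flat regardless of how many queries the file contains.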
> > >> >> -jason >> On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote: >> >>> Hi, >>> I have a script that reads a tblastn report(13000 records) and >>> loads in >>> a chado database(Bio::Chado::Schema module), however the machine >>> runs of >>> memory. I am trying to figure >>> out other than loading the database stuff >>> if it the reading of SearchIO module could consume a lot of >>> memory. So, >>> when i am reading a blast file and getting the result object .... >>> >>> while (my $result = $searchio->next_result) >>> >>> * Does the searchio object loads a huge chunk of file in the >>> memory or >>> for each iteration it only reads a part of the result. >>> >>> * Does doing an index on blast report and then reading from it be >>> much >>> faster and why. And is there any way i could iterate through each >>> record in the index, will that be helpful. >>> >>> -siddhartha >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> http://fungalgenomes.org/ >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From sidd.basu at gmail.com Thu Jan 14 16:40:42 2010 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Thu, 14 Jan 2010 15:40:42 -0600 Subject: [Bioperl-l] Re: reading blast report In-Reply-To: References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com> <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org> <4b4f7b74.5744f10a.7087.4813@mx.google.com> Message-ID: <4b4f8f5d.5644f10a.2be2.47dc@mx.google.com> Thanks jason for clarification. 
On Thu, 14 Jan 2010, Jason Stajich wrote: > > On Jan 14, 2010, at 12:15 PM, Siddhartha Basu wrote: > > > On Thu, 14 Jan 2010, Jason Stajich wrote: > > > >> What aspects of the report are you loading? You might consider the blast > >> report as tab-delimited (-m 8 format) if you only are interested in > >> start/end positions and scores of ailgnments which is a simpler and > >> reduced > >> dataset that has lower memory footprint by the parser. > > > > I think this would be a better approach i am mostly interested in > > start/end/score data only. > > > >> > >> Searchio (default) -format => blast - you can try the BLAST -format => > >> blast_pull instead which lazy parses to create objects and will reduce > >> memory consumption. > > > > It's another good option though. But just out of curosity, so the > > regular blast parser do load the entire file in the memory consider the > > output consist of multiple Results concatenated together into a > > single file. Could anybody clarify. > > > > thanks, > > -siddhartha > > Each result is parsed (1 result per query) and all the hits and HSPs are > parsed and brought into memory with the standard (non-pull) approach. > The SearchIO iterates at the level of result - that is why you call > next_result which parses each one at a time. > > > > > > >> > >> -jason > >> On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote: > >> > >>> Hi, > >>> I have a script that reads a tblastn report(13000 records) and loads in > >>> a chado database(Bio::Chado::Schema module), however the machine runs > >>> of > >>> memory. I am trying to figure > >>> out other than loading the database stuff > >>> if it the reading of SearchIO module could consume a lot of memory. So, > >>> when i am reading a blast file and getting the result object .... > >>> > >>> while (my $result = $searchio->next_result) > >>> > >>> * Does the searchio object loads a huge chunk of file in the memory or > >>> for each iteration it only reads a part of the result. 
> >>> > >>> * Does doing an index on blast report and then reading from it be much > >>> faster and why. And is there any way i could iterate through each > >>> record in the index, will that be helpful. > >>> > >>> -siddhartha > >>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> -- > >> Jason Stajich > >> jason.stajich at gmail.com > >> jason at bioperl.org > >> http://fungalgenomes.org/ > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > From SMarkel at accelrys.com Thu Jan 14 17:58:06 2010 From: SMarkel at accelrys.com (Scott Markel) Date: Thu, 14 Jan 2010 14:58:06 -0800 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback from our customers. Due to network irregularities (not sure what else to call it) users see the getting of remote BLAST results as somewhat random. When results come back the hits are fine, but sometimes no information comes back at all. Retrying helps. In looking at RemoteBlast.pm there are four "return -1" cases. * $status eq 'ERROR' (return on line 614) * $line =~ /ERROR/I (return on line 628) * !$got_content (return on line 648) * !$response->is_success (return on line 655) In the case of no content we'd like to retry remote BLAST. We're happy to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl module, but we only want to retry in that case, not the other three. What would happen if that third "return -1" changed to a different return value? Scott Scott Markel, Ph.D. 
Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics From nickjd at gmail.com Wed Jan 13 08:18:12 2010 From: nickjd at gmail.com (NickJD) Date: Wed, 13 Jan 2010 05:18:12 -0800 (PST) Subject: [Bioperl-l] Parsing PSI-BLAST results with SearchIO Message-ID: <65554589-081b-4297-ab68-9ddfbd3d9944@c34g2000yqn.googlegroups.com> I am trying to parse PSI-BLAST results using SearchIO and some very basic code just to read the number of hits, number of hsps, etc. I have done 10 rounds on 1 input sequence and parsed it but it seems to treat each round as a separate result, so round/iteration is always 1 and new_hits its always the total list not the ones that are new to that round. Does anyone have any experience of this? Thanks, Nick From dsidote at waksman.rutgers.edu Wed Jan 13 10:08:48 2010 From: dsidote at waksman.rutgers.edu (David J Sidote) Date: Wed, 13 Jan 2010 10:08:48 -0500 Subject: [Bioperl-l] Bioinformatician position - Waksman Institute Message-ID: <4b42af671001130708i703ecce0u47348484321714f@mail.gmail.com> Bioinformatician ? Research Assistant Professor The Waksman Institute of Microbiology located on the New Brunswick campus of Rutgers University is seeking a highly motivated and talented bioinformatics scientist for an Research Assistant Professor appointment. The successful candidate will analyze genome, transcriptome, and epigenome data generated on the Life Sciences 454, Illumina, and AB SOLiD high-throughput sequencing platforms. 
Excellent communication and teamwork skills are essential as the successful candidate will work closely with individual research groups to develop software to facilitate the visualization, quantification, and interpretation of the data. The successful candidate will be expected to contribute to the publication of scientific literature and to present at seminars and conferences. Qualifications: - PhD in molecular biology, genetics, bioinformatics, systems biology or other related fields; candidates with a PhD in physics, mathematics, or computer science with some working knowledge of biology and experience are encouraged to apply. - Demonstrated scientific track record - Highly proficient in perl, python, or ruby programming, linux/unix scripting, and SQL. - Experience with R is desirable but not required - Experience with high-throughput sequencing, microarrays, or other high-throughput biological platforms - Excellent communication and organizational skills How to Apply: Please send a cover letter stating your current research interests, why you are interested in this position, and how your skill set complements this position along with a curriculum vitae, and the names and contact information of three references to hr at waksman.rutgers.edu. Please include "Bioinformatics Assistant Research Professor" in the subject line. Rutgers is an equal opportunity employer. For more information about this position please contact: Dr. 
David Sidote (dsidote at waksman.rutgers.edu) From albezg at gmail.com Wed Jan 13 20:57:27 2010 From: albezg at gmail.com (albezg) Date: Wed, 13 Jan 2010 20:57:27 -0500 Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with negative PDB ranges In-Reply-To: <49C405F0.5050100@gmail.com> References: <49C2B97B.7070304@gmail.com> <49C405F0.5050100@gmail.com> Message-ID: <4B4E7A07.7070805@gmail.com> Hi all, I have a problem using AlignIO to read Pfam database: ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz The database is in STOCKHOLM 1.0 format. AlignIO can read the alignment OK until the alignment PF00331.13. There it crashes with the following message: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: '1-344' is not an integer. STACK: Error::throw STACK: Bio::Root::Root::throw /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368 STACK: Bio::Range::end /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228 STACK: Bio::Annotation::Target::new /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82 STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293 STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73 STACK: Bio::AlignIO::stockholm::next_aln /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471 STACK: /home/albezg/scripts/pfam2fasta.pl:22 ----------------------------------------------------------- It appears this is caused by this entry: #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344; I don't care about residues in PDB, so I have just removed minus signs from the ranges. This seems to have fixed the crashing. Is it a known problem? Is there a solution for it? 
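The workaround described above (removing the minus signs from the PDB ranges before the file reaches AlignIO) could be scripted roughly as follows. This is only a sketch under assumptions: the helper name strip_negative_pdb_range is hypothetical, it touches only "#=GS ... DR PDB;" lines, and it deliberately throws away the sign, so it is only suitable when the PDB residue numbers are not needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): drop minus signs from the
# residue range of a Stockholm "#=GS <name> DR PDB; <id> <chain>; <range>;"
# line. All other lines are returned unchanged.
sub strip_negative_pdb_range {
    my ($line) = @_;
    return $line unless $line =~ /^#=GS\s+\S+\s+DR\s+PDB;/;
    # The range is the last ';'-terminated field, e.g. " -1-344;"
    $line =~ s/-?(\d+)--?(\d+);(\s*)$/$1-$2;$3/;
    return $line;
}

# The entry quoted in the message above.
my $entry = "#=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344;\n";
print strip_negative_pdb_range($entry);
# prints: #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; 1-344;
```

Applied line by line to the decompressed Pfam-A.seed file, this would write a cleaned copy that can then be opened with -format => 'stockholm'.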
Thanks, Alexandr On 03/20/2009 05:09 PM, albezg wrote: > > I'm trying to change FASTA header(display_id) for a sequence in an > alignment(SimpleAlign). > > There are no issues when I print it, however when I use AlignIO to write > the alignment to a FASTA file, it does not work. Is this behavior intended? > > Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug > > The error: > ------------- EXCEPTION ------------- > MSG: No sequence with name [1/1-11] > STACK Bio::SimpleAlign::displayname > /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659 > STACK Bio::AlignIO::fasta::write_aln > /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200 > STACK toplevel ./demo.pl:14 > ------------------------------------- > > Alexandr From mitch_skinner at berkeley.edu Thu Jan 14 17:10:53 2010 From: mitch_skinner at berkeley.edu (Mitch Skinner) Date: Thu, 14 Jan 2010 14:10:53 -0800 Subject: [Bioperl-l] filter_by_location in Bio::DB::SeqFeature::Store::memory Message-ID: <4B4F966D.3030300@berkeley.edu> Hi, Some people haven't been getting all of the features in their GFF3 into JBrowse, and a nice test case that James Casbon posted to the list helped me track it down. Here's an example of the behavior I was seeing with BioPerl 1.6.1 (using Devel::REPL): ============== $ use Bio::DB::SeqFeature::Store $ my $db = Bio::DB::SeqFeature::Store->new(-adaptor=>"memory", -dsn=>"casbon.gff3") $Bio_DB_SeqFeature_Store_memory1 = Bio::DB::SeqFeature::Store::memory=HASH(0xa27ceec); $ $db->features(-seq_id=>"CYP2C8") $ARRAY1 = [ Feature:src(41), region(CYP2C8), Feature:src(37), Feature:src(39), Feature:src(42), Feature:src(40), Feature:src(38) ]; ============== I expected to also see the features with IDs 43 and 44 (the gff3 file is attached). I think there's a problem in the filter_by_location method. If start and end parameters aren't passed to the method, it sets default start and end values that lead it to examine all of the bins in its index. 
But the end value that it creates is at the beginning of the last bin, and I think it should be at the end of the last bin instead. The attached patch changes it to be at the end of the last bin. Regards, Mitch -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: casbon.gff3 URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: bdsfsm-filter_by_location.patch URL: From jason at bioperl.org Thu Jan 14 19:20:43 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 14 Jan 2010 16:20:43 -0800 Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with negative PDB ranges In-Reply-To: <4B4E7A07.7070805@gmail.com> References: <49C2B97B.7070304@gmail.com> <49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com> Message-ID: <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org> Seems like improper data really -- "-1" is an improper coordinate as far as the parser is concerned. You may want to tell Pfam that there is possible error in the dumper since that was the only record that had this problem? -jason On Jan 13, 2010, at 5:57 PM, albezg wrote: > Hi all, > > I have a problem using AlignIO to read Pfam database: > ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz > The database is in STOCKHOLM 1.0 format. AlignIO can read the > alignment OK until the alignment PF00331.13. There it crashes with > the following message: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: '1-344' is not an integer. 
> > STACK: Error::throw > STACK: Bio::Root::Root::throw /home/albezg/lib/perl5/site_perl/ > 5.10.0/Bio/Root/Root.pm:368 > STACK: Bio::Range::end /home/albezg/lib/perl5/site_perl/5.10.0/Bio/ > Range.pm:228 > STACK: Bio::Annotation::Target::new /home/albezg/lib/perl5/site_perl/ > 5.10.0/Bio/Annotation/Target.pm:82 > STACK: > Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target /home/ > albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ > GenericAlignHandler.pm:293 > STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler / > home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ > GenericAlignHandler.pm:73 > STACK: Bio::AlignIO::stockholm::next_aln /home/albezg/lib/perl5/ > site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471 > STACK: /home/albezg/scripts/pfam2fasta.pl:22 > ----------------------------------------------------------- > > It appears this is caused by this entry: > #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344; > > I don't care about residues in PDB, so I have just removed minus > signs from the ranges. This seems to have fixed the crashing. > > Is it a known problem? Is there a solution for it? > > Thanks, > Alexandr > > > On 03/20/2009 05:09 PM, albezg wrote: >> >> I'm trying to change FASTA header(display_id) for a sequence in an >> alignment(SimpleAlign). >> >> There are no issues when I print it, however when I use AlignIO to >> write >> the alignment to a FASTA file, it does not work. Is this behavior >> intended? 
>> >> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug >> >> The error: >> ------------- EXCEPTION ------------- >> MSG: No sequence with name [1/1-11] >> STACK Bio::SimpleAlign::displayname >> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659 >> STACK Bio::AlignIO::fasta::write_aln >> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200 >> STACK toplevel ./demo.pl:14 >> ------------------------------------- >> >> Alexandr > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From maj at fortinbras.us Thu Jan 14 21:00:31 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 14 Jan 2010 21:00:31 -0500 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> Message-ID: How about returning 1, 2, 4 for the non-zero cases, with some error constants set for convenience? MAJ ----- Original Message ----- From: "Scott Markel" To: Sent: Thursday, January 14, 2010 5:58 PM Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes > We've been looking at Bio::Tools::Run::RemoteBlast after some feedback > from our customers. Due to network irregularities (not sure what else > to call it) users see the getting of remote BLAST results as somewhat > random. When results come back the hits are fine, but sometimes no > information comes back at all. Retrying helps. > > In looking at RemoteBlast.pm there are four "return -1" cases. > > * $status eq 'ERROR' (return on line 614) > * $line =~ /ERROR/I (return on line 628) > * !$got_content (return on line 648) > * !$response->is_success (return on line 655) > > In the case of no content we'd like to retry remote BLAST. 
We're happy > to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl > module, but we only want to retry in that case, not the other three. > > What would happen if that third "return -1" changed to a different > return value? > > Scott > > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel at accelrys.com > Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 799 5222 > USA web: http://www.accelrys.com > > http://www.linkedin.com/in/smarkel > Vice President, Board of Directors: > International Society for Computational Biology > Chair: ISCB Publications Committee > Associate Editor: PLoS Computational Biology > Editorial Board: Briefings in Bioinformatics > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu Jan 14 19:42:31 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 14 Jan 2010 18:42:31 -0600 Subject: [Bioperl-l] reading blast report In-Reply-To: <4b4f7b74.5744f10a.7087.4813@mx.google.com> References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com> <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org> <4b4f7b74.5744f10a.7087.4813@mx.google.com> Message-ID: <0B76CCA7-C37C-4E24-BBDF-C8FD805DBBF2@illinois.edu> On Jan 14, 2010, at 2:15 PM, Siddhartha Basu wrote: > On Thu, 14 Jan 2010, Jason Stajich wrote: > >> What aspects of the report are you loading? You might consider the blast >> report as tab-delimited (-m 8 format) if you only are interested in >> start/end positions and scores of ailgnments which is a simpler and reduced >> dataset that has lower memory footprint by the parser. > > I think this would be a better approach i am mostly interested in > start/end/score data only. 
> >> Searchio (default) -format => blast - you can try the BLAST -format => >> blast_pull instead which lazy parses to create objects and will reduce >> memory consumption. > > It's another good option though. But just out of curosity, so the > regular blast parser do load the entire file in the memory consider the > output consist of multiple Results concatenated together into a > single file. Could anybody clarify. Yes, the original SearchIO parsers all load the data into objects. This was based on the presumption that one wouldn't want very large BLAST reports, but this assumption probably isn't amenable today. The pull parser is one aswer to that, in it pulls the data only upon request (creates them on the fly), so it should be more amenable to parsing very large BLAST reports. > thanks, > -siddhartha > >> -jason chris From cjfields at illinois.edu Fri Jan 15 01:33:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 15 Jan 2010 00:33:50 -0600 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> Message-ID: Scott, I think this is fine (to change the third condition and retry with a specific code). The other possibility is to simply throw different exceptions under each of these circumstances, which can be caught via eval to allow a retry under only certain conditions (no content, for instance). One interesting bit: I think (though I'm not sure) the new BLAST+ allows remote BLAST queries from command line, similar to the legacy blastcl3. Mark just wrote up a BLAST+ wrapper, so it might be worth testing that theory out. chris PS - BTW, nice to finally meet you at GMOD! On Jan 14, 2010, at 4:58 PM, Scott Markel wrote: > We've been looking at Bio::Tools::Run::RemoteBlast after some feedback > from our customers. 
Due to network irregularities (not sure what else > to call it) users see the getting of remote BLAST results as somewhat > random. When results come back the hits are fine, but sometimes no > information comes back at all. Retrying helps. > > In looking at RemoteBlast.pm there are four "return -1" cases. > > * $status eq 'ERROR' (return on line 614) > * $line =~ /ERROR/I (return on line 628) > * !$got_content (return on line 648) > * !$response->is_success (return on line 655) > > In the case of no content we'd like to retry remote BLAST. We're happy > to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl > module, but we only want to retry in that case, not the other three. > > What would happen if that third "return -1" changed to a different > return value? > > Scott > > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel at accelrys.com > Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 799 5222 > USA web: http://www.accelrys.com > > http://www.linkedin.com/in/smarkel > Vice President, Board of Directors: > International Society for Computational Biology > Chair: ISCB Publications Committee > Associate Editor: PLoS Computational Biology > Editorial Board: Briefings in Bioinformatics > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields1 at gmail.com Fri Jan 15 01:35:35 2010 From: cjfields1 at gmail.com (Christopher Fields) Date: Fri, 15 Jan 2010 00:35:35 -0600 Subject: [Bioperl-l] filter_by_location in Bio::DB::SeqFeature::Store::memory In-Reply-To: <4B4F966D.3030300@berkeley.edu> References: <4B4F966D.3030300@berkeley.edu> Message-ID: <992796AC-B85B-4555-88A1-36000C0A2002@gmail.com> An HTML attachment was scrubbed... 
URL: From David.Messina at sbc.su.se Fri Jan 15 10:17:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 15 Jan 2010 16:17:14 +0100 Subject: [Bioperl-l] getting/setting species names with Bio::Species Message-ID: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>

Hi everybody,

I'm having a little trouble with names in Bio::Species objects.

According to the Bio::Species documentation, if I have a species name as a string, like "Homo sapiens", I can get and set that using the species method:

    my $my_species_obj = Bio::Species->new();
    $my_species_obj->species('Homo sapiens');

    print $my_species_obj->species; # 'Homo sapiens'

That works fine if I create the Bio::Species object myself.

But if I try to get that string back out from a Bio::Species object created by SeqIO from a GenBank file, I get just 'sapiens' back:

    my $io = Bio::SeqIO->new('-format' => 'genbank',
                             '-file'   => 'hoxa2.gb');
    my $seq_obj = $io->next_seq;
    my $io_species_obj = $seq_obj->species;

    print $io_species_obj->species; # 'sapiens'

I think that happens because GenBank records have more taxonomic info about the species name, like the genus (and in fact the whole taxonomic categorization: kingdom, phylum, order, etc). So the genus is stored separately.

Poking around a bit more in Bio::Species, I turned up the method 'binomial', which appears to do the right thing, returning genus and species in both cases. Except, as you can see, the space is stripped out for my species-name-is-just-a-string object:

    print $my_species_obj->binomial; # 'Homosapiens'
    print $io_species_obj->binomial; # 'Homo sapiens'

I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I using it correctly above, or is there a better way?

If not, this kinda looks like a bug to me. I've got a patch which works and passes the BioPerl test suite.

Thanks, Dave From maj at fortinbras.us Fri Jan 15 10:31:16 2010 From: maj at fortinbras.us (Mark A.
Jensen) Date: Fri, 15 Jan 2010 10:31:16 -0500 Subject: [Bioperl-l] getting/setting species names with Bio::Species In-Reply-To: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> Message-ID: I'm not that familiar with Bio::Species either, but this looks like conflicting semantics between Bio::Species and Bio::SeqIO. Bio::SeqIO sets the species accessor to the 'species' element of the lineage array, I believe. FWIW, I'd prefer "binomial" = "genus" . "species" MAJ ----- Original Message ----- From: "Dave Messina" To: "BioPerl List" Sent: Friday, January 15, 2010 10:17 AM Subject: [Bioperl-l] getting/setting species names with Bio::Species > Hi everybody, > > I'm having a little trouble with names in Bio::Species objects. > > According to the Bio::Species documentation, if I have a species name as a > string, like "Homo sapiens", I can get and set that using the species method: > > my $my_species_obj = Bio::Species->new(); > $my_species_obj->species('Homo sapiens'); > > print $my_species_obj->species; # 'Homo sapiens' > > > That works fine if I create the Bio::Species object myself. > > But if I try to get that string back out from a Bio::Species object created by > SeqIO from a genbank file, I get just 'sapiens' back: > > my $io = Bio::SeqIO->new('-format' => 'genbank', > '-file' => 'hoxa2.gb'); > my $seq_obj = $io->next_seq; > my $io_species_obj = $seq_obj->species; > > print $io_species_obj->species; # 'sapiens' > > > I think that happens because genbank records have more taxonomic info about > the species name, like the genus (and in fact the whole taxonomic > categorization: kingdom phylum order, etc). So the genus is stored separately. > > Poking around a bit more in Bio::Species, I turned up the method 'binomial', > which appears to do the right thing, returning genus and species in both > cases.
Except, as you can see, the space is stripped out for my > species-name-is-just-a-string object: > > print $my_species_obj->binomial; # 'Homosapiens' > print $io_species_obj->binomial; # 'Homo sapiens' > > > I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I > using it correctly above, or is there a better way? > > If not, this kinda looks like a bug to me. I've got a patch which works and > passes the BioPerl test suite. > > > Thanks, > Dave From maj at fortinbras.us Fri Jan 15 10:24:06 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 15 Jan 2010 10:24:06 -0500 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes In-Reply-To: References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> Message-ID: True-- blast+ allows remote dbs. I just committed a patch that makes this easy in StandAloneBlastPlus: specify '-remote => 1' in the factory, and downstream command calls will take care of it- MAJ

    # ex...
    use Bio::Tools::Run::StandAloneBlastPlus;
    use Bio::Seq;
    $ENV{BLASTPLUSDIR} = $where_it_is;
    my $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -db_name => 'wgs',
        -remote  => 1
    );
    my $result = $fac->blastn(
        -query => Bio::Seq->new(
            -seq => 'ggcaacaaacctggtaaagaagacggcaacaagcctggtaaagaagatggcaacaagcct',
            -id  => "proteinA"
        )
    );
    1;

----- Original Message ----- From: "Chris Fields" To: "Scott Markel" Cc: Sent: Friday, January 15, 2010 1:33 AM Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes > Scott, > > I think this is fine (to change the third condition and retry with a specific > code). The other possibility is to simply throw different exceptions under > each of these circumstances, which can be caught via eval to allow a retry > under only certain conditions (no content, for instance).
> > One interesting bit: I think (though I'm not sure) the new BLAST+ allows > remote BLAST queries from command line, similar to the legacy blastcl3. Mark > just wrote up a BLAST+ wrapper, so it might be worth testing that theory out. > > chris > > PS - BTW, nice to finally meet you at GMOD! > > On Jan 14, 2010, at 4:58 PM, Scott Markel wrote: > >> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback >> from our customers. Due to network irregularities (not sure what else >> to call it) users see the getting of remote BLAST results as somewhat >> random. When results come back the hits are fine, but sometimes no >> information comes back at all. Retrying helps. >> >> In looking at RemoteBlast.pm there are four "return -1" cases. >> >> * $status eq 'ERROR' (return on line 614) >> * $line =~ /ERROR/I (return on line 628) >> * !$got_content (return on line 648) >> * !$response->is_success (return on line 655) >> >> In the case of no content we'd like to retry remote BLAST. We're happy >> to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl >> module, but we only want to retry in that case, not the other three. >> >> What would happen if that third "return -1" changed to a different >> return value? >> >> Scott >> >> Scott Markel, Ph.D. 
>> Principal Bioinformatics Architect email: smarkel at accelrys.com >> Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 >> 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 >> San Diego, CA 92121 fax: +1 858 799 5222 >> USA web: http://www.accelrys.com >> >> http://www.linkedin.com/in/smarkel >> Vice President, Board of Directors: >> International Society for Computational Biology >> Chair: ISCB Publications Committee >> Associate Editor: PLoS Computational Biology >> Editorial Board: Briefings in Bioinformatics >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From SMarkel at accelrys.com Fri Jan 15 10:40:31 2010 From: SMarkel at accelrys.com (Scott Markel) Date: Fri, 15 Jan 2010 07:40:31 -0800 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes In-Reply-To: References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net> Chris, It was nice meeting you and Scott C., too. And seeing Jason again. If you and Mark > How about returning 1, 2, 4 for the non-zero cases, with some > error constants set for convenience? MAJ are okay with adding more return values, that works best for us in Pipeline Pilot. I'll add a Bugzilla entry. Scott -----Original Message----- From: Chris Fields [mailto:cjfields at illinois.edu] Sent: Thursday, 14 January 2010 10:34 PM To: Scott Markel Cc: Bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes Scott, I think this is fine (to change the third condition and retry with a specific code). 
The other possibility is to simply throw different exceptions under each of these circumstances, which can be caught via eval to allow a retry under only certain conditions (no content, for instance). One interesting bit: I think (though I'm not sure) the new BLAST+ allows remote BLAST queries from command line, similar to the legacy blastcl3. Mark just wrote up a BLAST+ wrapper, so it might be worth testing that theory out. chris PS - BTW, nice to finally meet you at GMOD! On Jan 14, 2010, at 4:58 PM, Scott Markel wrote: > We've been looking at Bio::Tools::Run::RemoteBlast after some feedback > from our customers. Due to network irregularities (not sure what else > to call it) users see the getting of remote BLAST results as somewhat > random. When results come back the hits are fine, but sometimes no > information comes back at all. Retrying helps. > > In looking at RemoteBlast.pm there are four "return -1" cases. > > * $status eq 'ERROR' (return on line 614) > * $line =~ /ERROR/I (return on line 628) > * !$got_content (return on line 648) > * !$response->is_success (return on line 655) > > In the case of no content we'd like to retry remote BLAST. We're happy > to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl > module, but we only want to retry in that case, not the other three. > > What would happen if that third "return -1" changed to a different > return value? > > Scott > > Scott Markel, Ph.D. 
> Principal Bioinformatics Architect email: smarkel at accelrys.com > Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 799 5222 > USA web: http://www.accelrys.com > > http://www.linkedin.com/in/smarkel > Vice President, Board of Directors: > International Society for Computational Biology > Chair: ISCB Publications Committee > Associate Editor: PLoS Computational Biology > Editorial Board: Briefings in Bioinformatics From cjfields at illinois.edu Fri Jan 15 11:00:21 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 15 Jan 2010 10:00:21 -0600 Subject: [Bioperl-l] getting/setting species names with Bio::Species In-Reply-To: References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> Message-ID: <16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu> > FWIW, I'd prefer "binomial" = "genus" . "species" That's the way Bio::Species is supposed to work, at least when it was refactored by Sendu. But just a note: Bio::Species was considered deprecated (scheduled for the 1.7 release IIRC) for many very good reasons in favor of Bio::Taxon. First and foremost among these is the fact that we cannot consistently parse out the genus/species/strain/variant/etc for every organism in GenBank w/o knowing its full lineage, which means including some taxonomic information. And even then it's highly problematic. We've had several heated discussions on list about how to handle this in a somewhat backwards-compatible way, and the main solution was to forego compatibility issues altogether and eventually deprecate Bio::Species in favor of Bio::Taxon, a class that doesn't make the same assumptions. Bio::Species, in the interim, is-a Bio::Taxon.
You'll note that a minimal Bio::DB::Taxonomy instance is constructed from the classification scheme in some instances, but if one had a proper DB link one could link to Entrez Taxonomy or a locally indexed flat-file DB and grab the info. Bio::Taxon (correct me if I'm wrong on this Sendu, if you're out there) eschews various methods (species, etc) for simpler consistent ones based on Taxonomy, and doesn't force us to handle every exception to getting the genus/species out of a name. That is left up to the user, at their peril. For either one, if you are reproducing the fully qualified name, you probably should use something like node_name() for consistency. Bio::Species also has scientific_name(). With a true Bio::Taxon one would need to check that this is called on the species node. chris On Jan 15, 2010, at 9:31 AM, Mark A. Jensen wrote: > I'm not that familiar with Bio::Species either, but this looks > like conflicting semantics between Bio::Species and Bio::SeqIO. > Bio::SeqIO sets the species accessor to the 'species' element of > the lineage array, I believe. > FWIW, I'd prefer "binomial" = "genus" . "species" > MAJ > ----- Original Message ----- From: "Dave Messina" > To: "BioPerl List" > Sent: Friday, January 15, 2010 10:17 AM > Subject: [Bioperl-l] getting/setting species names with Bio::Species > > >> Hi everybody, >> >> I'm having a little trouble with names in Bio::Species objects. >> >> According to the Bio::Species documentation, if I have a species name as a string, like "Homo sapiens", I can get and set that using the species method: >> >> my $my_species_obj = Bio::Species->new(); >> $my_species_obj->species('Homo sapiens'); >> >> print $my_species_obj->species; # 'Homo sapiens' >> >> >> That works fine if I create the Bio::Species object myself.
>> >> But if I try to get that string back out from a BIo::Species object created by SeqIO from a genbank file, I get just 'sapiens' back: >> >> my $io = Bio::SeqIO->new('-format' => 'genbank', >> '-file' => 'hoxa2.gb'); >> my $seq_obj = $io->next_seq; >> my $io_species_obj = $seq_obj->species; >> >> print $io_species_obj->species; # 'sapiens' >> >> >> I think that happens because genbank records have more taxonomic info about the species name, like the genus (and in fact the whole taxonomic categorization: kingdom phylum order, etc). So the genus is stored separately. >> >> Poking around a bit more in Bio::Species, I turned up the method 'binomial', which appears to do the right thing, returning genus and species in both cases. Except, as you can see, the space is stripped out for my species-name-is-just-a-string object: >> >> print $my_species_obj->binomial; # 'Homosapiens' >> print $io_species_obj->binomial; # 'Homo sapiens' >> >> >> I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I using it correctly above, or is there a better way? >> >> If not, this kinda looks like a bug to me. I've got a patch which works and passes the BioPerl test suite. 
>> >> >> Thanks, >> Dave From SMarkel at accelrys.com Fri Jan 15 11:10:34 2010 From: SMarkel at accelrys.com (Scott Markel) Date: Fri, 15 Jan 2010 08:10:34 -0800 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes In-Reply-To: References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net> Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B30A7@EXCH1-COLO.accelrys.net> Mark, Thank you. Scott -----Original Message----- From: Mark A. Jensen [mailto:maj at fortinbras.us] Sent: Friday, 15 January 2010 8:10 AM To: Scott Markel; Chris Fields Cc: Bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes can do Scott-- cheers MAJ From maj at fortinbras.us Fri Jan 15 11:09:38 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 15 Jan 2010 11:09:38 -0500 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net> References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net> Message-ID: can do Scott-- cheers MAJ ----- Original Message ----- From: "Scott Markel" To: "Chris Fields" Cc: Sent: Friday, January 15, 2010 10:40 AM Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes > Chris, > > It was nice meeting you and Scott C., too. And seeing Jason again. > > If you and Mark > >> How about returning 1, 2, 4 for the non-zero cases, with some >> error constants set for convenience? MAJ > > are okay with adding more return values, that works best for us in > Pipeline Pilot. > > I'll add a Bugzilla entry.
> > Scott > > > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Thursday, 14 January 2010 10:34 PM > To: Scott Markel > Cc: Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes > > Scott, > > I think this is fine (to change the third condition and retry with a specific > code). The other possibility is to simply throw different exceptions under > each of these circumstances, which can be caught via eval to allow a retry > under only certain conditions (no content, for instance). > > One interesting bit: I think (though I'm not sure) the new BLAST+ allows > remote BLAST queries from command line, similar to the legacy blastcl3. Mark > just wrote up a BLAST+ wrapper, so it might be worth testing that theory out. > > chris > > PS - BTW, nice to finally meet you at GMOD! > > On Jan 14, 2010, at 4:58 PM, Scott Markel wrote: > >> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback >> from our customers. Due to network irregularities (not sure what else >> to call it) users see the getting of remote BLAST results as somewhat >> random. When results come back the hits are fine, but sometimes no >> information comes back at all. Retrying helps. >> >> In looking at RemoteBlast.pm there are four "return -1" cases. >> >> * $status eq 'ERROR' (return on line 614) >> * $line =~ /ERROR/I (return on line 628) >> * !$got_content (return on line 648) >> * !$response->is_success (return on line 655) >> >> In the case of no content we'd like to retry remote BLAST. We're happy >> to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl >> module, but we only want to retry in that case, not the other three. >> >> What would happen if that third "return -1" changed to a different >> return value? >> >> Scott >> >> Scott Markel, Ph.D. 
>> Principal Bioinformatics Architect email: smarkel at accelrys.com >> Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 >> 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 >> San Diego, CA 92121 fax: +1 858 799 5222 >> USA web: http://www.accelrys.com >> >> http://www.linkedin.com/in/smarkel >> Vice President, Board of Directors: >> International Society for Computational Biology >> Chair: ISCB Publications Committee >> Associate Editor: PLoS Computational Biology >> Editorial Board: Briefings in Bioinformatics >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Fri Jan 15 11:10:02 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 15 Jan 2010 11:10:02 -0500 Subject: [Bioperl-l] getting/setting species names with Bio::Species In-Reply-To: <16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu> References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> <16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu> Message-ID: excellent summary--thanks!! ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "BioPerl List" Sent: Friday, January 15, 2010 11:00 AM Subject: Re: [Bioperl-l] getting/setting species names with Bio::Species >> FWIW, I'd prefer "binomial" = "genus" . "species" > > > That's the way Bio::Species is supposed to work, at least when it was > refactored by Sendu. But just a note: Bio::Species was considered deprecated > (scheduled for the 1.7 release IIRC) for many very good reasons in favor of > Bio::Taxon. 
First and foremost among these is the fact we cannot consistently > parse out the genus/species/strain/variant/etc for every organism in GenBank > w/o knowing it's full lineage, which means including some taxonomic > information. And even then it's highly problematic. > > We've had several heated discussions on list about how to handle this in a > somewhat backwards-compatible way, and the main solution was to forego > compatibility issues altogether and eventually deprecate Bio::Species > altogether in favor of Bio::Taxon, a class that doesn't make the same > assumptions. Bio::Species, in the interim, is-a Bio::Taxon. You'll note that > a minimal Bio::DB::Taxonomy instance is constructed from the classification > scheme in some instances, but if one had a proper DB link one could link to > Entrez Taxonomy or a local flat file indexes DB and grab the info. Bio::Taxon > (correct me if I'm wrong on this Sendu, if you're out there) eschews various > methods (species, etc) for simpler consistent ones based on Taxonomy, and > doesn't force us to handle every exception to getting the genus/species out of > a name. That is left up to the user, at their peril. > > For either one, if you are reproducing the fully qualified name, you probably > should use something like node_name() for consistency. Bio::Species also has > scientific_name(). With a true Bio::Taxon one would need to be check this is > performed on the species node. > > chris > > On Jan 15, 2010, at 9:31 AM, Mark A. Jensen wrote: > >> I'm not that familiar with Bio::Species either, but this looks >> like conflicting semantics betwen Bio::Species and Bio::SeqIO. >> Bio::SeqIO sets the species accessor to the 'species' element of >> the lineage array, I believe. >> FWIW, I'd prefer "binomial" = "genus" . 
"species" >> MAJ >> ----- Original Message ----- From: "Dave Messina" >> To: "BioPerl List" >> Sent: Friday, January 15, 2010 10:17 AM >> Subject: [Bioperl-l] getting/setting species names with Bio::Species >> >> >>> Hi everybody, >>> >>> I'm having a little trouble with names in Bio::Species objects. >>> >>> According to the Bio::Species documentation, if I have a species name as a >>> string, like "Homo sapiens", I can get and set that using the species >>> method: >>> >>> my $my_species_obj = Bio::Species->new(); >>> $my_species_obj->species('Homo sapiens'); >>> >>> print $my_species_obj->species; # 'Homo sapiens' >>> >>> >>> That works fine if I create the Bio::Species object myself. >>> >>> But if I try to get that string back out from a BIo::Species object created >>> by SeqIO from a genbank file, I get just 'sapiens' back: >>> >>> my $io = Bio::SeqIO->new('-format' => 'genbank', >>> '-file' => 'hoxa2.gb'); >>> my $seq_obj = $io->next_seq; >>> my $io_species_obj = $seq_obj->species; >>> >>> print $io_species_obj->species; # 'sapiens' >>> >>> >>> I think that happens because genbank records have more taxonomic info about >>> the species name, like the genus (and in fact the whole taxonomic >>> categorization: kingdom phylum order, etc). So the genus is stored >>> separately. >>> >>> Poking around a bit more in Bio::Species, I turned up the method 'binomial', >>> which appears to do the right thing, returning genus and species in both >>> cases. Except, as you can see, the space is stripped out for my >>> species-name-is-just-a-string object: >>> >>> print $my_species_obj->binomial; # 'Homosapiens' >>> print $io_species_obj->binomial; # 'Homo sapiens' >>> >>> >>> I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I >>> using it correctly above, or is there a better way? >>> >>> If not, this kinda looks like a bug to me. I've got a patch which works and >>> passes the BioPerl test suite. 
>>> >>> >>> Thanks, >>> Dave >>> >>> >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at drycafe.net Fri Jan 15 12:04:43 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Fri, 15 Jan 2010 12:04:43 -0500 Subject: [Bioperl-l] getting/setting species names with Bio::Species In-Reply-To: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> Message-ID: <2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net> On Jan 15, 2010, at 10:17 AM, Dave Messina wrote: > According to the Bio::Species documentation, if I have a species > name as a string, like "Homo sapiens", I can get and set that using > the species method: > > my $my_species_obj = Bio::Species->new(); > $my_species_obj->species('Homo sapiens'); If that's really what the documentation says, it's wrong. It is the binomial() method that does this (as getter and setter). 
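To make the species-versus-binomial distinction concrete, here is a small plain-Perl sketch. It does not call BioPerl at all; split_binomial and join_binomial are made-up helper names, and the sketch only illustrates why a parsed record can report 'sapiens' for the species while the binomial is 'Homo sapiens'.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of BioPerl.

# Split a binomial into its genus and its species epithet.
sub split_binomial {
    my ($name) = @_;
    my ($genus, $epithet) = split ' ', $name, 2;
    return ($genus, $epithet);
}

# Join genus and epithet back together with a single space,
# skipping any part that is undefined or empty.
sub join_binomial {
    my ($genus, $epithet) = @_;
    return join ' ', grep { defined($_) && length($_) } ($genus, $epithet);
}

my ($genus, $epithet) = split_binomial('Homo sapiens');
print "genus:    $genus\n";
print "species:  $epithet\n";
print "binomial: ", join_binomial($genus, $epithet), "\n";
```

When the two parts are stored separately, as in a lineage parsed from a GenBank record, the full name has to be rebuilt with the space put back in.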
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From David.Messina at sbc.su.se Fri Jan 15 13:37:17 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 15 Jan 2010 19:37:17 +0100 Subject: [Bioperl-l] getting/setting species names with Bio::Species In-Reply-To: <2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net> References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> <2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net> Message-ID: <24798E45-CF24-47D9-AB39-E66C35A5FA8B@sbc.su.se> Thanks guys. Well, looks like I ignored the deprecation warnings at my own peril. :) I'll reimplement my code using Bio::Taxon directly instead. I made a little test using the node_name() method as Chris suggested, and it seems to do the trick nicely. > If that's really what the documentation says, it's wrong. I'm afraid so. In the POD > Title : species > Usage : $self->species( $species ); > $species = $self->species(); > Function: Get or set the scientific species name. > Example : $self->species('Homo sapiens'); > Returns : Scientific species name as string > Args : Scientific species name as string and the HOWTO http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#The_Species_Object > # legible and long > my $species_object = $seq_object->species; > my $species_string = $species_object->species; > > # Perlish > my $species_string = $seq_object->species->species; > # either way, $species_string is "Homo sapiens" Unless there's objection, I'll fix both of those. > It is the binomial() method that does this (as getter and setter). Great, thanks for the clarification, Hilmar. From bhakti.dwivedi at gmail.com Sun Jan 17 11:02:47 2010 From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi) Date: Sun, 17 Jan 2010 11:02:47 -0500 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? 
Message-ID: Hi Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1 && hit1 -> query1) from a blast table report? Thanks BD From cjfields at illinois.edu Sun Jan 17 12:45:08 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 17 Jan 2010 11:45:08 -0600 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: References: Message-ID: <4FC546A8-079F-4A17-AB96-D4A0060904D6@illinois.edu> It's probably not best to use BioPerl directly for this. Have you tried OrthoMCL, or InParanoid? chris On Jan 17, 2010, at 10:02 AM, Bhakti Dwivedi wrote: > Hi > > Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1 > && hit1 -> query1) from a blast table report? > > Thanks > > BD > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Sun Jan 17 16:03:24 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 17 Jan 2010 16:03:24 -0500 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: References: Message-ID: re Chris's answer, check out this archived post: http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html cheers MAJ ----- Original Message ----- From: "Bhakti Dwivedi" To: Sent: Sunday, January 17, 2010 11:02 AM Subject: [Bioperl-l] Reciprocal best hits using Bioperl? > Hi > > Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1 > && hit1 -> query1) from a blast table report? > > Thanks > > BD > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bhakti.dwivedi at gmail.com Sun Jan 17 16:10:03 2010 From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi) Date: Sun, 17 Jan 2010 16:10:03 -0500 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: References: Message-ID: Thank you! 
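For small data sets, reciprocal best hits can also be pulled out of two tabular BLAST reports with a few lines of plain Perl. The sketch below is only an illustration, not a BioPerl module: it assumes the standard 12-column tabular layout with the bit score in the last column, the sub names are made up, and the hit lines are invented example data.

```perl
use strict;
use warnings;

# Return a hash mapping each query ID to its best-scoring subject ID,
# given lines in 12-column tabular BLAST format (bit score in column 12).
sub best_hits {
    my (@lines) = @_;
    my (%best, %score);
    for my $line (@lines) {
        next if $line =~ /^\s*(#|$)/;    # skip comments and blank lines
        my @f = split /\t/, $line;
        my ($query, $subject, $bits) = @f[0, 1, 11];
        if (!exists $score{$query} || $bits > $score{$query}) {
            $score{$query} = $bits;
            $best{$query}  = $subject;
        }
    }
    return %best;
}

# Pairs [a, b] where b is the best hit of a and a is the best hit of b.
sub reciprocal_best_hits {
    my ($a_vs_b, $b_vs_a) = @_;
    my %fwd = best_hits(@$a_vs_b);
    my %rev = best_hits(@$b_vs_a);
    return map  { [ $_, $fwd{$_} ] }
           grep { defined $rev{ $fwd{$_} } && $rev{ $fwd{$_} } eq $_ }
           sort keys %fwd;
}

# Invented example data, tab-separated.
my @a_vs_b = (
    join("\t", qw(a1 b1 98.0 100 2 0 1 100 1 100 1e-50 200)),
    join("\t", qw(a1 b2 80.0 100 20 0 1 100 1 100 1e-20 90)),
    join("\t", qw(a2 b2 95.0 100 5 0 1 100 1 100 1e-45 180)),
);
my @b_vs_a = (
    join("\t", qw(b1 a1 98.0 100 2 0 1 100 1 100 1e-50 200)),
    join("\t", qw(b2 a1 80.0 100 20 0 1 100 1 100 1e-20 90)),
);

print join("\t", @$_), "\n" for reciprocal_best_hits(\@a_vs_b, \@b_vs_a);
```

Ties in bit score are resolved in favor of the first hit seen, which is a simplification; the dedicated orthology tools mentioned in this thread handle such cases more carefully.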
On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen wrote: > re Chris's answer, check out this archived post: > http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html > cheers MAJ > ----- Original Message ----- From: "Bhakti Dwivedi" < > bhakti.dwivedi at gmail.com> > To: > Sent: Sunday, January 17, 2010 11:02 AM > Subject: [Bioperl-l] Reciprocal best hits using Bioperl? > > > Hi >> >> Is there a Bio-perl module to parse the reciprocal best hits (query1-> >> hit1 >> && hit1 -> query1) from a blast table report? >> >> Thanks >> >> BD >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> From cjfields at illinois.edu Sun Jan 17 17:00:02 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 17 Jan 2010 16:00:02 -0600 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: References: Message-ID: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu> OrthoMCL has updated to v2 and no longer uses BioPerl, just plain perl. Database is available here: http://orthomcl.org/cgi-bin/OrthoMclWeb.cgi Package (you'll need a few other things to get it working): http://orthomcl.org/common/downloads/software/ chris On Jan 17, 2010, at 3:10 PM, Bhakti Dwivedi wrote: > Thank you! > > > On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen wrote: > >> re Chris's answer, check out this archived post: >> http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html >> cheers MAJ >> ----- Original Message ----- From: "Bhakti Dwivedi" < >> bhakti.dwivedi at gmail.com> >> To: >> Sent: Sunday, January 17, 2010 11:02 AM >> Subject: [Bioperl-l] Reciprocal best hits using Bioperl? >> >> >> Hi >>> >>> Is there a Bio-perl module to parse the reciprocal best hits (query1-> >>> hit1 >>> && hit1 -> query1) from a blast table report? 
>>> >>> Thanks >>> >>> BD >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From tristan.lefebure at gmail.com Sun Jan 17 18:12:56 2010 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Sun, 17 Jan 2010 18:12:56 -0500 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu> References: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu> Message-ID: <201001171812.56238.tristan.lefebure@gmail.com> The transition to orthoMCL v2 being a bit painful (you need a MySQL database), I recently switched directly to MCL and the accompanying mclblastline and co programs. Modular, simple and very fast. Following some simulations, It gives better results with incomplete genomes than orthoMCL v1.x ... http://micans.org/mcl/ --Tristan On Sunday 17 January 2010 17:00:02 Chris Fields wrote: > OrthoMCL has updated to v2 and no longer uses BioPerl, > just plain perl. Database is available here: > > http://orthomcl.org/cgi-bin/OrthoMclWeb.cgi > > Package (you'll need a few other things to get it > working): > > http://orthomcl.org/common/downloads/software/ > > chris > > On Jan 17, 2010, at 3:10 PM, Bhakti Dwivedi wrote: > > Thank you! > > > > On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen wrote: > >> re Chris's answer, check out this archived post: > >> http://bioperl.org/pipermail/bioperl-l/2008-March/0273 > >>57.html cheers MAJ > >> ----- Original Message ----- From: "Bhakti Dwivedi" < > >> bhakti.dwivedi at gmail.com> > >> To: > >> Sent: Sunday, January 17, 2010 11:02 AM > >> Subject: [Bioperl-l] Reciprocal best hits using > >> Bioperl? 
> >> > >> > >> Hi > >> > >>> Is there a Bio-perl module to parse the reciprocal > >>> best hits (query1-> hit1 > >>> && hit1 -> query1) from a blast table report? > >>> > >>> Thanks > >>> > >>> BD > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Sun Jan 17 18:59:05 2010 From: jason at bioperl.org (Jason Stajich) Date: Sun, 17 Jan 2010 15:59:05 -0800 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: <201001171812.56238.tristan.lefebure@gmail.com> References: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu> <201001171812.56238.tristan.lefebure@gmail.com> Message-ID: <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org> yes - but mcl alone is something slightly different in that it doesn't correct for inparalogs, but for incomplete genomes this is probably okay. orthomcl2 does correct the major memory hog problem and efficiencies in the parsing in the previous version by relying on the db for the indexing and looking of the reciprocal hits. -jason On Jan 17, 2010, at 3:12 PM, Tristan Lefebure wrote: > The transition to orthoMCL v2 being a bit painful (you need > a MySQL database), I recently switched directly to MCL and > the accompanying mclblastline and co programs. Modular, > simple and very fast. Following some simulations, It gives > better results with incomplete genomes than orthoMCL v1.x > ... 
> > http://micans.org/mcl/ > > --Tristan > > On Sunday 17 January 2010 17:00:02 Chris Fields wrote: >> OrthoMCL has updated to v2 and no longer uses BioPerl, >> just plain perl. Database is available here: >> >> http://orthomcl.org/cgi-bin/OrthoMclWeb.cgi >> >> Package (you'll need a few other things to get it >> working): >> >> http://orthomcl.org/common/downloads/software/ >> >> chris >> >> On Jan 17, 2010, at 3:10 PM, Bhakti Dwivedi wrote: >>> Thank you! >>> >>> On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen > wrote: >>>> re Chris's answer, check out this archived post: >>>> http://bioperl.org/pipermail/bioperl-l/2008-March/0273 >>>> 57.html cheers MAJ >>>> ----- Original Message ----- From: "Bhakti Dwivedi" < >>>> bhakti.dwivedi at gmail.com> >>>> To: >>>> Sent: Sunday, January 17, 2010 11:02 AM >>>> Subject: [Bioperl-l] Reciprocal best hits using >>>> Bioperl? >>>> >>>> >>>> Hi >>>> >>>>> Is there a Bio-perl module to parse the reciprocal >>>>> best hits (query1-> hit1 >>>>> && hit1 -> query1) from a blast table report? 
>>>>> >>>>> Thanks >>>>> >>>>> BD >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From tristan.lefebure at gmail.com Sun Jan 17 20:36:38 2010 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Sun, 17 Jan 2010 20:36:38 -0500 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org> References: <201001171812.56238.tristan.lefebure@gmail.com> <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org> Message-ID: <201001172036.39032.tristan.lefebure@gmail.com> On Sunday 17 January 2010 18:59:05 Jason Stajich wrote: > yes - but mcl alone is something slightly different in > that it doesn't correct for inparalogs, but for > incomplete genomes this is probably okay. interestingly, my experience with not too divergent bacterial genomes (same genera) does not support the normalization used in the orthoMCL (which, as far as I understand, is a standardization of the -Log10(evalue) per taxa combination, including a taxa with itself). MCL, which does not do any normalization (just -Log10(evalue)) gives about the same number of false negative (i.e. missed orthologs), but a lot less false positive (false orthologs). In other words, you get many fake singletons. 
I don't know exactly whether the problem lies in the normalization process or in the fact that orthoMCLv1.x is using a very old version of MCL. What I do know is that many false positives are made of short or incomplete proteins that are very common in draft genomes and automatic annotations... Things might be completely different with more divergent and globally longer proteins. Testing orthoMCLv2 on the same data set would probably give the answer. --Tristan From robert.bradbury at gmail.com Mon Jan 18 05:20:33 2010 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Mon, 18 Jan 2010 05:20:33 -0500 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: <201001172036.39032.tristan.lefebure@gmail.com> References: <201001171812.56238.tristan.lefebure@gmail.com> <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org> <201001172036.39032.tristan.lefebure@gmail.com> Message-ID: My comment might be that the problem with OrthoMCL is that it is primarily lower organisms. The problem with Ensembl (and some other databases) is that it is primarily higher organisms (though they do include Drosophila, C. elegans and Yeast). The problem arises when one wants to cross those boundaries. For example, the 5-10 antioxidant proteins, the ~150 DNA repair proteins, many of the mitochondrial (ETC) proteins, the ribosomal rRNAs & tRNAs, and the fundamental biochemistry (EC) proteins are homologous all the way from the most ancient bacteria through H. sapiens. The only way to play in the mixed arena of prokaryotes and eukaryotes involving fundamental vectors in evolution is to either construct one's own databases (which presumably means getting involved with MySQL, and probably spending some $$$ on hardware) or to develop some BioPerl modules that can do the SpeciesX vs. SpeciesY comparisons on demand using some part of the cloud.
This problem isn't going to get smaller; it's only going to get larger, now that the cost of sequencing (pseudo-resequencing) a vertebrate genome is starting to come in under $10,000 and people are starting to seriously talk about 10,000 vertebrate genomes. 10,000 x 10,000 x 20,000 (genes) isn't something people are going to undertake very soon. Robert On 1/17/10, Tristan Lefebure wrote: > On Sunday 17 January 2010 18:59:05 Jason Stajich wrote: >> yes - but mcl alone is something slightly different in >> that it doesn't correct for inparalogs, but for >> incomplete genomes this is probably okay. > > interestingly, my experience with not too divergent > bacterial genomes (same genera) does not support the > normalization used in the orthoMCL (which, as far as I > understand, is a standardization of the -Log10(evalue) per > taxa combination, including a taxa with itself). MCL, which > does not do any normalization (just -Log10(evalue)) gives > about the same number of false negative (i.e. missed > orthologs), but a lot less false positive (false orthologs). > In other words, you get many fake singletons. I don't know > exactly whether the problem lies in the normalization process or > in the fact that orthoMCLv1.x is using a very old version of > MCL. What I do know is that many false positives are made of > short or incomplete proteins that are very common in draft > genomes and automatic annotations... Things might be > completely different with more divergent and globally longer > proteins. Testing orthoMCLv2 on the same data set would > probably give the answer.
> > --Tristan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ghhu at sibs.ac.cn Sun Jan 17 21:34:23 2010 From: ghhu at sibs.ac.cn (Guohong Hu) Date: Mon, 18 Jan 2010 10:34:23 +0800 Subject: [Bioperl-l] Bioperl 1.6 Message-ID: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> Hi there, I was trying to install BioPerl on Windows using ppm, following the instructions at "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up the repositories and searched for BioPerl packages. The latest version available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to install it, a number of prerequisite modules were installed too, including Bioperl 1.4. Then an error message showed up during installation: "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'. The package BioPerl has already installed a file that package bioperl wants to install." It looks to me as if BioPerl 1.6.1 had installed a file that bioperl 1.4 wanted to install again. I don't know why bioperl 1.4 was one of the prerequisites for 1.6.1. If I install just 1.4, it installs without errors. But I need a newer version, because some modules (like Bio::Tools::HMM) are not included in 1.4. I saw on the internet that somebody had the same problem when trying to install BioPerl 1.5, but I didn't find a solution. Does anybody have a clue about this? Thank you for your time. GH From cjfields at illinois.edu Mon Jan 18 10:30:20 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 09:30:20 -0600 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> Message-ID: Guohong, 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed first.
Make sure the repos are set according to the Windows installation instructions on the BioPerl wiki: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows IIRC the actual order of the PPM repository can be critical (PPM pulls based on highest version, first repo, but sometimes it gets confused). Just curious but where is the v 1.4 PPM located? If it is local to our PPM repo I can physically remove it to prevent this from happening. chris On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: > Hi there, > > > > I was trying to install BioPerl in windows using ppm, by following the > instruction in > "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up > the repositories, and did the search of Bioperl packages. The latest version > available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to > install it, a number of prerequisite modules were being installed too, which > include Bioperl 1.4. Then an error message showed up during installation: > > > > "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package > BioPerl has already installed a file that package bioperl wants to install." > > > > It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 > wanted to install again. I don't know why bioperl 1.4 was one of the > prerequisites for 1.6.1. If I just install 1.4, it will be installed without > errors. But I need a newer version, because some modules (like > > Bio::Tools::HMM) is not included in 1.4. > > > > I saw on internet that somebody had the same problem when he was trying to > install BioPerl 1.5, but I didn't find the solution. > > > > Anybody has a clue on that? Thank you for your time. 
> > > > GH > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Jan 18 11:12:08 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 10:12:08 -0600 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: References: <201001171812.56238.tristan.lefebure@gmail.com> <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org> <201001172036.39032.tristan.lefebure@gmail.com> Message-ID: (my small rant on this) On Jan 18, 2010, at 4:20 AM, Robert Bradbury wrote: > My comment might be that the problem with OrthoMCL is that it is > primarily lower organisms. The problem with Ensembl (and some other > databases) is that it is primarily higher organisms (though they do > include Drosophila, C. elegans and Yeast). OrthoMCL v2 handles both lower and higher organisms; I've used it for both, with decent success. Most other ortholog tools do as well (if I'm not mistaken, Ensembl also uses MCL under the hood, unless that's changed). I don't believe one should be completely bound to one toolset, particularly in this case (there are lots of nice ortholog clustering tools using various means of comparison out there), but I do think OrthoMCL is very good as an initial pass. If anything, I would like a set of (possibly bioperl-based, definitely DB-based) modules that can deal with this information. The more imperative issue in my opinion is that one is prisoner to the gene models for those specific organisms of interest, and this may vary widely depending on the source of those gene models (Ensembl, UCSC, NCBI, EBI, centralized MODs like FlyBase, etc). For instance, if gene models are poorly curated or rarely updated, the comparisons may be significantly flawed.
Some of these issues may also be (somewhat) alleviated once more transcriptome data is available that helps clear up gene model ambiguities, but that won't be true for all organisms, at least initially. Note this isn't meant as a slam on any specific DBs or MODs in general; the problem is one born of the fact that there isn't a single, centralized, trusted, consistently updated source for this data, specifically something that will handle moderated third-party annotation. That's a very difficult problem to solve effectively. Some of these very issues cropped up at the GMOD conference, and there appears to be consensus that a real attempt is needed to address this. I don't know, maybe it's just unicorns and rainbows. Personally I do think the situation will improve, as there seems to be great demand for it, but it requires time, resources, manpower, money, cat herding, etc. > The problem arises when one wants to cross those boundaries. For > example, the 5-10 antioxidant proteins, the ~150 DNA repair proteins, > many of the mitochondrial (ETC) proteins, the ribosomal rRNAs & > tRNAs, and the fundamental biochemistry (EC) proteins are homologous > all the way from the most ancient bacteria through H. sapiens. The > only way to play in the mixed arena of prokaryotes and eukaryotes > involving fundamental vectors in evolution is to either construct one's > own databases (which presumably means getting involved with MySQL, and > probably spending some $$$ on hardware) or to develop some BioPerl > modules that can do the SpeciesX vs. SpeciesY comparisons on demand > using some part of the cloud. This problem isn't going to get smaller; > it's only going to get larger, now that the cost of sequencing > (pseudo-resequencing) a vertebrate genome is starting to come in under > $10,000 and people are starting to seriously talk about 10,000 > vertebrate genomes. 10,000 x 10,000 x 20,000 (genes) isn't something > people are going to undertake very soon.
> > Robert They're already undertaking it now using a broad range of organisms, in and out of the cloud. In most cases one can amend a prior recip. comparative analysis with new data fairly easily, if one takes care to do so early on (i.e. set up the BLAST databases with a specified defined size for comparative stats between separate analyses). OrthoMCL v2 describes a procedure to do this, and I believe others have similar methodology. I could also see possible ways one can further optimize this, for instance in cases where two very closely-related organisms are compared, where translated seqs are 100% identical, etc. IIRC, the OrthoMCL DB site already has a way to upload custom sets of protein data for mapping to (already pre-run) clusters. Just the fact that the tools are available as OS, they're semi-automated, and can be generically applied to data of personal interest is a great boon. Not sure I see the downside of that, and I'm pretty confident the scalability issues will be addressed in some way. chris From maj at fortinbras.us Mon Jan 18 11:33:12 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 18 Jan 2010 11:33:12 -0500 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> Message-ID: <6093E45F17B543438AC02E6C626439E1@NewLife> this issue's come up before, see this thread http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031400.html MAJ ----- Original Message ----- From: "Chris Fields" To: "Guohong Hu" Cc: Sent: Monday, January 18, 2010 10:30 AM Subject: Re: [Bioperl-l] Bioperl 1.6 > Guohong, > > 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed > first. Make sure the repos are set according to the Windows installation > instructions on the BioPerl wiki: > > http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows > > IIRC the actual order of the PPM repository can be critical (PPM pulls based > on highest version, first repo, but sometimes it gets confused). 
Just curious > but where is the v 1.4 PPM located? If it is local to our PPM repo I can > physically remove it to prevent this from happening. > > chris > > On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: > >> Hi there, >> >> >> >> I was trying to install BioPerl in windows using ppm, by following the >> instruction in >> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up >> the repositories, and did the search of Bioperl packages. The latest version >> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to >> install it, a number of prerequisite modules were being installed too, which >> include Bioperl 1.4. Then an error message showed up during installation: >> >> >> >> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package >> BioPerl has already installed a file that package bioperl wants to install." >> >> >> >> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 >> wanted to install again. I don't know why bioperl 1.4 was one of the >> prerequisites for 1.6.1. If I just install 1.4, it will be installed without >> errors. But I need a newer version, because some modules (like >> >> Bio::Tools::HMM) is not included in 1.4. >> >> >> >> I saw on internet that somebody had the same problem when he was trying to >> install BioPerl 1.5, but I didn't find the solution. >> >> >> >> Anybody has a clue on that? Thank you for your time. 
>> >> >> >> GH >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Jan 18 12:18:34 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 11:18:34 -0600 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: <6093E45F17B543438AC02E6C626439E1@NewLife> References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> <6093E45F17B543438AC02E6C626439E1@NewLife> Message-ID: Mark, Odd issue, maybe it's a dependency like Bio::ASN1::EntrezGene that's causing this? Regardless, it's problematic for me to test this out directly, at least for the next few days. Maybe someone could try it? Also, there is the Strawberry Perl alternative, which uses CPAN (I think ActiveState also supports this). chris On Jan 18, 2010, at 10:33 AM, Mark A. Jensen wrote: > this issue's come up before, see this thread > http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031400.html > MAJ > ----- Original Message ----- From: "Chris Fields" > To: "Guohong Hu" > Cc: > Sent: Monday, January 18, 2010 10:30 AM > Subject: Re: [Bioperl-l] Bioperl 1.6 > > >> Guohong, >> >> 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed first. Make sure the repos are set according to the Windows installation instructions on the BioPerl wiki: >> >> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows >> >> IIRC the actual order of the PPM repository can be critical (PPM pulls based on highest version, first repo, but sometimes it gets confused). Just curious but where is the v 1.4 PPM located? If it is local to our PPM repo I can physically remove it to prevent this from happening. 
>> >> chris >> >> On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: >> >>> Hi there, >>> >>> >>> >>> I was trying to install BioPerl in windows using ppm, by following the >>> instruction in >>> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up >>> the repositories, and did the search of Bioperl packages. The latest version >>> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to >>> install it, a number of prerequisite modules were being installed too, which >>> include Bioperl 1.4. Then an error message showed up during installation: >>> >>> >>> >>> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package >>> BioPerl has already installed a file that package bioperl wants to install." >>> >>> >>> >>> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 >>> wanted to install again. I don't know why bioperl 1.4 was one of the >>> prerequisites for 1.6.1. If I just install 1.4, it will be installed without >>> errors. But I need a newer version, because some modules (like >>> >>> Bio::Tools::HMM) is not included in 1.4. >>> >>> >>> >>> I saw on internet that somebody had the same problem when he was trying to >>> install BioPerl 1.5, but I didn't find the solution. >>> >>> >>> >>> Anybody has a clue on that? Thank you for your time. >>> >>> >>> >>> GH >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From clarsen at vecna.com Mon Jan 18 12:42:13 2010 From: clarsen at vecna.com (Chris Larsen) Date: Mon, 18 Jan 2010 12:42:13 -0500 Subject: [Bioperl-l] Reciprocal best blast hits using BioPerl? 
In-Reply-To: References: Message-ID: Bhakti, (and Chris, Mark)-- Yes, there is some Perl available to parse reciprocal best blast hits. Mark's referenced / archived post was mine; we were looking to do what you wanted. Here we proceed with the thread. We ended up implementing OrthoMCL 1.4 as Chris F pointed to, and then made a simple Perl parser that would take the raw OrthoMCL output, do splits, and spit out a delimited table of all the orthologs in a group, for, say, the genus Mycobacterium, so you could stuff it into DBLoader. The link to the script, SOP, and method is at: http://www.biohealthbase.org/brcDocs/documents/BHB_ORTHOLOG_SOP.pdf Giving e.g.: Francisella 1 110321310 Francisella 1 110321361 Francisella 1 56707275 Francisella 1 56707366 Francisella 1 56707462 Five members of Ortholog Group 1, with just their GI numbers. And you can see the results of that parsing, supported by a database, being used to load BioHealthbase with all the reciprocal best blast hits plus other OrthoMCL parsing, for mycobacterial PolA at: http://www.biohealthbase.org/brc/details.do?locus=MAV_3155&decorator=mycobacterium See? Pretty? We were just interested in making ortholog groups on the basis of paralog-conscious reciprocal blast stuff. Like you. This package and doc I've made does what you want, I think, as long as you stay in prokaryotes. But--careful...garbage in, garbage out. We started with clean Genuses. (. o O Genii?). You'll get more junky HUGE and TINY ortholog groups if you put in different Orders of microbes. It's taxon-sensitive. OrthoMCL author David Roos is great at it though and designed it with higher unicellular euks in mind too...comb the docs for that; sorry, I was doing bacterial work at the time and can't guide you if that's what you want. If you end up installing OrthoMCL 1.4, you can pipe the output to this method and get out usable stuff. Hope it works for you. Cheers, Chris L -- Christopher Larsen, Ph.D. Sr.
Scientist / Grants Manager Vecna Technologies 6404 Ivy Lane #500 Greenbelt, MD 20770 Phone: (240) 965-4525 Fax: (240) 547-6133 240-737-4525 From maj at fortinbras.us Mon Jan 18 14:37:43 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 18 Jan 2010 14:37:43 -0500 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> <6093E45F17B543438AC02E6C626439E1@NewLife> Message-ID: <61F331117B7C4E2282684FA240B9710F@NewLife> I will play around with it-- in the meantime, Guohong, please look at the following http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation where there is a workaround for this issue, using the ppm-shell-- cheers, Mark ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "Guohong Hu" ; Sent: Monday, January 18, 2010 12:18 PM Subject: Re: [Bioperl-l] Bioperl 1.6 Mark, Odd issue, maybe it's a dependency like Bio::ASN1::EntrezGene that's causing this? Regardless, it's problematic for me to test this out directly, at least for the next few days. Maybe someone could try it? Also, there is the Strawberry Perl alternative, which uses CPAN (I think ActiveState also supports this). chris On Jan 18, 2010, at 10:33 AM, Mark A. Jensen wrote: > this issue's come up before, see this thread > http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031400.html > MAJ > ----- Original Message ----- From: "Chris Fields" > To: "Guohong Hu" > Cc: > Sent: Monday, January 18, 2010 10:30 AM > Subject: Re: [Bioperl-l] Bioperl 1.6 > > >> Guohong, >> >> 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed >> first. Make sure the repos are set according to the Windows installation >> instructions on the BioPerl wiki: >> >> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows >> >> IIRC the actual order of the PPM repository can be critical (PPM pulls based >> on highest version, first repo, but sometimes it gets confused). 
Just >> curious but where is the v 1.4 PPM located? If it is local to our PPM repo I >> can physically remove it to prevent this from happening. >> >> chris >> >> On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: >> >>> Hi there, >>> >>> >>> >>> I was trying to install BioPerl in windows using ppm, by following the >>> instruction in >>> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up >>> the repositories, and did the search of Bioperl packages. The latest version >>> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to >>> install it, a number of prerequisite modules were being installed too, which >>> include Bioperl 1.4. Then an error message showed up during installation: >>> >>> >>> >>> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package >>> BioPerl has already installed a file that package bioperl wants to install." >>> >>> >>> >>> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 >>> wanted to install again. I don't know why bioperl 1.4 was one of the >>> prerequisites for 1.6.1. If I just install 1.4, it will be installed without >>> errors. But I need a newer version, because some modules (like >>> >>> Bio::Tools::HMM) is not included in 1.4. >>> >>> >>> >>> I saw on internet that somebody had the same problem when he was trying to >>> install BioPerl 1.5, but I didn't find the solution. >>> >>> >>> >>> Anybody has a clue on that? Thank you for your time. 
>>> >>> >>> >>> GH >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Mon Jan 18 15:24:33 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 18 Jan 2010 12:24:33 -0800 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: References: <201001171812.56238.tristan.lefebure@gmail.com> <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org> <201001172036.39032.tristan.lefebure@gmail.com> Message-ID: <68DF70A5-63A6-428D-A7F1-7B3D01528375@bioperl.org> On Jan 18, 2010, at 8:12 AM, Chris Fields wrote: > (my small rant on this) > > On Jan 18, 2010, at 4:20 AM, Robert Bradbury wrote: > >> My comment might be that the problem with OrthoMCL is that it is >> primarily lower organisms. The problem with Ensembl (and some other >> databases) is that it is primarliy higher organisms (though they do >> include Drosophila, C. elegans and Yeast). > > OrthoMCL v2 handles both lower and higher organism; I've used it for > both, with decent success. Most other ortholog tools do as well (if > I'm not mistaken, ensembl also uses MCL under the hood, unless > that's changed). I don't believe one should be completely bound to > one toolset, particularly in this case (there are lots of nice > ortholog clustering tools using various moeans of comparison out > there), but I do think OrthoMCL is very good as an initial pass. If > anything, I would like a set of (possibly bioperl-based, definitely > DB-based) modules that can deal with this information. 
> > The more imperative issue in my opinion is that one is prisoner to > the gene models for those specific organisms of interest, and this > may vary widely depending on the source of those gene models > (Ensembl, UCSC, NCBI, EBI, centralized MODs like FlyBase, etc). For > instance, if gene models are poorly curated or rarely updated, the > comparisons may be significantly flawed. Some of these issues may > also be (somewhat) alleviated once more transcriptome data is > available that helps clear up gene model ambiguities, but that won't > be true for all organisms, at least initially. > > Note this isn't meant as a slam on any specific DBs or MODs in > general, the problem is one born of the fact that there isn't a > single, centralized, trusted, consistently updated source for this > data, specifically something that will handle moderated third-party > annotation. That's a very difficult problem to solve effectively. > Some of these very issues crept up at the GMOD conference, and there > appears to be consensus that a real attempt is needed to address this. > > I don't know, maybe it's just unicorns and rainbows. Personally I > do think the situation will improve, as there seems to be great > demand for it, but it requires time, resources, manpower, money, cat > herding, etc. > >> The problem arises when one wants to cross those boundaries. For >> example the 5-10 antioxidant proteins, the ~150 DNA repair proteins, >> many of the mitochondrial (ETC) proteins, the ribosomal rRNA's & >> tRNAs, and the fundamental biochemistry (EC) proteins are homologous >> all the way from the most ancient bacteria through H. sapiens. The >> only way to play in the mixed arena of prokaryotes and eukaryotes >> involving fundamental vectors in evolution is to either construct >> ones >> own databases (which presumably means getting involved with MySQL, >> and >> probably spending some $$$ on hardware) or to develop some BioPerl >> modules that can do the SpeciesX vs. 
SpeciesY comparisons on demand >> using some part of the cloud. This problem isn't going to get >> smaller >> its only going to get larger, now that the cost of sequencing >> (pseudo-resequencing) a vertebrate genome is starting to come in >> under >> $10,000 and people are starting to seriously talk about 10,000 >> vertebrate genomes. 10,000 x 10,000 x 20,000 (genes) isn't something >> people are going to undertake very soon. >> >> Robert > > They're already undertaking it now using a broad range of organisms, > in and out of the cloud. In most cases one can amend a prior recip. > comparative analysis with new data fairly easily, if one takes care > to do so early on (i.e. set up the BLAST databases with a specified > defined size for comparative stats between separate analyses). > OrthoMCL v2 describes a procedure to do this, and I believe others > have similar methodology. > > I could also see possible ways one can further optimize this, for > instance in cases where two very closely-related organisms are > compared, where translated seqs are 100% identical, etc. IIRC, the > OrthoMCL DB site already has a way to upload custom sets of protein > data for mapping to (already pre-run) clusters. Just the fact that > the tools are available as OS, they're semi-automated, and can be > generically applied to data of personal interest is a great boon. > Not sure I see the downside of that, and I'm pretty confident the > scalability issues will be addressed in some way. I think that the approach that Paul Thomas's group at SRI http://www.ai.sri.com/esb/ is doing is really what you'd want to focus on if you are only interested in a particular set of gene families rather than de novo clustering. That or the PhyloFacts approach http://phylogenomics.berkeley.edu/phylofacts/ . That is where HMMs are more appropriate, focusing on your initial seed set of families of proteins. HMMs for your families with some automated clustering initially to get better resolution. 
Once you start throwing multiples of 10^6 proteins at it, the unsupervised clustering approach may not be able to give as accurate or timely results, but it can be a good initial filtering step depending on how much initial knowledge you are starting with. Using HMMs won't be as computationally expensive either if you are compute limited. TreeFam is also providing curated phylogenies of gene families http://www.treefam.org/ that span the opisthokonts in that a few fungi are sprinkled in. Also things like http://boinc.bio.wzw.tum.de/boincsimap/ provide ways to use distributed computing to calculate the matrix of similarities among proteins if you are interested in the exhaustive approach. -jason > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From jay at jays.net Mon Jan 18 18:36:20 2010 From: jay at jays.net (Jay Hannah) Date: Mon, 18 Jan 2010 17:36:20 -0600 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: References: Message-ID: <9AA13F94-3336-4CC1-89C4-249D0EB7C857@jays.net> On Jan 17, 2010, at 10:02 AM, Bhakti Dwivedi wrote: > Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1 > && hit1 -> query1) from a blast table report? If all the advice and resources in this thread have not dissuaded you from writing your own, you could glance at cross_blast() here as reference: https://clabsvn.ist.unomaha.edu/anonsvn/user/jhannah/UNO/seqlab/seqlab/tutorial.pod About the (abandoned) project: http://clab.ist.unomaha.edu/CLAB/index.php/SeqLab_%28Perl%29 I wrote that in 2006 for clustering a few hundred proteins based on custom criteria.
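For anyone reading this in the archive, here is a minimal sketch of the reciprocal-best-hit test from the question above (query1 -> hit1 && hit1 -> query1). It is not the cross_blast() code linked above and it does not use any BioPerl module. It assumes 12-column tab-separated BLAST output with the bit score in the last column, and it assumes "best" means highest bit score. The subroutine names (best_hits, reciprocal_best_hits, make_row) and the example rows are made up for illustration.

```perl
use strict;
use warnings;

# Map each query ID to its highest-scoring hit.
# Assumes 12 tab-separated columns: query, subject, ..., evalue, bit score.
# On a tie the first hit seen is kept.
sub best_hits {
    my @lines = @_;
    my (%best, %score);
    for my $line (@lines) {
        next if $line =~ /^\s*(?:#|$)/;    # skip comment and blank lines
        chomp(my $row = $line);
        my @f = split /\t/, $row;
        next unless @f >= 12;              # skip malformed rows
        my ($query, $hit, $bits) = @f[0, 1, 11];
        if (!exists $score{$query} || $bits > $score{$query}) {
            $score{$query} = $bits;
            $best{$query}  = $hit;
        }
    }
    return %best;
}

# Return [query, hit] pairs where each is the other's best hit.
sub reciprocal_best_hits {
    my ($a_vs_b, $b_vs_a) = @_;
    my %ab = best_hits(@$a_vs_b);
    my %ba = best_hits(@$b_vs_a);
    my @pairs;
    for my $q (sort keys %ab) {
        my $h = $ab{$q};
        push @pairs, [$q, $h] if defined $ba{$h} && $ba{$h} eq $q;
    }
    return @pairs;
}

# Hypothetical helper: build a 12-column row with placeholder
# zeros in the columns this sketch does not look at.
sub make_row {
    my ($q, $h, $bits) = @_;
    return join "\t", $q, $h, (0) x 9, $bits;
}

# Made-up example data, not real BLAST results.
my @a_vs_b = (make_row('a1', 'b1', 200), make_row('a1', 'b2', 90),
              make_row('a2', 'b2', 150), make_row('a3', 'b1', 60));
my @b_vs_a = (make_row('b1', 'a1', 198), make_row('b2', 'a2', 149),
              make_row('b2', 'a1', 88));

print join("\t", @$_), "\n" for reciprocal_best_hits(\@a_vs_b, \@b_vs_a);
```

With the made-up rows, a3 is dropped because b1's best hit is a1, not a3. For real data you would probably also want an e-value or coverage cutoff before calling anything an ortholog, as the rest of this thread points out.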
Cheers, Jay Hannah http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From jay at jays.net Mon Jan 18 19:22:48 2010 From: jay at jays.net (Jay Hannah) Date: Mon, 18 Jan 2010 18:22:48 -0600 Subject: [Bioperl-l] Bio::BroodComb - RFC Message-ID: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net> I formalized a little framework so I could stop re-writing little programs that do some things people frequently ask me to do. http://github.com/jhannah/bio-broodcomb It stores everything in SQLite so users can write custom reports if they want to. It uses BioPerl and some shiny CPAN beads (DBIx::Class, Moose::Role). Tests included. The first two functions I stuck in the framework: Find subsequences (Bio::BroodComb::SubSeq): use Bio::BroodComb; my $bc = Bio::BroodComb->new(); $bc->load_large_seq(file => "large_seq.fasta"); $bc->load_small_seq(file => "small_seq.fasta"); $bc->find_subseqs(); print $bc->subseq_report1; In-silico PCR (Bio::BroodComb::PCR): use Bio::BroodComb; my $bc = Bio::BroodComb->new(); $bc->load_large_seq(file => "large_seq.fasta"); $bc->add_primerset( description => "U5/R", # however you want it reported forward_primer => 'GCGGGCAGCAATACTGCTTTGTAA', reverse_primer => 'ACCAGCGTTCAGCATATGGAGGAT', ); $bc->find_pcr_hits(); $bc->find_pcr_products(); print $bc->pcr_report1; I find this rather handy, so will probably be adding all my applicable future work to it instead of writing stand-alone programs. Not sure if it should be renamed for eventual CPAN / wherever. Suggestions, contributions welcome. 
:) http://github.com/jhannah/bio-broodcomb Jay Hannah http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From ocornejo at gmail.com Mon Jan 18 19:46:10 2010 From: ocornejo at gmail.com (Omar Cornejo) Date: Mon, 18 Jan 2010 16:46:10 -0800 (PST) Subject: [Bioperl-l] installing bioperl for mac Message-ID: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> Dear People, I have tried to install Bioperl in my new Mac Book, which carries the latest perl distribution (5.10.0) and for some reason I can't (using fink) make it recognize this version or perl. I have tried: fink install bioperl-pm510 fink install bioperl-pm5100 but neither one works. Is it fine installing bioperl for perl v 5.9? thank you, Omar Cornejo From jason at bioperl.org Mon Jan 18 20:04:31 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 18 Jan 2010 17:04:31 -0800 Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with negative PDB ranges In-Reply-To: <4B5502D9.2010706@gmail.com> References: <49C2B97B.7070304@gmail.com> <49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com> <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org> <4B5502D9.2010706@gmail.com> Message-ID: Alexandr - Thanks for getting back to us - I am guessing the parser needs to recognize negative coordinates around about line 370 in Bio/AlignIO/ Handler/GenericAlignHandler.pm which assumes a split on '-' will be sufficient. Can you post it as a bug to bugzilla along with attaching a record and script that replicates the problem so a test can be written for this. http://bugzilla.open-bio.org/ -jason On Jan 18, 2010, at 4:54 PM, Alexandr Bezginov wrote: > I have contacted Pfam, and I have been told that The PDB file actually > does include a reference to residue "-1": > > DBREF 1E5N A -1 347 UNP P14768 XYNA_PSEFL 264 611 > > DBREF 1E5N B -1 347 UNP P14768 XYNA_PSEFL 264 611 > > > Since negative numbers are allowed in PDB, the data should probably be > considered valid. 
> > There are quite a few records like this, so this is not an isolated > issue. > > Alexandr > > On 1/14/2010 7:20 PM, Jason Stajich wrote: >> Seems like improper data really -- "-1" is an improper coordinate >> as far >> as the parser is concerned. You may want to tell Pfam that there is >> possible error in the dumper since that was the only record that had >> this problem? >> >> -jason >> On Jan 13, 2010, at 5:57 PM, albezg wrote: >> >>> Hi all, >>> >>> I have a problem using AlignIO to read Pfam database: >>> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz >>> The database is in STOCKHOLM 1.0 format. AlignIO can read the >>> alignment OK until the alignment PF00331.13. There it crashes with >>> the >>> following message: >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: '1-344' is not an integer. >>> >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368 >>> STACK: Bio::Range::end >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228 >>> STACK: Bio::Annotation::Target::new >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82 >>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ >>> GenericAlignHandler.pm:293 >>> >>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ >>> GenericAlignHandler.pm:73 >>> >>> STACK: Bio::AlignIO::stockholm::next_aln >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471 >>> STACK: /home/albezg/scripts/pfam2fasta.pl:22 >>> ----------------------------------------------------------- >>> >>> It appears this is caused by this entry: >>> #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344; >>> >>> I don't care about residues in PDB, so I have just removed minus >>> signs >>> from the ranges. 
This seems to have fixed the crashing. >>> >>> Is it a known problem? Is there a solution for it? >>> >>> Thanks, >>> Alexandr >>> >>> >>> On 03/20/2009 05:09 PM, albezg wrote: >>>> >>>> I'm trying to change FASTA header(display_id) for a sequence in an >>>> alignment(SimpleAlign). >>>> >>>> There are no issues when I print it, however when I use AlignIO >>>> to write >>>> the alignment to a FASTA file, it does not work. Is this behavior >>>> intended? >>>> >>>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug >>>> >>>> The error: >>>> ------------- EXCEPTION ------------- >>>> MSG: No sequence with name [1/1-11] >>>> STACK Bio::SimpleAlign::displayname >>>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659 >>>> STACK Bio::AlignIO::fasta::write_aln >>>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200 >>>> STACK toplevel ./demo.pl:14 >>>> ------------------------------------- >>>> >>>> Alexandr >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> http://fungalgenomes.org/ >> > -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From cjfields at illinois.edu Mon Jan 18 21:19:30 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 20:19:30 -0600 Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with negative PDB ranges In-Reply-To: References: <49C2B97B.7070304@gmail.com> <49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com> <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org> <4B5502D9.2010706@gmail.com> Message-ID: <46FD172A-69C0-436C-A005-AC38668C3347@illinois.edu> Alexandr, Posting the bug report would be great, should be an easy enough fix. 
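As a rough illustration of the kind of change discussed in this thread (recognizing a leading minus sign on a coordinate rather than treating every '-' as the range separator), here is a minimal sketch. It is not the code in Bio/AlignIO/Handler/GenericAlignHandler.pm and it has not been tested against BioPerl. The name parse_signed_range is hypothetical, and the range strings are taken from or modelled on the #=GS line quoted in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: parse "start-end" where
# either coordinate may carry a leading minus sign, e.g. "-1-344".
# Returns (start, end), or an empty list if the string does not match.
sub parse_signed_range {
    my ($range) = @_;
    if ($range =~ /^\s*(-?\d+)\s*-\s*(-?\d+)\s*;?\s*$/) {
        return ($1, $2);
    }
    return;
}

for my $r ('-1-344', '263-608', '-5--1') {
    my ($start, $end) = parse_signed_range($r);
    print "$r => start=$start end=$end\n";
}
```

The pattern anchors both ends of the string and makes the sign part of each number, so the separator is always the '-' between the two captured integers. Whether a fix along these lines fits the real handler would need checking against the module and a test record, as requested for the bug report.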
chris On Jan 18, 2010, at 7:04 PM, Jason Stajich wrote: > Alexandr - > > Thanks for getting back to us - I am guessing the parser needs to recognize negative coordinates around about line 370 in Bio/AlignIO/Handler/GenericAlignHandler.pm which assumes a split on '-' will be sufficient. > > Can you post it as a bug to bugzilla along with attaching a record and script that replicates the problem so a test can be written for this. http://bugzilla.open-bio.org/ > > -jason > On Jan 18, 2010, at 4:54 PM, Alexandr Bezginov wrote: > >> I have contacted Pfam, and I have been told that The PDB file actually >> does include a reference to residue "-1": >> >> DBREF 1E5N A -1 347 UNP P14768 XYNA_PSEFL 264 611 >> >> DBREF 1E5N B -1 347 UNP P14768 XYNA_PSEFL 264 611 >> >> >> Since negative numbers are allowed in PDB, the data should probably be >> considered valid. >> >> There are quite a few records like this, so this is not an isolated issue. >> >> Alexandr >> >> On 1/14/2010 7:20 PM, Jason Stajich wrote: >>> Seems like improper data really -- "-1" is an improper coordinate as far >>> as the parser is concerned. You may want to tell Pfam that there is >>> possible error in the dumper since that was the only record that had >>> this problem? >>> >>> -jason >>> On Jan 13, 2010, at 5:57 PM, albezg wrote: >>> >>>> Hi all, >>>> >>>> I have a problem using AlignIO to read Pfam database: >>>> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz >>>> The database is in STOCKHOLM 1.0 format. AlignIO can read the >>>> alignment OK until the alignment PF00331.13. There it crashes with the >>>> following message: >>>> >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>> MSG: '1-344' is not an integer. 
>>>> >>>> STACK: Error::throw >>>> STACK: Bio::Root::Root::throw >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368 >>>> STACK: Bio::Range::end >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228 >>>> STACK: Bio::Annotation::Target::new >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82 >>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293 >>>> >>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73 >>>> >>>> STACK: Bio::AlignIO::stockholm::next_aln >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471 >>>> STACK: /home/albezg/scripts/pfam2fasta.pl:22 >>>> ----------------------------------------------------------- >>>> >>>> It appears this is caused by this entry: >>>> #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344; >>>> >>>> I don't care about residues in PDB, so I have just removed minus signs >>>> from the ranges. This seems to have fixed the crashing. >>>> >>>> Is it a known problem? Is there a solution for it? >>>> >>>> Thanks, >>>> Alexandr >>>> >>>> >>>> On 03/20/2009 05:09 PM, albezg wrote: >>>>> >>>>> I'm trying to change FASTA header(display_id) for a sequence in an >>>>> alignment(SimpleAlign). >>>>> >>>>> There are no issues when I print it, however when I use AlignIO to write >>>>> the alignment to a FASTA file, it does not work. Is this behavior >>>>> intended? 
>>>>> >>>>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug >>>>> >>>>> The error: >>>>> ------------- EXCEPTION ------------- >>>>> MSG: No sequence with name [1/1-11] >>>>> STACK Bio::SimpleAlign::displayname >>>>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659 >>>>> STACK Bio::AlignIO::fasta::write_aln >>>>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200 >>>>> STACK toplevel ./demo.pl:14 >>>>> ------------------------------------- >>>>> >>>>> Alexandr >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> Jason Stajich >>> jason.stajich at gmail.com >>> jason at bioperl.org >>> http://fungalgenomes.org/ >>> >> > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Jan 18 21:20:31 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 20:20:31 -0600 Subject: [Bioperl-l] installing bioperl for mac In-Reply-To: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> Message-ID: <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote: > Dear People, > I have tried to install Bioperl in my new Mac Book, which carries > the latest perl distribution (5.10.0) and for some reason I can't > (using fink) make it recognize this version or perl. > I have tried: > fink install bioperl-pm510 > fink install bioperl-pm5100 > > but neither one works. Is it fine installing bioperl for perl v 5.9? > > thank you, > Omar Cornejo fink doesn't have a package for perl 5.10. 
You can install it using CPAN, however (it's pure perl), or use other UNIX-y options. See the UNIX installation instructions on the wiki: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix chris From dan.kortschak at adelaide.edu.au Mon Jan 18 21:47:47 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Tue, 19 Jan 2010 13:17:47 +1030 Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA Message-ID: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> Hi All, A wrapper and output parser for bowtie 'ultrafast, memory-efficient short read aligner' are now available in the bioperl-live and bioperl-run subversion repositories (bioperl-live/trunk at 16727 and bioperl-run/trunk at 16726). Bowtie details are available here: http://bowtie-bio.sourceforge.net/index.shtml The modules can return a Bio::Assembly::Scaffold object (operating via the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam uses large amounts of memory - the test suite works for me with >=2GB but not with 1GB due to this. (Is there a disk file system based tool for this for large projects?) Bowtie (>0.12.0) can align in colour space, but this is not currently supported by the wrapper though it should not be difficult to add. If someone can point me to a small set of colour space reads and a reference sequence I will be able to use these for testing. Thanks to the core devs for helping me with many of my problems in putting this together. Dan From maj at fortinbras.us Mon Jan 18 22:31:36 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 18 Jan 2010 22:31:36 -0500 Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: Excellent Dan! 
Thanks for all this work-- MAJ ----- Original Message ----- From: "Dan Kortschak" To: Sent: Monday, January 18, 2010 9:47 PM Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA > Hi All, > > A wrapper and output parser for bowtie 'ultrafast, memory-efficient > short read aligner' are now available in the bioperl-live and > bioperl-run subversion repositories (bioperl-live/trunk at 16727 and > bioperl-run/trunk at 16726). Bowtie details are available here: > > http://bowtie-bio.sourceforge.net/index.shtml > > The modules can return a Bio::Assembly::Scaffold object (operating via > the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk > which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam > uses large amounts of memory - the test suite works for me with >=2GB > but not with 1GB due to this. (Is there a disk file system based tool > for this for large projects?) > > Bowtie (>0.12.0) can align in colour space, but this is not currently > supported by the wrapper though it should not be difficult to add. If > someone can point me to a small set of colour space reads and a > reference sequence I will be able to use these for testing. > > Thanks to the core devs for helping me with many of my problems in > putting this together. 
> > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Jan 18 22:36:12 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 21:36:12 -0600 Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: On Jan 18, 2010, at 8:47 PM, Dan Kortschak wrote: > Hi All, > > A wrapper and output parser for bowtie 'ultrafast, memory-efficient > short read aligner' are now available in the bioperl-live and > bioperl-run subversion repositories (bioperl-live/trunk at 16727 and > bioperl-run/trunk at 16726). Bowtie details are available here: > > http://bowtie-bio.sourceforge.net/index.shtml > > The modules can return a Bio::Assembly::Scaffold object (operating via > the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk > which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam > uses large amounts of memory - the test suite works for me with >=2GB > but not with 1GB due to this. (Is there a disk file system based tool > for this for large projects?) > > Bowtie (>0.12.0) can align in colour space, but this is not currently > supported by the wrapper though it should not be difficult to add. If > someone can point me to a small set of colour space reads and a > reference sequence I will be able to use these for testing. > > Thanks to the core devs for helping me with many of my problems in > putting this together. > > Dan And (on behalf of the core devs) thank you for putting this together! 
chris From scott at scottcain.net Mon Jan 18 22:41:43 2010 From: scott at scottcain.net (Scott Cain) Date: Mon, 18 Jan 2010 22:41:43 -0500 Subject: [Bioperl-l] installing bioperl for mac In-Reply-To: <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu> References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu> Message-ID: <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com> But make sure you have the developers tools installed before the first time you run the cpan shell; it will make your life easier. Scott On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields wrote: > On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote: > >> Dear People, >> I have tried to install Bioperl in my new Mac Book, which carries >> the latest perl distribution (5.10.0) and for some reason I can't >> (using fink) make it recognize this version or perl. >> I have tried: >> fink install bioperl-pm510 >> fink install bioperl-pm5100 >> >> but neither one works. Is it fine installing bioperl for perl v 5.9? >> >> thank you, >> Omar Cornejo > > fink doesn't have a package for perl 5.10. You can install it using CPAN, however (it's pure perl), or use other UNIX-y options. See the UNIX installation instructions on the wiki: > > http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Mon Jan 18 23:04:57 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 22:04:57 -0600 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: <009801c8b957$2af4f8d0$80deea70$@ac.cn> References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> <009801c8b957$2af4f8d0$80deea70$@ac.cn> Message-ID: <79D53148-1FDA-4025-99A6-77A7F124E6BD@illinois.edu> Hmm, the trouchelle repo is the only one that had a working DB_File for perl 5.10 (not sure but I think 5.8.9 was fine). Probably worth contacting them about this to see if they can drop the (way out-of-date) 1.4 distribution. chris On May 18, 2008, at 9:22 PM, Guohong Hu wrote: > Thank you all. The problem is solved. The bioperl 1.4 version is from > the Trouchelle repo, but 1.6 is in the Bioperl Regular Releases repo. When I > added all the repos according to the bioperl wiki instructions, somehow 1.4 > became a prerequisite for 1.6. But Chris's question reminded me, so I > removed the Trouchelle repo, and the installation proceeded without errors. I > suggest we put a note on the wiki page, since it looks like an odd issue > that does not affect just me. > > Best, > Guohong > > > > _________________________________________ > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: 18 January 2010 23:30 > To: Guohong Hu > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bioperl 1.6 > > Guohong, > > 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed > first. Make sure the repos are set according to the Windows installation > instructions on the BioPerl wiki: > > http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows > > IIRC the actual order of the PPM repository can be critical (PPM pulls based > on highest version, first repo, but sometimes it gets confused). Just > curious but where is the v 1.4 PPM located?
If it is local to our PPM repo > I can physically remove it to prevent this from happening. > > chris > > On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: > >> Hi there, >> >> >> >> I was trying to install BioPerl in windows using ppm, by following the >> instruction in >> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up >> the repositories, and did the search of Bioperl packages. The latest > version >> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to >> install it, a number of prerequisite modules were being installed too, > which >> include Bioperl 1.4. Then an error message showed up during installation: >> >> >> >> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package >> BioPerl has already installed a file that package bioperl wants to > install." >> >> >> >> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 >> wanted to install again. I don't know why bioperl 1.4 was one of the >> prerequisites for 1.6.1. If I just install 1.4, it will be installed > without >> errors. But I need a newer version, because some modules (like >> >> Bio::Tools::HMM) is not included in 1.4. >> >> >> >> I saw on internet that somebody had the same problem when he was trying to >> install BioPerl 1.5, but I didn't find the solution. >> >> >> >> Anybody has a clue on that? Thank you for your time. 
>> >> >> >> GH >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From ocornejo at gmail.com Mon Jan 18 23:18:00 2010 From: ocornejo at gmail.com (Omar Eduardo Cornejo Ordaz) Date: Mon, 18 Jan 2010 23:18:00 -0500 Subject: [Bioperl-l] installing bioperl for mac In-Reply-To: <5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu> References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu> <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com> <5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu> Message-ID: I see. thank you Scott and Chris. I had already installed the latest version of the Xcode Developer Tools. I will go the cpan way then. have a nice one, Omar On Mon, Jan 18, 2010 at 10:58 PM, Chris Fields wrote: > Yes, definitely! > > -c > > On Jan 18, 2010, at 9:41 PM, Scott Cain wrote: > > > But make sure you have the developers tools installed before the first > > time you run the cpan shell; it will make your life easier. > > > > Scott > > > > > > On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields > wrote: > >> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote: > >> > >>> Dear People, > >>> I have tried to install Bioperl in my new Mac Book, which carries > >>> the latest perl distribution (5.10.0) and for some reason I can't > >>> (using fink) make it recognize this version or perl. > >>> I have tried: > >>> fink install bioperl-pm510 > >>> fink install bioperl-pm5100 > >>> > >>> but neither one works. Is it fine installing bioperl for perl v 5.9? > >>> > >>> thank you, > >>> Omar Cornejo > >> > >> fink doesn't have a package for perl 5.10. You can install it using > CPAN, however (it's pure perl), or use other UNIX-y options. 
See the UNIX > installation instructions on the wiki: > >> > >> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix > >> > >> chris > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. scott at scottcain > dot net > > GMOD Coordinator (http://gmod.org/) 216-392-3087 > > Ontario Institute for Cancer Research > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon Jan 18 22:58:36 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 21:58:36 -0600 Subject: [Bioperl-l] installing bioperl for mac In-Reply-To: <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com> References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu> <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com> Message-ID: <5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu> Yes, definitely! -c On Jan 18, 2010, at 9:41 PM, Scott Cain wrote: > But make sure you have the developers tools installed before the first > time you run the cpan shell; it will make your life easier. > > Scott > > > On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields wrote: >> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote: >> >>> Dear People, >>> I have tried to install Bioperl in my new Mac Book, which carries >>> the latest perl distribution (5.10.0) and for some reason I can't >>> (using fink) make it recognize this version or perl. >>> I have tried: >>> fink install bioperl-pm510 >>> fink install bioperl-pm5100 >>> >>> but neither one works. Is it fine installing bioperl for perl v 5.9? 
>>> >>> thank you, >>> Omar Cornejo >> >> fink doesn't have a package for perl 5.10. You can install it using CPAN, however (it's pure perl), or use other UNIX-y options. See the UNIX installation instructions on the wiki: >> >> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From albezg at gmail.com Mon Jan 18 19:54:49 2010 From: albezg at gmail.com (Alexandr Bezginov) Date: Mon, 18 Jan 2010 19:54:49 -0500 Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with negative PDB ranges In-Reply-To: <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org> References: <49C2B97B.7070304@gmail.com> <49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com> <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org> Message-ID: <4B5502D9.2010706@gmail.com> I have contacted Pfam, and I have been told that The PDB file actually does include a reference to residue "-1": DBREF 1E5N A -1 347 UNP P14768 XYNA_PSEFL 264 611 DBREF 1E5N B -1 347 UNP P14768 XYNA_PSEFL 264 611 Since negative numbers are allowed in PDB, the data should probably be considered valid. There are quite a few records like this, so this is not an isolated issue. Alexandr On 1/14/2010 7:20 PM, Jason Stajich wrote: > Seems like improper data really -- "-1" is an improper coordinate as far > as the parser is concerned. 
You may want to tell Pfam that there is > possible error in the dumper since that was the only record that had > this problem? > > -jason > On Jan 13, 2010, at 5:57 PM, albezg wrote: > >> Hi all, >> >> I have a problem using AlignIO to read Pfam database: >> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz >> The database is in STOCKHOLM 1.0 format. AlignIO can read the >> alignment OK until the alignment PF00331.13. There it crashes with the >> following message: >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: '1-344' is not an integer. >> >> STACK: Error::throw >> STACK: Bio::Root::Root::throw >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368 >> STACK: Bio::Range::end >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228 >> STACK: Bio::Annotation::Target::new >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82 >> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293 >> >> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73 >> >> STACK: Bio::AlignIO::stockholm::next_aln >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471 >> STACK: /home/albezg/scripts/pfam2fasta.pl:22 >> ----------------------------------------------------------- >> >> It appears this is caused by this entry: >> #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344; >> >> I don't care about residues in PDB, so I have just removed minus signs >> from the ranges. This seems to have fixed the crashing. >> >> Is it a known problem? Is there a solution for it? >> >> Thanks, >> Alexandr >> >> >> On 03/20/2009 05:09 PM, albezg wrote: >>> >>> I'm trying to change FASTA header(display_id) for a sequence in an >>> alignment(SimpleAlign). 
>>> >>> There are no issues when I print it, however when I use AlignIO to write >>> the alignment to a FASTA file, it does not work. Is this behavior >>> intended? >>> >>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug >>> >>> The error: >>> ------------- EXCEPTION ------------- >>> MSG: No sequence with name [1/1-11] >>> STACK Bio::SimpleAlign::displayname >>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659 >>> STACK Bio::AlignIO::fasta::write_aln >>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200 >>> STACK toplevel ./demo.pl:14 >>> ------------------------------------- >>> >>> Alexandr >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > From ghhu at sibs.ac.cn Mon Jan 18 21:22:19 2010 From: ghhu at sibs.ac.cn (Guohong Hu) Date: Tue, 19 Jan 2010 02:22:19 -0000 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> Message-ID: <009801c8b957$2af4f8d0$80deea70$@ac.cn> Thanks to you all. The problem is solved. The bioperl 1.4 version is from the Trouchelle repo, but 1.6 is in the Bioperl Regular Releases repo. When I added all the repos according to the bioperl wiki instructions, somehow 1.4 became a prerequisite for 1.6. But Chris's question reminded me, so I removed the Trouchelle repo, and the installation proceeded without errors. I suggest we put a note on the wiki page, since it looks like an odd issue that does not affect just me. Best, Guohong _________________________________________ From: Chris Fields [mailto:cjfields at illinois.edu] Sent: 18 January 2010 23:30 To: Guohong Hu Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Bioperl 1.6 Guohong, 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed first.
Make sure the repos are set according to the Windows installation instructions on the BioPerl wiki: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows IIRC the actual order of the PPM repository can be critical (PPM pulls based on highest version, first repo, but sometimes it gets confused). Just curious but where is the v 1.4 PPM located? If it is local to our PPM repo I can physically remove it to prevent this from happening. chris On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: > Hi there, > > > > I was trying to install BioPerl in windows using ppm, by following the > instruction in > "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up > the repositories, and did the search of Bioperl packages. The latest version > available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to > install it, a number of prerequisite modules were being installed too, which > include Bioperl 1.4. Then an error message showed up during installation: > > > > "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package > BioPerl has already installed a file that package bioperl wants to install." > > > > It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 > wanted to install again. I don't know why bioperl 1.4 was one of the > prerequisites for 1.6.1. If I just install 1.4, it will be installed without > errors. But I need a newer version, because some modules (like > > Bio::Tools::HMM) is not included in 1.4. > > > > I saw on internet that somebody had the same problem when he was trying to > install BioPerl 1.5, but I didn't find the solution. > > > > Anybody has a clue on that? Thank you for your time. 
> > > > GH > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jw12 at sanger.ac.uk Tue Jan 19 05:41:12 2010 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Tue, 19 Jan 2010 10:41:12 +0000 Subject: [Bioperl-l] DAS Workshop Registrations now Open (workshop date 7-9 April 2010) Message-ID: <9EDF4E46-15F8-434E-B557-2DE5906C4182@sanger.ac.uk> If you don't know about DAS and wish to know how to distribute your latest biological annotation to the world then the upcoming DAS workshop maybe for you. If you know about DAS and are maybe a DAS client developer then the upcoming DAS workshop is for you (as you will need to know about the upcoming DAS 1.6 Specification and how it may affect your software). For information on the workshop and registration please go to: http://www.ebi.ac.uk/training/handson/DAS_070410.html Jonathan Warren Senior Developer and DAS coordinator jw12 at sanger.ac.uk -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From SMarkel at accelrys.com Tue Jan 19 13:00:22 2010 From: SMarkel at accelrys.com (Scott Markel) Date: Tue, 19 Jan 2010 10:00:22 -0800 Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net> Dan, Life Tech has sample data for E. coli at http://solidsoftwaretools.com/gf/project/ecoli2x50/ and http://solidsoftwaretools.com/gf/project/dh10bfrag/. Reference sequences are included. Scott Scott Markel, Ph.D. 
Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dan Kortschak Sent: Monday, 18 January 2010 6:48 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA Hi All, A wrapper and output parser for bowtie 'ultrafast, memory-efficient short read aligner' are now available in the bioperl-live and bioperl-run subversion repositories (bioperl-live/trunk at 16727 and bioperl-run/trunk at 16726). Bowtie details are available here: http://bowtie-bio.sourceforge.net/index.shtml The modules can return a Bio::Assembly::Scaffold object (operating via the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam uses large amounts of memory - the test suite works for me with >=2GB but not with 1GB due to this. (Is there a disk file system based tool for this for large projects?) Bowtie (>0.12.0) can align in colour space, but this is not currently supported by the wrapper though it should not be difficult to add. If someone can point me to a small set of colour space reads and a reference sequence I will be able to use these for testing. Thanks to the core devs for helping me with many of my problems in putting this together. 
Dan _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From dan.kortschak at adelaide.edu.au Tue Jan 19 16:18:20 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Wed, 20 Jan 2010 07:48:20 +1030 Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net> References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> <5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net> Message-ID: <1263935900.4813.0.camel@epistle> Great. Thanks, Scott. Dan On Tue, 2010-01-19 at 10:00 -0800, Scott Markel wrote: > Dan, > > Life Tech has sample data for E. coli at > > http://solidsoftwaretools.com/gf/project/ecoli2x50/ > > and > > http://solidsoftwaretools.com/gf/project/dh10bfrag/. > > Reference sequences are included. > > Scott > > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel at accelrys.com > Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 799 5222 > USA web: http://www.accelrys.com > > http://www.linkedin.com/in/smarkel > Vice President, Board of Directors: > International Society for Computational Biology > Chair: ISCB Publications Committee > Associate Editor: PLoS Computational Biology > Editorial Board: Briefings in Bioinformatics > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dan Kortschak > Sent: Monday, 18 January 2010 6:48 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA > > Hi All, > > A wrapper and output parser for bowtie 'ultrafast, memory-efficient > short read aligner' are now available in the bioperl-live and > 
bioperl-run subversion repositories (bioperl-live/trunk at 16727 and > bioperl-run/trunk at 16726). Bowtie details are available here: > > http://bowtie-bio.sourceforge.net/index.shtml > > The modules can return a Bio::Assembly::Scaffold object (operating via > the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk > which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam > uses large amounts of memory - the test suite works for me with >=2GB > but not with 1GB due to this. (Is there a disk file system based tool > for this for large projects?) > > Bowtie (>0.12.0) can align in colour space, but this is not currently > supported by the wrapper though it should not be difficult to add. If > someone can point me to a small set of colour space reads and a > reference sequence I will be able to use these for testing. > > Thanks to the core devs for helping me with many of my problems in > putting this together. > > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From dan.kortschak at adelaide.edu.au Wed Jan 20 00:32:05 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Wed, 20 Jan 2010 16:02:05 +1030 Subject: [Bioperl-l] using Bio::DB::GFF for aggregation Message-ID: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> Hi Chris (or others), I've been looking at ways to do large assemblies (really rnaseq/readseq comparisons for coverage) with maq/bowtie output and it's clear that for the size of project that I'm working on the space complexity is too nasty with Bio::DB::Sam. So I thought Bio::DB::GFF might be the way to go. I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> B:DB:GFF This depends on the behaviour of B:DB:GFF->features(-merge=>1).
I've read through the docs, and it's not entirely clear (I'm hoping I've interpreted it the right way), but does this return features such that overlapping features come back as a single feature while non-overlapping features come back separately? If this is the case, it would satisfy my requirements perfectly. thanks for your time Dan From jason at bioperl.org Wed Jan 20 01:35:24 2010 From: jason at bioperl.org (Jason Stajich) Date: Tue, 19 Jan 2010 22:35:24 -0800 Subject: [Bioperl-l] using Bio::DB::GFF for aggregation In-Reply-To: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: Are you looking at the bowtie features file or the SAM? -jason On Jan 19, 2010, at 9:32 PM, Dan Kortschak wrote: > Hi Chris (or others), > > I've been looking at ways to do large assemblies (really rnaseq/ > readseq > comparisons for coverage) with maq/bowtie output and it's clear that > for > the size of project that I'm working on the space complexity is too > nasty with Bio::DB::Sam. So I thought Bio::DB:GFF might be the way to > go. > > I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> > B:DB:GFF > > This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've > read through the docs, and it's not entirely clear (I'm hoping I've > interpreted it the right way), but does this result in the return of > features such that overlapping features are returned as a single > feature > while non-overlapping features come back separately. If this is the > case, it would satisfy my requirements perfectly.
> > thanks for your time > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From dan.kortschak at adelaide.edu.au Wed Jan 20 02:19:05 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Wed, 20 Jan 2010 17:49:05 +1030 Subject: [Bioperl-l] using Bio::DB::GFF for aggregation In-Reply-To: References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <1263971945.4582.2.camel@epistle> It doesn't really matter, they are largely inter-convertible. The problem is not really the upstream processing, but the aggregation of reads into read-assigned regions (unless I've misunderstood your question). Dan On Tue, 2010-01-19 at 22:35 -0800, Jason Stajich wrote: > Are you looking at the bowtie features file or the SAM? > -jason > On Jan 19, 2010, at 9:32 PM, Dan Kortschak wrote: > > > Hi Chris (or others), > > > > I've been looking at ways to do large assemblies (really rnaseq/ > > readseq > > comparisons for coverage) with maq/bowtie output and it's clear that > > for > > the size of project that I'm working on the space complexity is too > > nasty with Bio::DB::Sam. So I thought Bio::DB:GFF might be the way to > > go. > > > > I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> > > B:DB:GFF > > > > This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've > > read through the docs, and it's not entirely clear (I'm hoping I've > > interpreted it the right way), but does this result in the return of > > features such that overlapping features are returned as a single > > feature > > while non-overlapping features come back separately. If this is the > > case, it would satisfy my requirements perfectly. 
> > > > thanks for your time > > Dan > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ -- Dan Kortschak From ajmackey at gmail.com Wed Jan 20 07:59:38 2010 From: ajmackey at gmail.com (Aaron Mackey) Date: Wed, 20 Jan 2010 07:59:38 -0500 Subject: [Bioperl-l] using Bio::DB::GFF for aggregation In-Reply-To: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com> I would advise using BEDtools or the R IRanges package for this kind of aggregation/merging work, rather than trying to reinvent this particular wheel. -Aaron On Wed, Jan 20, 2010 at 12:32 AM, Dan Kortschak < dan.kortschak at adelaide.edu.au> wrote: > Hi Chris (or others), > > I've been looking at ways to do large assemblies (really rnaseq/readseq > comparisons for coverage) with maq/bowtie output and it's clear that for > the size of project that I'm working on the space complexity is too > nasty with Bio::DB::Sam. So I thought Bio::DB:GFF might be the way to > go. > > I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> B:DB:GFF > > This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've > read through the docs, and it's not entirely clear (I'm hoping I've > interpreted it the right way), but does this result in the return of > features such that overlapping features are returned as a single feature > while non-overlapping features come back separately. If this is the > case, it would satisfy my requirements perfectly. 
> > thanks for your time > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From dan.kortschak at adelaide.edu.au Wed Jan 20 16:16:39 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Thu, 21 Jan 2010 07:46:39 +1030 Subject: [Bioperl-l] using Bio::DB::GFF for aggregation In-Reply-To: <24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com> References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> <24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com> Message-ID: <1264022199.4688.29.camel@epistle> Thanks for that, I'll look into those. BEDtools looks like what I want. cheers Dan On Wed, 2010-01-20 at 07:59 -0500, Aaron Mackey wrote: > I would advise using BEDtools or the R IRanges package for this kind > of aggregation/merging work, rather than trying to reinvent this > particular wheel. > > -Aaron From biopython at maubp.freeserve.co.uk Thu Jan 21 07:33:53 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 21 Jan 2010 12:33:53 +0000 Subject: [Bioperl-l] SwissProt DE lines and UniProt XML / TagTree as XML in BioSQL Message-ID: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com> Hi all, This is cross posted to try and ensure relevant people see it. I suggest we continue the discussion on the BioSQL list (for how to serialise structured annotation to BioSQL), and/or the OpenBio list (for things like file format naming conventions). I am hoping we (Bio*) can be consistent in how we parse and load into BioSQL the SwissProt DE lines (known as "swiss" format in both BioPerl and Biopython's SeqIO, and by EMBOSS) or the equivalent UniProt XML tags (which we are tentatively going to call the "uniprot" format in Biopython's SeqIO - comments?). Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss") files and load them into BioSQL. 
Biopython currently treats the DE comment lines as a long string, as BioPerl used to: http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html I understand that BioPerl now turns the SwissProt DE lines into a TagTree, and for storing this in BioSQL this gets serialised as XML. I would like Biopython to handle this the same way (although rather than a Perl TagTree, we'd use a Python structure of course), and would appreciate clarification of what exactly was implemented (e.g. which bit of the BioPerl source code should we look at, and could you show a worked example?). Andrea Pierlenoin (CC'd - not sure if he is on the BioSQL or Open-Bio lists yet) has started work on parsing UniProt XML files for Biopython. Here the DE comment lines are already provided broken up with XML markup. Hopefully their nested structure matches what BioPerl was doing with the SwissProt DE lines. Regards, Peter From cjfields at illinois.edu Thu Jan 21 08:34:12 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 21 Jan 2010 07:34:12 -0600 Subject: [Bioperl-l] [Open-bio-l] SwissProt DE lines and UniProt XML / TagTree as XML in BioSQL In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com> References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com> Message-ID: Peter, The relevant code is in Bio::Annotation::TagTree in bioperl-live, which is a decorator for Data::Stag: http://search.cpan.org/~cmungall/Data-Stag-0.11/Data/Stag.pm This is where the text output is derived from. It's a bit of a heavyweight solution to the problem, but it's capable of round-tripping the DE data and parses out the data in a way that's approachable. We could probably abstract out the serialization backend there and allow a pure bioperl solution (or the current solution) as a fallback.
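To give a rough idea of what serialising a nested DE-style structure to XML could look like without any extra modules, here is a minimal plain-Perl sketch. It is only an illustration under stated assumptions: it is not the Bio::Annotation::TagTree or Data::Stag code, to_xml and xml_escape are hypothetical helpers, the tag names merely echo the labels used on DE lines, and the values are placeholders rather than a real entry.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical nested description: each node is a [tag => value] pair,
# where value is either a plain string or a reference to a list of
# further [tag => value] pairs. The values are placeholders.
my $de = [ RecName => [
    [ Full  => 'Example protein' ],
    [ Short => 'EP' ],
] ];

# Escape the characters that are special in XML text content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Recursively turn one [tag => value] node into an XML string.
sub to_xml {
    my ($node) = @_;
    my ($tag, $value) = @$node;
    my $body = ref $value
        ? join('', map { to_xml($_) } @$value)   # nested children
        : xml_escape($value);                    # leaf text
    return "<$tag>$body</$tag>";
}

# prints: <RecName><Full>Example protein</Full><Short>EP</Short></RecName>
print to_xml($de), "\n";
```

Keeping the structure as ordered pairs rather than a hash preserves the order of the fields, which matters if the text is meant to round-trip.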
If the plain-text DE info is represented in a hierarchy already in UniProt XML, we should probably conform as closely as possible to that (using a standard format like XML, JSON, etc.). chris On Jan 21, 2010, at 6:33 AM, Peter wrote: > Hi all, > > This is cross posted to try and ensure relevant people see it. > I suggest we continue the discussion on the BioSQL list > (for how to serialise structured annotation to BioSQL), and/or > the OpenBio list (for things like file format naming conventions). > > I am hoping we (Bio*) can be consistent in how we parse and load > into BioSQL the SwissProt DE lines (known as "swiss" format in > both BioPerl and Biopython's SeqIO, and by EMBOSS) or the > equivalent UniProt XML tags (which we are tentatively going to > call the "uniprot" format in Biopython's SeqIO - comments?). > > Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss") > files and load them into BioSQL. Biopython currently treats the DE > comment lines as a long string, as BioPerl used to: > > http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html > http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html > > I understand that BioPerl now turns the SwissProt DE lines into a > TagTree, and for storing this in BioSQL this gets serialised as XML. > I would like Biopython to handle this the same way (although rather > than a Perl TagTree, we'd use a Python structure of course), and > would appreciate clarification of what exactly was implemented > (e.g. which bit of the BioPerl source code should be look at, > and could you show a worked example?). > > Andrea Pierlenoin (CC'd - not sure if he is on the BioSQL or > Open-Bio lists yet) has started work on parsing UniProt XML > files for Biopython. Here the DE comment lines are already > provided broken up with XML markup. Hopefully their nested > structure matches what BioPerl was doing with the SwissProt > DE lines. 
> > Regards, > > Peter
From sharmashalu.bio at gmail.com Thu Jan 21 09:25:44 2010 From: sharmashalu.bio at gmail.com (shalu sharma) Date: Thu, 21 Jan 2010 09:25:44 -0500 Subject: [Bioperl-l] sequence orientation Message-ID: <465b5a661001210625j3d84a165u69d8c8d21d2fe7ac@mail.gmail.com> Hi All, This is not a perl/bioperl query but i thought that its a best place to ask. I have some pyro reads ( from CAMERA) and i want to find out their 5' and 3' ends. Is there any way i can do this? I would really appreciate if anyone can help me out. Thanks Shalu
From rtbio.2009 at gmail.com Thu Jan 21 13:28:43 2010 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Thu, 21 Jan 2010 19:28:43 +0100 Subject: [Bioperl-l] Regarding blast in Bioperl In-Reply-To: <196889DF87964224ACDB948681BA7F86@NewLife> References: <4C2E8133F916495B876628EF3E8FCBB2@NewLife> <9D8A1428463C4D5E9C416521C35E254C@NewLife> <196889DF87964224ACDB948681BA7F86@NewLife> Message-ID: Hello Mark, This is Roopa again. I have a small problem again. I am working on Remote blast. The program works well. But the problem is this. The program accesses the server and gets the output correctly. I am trying to send the result sequences into an array and I found that always the first sequence among the Result sequences is missing. The code is

my $str = Bio::SeqIO->new('-file' => $nuc ,
                          '-format' => 'fasta' ,
                          '-organism' => "$organ\[ORGN]");

while (my $input = $str->next_seq()) {
    #Blast a sequence against a database:

    #Alternatively, you could pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.
    open(OUTFILE,'>',$debugfile);
    print OUTFILE $input;
    close(OUTFILE);
    my $r = $factory->submit_blast($input);
    open(OUTFILE,'>',$debugfile);
    # print OUTFILE $r;
    close(OUTFILE);

    print STDERR "waiting...." if($v>0);
    while ( my @rids = $factory->each_rid ) {
        open(OUTFILE,'>',$debugfile);
        # print OUTFILE "while entered";
        close(OUTFILE);
        foreach my $rid ( @rids ) {
            open(OUTFILE,'>',$debugfile);
            # print OUTFILE "foreach entered";
            close(OUTFILE);
            my $rc = $factory->retrieve_blast($rid);
            if( !ref($rc) ) {
                if( $rc < 0 ) {
                    $factory->remove_rid($rid);
                }
                open(OUTFILE,'>',$debugfile);
                # print OUTFILE "if entered";
                close(OUTFILE);
                print STDERR "." if ( $v > 0 );
                sleep 5;
            } else {
                open(OUTFILE,'>',$debugfile);
                # print OUTFILE "else entered";
                close(OUTFILE);
                my $result = $rc->next_result();
                #save the output
                $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
                open(BLASTDEBUGFILE,'>',$blastdebugfile);
                print BLASTDEBUGFILE $result->next_hit();
                close(BLASTDEBUGFILE);
                my $filename = $serverpath."/blastdata_".time()."\.out";
                # open(DEBUGFILE,'>',$debugfile);
                # open(new,'>',$filename);
                # @arra=;
                # print DEBUGFILE @arra;
                # close(DEBUGFILE);
                # close(new);
                $factory->save_output($filename);
                # open(BLASTDEBUGFILE,'>',$debugfile);
                # print BLASTDEBUGFILE "Hello $rid";
                # close(BLASTDEBUGFILE);
                $factory->remove_rid($rid);
                open(BLASTDEBUGFILE,'>',$blastdebugfile);
                print BLASTDEBUGFILE $organism;
                close(BLASTDEBUGFILE);
                # open(OUTFILE,'>',$outfile);
                # print OUTFILE "Test2 $result->database_name()";
                # close(OUTFILE);
                #$hit = $result->next_hit;
                #open(new,'>',$debugfile);
                #print $hit;
                #close(new);
                $dummy=0;
                while ( my $hit = $result->next_hit ) {
                    next unless ( $v >= 0);
                    # open(OUTFILE,'>',$debugfile);
                    # print OUTFILE "$hit in while hits";
                    # close(OUTFILE);
                    my $sequ = $gb->get_Seq_by_version($hit->name);
                    my $dna = $sequ->seq(); # get the sequence as a string
                    $dummy++;
                    open(OUTFILE,'>',$debugfile);
                    # print OUTFILE $dummy;
                    close(OUTFILE);
                    push(@seqs,$dna);
                }
            }
        }
    }
}
$warum=@seqs;
open(OUTFILE,'>',$debugfile);
# print OUTFILE $warum;
print OUTFILE @seqs;
close(OUTFILE);
return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;
print OUTFILE "\n RNAi Result \n \n

Inputsequence:
";

Here in the above code, I was trying to debug the code and trying to get the count of the array and even the sequence. But when the output data was giving 1 sequence, the count of the array was 0 and when I tried to print the output sequence I could not get any. It was the same when the no of output sequences was 3, I tried to print the sequences but was getting the count of the array as 2 and was printing only two sequences. Please help me in sorting out this problem. Regards, Roopa.
On Sun, Jan 10, 2010 at 1:03 AM, Mark A. Jensen wrote: > Excellent Roopa- it's my pleasure-- MAJ > > ----- Original Message ----- > *From:* Roopa Raghuveer > *To:* Mark A. Jensen > *Sent:* Saturday, January 09, 2010 6:41 PM > *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl > > Hi Mark, > > Thank you very very much. The code is working now. Thanks for the support > and time you have spent on me. > > Thanks in advance > Roopa. > > On Sat, Jan 9, 2010 at 10:56 PM, Mark A. Jensen wrote: > >> There is still a bug with the double quotes. Use "$organ\[ORGN]", which >> prevents perl from >> looking for a member of an array called @organ. This would have shown up >> if 'use strict;' had >> been in place. Still don't know whether this would work precisely; can you >> send me the query >> sequence so I can reproduce your ouput? >> thanks MAJ >> >> ----- Original Message ----- >> *From:* Roopa Raghuveer >> *To:* Mark A. Jensen >> *Sent:* Saturday, January 09, 2010 2:02 PM >> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >> >> Hi Mark, >> >> I tried it with double quotes but still i got the same o/p with sequences >> from different species. >> >> ref|XM_822292.1| Trypanosoma brucei TREU927 protein kinase A ... 1813 >> 0.0 >> ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A ... 1622 >> 0.0 >> ref|XM_816530.1| Trypanosoma cruzi strain CL Brener protein k... 773 >> 0.0 >> ref|XM_816527.1| Trypanosoma cruzi strain CL Brener protein k...
749 >> 0.0 >> ref|XM_838414.1| Leishmania major strain Friedlin protein kin... 551 >> 3e-154 >> ref|XM_838409.1| Leishmania major strain Friedlin protein kin... 551 >> 3e-154 >> ref|XM_001568451.1| Leishmania braziliensis MHOM/BR/75/M2904 ... 542 >> 2e-151 >> ref|XM_001469171.1| Leishmania infantum protein kinase A cata... 538 >> 2e-150 >> ref|XM_001469166.1| Leishmania infantum protein kinase A cata... 538 >> 2e-150 >> ref|XM_001682462.1| Leishmania major protein kinase A catalyt... 196 >> 3e-47 >> ref|XM_804361.1| Trypanosoma cruzi strain CL Brener protein k... 190 >> 1e-45 >> ref|XM_002065851.1| Drosophila willistoni GK20594 (Dwil\GK205... 181 >> 7e-43 >> ref|XM_822694.1| Trypanosoma brucei TREU927 protein kinase A ... 179 >> 2e-42 >> ref|XM_001563990.1| Leishmania braziliensis MHOM/BR/75/M2904 ... 178 >> 8e-42 >> ref|XM_814844.1| Trypanosoma cruzi strain CL Brener protein k... 170 >> 1e-39 >> ref|XM_001763039.1| Physcomitrella patens subsp. patens predi... 169 >> 4e-39 >> ref|XM_001464886.1| Leishmania infantum JPCM5 protein kinase ... 167 >> 1e-38 >> ref|XM_001377302.1| PREDICTED: Monodelphis domestica similar ... 150 >> 1e-33 >> ref|XM_001603485.1| PREDICTED: Nasonia vitripennis similar to... 145 >> 5e-32 >> ref|XM_416852.2| PREDICTED: Gallus gallus protein kinase, X-l... 143 >> 2e-31 >> ref|NM_001016403.2| Xenopus (Silurana) tropicalis protein kin... 143 >> 2e-31 >> ref|XM_002009291.1| Drosophila mojavensis GI11297 (Dmoj\GI112... 138 >> 7e-30 >> ref|NM_016979.1| Mus musculus protein kinase, X-linked (Prkx)... 138 >> 7e-30 >> ref|XM_001495664.2| PREDICTED: Equus caballus similar to Seri... 136 >> 2e-29 >> ref|XM_001111571.1| PREDICTED: Macaca mulatta cAMP-dependent ... 136 >> 2e-29 >> ref|XM_001611655.1| Babesia bovis protein kinase domain conta... 134 >> 9e-29 >> ref|NR_028062.1| Homo sapiens protein kinase, Y-linked (PRKY)... 132 >> 3e-28 >> ref|XM_001517795.1| PREDICTED: Ornithorhynchus anatinus simil... 
132 >> 3e-28 >> ref|XM_685338.2| PREDICTED: Danio rerio similar to Serine/thr... 132 >> 3e-28 >> ref|XM_002189865.1| PREDICTED: Taeniopygia guttata protein ki... 131 >> 1e-27 >> ref|XM_001362299.1| PREDICTED: Monodelphis domestica similar ... 131 >> 1e-27 >> ref|NM_001093198.1| Xenopus laevis protein kinase, cAMP-depen... 129 >> 4e-27 >> ref|XM_001461322.1| Paramecium tetraurelia hypothetical prote... 129 >> 4e-27 >> ref|NM_001099869.1| Xenopus laevis cAMP-dependent protein kin... 129 >> 4e-27 >> ref|NM_001003470.1| Danio rerio protein kinase, cAMP-dependen... 129 >> 4e-27 >> ref|XM_001141503.1| PREDICTED: Pan troglodytes verus protein ... 127 >> 1e-26 >> ref|XM_001145269.1| PREDICTED: Pan troglodytes protein kinase... 127 >> 1e-26 >> ref|XM_512434.2| PREDICTED: Pan troglodytes cAMP-dependent pr... 127 >> 1e-26 >> ref|XM_001171457.1| PREDICTED: Pan troglodytes cAMP-dependent... 127 >> 1e-26 >> ref|XM_001171437.1| PREDICTED: Pan troglodytes cAMP-dependent... 127 >> 1e-26 >> ref|XM_847420.1| PREDICTED: Canis familiaris similar to Serin... 127 >> 1e-26 >> ref|NM_207518.1| Homo sapiens protein kinase, cAMP-dependent,... 127 >> 1e-26 >> ref|NM_002730.3| Homo sapiens protein kinase, cAMP-dependent,... 127 >> 1e-26 >> >> >> Thanks in advance. >> >> Roopa. >> >> On Sat, Jan 9, 2010 at 7:46 PM, Mark A. Jensen wrote: >> >>> I understand you. Put in the double quotes and see what happens. >>> >>> ----- Original Message ----- >>> *From:* Roopa Raghuveer >>> *To:* Mark A. Jensen >>> *Sent:* Saturday, January 09, 2010 1:40 PM >>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >>> >>> Hi Mark, >>> >>> Thanks for your reply. It was working when I specifically use the name of >>> the organism as Trypanosoma brucei in the code,but my idea is to introduce a >>> $organ which takes the organism given by the user i.e., let it be anything >>> >>> Pseudomonas, Drosophila, Trypanosoma, Leishmania etc., I should get the >>> sequences related to only those organisms. 
>>> >>> i.e., If the user enters Pseudomonas,the $organ parameter of the code >>> takes Pseudomonas ,does BLAST and returns only those sequences that produce >>> significant alignment with Pseudomonas(only).But this is not happening like >>> that . >>> >>> Please help me in this regard. >>> >>> Thanks in advance >>> Roopa >>> >>> On Sat, Jan 9, 2010 at 7:29 PM, Mark A. Jensen wrote: >>> >>>> Hi Roopa-- You may get what you want if you make the change. >>>> With single quotes, ENTREZ_QUERY is set to the literal string >>>> >>>> $organ[ORGN] >>>> >>>> while, with double quotes, the variable value will be substituted, >>>> and the parameter should be set to >>>> >>>> Trypanosoma brucei[ORGN] >>>> >>>> I'm guess that it worked because the database ignored the strange >>>> parameter, >>>> and returned all the matches. Try this and if it doesn't work I look >>>> harder. >>>> cheers, >>>> Mark >>>> >>>> ----- Original Message ----- >>>> *From:* Roopa Raghuveer >>>> *To:* Mark A. Jensen >>>> *Sent:* Saturday, January 09, 2010 1:24 PM >>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >>>> >>>> hello Mark, >>>> >>>> Thanks for your reply.It was working without enclosing $organ[ORGN] in >>>> double quotations,but. I would like to have only those specific sequences >>>> which are specific for my Organism i.e., I need sequences only from the >>>> organism that I entered. >>>> >>>> When the organism is Trypanosoma brucei,I could get even Leishmania and >>>> other species as the similar sequences. But I want to get only trypanosoma >>>> brucei sequences. >>>> >>>> Could you please help me out in this regard? >>>> >>>> Roopa. >>>> >>>> My output >>>> >>>> I/P organism: Trypanosoma brucei >>>> >>>> O/P:- >>>> ref|XM_822292.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 1813 0.0 >>>> ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 1622 0.0 >>>> ref|XM_816530.1| Trypanosoma cruzi strain CL Brener protein k... 
>>>> 773 0.0 >>>> ref|XM_816527.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 749 0.0 >>>> ref|XM_838414.1| Leishmania major strain Friedlin protein kin... >>>> 551 3e-154 >>>> ref|XM_838409.1| Leishmania major strain Friedlin protein kin... >>>> 551 3e-154 >>>> ref|XM_001568451.1| Leishmania braziliensis MHOM/BR/75/M2904 ... >>>> 542 2e-151 >>>> ref|XM_001469171.1| Leishmania infantum protein kinase A cata... >>>> 538 2e-150 >>>> ref|XM_001469166.1| Leishmania infantum protein kinase A cata... >>>> 538 2e-150 >>>> ref|XM_001682462.1| Leishmania major protein kinase A catalyt... >>>> 196 3e-47 >>>> ref|XM_804361.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 190 1e-45 >>>> ref|XM_002065851.1| Drosophila willistoni GK20594 (Dwil\GK205... >>>> 181 7e-43 >>>> ref|XM_822694.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 179 2e-42 >>>> ref|XM_001563990.1| Leishmania braziliensis MHOM/BR/75/M2904 ... >>>> 178 8e-42 >>>> ref|XM_814844.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 170 1e-39 >>>> ref|XM_001763039.1| Physcomitrella patens subsp. patens predi... >>>> 168 4e-39 >>>> ref|XM_001464886.1| Leishmania infantum JPCM5 protein kinase ... >>>> 167 1e-38 >>>> ref|XM_001377302.1| PREDICTED: Monodelphis domestica similar ... >>>> 150 1e-33 >>>> ref|XM_001603485.1| PREDICTED: Nasonia vitripennis similar to... >>>> 145 5e-32 >>>> ref|XM_416852.2| PREDICTED: Gallus gallus protein kinase, X-l... >>>> 143 2e-31 >>>> ref|NM_001016403.2| Xenopus (Silurana) tropicalis protein kin... >>>> 143 2e-31 >>>> ref|XM_002009291.1| Drosophila mojavensis GI11297 (Dmoj\GI112... >>>> 138 7e-30 >>>> ref|NM_016979.1| Mus musculus protein kinase, X-linked (Prkx)... >>>> 138 7e-30 >>>> ref|XM_001495664.2| PREDICTED: Equus caballus similar to Seri... >>>> 136 2e-29 >>>> ref|XM_001111571.1| PREDICTED: Macaca mulatta cAMP-dependent ... >>>> 136 2e-29 >>>> ref|XM_001611655.1| Babesia bovis protein kinase domain conta... 
>>>> 134 9e-29 >>>> ref|NR_028062.1| Homo sapiens protein kinase, Y-linked (PRKY)... >>>> 132 3e-28 >>>> ref|XM_001517795.1| PREDICTED: Ornithorhynchus anatinus simil... >>>> 132 3e-28 >>>> ref|XM_685338.2| PREDICTED: Danio rerio similar to Serine/thr... >>>> 132 3e-28 >>>> ref|XM_002189865.1| PREDICTED: Taeniopygia guttata protein ki... >>>> 131 1e-27 >>>> ref|XM_001362299.1| PREDICTED: Monodelphis domestica similar ... >>>> 131 1e-27 >>>> ref|NM_001093198.1| Xenopus laevis protein kinase, cAMP-depen... >>>> 129 4e-27 >>>> ref|XM_001461322.1| Paramecium tetraurelia hypothetical prote... >>>> 129 4e-27 >>>> ref|NM_001099869.1| Xenopus laevis cAMP-dependent protein kin... >>>> 129 4e-27 >>>> >>>> Roopa. >>>> >>>> On Sat, Jan 9, 2010 at 7:05 PM, Mark A. Jensen wrote: >>>> >>>>> I see it immediately (from making same bug many times) : >>>>> >>>>> >>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY >>>>> => >>>>> - '$organ[ORGN]'); >>>>> +"$organ[ORGN]"); >>>>> >>>>> >>>>> MAJ >>>>> >>>>> ----- Original Message ----- From: "Roopa Raghuveer" < >>>>> rtbio.2009 at gmail.com> >>>>> To: "Mark A. Jensen" >>>>> Cc: >>>>> Sent: Saturday, January 09, 2010 11:57 AM >>>>> Subject: Re: [Bioperl-l] Regarding blast in Bioperl >>>>> >>>>> >>>>> >>>>> Hello all, >>>>>> >>>>>> Thanks alot for your reply Mark. It was working for Trypanosoma brucei >>>>>> as >>>>>> the organism parameter,but when I tried to use the Organism parameter >>>>>> from >>>>>> the user,it was not working i.e., I was unable to get the target >>>>>> sequences. >>>>>> Please help me in this regard. 
My code is >>>>>> >>>>>> #!/usr/bin/perl >>>>>> >>>>>> #path for extra camel module >>>>>> use lib "/srv/www/htdocs/rain/RNAi/"; >>>>>> use Roopablast; >>>>>> >>>>>> >>>>>> use Bio::SearchIO; >>>>>> use Bio::Search::Result::BlastResult; >>>>>> use Bio::Perl; >>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>> use Bio::Seq; >>>>>> use Bio::SeqIO; >>>>>> use Bio::DB::GenBank; >>>>>> >>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi"; >>>>>> $serverurl = "http://141.84.66.66/rain/RNAi"; >>>>>> $outfile = $serverpath."/rnairesult_".time().".html"; >>>>>> $nuc = $serverpath."/nuc".time().".txt"; >>>>>> $debugfile = $serverpath."/debug_".time().".txt"; >>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>> >>>>>> my $outstring =""; >>>>>> >>>>>> &parse_form; >>>>>> >>>>>> print "Content-type: text/html\n\n"; >>>>>> print "\n"; >>>>>> print "RNAi Result"; >>>>>> print ">>>>> URL=$serverurl/rnairesult_".time().".html\"> \n"; >>>>>> print "\n"; >>>>>> print "\n"; >>>>>> print " Your results will appear >>>>> href=$serverurl/rnairesult_".time().".html>here
"; >>>>>> print " Please be patient, runtime can be up to 5 minutes
"; >>>>>> print " This page will automatically reload in 30 seconds. Roopa"; >>>>>> print "\n"; >>>>>> print "\n"; >>>>>> >>>>>> defined(my $pid = fork) or die "Can't fork: $!"; >>>>>> exit if $pid; >>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >>>>>> >>>>>> open(OUTFILE, '>',$outfile); >>>>>> >>>>>> print OUTFILE "\n >>>>>> RNAi Result >>>>>> >>>>> URL=$serverurl//rnairesult_".time().".html\"> \n >>>>>> >>>>>> \n >>>>>> \n >>>>>> Your results will appear >>>>> href=$serverurl/rnairesult_".time().".html>here
>>>>>> Please be patient, runtime can be up to 5 minutes wait wait >>>>>> wait......
>>>>>> This page will automatically reload in 30 seconds Roopa
>>>>>> \n >>>>>> \n"; >>>>>> >>>>>> close(OUTFILE); >>>>>> >>>>>> >>>>>> @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); >>>>>> >>>>>> $in{'Inputseq'} =~ s/>.*$//m; >>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim; >>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/; >>>>>> >>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >>>>>> $in{'Threshold'}); >>>>>> >>>>>> >>>>>> sub blastcode >>>>>> { >>>>>> >>>>>> $inpu1= $_[0]; >>>>>> >>>>>> $organ= $_[1]; >>>>>> >>>>>> open(NUC,'>',$nuc); >>>>>> print NUC $inpu1,"\n"; >>>>>> close(NUC); >>>>>> >>>>>> my $prog = 'blastn'; >>>>>> my $db = 'refseq_rna'; >>>>>> my $e_val= '1e-10'; >>>>>> my $organism= $organ; >>>>>> >>>>>> $gb = new Bio::DB::GenBank; >>>>>> >>>>>> my @params = ( '-prog' => $prog, >>>>>> '-data' => $db, >>>>>> '-expect' => $e_val, >>>>>> '-readmethod' => 'SearchIO', >>>>>> '-Organism' => $organism ); >>>>>> >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> print OUTFILE $inpu1; >>>>>> close(OUTFILE); >>>>>> >>>>>> >>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY >>>>>> => >>>>>> '$organ[ORGN]'); >>>>>> >>>>>> #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>> >>>>>> #change a paramter >>>>>> >>>>>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>>>>> Brucei[ORGN]'; >>>>>> >>>>>> #change a paramter >>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>> '$input2[ORGN]'; >>>>>> >>>>>> my $v = 1; >>>>>> #$v is just to turn on and off the messages >>>>>> >>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >>>>>> '-organism' => $organ ); >>>>>> >>>>>> >>>>>> while (my $input = $str->next_seq()) >>>>>> { >>>>>> #Blast a sequence against a database: >>>>>> #Alternatively, you could pass in a file with many >>>>>> #sequences rather than loop through sequence one at a time >>>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>>>> #and swap the two lines below for an example of that. 
>>>>>> >>>>>> #open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE $input; >>>>>> #close(OUTFILE); >>>>>> >>>>>> >>>>>> my $r = $factory->submit_blast($input); >>>>>> >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE $r; >>>>>> close(OUTFILE); >>>>>> >>>>>> print STDERR "waiting...." if($v>0); >>>>>> >>>>>> while ( my @rids = $factory->each_rid ) { >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "while entered"; >>>>>> # close(OUTFILE); >>>>>> foreach my $rid ( @rids ) { >>>>>> >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "foreach entered"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>> >>>>>> if( !ref($rc) ) >>>>>> { >>>>>> if( $rc < 0 ) >>>>>> { >>>>>> $factory->remove_rid($rid); >>>>>> } >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "if entered"; >>>>>> close(OUTFILE); >>>>>> print STDERR "." if ( $v > 0 ); >>>>>> sleep 5; >>>>>> } >>>>>> else { >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "else entered"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $result = $rc->next_result(); >>>>>> #save the output >>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>> >>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>> print BLASTDEBUGFILE $result->next_hit(); >>>>>> close(BLASTDEBUGFILE); >>>>>> >>>>>> my $filename = >>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out"; >>>>>> >>>>>> # open(DEBUGFILE,'>',$debugfile); >>>>>> # open(new,'>',$filename); >>>>>> # @arra=; >>>>>> # print DEBUGFILE @arra; >>>>>> # close(DEBUGFILE); >>>>>> # close(new); >>>>>> >>>>>> $factory->save_output($filename); >>>>>> # open(BLASTDEBUGFILE,'>',$debugfile); >>>>>> # print BLASTDEBUGFILE "Hello $rid"; >>>>>> # close(BLASTDEBUGFILE); >>>>>> >>>>>> $factory->remove_rid($rid); >>>>>> >>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>> print BLASTDEBUGFILE $organism; >>>>>> close(BLASTDEBUGFILE); >>>>>> >>>>>> # open(OUTFILE,'>',$outfile); >>>>>> 
# print OUTFILE "Test2 $result->database_name()"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> #$hit = $result->next_hit; >>>>>> #open(new,'>',$debugfile); >>>>>> #print $hit; >>>>>> #close(new); >>>>>> >>>>>> while ( my $hit = $result->next_hit ) { >>>>>> >>>>>> next unless ( $v > 0); >>>>>> >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "$hit in while hits"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $sequ = $gb->get_Seq_by_version($hit->name); >>>>>> my $dna = $sequ->seq(); # get the sequence as a string >>>>>> push(@seqs,$dna); >>>>>> } >>>>>> } >>>>>> } >>>>>> } >>>>>> } >>>>>> >>>>>> #open(OUTFILE,'>',$debugfile); >>>>>> #print OUTFILE $seqs[0]; >>>>>> #close(OUTFILE); >>>>>> >>>>>> return(@seqs); >>>>>> >>>>>> } >>>>>> >>>>>> Regards, >>>>>> Roopa. >>>>>> >>>>>> >>>>>> On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen >>>>>> wrote: >>>>>> >>>>>> Hi Roopa-- >>>>>>> >>>>>>> I got your code to work with the following changes: >>>>>>> >>>>>>> +# the input should be a valid FASTA file... >>>>>>> ... >>>>>>> open(NUC,'>',$nuc); >>>>>>> +print NUC ">seq (need a name line for valid fasta)\n"; >>>>>>> print NUC $inpu1, "\n"; >>>>>>> close(NUC); >>>>>>> ... >>>>>>> >>>>>>> +# you can set these header parms in the call itself... >>>>>>> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, >>>>>>> -ENTREZ_QUERY => >>>>>>> ''Trypanosoma Brucei[ORGN]'); >>>>>>> >>>>>>> #change a paramter >>>>>>> +# commented this out... >>>>>>> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>>> 'Trypanosoma >>>>>>> Brucei[ORGN]'; >>>>>>> >>>>>>> MAJ >>>>>>> ----- Original Message ----- From: "Roopa Raghuveer" < >>>>>>> rtbio.2009 at gmail.com >>>>>>> > >>>>>>> To: >>>>>>> Sent: Friday, January 08, 2010 10:00 AM >>>>>>> Subject: [Bioperl-l] Regarding blast in Bioperl >>>>>>> >>>>>>> >>>>>>> Hello all, >>>>>>> >>>>>>>> >>>>>>>> I was trying Remote blast using Bioperl. 
My input data is a >>>>>>>> Trypanosoma >>>>>>>> brucei sequence in Fasta format. When I was trying to submit to >>>>>>>> BLAST >>>>>>>> using >>>>>>>> the step >>>>>>>> $r=$factory->submit_blast($input) >>>>>>>> It was not returning anything which I checked by debugging the code. >>>>>>>> It is >>>>>>>> not blasting my input sequence even though I mentioned all the >>>>>>>> parameters.I >>>>>>>> would paste the code below. >>>>>>>> >>>>>>>> Please help me in solving put this problem. It is very urgent. >>>>>>>> >>>>>>>> Regards >>>>>>>> Roopa. >>>>>>>> >>>>>>>> #!/usr/bin/perl >>>>>>>> >>>>>>>> #path for extra camel module >>>>>>>> use lib "/srv/www/htdocs/rain/RNAi/"; >>>>>>>> use Roopablast; >>>>>>>> >>>>>>>> >>>>>>>> use Bio::SearchIO; >>>>>>>> use Bio::Search::Result::BlastResult; >>>>>>>> use Bio::Perl; >>>>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>>>> use Bio::Seq; >>>>>>>> use Bio::SeqIO; >>>>>>>> use Bio::DB::GenBank; >>>>>>>> >>>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi"; >>>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi"; >>>>>>>> $outfile = $serverpath."/rnairesult_".time().".html"; >>>>>>>> $nuc = $serverpath."/nuc".time().".txt"; >>>>>>>> $debugfile = $serverpath."/debug_".time().".txt"; >>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>>>> >>>>>>>> my $outstring =""; >>>>>>>> >>>>>>>> &parse_form; >>>>>>>> >>>>>>>> print "Content-type: text/html\n\n"; >>>>>>>> print "\n"; >>>>>>>> print "RNAi Result"; >>>>>>>> print ">>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n"; >>>>>>>> print "\n"; >>>>>>>> print "\n"; >>>>>>>> print " Your results will appear >>>>>>> href=$serverurl/rnairesult_".time().".html>here
"; >>>>>>>> print " Please be patient, runtime can be up to 5 minutes
"; >>>>>>>> print " This page will automatically reload in 30 seconds. Roopa"; >>>>>>>> print "\n"; >>>>>>>> print "\n"; >>>>>>>> >>>>>>>> defined(my $pid = fork) or die "Can't fork: $!"; >>>>>>>> exit if $pid; >>>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >>>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >>>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> open(OUTFILE, '>',$outfile); >>>>>>>> >>>>>>>> print OUTFILE "\n >>>>>>>> RNAi Result >>>>>>>> >>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n >>>>>>>> >>>>>>>> \n >>>>>>>> \n >>>>>>>> Your results will appear >>>>>>> href=$serverurl/rnairesult_".time().".html>here
>>>>>>>> Please be patient, runtime can be up to 5 minutes wait wait >>>>>>>> wait......
>>>>>>>> This page will automatically reload in 30 seconds Roopa
>>>>>>>> \n >>>>>>>> \n"; >>>>>>>> >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> @compseqs = blastcode($in{'Inputseq'}); >>>>>>>> >>>>>>>> $in{'Inputseq'} =~ s/>.*$//m; >>>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim; >>>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/; >>>>>>>> >>>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >>>>>>>> $in{'Threshold'}); >>>>>>>> >>>>>>>> >>>>>>>> sub blastcode >>>>>>>> { >>>>>>>> >>>>>>>> $inpu1= $_[0]; >>>>>>>> >>>>>>>> #$organ= $_[1]; >>>>>>>> >>>>>>>> open(NUC,'>',$nuc); >>>>>>>> print NUC $inpu1; >>>>>>>> close(NUC); >>>>>>>> >>>>>>>> my $prog = 'blastn'; >>>>>>>> my $db = 'refseq_rna'; >>>>>>>> my $e_val= '1e-10'; >>>>>>>> my $organism= 'Trypanosoma Brucei'; >>>>>>>> >>>>>>>> $gb = new Bio::DB::GenBank; >>>>>>>> >>>>>>>> my @params = ( '-prog' => $prog, >>>>>>>> '-data' => $db, >>>>>>>> '-expect' => $e_val, >>>>>>>> '-readmethod' => 'SearchIO', >>>>>>>> '-Organism' => $organism ); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$debugfile); >>>>>>>> # print OUTFILE @params; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>>> >>>>>>>> #change a paramter >>>>>>>> >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>>>>>>> Brucei[ORGN]'; >>>>>>>> >>>>>>>> #change a paramter >>>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>>>> '$input2[ORGN]'; >>>>>>>> >>>>>>>> my $v = 1; >>>>>>>> #$v is just to turn on and off the messages >>>>>>>> >>>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >>>>>>>> '-organism' => 'Trypanosoma Brucei' ); >>>>>>>> >>>>>>>> >>>>>>>> while (my $input = $str->next_seq()) >>>>>>>> { >>>>>>>> #Blast a sequence against a database: >>>>>>>> #Alternatively, you could pass in a file with many >>>>>>>> #sequences rather than loop through sequence one at a time >>>>>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>>>>>> #and swap the two lines below 
for an example of that. >>>>>>>> >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE $input; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> my $r = $factory->submit_blast($input); #The program stops here >>>>>>>> it >>>>>>>> does not return any value and it does not enter the While >>>>>>>> loop,Please help >>>>>>>> me in this regard.# >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE $r; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> print STDERR "waiting...." if($v>0); >>>>>>>> >>>>>>>> while ( my @rids = $factory->each_rid ) { >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "while entered"; >>>>>>>> close(OUTFILE); >>>>>>>> foreach my $rid ( @rids ) { >>>>>>>> >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "foreach entered"; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>>>> >>>>>>>> if( !ref($rc) ) >>>>>>>> { >>>>>>>> if( $rc < 0 ) >>>>>>>> { >>>>>>>> $factory->remove_rid($rid); >>>>>>>> } >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "if entered"; >>>>>>>> close(OUTFILE); >>>>>>>> print STDERR "." 
if ( $v > 0 ); >>>>>>>> sleep 5; >>>>>>>> } >>>>>>>> else { >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "else entered"; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> my $result = $rc->next_result(); >>>>>>>> #save the output >>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>>>> >>>>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>>>> print BLASTDEBUGFILE $result->next_hit(); >>>>>>>> close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> my $filename = >>>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out"; >>>>>>>> >>>>>>>> # open(DEBUGFILE,'>',$debugfile); >>>>>>>> # open(new,'>',$filename); >>>>>>>> # @arra=; >>>>>>>> # print DEBUGFILE @arra; >>>>>>>> # close(DEBUGFILE); >>>>>>>> # close(new); >>>>>>>> >>>>>>>> $factory->save_output($filename); >>>>>>>> >>>>>>>> # open(BLASTDEBUGFILE,'>',$debugfile); >>>>>>>> # print BLASTDEBUGFILE "Hello $rid"; >>>>>>>> # close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> $factory->remove_rid($rid); >>>>>>>> >>>>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>>>> print BLASTDEBUGFILE $organism; >>>>>>>> close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$outfile); >>>>>>>> # print OUTFILE "Test2 $result->database_name()"; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> #$hit = $result->next_hit; >>>>>>>> #open(new,'>',$debugfile); >>>>>>>> #print $hit; >>>>>>>> #close(new); >>>>>>>> >>>>>>>> while ( my $hit = $result->next_hit ) { >>>>>>>> >>>>>>>> next unless ( $v > 0); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$debugfile); >>>>>>>> # print OUTFILE "$hit in while hits"; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> my $sequ = $gb->get_Seq_by_version($hit->name); >>>>>>>> my $dna = $sequ->seq(); # get the sequence as a >>>>>>>> string >>>>>>>> push(@seqs,$dna); >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> #open(OUTFILE,'>',$debugfile); >>>>>>>> #print OUTFILE $seqs[0]; >>>>>>>> #close(OUTFILE); >>>>>>>> >>>>>>>> return(@seqs); >>>>>>>> >>>>>>>> } >>>>>>>> 
>>>>>>>> open(OUTFILE, '>',$outfile) || die ; >>>>>>>> >>>>>>>> print OUTFILE "\n >>>>>>>> RNAi Result >>>>>>>> \n >>>>>>>> \n >>>>>>>>

>>>>>>>> Inputsequence:
"; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> print OUTFILE substr ($in{'Inputseq'}, $i, 1); >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> print OUTFILE " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> print OUTFILE "
\n"; >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> print OUTFILE "

"; >>>>>>>> >>>>>>>> $z=@compseqs; >>>>>>>> >>>>>>>> for($k=1;$k<$z;$k++) { >>>>>>>> print OUTFILE ">>>>>>> set\">

Compare >>>>>>>> Sequence:
"; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> print OUTFILE substr ($compseqs[$k], $i, 1); >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> print OUTFILE " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> print OUTFILE "
\n"; >>>>>>>> } >>>>>>>> } >>>>>>>> print OUTFILE "

"; >>>>>>>> } >>>>>>>> >>>>>>>> print OUTFILE "

>>>>>>>> Window:
$in{'Windowsize'} >>>>>>>>

>>>>>>>>

>>>>>>>> Threshold:
$in{'Threshold'} >>>>>>>>

"; >>>>>>>> my $j=0; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){ >>>>>>>> if ($out[$i]->{similar}<=$in{'Threshold'}){ >>>>>>>> $j=$in{'Windowsize'}; >>>>>>>> } >>>>>>>> $height=$out[$i]->{similar}*5; >>>>>>>> } >>>>>>>> >>>>>>>> if ($j>0) { >>>>>>>> print OUTFILE ">>>>>>> height=\"5\">"; >>>>>>>> $outstring .= "".substr ($in{'Inputseq'}, >>>>>>>> $i, >>>>>>>> 1).""; >>>>>>>> $j--; >>>>>>>> } >>>>>>>> else { >>>>>>>> print OUTFILE ">>>>>>> height=\"5\">"; >>>>>>>> $outstring .= "".substr ($in{'Inputseq'}, >>>>>>>> $i, >>>>>>>> 1).""; >>>>>>>> } >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> $outstring .= " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> $outstring .= "
\n"; >>>>>>>> >>>>>>>> } >>>>>>>> if ( ($i+1)%800==0){ >>>>>>>> print OUTFILE "

\n"; >>>>>>>> >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> print OUTFILE "

>>>>>>> set\">$outstring"; >>>>>>>> >>>>>>>> #foreach (@out) { >>>>>>>> #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} >>>>>>>> matchs

"; >>>>>>>> #if ($_->{similar}<=$in{'Threshold'}){ >>>>>>>> >>>>>>>> # } >>>>>>>> #} >>>>>>>> >>>>>>>> print OUTFILE "\n\n"; >>>>>>>> >>>>>>>> close OUTFILE; >>>>>>>> >>>>>>>> #nameprint(); >>>>>>>> >>>>>>>> sub parse_form { >>>>>>>> local ($buffer, @pairs, $pair, $name, $value); >>>>>>>> # Read in text >>>>>>>> $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/; >>>>>>>> if ($ENV{'REQUEST_METHOD'} eq "POST") >>>>>>>> { >>>>>>>> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); >>>>>>>> } >>>>>>>> else >>>>>>>> { >>>>>>>> $buffer = $ENV{'QUERY_STRING'}; >>>>>>>> } >>>>>>>> @pairs = split(/&/, $buffer); >>>>>>>> foreach $pair (@pairs) >>>>>>>> { >>>>>>>> ($name, $value) = split(/=/, $pair); >>>>>>>> $value =~ tr/+/ /; >>>>>>>> $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; >>>>>>>> $in{$name} = $value; >>>>>>>> } >>>>>>>> } >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>> >>> >> > From bernd.web at gmail.com Thu Jan 21 13:37:18 2010 From: bernd.web at gmail.com (Bernd Web) Date: Thu, 21 Jan 2010 19:37:18 +0100 Subject: [Bioperl-l] Regarding blast in Bioperl In-Reply-To: References: <9D8A1428463C4D5E9C416521C35E254C@NewLife> <196889DF87964224ACDB948681BA7F86@NewLife> Message-ID: <716af09c1001211037p59b19a29l1967f1e514469e79@mail.gmail.com> Hi, Regarding RemoteBlast, my I add a query? It seems that Bio::Tools::Run::RemoteBlast is sending each sequence seperately to the NCBI (at least in BP 1.5.2). This means that for each Sequence a RID is to be checked. Is this indeed the case? The BLAST URL-API or batch interface supports sending multiple sequences at once. 

Regards,
Bernd

On Thu, Jan 21, 2010 at 7:28 PM, Roopa Raghuveer wrote:
> Hello Mark,
>
> This is Roopa again. I have a small problem again. I am working on Remote
> blast. The program works well. But the problem is this. The program
> accesses the server and gets the output correctly. I am trying to send the
> result sequences into an array and I found that always the first sequence
> among the Result sequences is missing. The code is
>
> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
> '-organism' => "$organ\[ORGN]");

From cjfields at illinois.edu Thu Jan 21 23:31:25 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Jan 2010 22:31:25 -0600
Subject: [Bioperl-l] Bio::BroodComb - RFC
In-Reply-To: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>
References: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>
Message-ID:

Jay,

Did you want to release it to CPAN? I'll take a closer look at the docs to get an idea of what you are doing with it, but from my perspective I can see this becoming a nice general use tool akin to Bio::Perl, maybe a bit more lightweight.

chris

On Jan 18, 2010, at 6:22 PM, Jay Hannah wrote:

> I formalized a little framework so I could stop re-writing little programs that do some things people frequently ask me to do.
>
> http://github.com/jhannah/bio-broodcomb
>
> It stores everything in SQLite so users can write custom reports if they want to. It uses BioPerl and some shiny CPAN beads (DBIx::Class, Moose::Role). Tests included.
> > The first two functions I stuck in the framework: > > Find subsequences (Bio::BroodComb::SubSeq): > > use Bio::BroodComb; > my $bc = Bio::BroodComb->new(); > $bc->load_large_seq(file => "large_seq.fasta"); > $bc->load_small_seq(file => "small_seq.fasta"); > $bc->find_subseqs(); > print $bc->subseq_report1; > > In-silico PCR (Bio::BroodComb::PCR): > > use Bio::BroodComb; > my $bc = Bio::BroodComb->new(); > $bc->load_large_seq(file => "large_seq.fasta"); > $bc->add_primerset( > description => "U5/R", # however you want it reported > forward_primer => 'GCGGGCAGCAATACTGCTTTGTAA', > reverse_primer => 'ACCAGCGTTCAGCATATGGAGGAT', > ); > $bc->find_pcr_hits(); > $bc->find_pcr_products(); > print $bc->pcr_report1; > > I find this rather handy, so will probably be adding all my applicable future work to it instead of writing stand-alone programs. Not sure if it should be renamed for eventual CPAN / wherever. > > Suggestions, contributions welcome. :) > > http://github.com/jhannah/bio-broodcomb > > Jay Hannah > http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Fri Jan 22 01:17:14 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 21 Jan 2010 22:17:14 -0800 Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO Message-ID: I'm considering putting in an allowable initialization parameter (and get/set) for Bio::AlignIO that would allow setting of the alphabet. This is then passed to Bio::LocatableSeq creation so that _guess_alphabet isn't called. This will allow removal of warnings about empty sequences because _guess_alphabet won't be called on a sequence if we have explicitly set the alphabet. This worked great on my local install and tests pass. Any objections or concerns? Basically it means when you make an AlignIO you can specify the alphabet, i.e.
my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna', -file => 'genome.fasaln'); I have some alignments with empty sequences and I think turning off the warnings is appropriate where I force the alphabet choice. It should have a very modest speedup benefit too. -jason -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ http://twitter.com/hyphaltip From rtbio.2009 at gmail.com Fri Jan 22 04:54:32 2010 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Fri, 22 Jan 2010 10:54:32 +0100 Subject: [Bioperl-l] Fwd: Regarding blast in Bioperl In-Reply-To: References: <9D8A1428463C4D5E9C416521C35E254C@NewLife> <196889DF87964224ACDB948681BA7F86@NewLife> Message-ID: ---------- Forwarded message ---------- From: Roopa Raghuveer Date: Thu, Jan 21, 2010 at 7:28 PM Subject: Re: [Bioperl-l] Regarding blast in Bioperl To: "Mark A. Jensen" Cc: bioperl-l at lists.open-bio.org Hello Mark, This is Roopa again. I have a small problem again. I am working on Remote blast. The program works well. But the problem is this. The program accesses the server and gets the output correctly. I am trying to send the result sequences into an array and I found that always the first sequence among the Result sequences is missing. The code is my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]"); while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. open(OUTFILE,'>',$debugfile); print OUTFILE $input; close(OUTFILE); my $r = $factory->submit_blast($input); open(OUTFILE,'>',$debugfile); # print OUTFILE $r; close(OUTFILE); print STDERR "waiting...."
if($v>0); while ( my @rids = $factory->each_rid ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "while entered"; close(OUTFILE); foreach my $rid ( @rids ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "foreach entered"; close(OUTFILE); my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } open(OUTFILE,'>',$debugfile); # print OUTFILE "if entered"; close(OUTFILE); print STDERR "." if ( $v > 0 ); sleep 5; } else { open(OUTFILE,'>',$debugfile); # print OUTFILE "else entered"; close(OUTFILE); my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $result->next_hit(); close(BLASTDEBUGFILE); my $filename = $serverpath."/blastdata_".time()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $organism; close(BLASTDEBUGFILE); # open(OUTFILE,'>',$outfile); # print OUTFILE "Test2 $result->database_name()"; # close(OUTFILE); #$hit = $result->next_hit; #open(new,'>',$debugfile); #print $hit; #close(new); $dummy=0; while ( my $hit = $result->next_hit ) { next unless ( $v >= 0); # open(OUTFILE,'>',$debugfile); # print OUTFILE "$hit in while hits"; # close(OUTFILE); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string $dummy++; open(OUTFILE,'>',$debugfile); # print OUTFILE $dummy; close(OUTFILE); push(@seqs,$dna); } } } } } $warum=@seqs; open(OUTFILE,'>',$debugfile); # print OUTFILE $warum; print OUTFILE @seqs; close(OUTFILE); return(@seqs); } open(OUTFILE, '>',$outfile) || die ; print OUTFILE "\n RNAi Result \n \n

Inputsequence:
"; Here in the above code, I was trying to debug the code and trying to get the count of the array and even the sequence. But when the output data was giving 1 sequence, the count of the array was 0 and when I tried to print the output sequence I could not get any. It was the same when the no of output sequences was 3, I tried to print the sequences but was getting the count of the array as 2 and was printing only two sequences. Please help me in sorting out this problem. Regards, Roopa. On Sun, Jan 10, 2010 at 1:03 AM, Mark A. Jensen wrote: > Excellent Roopa- it's my pleasure-- MAJ > > ----- Original Message ----- > *From:* Roopa Raghuveer > *To:* Mark A. Jensen > *Sent:* Saturday, January 09, 2010 6:41 PM > *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl > > Hi Mark, > > Thank you very very much. The code is working now. Thanks for the support > and time you have spent on me. > > Thanks in advance > Roopa. > > On Sat, Jan 9, 2010 at 10:56 PM, Mark A. Jensen wrote: > >> There is still a bug with the double quotes. Use "$organ\[ORGN]", which >> prevents perl from >> looking for a member of an array called @organ. This would have shown up >> if 'use strict;' had >> been in place. Still don't know whether this would work precisely; can you >> send me the query >> sequence so I can reproduce your ouput? >> thanks MAJ >> >> ----- Original Message ----- >> *From:* Roopa Raghuveer >> *To:* Mark A. Jensen >> *Sent:* Saturday, January 09, 2010 2:02 PM >> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >> >> Hi Mark, >> >> I tried it with double quotes but still i got the same o/p with sequences >> from different species. >> >> ref|XM_822292.1| Trypanosoma brucei TREU927 protein kinase A ... 1813 >> 0.0 >> ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A ... 1622 >> 0.0 >> ref|XM_816530.1| Trypanosoma cruzi strain CL Brener protein k... 773 >> 0.0 >> ref|XM_816527.1| Trypanosoma cruzi strain CL Brener protein k... 
749 >> 0.0 >> ref|XM_838414.1| Leishmania major strain Friedlin protein kin... 551 >> 3e-154 >> ref|XM_838409.1| Leishmania major strain Friedlin protein kin... 551 >> 3e-154 >> ref|XM_001568451.1| Leishmania braziliensis MHOM/BR/75/M2904 ... 542 >> 2e-151 >> ref|XM_001469171.1| Leishmania infantum protein kinase A cata... 538 >> 2e-150 >> ref|XM_001469166.1| Leishmania infantum protein kinase A cata... 538 >> 2e-150 >> ref|XM_001682462.1| Leishmania major protein kinase A catalyt... 196 >> 3e-47 >> ref|XM_804361.1| Trypanosoma cruzi strain CL Brener protein k... 190 >> 1e-45 >> ref|XM_002065851.1| Drosophila willistoni GK20594 (Dwil\GK205... 181 >> 7e-43 >> ref|XM_822694.1| Trypanosoma brucei TREU927 protein kinase A ... 179 >> 2e-42 >> ref|XM_001563990.1| Leishmania braziliensis MHOM/BR/75/M2904 ... 178 >> 8e-42 >> ref|XM_814844.1| Trypanosoma cruzi strain CL Brener protein k... 170 >> 1e-39 >> ref|XM_001763039.1| Physcomitrella patens subsp. patens predi... 169 >> 4e-39 >> ref|XM_001464886.1| Leishmania infantum JPCM5 protein kinase ... 167 >> 1e-38 >> ref|XM_001377302.1| PREDICTED: Monodelphis domestica similar ... 150 >> 1e-33 >> ref|XM_001603485.1| PREDICTED: Nasonia vitripennis similar to... 145 >> 5e-32 >> ref|XM_416852.2| PREDICTED: Gallus gallus protein kinase, X-l... 143 >> 2e-31 >> ref|NM_001016403.2| Xenopus (Silurana) tropicalis protein kin... 143 >> 2e-31 >> ref|XM_002009291.1| Drosophila mojavensis GI11297 (Dmoj\GI112... 138 >> 7e-30 >> ref|NM_016979.1| Mus musculus protein kinase, X-linked (Prkx)... 138 >> 7e-30 >> ref|XM_001495664.2| PREDICTED: Equus caballus similar to Seri... 136 >> 2e-29 >> ref|XM_001111571.1| PREDICTED: Macaca mulatta cAMP-dependent ... 136 >> 2e-29 >> ref|XM_001611655.1| Babesia bovis protein kinase domain conta... 134 >> 9e-29 >> ref|NR_028062.1| Homo sapiens protein kinase, Y-linked (PRKY)... 132 >> 3e-28 >> ref|XM_001517795.1| PREDICTED: Ornithorhynchus anatinus simil... 
132 >> 3e-28 >> ref|XM_685338.2| PREDICTED: Danio rerio similar to Serine/thr... 132 >> 3e-28 >> ref|XM_002189865.1| PREDICTED: Taeniopygia guttata protein ki... 131 >> 1e-27 >> ref|XM_001362299.1| PREDICTED: Monodelphis domestica similar ... 131 >> 1e-27 >> ref|NM_001093198.1| Xenopus laevis protein kinase, cAMP-depen... 129 >> 4e-27 >> ref|XM_001461322.1| Paramecium tetraurelia hypothetical prote... 129 >> 4e-27 >> ref|NM_001099869.1| Xenopus laevis cAMP-dependent protein kin... 129 >> 4e-27 >> ref|NM_001003470.1| Danio rerio protein kinase, cAMP-dependen... 129 >> 4e-27 >> ref|XM_001141503.1| PREDICTED: Pan troglodytes verus protein ... 127 >> 1e-26 >> ref|XM_001145269.1| PREDICTED: Pan troglodytes protein kinase... 127 >> 1e-26 >> ref|XM_512434.2| PREDICTED: Pan troglodytes cAMP-dependent pr... 127 >> 1e-26 >> ref|XM_001171457.1| PREDICTED: Pan troglodytes cAMP-dependent... 127 >> 1e-26 >> ref|XM_001171437.1| PREDICTED: Pan troglodytes cAMP-dependent... 127 >> 1e-26 >> ref|XM_847420.1| PREDICTED: Canis familiaris similar to Serin... 127 >> 1e-26 >> ref|NM_207518.1| Homo sapiens protein kinase, cAMP-dependent,... 127 >> 1e-26 >> ref|NM_002730.3| Homo sapiens protein kinase, cAMP-dependent,... 127 >> 1e-26 >> >> >> Thanks in advance. >> >> Roopa. >> >> On Sat, Jan 9, 2010 at 7:46 PM, Mark A. Jensen wrote: >> >>> I understand you. Put in the double quotes and see what happens. >>> >>> ----- Original Message ----- >>> *From:* Roopa Raghuveer >>> *To:* Mark A. Jensen >>> *Sent:* Saturday, January 09, 2010 1:40 PM >>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >>> >>> Hi Mark, >>> >>> Thanks for your reply. It was working when I specifically use the name of >>> the organism as Trypanosoma brucei in the code,but my idea is to introduce a >>> $organ which takes the organism given by the user i.e., let it be anything >>> >>> Pseudomonas, Drosophila, Trypanosoma, Leishmania etc., I should get the >>> sequences related to only those organisms. 
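The quoting question in this thread (how to get a user-supplied organism into the ENTREZ_QUERY filter) can be tried without contacting NCBI at all. Below is a minimal sketch, assuming the organism name arrives in a scalar called $organ as in the script in this thread. The helper name entrez_organism_filter is hypothetical and is not part of BioPerl; it only builds the string, it does not run BLAST.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): build the Entrez filter
# string that the thread passes as -ENTREZ_QUERY.
sub entrez_organism_filter {
    my ($organ) = @_;
    # The backslash stops Perl from reading "[ORGN]" as a subscript on an
    # array named @organ; only the scalar $organ is interpolated.
    return "$organ\[ORGN]";
}

my $organ = 'Trypanosoma brucei';

# Single quotes never interpolate, so this holds the literal text $organ[ORGN].
my $literal = '$organ[ORGN]';

my $filter = entrez_organism_filter($organ);

print "single-quoted: $literal\n";
print "escaped:       $filter\n";
```

With 'use strict;' in place, the unescaped double-quoted form "$organ[ORGN]" should fail at compile time because no array @organ is declared, which is one reason to keep strict switched on in a script like this.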
>>> >>> i.e., If the user enters Pseudomonas,the $organ parameter of the code >>> takes Pseudomonas ,does BLAST and returns only those sequences that produce >>> significant alignment with Pseudomonas(only).But this is not happening like >>> that . >>> >>> Please help me in this regard. >>> >>> Thanks in advance >>> Roopa >>> >>> On Sat, Jan 9, 2010 at 7:29 PM, Mark A. Jensen wrote: >>> >>>> Hi Roopa-- You may get what you want if you make the change. >>>> With single quotes, ENTREZ_QUERY is set to the literal string >>>> >>>> $organ[ORGN] >>>> >>>> while, with double quotes, the variable value will be substituted, >>>> and the parameter should be set to >>>> >>>> Trypanosoma brucei[ORGN] >>>> >>>> I'm guess that it worked because the database ignored the strange >>>> parameter, >>>> and returned all the matches. Try this and if it doesn't work I look >>>> harder. >>>> cheers, >>>> Mark >>>> >>>> ----- Original Message ----- >>>> *From:* Roopa Raghuveer >>>> *To:* Mark A. Jensen >>>> *Sent:* Saturday, January 09, 2010 1:24 PM >>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >>>> >>>> hello Mark, >>>> >>>> Thanks for your reply.It was working without enclosing $organ[ORGN] in >>>> double quotations,but. I would like to have only those specific sequences >>>> which are specific for my Organism i.e., I need sequences only from the >>>> organism that I entered. >>>> >>>> When the organism is Trypanosoma brucei,I could get even Leishmania and >>>> other species as the similar sequences. But I want to get only trypanosoma >>>> brucei sequences. >>>> >>>> Could you please help me out in this regard? >>>> >>>> Roopa. >>>> >>>> My output >>>> >>>> I/P organism: Trypanosoma brucei >>>> >>>> O/P:- >>>> ref|XM_822292.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 1813 0.0 >>>> ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 1622 0.0 >>>> ref|XM_816530.1| Trypanosoma cruzi strain CL Brener protein k... 
>>>> 773 0.0 >>>> ref|XM_816527.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 749 0.0 >>>> ref|XM_838414.1| Leishmania major strain Friedlin protein kin... >>>> 551 3e-154 >>>> ref|XM_838409.1| Leishmania major strain Friedlin protein kin... >>>> 551 3e-154 >>>> ref|XM_001568451.1| Leishmania braziliensis MHOM/BR/75/M2904 ... >>>> 542 2e-151 >>>> ref|XM_001469171.1| Leishmania infantum protein kinase A cata... >>>> 538 2e-150 >>>> ref|XM_001469166.1| Leishmania infantum protein kinase A cata... >>>> 538 2e-150 >>>> ref|XM_001682462.1| Leishmania major protein kinase A catalyt... >>>> 196 3e-47 >>>> ref|XM_804361.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 190 1e-45 >>>> ref|XM_002065851.1| Drosophila willistoni GK20594 (Dwil\GK205... >>>> 181 7e-43 >>>> ref|XM_822694.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 179 2e-42 >>>> ref|XM_001563990.1| Leishmania braziliensis MHOM/BR/75/M2904 ... >>>> 178 8e-42 >>>> ref|XM_814844.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 170 1e-39 >>>> ref|XM_001763039.1| Physcomitrella patens subsp. patens predi... >>>> 168 4e-39 >>>> ref|XM_001464886.1| Leishmania infantum JPCM5 protein kinase ... >>>> 167 1e-38 >>>> ref|XM_001377302.1| PREDICTED: Monodelphis domestica similar ... >>>> 150 1e-33 >>>> ref|XM_001603485.1| PREDICTED: Nasonia vitripennis similar to... >>>> 145 5e-32 >>>> ref|XM_416852.2| PREDICTED: Gallus gallus protein kinase, X-l... >>>> 143 2e-31 >>>> ref|NM_001016403.2| Xenopus (Silurana) tropicalis protein kin... >>>> 143 2e-31 >>>> ref|XM_002009291.1| Drosophila mojavensis GI11297 (Dmoj\GI112... >>>> 138 7e-30 >>>> ref|NM_016979.1| Mus musculus protein kinase, X-linked (Prkx)... >>>> 138 7e-30 >>>> ref|XM_001495664.2| PREDICTED: Equus caballus similar to Seri... >>>> 136 2e-29 >>>> ref|XM_001111571.1| PREDICTED: Macaca mulatta cAMP-dependent ... >>>> 136 2e-29 >>>> ref|XM_001611655.1| Babesia bovis protein kinase domain conta... 
>>>> 134 9e-29 >>>> ref|NR_028062.1| Homo sapiens protein kinase, Y-linked (PRKY)... >>>> 132 3e-28 >>>> ref|XM_001517795.1| PREDICTED: Ornithorhynchus anatinus simil... >>>> 132 3e-28 >>>> ref|XM_685338.2| PREDICTED: Danio rerio similar to Serine/thr... >>>> 132 3e-28 >>>> ref|XM_002189865.1| PREDICTED: Taeniopygia guttata protein ki... >>>> 131 1e-27 >>>> ref|XM_001362299.1| PREDICTED: Monodelphis domestica similar ... >>>> 131 1e-27 >>>> ref|NM_001093198.1| Xenopus laevis protein kinase, cAMP-depen... >>>> 129 4e-27 >>>> ref|XM_001461322.1| Paramecium tetraurelia hypothetical prote... >>>> 129 4e-27 >>>> ref|NM_001099869.1| Xenopus laevis cAMP-dependent protein kin... >>>> 129 4e-27 >>>> >>>> Roopa. >>>> >>>> On Sat, Jan 9, 2010 at 7:05 PM, Mark A. Jensen wrote: >>>> >>>>> I see it immediately (from making same bug many times) : >>>>> >>>>> >>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY >>>>> => >>>>> - '$organ[ORGN]'); >>>>> +"$organ[ORGN]"); >>>>> >>>>> >>>>> MAJ >>>>> >>>>> ----- Original Message ----- From: "Roopa Raghuveer" < >>>>> rtbio.2009 at gmail.com> >>>>> To: "Mark A. Jensen" >>>>> Cc: >>>>> Sent: Saturday, January 09, 2010 11:57 AM >>>>> Subject: Re: [Bioperl-l] Regarding blast in Bioperl >>>>> >>>>> >>>>> >>>>> Hello all, >>>>>> >>>>>> Thanks alot for your reply Mark. It was working for Trypanosoma brucei >>>>>> as >>>>>> the organism parameter,but when I tried to use the Organism parameter >>>>>> from >>>>>> the user,it was not working i.e., I was unable to get the target >>>>>> sequences. >>>>>> Please help me in this regard. 
My code is >>>>>> >>>>>> #!/usr/bin/perl >>>>>> >>>>>> #path for extra camel module >>>>>> use lib "/srv/www/htdocs/rain/RNAi/"; >>>>>> use Roopablast; >>>>>> >>>>>> >>>>>> use Bio::SearchIO; >>>>>> use Bio::Search::Result::BlastResult; >>>>>> use Bio::Perl; >>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>> use Bio::Seq; >>>>>> use Bio::SeqIO; >>>>>> use Bio::DB::GenBank; >>>>>> >>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi"; >>>>>> $serverurl = "http://141.84.66.66/rain/RNAi"; >>>>>> $outfile = $serverpath."/rnairesult_".time().".html"; >>>>>> $nuc = $serverpath."/nuc".time().".txt"; >>>>>> $debugfile = $serverpath."/debug_".time().".txt"; >>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>> >>>>>> my $outstring =""; >>>>>> >>>>>> &parse_form; >>>>>> >>>>>> print "Content-type: text/html\n\n"; >>>>>> print "\n"; >>>>>> print "RNAi Result"; >>>>>> print ">>>>> URL=$serverurl/rnairesult_".time().".html\"> \n"; >>>>>> print "\n"; >>>>>> print "\n"; >>>>>> print " Your results will appear >>>>> href=$serverurl/rnairesult_".time().".html>here
"; >>>>>> print " Please be patient, runtime can be up to 5 minutes
"; >>>>>> print " This page will automatically reload in 30 seconds. Roopa"; >>>>>> print "\n"; >>>>>> print "\n"; >>>>>> >>>>>> defined(my $pid = fork) or die "Can't fork: $!"; >>>>>> exit if $pid; >>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >>>>>> >>>>>> open(OUTFILE, '>',$outfile); >>>>>> >>>>>> print OUTFILE "\n >>>>>> RNAi Result >>>>>> >>>>> URL=$serverurl//rnairesult_".time().".html\"> \n >>>>>> >>>>>> \n >>>>>> \n >>>>>> Your results will appear >>>>> href=$serverurl/rnairesult_".time().".html>here
>>>>>> Please be patient, runtime can be up to 5 minutes wait wait >>>>>> wait......
>>>>>> This page will automatically reload in 30 seconds Roopa
>>>>>> \n >>>>>> \n"; >>>>>> >>>>>> close(OUTFILE); >>>>>> >>>>>> >>>>>> @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); >>>>>> >>>>>> $in{'Inputseq'} =~ s/>.*$//m; >>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim; >>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/; >>>>>> >>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >>>>>> $in{'Threshold'}); >>>>>> >>>>>> >>>>>> sub blastcode >>>>>> { >>>>>> >>>>>> $inpu1= $_[0]; >>>>>> >>>>>> $organ= $_[1]; >>>>>> >>>>>> open(NUC,'>',$nuc); >>>>>> print NUC $inpu1,"\n"; >>>>>> close(NUC); >>>>>> >>>>>> my $prog = 'blastn'; >>>>>> my $db = 'refseq_rna'; >>>>>> my $e_val= '1e-10'; >>>>>> my $organism= $organ; >>>>>> >>>>>> $gb = new Bio::DB::GenBank; >>>>>> >>>>>> my @params = ( '-prog' => $prog, >>>>>> '-data' => $db, >>>>>> '-expect' => $e_val, >>>>>> '-readmethod' => 'SearchIO', >>>>>> '-Organism' => $organism ); >>>>>> >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> print OUTFILE $inpu1; >>>>>> close(OUTFILE); >>>>>> >>>>>> >>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY >>>>>> => >>>>>> '$organ[ORGN]'); >>>>>> >>>>>> #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>> >>>>>> #change a paramter >>>>>> >>>>>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>>>>> Brucei[ORGN]'; >>>>>> >>>>>> #change a paramter >>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>> '$input2[ORGN]'; >>>>>> >>>>>> my $v = 1; >>>>>> #$v is just to turn on and off the messages >>>>>> >>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >>>>>> '-organism' => $organ ); >>>>>> >>>>>> >>>>>> while (my $input = $str->next_seq()) >>>>>> { >>>>>> #Blast a sequence against a database: >>>>>> #Alternatively, you could pass in a file with many >>>>>> #sequences rather than loop through sequence one at a time >>>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>>>> #and swap the two lines below for an example of that. 
>>>>>> >>>>>> #open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE $input; >>>>>> #close(OUTFILE); >>>>>> >>>>>> >>>>>> my $r = $factory->submit_blast($input); >>>>>> >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE $r; >>>>>> close(OUTFILE); >>>>>> >>>>>> print STDERR "waiting...." if($v>0); >>>>>> >>>>>> while ( my @rids = $factory->each_rid ) { >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "while entered"; >>>>>> # close(OUTFILE); >>>>>> foreach my $rid ( @rids ) { >>>>>> >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "foreach entered"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>> >>>>>> if( !ref($rc) ) >>>>>> { >>>>>> if( $rc < 0 ) >>>>>> { >>>>>> $factory->remove_rid($rid); >>>>>> } >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "if entered"; >>>>>> close(OUTFILE); >>>>>> print STDERR "." if ( $v > 0 ); >>>>>> sleep 5; >>>>>> } >>>>>> else { >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "else entered"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $result = $rc->next_result(); >>>>>> #save the output >>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>> >>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>> print BLASTDEBUGFILE $result->next_hit(); >>>>>> close(BLASTDEBUGFILE); >>>>>> >>>>>> my $filename = >>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out"; >>>>>> >>>>>> # open(DEBUGFILE,'>',$debugfile); >>>>>> # open(new,'>',$filename); >>>>>> # @arra=; >>>>>> # print DEBUGFILE @arra; >>>>>> # close(DEBUGFILE); >>>>>> # close(new); >>>>>> >>>>>> $factory->save_output($filename); >>>>>> # open(BLASTDEBUGFILE,'>',$debugfile); >>>>>> # print BLASTDEBUGFILE "Hello $rid"; >>>>>> # close(BLASTDEBUGFILE); >>>>>> >>>>>> $factory->remove_rid($rid); >>>>>> >>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>> print BLASTDEBUGFILE $organism; >>>>>> close(BLASTDEBUGFILE); >>>>>> >>>>>> # open(OUTFILE,'>',$outfile); >>>>>> 
# print OUTFILE "Test2 $result->database_name()"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> #$hit = $result->next_hit; >>>>>> #open(new,'>',$debugfile); >>>>>> #print $hit; >>>>>> #close(new); >>>>>> >>>>>> while ( my $hit = $result->next_hit ) { >>>>>> >>>>>> next unless ( $v > 0); >>>>>> >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "$hit in while hits"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $sequ = $gb->get_Seq_by_version($hit->name); >>>>>> my $dna = $sequ->seq(); # get the sequence as a string >>>>>> push(@seqs,$dna); >>>>>> } >>>>>> } >>>>>> } >>>>>> } >>>>>> } >>>>>> >>>>>> #open(OUTFILE,'>',$debugfile); >>>>>> #print OUTFILE $seqs[0]; >>>>>> #close(OUTFILE); >>>>>> >>>>>> return(@seqs); >>>>>> >>>>>> } >>>>>> >>>>>> Regards, >>>>>> Roopa. >>>>>> >>>>>> >>>>>> On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen >>>>>> wrote: >>>>>> >>>>>> Hi Roopa-- >>>>>>> >>>>>>> I got your code to work with the following changes: >>>>>>> >>>>>>> +# the input should be a valid FASTA file... >>>>>>> ... >>>>>>> open(NUC,'>',$nuc); >>>>>>> +print NUC ">seq (need a name line for valid fasta)\n"; >>>>>>> print NUC $inpu1, "\n"; >>>>>>> close(NUC); >>>>>>> ... >>>>>>> >>>>>>> +# you can set these header parms in the call itself... >>>>>>> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, >>>>>>> -ENTREZ_QUERY => >>>>>>> ''Trypanosoma Brucei[ORGN]'); >>>>>>> >>>>>>> #change a paramter >>>>>>> +# commented this out... >>>>>>> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>>> 'Trypanosoma >>>>>>> Brucei[ORGN]'; >>>>>>> >>>>>>> MAJ >>>>>>> ----- Original Message ----- From: "Roopa Raghuveer" < >>>>>>> rtbio.2009 at gmail.com >>>>>>> > >>>>>>> To: >>>>>>> Sent: Friday, January 08, 2010 10:00 AM >>>>>>> Subject: [Bioperl-l] Regarding blast in Bioperl >>>>>>> >>>>>>> >>>>>>> Hello all, >>>>>>> >>>>>>>> >>>>>>>> I was trying Remote blast using Bioperl. 
My input data is a >>>>>>>> Trypanosoma >>>>>>>> brucei sequence in Fasta format. When I was trying to submit to >>>>>>>> BLAST >>>>>>>> using >>>>>>>> the step >>>>>>>> $r=$factory->submit_blast($input) >>>>>>>> It was not returning anything which I checked by debugging the code. >>>>>>>> It is >>>>>>>> not blasting my input sequence even though I mentioned all the >>>>>>>> parameters.I >>>>>>>> would paste the code below. >>>>>>>> >>>>>>>> Please help me in solving put this problem. It is very urgent. >>>>>>>> >>>>>>>> Regards >>>>>>>> Roopa. >>>>>>>> >>>>>>>> #!/usr/bin/perl >>>>>>>> >>>>>>>> #path for extra camel module >>>>>>>> use lib "/srv/www/htdocs/rain/RNAi/"; >>>>>>>> use Roopablast; >>>>>>>> >>>>>>>> >>>>>>>> use Bio::SearchIO; >>>>>>>> use Bio::Search::Result::BlastResult; >>>>>>>> use Bio::Perl; >>>>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>>>> use Bio::Seq; >>>>>>>> use Bio::SeqIO; >>>>>>>> use Bio::DB::GenBank; >>>>>>>> >>>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi"; >>>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi"; >>>>>>>> $outfile = $serverpath."/rnairesult_".time().".html"; >>>>>>>> $nuc = $serverpath."/nuc".time().".txt"; >>>>>>>> $debugfile = $serverpath."/debug_".time().".txt"; >>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>>>> >>>>>>>> my $outstring =""; >>>>>>>> >>>>>>>> &parse_form; >>>>>>>> >>>>>>>> print "Content-type: text/html\n\n"; >>>>>>>> print "\n"; >>>>>>>> print "RNAi Result"; >>>>>>>> print ">>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n"; >>>>>>>> print "\n"; >>>>>>>> print "\n"; >>>>>>>> print " Your results will appear >>>>>>> href=$serverurl/rnairesult_".time().".html>here
"; >>>>>>>> print " Please be patient, runtime can be up to 5 minutes
"; >>>>>>>> print " This page will automatically reload in 30 seconds. Roopa"; >>>>>>>> print "\n"; >>>>>>>> print "\n"; >>>>>>>> >>>>>>>> defined(my $pid = fork) or die "Can't fork: $!"; >>>>>>>> exit if $pid; >>>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >>>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >>>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> open(OUTFILE, '>',$outfile); >>>>>>>> >>>>>>>> print OUTFILE "\n >>>>>>>> RNAi Result >>>>>>>> >>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n >>>>>>>> >>>>>>>> \n >>>>>>>> \n >>>>>>>> Your results will appear >>>>>>> href=$serverurl/rnairesult_".time().".html>here
>>>>>>>> Please be patient, runtime can be up to 5 minutes wait wait >>>>>>>> wait......
>>>>>>>> This page will automatically reload in 30 seconds Roopa
>>>>>>>> \n >>>>>>>> \n"; >>>>>>>> >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> @compseqs = blastcode($in{'Inputseq'}); >>>>>>>> >>>>>>>> $in{'Inputseq'} =~ s/>.*$//m; >>>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim; >>>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/; >>>>>>>> >>>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >>>>>>>> $in{'Threshold'}); >>>>>>>> >>>>>>>> >>>>>>>> sub blastcode >>>>>>>> { >>>>>>>> >>>>>>>> $inpu1= $_[0]; >>>>>>>> >>>>>>>> #$organ= $_[1]; >>>>>>>> >>>>>>>> open(NUC,'>',$nuc); >>>>>>>> print NUC $inpu1; >>>>>>>> close(NUC); >>>>>>>> >>>>>>>> my $prog = 'blastn'; >>>>>>>> my $db = 'refseq_rna'; >>>>>>>> my $e_val= '1e-10'; >>>>>>>> my $organism= 'Trypanosoma Brucei'; >>>>>>>> >>>>>>>> $gb = new Bio::DB::GenBank; >>>>>>>> >>>>>>>> my @params = ( '-prog' => $prog, >>>>>>>> '-data' => $db, >>>>>>>> '-expect' => $e_val, >>>>>>>> '-readmethod' => 'SearchIO', >>>>>>>> '-Organism' => $organism ); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$debugfile); >>>>>>>> # print OUTFILE @params; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>>> >>>>>>>> #change a paramter >>>>>>>> >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>>>>>>> Brucei[ORGN]'; >>>>>>>> >>>>>>>> #change a paramter >>>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>>>> '$input2[ORGN]'; >>>>>>>> >>>>>>>> my $v = 1; >>>>>>>> #$v is just to turn on and off the messages >>>>>>>> >>>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >>>>>>>> '-organism' => 'Trypanosoma Brucei' ); >>>>>>>> >>>>>>>> >>>>>>>> while (my $input = $str->next_seq()) >>>>>>>> { >>>>>>>> #Blast a sequence against a database: >>>>>>>> #Alternatively, you could pass in a file with many >>>>>>>> #sequences rather than loop through sequence one at a time >>>>>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>>>>>> #and swap the two lines below 
for an example of that. >>>>>>>> >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE $input; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> my $r = $factory->submit_blast($input); #The program stops here >>>>>>>> it >>>>>>>> does not return any value and it does not enter the While >>>>>>>> loop,Please help >>>>>>>> me in this regard.# >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE $r; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> print STDERR "waiting...." if($v>0); >>>>>>>> >>>>>>>> while ( my @rids = $factory->each_rid ) { >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "while entered"; >>>>>>>> close(OUTFILE); >>>>>>>> foreach my $rid ( @rids ) { >>>>>>>> >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "foreach entered"; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>>>> >>>>>>>> if( !ref($rc) ) >>>>>>>> { >>>>>>>> if( $rc < 0 ) >>>>>>>> { >>>>>>>> $factory->remove_rid($rid); >>>>>>>> } >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "if entered"; >>>>>>>> close(OUTFILE); >>>>>>>> print STDERR "." 
if ( $v > 0 ); >>>>>>>> sleep 5; >>>>>>>> } >>>>>>>> else { >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "else entered"; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> my $result = $rc->next_result(); >>>>>>>> #save the output >>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>>>> >>>>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>>>> print BLASTDEBUGFILE $result->next_hit(); >>>>>>>> close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> my $filename = >>>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out"; >>>>>>>> >>>>>>>> # open(DEBUGFILE,'>',$debugfile); >>>>>>>> # open(new,'>',$filename); >>>>>>>> # @arra=; >>>>>>>> # print DEBUGFILE @arra; >>>>>>>> # close(DEBUGFILE); >>>>>>>> # close(new); >>>>>>>> >>>>>>>> $factory->save_output($filename); >>>>>>>> >>>>>>>> # open(BLASTDEBUGFILE,'>',$debugfile); >>>>>>>> # print BLASTDEBUGFILE "Hello $rid"; >>>>>>>> # close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> $factory->remove_rid($rid); >>>>>>>> >>>>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>>>> print BLASTDEBUGFILE $organism; >>>>>>>> close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$outfile); >>>>>>>> # print OUTFILE "Test2 $result->database_name()"; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> #$hit = $result->next_hit; >>>>>>>> #open(new,'>',$debugfile); >>>>>>>> #print $hit; >>>>>>>> #close(new); >>>>>>>> >>>>>>>> while ( my $hit = $result->next_hit ) { >>>>>>>> >>>>>>>> next unless ( $v > 0); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$debugfile); >>>>>>>> # print OUTFILE "$hit in while hits"; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> my $sequ = $gb->get_Seq_by_version($hit->name); >>>>>>>> my $dna = $sequ->seq(); # get the sequence as a >>>>>>>> string >>>>>>>> push(@seqs,$dna); >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> #open(OUTFILE,'>',$debugfile); >>>>>>>> #print OUTFILE $seqs[0]; >>>>>>>> #close(OUTFILE); >>>>>>>> >>>>>>>> return(@seqs); >>>>>>>> >>>>>>>> } >>>>>>>> 
>>>>>>>> open(OUTFILE, '>',$outfile) || die ; >>>>>>>> >>>>>>>> print OUTFILE "\n >>>>>>>> RNAi Result >>>>>>>> \n >>>>>>>> \n >>>>>>>>

>>>>>>>> Inputsequence:
"; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> print OUTFILE substr ($in{'Inputseq'}, $i, 1); >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> print OUTFILE " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> print OUTFILE "
\n"; >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> print OUTFILE "

"; >>>>>>>> >>>>>>>> $z=@compseqs; >>>>>>>> >>>>>>>> for($k=1;$k<$z;$k++) { >>>>>>>> print OUTFILE ">>>>>>> set\">

Compare >>>>>>>> Sequence:
"; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> print OUTFILE substr ($compseqs[$k], $i, 1); >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> print OUTFILE " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> print OUTFILE "
\n"; >>>>>>>> } >>>>>>>> } >>>>>>>> print OUTFILE "

"; >>>>>>>> } >>>>>>>> >>>>>>>> print OUTFILE "

>>>>>>>> Window:
$in{'Windowsize'} >>>>>>>>

>>>>>>>>

>>>>>>>> Threshold:
$in{'Threshold'} >>>>>>>>

"; >>>>>>>> my $j=0; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){ >>>>>>>> if ($out[$i]->{similar}<=$in{'Threshold'}){ >>>>>>>> $j=$in{'Windowsize'}; >>>>>>>> } >>>>>>>> $height=$out[$i]->{similar}*5; >>>>>>>> } >>>>>>>> >>>>>>>> if ($j>0) { >>>>>>>> print OUTFILE ">>>>>>> height=\"5\">"; >>>>>>>> $outstring .= "".substr ($in{'Inputseq'}, >>>>>>>> $i, >>>>>>>> 1).""; >>>>>>>> $j--; >>>>>>>> } >>>>>>>> else { >>>>>>>> print OUTFILE ">>>>>>> height=\"5\">"; >>>>>>>> $outstring .= "".substr ($in{'Inputseq'}, >>>>>>>> $i, >>>>>>>> 1).""; >>>>>>>> } >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> $outstring .= " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> $outstring .= "
\n"; >>>>>>>> >>>>>>>> } >>>>>>>> if ( ($i+1)%800==0){ >>>>>>>> print OUTFILE "

\n"; >>>>>>>> >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> print OUTFILE "

>>>>>>> set\">$outstring"; >>>>>>>> >>>>>>>> #foreach (@out) { >>>>>>>> #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} >>>>>>>> matchs

"; >>>>>>>> #if ($_->{similar}<=$in{'Threshold'}){ >>>>>>>> >>>>>>>> # } >>>>>>>> #} >>>>>>>> >>>>>>>> print OUTFILE "\n\n"; >>>>>>>> >>>>>>>> close OUTFILE; >>>>>>>> >>>>>>>> #nameprint(); >>>>>>>> >>>>>>>> sub parse_form { >>>>>>>> local ($buffer, @pairs, $pair, $name, $value); >>>>>>>> # Read in text >>>>>>>> $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/; >>>>>>>> if ($ENV{'REQUEST_METHOD'} eq "POST") >>>>>>>> { >>>>>>>> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); >>>>>>>> } >>>>>>>> else >>>>>>>> { >>>>>>>> $buffer = $ENV{'QUERY_STRING'}; >>>>>>>> } >>>>>>>> @pairs = split(/&/, $buffer); >>>>>>>> foreach $pair (@pairs) >>>>>>>> { >>>>>>>> ($name, $value) = split(/=/, $pair); >>>>>>>> $value =~ tr/+/ /; >>>>>>>> $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; >>>>>>>> $in{$name} = $value; >>>>>>>> } >>>>>>>> } >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>> >>> >> > From maj at fortinbras.us Fri Jan 22 07:34:59 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 22 Jan 2010 07:34:59 -0500 Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO In-Reply-To: References: Message-ID: I'm down with that. ----- Original Message ----- From: "Jason Stajich" To: "BioPerl List" Sent: Friday, January 22, 2010 1:17 AM Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO > I'm considering putting in allowable initialization parameter (and get/ > set) for Bio::AlignIO that would allow setting of the alphabet. This > is then passed to Bio::LocatableSeq creation so that _guess_alphabet > isn't called. 
This will allow removal of warnings about empty > sequences because _guess_alphabet won't be called on a sequence if we > have explictly set the alphabet. > > This worked great on my local install and tests pass. Any objections > or concerns? > > basically it means when you make an AlignIO you can specify the > alphabet i.e. > > my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna', - > file => 'genome.fasaln'); > > I have some alignments with empty sequences and I think turning off > the warnings is appropriate where I force the alphabet choice. It > should also have a very modest speedup benefit too. > > -jason > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > http://twitter.com/hyphaltip > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From avilella at gmail.com Fri Jan 22 08:07:26 2010 From: avilella at gmail.com (Albert Vilella) Date: Fri, 22 Jan 2010 13:07:26 +0000 Subject: [Bioperl-l] Merging fragments in a simplealign Message-ID: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com> Hi, I would like to write a script that merges fragments in a Bio::SimpleAlign object on the basis of some $seq->display_name rule. I basically want to start with something like this: seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM seq2.234 QWERTYU------------------- seq2.345 ----------ASDFGH---------- seq2.456 -------------------ZXCVBNM And end with something like this: seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM seq2.mrg QWERTYU---ASDFGH---ZXCVBNM Can people suggest any Bio::SimpleAlign methods that would help here? Cheers, Albert. From maj at fortinbras.us Fri Jan 22 08:31:54 2010 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Fri, 22 Jan 2010 08:31:54 -0500
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
Message-ID:

Here's one of my favorite tricks for this: XOR mask on gap symbol.
MAJ

use strict;
use warnings;
use Bio::SeqIO;
use Bio::Seq;

my $seqio = Bio::SeqIO->new( -fh => \*DATA, -format => 'fasta' );

# Perl's string XOR pads the shorter operand with zero bits, so the
# gap mask must be as long as the aligned sequences, not a single '-'.
my $first = $seqio->next_seq->seq;
my $mask  = '-' x length($first);

# After masking, gaps are NUL bytes and residues are (residue ^ '-').
my $acc = $first ^ $mask;
while ( my $seq = $seqio->next_seq ) {
    $acc ^= ( $seq->seq ^ $mask );
}

# Unmask. This assumes the fragments do not overlap in any column.
my $mrg = Bio::Seq->new( -id  => 'merged',
                         -seq => $acc ^ $mask );
1;

__END__
>seq2.234
QWERTYU-------------------
>seq2.345
----------ASDFGH----------
>seq2.456
-------------------ZXCVBNM

----- Original Message -----
From: "Albert Vilella"
To:
Sent: Friday, January 22, 2010 8:07 AM
Subject: [Bioperl-l] Merging fragments in a simplealign

> Hi,
>
> I would like to write a script that merges fragments in a Bio::SimpleAlign
> object on the basis of
> some $seq->display_name rule.
>
> I basically want to start with something like this:
>
> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM
> seq2.234 QWERTYU-------------------
> seq2.345 ----------ASDFGH----------
> seq2.456 -------------------ZXCVBNM
>
> And end with something like this:
>
> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM
> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM
>
> Can people suggest any Bio::SimpleAlign methods that would help here?
>
> Cheers,
>
> Albert.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From cjfields at illinois.edu Fri Jan 22 08:34:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 22 Jan 2010 07:34:07 -0600
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
In-Reply-To:
References:
Message-ID: <55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu>

Sounds good to me. The warnings are a bit too tight on this module anyway.
I still think we have plans towards refactoring some of this, not sure how far along they are: http://www.bioperl.org/wiki/Align_Refactor chris On Jan 22, 2010, at 12:17 AM, Jason Stajich wrote: > I'm considering putting in allowable initialization parameter (and get/set) for Bio::AlignIO that would allow setting of the alphabet. This is then passed to Bio::LocatableSeq creation so that _guess_alphabet isn't called. This will allow removal of warnings about empty sequences because _guess_alphabet won't be called on a sequence if we have explictly set the alphabet. > > This worked great on my local install and tests pass. Any objections or concerns? > > basically it means when you make an AlignIO you can specify the alphabet i.e. > > my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna', -file => 'genome.fasaln'); > > I have some alignments with empty sequences and I think turning off the warnings is appropriate where I force the alphabet choice. It should also have a very modest speedup benefit too. > > -jason > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > http://twitter.com/hyphaltip > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Jan 22 08:40:57 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 22 Jan 2010 07:40:57 -0600 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com> Message-ID: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> May be something for the cook/scrapbook? chris On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote: > Here's one of my favorite tricks for this: XOR mask on gap symbol. 
> MAJ > > use Bio::SeqIO; > use Bio::Seq; > use strict; > my $seqio = Bio::SeqIO->new( -fh => \*DATA ); > > my $acc = $seqio->next_seq->seq ^ '-'; > while ($_ = $seqio->next_seq ) { > $acc ^= ($_->seq ^ '-'); > } > my $mrg = Bio::Seq->new( -id => 'merged', > -seq => $acc ^ '-' ); > 1; > > > __END__ >> seq2.234 > QWERTYU------------------- >> seq2.345 > ----------ASDFGH---------- >> seq2.456 > -------------------ZXCVBNM > > ----- Original Message ----- From: "Albert Vilella" > To: > Sent: Friday, January 22, 2010 8:07 AM > Subject: [Bioperl-l] Merging fragments in a simplealign > > >> Hi, >> I would like to write a script that merges fragments in a Bio::SimpleAlign >> object on the basis of >> some $seq->display_name rule. >> I basically want to start with something like this: >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >> seq2.234 QWERTYU------------------- >> seq2.345 ----------ASDFGH---------- >> seq2.456 -------------------ZXCVBNM >> And end with something like this: >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM >> Can people suggest any Bio::SimpleAlign methods that would help here? >> Cheers, >> Albert. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From holland at eaglegenomics.com Fri Jan 22 05:51:52 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 22 Jan 2010 10:51:52 +0000 Subject: [Bioperl-l] [BioSQL-l] SwissProt DE lines and UniProt XML / TagTree as XML in BioSQL In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com> References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com> Message-ID: <8FECCBDE-2DE1-40EE-B5A4-73BDAC893E2D@eaglegenomics.com> Nice idea. 
Currently, BioJava just stores the complete section as a string without parsing it, but it provides a parser module that converts it into a useful tag/value format within a user's program (the parsed form is not stored in BioSQL).

On 21 Jan 2010, at 12:33, Peter wrote:

> Hi all,
>
> This is cross posted to try and ensure relevant people see it.
> I suggest we continue the discussion on the BioSQL list
> (for how to serialise structured annotation to BioSQL), and/or
> the OpenBio list (for things like file format naming conventions).
>
> I am hoping we (Bio*) can be consistent in how we parse and load
> into BioSQL the SwissProt DE lines (known as "swiss" format in
> both BioPerl and Biopython's SeqIO, and by EMBOSS) or the
> equivalent UniProt XML tags (which we are tentatively going to
> call the "uniprot" format in Biopython's SeqIO - comments?).
>
> Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss")
> files and load them into BioSQL. Biopython currently treats the DE
> comment lines as a long string, as BioPerl used to:
>
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html
> http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html
>
> I understand that BioPerl now turns the SwissProt DE lines into a
> TagTree, and for storing this in BioSQL this gets serialised as XML.
> I would like Biopython to handle this the same way (although rather
> than a Perl TagTree, we'd use a Python structure of course), and
> would appreciate clarification of what exactly was implemented
> (e.g. which bit of the BioPerl source code should we look at,
> and could you show a worked example?).
>
> Andrea Pierleoni (CC'd - not sure if he is on the BioSQL or
> Open-Bio lists yet) has started work on parsing UniProt XML
> files for Biopython. Here the DE comment lines are already
> provided broken up with XML markup. Hopefully their nested
> structure matches what BioPerl was doing with the SwissProt
> DE lines.
>
> Regards,
>
> Peter
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From andrea at biocomp.unibo.it Fri Jan 22 07:18:32 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 22 Jan 2010 13:18:32 +0100 (CET)
Subject: [Bioperl-l] SwissProt DE lines and UniProt XML / TagTree as XML in BioSQL
In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
Message-ID: <2b6e30c4628585042366646a7b46386e.squirrel@lipid.biocomp.unibo.it>

I think the point here is a little broader, since the SwissProt DE lines are not the only ones that carry complex, structured data. Defining a common, language-independent way to store structured data in the comment and *_qualifier_value tables of the current BioSQL schema could be very useful. XML looks like a good candidate to me, and the UniProt XML format can be used as a reference or as a template to start from. Each Bio* project would then parse this structured data and expose it in its own language's data structures.

Andrea

> Hi all,
>
> This is cross posted to try and ensure relevant people see it.
> I suggest we continue the discussion on the BioSQL list
> (for how to serialise structured annotation to BioSQL), and/or
> the OpenBio list (for things like file format naming conventions).
>
> I am hoping we (Bio*) can be consistent in how we parse and load
> into BioSQL the SwissProt DE lines (known as "swiss" format in
> both BioPerl and Biopython's SeqIO, and by EMBOSS) or the
> equivalent UniProt XML tags (which we are tentatively going to
> call the "uniprot" format in Biopython's SeqIO - comments?).
> > Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss") > files and load them into BioSQL. Biopython currently treats the DE > comment lines as a long string, as BioPerl used to: > > http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html > http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html > > I understand that BioPerl now turns the SwissProt DE lines into a > TagTree, and for storing this in BioSQL this gets serialised as XML. > I would like Biopython to handle this the same way (although rather > than a Perl TagTree, we'd use a Python structure of course), and > would appreciate clarification of what exactly was implemented > (e.g. which bit of the BioPerl source code should be look at, > and could you show a worked example?). > > Andrea Pierlenoin (CC'd - not sure if he is on the BioSQL or > Open-Bio lists yet) has started work on parsing UniProt XML > files for Biopython. Here the DE comment lines are already > provided broken up with XML markup. Hopefully their nested > structure matches what BioPerl was doing with the SwissProt > DE lines. > > Regards, > > Peter > From avilella at gmail.com Fri Jan 22 11:04:13 2010 From: avilella at gmail.com (Albert Vilella) Date: Fri, 22 Jan 2010 16:04:13 +0000 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com> <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> Message-ID: <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com> Is there/should be a 'have_pairwise_overlap' method similar to this? # $seq1 and $seq3 have matching ids my $seq1 = $aln->each_seq_by_id($seq1->display_id); my $seq3 = $aln->each_seq_by_id($seq3->display_id); my $ret = $aln->have_pairwise_overlap($seq1,$seq3); On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields wrote: > May be something for the cook/scrapbook? > > chris > > On Jan 22, 2010, at 7:31 AM, Mark A. 
Jensen wrote: > > > Here's one of my favorite tricks for this: XOR mask on gap symbol. > > MAJ > > > > use Bio::SeqIO; > > use Bio::Seq; > > use strict; > > my $seqio = Bio::SeqIO->new( -fh => \*DATA ); > > > > my $acc = $seqio->next_seq->seq ^ '-'; > > while ($_ = $seqio->next_seq ) { > > $acc ^= ($_->seq ^ '-'); > > } > > my $mrg = Bio::Seq->new( -id => 'merged', > > -seq => $acc ^ '-' ); > > 1; > > > > > > __END__ > >> seq2.234 > > QWERTYU------------------- > >> seq2.345 > > ----------ASDFGH---------- > >> seq2.456 > > -------------------ZXCVBNM > > > > ----- Original Message ----- From: "Albert Vilella" > > To: > > Sent: Friday, January 22, 2010 8:07 AM > > Subject: [Bioperl-l] Merging fragments in a simplealign > > > > > >> Hi, > >> I would like to write a script that merges fragments in a > Bio::SimpleAlign > >> object on the basis of > >> some $seq->display_name rule. > >> I basically want to start with something like this: > >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM > >> seq2.234 QWERTYU------------------- > >> seq2.345 ----------ASDFGH---------- > >> seq2.456 -------------------ZXCVBNM > >> And end with something like this: > >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM > >> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM > >> Can people suggest any Bio::SimpleAlign methods that would help here? > >> Cheers, > >> Albert. > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Fri Jan 22 11:02:55 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Fri, 22 Jan 2010 11:02:55 -0500 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com> <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> Message-ID: http://www.bioperl.org/wiki/Merge_gapped_sequences_across_a_common_region ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "Albert Vilella" ; Sent: Friday, January 22, 2010 8:40 AM Subject: Re: [Bioperl-l] Merging fragments in a simplealign > May be something for the cook/scrapbook? > > chris > > On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote: > >> Here's one of my favorite tricks for this: XOR mask on gap symbol. >> MAJ >> >> use Bio::SeqIO; >> use Bio::Seq; >> use strict; >> my $seqio = Bio::SeqIO->new( -fh => \*DATA ); >> >> my $acc = $seqio->next_seq->seq ^ '-'; >> while ($_ = $seqio->next_seq ) { >> $acc ^= ($_->seq ^ '-'); >> } >> my $mrg = Bio::Seq->new( -id => 'merged', >> -seq => $acc ^ '-' ); >> 1; >> >> >> __END__ >>> seq2.234 >> QWERTYU------------------- >>> seq2.345 >> ----------ASDFGH---------- >>> seq2.456 >> -------------------ZXCVBNM >> >> ----- Original Message ----- From: "Albert Vilella" >> To: >> Sent: Friday, January 22, 2010 8:07 AM >> Subject: [Bioperl-l] Merging fragments in a simplealign >> >> >>> Hi, >>> I would like to write a script that merges fragments in a Bio::SimpleAlign >>> object on the basis of >>> some $seq->display_name rule. >>> I basically want to start with something like this: >>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>> seq2.234 QWERTYU------------------- >>> seq2.345 ----------ASDFGH---------- >>> seq2.456 -------------------ZXCVBNM >>> And end with something like this: >>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM >>> Can people suggest any Bio::SimpleAlign methods that would help here? >>> Cheers, >>> Albert. 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>

From avilella at gmail.com Fri Jan 22 12:50:57 2010
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 22 Jan 2010 17:50:57 +0000
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com> <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com>
Message-ID: <358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com>

Or, to rephrase my question: what is the closest existing equivalent of the code below?

On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella wrote:

> Is there/should be a 'have_pairwise_overlap' method similar to this?
>
> # $seq1 and $seq3 have matching ids
> my $seq1 = $aln->each_seq_by_id($seq1->display_id);
> my $seq3 = $aln->each_seq_by_id($seq3->display_id);
>
> my $ret = $aln->have_pairwise_overlap($seq1,$seq3);
>
>
> On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields wrote:
>
>> May be something for the cook/scrapbook?
>>
>> chris
>>
>> On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote:
>>
>> > Here's one of my favorite tricks for this: XOR mask on gap symbol.
>> > MAJ >> > >> > use Bio::SeqIO; >> > use Bio::Seq; >> > use strict; >> > my $seqio = Bio::SeqIO->new( -fh => \*DATA ); >> > >> > my $acc = $seqio->next_seq->seq ^ '-'; >> > while ($_ = $seqio->next_seq ) { >> > $acc ^= ($_->seq ^ '-'); >> > } >> > my $mrg = Bio::Seq->new( -id => 'merged', >> > -seq => $acc ^ '-' ); >> > 1; >> > >> > >> > __END__ >> >> seq2.234 >> > QWERTYU------------------- >> >> seq2.345 >> > ----------ASDFGH---------- >> >> seq2.456 >> > -------------------ZXCVBNM >> > >> > ----- Original Message ----- From: "Albert Vilella" > > >> > To: >> > Sent: Friday, January 22, 2010 8:07 AM >> > Subject: [Bioperl-l] Merging fragments in a simplealign >> > >> > >> >> Hi, >> >> I would like to write a script that merges fragments in a >> Bio::SimpleAlign >> >> object on the basis of >> >> some $seq->display_name rule. >> >> I basically want to start with something like this: >> >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >> >> seq2.234 QWERTYU------------------- >> >> seq2.345 ----------ASDFGH---------- >> >> seq2.456 -------------------ZXCVBNM >> >> And end with something like this: >> >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >> >> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM >> >> Can people suggest any Bio::SimpleAlign methods that would help here? >> >> Cheers, >> >> Albert. >> >> _______________________________________________ >> >> Bioperl-l mailing list >> >> Bioperl-l at lists.open-bio.org >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > From jay at jays.net Fri Jan 22 13:30:57 2010 From: jay at jays.net (Jay Hannah) Date: Fri, 22 Jan 2010 12:30:57 -0600 Subject: [Bioperl-l] Bio::BroodComb - RFC In-Reply-To: References: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net> Message-ID: On Jan 21, 2010, at 10:31 PM, Chris Fields wrote: > Did you want to release it to CPAN? 
> I'll take a closer look at the docs to get an idea of what you are doing with it, but from my perspective I can see this becoming a nice general use tool akin to Bio::Perl, maybe a bit more lightweight.

Yes, I was thinking I would. No one has (yet) told me it's the worst idea ever, so I'm feeling encouraged. :)

Given smallish inputs / databases (up to a few million rows) where some lightweight schema + SQLite + BioPerl can get the job done, it's nice to have a little easy-to-run toolbox. New tables and Roles bolt on easily, so I'll be adding them as they surface at $work[1].

Thanks for your interest. :)

Jay Hannah
http://github.com/jhannah/bio-broodcomb
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

From dalalhina at gmail.com Fri Jan 22 12:31:09 2010
From: dalalhina at gmail.com (hina dalal)
Date: Fri, 22 Jan 2010 17:31:09 +0000
Subject: [Bioperl-l] Bioperl installation failed
Message-ID: <425f75df1001220931t49f5c768j97d91d2dd1757f19@mail.gmail.com>

Hi

I have installed Perl from ActiveState and am now trying to install BioPerl, but I cannot get it to work, neither from PPM (it shows the error "Ppm install failed: 404 not found") nor from a CPAN / manual installation.

It will not let me download nmake either, showing that "the version of this file is not compatible with the version of Windows you are running. Check your computer system information to see whether you need 32 bit or 64 bit of this program."

I am using Windows Vista.

Please help.

Regards
Hina

From H.Dalal at sms.ed.ac.uk Fri Jan 22 12:34:55 2010
From: H.Dalal at sms.ed.ac.uk (Hina Dalal)
Date: Fri, 22 Jan 2010 17:34:55 +0000
Subject: [Bioperl-l] BioPerl installation failed: please help
Message-ID: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk>

Hi

I have installed Perl from ActiveState and am now trying to install BioPerl, but I cannot get it to work, neither from PPM (it shows the error "Ppm install failed: 404 not found") nor from a manual CPAN installation.
It will not let me download nmake either, showing that "the version of this file is not compatible with the version of Windows you are running. Check your computer system information to see whether you need 32 bit or 64 bit of this program."

Please help.

Regards
Hina

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

From jason at bioperl.org Fri Jan 22 14:18:30 2010
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 22 Jan 2010 11:18:30 -0800
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
In-Reply-To: <55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu>
References: <55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu>
Message-ID: <59EC9331-FB2F-4338-AD58-2D501A528A18@bioperl.org>

Done, as of r16739. Look forward to the refactor work too.

-jason

On Jan 22, 2010, at 5:34 AM, Chris Fields wrote:

> Sounds good to me. The warnings are a bit too tight on this module anyway.
>
> I still think we have plans towards refactoring some of this, not sure how far along they are:
>
> http://www.bioperl.org/wiki/Align_Refactor
>
> chris
>
> On Jan 22, 2010, at 12:17 AM, Jason Stajich wrote:
>
>> I'm considering putting in allowable initialization parameter (and get/set) for Bio::AlignIO that would allow setting of the alphabet. This is then passed to Bio::LocatableSeq creation so that _guess_alphabet isn't called. This will allow removal of warnings about empty sequences because _guess_alphabet won't be called on a sequence if we have explicitly set the alphabet.
>>
>> This worked great on my local install and tests pass. Any objections or concerns?
>>
>> basically it means when you make an AlignIO you can specify the alphabet i.e.
>>
>> my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna', -file => 'genome.fasaln');
>>
>> I have some alignments with empty sequences and I think turning off the warnings is appropriate where I force the alphabet choice.
It >> should also have a very modest speedup benefit too. >> >> -jason >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> http://fungalgenomes.org/ >> http://twitter.com/hyphaltip >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ http://twitter.com/hyphaltip From cjfields at illinois.edu Fri Jan 22 14:22:43 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 22 Jan 2010 13:22:43 -0600 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: <358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com> References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com> <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com> <358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com> Message-ID: <14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu> This could exist, but should go into a general Utilities module. Part of the Align refactoring was to pull a good number of the methods into a general utilities module, so this would fit into that category. chris On Jan 22, 2010, at 11:50 AM, Albert Vilella wrote: > Or to rephrase my answer, what is the closest way for the code below that > already exists? > > On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella wrote: > >> Is there/should be a 'have_pairwise_overlap' method similar to this? >> >> # $seq1 and $seq3 have matching ids >> my $seq1 = $aln->each_seq_by_id($seq1->display_id); >> my $seq3 = $aln->each_seq_by_id($seq3->display_id); >> >> my $ret = $aln->have_pairwise_overlap($seq1,$seq3); >> >> >> On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields wrote: >> >>> May be something for the cook/scrapbook? >>> >>> chris >>> >>> On Jan 22, 2010, at 7:31 AM, Mark A. 
Jensen wrote: >>> >>>> Here's one of my favorite tricks for this: XOR mask on gap symbol. >>>> MAJ >>>> >>>> use Bio::SeqIO; >>>> use Bio::Seq; >>>> use strict; >>>> my $seqio = Bio::SeqIO->new( -fh => \*DATA ); >>>> >>>> my $acc = $seqio->next_seq->seq ^ '-'; >>>> while ($_ = $seqio->next_seq ) { >>>> $acc ^= ($_->seq ^ '-'); >>>> } >>>> my $mrg = Bio::Seq->new( -id => 'merged', >>>> -seq => $acc ^ '-' ); >>>> 1; >>>> >>>> >>>> __END__ >>>>> seq2.234 >>>> QWERTYU------------------- >>>>> seq2.345 >>>> ----------ASDFGH---------- >>>>> seq2.456 >>>> -------------------ZXCVBNM >>>> >>>> ----- Original Message ----- From: "Albert Vilella" >>> >>>> To: >>>> Sent: Friday, January 22, 2010 8:07 AM >>>> Subject: [Bioperl-l] Merging fragments in a simplealign >>>> >>>> >>>>> Hi, >>>>> I would like to write a script that merges fragments in a >>> Bio::SimpleAlign >>>>> object on the basis of >>>>> some $seq->display_name rule. >>>>> I basically want to start with something like this: >>>>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>>>> seq2.234 QWERTYU------------------- >>>>> seq2.345 ----------ASDFGH---------- >>>>> seq2.456 -------------------ZXCVBNM >>>>> And end with something like this: >>>>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>>>> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM >>>>> Can people suggest any Bio::SimpleAlign methods that would help here? >>>>> Cheers, >>>>> Albert. 
>>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Fri Jan 22 14:29:07 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 22 Jan 2010 14:29:07 -0500 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: <14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu> References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com><058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu><358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com><358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com> <14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu> Message-ID: <0F7B7E5FE70D4C5CB34B27045561823C@NewLife> I'd recommend making an enhancement request via Bugzilla, so we don't forget- MAJ ----- Original Message ----- From: "Chris Fields" To: "Albert Vilella" Cc: "bioperl-l" Sent: Friday, January 22, 2010 2:22 PM Subject: Re: [Bioperl-l] Merging fragments in a simplealign > This could exist, but should go into a general Utilities module. Part of the > Align refactoring was to pull a good number of the methods into a general > utilities module, so this would fit into that category. > > chris > > On Jan 22, 2010, at 11:50 AM, Albert Vilella wrote: > >> Or to rephrase my answer, what is the closest way for the code below that >> already exists? >> >> On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella wrote: >> >>> Is there/should be a 'have_pairwise_overlap' method similar to this? 
>>> >>> # $seq1 and $seq3 have matching ids >>> my $seq1 = $aln->each_seq_by_id($seq1->display_id); >>> my $seq3 = $aln->each_seq_by_id($seq3->display_id); >>> >>> my $ret = $aln->have_pairwise_overlap($seq1,$seq3); >>> >>> >>> On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields wrote: >>> >>>> May be something for the cook/scrapbook? >>>> >>>> chris >>>> >>>> On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote: >>>> >>>>> Here's one of my favorite tricks for this: XOR mask on gap symbol. >>>>> MAJ >>>>> >>>>> use Bio::SeqIO; >>>>> use Bio::Seq; >>>>> use strict; >>>>> my $seqio = Bio::SeqIO->new( -fh => \*DATA ); >>>>> >>>>> my $acc = $seqio->next_seq->seq ^ '-'; >>>>> while ($_ = $seqio->next_seq ) { >>>>> $acc ^= ($_->seq ^ '-'); >>>>> } >>>>> my $mrg = Bio::Seq->new( -id => 'merged', >>>>> -seq => $acc ^ '-' ); >>>>> 1; >>>>> >>>>> >>>>> __END__ >>>>>> seq2.234 >>>>> QWERTYU------------------- >>>>>> seq2.345 >>>>> ----------ASDFGH---------- >>>>>> seq2.456 >>>>> -------------------ZXCVBNM >>>>> >>>>> ----- Original Message ----- From: "Albert Vilella" >>>> >>>>> To: >>>>> Sent: Friday, January 22, 2010 8:07 AM >>>>> Subject: [Bioperl-l] Merging fragments in a simplealign >>>>> >>>>> >>>>>> Hi, >>>>>> I would like to write a script that merges fragments in a >>>> Bio::SimpleAlign >>>>>> object on the basis of >>>>>> some $seq->display_name rule. >>>>>> I basically want to start with something like this: >>>>>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>>>>> seq2.234 QWERTYU------------------- >>>>>> seq2.345 ----------ASDFGH---------- >>>>>> seq2.456 -------------------ZXCVBNM >>>>>> And end with something like this: >>>>>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>>>>> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM >>>>>> Can people suggest any Bio::SimpleAlign methods that would help here? >>>>>> Cheers, >>>>>> Albert. 
>>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Fri Jan 22 14:33:41 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 22 Jan 2010 14:33:41 -0500 Subject: [Bioperl-l] BioPerl installation failed: please help In-Reply-To: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk> References: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk> Message-ID: <2ABEC492CD49450EAE4BFC7BA763E3DB@NewLife> Hina-- See the protocol at http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation for ActiveState installation. If it doesn't work, please let us know at which step the failure happened. cheers, MAJ ----- Original Message ----- From: "Hina Dalal" To: Sent: Friday, January 22, 2010 12:34 PM Subject: [Bioperl-l] BioPerl installation failed: please help Hi I have installed PERL from Activesate and now trying to install bioperl but can not do it . Neither from PPM (it is showing error "Ppm install failed: 404 not found") nor from CPAN manual installation. It is not allowing me to download nmake, showing that "the version of this file is not compatible with the version of windows you are running. Check your computer system information to see whether you need 32 bit or 64 bit of this program." Please help. 
Regards Hina -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Fri Jan 22 15:13:15 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 22 Jan 2010 15:13:15 -0500 Subject: [Bioperl-l] BioPerl installation failed: please help In-Reply-To: <20100122200118.053j5kc36skow0wg@www.sms.ed.ac.uk> References: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk><2ABEC492CD49450EAE4BFC7BA763E3DB@NewLife> <20100122200118.053j5kc36skow0wg@www.sms.ed.ac.uk> Message-ID: <9E5DE384E2C8416B8373E390ABDB7DFE@NewLife> Ok Hina, I'm not seeing any issues with the presence or availability of http://bioperl.org/DIST from my machine. Can you access that url in a browser? If not, the king of the King's Buildings may not be allowing access. Also, can you do the following: C:> ppm-shell ppm> repo list Note the number of the repo that corresponds to bioperl (if any) and do ppm> repo describe n where 'n' is that number, and send the output along. cheers, MAJ ----- Original Message ----- From: "Hina Dalal" To: "Mark A. Jensen" Sent: Friday, January 22, 2010 3:01 PM Subject: Re: [Bioperl-l] BioPerl installation failed: please help Hi Mark warm regards I was following that protocol only , but the problem is when I tried to do it from PPM, and when I reach at the stem install BioPerl, it is showing error "Ppm install failed: 404 not found" in the end. and when I tried it by CPAN /manual installation, I couldn't download nmake,its showing that "the version of this file is not compatible with the version of windows you are running. Check your computer system information to see whether you need 32 bit or 64 bit of this program and than contact the software publisher." What should I do? Please help. Regards Hina Quoting "Mark A. 
Jensen" : > Hina-- See the protocol at > http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation > for ActiveState installation. If it doesn't work, please let us know at > which step the failure happened. > cheers, MAJ > ----- Original Message ----- From: "Hina Dalal" > To: > Sent: Friday, January 22, 2010 12:34 PM > Subject: [Bioperl-l] BioPerl installation failed: please help > > > Hi > > I have installed PERL from Activesate and now trying to install > bioperl but can not do it . Neither from PPM (it is showing error "Ppm > install failed: 404 not found") nor from CPAN manual installation. It > is not allowing me to download nmake, showing that "the version of > this file is not compatible with the version of windows you are > running. Check your computer system information to see whether you > need 32 bit or 64 bit of this program." > > Please help. > > Regards > > Hina > > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From pengyu.ut at gmail.com Sun Jan 24 20:29:59 2010 From: pengyu.ut at gmail.com (Peng Yu) Date: Sun, 24 Jan 2010 19:29:59 -0600 Subject: [Bioperl-l] Transcribe in bioperl Message-ID: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> I found the function 'translate' in bioperl. But I don't find 'transcribe'. Is there such a function? 
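The T-to-U conversion asked about above can be illustrated in plain Perl. What follows is only a minimal sketch under stated assumptions, not a BioPerl method: the names transcribe_dna and rev_transcribe_rna are hypothetical, and the "is this DNA?" check is a simple regular expression over IUPAC nucleotide codes rather than a test of a sequence object's alphabet.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): convert a DNA string to RNA
# by replacing T with U, refusing input that does not look like DNA.
sub transcribe_dna {
    my ($dna) = @_;
    # Assumption: accept only IUPAC DNA codes plus '.' and '-' gap characters.
    die "not a DNA sequence: $dna\n"
        unless $dna =~ /\A[ACGTRYSWKMBDHVN.\-]*\z/i;
    (my $rna = $dna) =~ tr/Tt/Uu/;   # copy first, then swap T for U
    return $rna;
}

# Hypothetical inverse: U back to T, under the same assumptions.
sub rev_transcribe_rna {
    my ($rna) = @_;
    die "not an RNA sequence: $rna\n"
        unless $rna =~ /\A[ACGURYSWKMBDHVN.\-]*\z/i;
    (my $dna = $rna) =~ tr/Uu/Tt/;
    return $dna;
}

print transcribe_dna('ATGGTTTAA'), "\n";   # prints AUGGUUUAA
```

Wrapping the tr/// in a named function that dies on non-DNA input is one way to get the semantic check that a bare '$seq =~ tr/T/U/' lacks; a real implementation on a sequence object would presumably test the object's alphabet instead.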
From jason at bioperl.org Sun Jan 24 21:06:48 2010 From: jason at bioperl.org (Jason Stajich) Date: Sun, 24 Jan 2010 18:06:48 -0800 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> Message-ID: What exactly do you want to do? spliced_seq for a feature would be the closest thing... -jason On Jan 24, 2010, at 5:29 PM, Peng Yu wrote: > I found the function 'translate' in bioperl. But I don't find > 'transcribe'. Is there such a function? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ http://twitter.com/hyphaltip From pengyu.ut at gmail.com Sun Jan 24 21:22:12 2010 From: pengyu.ut at gmail.com (Peng Yu) Date: Sun, 24 Jan 2010 20:22:12 -0600 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> Message-ID: <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> To convert from T to U. I could use perl's builtin function. But it is semantically far away from 'transcribe'. If there is a function with name 'transcribe', it will be better. On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich wrote: > What exactly do you want to do? > spliced_seq for a feature would be the closest thing... > > -jason > On Jan 24, 2010, at 5:29 PM, Peng Yu wrote: > >> I found the function 'translate' in bioperl. But I don't find >> 'transcribe'. Is there such a function? 
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > http://twitter.com/hyphaltip > > From maj at fortinbras.us Sun Jan 24 21:48:33 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 24 Jan 2010 21:48:33 -0500 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> Message-ID: Not a bad idea, a semantics-preserving/checking thing. transcribe() could return an object with alphabet == 'rna' and the T's flipped, or bork if called against an object with alphbet != 'dna'. I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to be stashed), if desired. ----- Original Message ----- From: "Peng Yu" To: "Jason Stajich" Cc: Sent: Sunday, January 24, 2010 9:22 PM Subject: Re: [Bioperl-l] Transcribe in bioperl > To convert from T to U. I could use perl's builtin function. But it is > semantically far away from 'transcribe'. If there is a function with > name 'transcribe', it will be better. > > On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich wrote: >> What exactly do you want to do? >> spliced_seq for a feature would be the closest thing... >> >> -jason >> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote: >> >>> I found the function 'translate' in bioperl. But I don't find >>> 'transcribe'. Is there such a function? 
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> http://fungalgenomes.org/ >> http://twitter.com/hyphaltip >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Sun Jan 24 23:39:43 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 24 Jan 2010 22:39:43 -0600 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> Message-ID: I think the main reason there hasn't been a transcribe() is that very few users ask for it. Most just use a quick '$seq =~ tr/T/U/', or use spliced_seq() and/or translate() (i.e. they don't care about the intermediate mRNA). I don't have a problem with adding a transcribe method to PrimarySeq, but (and Mark has already picked up on this) it should be constrained to DNA only and return RNA. And there might be a case for adding the analogous reverse_translate(). Also worth adding this to the proper interface class (PrimarySeqI, I think) so all Seq/PrimarySeq will have it (or have to implement their own). chris On Jan 24, 2010, at 8:48 PM, Mark A. Jensen wrote: > Not a bad idea, a semantics-preserving/checking thing. transcribe() could return an object with alphabet == 'rna' > and the T's flipped, or bork if called against an object with alphbet != 'dna'. > I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to be stashed), if desired. > > ----- Original Message ----- From: "Peng Yu" > To: "Jason Stajich" > Cc: > Sent: Sunday, January 24, 2010 9:22 PM > Subject: Re: [Bioperl-l] Transcribe in bioperl > > >> To convert from T to U. 
I could use perl's builtin function. But it is >> semantically far away from 'transcribe'. If there is a function with >> name 'transcribe', it will be better. >> On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich wrote: >>> What exactly do you want to do? >>> spliced_seq for a feature would be the closest thing... >>> >>> -jason >>> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote: >>> >>>> I found the function 'translate' in bioperl. But I don't find >>>> 'transcribe'. Is there such a function? >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> Jason Stajich >>> jason.stajich at gmail.com >>> jason at bioperl.org >>> http://fungalgenomes.org/ >>> http://twitter.com/hyphaltip >>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sun Jan 24 23:43:07 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 24 Jan 2010 22:43:07 -0600 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> Message-ID: <489E0B85-0BC3-45DB-8660-494CF69F35FF@illinois.edu> On Jan 24, 2010, at 10:39 PM, Chris Fields wrote: > ...And there might be a case for adding the analogous reverse_translate(). Bah. Meant reverse_transcribe(). Ah well. 
chris From dan.kortschak at adelaide.edu.au Mon Jan 25 00:33:28 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Mon, 25 Jan 2010 16:03:28 +1030 Subject: [Bioperl-l] BEDTools module Message-ID: <1264397608.4898.9.camel@epistle> Hi All, A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan and Ira Hall is now available in the bioperl-run subversion repository (bioperl-run/trunk r16754). Using BEDTools you can, among other things: * Intersecting two BED files in search of overlapping features. * Merging overlapping features. * Screening for paired-end (PE) overlaps between PE sequences and existing genomic features. * Calculating the depth and breadth of sequence coverage across defined "windows" in a genome. (see for manuals and downloads). BEDTools is a suite of 17 commandline executable. The module attempts to provide and options comprehensively and can return Bio::SeqIO or Bio::SeqFeature::Collection object where appropriate (or Bio::Root::IO where specific handling has not been implemented - please give feedback on desired features for this). cheers Dan From cjfields at illinois.edu Mon Jan 25 00:35:06 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 24 Jan 2010 23:35:06 -0600 Subject: [Bioperl-l] Distance between non-overlapping sequences in DNAStatistics Message-ID: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu> Just a quick question for those using DNAStatistics. I just fixed a bug in Bio::Align::DNAStatistics that failed with a div by zero error (bug 2901) on this data: >seq1 GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC >seq2 GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC >seq3 GGTACCAGCAGGTGGTCCGCCTA------------------------------ >seq4 --------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC Since seq3 and seq4 don't overlap, the distance can't be calculated. In our case, I replace the score with 'NA' as a placeholder, but I'm worried about downstream app breakage. 
Anyone have an objection to using 'NA' here, or know of ways this may lead to problems elsewhere? chris From jason at bioperl.org Mon Jan 25 00:58:03 2010 From: jason at bioperl.org (Jason Stajich) Date: Sun, 24 Jan 2010 21:58:03 -0800 Subject: [Bioperl-l] Distance between non-overlapping sequences in DNAStatistics In-Reply-To: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu> References: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu> Message-ID: It could also return -1 which is used as place holder for NA in other programs that generate distance matrices. -jason On Jan 24, 2010, at 9:35 PM, Chris Fields wrote: > Just a quick question for those using DNAStatistics. I just fixed a > bug in Bio::Align::DNAStatistics that failed with a div by zero > error (bug 2901) on this data: > >> seq1 > GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC >> seq2 > GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC >> seq3 > GGTACCAGCAGGTGGTCCGCCTA------------------------------ >> seq4 > --------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC > > Since seq3 and seq4 don't overlap, the distance can't be > calculated. In our case, I replace the score with 'NA' as a > placeholder, but I'm worried about downstream app breakage. Anyone > have an objection to using 'NA' here, or know of ways this may lead > to problems elsewhere? > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ http://twitter.com/hyphaltip From maj at fortinbras.us Mon Jan 25 08:17:54 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Mon, 25 Jan 2010 08:17:54 -0500 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com><366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> Message-ID: transcribe() and rev_transcribe added to Bio::PrimarySeqI, plus tests in t/Seq.t, @ r16757 MAJ ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: ; "Peng Yu" Sent: Sunday, January 24, 2010 11:39 PM Subject: Re: [Bioperl-l] Transcribe in bioperl >I think the main reason there hasn't been a transcribe() is that very few users >ask for it. Most just use a quick '$seq =~ tr/T/U/', or use spliced_seq() >and/or translate() (i.e. they don't care about the intermediate mRNA). I don't >have a problem with adding a transcribe method to PrimarySeq, but (and Mark has >already picked up on this) it should be constrained to DNA only and return RNA. >And there might be a case for adding the analogous reverse_translate(). > > Also worth adding this to the proper interface class (PrimarySeqI, I think) so > all Seq/PrimarySeq will have it (or have to implement their own). > > chris > > On Jan 24, 2010, at 8:48 PM, Mark A. Jensen wrote: > >> Not a bad idea, a semantics-preserving/checking thing. transcribe() could >> return an object with alphabet == 'rna' >> and the T's flipped, or bork if called against an object with alphbet != >> 'dna'. >> I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to >> be stashed), if desired. >> >> ----- Original Message ----- From: "Peng Yu" >> To: "Jason Stajich" >> Cc: >> Sent: Sunday, January 24, 2010 9:22 PM >> Subject: Re: [Bioperl-l] Transcribe in bioperl >> >> >>> To convert from T to U. I could use perl's builtin function. But it is >>> semantically far away from 'transcribe'. If there is a function with >>> name 'transcribe', it will be better. >>> On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich wrote: >>>> What exactly do you want to do? 
>>>> spliced_seq for a feature would be the closest thing... >>>> >>>> -jason >>>> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote: >>>> >>>>> I found the function 'translate' in bioperl. But I don't find >>>>> 'transcribe'. Is there such a function? >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> -- >>>> Jason Stajich >>>> jason.stajich at gmail.com >>>> jason at bioperl.org >>>> http://fungalgenomes.org/ >>>> http://twitter.com/hyphaltip >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon Jan 25 08:23:12 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 25 Jan 2010 07:23:12 -0600 Subject: [Bioperl-l] BEDTools module In-Reply-To: <1264397608.4898.9.camel@epistle> References: <1264397608.4898.9.camel@epistle> Message-ID: <0F5CE93E-0E6C-4317-806B-A463A9B0917E@illinois.edu> Great work Dan! chris On Jan 24, 2010, at 11:33 PM, Dan Kortschak wrote: > Hi All, > > A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan > and Ira Hall is now available in the bioperl-run subversion repository > (bioperl-run/trunk r16754). > > Using BEDTools you can, among other things: > > * Intersecting two BED files in search of overlapping features. > * Merging overlapping features. > * Screening for paired-end (PE) overlaps between PE sequences and > existing genomic features. 
> * Calculating the depth and breadth of sequence coverage across > defined "windows" in a genome. > > (see for manuals and downloads). > > BEDTools is a suite of 17 commandline executable. The module attempts to > provide and options comprehensively and can return Bio::SeqIO or > Bio::SeqFeature::Collection object where appropriate (or Bio::Root::IO > where specific handling has not been implemented - please give feedback > on desired features for this). > > cheers > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Jan 25 08:27:26 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 25 Jan 2010 07:27:26 -0600 Subject: [Bioperl-l] Distance between non-overlapping sequences in DNAStatistics In-Reply-To: References: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu> Message-ID: That works for me, just want to ensure we're DTRT. I'll change it over. chris On Jan 24, 2010, at 11:58 PM, Jason Stajich wrote: > It could also return -1 which is used as place holder for NA in other programs that generate distance matrices. > -jason > On Jan 24, 2010, at 9:35 PM, Chris Fields wrote: > >> Just a quick question for those using DNAStatistics. I just fixed a bug in Bio::Align::DNAStatistics that failed with a div by zero error (bug 2901) on this data: >> >>> seq1 >> GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC >>> seq2 >> GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC >>> seq3 >> GGTACCAGCAGGTGGTCCGCCTA------------------------------ >>> seq4 >> --------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC >> >> Since seq3 and seq4 don't overlap, the distance can't be calculated. In our case, I replace the score with 'NA' as a placeholder, but I'm worried about downstream app breakage. Anyone have an objection to using 'NA' here, or know of ways this may lead to problems elsewhere? 
>> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > http://twitter.com/hyphaltip > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Mon Jan 25 08:41:38 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 25 Jan 2010 08:41:38 -0500 Subject: [Bioperl-l] BEDTools module In-Reply-To: <1264397608.4898.9.camel@epistle> References: <1264397608.4898.9.camel@epistle> Message-ID: <8D494783F87E4C32BD797008E260C3C2@NewLife> Rock 'n' roll, Dan! ----- Original Message ----- From: "Dan Kortschak" To: Sent: Monday, January 25, 2010 12:33 AM Subject: [Bioperl-l] BEDTools module > Hi All, > > A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan > and Ira Hall is now available in the bioperl-run subversion repository > (bioperl-run/trunk r16754). > > Using BEDTools you can, among other things: > > * Intersecting two BED files in search of overlapping features. > * Merging overlapping features. > * Screening for paired-end (PE) overlaps between PE sequences and > existing genomic features. > * Calculating the depth and breadth of sequence coverage across > defined "windows" in a genome. > > (see for manuals and downloads). > > BEDTools is a suite of 17 commandline executable. The module attempts to > provide and options comprehensively and can return Bio::SeqIO or > Bio::SeqFeature::Collection object where appropriate (or Bio::Root::IO > where specific handling has not been implemented - please give feedback > on desired features for this). 
>
> cheers
> Dan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From rtbio.2009 at gmail.com Mon Jan 25 08:43:19 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Mon, 25 Jan 2010 14:43:19 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
Message-ID:

Hello Mark, Chris and all,

This is Roopa again. I have a small problem again. I am working on Remote blast. The program works well. But the problem is this. The program accesses the server and gets the output correctly. I am trying to send the result sequences into an array and I found that always the first sequence among the Result sequences is missing. The code is

my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]");

while (my $input = $str->next_seq())
{
    #Blast a sequence against a database:
    #Alternatively, you could pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.
    open(OUTFILE,'>',$debugfile);
    print OUTFILE $input;
    close(OUTFILE);
    my $r = $factory->submit_blast($input);
    open(OUTFILE,'>',$debugfile);
    # print OUTFILE $r;
    close(OUTFILE);
    print STDERR "waiting...." if($v>0);
    while ( my @rids = $factory->each_rid ) {
        open(OUTFILE,'>',$debugfile);
        # print OUTFILE "while entered";
        close(OUTFILE);
        foreach my $rid ( @rids ) {
            open(OUTFILE,'>',$debugfile);
            # print OUTFILE "foreach entered";
            close(OUTFILE);
            my $rc = $factory->retrieve_blast($rid);
            if( !ref($rc) ) {
                if( $rc < 0 ) {
                    $factory->remove_rid($rid);
                }
                open(OUTFILE,'>',$debugfile);
                # print OUTFILE "if entered";
                close(OUTFILE);
                print STDERR "." if ( $v > 0 );
                sleep 5;
            } else {
                open(OUTFILE,'>',$debugfile);
                # print OUTFILE "else entered";
                close(OUTFILE);
                my $result = $rc->next_result();
                #save the output
                $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
                open(BLASTDEBUGFILE,'>',$blastdebugfile);
                print BLASTDEBUGFILE $result->next_hit();
                close(BLASTDEBUGFILE);
                my $filename = $serverpath."/blastdata_". time()."\.out";
                # open(DEBUGFILE,'>',$debugfile);
                # open(new,'>',$filename);
                # @arra=;
                # print DEBUGFILE @arra;
                # close(DEBUGFILE);
                # close(new);
                $factory->save_output($filename);
                # open(BLASTDEBUGFILE,'>',$debugfile);
                # print BLASTDEBUGFILE "Hello $rid";
                # close(BLASTDEBUGFILE);
                $factory->remove_rid($rid);
                open(BLASTDEBUGFILE,'>',$blastdebugfile);
                print BLASTDEBUGFILE $organism;
                close(BLASTDEBUGFILE);
                # open(OUTFILE,'>',$outfile);
                # print OUTFILE "Test2 $result->database_name()";
                # close(OUTFILE);
                #$hit = $result->next_hit;
                #open(new,'>',$debugfile);
                #print $hit;
                #close(new);
                $dummy=0;
                while ( my $hit = $result->next_hit ) {
                    next unless ( $v >= 0);
                    # open(OUTFILE,'>',$debugfile);
                    # print OUTFILE "$hit in while hits";
                    # close(OUTFILE);
                    my $sequ = $gb->get_Seq_by_version($hit->name);
                    my $dna = $sequ->seq(); # get the sequence as a string
                    $dummy++;
                    open(OUTFILE,'>',$debugfile);
                    # print OUTFILE $dummy;
                    close(OUTFILE);
                    push(@seqs,$dna);
                }
            }
        }
    }
}
$warum=@seqs;
open(OUTFILE,'>',$debugfile);
# print OUTFILE $warum;
print OUTFILE @seqs;
close(OUTFILE);
return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;
print OUTFILE "\n RNAi Result \n \n

Inputsequence:
"; Here in the above code, I was trying to debug the code and trying to get the count of the array and even the sequence. But when the output data was giving 1 sequence, the count of the array was 0 and when I tried to print the output sequence I could not get any. It was the same when the no of output sequences was 3, I tried to print the sequences but was getting the count of the array as 2 and was printing only two sequences. Please help me in sorting out this problem. Regards, Roopa. From cjfields at illinois.edu Mon Jan 25 09:05:44 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 25 Jan 2010 08:05:44 -0600 Subject: [Bioperl-l] remote blast bioperl In-Reply-To: References: Message-ID: <7E402CC5-9C66-4315-B437-7C4EC2317371@illinois.edu> Roopa, We have received all 4+ of your posts. There is absolutely no need for you to keep repeatedly posting the same thing to the list. Be patient, we'll try to get to you as soon as we can! chris On Jan 25, 2010, at 7:44 AM, Roopa Raghuveer wrote: > Hello all, > > I have a small problem again. I am working on Remote blast. The program works well. But the problem is this. The program accesses the server and gets the output correctly. I am trying to send the result sequences into an array and I found that always the first sequence among the Result sequences is missing. The code is > > my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]"); > - Show quoted text - > > > while (my $input = $str->next_seq()) > { > #Blast a sequence against a database: > #Alternatively, you could pass in a file with many > #sequences rather than loop through sequence one at a time > #Remove the loop starting 'while (my $input = $str->next_seq())' > #and swap the two lines below for an example of that. 
> > open(OUTFILE,'>',$debugfile); > print OUTFILE $input; > close(OUTFILE); > > > my $r = $factory->submit_blast($input); > > open(OUTFILE,'>',$debugfile); > # print OUTFILE $r; > close(OUTFILE); > > > print STDERR "waiting...." if($v>0); > > while ( my @rids = $factory->each_rid ) { > open(OUTFILE,'>',$debugfile); > # print OUTFILE "while entered"; > close(OUTFILE); > foreach my $rid ( @rids ) { > > open(OUTFILE,'>',$debugfile); > # print OUTFILE "foreach entered"; > close(OUTFILE); > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > open(OUTFILE,'>',$debugfile); > # print OUTFILE "if entered"; > close(OUTFILE); > print STDERR "." if ( $v > 0 ); > sleep 5; > } > else { > open(OUTFILE,'>',$debugfile); > # print OUTFILE "else entered"; > close(OUTFILE); > > my $result = $rc->next_result(); > #save the output > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $result->next_hit(); > close(BLASTDEBUGFILE); > > my $filename = $serverpath."/blastdata_". 
> time()."\.out"; > > > # open(DEBUGFILE,'>',$debugfile); > # open(new,'>',$filename); > # @arra=; > # print DEBUGFILE @arra; > # close(DEBUGFILE); > # close(new); > > $factory->save_output($filename); > > # open(BLASTDEBUGFILE,'>',$debugfile); > # print BLASTDEBUGFILE "Hello $rid"; > # close(BLASTDEBUGFILE); > > $factory->remove_rid($rid); > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $organism; > close(BLASTDEBUGFILE); > > # open(OUTFILE,'>',$outfile); > # print OUTFILE "Test2 $result->database_name()"; > # close(OUTFILE); > > #$hit = $result->next_hit; > #open(new,'>',$debugfile); > #print $hit; > #close(new); > $dummy=0; > > while ( my $hit = $result->next_hit ) { > > next unless ( $v >= 0); > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "$hit in while hits"; > # close(OUTFILE); > my $sequ = $gb->get_Seq_by_version($hit->name); > my $dna = $sequ->seq(); # get the sequence as a string > $dummy++; > open(OUTFILE,'>',$debugfile); > # print OUTFILE $dummy; > close(OUTFILE); > push(@seqs,$dna); > } > } > } > } > } > > $warum=@seqs; > open(OUTFILE,'>',$debugfile); > # print OUTFILE $warum; > print OUTFILE @seqs; > > close(OUTFILE); > return(@seqs); > } > > open(OUTFILE, '>',$outfile) || die ; > > print OUTFILE "\n > RNAi Result > \n > \n >

> Inputsequence:
"; > > > Here in the above code, I was trying to debug the code and trying to get the count of the array and even the sequence. But when the output data was giving 1 sequence, the count of the array was 0 and when I tried to print the output sequence I could not get any. It was the same when the no of output sequences was 3, I tried to print the sequences but was getting the count of the array as 2 and was printing only two sequences. > > Please help me in sorting out this problem. > > Regards, > Roopa. From jiann-jy at hotmail.com Sun Jan 24 21:03:55 2010 From: jiann-jy at hotmail.com (JY) Date: Sun, 24 Jan 2010 18:03:55 -0800 (PST) Subject: [Bioperl-l] how to retrieve accession number by taxon id?? Message-ID: <4cef88b5-fa53-4e63-9167-30075c10a058@k19g2000yqc.googlegroups.com> i need to retrieve accession number and sequence to complete one of my part in my project, but how to retrieve accession number by the taxon id. From lpaulet at ual.es Mon Jan 25 15:25:55 2010 From: lpaulet at ual.es (Lorenzo Carretero-Paulet) Date: Mon, 25 Jan 2010 21:25:55 +0100 Subject: [Bioperl-l] HTMLResultWriter Message-ID: <4B5DFE53.2000201@ual.es> Hi all, I'm trying to generate a subroutine that performs a BLAST search and returns the corresponding reports in txt, xml and html format. I?m experiencing problems with the latter, as the program returns the following error message: "Can't call method "next_result" without a package or object reference at..." 
sub blasting { my ($query, $E_value) = @_; my ($outputfilenameB, $outputfilenameX, $outputfilenameH); $outputfilenameB=$query.".BLAST.txt"; $outputfilenameX=$query.".BLAST.xml"; $outputfilenameH=$query.".BLAST.html"; #legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin print qx(du -s /tmp); my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/; my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -m 7 -b 20000 -o $outputfilenameX/; my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter(); my $outhtml = new Bio::SearchIO(-writer => $writerhtml, -file => ">$outputfilenameH"); while( my $result = _$blast_report_->next_result ) { # get a result from Bio::SearchIO parsing or build it up in memory $outhtml->write_result($result); } } Can anyone see where the problem is? Cheers! Lorenzo From lpaulet at ual.es Mon Jan 25 15:31:08 2010 From: lpaulet at ual.es (lpaulet at ual.es) Date: Mon, 25 Jan 2010 21:31:08 +0100 Subject: [Bioperl-l] HTMLResultWriter Message-ID: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es> Hi all, I'm trying to generate a subroutine that performs a BLAST search and returns the corresponding reports in txt, xml and html format. I'm experiencing problems with the latter, as the program returns the following error message: "Can't call method "next_result" without a package or object reference at..."
sub blasting { my ($query, $E_value) = @_; my ($outputfilenameB, $outputfilenameX, $outputfilenameH); $outputfilenameB=$query.".BLAST.txt"; $outputfilenameX=$query.".BLAST.xml"; $outputfilenameH=$query.".BLAST.html"; #legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin print qx(du -s /tmp); my $blast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/; my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -m 7 -b 20000 -o $outputfilenameX/; my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter(); my $outhtml = new Bio::SearchIO(-writer => $writerhtml, -file => ">$outputfilenameH"); while( my $result = $blast_report->next_result ) { # get a result from Bio::SearchIO parsing or build it up in memory $outhtml->write_result($result); } } Can anyone see where the problem is? Cheers! Lorenzo From dan.kortschak at adelaide.edu.au Mon Jan 25 16:00:37 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Tue, 26 Jan 2010 07:30:37 +1030 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: References: Message-ID: <1264453237.4552.3.camel@epistle> A reverse_translate to IUPAC degenerate codes is not a bad idea, particularly for PCR primer design. Dan On Mon, 2010-01-25 at 09:05 -0500, bioperl-l-request at lists.open-bio.org wrote: > On Jan 24, 2010, at 10:39 PM, Chris Fields wrote: > > > ...And there might be a case for adding the analogous > reverse_translate(). > > Bah. Meant reverse_transcribe(). Ah well. > > chris From maj at fortinbras.us Mon Jan 25 16:07:49 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Mon, 25 Jan 2010 16:07:49 -0500 Subject: [Bioperl-l] HTMLResultWriter In-Reply-To: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es> References: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es> Message-ID: Lorenzo-- your $blast_report is set to be (some of) the text returned by a system call of a blast program; this isn't going to be an object of any kind, and so no functions can be called from it (as at "$blast_report->next_result"). You need to parse the text generated by the blast call using Bio::SearchIO to get a Bio::Search::Result::BlastResult object. you could do @blast_lines = qx/ ...your blast call... /; open my $bf, ">my.blast"; print $bf @blast_lines; close $bf; $blast_result = Bio::SearchIO->new(-file=>'my.blast', -format => 'blast'); and carry on from there. But why not look at Bio::Tools::Run::StandAloneBlast or Bio::Tools::Run::StandAloneBlastPlus to run your blasts within perl? These wrap the blast programs and deliver BioPerl objects, rather than plain text output. cheers MAJ ----- Original Message ----- From: To: Sent: Monday, January 25, 2010 3:31 PM Subject: [Bioperl-l] HTMLResultWriter Hi all, I'm trying to generate a subroutine that performs a BLAST search and returns the corresponding reports in txt, xml and html format. I'm experiencing problems with the latter, as the program returns the following error message: "Can't call method "next_result" without a package or object reference at..."
sub blasting { my ($query, $E_value) = @_; my ($outputfilenameB, $outputfilenameX, $outputfilenameH); $outputfilenameB=$query.".BLAST.txt"; $outputfilenameX=$query.".BLAST.xml"; $outputfilenameH=$query.".BLAST.html"; #legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin print qx(du -s /tmp); my $blast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/; my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -m 7 -b 20000 -o $outputfilenameX/; my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter(); my $outhtml = new Bio::SearchIO(-writer => $writerhtml, -file => ">$outputfilenameH"); while( my $result = $blast_report->next_result ) { # get a result from Bio::SearchIO parsing or build it up in memory $outhtml->write_result($result); } } Can anyone see where the problem is? Cheers! Lorenzo _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Mon Jan 25 16:09:24 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 25 Jan 2010 22:09:24 +0100 Subject: [Bioperl-l] HTMLResultWriter In-Reply-To: <4B5DFE53.2000201@ual.es> References: <4B5DFE53.2000201@ual.es> Message-ID: > my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/; > while( my $result = _$blast_report_->next_result ) { # get a result from Bio::SearchIO parsing or build it up in memory _$blast_report_ is not a valid variable name, as far as I know. Plus there's a space between report and the final '_' in the first of the above two lines. Does this code compile? 
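On the error message itself, one plausible reading (an assumption, since the environment is not shown in the thread): the blastall call passes -o, so the report goes to the output file and qx// captures little or nothing, leaving $blast_report as an empty string. Perl reports a method call on an empty string with exactly the quoted wording. A minimal pure-Perl sketch, where $report is a hypothetical stand-in for the captured output:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for what qx/.../ captures when the program writes its report
# to a file via -o: nothing on STDOUT, hence an empty string (assumption).
my $report = '';

# A plain string is not an object, so the method call dies at run time;
# eval traps the exception so it can be inspected.
my $ok  = eval { $report->next_result; 1 };
my $err = $ok ? '' : $@;

print "method call failed: $err";
```

Whatever the captured text turns out to be, the remedy given in the reply still applies: hand the report file to Bio::SearchIO (-file, -format => 'blast') and call next_result on that object rather than on the qx// return value.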
Dave From Russell.Smithies at agresearch.co.nz Mon Jan 25 16:14:15 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 26 Jan 2010 10:14:15 +1300 Subject: [Bioperl-l] Regarding blast in Bioperl In-Reply-To: References: Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz> That's a fair mix of incomplete code you've supplied!! Did you read the documentation for RemoteBlast? The example there will do 99% of what you want. http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm I'm not entirely sure what you're trying to do (as you've left out a bit of your code) but I assume you're trying to retrieve and print the sequence for each hit. Here's something that works, not sure exactly what/why you want to print but it should get you a bit further. --Russell ================================ #!perl -w use Bio::Tools::Run::RemoteBlast; use Bio::DB::GenBank; use CGI ':standard'; use strict; my $q = new CGI; my @params = ( -prog => 'blastn', -data => 'nr', -expect => '1e-30', -entrez_query => 'Homo sapiens [ORGN]', -readmethod => 'SearchIO' ); my $gb = Bio::DB::GenBank->new; my $factory = Bio::Tools::Run::RemoteBlast->new(@params); #$v is just to turn on and off the messages my $v = 1; my $str = Bio::SeqIO->new( -file => 'test.faa', -format => "fasta" ); while ( my $input = $str->next_seq() ) { my $r = $factory->submit_blast($input); print STDERR "waiting..." if ( $v > 0 ); while ( my @rids = $factory->each_rid ) { foreach my $rid (@rids) { my @seqs = (); my $rc = $factory->retrieve_blast($rid); if ( !ref($rc) ) { if ( $rc < 0 ) { $factory->remove_rid($rid); } print STDERR "." if ( $v > 0 ); sleep 5; } else { my $result = $rc->next_result(); #save the blast output my $filename = $result->query_accession . 
'.out'; $factory->save_output($filename); $factory->remove_rid($rid); print "\nQuery Name: ", $result->query_name(), "\n"; while ( my $hit = $result->next_hit ) { # store the hit sequences push @seqs, $gb->get_Seq_by_version( $hit->name ); next unless ( $v > 0 ); print "\thit name is ", $hit->name, "\n"; while ( my $hsp = $hit->next_hsp ) { print "\t\tscore is ", $hsp->score, "\n"; } } ## print the seqs you've retrieved?? open( OUTFILE, '>', $result->query_accession . '.htm' ); print OUTFILE $q->start_html('RNAi Result'), $q->h1('RNAi Result'), $q->h2('Input'), $q->pre( toString($input) ), $q->h2('Output'); foreach (@seqs) { #there's probably a better way of printing the seq print OUTFILE $q->pre( toString($_) ); } print OUTFILE $q->end_html; close OUTFILE; } } } } sub toString { my $s = shift; return '>' . $s->display_id . " " . $s->desc . "\n" . $s->seq; } ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From biopython at maubp.freeserve.co.uk Mon Jan 25 16:24:33 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 25 Jan 2010 21:24:33 +0000 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: <1264453237.4552.3.camel@epistle> References: <1264453237.4552.3.camel@epistle> Message-ID: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com> On Mon, Jan 25, 2010 at 9:00 PM, Dan Kortschak wrote: > A reverse_translate to IUPAC degenerate codes is not a bad idea, > particularly for PCR primer design. I would say it could be a bad idea. For any protein string there are multiple possible back translations, and this cannot be captured fully as a nucleotide string even using the IUPAC ambiguity chars. We debated this back and forth for Biopython, and decided to leave it out. It wasn't possible for a simple back translate to a simple string to handle the use cases we considered, and other options like returning a regular expression covering all possible back translations were too complex (for a core sequence method/function). Peter From jason at bioperl.org Mon Jan 25 16:26:55 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 25 Jan 2010 13:26:55 -0800 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com> References: <1264453237.4552.3.camel@epistle> <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com> Message-ID: <98995830-DC7F-4404-A216-874EF5799DB6@bioperl.org> It was already implemented several years ago -- reverse_translate Bio::Tools::CodonTable -> revtranslate my $myCodonTable = Bio::Tools::CodonTable->new(); my $seqobj = Bio::PrimarySeq->new(-seq => 'FHGERHEL'); my $iupac_str = $myCodonTable->reverse_translate_all($seqobj); Chris had meant to say reverse_transcribe of RNA -> DNA FWIW.
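To make the disagreement above concrete, here is a pure-Perl sketch of a naive per-position collapse of an amino acid's codons into one IUPAC codon, followed by a count of how many codons that degenerate codon matches but which encode something else. This is an illustration under stated assumptions, not BioPerl's implementation: the helper names degenerate_codon and off_target are hypothetical, and only the standard genetic code is considered.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my (%aa_of, %codons_of);
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            my $aa = substr($aas, $i++, 1);
            $aa_of{"$b1$b2$b3"} = $aa;
            push @{ $codons_of{$aa} }, "$b1$b2$b3";
        }
    }
}

# IUPAC nucleotide ambiguity codes, keyed by the sorted set of bases.
my %iupac = (
    A  => 'A', C  => 'C', G  => 'G', T  => 'T',
    AG => 'R', CT => 'Y', CG => 'S', AT => 'W', GT => 'K', AC => 'M',
    CGT => 'B', AGT => 'D', ACT => 'H', ACG => 'V', ACGT => 'N',
);
my %expand;
$expand{ $iupac{$_} } = [ split //, $_ ] for keys %iupac;

# Hypothetical helper: collapse all codons of one amino acid into a
# single degenerate codon, one position at a time.
sub degenerate_codon {
    my ($aa) = @_;
    my $out = '';
    for my $pos (0 .. 2) {
        my %seen;
        $seen{ substr($_, $pos, 1) } = 1 for @{ $codons_of{$aa} };
        $out .= $iupac{ join '', sort keys %seen };
    }
    return $out;
}

# Hypothetical helper: count codons matched by the degenerate codon
# that encode a different amino acid (or a stop).
sub off_target {
    my ($aa) = @_;
    my @sets = map { $expand{$_} } split //, degenerate_codon($aa);
    my $n = 0;
    for my $x (@{ $sets[0] }) {
        for my $y (@{ $sets[1] }) {
            for my $z (@{ $sets[2] }) {
                $n++ if $aa_of{"$x$y$z"} ne $aa;
            }
        }
    }
    return $n;
}

printf "%s  %s  off-target codons: %d\n",
    $_, degenerate_codon($_), off_target($_)
    for qw(M F L R S);
```

Serine is the clearest case: its codons are TCN plus AGY, the per-position collapse is WSN, and WSN also matches threonine, cysteine, arginine, tryptophan and one stop codon. A degenerate string of this kind therefore covers every true back translation but is not limited to them, which fits both sides of the exchange: usable for primer design, lossy as an exact description.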
-jason On Jan 25, 2010, at 1:24 PM, Peter wrote: > On Mon, Jan 25, 2010 at 9:00 PM, Dan Kortschak > wrote: >> A reverse_translate to IUPAC degenerate codes is not a bad idea, >> particularly for PCR primer design. > > I would say it could be a bad idea. For any protein string there are > multiple possible back translations, and this cannot be captured > fully as a nucleotide string even using the IUPAC ambiguity chars. > > We debated this back and forth for Biopython, and decided to leave it > out. It wasn't possible for a simple back translate to a simple > string to > handle the use cases we considered, and other options like returning > a regular expression covering all possible back translations were too > complex (for a core sequence method/function). > > Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ http://twitter.com/hyphaltip From maj at fortinbras.us Mon Jan 25 16:19:24 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 25 Jan 2010 16:19:24 -0500 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: <1264453237.4552.3.camel@epistle> References: <1264453237.4552.3.camel@epistle> Message-ID: <72B106F0D5FF4F1E858CC9BD1EF33142@NewLife> I think we have that functionality in Bio::Tools::SeqPattern, courtesy of Bruno V--- ----- Original Message ----- From: "Dan Kortschak" To: Sent: Monday, January 25, 2010 4:00 PM Subject: Re: [Bioperl-l] Transcribe in bioperl >A reverse_translate to IUPAC degenerate codes is not a bad idea, > particularly for PCR primer design. > > Dan > > On Mon, 2010-01-25 at 09:05 -0500, bioperl-l-request at lists.open-bio.org > wrote: >> On Jan 24, 2010, at 10:39 PM, Chris Fields wrote: >> >> > ...And there might be a case for adding the analogous >> reverse_translate(). >> >> Bah. Meant reverse_transcribe(). Ah well. 
>> >> chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From dan.kortschak at adelaide.edu.au Mon Jan 25 16:38:44 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Tue, 26 Jan 2010 08:08:44 +1030 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com> References: <1264453237.4552.3.camel@epistle> <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com> Message-ID: <1264455524.4552.23.camel@epistle> Good to see that these ideas have been considered. I'd be interested to see this discussion, or at least the point dealing with the problems that might arise. I'm at a loss as to how ambiguity codes can't completely describe all possible coding sequences for any given codon table (via Bio::Tools::CodonTable - in fact this already has the revtranslate that could be fitted into a Bio::PrimarySeq method - to answer Mark and Jason's comments, I think that /if/ a reverse_translate method exists, it makes logical sense to have it tied to a sequence object, calling the B:T:CT method on the seq object itself rather than only in Bio::Tools, 2¢). Pete, can you provide an example of the problems? thanks Dan On Mon, 2010-01-25 at 21:24 +0000, Peter wrote: > I would say it could be a bad idea. For any protein string there are > multiple possible back translations, and this cannot be captured > fully as a nucleotide string even using the IUPAC ambiguity chars. From lpaulet at ual.es Mon Jan 25 16:53:07 2010 From: lpaulet at ual.es (lpaulet at ual.es) Date: Mon, 25 Jan 2010 22:53:07 +0100 Subject: [Bioperl-l] HTMLResultWriter In-Reply-To: References: <4B5DFE53.2000201@ual.es> Message-ID: <20100125225307.2zl2cn2hkcsgccso@webmail.ual.es> Thanks Dave and Mark.
Quoting Dave Messina : >> my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e >> $E_value -b 20000 -o $outputfilenameB/; > >> while( my $result = _$blast_report_->next_result ) { # get a result >> from Bio::SearchIO parsing or build it up in memory > > > _$blast_report_ is not a valid variable name, as far as I know. Plus > there's a space between report and the final '_' in the first of > the above two lines. > > Does this code compile? > > Dave > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From rtbio.2009 at gmail.com Mon Jan 25 17:35:32 2010 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Mon, 25 Jan 2010 23:35:32 +0100 Subject: [Bioperl-l] Regarding blast in Bioperl In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz> Message-ID: Hello Russell, Thank you very much for your reply. My problem is that Remote blast is getting well executed with my code and I am getting the .out file with sequences producing significant alignments. But, when I am trying to retrieve the sequences into an array @seqs, I am able to retrieve all the sequences except for the first hit. If the number of hits that I get in the .out file to be 3, I am able to retrieve only 2 hits i.e., I am able to get only 2 sequences. If there is only one significant hit for my sequence, then the name and description of the sequence appears in the .out file, but I am unable to get it into the array,the array count shows 0 and there would not be any sequence in the array. I hope that you have got me now. 
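The symptom described here (always exactly one hit fewer than the report shows) is what one would expect if next_hit behaves as a one-pass iterator and something calls it once before the collecting loop. The posted script does print $result->next_hit() to BLASTDEBUGFILE ahead of its while loop, so that is a likely cause, though not a confirmed one. A pure-Perl sketch of the effect, using a hypothetical closure in place of a real BLAST result object:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a BLAST result: hands out hits one at a
# time, the way a one-pass iterator does, then returns undef.
sub make_result {
    my @hits = @_;
    return sub { return shift @hits };
}

# Collect hits into an array, optionally "peeking" at one first,
# as a debugging print before the loop would.
sub collect_hits {
    my ($next_hit, $peek_first) = @_;
    my @seqs;
    if ($peek_first) {
        my $debug = $next_hit->();    # consumes the first hit
    }
    while ( my $hit = $next_hit->() ) {
        push @seqs, $hit;
    }
    return @seqs;
}

my @with_peek    = collect_hits( make_result(qw(hit1 hit2 hit3)), 1 );
my @without_peek = collect_hits( make_result(qw(hit1 hit2 hit3)), 0 );

printf "with debug peek: %d hits; without: %d hits\n",
    scalar @with_peek, scalar @without_peek;
```

With three hits the peeking variant collects two, and with a single hit it collects none, matching the counts reported. If this is indeed the cause, removing the debugging call to next_hit should restore the first hit; Bio::Search::Result::GenericResult also documents a rewind method for resetting the hit iterator, which may be worth checking in the installed BioPerl version.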
Here comes my code, use Bio::SearchIO; use Bio::Search::Result::BlastResult; use Bio::Perl; use Bio::Tools::Run::RemoteBlast; use Bio::Seq; use Bio::SeqIO; use Bio::DB::GenBank; $serverpath = "/srv/www/htdocs/rain/RNAi"; $serverurl = "http://141.84.66.66/rain/RNAi"; $outfile = $serverpath."/rnairesult_".time().".html"; $nuc = $serverpath."/nuc".time().".txt"; $debugfile = $serverpath."/debug_".time().".txt"; $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; my $outstring =""; &parse_form; print "Content-type: text/html\n\n"; print "\n"; print "RNAi Result"; print " \n"; print "\n"; print "\n"; print " Your results will appear here
"; print " Please be patient, runtime can be up to 5 minutes
"; print " This page will automatically reload in 30 seconds."; print "\n"; print "\n"; defined(my $pid = fork) or die "Can't fork: $!"; exit if $pid; open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; open(OUTFILE, '>',$outfile); print OUTFILE "\n RNAi Result \n \n \n Your results will appear here
Please be patient, runtime can be up to 5 minutes
This page will automatically reload in 30 seconds
\n \n"; close(OUTFILE); @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); $in{'Inputseq'} =~ s/>.*$//m; $in{'Inputseq'} =~ s/[^TAGC]//gim; $in{'Inputseq'} =~ tr/actg/ACTG/; @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, $in{'Threshold'}); sub blastcode { $inpu1= $_[0]; $organ= $_[1]; open(NUC,'>',$nuc); print NUC $inpu1,"\n"; close(NUC); my $prog = 'blastn'; my $db = 'refseq_rna'; my $e_val= '1e-10'; my $organism= $organ; $gb = new Bio::DB::GenBank; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO', '-Organism' => $organism ); # open(OUTFILE,'>',$debugfile); # print OUTFILE @params; # close(OUTFILE); my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => "$organ\[ORGN]"); #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); #change a paramter #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]'; #change a paramter # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; my $v = 1; #$v is just to turn on and off the messages my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]"); while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. open(OUTFILE,'>',$debugfile); print OUTFILE $input; close(OUTFILE); my $r = $factory->submit_blast($input); open(OUTFILE,'>',$debugfile); # print OUTFILE $r; close(OUTFILE); print STDERR "waiting...." if($v>0); while ( my @rids = $factory->each_rid ) { foreach my $rid ( @rids ) { my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } print STDERR "." 
if ( $v > 0 ); sleep 5; } else { my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $result->next_hit(); close(BLASTDEBUGFILE); my $filename = $serverpath."/blastdata_".time()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); while ( my $hit = $result->next_hit ) { next unless ( $v >= 0); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string $dummy++; open(OUTFILE,'>',$debugfile); open(OUTFILE,'>',$debugfile); # print OUTFILE $dna; close(OUTFILE); push(@seqs,$dna); } } } } } $warum=scalar(@seqs); open(OUTFILE,'>',$debugfile); print OUTFILE $warum; # print OUTFILE @seqs; close(OUTFILE); return(@seqs); } open(OUTFILE, '>',$outfile) || die ; print OUTFILE "\n RNAi Result \n \n

Inputsequence:
"; for ($i=0; $i\n"; } } print OUTFILE "

"; $z=@compseqs; for($k=0;$k<$z;$k++) { print OUTFILE "

Compare Sequence:
"; for ($i=0; $i\n"; } } print OUTFILE "

"; } print OUTFILE "

Window:
$in{'Windowsize'}

Threshold:
$in{'Threshold'}

"; my $j=0; for ($i=0; $i{similar}<=$in{'Threshold'}){ $j=$in{'Windowsize'}; } $height=$out[$i]->{similar}*5; } if ($j>0) { print OUTFILE ""; $outstring .= "".substr ($in{'Inputseq'}, $i, 1).""; $j--; } else { print OUTFILE ""; $outstring .= "".substr ($in{'Inputseq'}, $i, 1).""; } if ( ($i+1)%10==0){ $outstring .= " "; } if ( ($i+1)%60==0){ $outstring .= "
\n"; } if ( ($i+1)%800==0){ print OUTFILE "

\n"; } } print OUTFILE "

$outstring"; #foreach (@out) { #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} matchs

"; #if ($_->{similar}<=$in{'Threshold'}){ # } #} print OUTFILE "\n\n"; close OUTFILE; #nameprint(); sub parse_form { local ($buffer, @pairs, $pair, $name, $value); # Read in text $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/; if ($ENV{'REQUEST_METHOD'} eq "POST") { read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); } else { $buffer = $ENV{'QUERY_STRING'}; } @pairs = split(/&/, $buffer); foreach $pair (@pairs) { ($name, $value) = split(/=/, $pair); $value =~ tr/+/ /; $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; $in{$name} = $value; } } Regards, Roopa. On Mon, Jan 25, 2010 at 10:14 PM, Smithies, Russell < Russell.Smithies at agresearch.co.nz> wrote: > That's a fair mix of incomplete code you've supplied!! > Did you read the documentation for RemoteBlast? The example there will do > 99% of what you want. > http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm > > I'm not entirely sure what you're trying to do (as you've left out a bit of > your code) but I assume you're trying to retrieve and print the sequence for > each hit. > > Here's something that works, not sure exactly what/why you want to print > but it should get you a bit further. > > --Russell > > > ================================ > #!perl -w > > use Bio::Tools::Run::RemoteBlast; > use Bio::DB::GenBank; > > use CGI ':standard'; > > use strict; > > my $q = new CGI; > > my @params = ( > -prog => 'blastn', > -data => 'nr', > -expect => '1e-30', > -entrez_query => 'Homo sapiens [ORGN]', > -readmethod => 'SearchIO' > ); > > my $gb = Bio::DB::GenBank->new; > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > #$v is just to turn on and off the messages > my $v = 1; > > my $str = Bio::SeqIO->new( -file => 'test.faa', -format => "fasta" ); > > while ( my $input = $str->next_seq() ) { > > my $r = $factory->submit_blast($input); > > print STDERR "waiting..." 
if ( $v > 0 ); > while ( my @rids = $factory->each_rid ) { > foreach my $rid (@rids) { > my @seqs = (); > my $rc = $factory->retrieve_blast($rid); > if ( !ref($rc) ) { > if ( $rc < 0 ) { > $factory->remove_rid($rid); > } > print STDERR "." if ( $v > 0 ); > sleep 5; > } > else { > my $result = $rc->next_result(); > > #save the blast output > my $filename = $result->query_accession . '.out'; > $factory->save_output($filename); > $factory->remove_rid($rid); > print "\nQuery Name: ", $result->query_name(), "\n"; > while ( my $hit = $result->next_hit ) { > > # store the hit sequences > push @seqs, $gb->get_Seq_by_version( $hit->name ); > > next unless ( $v > 0 ); > print "\thit name is ", $hit->name, "\n"; > while ( my $hsp = $hit->next_hsp ) { > print "\t\tscore is ", $hsp->score, "\n"; > } > } > > ## print the seqs you've retrieved?? > open( OUTFILE, '>', $result->query_accession . '.htm' ); > print OUTFILE $q->start_html('RNAi Result'), > $q->h1('RNAi Result'), > $q->h2('Input'), > $q->pre( toString($input) ), > $q->h2('Output'); > > foreach (@seqs) { > > #there's probably a better way of printing the seq > print OUTFILE $q->pre( toString($_) ); > } > print OUTFILE $q->end_html; > close OUTFILE; > } > } > } > } > > sub toString { > my $s = shift; > return '>' . $s->display_id . " " . $s->desc . "\n" . $s->seq; > } > > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> =======================================================================
>

From ajmackey at gmail.com Tue Jan 26 08:24:43 2010
From: ajmackey at gmail.com (Aaron Mackey)
Date: Tue, 26 Jan 2010 08:24:43 -0500
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <1264455524.4552.23.camel@epistle>
References: <1264453237.4552.3.camel@epistle>
	<320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
	<1264455524.4552.23.camel@epistle>
Message-ID: <24c96eca1001260524s3d46e850hfdcc461e22210972@mail.gmail.com>

There's also Bio::Tools::IUPAC; given a sequence with IUPAC ambiguity codes,
it provides a SeqIO stream that enumerates all the possible unambiguous
realizations. Not the right solution for every situation, but quite useful
when you need it.

-Aaron

On Mon, Jan 25, 2010 at 4:38 PM, Dan Kortschak <
dan.kortschak at adelaide.edu.au> wrote:

> Good to see that these ideas have been considered.
>
> I'd be interested to see this discussion, or at least the point dealing
> with the problems that might arise. I'm at a loss as to how ambiguity
> codes can't completely describe all possible coding sequences for any
> given codon table (via Bio::Tools::CodonTable - in fact this already has
> the revtranslate that could be fitted into a Bio::PrimarySeq method - to
> answer Mark and Jason's comments, I think that /if/ a reverse_translate
> method exists, it makes logical sense to have it tied to a sequence
> object, calling the B:T:CT method on the seq object itself rather than
> only in Bio::Tools, 2¢). Pete, can you provide an example of the
> problems?
>
> thanks
> Dan
>
> On Mon, 2010-01-25 at 21:24 +0000, Peter wrote:
> > I would say it could be a bad idea. For any protein string there are
> > multiple possible back translations, and this cannot be captured
> > fully as a nucleotide string even using the IUPAC ambiguity chars.
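
A small illustration of the limitation Peter describes, assuming the
standard genetic code. Serine is encoded by TCN and by AGY, so a single
ambiguous codon covering all six serine codons has to be WSN, and WSN also
matches ten codons that are not serine. The sketch below is plain Perl with
no BioPerl dependency; consensus_codon and expand_codon are hypothetical
helper names used only for this example, not BioPerl API.

```perl
use strict;
use warnings;

# IUPAC nucleotide ambiguity codes, keyed by the sorted set of bases they cover.
my %iupac = (
    A   => 'A', C   => 'C', G   => 'G', T   => 'T',
    AC  => 'M', AG  => 'R', AT  => 'W', CG  => 'S', CT => 'Y', GT => 'K',
    ACG => 'V', ACT => 'H', AGT => 'D', CGT => 'B', ACGT => 'N',
);
my %expand = reverse %iupac;    # ambiguity code => string of covered bases

# Collapse a list of codons into one ambiguous codon, position by position.
sub consensus_codon {
    my @codons = @_;
    my $out = '';
    for my $pos ( 0 .. 2 ) {
        my %seen;
        $seen{ uc substr( $_, $pos, 1 ) } = 1 for @codons;
        $out .= $iupac{ join '', sort keys %seen };
    }
    return $out;
}

# Enumerate every unambiguous codon matched by an ambiguous codon.
sub expand_codon {
    my ($ambiguous) = @_;
    my @results = ('');
    for my $code ( split //, uc $ambiguous ) {
        my @bases = split //, $expand{$code};
        my @next;
        for my $prefix (@results) {
            push @next, $prefix . $_ for @bases;
        }
        @results = @next;
    }
    return @results;
}

my @serine = qw(TCT TCC TCA TCG AGT AGC);    # standard code only
my $amb    = consensus_codon(@serine);
my @all    = expand_codon($amb);
my %is_ser = map { $_ => 1 } @serine;
my @extra  = grep { !$is_ser{$_} } @all;

printf "%s matches %d codons, %d of them not serine: %s\n",
    $amb, scalar @all, scalar @extra, join( ' ', @extra );
```

The same over-matching affects leucine and arginine, which is presumably
why a back-translation either returns several alternatives per residue or
accepts a nucleotide string that encodes more than the original protein.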
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From nml5566 at gmail.com Tue Jan 26 16:10:54 2010
From: nml5566 at gmail.com (Nathan Liles)
Date: Tue, 26 Jan 2010 15:10:54 -0600
Subject: [Bioperl-l] SVN access
Message-ID: <4B5F5A5E.2070406@gmail.com>

Does anyone know who I need to talk to for getting developer access for
the Bioperl SVN? I want to submit a patch to the genbank2gff3 converter.

Thanks,
Nathan

From Russell.Smithies at agresearch.co.nz Tue Jan 26 20:40:40 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 27 Jan 2010 14:40:40 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>

Grrrrrr, I hate eutils!!!!

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
STACK: get_desc.pl:32
-----------------------------------------------------------

Nice error message though :-)

--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> Sent: Monday, 11 January 2010 10:05 a.m.
> To: 'Chris Fields'
> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
>
> I've started to go off eUtils recently (not BioPerl's fault) as I've often
> been finding that with large queries, chunks of the resulting data is
> missing.
> For example, before Xmas I was creating species-specific databases by
> using eUtils to get a list of GI numbers back for a taxid, then retrieving
> the fasta sequences in chunks of 500.
> Very regularly, in the middle of the fasta there would be a message about
> resource unavailable eg.
> >test_sequence_1
> TACGATCATCGCTResource UnavailableTACGACTCTGCT
> >test_sequence_2
> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
>
> Often this wasn't detected until formatdb complained about invalid
> characters.
> Inquiries to NCBI as to why this was happening and what to do about it
> returned stupid answers ("do each sequence manually thru the web
> interface", or "use eUtils").
> As we have a nice fast network connection, I now prefer to download very
> large gzip files (i.e.
all of refseq) and extract what I need.
>
> I can't help but think that NCBI could solve a lot of problems if they
> gzipped the output from eUtils queries - it's something I've requested
> regularly for the last 5 years or so!!
>
> --Russell
>
>
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at illinois.edu]
> > Sent: Monday, 11 January 2010 9:50 a.m.
> > To: Smithies, Russell
> > Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
> > Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > number?
> >
> > One could also use Bio::DB::Taxonomy, which indexes the same files or
> > (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the
> > details).
> >
> > chris
> >
> > On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> >
> > > An alternate non-BioPerly way (that may be faster given NCBI's
> flakiness
> > lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip
> > files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash
> and
> > do lookups.
> > > In that same dir, taxdump.tar.gz contains a file called names.dmp
> which
> > lists taxids and descriptions (and synonyms)
> > >
> > > If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
> > could do this:
> > >
> > > my $taxid = $gi_taxid_nucl{$accession};
> > > my $org_name = $names{$taxid};
> > >
> > > --Russell
> > >
> > >
> > >> -----Original Message-----
> > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> > >> Sent: Saturday, 26 December 2009 4:52 p.m.
> > >> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> > >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > >> number?
> > >>
> > >> Bhakti,
> > >> The following example (using EUtilities) may serve your purpose:
> > >>
> > >> use Bio::DB::EUtilities;
> > >>
> > >> my (%taxa, @taxa);
> > >> my (%names, %idmap);
> > >>
> > >> # these are protein ids; nuc ids will work by changing
> > >> # -dbfrom => 'nucleotide', (probably)
> > >>
> > >> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> > >>
> > >> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> > >>                                        -db => 'taxonomy',
> > >>                                        -dbfrom => 'protein',
> > >>                                        -correspondence => 1,
> > >>                                        -id => \@ids);
> > >>
> > >> # iterate through the LinkSet objects
> > >> while (my $ds = $factory->next_LinkSet) {
> > >>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> > >> }
> > >>
> > >> @taxa = @taxa{@ids};
> > >>
> > >> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> > >>                                     -db => 'taxonomy',
> > >>                                     -id => \@taxa );
> > >>
> > >> while (local $_ = $factory->next_DocSum) {
> > >>     $names{($_->get_contents_by_name('TaxId'))[0]} =
> > >>         ($_->get_contents_by_name('ScientificName'))[0];
> > >> }
> > >>
> > >> foreach (@ids) {
> > >>     $idmap{$_} = $names{$taxa{$_}};
> > >> }
> > >>
> > >> # %idmap is
> > >> # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> > >> # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> > >> # 68536103 => 'Corynebacterium jeikeium K411'
> > >> # 730439 => 'Bacillus caldolyticus'
> > >> # 89318838 => undef (this record has been removed from the db)
> > >>
> > >> 1;
> > >>
> > >> You probably will need to break up your 30000 into chunks
> > >> (say, 1000-3000 each), and do the above on each chunk with a
> > >>
> > >> sleep 3;
> > >>
> > >> or so separating the queries.
> > >> MAJ
> > >> ----- Original Message -----
> > >> From: "Bhakti Dwivedi"
> > >> To:
> > >> Sent: Friday, December 25, 2009 9:46 PM
> > >> Subject: [Bioperl-l] how to retrieve organism name from accession
> > number?
> > >>
> > >>
> > >>> Hi,
> > >>>
> > >>> Does anyone know how to retrieve the "Source" or the "Species name"
> > >> given
> > >>> the accession number using Bioperl. I have these 30,000 accession
> > >> numbers
> > >>> for which I need to get the source organisms. Any kind of help will
> > be
> > >>> appreciated.
> > >>>
> > >>> Thanks
> > >>>
> > >>> BD
> > >>> _______________________________________________
> > >>> Bioperl-l mailing list
> > >>> Bioperl-l at lists.open-bio.org
> > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>>
> > >>>
> > >>
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> =======================================================================
> > > Attention: The information contained in this message and/or
> attachments
> > > from AgResearch Limited is intended only for the persons or entities
> > > to which it is addressed and may contain confidential and/or
> privileged
> > > material. Any review, retransmission, dissemination or other use of,
> or
> > > taking of any action in reliance upon, this information by persons or
> > > entities other than the intended recipients is prohibited by
> AgResearch
> > > Limited. If you have received this message in error, please notify the
> > > sender immediately.
> > >
> =======================================================================
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Tue Jan 26 20:46:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 26 Jan 2010 19:46:26 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
Message-ID: <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>

It's unfortunate but I have heard this problem popping up quite a bit more
frequently lately. Not to push too many buttons but NCBI isn't very
forthcoming with help these days; they have become quite insular. Not sure
if they're short-staffed due to budget or if there are other issues.

chris

On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:

> Grrrrrr, I hate eutils!!!!
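
The quoted report earlier in this thread, where "Resource Unavailable" ended
up in the middle of fetched FASTA and was only noticed when formatdb
complained, suggests checking each downloaded chunk before it is appended to
a database. A minimal sketch in plain Perl follows; it assumes nucleotide
FASTA, and find_corrupt_records is a hypothetical helper name, not a BioPerl
function. The two sample records are the ones from the report.

```perl
use strict;
use warnings;

# Anything other than an IUPAC nucleotide code or a gap in a sequence line
# is treated as suspect (assumption: nucleotide data, not protein).
my $NOT_NUCLEOTIDE = qr/[^ACGTUNRYKMSWBDHV\-]/i;

# Scan a FASTA filehandle and return one hashref per suspicious sequence line.
sub find_corrupt_records {
    my ($fh) = @_;
    my ( @problems, $id );
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^>(\S*)/ ) {    # header line: remember the record id
            $id = $1;
            next;
        }
        next unless length $line;      # skip blank lines
        if ( $line =~ $NOT_NUCLEOTIDE ) {
            push @problems, { id => $id, line => $., text => $line };
        }
    }
    return @problems;
}

my $fasta = <<'END';
>test_sequence_1
TACGATCATCGCTResource UnavailableTACGACTCTGCT
>test_sequence_2
TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
END

# Read from an in-memory filehandle so the example needs no files on disk.
open my $fh, '<', \$fasta or die "cannot open in-memory FASTA: $!";
my @problems = find_corrupt_records($fh);
close $fh;

printf "%s (line %d): %s\n", @{$_}{qw(id line text)} for @problems;
```

A chunk that reports any problem could then be fetched again instead of
being written out. The check is deliberately loose: an injected message made
up only of valid IUPAC letters would slip through, so comparing the number
of records returned with the number of ids requested is a useful second
test.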

From Russell.Smithies at agresearch.co.nz Tue Jan 26 20:59:15 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 27 Jan 2010 14:59:15 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>

I've had a wide selection of errors lately:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
STACK: get_desc.pl:32
-----------------------------------------------------------

And I never get a good explanation from NCBI or suggestions on how to avoid it.

--Russell

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Wednesday, 27 January 2010 2:46 p.m.
> To: Smithies, Russell
> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
>
> It's unfortunate but I have heard this problem popping up quite a bit more
> frequently lately. Not to push too many buttons but NCBI isn't very
> forthcoming with help these days; they have become quite insular. Not
> sure if they're short-staffed due to budget or if there are other issues.
>
> chris
>
> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
>
> > Grrrrrr, I hate eutils!!!!
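
Both exceptions shown in this thread (Error 111, connection refused, and
Error 11, resource temporarily unavailable) look like transient server-side
failures, and BioPerl reports them by throwing. One common way to cope,
offered here only as a sketch, is to wrap the call in a retry loop with
growing pauses. The helper name with_retries and its arguments are
hypothetical; the demo uses a stand-in that fails twice instead of a real
network call, and records the pauses rather than sleeping.

```perl
use strict;
use warnings;

# Call $arg{action}; if it dies, pause and try again, doubling the pause
# each time. Rethrows the last error once the attempts are used up.
# Intended to be called in list context.
sub with_retries {
    my (%arg) = @_;
    my $action   = $arg{action} or die "with_retries needs an action\n";
    my $attempts = $arg{attempts} || 4;
    my $delay    = defined $arg{delay} ? $arg{delay} : 3;
    my $pause    = $arg{pause} || sub { sleep $_[0] };

    for my $try ( 1 .. $attempts ) {
        my @result;
        my $ok = eval { @result = $action->(); 1 };
        return @result if $ok;

        my $error = $@ || 'unknown error';
        die "giving up after $try attempts: $error" if $try == $attempts;

        $pause->($delay);
        $delay *= 2;
    }
}

# Stand-in for a flaky remote call: dies on the first two calls, then succeeds.
my $calls = 0;
my @pauses;
my @ids = with_retries(
    attempts => 4,
    delay    => 3,
    pause    => sub { push @pauses, $_[0] },    # record instead of sleeping
    action   => sub {
        $calls++;
        die "Search Backend failed: Error 111 (Connection refused)\n"
            if $calls < 3;
        return qw(1621261 20807972);
    },
);

print "got @ids after $calls calls (pauses: @pauses)\n";
```

In a real script the action would be the remote call itself, for example a
closure around the get_ids call that appears in the stack trace, and the
pause argument would be left at its default so the loop really sleeps. This
only helps with failures that raise an exception; it does nothing for
responses that arrive intact but contain an error message in the data.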

From cjfields at illinois.edu Tue Jan 26 21:42:22 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 26 Jan 2010 20:42:22 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
Message-ID: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>

Makes me wonder if they're pushing more users towards the SOAP-based
services and away from eutils.

chris

On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:

> I've had a wide selection of errors lately:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> STACK: get_desc.pl:32
> -----------------------------------------------------------
>
> And I never get a good explanation from NCBI or suggestions on how to avoid it.
>
>
> --Russell
>
>
>> -----Original Message-----
>> From: Chris Fields [mailto:cjfields at illinois.edu]
>> Sent: Wednesday, 27 January 2010 2:46 p.m.
>> To: Smithies, Russell
>> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>> number?
>>
>> It's unfortunate but I have heard this problem popping up quite a bit more
>> frequently lately. Not to push too many buttons but NCBI isn't very
>> forthcoming with help these days; they have become quite insular. Not
>> sure if they're short-staffed due to budget or if there are other issues.
>>
>> chris
>>
>> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
>>
>>> Grrrrrr, I hate eutils!!!!
>>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 >> (Connection refused) >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw >> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 >>> STACK: Bio::Tools::EUtilities::parse_data >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 >>> STACK: Bio::Tools::EUtilities::get_ids >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 >>> STACK: Bio::DB::EUtilities::get_ids >> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 >>> STACK: get_desc.pl:32 >>> ----------------------------------------------------------- >>> >>> >>> Nice error message though :-) >>> >>> >>> --Russell >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell >>>> Sent: Monday, 11 January 2010 10:05 a.m. >>>> To: 'Chris Fields' >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org' >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession >>>> number? >>>> >>>> I've started to go off eUtils recently (not BioPerl's fault) as I've >> often >>>> been finding that with large queries, chunks of the resulting data is >>>> missing. >>>> For example, before Xmas I was creating species-specific databases by >>>> using eUtils to get a list of GI numbers back for a taxid, then >> retrieving >>>> the fasta sequences in chunks of 500. >>>> Very regularly, in the middle of the fasta there would be a message >> about >>>> resource unavailable eg. >>>>> test_sequence_1 >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT >>>>> test_sequence_2 >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT >>>> >>>> Often this wasn't detected until formatdb complained about invalid >>>> characters. 
>>>> Inquiries to NCBI as to why this was happening and what to do about it >>>> returned stupid answers ("do each sequence manually thru the web >>>> interface", or "use eUtils"). >>>> As we have a nice fast network connection, I now prefer to download >> very >>>> large gzip files (i.e. all of refseq) and extract what I need. >>>> >>>> I can't help but think that NCBI could solve a lot of problems if they >>>> gzipped the output from eUtils queries - it's something I've requested >>>> regularly for the last 5 years or so!! >>>> >>>> --Russell >>>> >>>> >>>>> -----Original Message----- >>>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>>> Sent: Monday, 11 January 2010 9:50 a.m. >>>>> To: Smithies, Russell >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org' >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession >>>>> number? >>>>> >>>>> One could also use Bio::DB::Taxonomy, which indexes the same files or >>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for >> the >>>>> details). >>>>> >>>>> chris >>>>> >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote: >>>>> >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's >>>> flakiness >>>>> lately) would be to download the gi_taxid_nucl.zip or >> gi_taxid_prot.zip >>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash >>>> and >>>>> do lookups. >>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp >>>> which >>>>> lists taxids and descriptions (and synonyms) >>>>>> >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I >>>>> could do this: >>>>>> >>>>>> my $taxid = $gi_taxid_nucl{$accession}; >>>>>> my $org_name = $names{$taxid}; >>>>>> >>>>>> --Russell >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. 
Jensen >>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m. >>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org >>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from >> accession >>>>>>> number? >>>>>>> >>>>>>> Bhakti, >>>>>>> The following example (using EUtilities) may serve your purpose: >>>>>>> >>>>>>> use Bio::DB::EUtilities; >>>>>>> >>>>>>> my (%taxa, @taxa); >>>>>>> my (%names, %idmap); >>>>>>> >>>>>>> # these are protein ids; nuc ids will work by changing -dbfrom => >>>>>>> 'nucleotide', >>>>>>> # (probably) >>>>>>> >>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439); >>>>>>> >>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink', >>>>>>> -db => 'taxonomy', >>>>>>> -dbfrom => 'protein', >>>>>>> -correspondence => 1, >>>>>>> -id => \@ids); >>>>>>> >>>>>>> # iterate through the LinkSet objects >>>>>>> while (my $ds = $factory->next_LinkSet) { >>>>>>> $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0] >>>>>>> } >>>>>>> >>>>>>> @taxa = @taxa{@ids}; >>>>>>> >>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', >>>>>>> -db => 'taxonomy', >>>>>>> -id => \@taxa ); >>>>>>> >>>>>>> while (local $_ = $factory->next_DocSum) { >>>>>>> $names{($_->get_contents_by_name('TaxId'))[0]} = >>>>>>> ($_->get_contents_by_name('ScientificName'))[0]; >>>>>>> } >>>>>>> >>>>>>> foreach (@ids) { >>>>>>> $idmap{$_} = $names{$taxa{$_}}; >>>>>>> } >>>>>>> >>>>>>> # %idmap is >>>>>>> # 1621261 => 'Mycobacterium tuberculosis H37Rv' >>>>>>> # 20807972 => 'Thermoanaerobacter tengcongensis MB4' >>>>>>> # 68536103 => 'Corynebacterium jeikeium K411' >>>>>>> # 730439 => 'Bacillus caldolyticus' >>>>>>> # 89318838 => undef (this record has been removed from the db) >>>>>>> >>>>>>> 1; >>>>>>> >>>>>>> You probably will need to break up your 30000 into chunks >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a >>>>>>> >>>>>>> sleep 3; >>>>>>> >>>>>>> or so separating the queries. 
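The chunk-and-sleep advice above could be sketched as follows. This is only an illustration: chunk_ids and fetch_in_batches are hypothetical helper names, and the $fetch code reference stands in for the elink/esummary lookup from the script above (it is assumed to take an array ref of ids and return a hash ref of id => name). The batch size and pause are the values suggested in the message, not NCBI-mandated limits.

```perl
use strict;
use warnings;

# Split a list of ids into batches of at most $size elements.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" unless $size && $size > 0;
    my @batches;
    push @batches, [ splice(@ids, 0, $size) ] while @ids;
    return @batches;
}

# Call $fetch once per batch, merging the returned hash refs and
# sleeping $pause seconds between (not after) requests.
sub fetch_in_batches {
    my ($fetch, $size, $pause, @ids) = @_;
    my %result;
    my @batches = chunk_ids($size, @ids);
    for my $i (0 .. $#batches) {
        my $partial = $fetch->($batches[$i]);
        @result{ keys %$partial } = values %$partial;
        sleep $pause if $pause && $i < $#batches;
    }
    return \%result;
}

# Stand-in for the real lookup so the sketch runs without network access.
my $fake_lookup = sub {
    my ($ids) = @_;
    my %names = map { ($_ => "organism_for_$_") } @$ids;
    return \%names;
};

my $idmap = fetch_in_batches($fake_lookup, 2, 0,
                             qw(1621261 89318838 68536103 20807972 730439));
print scalar(keys %$idmap), " ids resolved\n";
```

With a real lookup one would call something like fetch_in_batches($lookup, 1000, 3, @accessions), wrapping each call in eval so that a single failed batch can be retried instead of aborting the whole run.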
>>>>>>> MAJ >>>>>>> ----- Original Message ----- >>>>>>> From: "Bhakti Dwivedi" >>>>>>> To: >>>>>>> Sent: Friday, December 25, 2009 9:46 PM >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession >>>>> number? >>>>>>> >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species name" >>>>>>> given >>>>>>>> the accession number using Bioperl. I have these 30,000 accession >>>>>>> numbers >>>>>>>> for which I need to get the source organisms. Any kind of help >> will >>>>> be >>>>>>>> appreciated. >>>>>>>> >>>>>>>> Thanks >>>>>>>> >>>>>>>> BD
From Russell.Smithies at agresearch.co.nz Tue Jan 26 21:45:58 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 27 Jan 2010 15:45:58 +1300 Subject: [Bioperl-l] how to retrieve organism name from accession number? In-Reply-To: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz> <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz> Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi still works if you don't mind a bit of manual button clicking. It's handling chunks of 100,000 records OK (today). --Russell > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Wednesday, 27 January 2010 3:42 p.m. > To: Smithies, Russell > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen' > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > number?
> > Makes me wonder if they're pushing more users towards the SOAP-based > services and away from eutils. > > chris
From maj at fortinbras.us Wed Jan 27 10:14:22 2010 From: maj at fortinbras.us (Mark A.
Jensen) Date: Wed, 27 Jan 2010 10:14:22 -0500 Subject: [Bioperl-l] how to retrieve organism name from accession number? In-Reply-To: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife><18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz><18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz><18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz><4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu><18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz> <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu> Message-ID: Precisely the MO behind SoapEU...get the jump on 'em. ----- Original Message ----- From: "Chris Fields" To: "Smithies, Russell" Cc: ; "'Mark A. Jensen'" Sent: Tuesday, January 26, 2010 9:42 PM Subject: Re: [Bioperl-l] how to retrieve organism name from accession number? > Makes me wonder if they're pushing more users towards the SOAP-based services > and away from eutils. > > chris
From bhakti.dwivedi at gmail.com Wed Jan 27 14:42:06 2010 From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi) Date: Wed, 27 Jan 2010 14:42:06 -0500 Subject: [Bioperl-l] Designing primers from multiple sequence alignment of amino acid sequences Message-ID: Hi, I have to design primers from the multiple sequence alignments of amino acid sequences. The sequences I am working with are quite diverged and often the available primer design programs (such as CODEHOP/iCODEHOP) fail to find any primer sets. But, when I look at the alignment manually, I could see the regions that I could use to make primers. So I designed the degenerate primers the old-fashioned way, starting from selecting the conserved regions (6-10aa long) from the alignment to translating the selected regions to DNA using the appropriate codon usage table, and then finally checking the primer sets (potential forward and reverse primers) using tools like OLIGOANALYZER. In the end, I did find few good primer sets, but getting them to work in reality is something I will have to wait and see.
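One piece of the manual procedure described above, choosing which conserved stretch to back-translate, lends itself to a small script. The sketch below ranks fixed-width windows of a conserved peptide by how many distinct DNA sequences could encode them, on the assumption (not stated in the message) that lower degeneracy makes a better primer site. It uses codon counts from the standard genetic code rather than an organism-specific codon usage table, and the names degeneracy and rank_windows are hypothetical.

```perl
use strict;
use warnings;

# Number of codons per amino acid in the standard genetic code.
my %CODONS = (
    A => 4, R => 6, N => 2, D => 2, C => 2, Q => 2, E => 2, G => 4,
    H => 2, I => 3, L => 6, K => 2, M => 1, F => 2, P => 4, S => 6,
    T => 4, W => 1, Y => 2, V => 4,
);

# Product of codon counts over the peptide: the number of distinct DNA
# sequences that encode it. Returns undef for gaps or ambiguous residues.
sub degeneracy {
    my ($peptide) = @_;
    my $d = 1;
    for my $aa (split //, uc $peptide) {
        return unless exists $CODONS{$aa};
        $d *= $CODONS{$aa};
    }
    return $d;
}

# Slide a window of $width residues along a conserved peptide and return
# the windows sorted from least to most degenerate.
sub rank_windows {
    my ($peptide, $width) = @_;
    my @hits;
    for my $start (0 .. length($peptide) - $width) {
        my $win = substr($peptide, $start, $width);
        my $d   = degeneracy($win);
        next unless defined $d;
        push @hits, { start => $start + 1, peptide => $win, degeneracy => $d };
    }
    return sort {
        $a->{degeneracy} <=> $b->{degeneracy} || $a->{start} <=> $b->{start}
    } @hits;
}

# Made-up conserved block, 6-residue windows.
for my $w (rank_windows('LSRMWKDEFG', 6)) {
    printf "%2d  %s  %d\n", $w->{start}, $w->{peptide}, $w->{degeneracy};
}
```

The conserved block itself would still have to come from the alignment, and the top-ranked windows would still need checking for melting temperature and self-complementarity.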
While doing this process manually, I really felt the need to automate it (it was not just one alignment I did, I worked with several of those). I was wondering if there is any way bioperl can help me here, or if writing a perl script is the only way to go. I would appreciate your suggestions/comments. Thanks! (apologies for the long email..) Regards Bhakti
From Kevin.M.Brown at asu.edu Wed Jan 27 15:23:57 2010 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 27 Jan 2010 13:23:57 -0700 Subject: [Bioperl-l] Designing primers from multiple sequence alignment ofamino acid sequences In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B4068498DB@EX02.asurite.ad.asu.edu> Bioperl is just a collection of tools, not a full-blown application. Most of what you want can be done with the objects available from within the toolkit, but the application (perl script) would still need to be written to put the objects to use. You could use clustalw from within perl to align the sequences (Bio::Tools::Run::Alignment::Clustalw), find the conserved regions (Bio::SimpleAlign), reverse translate them (Bio::Tools::CodonTable), then come up with an algorithm for primer analysis and selection (or even use other apps like primer3 (Bio::Tools::Run::Primer3) from within perl). Kevin Brown Center for Innovations in Medicine Biodesign Institute Arizona State University
From mike.stubbington at bbsrc.ac.uk Thu Jan 28 10:41:49 2010 From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI)) Date: Thu, 28 Jan 2010 15:41:49 +0000 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn Message-ID: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk> Dear all, I am attempting to blast some primers against the mouse genome. I have created a local mouse genome blast database and I can search against it using 'blastn' at the command line. I have perl code that creates an array of bioperl sequence objects called @primers. I then create a StandAloneBlastPlus factory using the following code:

my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
    -db_dir  => '/Users/stubbing/localBlast/',
    -db_name => 'MouseGenome'
);

and then attempt to blast my primers using this:

my @shortPrimers;
my $count=1;
foreach (@primers) {
    my $currentSeq = $_;
    print "Checking primer $count/$primerNumber ";
    if ($_->length < 40) {
        push(@shortPrimers,$_);
        print "Too short!\n";
    }
    else {
        print "BLASTing...";
        my $blastResult = $blastFactory->blastn(-query => $currentSeq);
    }
    $count++;
}

This fails with the following error...

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, line 532.

STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

Line 63 in my code is (as you might expect) the one that calls blastn on my factory object.

I'd appreciate any help you might be able to provide to shed light on this.

Thanks in advance,

Mike

From maj at fortinbras.us Thu Jan 28 10:56:14 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 10:56:14 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn
In-Reply-To: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
Message-ID: <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>

Mike - please try updating your bioperl-live (the core) to the latest code (revision 16761 or so). CommandExts is a work in progress; from the stack errors it looks like you've got an older version.
Try it then ping us back, if you would--
Thanks
Mark

From mike.stubbington at bbsrc.ac.uk Thu Jan 28 11:18:12 2010
From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI))
Date: Thu, 28 Jan 2010 16:18:12 +0000
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn
In-Reply-To: <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk> <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
Message-ID: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>

Hi,

Thanks for the suggestion. Unfortunately it still fails - error as follows:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, line 532.

STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

M

From maj at fortinbras.us Thu Jan 28 11:28:52 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 11:28:52 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn
In-Reply-To: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk> <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife> <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
Message-ID:

Thanks Mike-- will have a look asap-
cheers
MAJ

From cjfields at illinois.edu Thu Jan 28 13:26:27 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 12:26:27 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz> <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
Message-ID: <1264703187.5473.10.camel@cjfields.igb.uiuc.edu>

Russell,

Just curious, but have you tried setting the return email parameter (-email)? NCBI recently stated that all queries would eventually require a return email of some sort (not sure if it's validated or not). I think that was set for around late spring. I'm changing the code in svn to require it for that very purpose.

chris

On Wed, 2010-01-27 at 15:45 +1300, Smithies, Russell wrote:
> Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi still works if you don't mind a bit of manual button clicking. It's handling chunks of 100,000 records OK (today).
>
> --Russell
>
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at illinois.edu]
> > Sent: Wednesday, 27 January 2010 3:42 p.m.
> > To: Smithies, Russell
> > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> > Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> >
> > Makes me wonder if they're pushing more users towards the SOAP-based
> > services and away from eutils.
> >
> > chris
> >
> > On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
> >
> > > I've had a wide selection of errors lately:
> > >
> > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
> > > STACK: Error::throw
> > > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > > STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > > STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > > STACK: get_desc.pl:32
> > > -----------------------------------------------------------
> > >
> > > And I never get a good explanation from NCBI or suggestions on how to avoid it.
> > >
> > > --Russell
> > >
> > >> -----Original Message-----
> > >> From: Chris Fields [mailto:cjfields at illinois.edu]
> > >> Sent: Wednesday, 27 January 2010 2:46 p.m.
> > >> To: Smithies, Russell
> > >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> > >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> > >>
> > >> It's unfortunate but I have heard this problem popping up quite a bit more
> > >> frequently lately. Not to push too many buttons but NCBI isn't very
> > >> forthcoming with help these days; they have become quite insular. Not
> > >> sure if they're short-staffed due to budget or if there are other issues.
> > >>
> > >> chris
> > >>
> > >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
> > >>
> > >>> Grrrrrr, I hate eutils!!!!
> > >>>
> > >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> > >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
> > >>> STACK: Error::throw
> > >>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > >>> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > >>> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > >>> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > >>> STACK: get_desc.pl:32
> > >>> -----------------------------------------------------------
> > >>>
> > >>> Nice error message though :-)
> > >>>
> > >>> --Russell
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> > >>>> Sent: Monday, 11 January 2010 10:05 a.m.
> > >>>> To: 'Chris Fields'
> > >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> > >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> > >>>>
> > >>>> I've started to go off eUtils recently (not BioPerl's fault) as I've often
> > >>>> been finding that with large queries, chunks of the resulting data is
> > >>>> missing.
> > >>>> For example, before Xmas I was creating species-specific databases by
> > >>>> using eUtils to get a list of GI numbers back for a taxid, then retrieving
> > >>>> the fasta sequences in chunks of 500.
> > >>>> Very regularly, in the middle of the fasta there would be a message about
> > >>>> resource unavailable eg.
> > >>>> >test_sequence_1
> > >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT
> > >>>> >test_sequence_2
> > >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
> > >>>>
> > >>>> Often this wasn't detected until formatdb complained about invalid
> > >>>> characters.
> > >>>> Inquiries to NCBI as to why this was happening and what to do about it
> > >>>> returned stupid answers ("do each sequence manually thru the web
> > >>>> interface", or "use eUtils").
> > >>>> As we have a nice fast network connection, I now prefer to download very
> > >>>> large gzip files (i.e. all of refseq) and extract what I need.
> > >>>>
> > >>>> I can't help but think that NCBI could solve a lot of problems if they
> > >>>> gzipped the output from eUtils queries - it's something I've requested
> > >>>> regularly for the last 5 years or so!!
> > >>>>
> > >>>> --Russell
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: Chris Fields [mailto:cjfields at illinois.edu]
> > >>>>> Sent: Monday, 11 January 2010 9:50 a.m.
> > >>>>> To: Smithies, Russell
> > >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
> > >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> > >>>>>
> > >>>>> One could also use Bio::DB::Taxonomy, which indexes the same files or
> > >>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the
> > >>>>> details).
> > >>>>>
> > >>>>> chris
> > >>>>>
> > >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> > >>>>>
> > >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's flakiness
> > >>>>>> lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip
> > >>>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and
> > >>>>>> do lookups.
> > >>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp which
> > >>>>>> lists taxids and descriptions (and synonyms)
> > >>>>>>
> > >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
> > >>>>>> could do this:
> > >>>>>>
> > >>>>>> my $taxid = $gi_taxid_nucl{$accession};
> > >>>>>> my $org_name = $names{$taxid};
> > >>>>>>
> > >>>>>> --Russell
> > >>>>>>
> > >>>>>>> -----Original Message-----
> > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> > >>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
> > >>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> > >>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> > >>>>>>>
> > >>>>>>> Bhakti,
> > >>>>>>> The following example (using EUtilities) may serve your purpose:
> > >>>>>>>
> > >>>>>>> use Bio::DB::EUtilities;
> > >>>>>>>
> > >>>>>>> my (%taxa, @taxa);
> > >>>>>>> my (%names, %idmap);
> > >>>>>>>
> > >>>>>>> # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
> > >>>>>>> # (probably)
> > >>>>>>>
> > >>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> > >>>>>>>
> > >>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> > >>>>>>>                                        -db => 'taxonomy',
> > >>>>>>>                                        -dbfrom => 'protein',
> > >>>>>>>                                        -correspondence => 1,
> > >>>>>>>                                        -id => \@ids);
> > >>>>>>>
> > >>>>>>> # iterate through the LinkSet objects
> > >>>>>>> while (my $ds = $factory->next_LinkSet) {
> > >>>>>>>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> @taxa = @taxa{@ids};
> > >>>>>>>
> > >>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> > >>>>>>>                                     -db => 'taxonomy',
> > >>>>>>>                                     -id => \@taxa );
> > >>>>>>>
> > >>>>>>> while (local $_ = $factory->next_DocSum) {
> > >>>>>>>     $names{($_->get_contents_by_name('TaxId'))[0]} =
> > >>>>>>>         ($_->get_contents_by_name('ScientificName'))[0];
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> foreach (@ids) {
> > >>>>>>>     $idmap{$_} = $names{$taxa{$_}};
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> # %idmap is
> > >>>>>>> # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> > >>>>>>> # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> > >>>>>>> # 68536103 => 'Corynebacterium jeikeium K411'
> > >>>>>>> # 730439 => 'Bacillus caldolyticus'
> > >>>>>>> # 89318838 => undef (this record has been removed from the db)
> > >>>>>>>
> > >>>>>>> 1;
> > >>>>>>>
> > >>>>>>> You probably will need to break up your 30000 into chunks
> > >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a
> > >>>>>>>
> > >>>>>>> sleep 3;
> > >>>>>>>
> > >>>>>>> or so separating the queries.
> > >>>>>>> MAJ
> > >>>>>>> ----- Original Message -----
> > >>>>>>> From: "Bhakti Dwivedi"
> > >>>>>>> To:
> > >>>>>>> Sent: Friday, December 25, 2009 9:46 PM
> > >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession number?
> > >>>>>>>
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species name" given
> > >>>>>>>> the accession number using Bioperl. I have these 30,000 accession numbers
> > >>>>>>>> for which I need to get the source organisms. Any kind of help will be
> > >>>>>>>> appreciated.
> > >>>>>>>>
> > >>>>>>>> Thanks
> > >>>>>>>>
> > >>>>>>>> BD

From maj at fortinbras.us Thu Jan 28 13:47:04 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 13:47:04 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn
In-Reply-To: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife> <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
Message-ID:

Hi Mike,

Believe I found the real bug causing the problem (was not accounting for the db_dir parameter). Crashes should now also throw much more helpful errors. Please try the code at r16774, and shout back.

thanks --
MAJ

From cjfields at illinois.edu Thu Jan 28 14:00:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 13:00:26 -0600
Subject: [Bioperl-l] EUtilities policy change
Message-ID: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>

All,

Per NCBI's recent change in eutils user policy (effective June 1):

http://bioperl.org/pipermail/bioperl-l/2009-December/031698.html

Both the tool and email parameters ('-tool', '-email') are now required when making requests. Note this will significantly break all modules requiring remote access to eutils (Bio::DB::GenBank/GenPept, some Biblio and Taxonomy stuff as well, IIRC). This also applies to web services (SOAP-based access). Mark, not sure how this affects your SOAP-based modules.

I have reconfigured Bio::DB::EUtilities to follow this policy; the default tool setting has been 'bioperl' and will remain that way. However, there has been no default email, therefore setting this is now required for future requests unless we (the bioperl devs) decide there is a safe default email to utilize. My gut tells me, however, that falling back to a default email opens up a can of worms for the devs and is very likely a 'BAD IDEA'(TM).

Regardless, be aware that, after June 1, NCBI will very likely exclude requests with no email and will notify users who are considered to be violating their policies.

I will likely make further changes to Bio::DB::EUtilities in the meantime to ensure that using the tools by default will not violate NCBI's policy (e.g. override this at your own risk).
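
For anyone updating scripts ahead of the change, a minimal sketch of passing both parameters might look like the following. The helper name eutil_args, the search term, and the email address are placeholders invented for illustration (they are not part of BioPerl); only the '-tool' and '-email' parameter names and the 'bioperl' default come from the message above.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of BioPerl): assembles the arguments for
# Bio::DB::EUtilities->new and refuses to continue without an email address.
sub eutil_args {
    my (%args) = @_;
    die "an -email value is required for NCBI eutils requests\n"
        unless defined $args{-email} && length $args{-email};
    # Fall back to the default tool name mentioned in the message above.
    $args{-tool} = 'bioperl' unless defined $args{-tool};
    return %args;
}

my %args = eutil_args(
    -eutil => 'esearch',
    -db    => 'protein',
    -term  => 'BRCA1 AND human',     # placeholder query
    -email => 'you@example.org',     # placeholder; use your own address
);

# With BioPerl installed and network access, the request itself would be:
#   use Bio::DB::EUtilities;
#   my $factory = Bio::DB::EUtilities->new(%args);
#   my @ids     = $factory->get_ids;
```

Failing early in the script, rather than waiting for NCBI to reject or flag the request, is a design choice of this sketch and not a description of how Bio::DB::EUtilities itself behaves.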

chris

From maj at fortinbras.us Thu Jan 28 14:05:43 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 14:05:43 -0500
Subject: [Bioperl-l] EUtilities policy change
In-Reply-To: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>
References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>
Message-ID: <8F49B5ED151143FA86E977B4D4F44265@NewLife>

Thanks Chris--
The soap modules currently set tool to "SoapEUtilities(BioPerl)".
I agree that a default email is a bad idea (tm) (unless maybe it's
hilmar's...?). I'd say a warning on unset email parameters is a responsible
"there be dragons" sort of treatment.
MAJ

From cjfields at illinois.edu Thu Jan 28 14:18:22 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 13:18:22 -0600
Subject: [Bioperl-l] EUtilities policy change
In-Reply-To: <8F49B5ED151143FA86E977B4D4F44265@NewLife>
References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu> <8F49B5ED151143FA86E977B4D4F44265@NewLife>
Message-ID: <1264706302.5473.48.camel@cjfields.igb.uiuc.edu>

I think warning is fine for now. I've reimplemented that so it occurs
lazily (warns only when a request is actually made).

Will also change the tool to 'BioPerl' (currently 'bioperl', all lc).
We'll obviously have to address this in the test suite as well in some
way, maybe ask for an email if network tests are requested.

chris

From Russell.Smithies at agresearch.co.nz Thu Jan 28 14:25:38 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 29 Jan 2010 08:25:38 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <1264703187.5473.10.camel@cjfields.igb.uiuc.edu>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz> <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz> <1264703187.5473.10.camel@cjfields.igb.uiuc.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz>

Yes, I usually set the 'tool' and 'email' parameters.
I went to NCBI back in 2006 and did their "PowerScripting" course where they pointed out a lot of the requirements for using eUtils. I think I requested results returned gzipped back then as well...

--Russell

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Friday, 29 January 2010 7:26 a.m.
> To: Smithies, Russell
> Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
>
> Russell,
>
> Just curious, but have you tried setting the return email parameter
> (-email)? NCBI recently stated that all queries would eventually
> require a return email of some sort (not sure if it's validated or not).
> I think that was set for around late spring. I'm changing the code in
> svn to require it for that very purpose.
>
> chris
>
>
> On Wed, 2010-01-27 at 15:45 +1300, Smithies, Russell wrote:
> > Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi
> > still works if you don't mind a bit of manual button clicking. It's
> > handling chunks of 100,000 records OK (today).
> >
> > --Russell
> >
> > > -----Original Message-----
> > > From: Chris Fields [mailto:cjfields at illinois.edu]
> > > Sent: Wednesday, 27 January 2010 3:42 p.m.
> > > To: Smithies, Russell
> > > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> > > Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> > >
> > > Makes me wonder if they're pushing more users towards the SOAP-based
> > > services and away from eutils.
> > >
> > > chris
> > >
> > > On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
> > >
> > > > I've had a wide selection of errors lately:
> > > >
> > > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
> > > > STACK: Error::throw
> > > > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > > STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > > > STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > > > STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > > > STACK: get_desc.pl:32
> > > > -----------------------------------------------------------
> > > >
> > > > And I never get a good explanation from NCBI or suggestions on how to
> > > > avoid it.
> > > >
> > > > --Russell
> > > >
> > > >> -----Original Message-----
> > > >> From: Chris Fields [mailto:cjfields at illinois.edu]
> > > >> Sent: Wednesday, 27 January 2010 2:46 p.m.
> > > >> To: Smithies, Russell
> > > >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> > > >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> > > >>
> > > >> It's unfortunate but I have heard this problem popping up quite a bit more
> > > >> frequently lately. Not to push too many buttons but NCBI isn't very
> > > >> forthcoming with help these days; they have become quite insular. Not
> > > >> sure if they're short-staffed due to budget or if there are other issues.
> > > >>
> > > >> chris
> > > >>
> > > >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
> > > >>
> > > >>> Grrrrrr, I hate eutils!!!!
> > > >>>
> > > >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> > > >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
> > > >>> STACK: Error::throw
> > > >>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > >>> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > > >>> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > > >>> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > > >>> STACK: get_desc.pl:32
> > > >>> -----------------------------------------------------------
> > > >>>
> > > >>> Nice error message though :-)
> > > >>>
> > > >>> --Russell
> > > >>>
> > > >>>> -----Original Message-----
> > > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> > > >>>> Sent: Monday, 11 January 2010 10:05 a.m.
> > > >>>> To: 'Chris Fields'
> > > >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> > > >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> > > >>>>
> > > >>>> I've started to go off eUtils recently (not BioPerl's fault) as I've often
> > > >>>> been finding that with large queries, chunks of the resulting data is
> > > >>>> missing.
> > > >>>> For example, before Xmas I was creating species-specific databases by
> > > >>>> using eUtils to get a list of GI numbers back for a taxid, then retrieving
> > > >>>> the fasta sequences in chunks of 500.
> > > >>>> Very regularly, in the middle of the fasta there would be a message about
> > > >>>> resource unavailable, e.g.
> > > >>>> >test_sequence_1
> > > >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT
> > > >>>> >test_sequence_2
> > > >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
> > > >>>>
> > > >>>> Often this wasn't detected until formatdb complained about invalid
> > > >>>> characters.
> > > >>>> Inquiries to NCBI as to why this was happening and what to do about it
> > > >>>> returned stupid answers ("do each sequence manually thru the web
> > > >>>> interface", or "use eUtils").
> > > >>>> As we have a nice fast network connection, I now prefer to download very
> > > >>>> large gzip files (i.e. all of refseq) and extract what I need.
> > > >>>>
> > > >>>> I can't help but think that NCBI could solve a lot of problems if they
> > > >>>> gzipped the output from eUtils queries - it's something I've requested
> > > >>>> regularly for the last 5 years or so!!
> > > >>>>
> > > >>>> --Russell
> > > >>>>
> > > >>>>> -----Original Message-----
> > > >>>>> From: Chris Fields [mailto:cjfields at illinois.edu]
> > > >>>>> Sent: Monday, 11 January 2010 9:50 a.m.
> > > >>>>> To: Smithies, Russell
> > > >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
> > > >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> > > >>>>>
> > > >>>>> One could also use Bio::DB::Taxonomy, which indexes the same files or
> > > >>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the
> > > >>>>> details).
> > > >>>>>
> > > >>>>> chris
> > > >>>>>
> > > >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> > > >>>>>
> > > >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's flakiness
> > > >>>>>> lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip
> > > >>>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and
> > > >>>>>> do lookups.
> > > >>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp which
> > > >>>>>> lists taxids and descriptions (and synonyms)
> > > >>>>>>
> > > >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
> > > >>>>>> could do this:
> > > >>>>>>
> > > >>>>>> my $taxid = $gi_taxid_nucl{$accession};
> > > >>>>>> my $org_name = $names{$taxid};
> > > >>>>>>
> > > >>>>>> --Russell
> > > >>>>>>
> > > >>>>>>> -----Original Message-----
> > > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> > > >>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
> > > >>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> > > >>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> > > >>>>>>>
> > > >>>>>>> Bhakti,
> > > >>>>>>> The following example (using EUtilities) may serve your purpose:
> > > >>>>>>>
> > > >>>>>>> use Bio::DB::EUtilities;
> > > >>>>>>>
> > > >>>>>>> my (%taxa, @taxa);
> > > >>>>>>> my (%names, %idmap);
> > > >>>>>>>
> > > >>>>>>> # these are protein ids; nuc ids will work by changing
> > > >>>>>>> # -dbfrom => 'nucleotide', (probably)
> > > >>>>>>>
> > > >>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> > > >>>>>>>
> > > >>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> > > >>>>>>>                                        -db => 'taxonomy',
> > > >>>>>>>                                        -dbfrom => 'protein',
> > > >>>>>>>                                        -correspondence => 1,
> > > >>>>>>>                                        -id => \@ids);
> > > >>>>>>>
> > > >>>>>>> # iterate through the LinkSet objects
> > > >>>>>>> while (my $ds = $factory->next_LinkSet) {
> > > >>>>>>>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> > > >>>>>>> }
> > > >>>>>>>
> > > >>>>>>> @taxa = @taxa{@ids};
> > > >>>>>>>
> > > >>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> > > >>>>>>>                                     -db => 'taxonomy',
> > > >>>>>>>                                     -id => \@taxa );
> > > >>>>>>>
> > > >>>>>>> while (local $_ = $factory->next_DocSum) {
> > > >>>>>>>     $names{($_->get_contents_by_name('TaxId'))[0]} =
> > > >>>>>>>         ($_->get_contents_by_name('ScientificName'))[0];
> > > >>>>>>> }
> > > >>>>>>>
> > > >>>>>>> foreach (@ids) {
> > > >>>>>>>     $idmap{$_} = $names{$taxa{$_}};
> > > >>>>>>> }
> > > >>>>>>>
> > > >>>>>>> # %idmap is
> > > >>>>>>> # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> > > >>>>>>> # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> > > >>>>>>> # 68536103 => 'Corynebacterium jeikeium K411'
> > > >>>>>>> # 730439 => 'Bacillus caldolyticus'
> > > >>>>>>> # 89318838 => undef (this record has been removed from the db)
> > > >>>>>>>
> > > >>>>>>> 1;
> > > >>>>>>>
> > > >>>>>>> You probably will need to break up your 30000 into chunks
> > > >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a
> > > >>>>>>>
> > > >>>>>>> sleep 3;
> > > >>>>>>>
> > > >>>>>>> or so separating the queries.
> > > >>>>>>> MAJ
> > > >>>>>>> ----- Original Message -----
> > > >>>>>>> From: "Bhakti Dwivedi"
> > > >>>>>>> To:
> > > >>>>>>> Sent: Friday, December 25, 2009 9:46 PM
> > > >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession number?
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> Hi,
> > > >>>>>>>>
> > > >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species name" given
> > > >>>>>>>> the accession number using Bioperl. I have these 30,000 accession numbers
> > > >>>>>>>> for which I need to get the source organisms. Any kind of help will be
> > > >>>>>>>> appreciated.
> > > >>>>>>>>
> > > >>>>>>>> Thanks
> > > >>>>>>>>
> > > >>>>>>>> BD

From cjfields at illinois.edu Thu Jan 28 14:30:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 13:30:12 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz> <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz> <1264703187.5473.10.camel@cjfields.igb.uiuc.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz> Message-ID: <1264707012.5473.51.camel@cjfields.igb.uiuc.edu> Russell, Okay, just wanted to make sure. The email/tool requirements weren't actually enforced up until now, which is forcing us to do a bit of re-work on the various tools that don't have it set by default (at least warn users unaware of it). And I agree, gzipped archives would be nice! chris On Fri, 2010-01-29 at 08:25 +1300, Smithies, Russell wrote: > Yes, I usually set the 'tool' and 'email' parameters. > I went to NCBI back in 2006 and did their "PowerScripting" course where they pointed out a lot of the requirements for using eUtils. I think I requested results returned gzipped back then as well... > > --Russell > > > -----Original Message----- > > From: Chris Fields [mailto:cjfields at illinois.edu] > > Sent: Friday, 29 January 2010 7:26 a.m. > > To: Smithies, Russell > > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen' > > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > > number? > > > > Russell, > > > > Just curious, but have you tried setting the return email parameter > > (-email)? NCBI recently stated that all queries would eventually > > require a return email of some sort (not sure if it's validated or not). > > I think that was set for around late spring. 
I'm changing the code in > > svn to require it for that very purpose. > > > > chris > > > > > > Wed, 2010-01-27 at 15:45 +1300, Smithies, Russell wrote: > > > Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi > > still works if you don't mind a bit of manual button clicking. It's > > handling chunks of 100,000 records OK (today). > > > > > > --Russell > > > > > > > -----Original Message----- > > > > From: Chris Fields [mailto:cjfields at illinois.edu] > > > > Sent: Wednesday, 27 January 2010 3:42 p.m. > > > > To: Smithies, Russell > > > > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen' > > > > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > > > > number? > > > > > > > > Makes me wonder if they're pushing more users towards the SOAP-based > > > > services and away from eutils. > > > > > > > > chris > > > > > > > > On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote: > > > > > > > > > I've had a wide selection of errors lately: > > > > > > > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > > > > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 > > (Resource > > > > temporarily unavailable) > > > > > STACK: Error::throw > > > > > STACK: Bio::Root::Root::throw > > > > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 > > > > > STACK: Bio::Tools::EUtilities::parse_data > > > > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 > > > > > STACK: Bio::Tools::EUtilities::get_ids > > > > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 > > > > > STACK: Bio::DB::EUtilities::get_ids > > > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 > > > > > STACK: get_desc.pl:32 > > > > > ----------------------------------------------------------- > > > > > > > > > > And I never get a good explanation from NCBI or suggestions on how > > to > > > > avoid it. 
> > > > > > > > > > > > > > > --Russell > > > > > > > > > > > > > > >> -----Original Message----- > > > > >> From: Chris Fields [mailto:cjfields at illinois.edu] > > > > >> Sent: Wednesday, 27 January 2010 2:46 p.m. > > > > >> To: Smithies, Russell > > > > >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org' > > > > >> Subject: Re: [Bioperl-l] how to retrieve organism name from > > accession > > > > >> number? > > > > >> > > > > >> It's unfortunate but I have heard this problem popping up quite a > > bit > > > > more > > > > >> frequently lately. Not to push too many buttons but NCBI isn't > > very > > > > >> forthcoming with help these days; they have become quite insular. > > Not > > > > >> sure if they're short-staffed due to budget or if there are other > > > > issues. > > > > >> > > > > >> chris > > > > >> > > > > >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote: > > > > >> > > > > >>> Grrrrrr, I hate eutils!!!! > > > > >>> > > > > >>> ------------- EXCEPTION: Bio::Root::Exception ------------- > > > > >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 > > > > >> (Connection refused) > > > > >>> STACK: Error::throw > > > > >>> STACK: Bio::Root::Root::throw > > > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 > > > > >>> STACK: Bio::Tools::EUtilities::parse_data > > > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 > > > > >>> STACK: Bio::Tools::EUtilities::get_ids > > > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 > > > > >>> STACK: Bio::DB::EUtilities::get_ids > > > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 > > > > >>> STACK: get_desc.pl:32 > > > > >>> ----------------------------------------------------------- > > > > >>> > > > > >>> > > > > >>> Nice error message though :-) > > > > >>> > > > > >>> > > > > >>> --Russell > > > > >>> > > > > >>>> -----Original Message----- > > > > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > > >>>> 
bounces at lists.open-bio.org] On Behalf Of Smithies, Russell > > > > >>>> Sent: Monday, 11 January 2010 10:05 a.m. > > > > >>>> To: 'Chris Fields' > > > > >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open- > > > > bio.org' > > > > >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from > > accession > > > > >>>> number? > > > > >>>> > > > > >>>> I've started to go off eUtils recently (not BioPerl's fault) as > > I've > > > > >> often > > > > >>>> been finding that with large queries, chunks of the resulting > > data is > > > > >>>> missing. > > > > >>>> For example, before Xmas I was creating species-specific > > databases by > > > > >>>> using eUtils to get a list of GI numbers back for a taxid, then > > > > >> retrieving > > > > >>>> the fasta sequences in chunks of 500. > > > > >>>> Very regularly, in the middle of the fasta there would be a > > message > > > > >> about > > > > >>>> resource unavailable eg. > > > > >>>>> test_sequence_1 > > > > >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT > > > > >>>>> test_sequence_2 > > > > >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT > > > > >>>> > > > > >>>> Often this wasn't detected until formatdb complained about > > invalid > > > > >>>> characters. > > > > >>>> Inquiries to NCBI as to why this was happening and what to do > > about > > > > it > > > > >>>> returned stupid answers ("do each sequence manually thru the web > > > > >>>> interface", or "use eUtils"). > > > > >>>> As we have a nice fast network connection, I now prefer to > > download > > > > >> very > > > > >>>> large gzip files (i.e. all of refseq) and extract what I need. > > > > >>>> > > > > >>>> I can't help but think that NCBI could solve a lot of problems if > > > > they > > > > >>>> gzipped the output from eUtils queries - it's something I've > > > > requested > > > > >>>> regularly for the last 5 years or so!! 
> > > > >>>> > > > > >>>> --Russell > > > > >>>> > > > > >>>> > > > > >>>>> -----Original Message----- > > > > >>>>> From: Chris Fields [mailto:cjfields at illinois.edu] > > > > >>>>> Sent: Monday, 11 January 2010 9:50 a.m. > > > > >>>>> To: Smithies, Russell > > > > >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open- > > > > bio.org' > > > > >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from > > > > accession > > > > >>>>> number? > > > > >>>>> > > > > >>>>> One could also use Bio::DB::Taxonomy, which indexes the same > > files > > > > or > > > > >>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD > > for > > > > >> the > > > > >>>>> details). > > > > >>>>> > > > > >>>>> chris > > > > >>>>> > > > > >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote: > > > > >>>>> > > > > >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's > > > > >>>> flakiness > > > > >>>>> lately) would be to download the gi_taxid_nucl.zip or > > > > >> gi_taxid_prot.zip > > > > >>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into > > a > > > > hash > > > > >>>> and > > > > >>>>> do lookups. > > > > >>>>>> In that same dir, taxdump.tar.gz contains a file called > > names.dmp > > > > >>>> which > > > > >>>>> lists taxids and descriptions (and synonyms) > > > > >>>>>> > > > > >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes > > so > > > > I > > > > >>>>> could do this: > > > > >>>>>> > > > > >>>>>> my $taxid = $gi_taxid_nucl{$accession}; > > > > >>>>>> my $org_name = $names{$taxid}; > > > > >>>>>> > > > > >>>>>> --Russell > > > > >>>>>> > > > > >>>>>> > > > > >>>>>>> -----Original Message----- > > > > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > > >>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > > > > >>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m. 
> > > > >>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org > > > > >>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from > > > > >> accession > > > > >>>>>>> number? > > > > >>>>>>> > > > > >>>>>>> Bhakti, > > > > >>>>>>> The following example (using EUtilities) may serve your > > purpose: > > > > >>>>>>> > > > > >>>>>>> use Bio::DB::EUtilities; > > > > >>>>>>> > > > > >>>>>>> my (%taxa, @taxa); > > > > >>>>>>> my (%names, %idmap); > > > > >>>>>>> > > > > >>>>>>> # these are protein ids; nuc ids will work by changing -dbfrom > > => > > > > >>>>>>> 'nucleotide', > > > > >>>>>>> # (probably) > > > > >>>>>>> > > > > >>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439); > > > > >>>>>>> > > > > >>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink', > > > > >>>>>>> -db => 'taxonomy', > > > > >>>>>>> -dbfrom => 'protein', > > > > >>>>>>> -correspondence => 1, > > > > >>>>>>> -id => \@ids); > > > > >>>>>>> > > > > >>>>>>> # iterate through the LinkSet objects > > > > >>>>>>> while (my $ds = $factory->next_LinkSet) { > > > > >>>>>>> $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0] > > > > >>>>>>> } > > > > >>>>>>> > > > > >>>>>>> @taxa = @taxa{@ids}; > > > > >>>>>>> > > > > >>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', > > > > >>>>>>> -db => 'taxonomy', > > > > >>>>>>> -id => \@taxa ); > > > > >>>>>>> > > > > >>>>>>> while (local $_ = $factory->next_DocSum) { > > > > >>>>>>> $names{($_->get_contents_by_name('TaxId'))[0]} = > > > > >>>>>>> ($_->get_contents_by_name('ScientificName'))[0]; > > > > >>>>>>> } > > > > >>>>>>> > > > > >>>>>>> foreach (@ids) { > > > > >>>>>>> $idmap{$_} = $names{$taxa{$_}}; > > > > >>>>>>> } > > > > >>>>>>> > > > > >>>>>>> # %idmap is > > > > >>>>>>> # 1621261 => 'Mycobacterium tuberculosis H37Rv' > > > > >>>>>>> # 20807972 => 'Thermoanaerobacter tengcongensis MB4' > > > > >>>>>>> # 68536103 => 'Corynebacterium jeikeium K411' > > > > >>>>>>> # 730439 => 'Bacillus caldolyticus' > 
> > > >>>>>>> # 89318838 => undef (this record has been removed from > > the > > > > db) > > > > >>>>>>> > > > > >>>>>>> 1; > > > > >>>>>>> > > > > >>>>>>> You probably will need to break up your 30000 into chunks > > > > >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a > > > > >>>>>>> > > > > >>>>>>> sleep 3; > > > > >>>>>>> > > > > >>>>>>> or so separating the queries. > > > > >>>>>>> MAJ > > > > >>>>>>> ----- Original Message ----- > > > > >>>>>>> From: "Bhakti Dwivedi" > > > > >>>>>>> To: > > > > >>>>>>> Sent: Friday, December 25, 2009 9:46 PM > > > > >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from > > accession > > > > >>>>> number? > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>>> Hi, > > > > >>>>>>>> > > > > >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species > > > > name" > > > > >>>>>>> given > > > > >>>>>>>> the accession number using Bioperl. I have these 30,000 > > > > accession > > > > >>>>>>> numbers > > > > >>>>>>>> for which I need to get the source organisms. Any kind of > > help > > > > >> will > > > > >>>>> be > > > > >>>>>>>> appreciated. 
> > > > >>>>>>>> > > > > >>>>>>>> Thanks > > > > >>>>>>>> > > > > >>>>>>>> BD > > > > >>>>>>>> _______________________________________________ > > > > >>>>>>>> Bioperl-l mailing list > > > > >>>>>>>> Bioperl-l at lists.open-bio.org > > > > >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > >>>>>>>> > > > > >>>>>>>> > > > > >>>>>>> > > > > >>>>>>> _______________________________________________ > > > > >>>>>>> Bioperl-l mailing list > > > > >>>>>>> Bioperl-l at lists.open-bio.org > > > > >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > >>>>>> > > > > >>>> > > > > > > ======================================================================= > > > > >>>>>> Attention: The information contained in this message and/or > > > > >>>> attachments > > > > >>>>>> from AgResearch Limited is intended only for the persons or > > > > entities > > > > >>>>>> to which it is addressed and may contain confidential and/or > > > > >>>> privileged > > > > >>>>>> material. Any review, retransmission, dissemination or other > > use > > > > of, > > > > >>>> or > > > > >>>>>> taking of any action in reliance upon, this information by > > persons > > > > or > > > > >>>>>> entities other than the intended recipients is prohibited by > > > > >>>> AgResearch > > > > >>>>>> Limited. If you have received this message in error, please > > notify > > > > >> the > > > > >>>>>> sender immediately. 
> > > > >>>>>> =======================================================================

From maj at fortinbras.us Thu Jan 28 14:55:31 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 14:55:31 -0500
Subject: [Bioperl-l] EUtilities policy change
In-Reply-To: <1264706302.5473.48.camel@cjfields.igb.uiuc.edu>
References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu><8F49B5ED151143FA86E977B4D4F44265@NewLife> <1264706302.5473.48.camel@cjfields.igb.uiuc.edu>
Message-ID:

Ok, SoapEU now warns on no email; passes email onto the fetch stage during autofetch -- cheers
MAJ

----- Original Message -----
From: "Chris Fields"
To: "Mark A. Jensen"
Cc: "BioPerl-l"
Sent: Thursday, January 28, 2010 2:18 PM
Subject: Re: [Bioperl-l] EUtilities policy change

>I think warning is fine for now. I've reimplemented that so it occurs
> lazily (warns only when a request is actually made).
>
> Will also change the tool to 'BioPerl' (currently 'bioperl', all lc).
> We'll obviously have to address this in the test suite as well in some > way, maybe ask for an email if network tests are requested. > > chris > > On Thu, 2010-01-28 at 14:05 -0500, Mark A. Jensen wrote: >> Thanks Chris-- >> The soap modules currently set tool to "SoapEUtilities(BioPerl)". >> I agree that a default email is a bad idea (tm) (unless maybe it's >> hilmar's...?). I'd say a warning on unset email parameters is a responsible >> "there be dragons" sort of treatment. >> MAJ >> ----- Original Message ----- >> From: "Chris Fields" >> To: "BioPerl-l" >> Cc: "Mark A. Jensen" >> Sent: Thursday, January 28, 2010 2:00 PM >> Subject: EUtilities policy change >> >> >> > All, >> > >> > Per NCBI's recent change in eutils user policy (effective June 1): >> > >> > http://bioperl.org/pipermail/bioperl-l/2009-December/031698.html >> > >> > Both the tool and email parameters ('-tool', '-email') are now required >> > when making requests. Note this will significantly break all modules >> > requiring remote access to eutils (Bio::DB::GenBank/GenPept, some Biblio >> > and Taxonomy stuff as well, IIRC). This also applies to web services >> > (SOAP-based access). Mark, not sure how this affects your SOAP-based >> > modules. >> > >> > I have reconfigured Bio::DB::EUtilities to follow this policy; the >> > default tool setting has been 'bioperl' and will remain that way. >> > However, there has been no default email, therefore setting this is now >> > required for future requests unless we (the bioperl devs) decide there >> > is a safe default email to utilize. My gut tells me, however, that >> > falling back to a default email opens up a can of worms for the devs and >> > is very likely a 'BAD IDEA'(TM). >> > >> > Regardless, be aware that, after June 1, NCBI will very likely exclude >> > requests with no email and will notify users who are considered to be >> > violating their policies. 
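For scripts affected by the policy described in this message, one way to avoid sending a request without an email is to check the arguments before the factory is built. The sketch below is an assumption-laden illustration: eutil_args is a hypothetical helper (not BioPerl API), the address check is deliberately crude, and the 'BioPerl' default for -tool simply follows the value mentioned in this thread.

```perl
use strict;
use warnings;

# Build an argument list for Bio::DB::EUtilities->new, refusing to
# continue when no email address is supplied.
# (hypothetical helper, not part of BioPerl)
sub eutil_args {
    my (%opt) = @_;
    die "an -email address is required for NCBI eutils requests\n"
        unless defined $opt{-email} && $opt{-email} =~ /\S+\@\S+/;
    # Default tool name is an assumption taken from the thread.
    $opt{-tool} = 'BioPerl' unless defined $opt{-tool};
    return %opt;
}

# Usage (needs BioPerl and network access, so it is left commented out):
# use Bio::DB::EUtilities;
# my $factory = Bio::DB::EUtilities->new(
#     eutil_args(-eutil => 'esummary',
#                -db    => 'taxonomy',
#                -id    => [9606],
#                -email => 'me@example.org'));
```

Failing early in the calling script keeps the decision about which address to send with the user, which matches the reluctance in the thread to ship a default email.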
>> > >> > I will likely make further changes to Bio::DB::EUtilities in the >> > meantime to ensure that using the tools by default will not violate >> > NCBI's policy (e.g. override this at your own risk). >> > >> > chris >> > >> > >> > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From chapmanb at 50mail.com Thu Jan 28 15:35:05 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Thu, 28 Jan 2010 15:35:05 -0500 Subject: [Bioperl-l] OpenBio solution challenge: Project updates at BOSC 2010 Message-ID: <20100128203505.GG40046@sobchak.mgh.harvard.edu> Hello all; The BOSC 2010 organizing committee is hard at work getting prepared for this July's meeting in Boston: http://www.open-bio.org/wiki/BOSC_2010 One of the items we've traditionally had at the conference is a project update from each of the OpenBio affiliated groups. This year, we're thinking about organizing these talks around a central theme: the OpenBio solution challenge. We start with a biological question of general interest, and each of the project talks would focus around how you would solve that problem using your toolkit and programming language. This is meant to provide a challenge for OpenBio contributors, a nice tutorial style overview of various projects and approaches for other programmers, and a fun opportunity to compete and learn from other projects. Conference attendees will vote on their favorite solution, with the winner receiving fame and fortune (warning: fortune not guaranteed). For this to be successful, it of course requires interest and enthusiasm from y'all fine folks involved with the projects. Specifically: - Is there interest from your group in participating in the challenge? You'll want at least a few people to work on it, and someone to give a presentation at BOSC. - Do you have suggestions on a good theme or specific biological problem to tackle? 
We'll hope to pick something in a sweet spot that is challenging enough to be of interest, yet reasonable for presentation and preparation. Let's discuss ideas and get this together. Since the schedule for BOSC is developing rapidly, please give us an idea if you're interested by February 12th, and copy responses to the BOSC mailing list as a central place for discussion. bosc at open-bio.org Thanks, Brad, Michael, and the BOSC organizing committee From markw at illuminae.com Thu Jan 28 16:17:44 2010 From: markw at illuminae.com (Mark Wilkinson) Date: Thu, 28 Jan 2010 13:17:44 -0800 Subject: [Bioperl-l] [MOBY-dev] OpenBio solution challenge: Project updates at BOSC 2010 In-Reply-To: <20100128203505.GG40046@sobchak.mgh.harvard.edu> References: <20100128203505.GG40046@sobchak.mgh.harvard.edu> Message-ID: Brad, this sounds exciting! One thing strikes me, though - by asking for the sub-projects to propose the "grand challenge" themselves the one thing you can guarantee is that the "grand challenge" is solvable (or more likely, already solved!) Other "grand challenge" kinds of meetings have an independent third party pose the problem that has to be solved, and then all groups work toward a solution and compare their results. This would, IMO, be more revealing of the "state of the art" in each Open-Bio project, and point out where the weaknesses are that we should be focusing on... Someone (for example, you!) could act as the moderator to ensure that the "grand challenge" was at least a reasonable one, within the scope of what an Open-Bio project *should* be able to solve... Just my CAD $0.02 Mark On Thu, 28 Jan 2010 12:35:05 -0800, Brad Chapman wrote: > Hello all; > The BOSC 2010 organizing committee is hard at work getting prepared for > this > July's meeting in Boston: > > http://www.open-bio.org/wiki/BOSC_2010 > > One of the items we've traditionally had at the conference is a project > update from each of the OpenBio affiliated groups. 
This year, we're > thinking > about organizing these talks around a central theme: the OpenBio solution > challenge. We start with a biological question of general interest, and > each > of the project talks would focus around how you would solve that problem > using your toolkit and programming language. > > This is meant to provide a challenge for OpenBio contributors, a nice > tutorial > style overview of various projects and approaches for other programmers, > and a > fun opportunity to compete and learn from other projects. Conference > attendees > will vote on their favorite solution, with the winner receiving fame and > fortune (warning: fortune not guaranteed). > > For this to be successful, it of course requires interest and enthusiasm > from > y'all fine folks involved with the projects. Specifically: > > - Is there interest from your group in participating in the challenge? > You'll > want at least a few people to work on it, and someone to give a > presentation > at BOSC. > > - Do you have suggestions on a good theme or specific biological problem > to > tackle? We'll hope to pick something in a sweet spot that is > challenging > enough to be of interest, yet reasonable for presentation and > preparation. > > Let's discuss ideas and get this together. Since the schedule for BOSC is > developing rapidly, please give us an idea if you're interested by > February 12th, and copy responses to the BOSC mailing list as a central > place for discussion. > > bosc at open-bio.org > > Thanks, > Brad, Michael, and the BOSC organizing committee > _______________________________________________ > MOBY-dev mailing list > MOBY-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/moby-dev -- Mark D Wilkinson, PI Bioinformatics Assistant Professor, Medical Genetics The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research Providence Heart + Lung Institute University of British Columbia - St. 
Paul's Hospital Vancouver, BC, Canada From HWillis at scripps.edu Thu Jan 28 20:03:10 2010 From: HWillis at scripps.edu (Scooter Willis) Date: Thu, 28 Jan 2010 20:03:10 -0500 Subject: [Bioperl-l] [Biojava-dev] [MOBY-dev] OpenBio solution challenge: Project updates at BOSC 2010 In-Reply-To: References: <20100128203505.GG40046@sobchak.mgh.harvard.edu> Message-ID: <716E205A-5196-409F-A7BC-EF0F52AA997A@scripps.edu> Brad I agree with Mark that a particular problem may be biased towards a toolkit/language. Another approach would be to list a collection of problems and each group would then pick a problem to present. Could be a little more interesting to the audience as you are exposed to different problems and the various strengths of each toolkit. This could also help guide future development in the other toolkits as you would benefit from learning about the api and/or programming language. Each group would register a problem that they are going to present. From the group of problems not picked that becomes the surprise challenge where each group has 24 hours to either put together a presentation or an actual solution. Scooter On Jan 28, 2010, at 4:17 PM, Mark Wilkinson wrote: > > Brad, this sounds exciting! > > One thing strikes me, though - by asking for the sub-projects to propose > the "grand challenge" themselves the one thing you can guarantee is that > the "grand challenge" is solvable (or more likely, already solved!) > > Other "grand challenge" kinds of meetings have an independent third party > pose the problem that has to be solved, and then all groups work toward a > solution and compare their results. This would, IMO, be more revealing of > the "state of the art" in each Open-Bio project, and point out where the > weaknesses are that we should be focusing on... Someone (for example, > you!) 
could act as the moderator to ensure that the "grand challenge" was > at least a reasonable one, within the scope of what an Open-Bio project > *should* be able to solve... > > Just my CAD $0.02 > > Mark > > > > On Thu, 28 Jan 2010 12:35:05 -0800, Brad Chapman > wrote: > >> Hello all; >> The BOSC 2010 organizing committee is hard at work getting prepared for >> this >> July's meeting in Boston: >> >> http://www.open-bio.org/wiki/BOSC_2010 >> >> One of the items we've traditionally had at the conference is a project >> update from each of the OpenBio affiliated groups. This year, we're >> thinking >> about organizing these talks around a central theme: the OpenBio solution >> challenge. We start with a biological question of general interest, and >> each >> of the project talks would focus around how you would solve that problem >> using your toolkit and programming language. >> >> This is meant to provide a challenge for OpenBio contributors, a nice >> tutorial >> style overview of various projects and approaches for other programmers, >> and a >> fun opportunity to compete and learn from other projects. Conference >> attendees >> will vote on their favorite solution, with the winner receiving fame and >> fortune (warning: fortune not guaranteed). >> >> For this to be successful, it of course requires interest and enthusiasm >> from >> y'all fine folks involved with the projects. Specifically: >> >> - Is there interest from your group in participating in the challenge? >> You'll >> want at least a few people to work on it, and someone to give a >> presentation >> at BOSC. >> >> - Do you have suggestions on a good theme or specific biological problem >> to >> tackle? We'll hope to pick something in a sweet spot that is >> challenging >> enough to be of interest, yet reasonable for presentation and >> preparation. >> >> Let's discuss ideas and get this together. 
>> Since the schedule for BOSC is
>> developing rapidly, please give us an idea if you're interested by
>> February 12th, and copy responses to the BOSC mailing list as a central
>> place for discussion.
>>
>> bosc at open-bio.org
>>
>> Thanks,
>> Brad, Michael, and the BOSC organizing committee
>> _______________________________________________
>> MOBY-dev mailing list
>> MOBY-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/moby-dev
>
>
> --
> Mark D Wilkinson, PI Bioinformatics
> Assistant Professor, Medical Genetics
> The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research
> Providence Heart + Lung Institute
> University of British Columbia - St. Paul's Hospital
> Vancouver, BC, Canada
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

From biopython at maubp.freeserve.co.uk Fri Jan 29 05:36:40 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 29 Jan 2010 10:36:40 +0000
Subject: [Bioperl-l] [MOBY-dev] OpenBio solution challenge: Project updates at BOSC 2010
In-Reply-To:
References: <20100128203505.GG40046@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e01001290236l1ad02515w403a19f94dbb6d15@mail.gmail.com>

Hi all,

This is a great topic, but should we continue it on just the one mailing list? Is there a suitable BOSC list, or how about the general Open Bio list?

On Thu, Jan 28, 2010 at 9:17 PM, Mark Wilkinson wrote:
>
> Brad, this sounds exciting!
>
> One thing strikes me, though - by asking for the sub-projects to propose
> the "grand challenge" themselves the one thing you can guarantee is that
> the "grand challenge" is solvable (or more likely, already solved!)
>
> Other "grand challenge" kinds of meetings have an independent third party
> pose the problem that has to be solved, and then all groups work toward a
> solution and compare their results.
> This would, IMO, be more revealing of
> the "state of the art" in each Open-Bio project, and point out where the
> weaknesses are that we should be focusing on... Someone (for example,
> you!) could act as the moderator to ensure that the "grand challenge" was
> at least a reasonable one, within the scope of what an Open-Bio project
> *should* be able to solve...
>
> Just my CAD $0.02
>
> Mark

One possible problem with having Brad act as moderator is his ties to Biopython (plus it would be a shame if we'd be one man down for trying to solve the challenges - grin). Having a project representative "sign off" on the challenge might work - or simply the whole of the BOSC committee, which is quite balanced.

Alternatively some kind of panel of challenges does seem a good way to reduce individual project bias (as suggested by Scooter), but there will still need to be a judging committee.

I'm curious what kind of challenges the BOSC committee had in mind - would something like taking a newly sequenced bacterium and producing an automated annotation as a GenBank, EMBL, or GFF file be too ambitious, for example? There are already several major projects to do this, e.g. RAST http://rast.nmpdr.org/

Peter (@Biopython)

From mike.stubbington at bbsrc.ac.uk Fri Jan 29 08:25:25 2010
From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI))
Date: Fri, 29 Jan 2010 13:25:25 +0000
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn
In-Reply-To:
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife> <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
Message-ID:

Hi Mark, Thanks for your continued help.
It now fails with this:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : BLAST Database error: No alias or index file found for nucleotide database [MouseGenome] in search path [/Volumes/stubbing/PerlScripts/5CTest/trunk::]

STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1004
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

If I change the factory creation to:

my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
    -db_name => '/Users/stubbing/localBlast/MouseGenome'
);

it fails with

------------- EXCEPTION -------------
MSG: DB name not valid
STACK Bio::Tools::Run::StandAloneBlastPlus::new /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:516
STACK toplevel ./5CTest.pl:45
-------------------------------------

However I can run the following successfully from the command line:

blastn -db /Users/stubbing/localBlast/MouseGenome -query querySequence.fasta

Is there something wrong with how I'm referring to the blast database when I construct my factory?

Thanks again,

M

On 28 Jan 2010, at 18:47, Mark A. Jensen wrote:

> Hi Mike,
> Believe I found the real bug causing the problem (was not accounting for
> the db_dir parameter). Crashes should now also throw much more helpful
> errors. Please try the code at r16774, and shout back.
> thanks --
> MAJ
> ----- Original Message -----
> From: "mike stubbington (BI)"
> To: "Mark A. Jensen"
> Cc:
> Sent: Thursday, January 28, 2010 11:18 AM
> Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn
>
>
> Hi,
>
> Thanks for the suggestion. Unfortunately it still fails - error as follows:
>
> ------------- EXCEPTION -------------
> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running
> /usr/local/ncbi/blast/bin/blastn : Illegal seek at
> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000,
> line 532.
>
> STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
> STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
> STACK toplevel ./5CTest.pl:63
> -------------------------------------
>
> M
>
> On 28 Jan 2010, at 15:56, Mark A. Jensen wrote:
>
>> Mike - please try updating your bioperl-live (the core) to the latest code
>> (revision 16761 or so).
>> CommandExts is a work in progress; from the stack errors it looks like you've
>> got an older version.
>> Try it then ping us back, if you would--
>> Thanks
>> Mark
>> ----- Original Message -----
>> From: "mike stubbington (BI)"
>> To:
>> Sent: Thursday, January 28, 2010 10:41 AM
>> Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn
>>
>>
>> Dear all,
>>
>> I am attempting to blast some primers against the mouse genome. I have created a
>> local mouse genome blast database and I can search against it using 'blastn' at
>> the command line.
>>
>> I have perl code that creates an array of bioperl sequence objects called
>> @primers
>>
>> I then create a StandAloneBlastPlus factory using the following code...
>>
>> my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
>>     -db_dir => '/Users/stubbing/localBlast/',
>>     -db_name => 'MouseGenome'
>> );
>>
>> and then attempt to blast my primers using this...
>>
>> my @shortPrimers;
>> my $count=1;
>> foreach (@primers) {
>>     my $currentSeq = $_;
>>     print "Checking primer $count/$primerNumber ";
>>     if ($_->length < 40) {
>>         push(@shortPrimers,$_);
>>         print "Too short!\n";
>>     }
>>     else {
>>         print "BLASTing...";
>>         my $blastResult = $blastFactory->blastn(-query => $currentSeq);
>>     }
>>     $count++;
>> }
>>
>> This fails with the following error...
>>
>> ------------- EXCEPTION -------------
>> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running
>> /usr/local/ncbi/blast/bin/blastn : Illegal seek at
>> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989,
>> line 532.
>>
>> STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
>> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
>> STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
>> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
>> STACK toplevel ./5CTest.pl:63
>> -------------------------------------
>>
>> Line 63 in my code is (as you might expect) the one that calls blastn on my
>> factory object.
>>
>> I'd appreciate any help you might be able to provide to shed light on this.
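The "No alias or index file found" error reported in this thread names the script's working directory as the search path, which suggests the database directory never reached blastn. A rough way to tell a wrapper problem from a missing database is to check for the files before building the factory. This is only a sketch: nucl_db_present is a hypothetical helper, and the .nin/.nal extensions are an assumption about how nucleotide databases are usually named on disk.

```perl
use strict;
use warnings;
use File::Spec;

# Return 1 if $dir holds an index (.nin) or alias (.nal) file for the
# nucleotide database $name, 0 otherwise.
# (hypothetical helper; the extensions are an assumption)
sub nucl_db_present {
    my ($dir, $name) = @_;
    for my $ext (qw(nin nal)) {
        return 1 if -e File::Spec->catfile($dir, "$name.$ext");
    }
    return 0;
}

# Example with the paths from the thread:
# die "MouseGenome not found in db_dir\n"
#     unless nucl_db_present('/Users/stubbing/localBlast', 'MouseGenome');
```

If the check passes but the factory still fails, the fault is more likely in how the wrapper passes the directory on than in the database itself.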
>>
>> Thanks in advance,
>>
>> Mike

From maj at fortinbras.us Fri Jan 29 08:36:54 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 29 Jan 2010 08:36:54 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn
In-Reply-To:
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife> <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
Message-ID:

Hi Mike-
Well, at least we're getting more informative errors. I think it's still my bad; will look again. Both of your calls should work. (thanks for the positive control too)
Thanks for your patience and the help--
MAJ

From maj at fortinbras.us Fri Jan 29 08:47:48 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 29 Jan 2010 08:47:48 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn
In-Reply-To:
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife><05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
Message-ID: <2B7BF6CD46AE441AB24203E169D9C503@NewLife>

Mike et al--
I've entered this as Bug #3003 on http://bugzilla.bioperl.org; we'll do further ping-pongs on this issue via the comment facility there--
cheers
MAJ

----- Original Message -----
From: "mike stubbington (BI)"
To: "Mark A. Jensen"
Cc: ; ; "Osborne"
Sent: Friday, January 29, 2010 8:25 AM
Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn

Hi Mark, Thanks for your continued help.
It now fails with this: ------------- EXCEPTION ------------- MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : BLAST Database error: No alias or index file found for nucleotide database [MouseGenome] in search path [/Volumes/stubbing/PerlScripts/5CTest/trunk::] STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1004 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 STACK toplevel ./5CTest.pl:63 ------------------------------------- If I change the factory creation to: my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => '/Users/stubbing/localBlast/MouseGenome' ); it fails with ------------- EXCEPTION ------------- MSG: DB name not valid STACK Bio::Tools::Run::StandAloneBlastPlus::new /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:516 STACK toplevel ./5CTest.pl:45 ------------------------------------- However I can run the following successfully from the command line: blastn -db /Users/stubbing/localBlast/MouseGenome -query querySequence.fasta Is there something wrong with how I'm referring to the blast database when I construct my factory? Thanks again, M On 28 Jan 2010, at 18:47, Mark A. Jensen wrote: > Hi Mike, > Believe I found the real bug causing the problem (was not accounting for > the db_dir parameter). Crashes should now also throw much more helpful > errors. Please try the code at r16774, and shout back. > thanks -- > MAJ > ----- Original Message ----- > From: "mike stubbington (BI)" > To: "Mark A. 
Jensen" > Cc: > Sent: Thursday, January 28, 2010 11:18 AM > Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek > error running blastn > > > Hi, > > Thanks for the suggestion. Unfortunately it still fails - error as follows: > > ------------- EXCEPTION ------------- > MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem > running > /usr/local/ncbi/blast/bin/blastn : Illegal seek at > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, > > line 532. > > STACK Bio::Tools::Run::WrapperBase::_run > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 > STACK Bio::Tools::Run::StandAloneBlastPlus::run > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 > STACK toplevel ./5CTest.pl:63 > ------------------------------------- > > M > > On 28 Jan 2010, at 15:56, Mark A. Jensen wrote: > >> Mike - please try updating your bioperl-live (the core) to the latest code >> (revision 16761 or so). >> CommandExts is a work in progress; from the stack errors it looks like you've >> got an older version. >> Try it then ping us back, if you would-- >> Thanks >> Mark >> ----- Original Message ----- >> From: "mike stubbington (BI)" >> To: >> Sent: Thursday, January 28, 2010 10:41 AM >> Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek >> error >> running blastn >> >> >> Dear all, >> >> I am attempting to blast some primers against the mouse genome. I have >> created >> a >> local mouse genome blast database and I can search against it using 'blastn' >> at >> the command line. 
>> >> I have perl code that creates an array of bioperl sequence objects called >> @primers >> >> I then create a StandAloneBlastPlus factory using the following code? >> >> my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new( >> -db_dir => '/Users/stubbing/localBlast/', >> -db_name => 'MouseGenome' >> ); >> >> and then attempt to blast my primers using this? >> >> my @shortPrimers; >> my $count=1; >> foreach (@primers) { >> my $currentSeq = $_; >> print "Checking primer $count/$primerNumber "; >> if ($_->length < 40) { >> push(@shortPrimers,$_); >> print "Too short!\n"; >> } >> else { >> print "BLASTing..."; >> my $blastResult = $blastFactory->blastn(-query => $currentSeq); >> } >> $count++; >> } >> >> This fails with the following error? >> >> ------------- EXCEPTION ------------- >> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem >> running >> /usr/local/ncbi/blast/bin/blastn : Illegal seek at >> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, >> >> line 532. >> >> STACK Bio::Tools::Run::WrapperBase::_run >> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994 >> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD >> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 >> STACK Bio::Tools::Run::StandAloneBlastPlus::run >> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 >> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD >> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 >> STACK toplevel ./5CTest.pl:63 >> ------------------------------------- >> >> Line 63 in my code is (as you might expect) the one that calls blastn on my >> factory object. >> >> I'd appreciate any help you might be able to provide to shed light on this. 
>> >> Thanks in advance, >> >> Mike >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From help at gmod.org Fri Jan 29 17:03:48 2010 From: help at gmod.org (Dave Clements, GMOD Help Desk) Date: Fri, 29 Jan 2010 14:03:48 -0800 Subject: [Bioperl-l] 2010 GMOD Summer School - Americas In-Reply-To: <71ee57c71001291402h2ec67300r4fc7a3b2375f4080@mail.gmail.com> References: <71ee57c71001291351q47994b82w10dffb390dbf2837@mail.gmail.com> <71ee57c71001291354m68548823s3e3fbd2e49e9b332@mail.gmail.com> <71ee57c71001291356p5e7f1aadi2bf437c93014a393@mail.gmail.com> <71ee57c71001291357h67112e2fkcf835687e59f66ae@mail.gmail.com> <71ee57c71001291358k74781b08n232534d8895c5ec1@mail.gmail.com> <71ee57c71001291400y28e40eb6i112ea91df977dc67@mail.gmail.com> <71ee57c71001291400n6133982eh3a02293ff741900b@mail.gmail.com> <71ee57c71001291401y505b56baic61c11754d88a444@mail.gmail.com> <71ee57c71001291402s23e3f2e9w2562d6acf85bd4ae@mail.gmail.com> <71ee57c71001291402h2ec67300r4fc7a3b2375f4080@mail.gmail.com> Message-ID: <71ee57c71001291403s19be18f3s3a1d5a314c74def@mail.gmail.com> Hello all, I am pleased to announce that we are now accepting applications for: ? 2010 GMOD Summer School - Americas ? ? 6-9 May 2010 ? ? NESCent, Durham, NC, USA ? ? http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas This will be a hands-on multi-day course aimed at teaching new GMOD users/administrators how to get GMOD Components up and running. 
The course will introduce participants to the GMOD project and then focus on installation, configuration and integration of popular GMOD Components. The course will be held May 6-9, at NESCent in Durham, NC. These components will be covered:
* Apollo - genome annotation editor
* Chado - a modular and extensible database schema
* Galaxy - workflow system
* GBrowse - the Generic Genome Browser
* GBrowse_syn - A generic synteny browser
* JBrowse - genome browser
* MAKER - genome annotation pipeline
* Tripal - web front end for Chado
The deadline for applying is the end of Friday, February 22. Admission is competitive and is based on the strength of the application (especially the statement of interest). In 2009 there were over 50 applications for the 25 slots. Any applications received after the deadline will be placed on the waiting list. See the course page for details and an application link: http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas Thanks, Dave Clements GMOD Help Desk PS: We are also investigating holding a GMOD course in the Asia/Pacific region, sometime this fall. Watch the GMOD mailing lists and the GMOD News page/RSS feed for updates. -- Please keep responses on the list! http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas http://gmod.org/wiki/GMOD_News Was this helpful? http://gmod.org/wiki/Help_Desk_Feedback From bhakti.dwivedi at gmail.com Sat Jan 30 17:38:40 2010 From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi) Date: Sat, 30 Jan 2010 17:38:40 -0500 Subject: [Bioperl-l] how to map blast results on to the genome? Message-ID: Does anyone know how I can graphically map the blast results (m -8 format) to the genome using bio-perl? Thanks Bhakti From jason at bioperl.org Sat Jan 30 18:56:14 2010 From: jason at bioperl.org (Jason Stajich) Date: Sat, 30 Jan 2010 15:56:14 -0800 Subject: [Bioperl-l] how to map blast results on to the genome?
In-Reply-To: References: Message-ID: <68937A7D-291F-419A-9ED7-7A87D9B4C78A@bioperl.org> Did you try BioGraphics and read the HOWTO on it -- http://bioperl.org/wiki/HOWTO:Graphics On Jan 30, 2010, at 2:38 PM, Bhakti Dwivedi wrote: > Does anyone know how I can graphically map the blast results (m -8 > format) > to the genome using bio-perl? > > Thanks > > Bhakti > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ http://twitter.com/hyphaltip From David.Messina at sbc.su.se Sun Jan 31 12:43:52 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sun, 31 Jan 2010 18:43:52 +0100 Subject: [Bioperl-l] question about a PAML module In-Reply-To: <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu> References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu> <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu> <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu> Message-ID: Hey Rui, My apologies for keeping you waiting on this. I started looking at it on Friday, and while I believe it'll be a relatively easy fix, I haven't got to the bottom of it yet. I'll look at it some more tomorrow and hopefully get it sorted it in the next day or two. Dave From bluecurio at gmail.com Sun Jan 31 22:22:37 2010 From: bluecurio at gmail.com (Daniel Renfro) Date: Sun, 31 Jan 2010 21:22:37 -0600 Subject: [Bioperl-l] New package to compare two SeqI-implementing objects Message-ID: <25c0f63d1001311922u134e9321s10f490a335f4a6e@mail.gmail.com> Hello all, A colleague and I have been working on a (Bio)Perl package to compare two Seq objects. This is in response to a need we found in our lab -- we wanted to see the changes to GenBank files through time, but wanted an automated way to do this. This led to what I'm calling the SeqDiff.pm package. 
I thought it would be a good idea to inform the community and get some feedback. The package takes two Seq objects as arguments, arbitrarily called "old" and "new." It then matches the features from the old object with the new object. This is done based on some criteria -- in our case we decided the features must be of the same type (have the same primary_tag) and have at least one matching database cross-reference (db_xref) in common. The left-over features (ones that did not have a match) are dropped into arrays called "lost" and "gained." The matching is done in about N log N time, as each matching pair is removed from subsequent searches. The matched features are then iterated through and the differences are calculated. Each feature is examined recursively and any differences are reported. Optionally you can give the new() method a flag so that everything is returned (differences and similarities). You can set callbacks for different types of objects (like anything that isa('Bio::LocationI')) if you want a custom comparison for specific BioPerl objects. This comparison step is the computationally slow part, and currently everything is held in memory. I think it'd be better to do this piecemeal, using the BioPerl-ish next() and last() methods. Maybe this was a little verbose, but that is the SeqDiff package in a nutshell. I hope to soon release v1.0. If you have any questions or comments I'd love to hear them. -Daniel Renfro Hu Lab Research Associate Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4055 From maj at fortinbras.us Sun Jan 31 22:47:05 2010 From: maj at fortinbras.us (Mark A.
Jensen) Date: Sun, 31 Jan 2010 22:47:05 -0500 Subject: [Bioperl-l] New package to compare two SeqI-implementing objects In-Reply-To: <25c0f63d1001311922u134e9321s10f490a335f4a6e@mail.gmail.com> References: <25c0f63d1001311922u134e9321s10f490a335f4a6e@mail.gmail.com> Message-ID: <5DC96D65B6A447C3802AF5D745FF4AA4@NewLife> Daniel-- this sounds interesting and useful, I +1 it. Your intuition about in-memory vs streaming sounds correct to me; features can be many, and diffing many (MANY) sequences may bork. Maybe our feature-rich users can chime in. (...however, I did just hear about a magic spell called 'File::Map', might check that out on CPAN.) cheers- MAJ ----- Original Message ----- From: "Daniel Renfro" To: Sent: Sunday, January 31, 2010 10:22 PM Subject: [Bioperl-l] New package to compare two SeqI-implementing objects > Hello all, > > A colleague and I have been working on a (Bio)Perl package to compare two > Seq objects. This is in response to a need we found in our lab -- we wanted > to see the changes to GenBank files through time, but wanted an automated > way to do this. This led to what I'm calling the SeqDiff.pm package. I > thought it would be a good idea to inform the community and get some > feedback. > > The package takes two Seq objects as arguments, arbitrarily called "old" and > "new." It then matches the features from the old object with the new object. > This is done based on some criteria -- in our case we decided the features > must be of the same type (have the same primary_tag) and have at least one > matching database cross-reference (db_xref) in common. The left-over > features (ones that did not have a match) are dropped into arrays called > "lost" and "gained." The matching is done in about NlogN time, as each > matching pair are removed from subsequent searches. > > The matched features and iterated through and the differences are > calculated. Each feature is examined recursively and any differences are > reported. 
Optionally you can give the new() method a flag so that everything > is returned (differences and similarities.) You can set callbacks for > different types of objects (like anything that isa('Bio::LocationI')) if you > want a custom comparison for specific BioPerl objects. This comparison step > is the computationally slow part, and currently everything is held in > memory. I think it'd be better to do this piece-meal, using the BioPerl-ish > next() and last() methods. > > Maybe this was a little verbose, but that is the SeqDiff package in a > nutshell. I hope to soon release v1.0. If you have any questions or comments > I'd love to hear them. > > -Daniel Renfro > > Hu Lab Research Associate > Dept. of Biochemistry and Biophysics > 2128 TAMU > Texas A&M Univ. > College Station, TX 77843-2128 > 979-862-4055 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rui.faria at upf.edu Sun Jan 31 12:17:09 2010 From: rui.faria at upf.edu (Rui Faria) Date: Sun, 31 Jan 2010 18:17:09 +0100 (CET) Subject: [Bioperl-l] question about a PAML module In-Reply-To: References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu> <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu> Message-ID: <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu> Hi Dave, some time ago we reported the codeml bug that causes errors when the user supplies their own tree file. Did you have a chance to look at it? We basically wanted to know your opinion on where the problem may be, since we are not the most experienced "perlers" on the planet :) I'm asking because we have to deal with it right now. If someone could check where the problem is, to see whether it has an easy solution, that would be of great help.
Best, Rui -----Original Message----- From: Dave Messina Sent: Thu 31/12/2009 11:55 AM To: Rui Faria Cc: Jason Stajich ; sandraneto_ at hotmail.com; bioperl-l List Subject: Re: question about a PAML module Hi Rui and Sandra, Could you file this as a bug report at http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl ? Once you've created the bug report with a brief description of the problem and submitted it, please attach the following to the bug report: - sample input files (a sequence file and a tree file, probably) - a script which reproduces the problem - the output (error messages) like you show below When I updated the code to work with the current version, I didn't exhaustively test all of the different modes of running codeml, so I appreciate you reporting this. There was another, similar issue reported a few days ago. I will try to take a look at both of these bug reports soon. Dave From rui.faria at upf.edu Sun Jan 31 13:56:56 2010 From: rui.faria at upf.edu (Rui Faria) Date: Sun, 31 Jan 2010 19:56:56 +0100 (CET) Subject: [Bioperl-l] question about a PAML module In-Reply-To: References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu> <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu> <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu> Message-ID: <11398434.1264964216856.JavaMail.oracle@rif1.s.upf.edu> Many thanks! We hope that one day, when we become experts, we can return the favor! Rui -----Original Message----- From: Dave Messina Sent: Sun 31/01/2010 06:43 PM To: Rui Faria Cc: Jason Stajich ; sandraneto_ at hotmail.com; bioperl-l List Subject: Re: question about a PAML module Hey Rui, My apologies for keeping you waiting on this. I started looking at it on Friday, and while I believe it'll be a relatively easy fix, I haven't got to the bottom of it yet. I'll look at it some more tomorrow and hopefully get it sorted it in the next day or two.
Dave From avilella at gmail.com Sat Jan 2 03:57:28 2010 From: avilella at gmail.com (Albert Vilella) Date: Sat, 2 Jan 2010 08:57:28 +0000 Subject: [Bioperl-l] Downloading from dbEST by taxon range Message-ID: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com> Hi all and happy 2010 for those that follow the Gregorian calendar, A question that is a bit in between bioperl and NCBI. I would like to use bioperl to download sequences fom dbEST. For that, my idea is to use Bio::DB::Genbank and get the sequences by gi id. Now, I want my script to download sequences for a given NCBI taxonomy clade. For example, if I want to download all fish (clupeocephala) sequences in dbEST, I can browse it around with the dbEST webpage using "clupeocephala[taxonomy]", so I am thinking there should be a way to do it programmatically. How can I query NCBI dbEST through bioperl to give me the list of GI ids I am looking for given a taxon id? Thanks in advance, Albert. From jason at bioperl.org Sat Jan 2 11:35:22 2010 From: jason at bioperl.org (Jason Stajich) Date: Sat, 2 Jan 2010 08:35:22 -0800 Subject: [Bioperl-l] Downloading from dbEST by taxon range In-Reply-To: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com> References: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com> Message-ID: DId you try Bio::DB::Query::GenBank ? You'd want to use -db => 'nucest' and then you just put in an Entrez query as per the example. you can include dates in the query so you can do updates to your locally retrieved data in a script that runs periodically. -jason On Jan 2, 2010, at 12:57 AM, Albert Vilella wrote: > Hi all and happy 2010 for those that follow the Gregorian calendar, > > A question that is a bit in between bioperl and NCBI. I would like > to use > bioperl to download sequences fom dbEST. For that, my idea is to use > Bio::DB::Genbank and get the sequences by gi id. > > Now, I want my script to download sequences for a given NCBI > taxonomy clade. 
> > For example, if I want to download all fish (clupeocephala) > sequences in dbEST, > I can browse it around with the dbEST webpage using > "clupeocephala[taxonomy]", > so I am thinking there should be a way to do it programmatically. > > How can I query NCBI dbEST through bioperl to give me the list of GI > ids I am > looking for given a taxon id? > > Thanks in advance, > > Albert. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From avilella at gmail.com Sun Jan 3 04:08:33 2010 From: avilella at gmail.com (Albert Vilella) Date: Sun, 3 Jan 2010 09:08:33 +0000 Subject: [Bioperl-l] Downloading from dbEST by taxon range In-Reply-To: References: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com> Message-ID: <358f4d651001030108p6a92fb27k5fa39be6bebb3a9c@mail.gmail.com> Thanks Jason! For the sake of completion, here is the script I needed: --------------------- #!/usr/bin/perl use strict; use Bio::SeqIO; use Bio::DB::Taxonomy; use Bio::DB::Query::GenBank; use Bio::DB::GenBank; use Bio::SeqIO; use Getopt::Long; my $keyword_type = 'EST'; my $outdir = '.'; my $taxon_name = undef; my $db_type = 'nucest'; GetOptions('keyword_type:s' => \$keyword_type, 't|taxon_name:s' => \$taxon_name, 'db_type:s' => \$db_type, 'outdir:s' => \$outdir); my $query_string = $taxon_name ."[Organism] AND ". $keyword_type ."[Keyword]"; my $db = Bio::DB::Query::GenBank->new (-db => $db_type, -query => $query_string, -mindate => '2007', -maxdate => '2010'); my $taxon_name_string = $taxon_name; $taxon_name_string =~ s/\ /\_/g; my $outfile = $outdir . "/" . $taxon_name_string . ".". $db_type . 
".fasta"; my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'fasta'); print $db->count,"\n"; my $gb = Bio::DB::GenBank->new(); my $stream = $gb->get_Stream_by_query($db); while (my $seq = $stream->next_seq) { # Filtering reads shorter than 800 next unless (length($seq->seq) > 800); $out->write_seq($seq); } $out->close; --------------------- On Sat, Jan 2, 2010 at 4:35 PM, Jason Stajich wrote: > DId you try Bio::DB::Query::GenBank ? > You'd want to use -db => 'nucest' and then you just put in an Entrez query > as per the example. ?you can include dates in the query so you can do > updates to your locally retrieved data in a script that runs periodically. > > -jason > On Jan 2, 2010, at 12:57 AM, Albert Vilella wrote: > >> Hi all and happy 2010 for those that follow the Gregorian calendar, >> >> A question that is a bit in between bioperl and NCBI. I would like to use >> bioperl to download sequences fom dbEST. For that, my idea is to use >> Bio::DB::Genbank and get the sequences by gi id. >> >> Now, I want my script to download sequences for a given NCBI taxonomy >> clade. >> >> For example, if I want to download all fish (clupeocephala) sequences in >> dbEST, >> I can browse it around with the dbEST webpage using >> "clupeocephala[taxonomy]", >> so I am thinking there should be a way to do it programmatically. >> >> How can I query NCBI dbEST through bioperl to give me the list of GI ids I >> am >> looking for given a taxon id? >> >> Thanks in advance, >> >> Albert. 
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > > From Jean-Marc.Frigerio at pierroton.inra.fr Mon Jan 4 09:12:18 2010 From: Jean-Marc.Frigerio at pierroton.inra.fr (Jean-Marc Frigerio INRA) Date: Mon, 04 Jan 2010 15:12:18 +0100 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? In-Reply-To: References: Message-ID: <4B41F742.2030209@pierroton.inra.fr> > Message: 1 > Date: Thu, 31 Dec 2009 11:26:45 +1800 > From: Peng Yu > Subject: [Bioperl-l] How to read in the whole fasta file in the > memory? > To: bioperl-l at lists.open-bio.org > Message-ID: > <366c6f340912300926k5af5cc88nc3c3babda541fd1 at mail.gmail.com> > Content-Type: text/plain; charset=ISO-8859-1 > > With Bio::SeqIO, I can only read in the records in a fasta file one by > one. This is preferable if there are many records in a file. > > But I also want to read all the records in. I could use a while loop > to read all records in. But could somebody let me know if there is a > function in bioperl that can read in all the record at once and return > me an object? > > http://www.bioperl.org/wiki/HOWTO:SeqIO > > > ------------------------------ > > Message: 2 > Date: Wed, 30 Dec 2009 13:04:53 -0500 > From: Sean Davis > Subject: Re: [Bioperl-l] How to read in the whole fasta file in the > memory? > To: Peng Yu > Cc: "bioperl-l at lists.open-bio.org" > Message-ID: > <264855a00912301004t396e0d4fwf9d291c5d82c3fb9 at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu wrote: >> With Bio::SeqIO, I can only read in the records in a fasta file one by >> one. This is preferable if there are many records in a file. >> >> But I also want to read all the records in. I could use a while loop >> to read all records in. 
But could somebody let me know if there is a >> function in bioperl that can read in all the record at once and return >> me an object? > > In perl, you can use an array to store the records. You could also > use a hash if you have reasonable keys for the entries. > > Sean > > >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > ------------------------------ > > Message: 3 > Date: Wed, 30 Dec 2009 11:58:54 -0800 > From: Jason Stajich > Subject: Re: [Bioperl-l] How to read in the whole fasta file in the > memory? > To: Peng Yu > Cc: BioPerl List > Message-ID: <3550F192-111F-48A7-B1B7-113FFFAC105B at bioperl.org> > Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes > > or use a database object so you can retrieve sequences that have a > particular id. See Bio::DB::Fasta > On Dec 30, 2009, at 10:04 AM, Sean Davis wrote: > >> On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu wrote: >>> With Bio::SeqIO, I can only read in the records in a fasta file one >>> by >>> one. This is preferable if there are many records in a file. >>> >>> But I also want to read all the records in. I could use a while loop >>> to read all records in. But could somebody let me know if there is a >>> function in bioperl that can read in all the record at once and >>> return >>> me an object? >> In perl, you can use an array to store the records. You could also >> use a hash if you have reasonable keys for the entries. 
>> >> Sean >> >> >>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > > > > ------------------------------ > > Message: 4 > Date: Wed, 30 Dec 2009 16:20:31 -0500 > From: "Mark A. Jensen" > Subject: Re: [Bioperl-l] How to read in the whole fasta file in the > memory? > To: "Peng Yu" , > Message-ID: <2646F627E6D14AADB412A6E6B51E24DA at NewLife> > Content-Type: text/plain; format=flowed; charset="iso-8859-1"; > reply-type=original > > I think you might want Bio::AlignIO: > > $alnio = Bio::AlignIO->new(-file=> 'my.fas' ); > $aln = $alnio->next_aln; > @seqs = $aln->each_seqs; > > MAJ > ----- Original Message ----- > From: "Peng Yu" > To: > Sent: Wednesday, December 30, 2009 12:26 PM > Subject: [Bioperl-l] How to read in the whole fasta file in the memory? > > >> With Bio::SeqIO, I can only read in the records in a fasta file one by >> one. This is preferable if there are many records in a file. >> >> But I also want to read all the records in. I could use a while loop >> to read all records in. But could somebody let me know if there is a >> function in bioperl that can read in all the record at once and return >> me an object? 
>> >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l Hi, I wrote and currently use a module I named Bio::SeqIO::multifasta, which is basically a copy of Bio::SeqIO::fasta plus a few methods: get_by_id(), get_by_order(), first_seq() and previous_seq() It would need review, validation etc. Do I submit it to Bugzilla ? -- jmf From jason at bioperl.org Mon Jan 4 11:03:45 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 4 Jan 2010 08:03:45 -0800 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? In-Reply-To: <4B41F742.2030209@pierroton.inra.fr> References: <4B41F742.2030209@pierroton.inra.fr> Message-ID: <16D7C8C1-E4BE-406F-9D60-379876178CAB@bioperl.org> We typically think of SeqIO as parsing a stream of data, not being reliant on it being a file which is what these methods would be implying I think. Sounds a lot like a database - does Bio::DB::Fasta not provide some of the functionality you need by these methods? I realize there isn't a by_order() but the get_by_id() is implemented to allow random access. -jason > > Hi, > > I wrote and currently use a module I named Bio::SeqIO::multifasta, > which is basically a copy of Bio::SeqIO::fasta plus a few methods: > get_by_id(), get_by_order(), first_seq() and previous_seq() > > It would need review, validation etc. Do I submit it to Bugzilla ? 
> > -- jmf > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From avilella at gmail.com Mon Jan 4 15:00:24 2010 From: avilella at gmail.com (Albert Vilella) Date: Mon, 4 Jan 2010 20:00:24 +0000 Subject: [Bioperl-l] indexed fastq files Message-ID: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com> Hi all, What is the best way to index fastq files, so that once clustered, I can provide a list of seq_ids and get them back in fastq format from the indexed db? Cheers, Albert. From cjfields at illinois.edu Mon Jan 4 16:59:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 4 Jan 2010 15:59:50 -0600 Subject: [Bioperl-l] indexed fastq files In-Reply-To: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com> References: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com> Message-ID: <07EBA105-6A34-490C-B0B9-4772DF386CBA@illinois.edu> Bio::Index::Fastq, maybe? To tell the truth, I haven't tried it since we refactored FASTQ parsing, so let us know if it doesn't work. chris On Jan 4, 2010, at 2:00 PM, Albert Vilella wrote: > Hi all, > > What is the best way to index fastq files, so that once clustered, I > can provide a list of seq_ids and get > them back in fastq format from the indexed db? > > Cheers, > > Albert. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Jan 4 22:54:03 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 4 Jan 2010 21:54:03 -0600 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? 
In-Reply-To: <4B41F742.2030209@pierroton.inra.fr>
References: <4B41F742.2030209@pierroton.inra.fr>
Message-ID: <1BAE5508-0DB7-41B4-92E3-49256582131F@illinois.edu>

Jean-Marc,

You can do that, yes. Just curious, but have you looked at the various flat file indexing modules for FASTA? Bio::DB::Fasta and Bio::Index::Fasta are commonly used and allow lookups by primary ID (and I think in some cases secondary IDs).

chris

On Jan 4, 2010, at 8:12 AM, Jean-Marc Frigerio INRA wrote:

> ...
>
> Hi,
>
> I wrote and currently use a module I named Bio::SeqIO::multifasta, which is basically a copy of Bio::SeqIO::fasta plus a few methods:
> get_by_id(), get_by_order(), first_seq() and previous_seq()
>
> It would need review, validation etc. Do I submit it to Bugzilla ?
>
> -- jmf
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From fs5 at sanger.ac.uk Wed Jan 6 17:16:13 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Wed, 06 Jan 2010 22:16:13 +0000
Subject: [Bioperl-l] Bio::DB::Sam strange behaviour for read pairs
Message-ID: <4B450BAD.3050807@sanger.ac.uk>

I'm trying to extract paired reads from a BAM file that span a given region. I would then like to get the two read ends of the sequenced clone that spans the region. I use Bio::DB::Sam->get_features_by_location for this and it does give me the correct read pairs as a region match, but it doesn't give me both mates of each pair in all cases.
Here is the script:

#!/usr/bin/perl
use Bio::DB::Sam;

my $usage = "usage: $0 BAMFILE CHROMOSOME STARTPOS ENDPOS\n" ;
my ($bam_file,$chrom,$start,$end) = @ARGV ;
die $usage unless $bam_file && $chrom && $start && $end;

my $bam = Bio::DB::Sam->new(-bam => $bam_file);
my @pairs = $bam->get_features_by_location(
    -type   => 'read_pair',
    -seq_id => $chrom,
    -start  => $start,
    -end    => $end);

print "region: $chrom:$start..$end\n" ;
foreach my $pair (@pairs) {
    print " pair: id: ".$pair->id.", start".$pair->start.', end:'.$pair->end."\n";
    my ($first_mate,$second_mate) = $pair->get_SeqFeatures;
    print " first_mate: start:".$first_mate->start.', end:'.$first_mate->end."\n";
    if ($second_mate){
        print " second_mate: start:".$second_mate->start.', end:'.$second_mate->end."\n";
    }
    else {
        print " no second mate\n";
    }
}

And here are the matching pairs that it produces with one of my files for the region tal12:22479..29232:

region: tal12:22479..29232
 pair: id: tal-2446c08, start17496, end:29423
 first_mate: start:28540, end:29423
 no second mate
 pair: id: tal-2463d10, start23534, end:31363
 first_mate: start:23534, end:24448
 no second mate
 pair: id: tal-2371c09, start20860, end:28230
 first_mate: start:27604, end:28230
 no second mate
 pair: id: tal-2440b06, start19232, end:27099
 first_mate: start:26025, end:27099
 no second mate
 pair: id: tal-2327g09, start18909, end:26129
 first_mate: start:25354, end:26129
 no second mate
 pair: id: tal-2381b05, start25658, end:35054
 first_mate: start:25658, end:26295
 no second mate
 pair: id: tal-2377c11, start20898, end:28230
 first_mate: start:27473, end:28230
 no second mate
 pair: id: tal-2426e12, start21975, end:27562
 first_mate: start:21975, end:23008
 second_mate: start:26396, end:27562
 pair: id: tal-2365h10, start22843, end:31944
 first_mate: start:22843, end:23184
 no second mate
 pair: id: tal-2388h09, start19016, end:28238
 first_mate: start:27475, end:28238
 no second mate

So it finds a lot of pairs that span the region and the start/end from the pair is also
correct but it only gives me both individual mates in one case: pair: id: tal-2426e12, start21975, end:27562 first_mate: start:21975, end:23008 second_mate: start:26396, end:27562 In this case, both pairs are actually inside the query region (at least partially) whereas in the other cases, one of the mates is not inside, e.g. this one: pair: id: tal-2388h09, start19016, end:28238 first_mate: start:27475, end:28238 no second mate > get this read pair from the BAM file: $ samtools view clones.bam | grep tal-2388h09 tal-2388h09 99 tal12 19016 205 36H9M1D14M1D664M1D16M1D21M1D28M1D15M1D10M1D12M1D7M1D8M1D5M = 27475 9223 CTTTGGATGAAATAGTTTTTAAATAATACTTATTAAATATTAAATATATAACACATAAATAAGTATTGATGCAAATTTTAAAGTATTATAGAAAACTAGGTTTGATTATATTGTTATACTGTACTTTAAGAGGAGAGAGATAAGATATCTTTGCTCTTTTAATATATAAATTTAGATAAATATTCGTTAAATTTTCTACATAGTTATTTTTTATCTTATATATTATACTGCTATAGTTATCAATGTATATACATTCAAATAATTTATTAAAAATTCTATATTATATTAATTCTATGATAAAATAATCCTGTTTGTGATTTAAAAAATGATGATTCAATAAAAACTAATAATATAATACGAGTTAATATGGAATAATAAAATGGCATTTAACATGAATTTAGTCTTTAACCTTTTCTTTGTTTGTCAAGTTTTTTAAAACATAAAACCACACATTTCAAAATGGATTTTTAGCAAATATATAAAAATTATACATTTATAATGTATTGTTATGCGTCTTTTCGATAGAATCAATATTTAATTATATGAAGTTTCCACAATAAAATAATATTTAATATTATTTATTAGTAGAGTATTTGATTATATATATAGGCATATAATAATAACTCTAGTTCTATCTACCATATTATTTATAATTATTATAACAAAATGTGATATGAAATTTTATTATATACTTATATTATTTTTTTAACTATTTTAAAATATATTTATTTATACCTCAAAACTATAAAATTGAAATTATTAATAATAATCTAATATATACCTTTATAAAAATAAACGTATAAACTAAT ><:4/+1+*)+4>BEH=9-,,66IIIIIIIIEDA>>>>A at 
DDFFIHHHHHITIIIIIHIIHHHHHHIYYYYYTTTYDDDHDDDDDDDIIINNTNHHHHHIYYYYIIIIIINNNNTTIIIIIIIIIIITTNTTTTTYYYTTTTTTYYNNNNNNLLLLLLLLLLLNNNNNNTTTTTTTTTTTTTNNTNNNTTTYYTLLLLLLTTTTTTTTYTTTNNNNNNTTTTTTTNNTTNNTTTTTTTTTTYYTTTTTTTNNNNNNTTTTTTTYYTTTTTTYYYTTTTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTNNINTTYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYTTTTTOOOIFFFIFIIOICC>>II@>>>>>>C>>>>>>CIBECCCHIIOOOOOOOOTTTIIFDDEIQQA:55839AA>99>@IIIIII>>::;;I;>>CC>>>>>@III<::=>AAA<>>>>I>:>>99:>842225006824855;5>68844//.//00:>::338:99<:/-+*-./0)((((+00+..,++(((+-()(*((((()*)***))3)''')*..+*++((*1++--+*''''((+/)*42.((***)+,+('*'''*((''''((,'%%''''''''( AS:i:614 MS:i:50 tal-2388h09 147 tal12 27475 205 1H764M40H = 19016 -9223 ATTAAATCGGTATCGCCAACACAATGAGTATAATCATTGTCAAATATGCGTTTGTAAGTATATTCATTGTCACATTCACGTTTGTAAGTATATTCATTGTCACATTCACGTTTGTAAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTAATTTTTGTGAGTATAATCATTGGAACG 
(((0))*,-1-../2((())03---03266300271+*.-0-*''''+*,+/+))*-05330+)..4>7=77273911**((+20+03688633:93036<8;::5:<99379>>::>>>:57:<:7--)))1435::333228>::>II>::>A>>3/.958677AA=AA:>:==IIII8338<>A>>>>IIIIIIIIYYYYYKKYYYMIFFFFEIIIMI::4..8AIIC>9>=EIQQQMCAAAAAACIIIIAICIIIOOYTIIIMOQQMIIIIC>>AAABCCCCCEAI>C>>IQQIIIIIIIIIIKKYYYYYYYYYYYYYYYYYYYYYTIIIIIIYYYYTNINNNTYYYYYYYYYYYYYTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYYYSSYYYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTTTTTTTTTTTTTYYYYYTTTTTTYYYTTNNNNTTYYYYYYYYYTTTTLLTTNNTTTTYTTTTTTYYYYYYYYYYYTTOOKKKLKOOTYYYYYYYYYYYYYYYYTNNNNNNNNNTTTNYNNNNTNNNNTTYYYYYYYYTTNNNNTTYNNNNNITTTTTYYYYYYYYYYTTNNIIIIIDIIIIHTNNNNTTYYYYTNNNIIIIIITTTINIIIINNNNTTTYYYYIHHHDDHHDDIHDDGDFFFTIIINTTYYYYTTTTHHHHCCIIIHIHHHHCAI9:++**1168>ACCIIDDDDDDI>>>>>?NNN AS:i:688 MS:i:50 So the read in the first line starts before the start of the query region and is not accessible via $pair->get_SeqFeatures although this is a valid pair. Am I doing something wrong, is this the desired behaviour or is it a bug? Thanks for your help! -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
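
For readers following the two samtools records above, the coordinates can be checked by hand. The sketch below is plain Perl with no BioPerl calls. It assumes only the SAM conventions that M, D, N, = and X operations consume reference bases and that flag bit 0x40 marks the first read of a pair. The helper names cigar_ref_length and overlaps are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Number of reference bases covered by a CIGAR string.
# M, D, N, = and X consume reference positions; H, S, I and P do not.
sub cigar_ref_length {
    my ($cigar) = @_;
    my $len = 0;
    while ($cigar =~ /(\d+)([MIDNSHP=X])/g) {
        # Copy the captures first: the match below would reset $1 and $2.
        my ($n, $op) = ($1, $2);
        $len += $n if $op =~ /^[MDN=X]$/;
    }
    return $len;
}

# True (1) if the closed intervals [s1,e1] and [s2,e2] share a base.
sub overlaps {
    my ($s1, $e1, $s2, $e2) = @_;
    return ($s1 <= $e2 && $s2 <= $e1) ? 1 : 0;
}

# Flag, position and CIGAR copied from the two records for tal-2388h09.
my @mates = (
    { flag  => 99, pos => 19016,
      cigar => '36H9M1D14M1D664M1D16M1D21M1D28M1D15M1D10M1D12M1D7M1D8M1D5M' },
    { flag  => 147, pos => 27475, cigar => '1H764M40H' },
);

# Query region used in the message.
my ($region_start, $region_end) = (22479, 29232);

for my $m (@mates) {
    my $end   = $m->{pos} + cigar_ref_length($m->{cigar}) - 1;
    my $which = ($m->{flag} & 0x40) ? 'first' : 'second';
    printf "%s in pair: %d..%d overlaps region: %s\n",
        $which, $m->{pos}, $end,
        overlaps($m->{pos}, $end, $region_start, $region_end) ? 'yes' : 'no';
}
```

Under these assumptions the flag-99 read covers 19016..19835 and ends before the region starts at 22479, while the flag-147 read covers 27475..28238 and lies inside it. That matches the observation that only the mate overlapping the query is listed under the pair. It does not show whether Bio::DB::Sam intends this behaviour.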
From hlapp at drycafe.net Thu Jan 7 11:55:00 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 7 Jan 2010 11:55:00 -0500
Subject: [Bioperl-l] Data missing into Annotation object using Bio::SeqIO (Genbank)
In-Reply-To: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
References: <4B28EB44.3080006@pasteur.fr> <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
Message-ID: <240F198A-83FA-4304-ACA8-80A702A68D8C@drycafe.net>

I don't know to what extent this was followed up on further and I guess it's too long ago to be of much help, but if it hasn't been mentioned before I wanted to point out Bio::SeqFeature::AnnotationAdaptor which integrates tag/value annotation and Bio::Annotation annotation into one AnnotationCollection, so it doesn't matter whether something is attached as a tag or as an annotation object.

-hilmar

On Dec 16, 2009, at 10:09 AM, Chris Fields wrote:

> Emmanuel,
>
> The previous behavior in the 1.5.x series was to store feature tags
> as Bio::Annotation. The problem had been the way this was
> implemented was considered unsatisfactory for various reasons, so we
> reverted back to using simple tag-value pairs as the default. You
> can get at the data this way (from the Feature/Annotation HOWTO):
>
> for my $feat_object ($seq_object->get_SeqFeatures) {
>     print "primary tag: ", $feat_object->primary_tag, "\n";
>     for my $tag ($feat_object->get_all_tags) {
>         print " tag: ", $tag, "\n";
>         for my $value ($feat_object->get_tag_values($tag)) {
>             print " value: ", $value, "\n";
>         }
>     }
> }
>
> You can also convert all the tag-value data into a
> Bio::Annotation::Collection using the
> Bio::SeqFeature::AnnotationAdaptor, but this is completely optional.
>
> chris
>
> On Dec 16, 2009, at 8:14 AM, Emmanuel Quevillon wrote:
>
>> Hi,
>>
>> I've wrote a small Genbank parser few months ago before BioPerl
>> release 1.6.0.
>> I tried to use my code once again but now the output of my parser
>> is empty.
>> It looks like Annotation from seqfeatures is not filled anymore. >> >> Here is the code I used previously: >> >> while(my $seq = $streamer->next_seq()){ >> >> #We only want to retrieve CDS features... >> foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq- >> >get_SeqFeatures()){ >> print $ofh join("#", >> $feat->annotation()- >> >get_Annotations('locus_tag'), # Acc num >> $feat->annotation()->get_Annotations('gene') >> ? $feat->annotation()- >> >get_Annotations('gene') # Gene name >> : $feat->annotation()- >> >get_Annotations('locus_tag'), >> $feat->annotation()- >> >get_Annotations('product'), # Description >> ),"\n"; >> } >> } >> >> $feat is a Bio::SeqFeature::Generic object >> >> If I print Dumper($feat->annotation()) here is the output : >> >> $VAR1 = bless( { >> '_typemap' => bless( { >> '_type' => { >> 'comment' => >> 'Bio::Annotation::Comment', >> 'reference' => >> 'Bio::Annotation::Reference', >> 'dblink' => >> 'Bio::Annotation::DBLink' >> } >> }, >> 'Bio::Annotation::TypeManager' ), >> '_annotation' => {} >> }, 'Bio::Annotation::Collection' ); >> >> Have some changes been made into the way annotation object is >> populated? 
>>
>> Thanks for any clue and sorry if my question look stupid
>>
>> Regards
>>
>> Emmanuel
>>
>> --
>> -------------------------
>> Emmanuel Quevillon
>> Biological Software and Databases Group
>> Institut Pasteur
>> +33 1 44 38 95 98
>> tuco at_ pasteur dot fr
>> -------------------------
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From rtbio.2009 at gmail.com Fri Jan 8 10:00:21 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Fri, 8 Jan 2010 16:00:21 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
Message-ID:

Hello all,

I was trying remote BLAST using BioPerl. My input data is a Trypanosoma brucei sequence in FASTA format. When I tried to submit it to BLAST using the step

$r=$factory->submit_blast($input)

it did not return anything, which I checked by debugging the code. It is not blasting my input sequence even though I specified all the parameters. I paste the code below.

Please help me in solving this problem. It is very urgent.

Regards
Roopa.
#!/usr/bin/perl #path for extra camel module use lib "/srv/www/htdocs/rain/RNAi/"; use Roopablast; use Bio::SearchIO; use Bio::Search::Result::BlastResult; use Bio::Perl; use Bio::Tools::Run::RemoteBlast; use Bio::Seq; use Bio::SeqIO; use Bio::DB::GenBank; $serverpath = "/srv/www/htdocs/rain/RNAi"; $serverurl = "http://141.84.66.66/rain/RNAi"; $outfile = $serverpath."/rnairesult_".time().".html"; $nuc = $serverpath."/nuc".time().".txt"; $debugfile = $serverpath."/debug_".time().".txt"; $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; my $outstring =""; &parse_form; print "Content-type: text/html\n\n"; print "\n"; print "RNAi Result"; print " \n"; print "\n"; print "\n"; print " Your results will appear here
"; print " Please be patient, runtime can be up to 5 minutes
"; print " This page will automatically reload in 30 seconds. Roopa"; print "\n"; print "\n"; defined(my $pid = fork) or die "Can't fork: $!"; exit if $pid; open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; open(OUTFILE, '>',$outfile); print OUTFILE "\n RNAi Result \n \n \n Your results will appear here
Please be patient, runtime can be up to 5 minutes wait wait wait......
This page will automatically reload in 30 seconds Roopa
\n \n"; close(OUTFILE); @compseqs = blastcode($in{'Inputseq'}); $in{'Inputseq'} =~ s/>.*$//m; $in{'Inputseq'} =~ s/[^TAGC]//gim; $in{'Inputseq'} =~ tr/actg/ACTG/; @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, $in{'Threshold'}); sub blastcode { $inpu1= $_[0]; #$organ= $_[1]; open(NUC,'>',$nuc); print NUC $inpu1; close(NUC); my $prog = 'blastn'; my $db = 'refseq_rna'; my $e_val= '1e-10'; my $organism= 'Trypanosoma Brucei'; $gb = new Bio::DB::GenBank; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO', '-Organism' => $organism ); # open(OUTFILE,'>',$debugfile); # print OUTFILE @params; # close(OUTFILE); my $factory = Bio::Tools::Run::RemoteBlast->new(@params); #change a paramter $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]'; #change a paramter # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; my $v = 1; #$v is just to turn on and off the messages my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => 'Trypanosoma Brucei' ); while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. open(OUTFILE,'>',$debugfile); print OUTFILE $input; close(OUTFILE); my $r = $factory->submit_blast($input); #The program stops here it does not return any value and it does not enter the While loop,Please help me in this regard.# open(OUTFILE,'>',$debugfile); print OUTFILE $r; close(OUTFILE); print STDERR "waiting...." 
if($v>0); while ( my @rids = $factory->each_rid ) { open(OUTFILE,'>',$debugfile); print OUTFILE "while entered"; close(OUTFILE); foreach my $rid ( @rids ) { open(OUTFILE,'>',$debugfile); print OUTFILE "foreach entered"; close(OUTFILE); my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } open(OUTFILE,'>',$debugfile); print OUTFILE "if entered"; close(OUTFILE); print STDERR "." if ( $v > 0 ); sleep 5; } else { open(OUTFILE,'>',$debugfile); print OUTFILE "else entered"; close(OUTFILE); my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $result->next_hit(); close(BLASTDEBUGFILE); my $filename = $serverpath."/blastdata_".time().$result->query_name()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $organism; close(BLASTDEBUGFILE); # open(OUTFILE,'>',$outfile); # print OUTFILE "Test2 $result->database_name()"; # close(OUTFILE); #$hit = $result->next_hit; #open(new,'>',$debugfile); #print $hit; #close(new); while ( my $hit = $result->next_hit ) { next unless ( $v > 0); # open(OUTFILE,'>',$debugfile); # print OUTFILE "$hit in while hits"; # close(OUTFILE); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string push(@seqs,$dna); } } } } } #open(OUTFILE,'>',$debugfile); #print OUTFILE $seqs[0]; #close(OUTFILE); return(@seqs); } open(OUTFILE, '>',$outfile) || die ; print OUTFILE "\n RNAi Result \n \n

Inputsequence:
"; for ($i=0; $i\n"; } } print OUTFILE "

"; $z=@compseqs; for($k=1;$k<$z;$k++) { print OUTFILE "

Compare Sequence:
"; for ($i=0; $i\n"; } } print OUTFILE "

"; } print OUTFILE "

Window:
$in{'Windowsize'}

Threshold:
$in{'Threshold'}

"; my $j=0; for ($i=0; $i{similar}<=$in{'Threshold'}){ $j=$in{'Windowsize'}; } $height=$out[$i]->{similar}*5; } if ($j>0) { print OUTFILE ""; $outstring .= "".substr ($in{'Inputseq'}, $i, 1).""; $j--; } else { print OUTFILE ""; $outstring .= "".substr ($in{'Inputseq'}, $i, 1).""; } if ( ($i+1)%10==0){ $outstring .= " "; } if ( ($i+1)%60==0){ $outstring .= "
\n"; } if ( ($i+1)%800==0){ print OUTFILE "

\n"; } } print OUTFILE "

$outstring"; #foreach (@out) { #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} matchs

"; #if ($_->{similar}<=$in{'Threshold'}){ # } #} print OUTFILE "\n\n"; close OUTFILE; #nameprint(); sub parse_form { local ($buffer, @pairs, $pair, $name, $value); # Read in text $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/; if ($ENV{'REQUEST_METHOD'} eq "POST") { read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); } else { $buffer = $ENV{'QUERY_STRING'}; } @pairs = split(/&/, $buffer); foreach $pair (@pairs) { ($name, $value) = split(/=/, $pair); $value =~ tr/+/ /; $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; $in{$name} = $value; } } From maj at fortinbras.us Fri Jan 8 10:36:41 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 8 Jan 2010 10:36:41 -0500 Subject: [Bioperl-l] Regarding blast in Bioperl In-Reply-To: References: Message-ID: Hi Roopa-- I got your code to work with the following changes: +# the input should be a valid FASTA file... ... open(NUC,'>',$nuc); +print NUC ">seq (need a name line for valid fasta)\n"; print NUC $inpu1, "\n"; close(NUC); ... +# you can set these header parms in the call itself... - my $factory = Bio::Tools::Run::RemoteBlast->new(@params); + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => ''Trypanosoma Brucei[ORGN]'); #change a paramter +# commented this out... +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]'; MAJ ----- Original Message ----- From: "Roopa Raghuveer" To: Sent: Friday, January 08, 2010 10:00 AM Subject: [Bioperl-l] Regarding blast in Bioperl > Hello all, > > I was trying Remote blast using Bioperl. My input data is a Trypanosoma > brucei sequence in Fasta format. When I was trying to submit to BLAST using > the step > $r=$factory->submit_blast($input) > It was not returning anything which I checked by debugging the code. It is > not blasting my input sequence even though I mentioned all the parameters.I > would paste the code below. > > Please help me in solving put this problem. It is very urgent. > > Regards > Roopa. 
> > #!/usr/bin/perl > > #path for extra camel module > use lib "/srv/www/htdocs/rain/RNAi/"; > use Roopablast; > > > use Bio::SearchIO; > use Bio::Search::Result::BlastResult; > use Bio::Perl; > use Bio::Tools::Run::RemoteBlast; > use Bio::Seq; > use Bio::SeqIO; > use Bio::DB::GenBank; > > $serverpath = "/srv/www/htdocs/rain/RNAi"; > $serverurl = "http://141.84.66.66/rain/RNAi"; > $outfile = $serverpath."/rnairesult_".time().".html"; > $nuc = $serverpath."/nuc".time().".txt"; > $debugfile = $serverpath."/debug_".time().".txt"; > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > my $outstring =""; > > &parse_form; > > print "Content-type: text/html\n\n"; > print "\n"; > print "RNAi Result"; > print " URL=$serverurl/rnairesult_".time().".html\"> \n"; > print "\n"; > print "\n"; > print " Your results will appear href=$serverurl/rnairesult_".time().".html>here
"; > print " Please be patient, runtime can be up to 5 minutes
"; > print " This page will automatically reload in 30 seconds. Roopa"; > print "\n"; > print "\n"; > > defined(my $pid = fork) or die "Can't fork: $!"; > exit if $pid; > open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; > open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; > open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; > > > > open(OUTFILE, '>',$outfile); > > print OUTFILE "\n > RNAi Result > URL=$serverurl//rnairesult_".time().".html\"> \n > > \n > \n > Your results will appear href=$serverurl/rnairesult_".time().".html>here
> Please be patient, runtime can be up to 5 minutes wait wait wait......
> This page will automatically reload in 30 seconds Roopa
> \n > \n"; > > close(OUTFILE); > > > @compseqs = blastcode($in{'Inputseq'}); > > $in{'Inputseq'} =~ s/>.*$//m; > $in{'Inputseq'} =~ s/[^TAGC]//gim; > $in{'Inputseq'} =~ tr/actg/ACTG/; > > @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, > $in{'Threshold'}); > > > sub blastcode > { > > $inpu1= $_[0]; > > #$organ= $_[1]; > > open(NUC,'>',$nuc); > print NUC $inpu1; > close(NUC); > > my $prog = 'blastn'; > my $db = 'refseq_rna'; > my $e_val= '1e-10'; > my $organism= 'Trypanosoma Brucei'; > > $gb = new Bio::DB::GenBank; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO', > '-Organism' => $organism ); > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE @params; > # close(OUTFILE); > > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > #change a paramter > > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma > Brucei[ORGN]'; > > #change a paramter > # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; > > my $v = 1; > #$v is just to turn on and off the messages > > my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , > '-organism' => 'Trypanosoma Brucei' ); > > > while (my $input = $str->next_seq()) > { > #Blast a sequence against a database: > #Alternatively, you could pass in a file with many > #sequences rather than loop through sequence one at a time > #Remove the loop starting 'while (my $input = $str->next_seq())' > #and swap the two lines below for an example of that. > > open(OUTFILE,'>',$debugfile); > print OUTFILE $input; > close(OUTFILE); > > > my $r = $factory->submit_blast($input); #The program stops here it > does not return any value and it does not enter the While loop,Please help > me in this regard.# > open(OUTFILE,'>',$debugfile); > print OUTFILE $r; > close(OUTFILE); > > > print STDERR "waiting...." 
if($v>0); > > while ( my @rids = $factory->each_rid ) { > open(OUTFILE,'>',$debugfile); > print OUTFILE "while entered"; > close(OUTFILE); > foreach my $rid ( @rids ) { > > open(OUTFILE,'>',$debugfile); > print OUTFILE "foreach entered"; > close(OUTFILE); > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > open(OUTFILE,'>',$debugfile); > print OUTFILE "if entered"; > close(OUTFILE); > print STDERR "." if ( $v > 0 ); > sleep 5; > } > else { > open(OUTFILE,'>',$debugfile); > print OUTFILE "else entered"; > close(OUTFILE); > > my $result = $rc->next_result(); > #save the output > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $result->next_hit(); > close(BLASTDEBUGFILE); > > my $filename = > $serverpath."/blastdata_".time().$result->query_name()."\.out"; > > # open(DEBUGFILE,'>',$debugfile); > # open(new,'>',$filename); > # @arra=; > # print DEBUGFILE @arra; > # close(DEBUGFILE); > # close(new); > > $factory->save_output($filename); > > # open(BLASTDEBUGFILE,'>',$debugfile); > # print BLASTDEBUGFILE "Hello $rid"; > # close(BLASTDEBUGFILE); > > $factory->remove_rid($rid); > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $organism; > close(BLASTDEBUGFILE); > > # open(OUTFILE,'>',$outfile); > # print OUTFILE "Test2 $result->database_name()"; > # close(OUTFILE); > > #$hit = $result->next_hit; > #open(new,'>',$debugfile); > #print $hit; > #close(new); > > while ( my $hit = $result->next_hit ) { > > next unless ( $v > 0); > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "$hit in while hits"; > # close(OUTFILE); > > my $sequ = $gb->get_Seq_by_version($hit->name); > my $dna = $sequ->seq(); # get the sequence as a string > push(@seqs,$dna); > } > } > } > } > } > > #open(OUTFILE,'>',$debugfile); > #print OUTFILE $seqs[0]; > #close(OUTFILE); > > return(@seqs); > > } > > open(OUTFILE, '>',$outfile) || 
die ; > > print OUTFILE "\n > RNAi Result > \n > \n >

> Inputsequence:
"; > > for ($i=0; $i > print OUTFILE substr ($in{'Inputseq'}, $i, 1); > > if ( ($i+1)%10==0){ > print OUTFILE " "; > } > if ( ($i+1)%60==0){ > print OUTFILE "
\n"; > } > } > > > > print OUTFILE "

"; > > $z=@compseqs; > > for($k=1;$k<$z;$k++) { > print OUTFILE "

Compare > Sequence:
"; > > for ($i=0; $i > print OUTFILE substr ($compseqs[$k], $i, 1); > > if ( ($i+1)%10==0){ > print OUTFILE " "; > } > if ( ($i+1)%60==0){ > print OUTFILE "
\n"; > } > } > print OUTFILE "

"; > } > > print OUTFILE "

> Window:
$in{'Windowsize'} >

>

> Threshold:
$in{'Threshold'} >

"; > my $j=0; > > for ($i=0; $i > if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){ > if ($out[$i]->{similar}<=$in{'Threshold'}){ > $j=$in{'Windowsize'}; > } > $height=$out[$i]->{similar}*5; > } > > if ($j>0) { > print OUTFILE " height=\"5\">"; > $outstring .= "".substr ($in{'Inputseq'}, $i, > 1).""; > $j--; > } > else { > print OUTFILE " height=\"5\">"; > $outstring .= "".substr ($in{'Inputseq'}, $i, > 1).""; > } > > if ( ($i+1)%10==0){ > $outstring .= " "; > } > if ( ($i+1)%60==0){ > $outstring .= "
\n"; > > } > if ( ($i+1)%800==0){ > print OUTFILE "

\n"; > > } > } > > print OUTFILE "

set\">$outstring"; > > #foreach (@out) { > #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} matchs

"; > #if ($_->{similar}<=$in{'Threshold'}){ > > # } > #} > > print OUTFILE "\n\n"; > > close OUTFILE; > > #nameprint(); > > sub parse_form { > local ($buffer, @pairs, $pair, $name, $value); > # Read in text > $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/; > if ($ENV{'REQUEST_METHOD'} eq "POST") > { > read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); > } > else > { > $buffer = $ENV{'QUERY_STRING'}; > } > @pairs = split(/&/, $buffer); > foreach $pair (@pairs) > { > ($name, $value) = split(/=/, $pair); > $value =~ tr/+/ /; > $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; > $in{$name} = $value; > } > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From julian.onions at gmail.com Fri Jan 8 11:53:50 2010 From: julian.onions at gmail.com (Julian Onions) Date: Fri, 8 Jan 2010 16:53:50 +0000 Subject: [Bioperl-l] Cladogram construction Message-ID: Does anyone have any sample code for building cladograms based on Pars (one of Phylip tools) type format (or any other format actually) I've got something sort of working but I get no weights on the tree - everything appears as nan. I'd also like to set one of the species to be an outgroup. This is the closest sample I've found so far. 
#!/usr/bin/perl -w
use strict;
use Bio::AlignIO;
use Bio::Tree::DistanceFactory;
use Bio::Align::ProteinStatistics;
use Bio::TreeIO;
use Bio::Tree::Draw::Cladogram;

my $alnfile = shift @ARGV || die "need a file to run";
my $input= Bio::AlignIO->new(-format => 'fasta', -file => $alnfile);
if( my $aln = $input->next_aln ) {
    my $dfactory = Bio::Tree::DistanceFactory->new(-method => 'NJ');
    my $stats = Bio::Align::ProteinStatistics->new;
    my $distmat = $stats->distance(-align => $aln, -method => 'Kimura');
    my $treeout = Bio::TreeIO->new(-format => 'newick');
    my $tree = $dfactory->make_tree($distmat);
    $treeout->write_tree($tree);
    my $obj1 = Bio::Tree::Draw::Cladogram->new(-tree => $tree, -compact => 0);
    $obj1->print(-file => "tree.eps");
} else {
    die "could not find any alignments in the file $alnfile";
}

Pars input looks like

3 4
Robin 101
Blackbird 100
Sparrow 100

Thanks,

Julian.

From rtbio.2009 at gmail.com Sat Jan 9 11:57:09 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Sat, 9 Jan 2010 17:57:09 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To:
References:
Message-ID:

Hello all,

Thanks a lot for your reply, Mark. It worked with Trypanosoma brucei as the organism parameter, but when I tried to take the organism parameter from the user it did not work, i.e., I was unable to get the target sequences.

Please help me in this regard.
My code is #!/usr/bin/perl #path for extra camel module use lib "/srv/www/htdocs/rain/RNAi/"; use Roopablast; use Bio::SearchIO; use Bio::Search::Result::BlastResult; use Bio::Perl; use Bio::Tools::Run::RemoteBlast; use Bio::Seq; use Bio::SeqIO; use Bio::DB::GenBank; $serverpath = "/srv/www/htdocs/rain/RNAi"; $serverurl = "http://141.84.66.66/rain/RNAi"; $outfile = $serverpath."/rnairesult_".time().".html"; $nuc = $serverpath."/nuc".time().".txt"; $debugfile = $serverpath."/debug_".time().".txt"; $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; my $outstring =""; &parse_form; print "Content-type: text/html\n\n"; print "\n"; print "RNAi Result"; print " \n"; print "\n"; print "\n"; print " Your results will appear here
"; print " Please be patient, runtime can be up to 5 minutes
"; print " This page will automatically reload in 30 seconds. Roopa"; print "\n"; print "\n"; defined(my $pid = fork) or die "Can't fork: $!"; exit if $pid; open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; open(OUTFILE, '>',$outfile); print OUTFILE "\n RNAi Result \n \n \n Your results will appear here
Please be patient, runtime can be up to 5 minutes wait wait wait......
This page will automatically reload in 30 seconds Roopa
\n \n"; close(OUTFILE); @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); $in{'Inputseq'} =~ s/>.*$//m; $in{'Inputseq'} =~ s/[^TAGC]//gim; $in{'Inputseq'} =~ tr/actg/ACTG/; @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, $in{'Threshold'}); sub blastcode { $inpu1= $_[0]; $organ= $_[1]; open(NUC,'>',$nuc); print NUC $inpu1,"\n"; close(NUC); my $prog = 'blastn'; my $db = 'refseq_rna'; my $e_val= '1e-10'; my $organism= $organ; $gb = new Bio::DB::GenBank; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO', '-Organism' => $organism ); open(OUTFILE,'>',$debugfile); print OUTFILE $inpu1; close(OUTFILE); my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => '$organ[ORGN]'); #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); #change a paramter #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]'; #change a paramter # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; my $v = 1; #$v is just to turn on and off the messages my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => $organ ); while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. #open(OUTFILE,'>',$debugfile); # print OUTFILE $input; #close(OUTFILE); my $r = $factory->submit_blast($input); open(OUTFILE,'>',$debugfile); # print OUTFILE $r; close(OUTFILE); print STDERR "waiting...." 
if($v>0); while ( my @rids = $factory->each_rid ) { # open(OUTFILE,'>',$debugfile); # print OUTFILE "while entered"; # close(OUTFILE); foreach my $rid ( @rids ) { # open(OUTFILE,'>',$debugfile); # print OUTFILE "foreach entered"; # close(OUTFILE); my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } open(OUTFILE,'>',$debugfile); # print OUTFILE "if entered"; close(OUTFILE); print STDERR "." if ( $v > 0 ); sleep 5; } else { # open(OUTFILE,'>',$debugfile); # print OUTFILE "else entered"; # close(OUTFILE); my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $result->next_hit(); close(BLASTDEBUGFILE); my $filename = $serverpath."/blastdata_".time().$result->query_name()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $organism; close(BLASTDEBUGFILE); # open(OUTFILE,'>',$outfile); # print OUTFILE "Test2 $result->database_name()"; # close(OUTFILE); #$hit = $result->next_hit; #open(new,'>',$debugfile); #print $hit; #close(new); while ( my $hit = $result->next_hit ) { next unless ( $v > 0); # open(OUTFILE,'>',$debugfile); # print OUTFILE "$hit in while hits"; # close(OUTFILE); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string push(@seqs,$dna); } } } } } #open(OUTFILE,'>',$debugfile); #print OUTFILE $seqs[0]; #close(OUTFILE); return(@seqs); } Regards, Roopa. On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen wrote: > Hi Roopa-- > > I got your code to work with the following changes: > > +# the input should be a valid FASTA file... > ... 
> open(NUC,'>',$nuc); > +print NUC ">seq (need a name line for valid fasta)\n"; > print NUC $inpu1, "\n"; > close(NUC); > ... > > +# you can set these header parms in the call itself... > - my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => > ''Trypanosoma Brucei[ORGN]'); > > #change a paramter > +# commented this out... > +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma > Brucei[ORGN]'; > > MAJ > ----- Original Message ----- From: "Roopa Raghuveer" > > To: > Sent: Friday, January 08, 2010 10:00 AM > Subject: [Bioperl-l] Regarding blast in Bioperl > > > Hello all, >> >> I was trying Remote blast using Bioperl. My input data is a Trypanosoma >> brucei sequence in Fasta format. When I was trying to submit to BLAST >> using >> the step >> $r=$factory->submit_blast($input) >> It was not returning anything which I checked by debugging the code. It is >> not blasting my input sequence even though I mentioned all the >> parameters.I >> would paste the code below. >> >> Please help me in solving put this problem. It is very urgent. >> >> Regards >> Roopa. 
>> >> #!/usr/bin/perl >> >> #path for extra camel module >> use lib "/srv/www/htdocs/rain/RNAi/"; >> use Roopablast; >> >> >> use Bio::SearchIO; >> use Bio::Search::Result::BlastResult; >> use Bio::Perl; >> use Bio::Tools::Run::RemoteBlast; >> use Bio::Seq; >> use Bio::SeqIO; >> use Bio::DB::GenBank; >> >> $serverpath = "/srv/www/htdocs/rain/RNAi"; >> $serverurl = "http://141.84.66.66/rain/RNAi"; >> $outfile = $serverpath."/rnairesult_".time().".html"; >> $nuc = $serverpath."/nuc".time().".txt"; >> $debugfile = $serverpath."/debug_".time().".txt"; >> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >> >> my $outstring =""; >> >> &parse_form; >> >> print "Content-type: text/html\n\n"; >> print "\n"; >> print "RNAi Result"; >> print "> URL=$serverurl/rnairesult_".time().".html\"> \n"; >> print "\n"; >> print "\n"; >> print " Your results will appear > href=$serverurl/rnairesult_".time().".html>here
"; >> print " Please be patient, runtime can be up to 5 minutes
"; >> print " This page will automatically reload in 30 seconds. Roopa"; >> print "\n"; >> print "\n"; >> >> defined(my $pid = fork) or die "Can't fork: $!"; >> exit if $pid; >> open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >> >> >> >> open(OUTFILE, '>',$outfile); >> >> print OUTFILE "\n >> RNAi Result >> > URL=$serverurl//rnairesult_".time().".html\"> \n >> >> \n >> \n >> Your results will appear > href=$serverurl/rnairesult_".time().".html>here
>> Please be patient, runtime can be up to 5 minutes wait wait >> wait......
>> This page will automatically reload in 30 seconds Roopa
>> \n >> \n"; >> >> close(OUTFILE); >> >> >> @compseqs = blastcode($in{'Inputseq'}); >> >> $in{'Inputseq'} =~ s/>.*$//m; >> $in{'Inputseq'} =~ s/[^TAGC]//gim; >> $in{'Inputseq'} =~ tr/actg/ACTG/; >> >> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >> $in{'Threshold'}); >> >> >> sub blastcode >> { >> >> $inpu1= $_[0]; >> >> #$organ= $_[1]; >> >> open(NUC,'>',$nuc); >> print NUC $inpu1; >> close(NUC); >> >> my $prog = 'blastn'; >> my $db = 'refseq_rna'; >> my $e_val= '1e-10'; >> my $organism= 'Trypanosoma Brucei'; >> >> $gb = new Bio::DB::GenBank; >> >> my @params = ( '-prog' => $prog, >> '-data' => $db, >> '-expect' => $e_val, >> '-readmethod' => 'SearchIO', >> '-Organism' => $organism ); >> >> # open(OUTFILE,'>',$debugfile); >> # print OUTFILE @params; >> # close(OUTFILE); >> >> >> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >> >> #change a paramter >> >> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >> Brucei[ORGN]'; >> >> #change a paramter >> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; >> >> my $v = 1; >> #$v is just to turn on and off the messages >> >> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >> '-organism' => 'Trypanosoma Brucei' ); >> >> >> while (my $input = $str->next_seq()) >> { >> #Blast a sequence against a database: >> #Alternatively, you could pass in a file with many >> #sequences rather than loop through sequence one at a time >> #Remove the loop starting 'while (my $input = $str->next_seq())' >> #and swap the two lines below for an example of that. >> >> open(OUTFILE,'>',$debugfile); >> print OUTFILE $input; >> close(OUTFILE); >> >> >> my $r = $factory->submit_blast($input); #The program stops here it >> does not return any value and it does not enter the While loop,Please help >> me in this regard.# >> open(OUTFILE,'>',$debugfile); >> print OUTFILE $r; >> close(OUTFILE); >> >> >> print STDERR "waiting...." 
if($v>0); >> >> while ( my @rids = $factory->each_rid ) { >> open(OUTFILE,'>',$debugfile); >> print OUTFILE "while entered"; >> close(OUTFILE); >> foreach my $rid ( @rids ) { >> >> open(OUTFILE,'>',$debugfile); >> print OUTFILE "foreach entered"; >> close(OUTFILE); >> >> my $rc = $factory->retrieve_blast($rid); >> >> if( !ref($rc) ) >> { >> if( $rc < 0 ) >> { >> $factory->remove_rid($rid); >> } >> open(OUTFILE,'>',$debugfile); >> print OUTFILE "if entered"; >> close(OUTFILE); >> print STDERR "." if ( $v > 0 ); >> sleep 5; >> } >> else { >> open(OUTFILE,'>',$debugfile); >> print OUTFILE "else entered"; >> close(OUTFILE); >> >> my $result = $rc->next_result(); >> #save the output >> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >> >> open(BLASTDEBUGFILE,'>',$blastdebugfile); >> print BLASTDEBUGFILE $result->next_hit(); >> close(BLASTDEBUGFILE); >> >> my $filename = >> $serverpath."/blastdata_".time().$result->query_name()."\.out"; >> >> # open(DEBUGFILE,'>',$debugfile); >> # open(new,'>',$filename); >> # @arra=; >> # print DEBUGFILE @arra; >> # close(DEBUGFILE); >> # close(new); >> >> $factory->save_output($filename); >> >> # open(BLASTDEBUGFILE,'>',$debugfile); >> # print BLASTDEBUGFILE "Hello $rid"; >> # close(BLASTDEBUGFILE); >> >> $factory->remove_rid($rid); >> >> open(BLASTDEBUGFILE,'>',$blastdebugfile); >> print BLASTDEBUGFILE $organism; >> close(BLASTDEBUGFILE); >> >> # open(OUTFILE,'>',$outfile); >> # print OUTFILE "Test2 $result->database_name()"; >> # close(OUTFILE); >> >> #$hit = $result->next_hit; >> #open(new,'>',$debugfile); >> #print $hit; >> #close(new); >> >> while ( my $hit = $result->next_hit ) { >> >> next unless ( $v > 0); >> >> # open(OUTFILE,'>',$debugfile); >> # print OUTFILE "$hit in while hits"; >> # close(OUTFILE); >> >> my $sequ = $gb->get_Seq_by_version($hit->name); >> my $dna = $sequ->seq(); # get the sequence as a string >> push(@seqs,$dna); >> } >> } >> } >> } >> } >> >> #open(OUTFILE,'>',$debugfile); >> #print OUTFILE 
$seqs[0]; >> #close(OUTFILE); >> >> return(@seqs); >> >> } >> >> open(OUTFILE, '>',$outfile) || die ; >> >> print OUTFILE "\n >> RNAi Result >> \n >> \n >>

>> Inputsequence:
"; >> >> for ($i=0; $i> >> print OUTFILE substr ($in{'Inputseq'}, $i, 1); >> >> if ( ($i+1)%10==0){ >> print OUTFILE " "; >> } >> if ( ($i+1)%60==0){ >> print OUTFILE "
\n"; >> } >> } >> >> >> >> print OUTFILE "

"; >> >> $z=@compseqs; >> >> for($k=1;$k<$z;$k++) { >> print OUTFILE "

Compare >> Sequence:
"; >> >> for ($i=0; $i> >> print OUTFILE substr ($compseqs[$k], $i, 1); >> >> if ( ($i+1)%10==0){ >> print OUTFILE " "; >> } >> if ( ($i+1)%60==0){ >> print OUTFILE "
\n"; >> } >> } >> print OUTFILE "

"; >> } >> >> print OUTFILE "

>> Window:
$in{'Windowsize'} >>

>>

>> Threshold:
$in{'Threshold'} >>

"; >> my $j=0; >> >> for ($i=0; $i> >> if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){ >> if ($out[$i]->{similar}<=$in{'Threshold'}){ >> $j=$in{'Windowsize'}; >> } >> $height=$out[$i]->{similar}*5; >> } >> >> if ($j>0) { >> print OUTFILE "> height=\"5\">"; >> $outstring .= "".substr ($in{'Inputseq'}, $i, >> 1).""; >> $j--; >> } >> else { >> print OUTFILE "> height=\"5\">"; >> $outstring .= "".substr ($in{'Inputseq'}, $i, >> 1).""; >> } >> >> if ( ($i+1)%10==0){ >> $outstring .= " "; >> } >> if ( ($i+1)%60==0){ >> $outstring .= "
\n"; >> >> } >> if ( ($i+1)%800==0){ >> print OUTFILE "

\n"; >> >> } >> } >> >> print OUTFILE "

> set\">$outstring"; >> >> #foreach (@out) { >> #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} matchs

"; >> #if ($_->{similar}<=$in{'Threshold'}){ >> >> # } >> #} >> >> print OUTFILE "\n\n"; >> >> close OUTFILE; >> >> #nameprint(); >> >> sub parse_form { >> local ($buffer, @pairs, $pair, $name, $value); >> # Read in text >> $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/; >> if ($ENV{'REQUEST_METHOD'} eq "POST") >> { >> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); >> } >> else >> { >> $buffer = $ENV{'QUERY_STRING'}; >> } >> @pairs = split(/&/, $buffer); >> foreach $pair (@pairs) >> { >> ($name, $value) = split(/=/, $pair); >> $value =~ tr/+/ /; >> $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; >> $in{$name} = $value; >> } >> } >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > From maj at fortinbras.us Sat Jan 9 13:05:41 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 9 Jan 2010 13:05:41 -0500 Subject: [Bioperl-l] Regarding blast in Bioperl In-Reply-To: References: Message-ID: <4C2E8133F916495B876628EF3E8FCBB2@NewLife> I see it immediately (from making same bug many times) : my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => - '$organ[ORGN]'); +"$organ[ORGN]"); MAJ ----- Original Message ----- From: "Roopa Raghuveer" To: "Mark A. Jensen" Cc: Sent: Saturday, January 09, 2010 11:57 AM Subject: Re: [Bioperl-l] Regarding blast in Bioperl > Hello all, > > Thanks alot for your reply Mark. It was working for Trypanosoma brucei as > the organism parameter,but when I tried to use the Organism parameter from > the user,it was not working i.e., I was unable to get the target sequences. > Please help me in this regard. 
My code is > > #!/usr/bin/perl > > #path for extra camel module > use lib "/srv/www/htdocs/rain/RNAi/"; > use Roopablast; > > > use Bio::SearchIO; > use Bio::Search::Result::BlastResult; > use Bio::Perl; > use Bio::Tools::Run::RemoteBlast; > use Bio::Seq; > use Bio::SeqIO; > use Bio::DB::GenBank; > > $serverpath = "/srv/www/htdocs/rain/RNAi"; > $serverurl = "http://141.84.66.66/rain/RNAi"; > $outfile = $serverpath."/rnairesult_".time().".html"; > $nuc = $serverpath."/nuc".time().".txt"; > $debugfile = $serverpath."/debug_".time().".txt"; > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > my $outstring =""; > > &parse_form; > > print "Content-type: text/html\n\n"; > print "\n"; > print "RNAi Result"; > print " URL=$serverurl/rnairesult_".time().".html\"> \n"; > print "\n"; > print "\n"; > print " Your results will appear href=$serverurl/rnairesult_".time().".html>here
"; > print " Please be patient, runtime can be up to 5 minutes
"; > print " This page will automatically reload in 30 seconds. Roopa"; > print "\n"; > print "\n"; > > defined(my $pid = fork) or die "Can't fork: $!"; > exit if $pid; > open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; > open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; > open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; > > open(OUTFILE, '>',$outfile); > > print OUTFILE "\n > RNAi Result > URL=$serverurl//rnairesult_".time().".html\"> \n > > \n > \n > Your results will appear href=$serverurl/rnairesult_".time().".html>here
> Please be patient, runtime can be up to 5 minutes wait wait wait......
> This page will automatically reload in 30 seconds Roopa
> \n > \n"; > > close(OUTFILE); > > > @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); > > $in{'Inputseq'} =~ s/>.*$//m; > $in{'Inputseq'} =~ s/[^TAGC]//gim; > $in{'Inputseq'} =~ tr/actg/ACTG/; > > @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, > $in{'Threshold'}); > > > sub blastcode > { > > $inpu1= $_[0]; > > $organ= $_[1]; > > open(NUC,'>',$nuc); > print NUC $inpu1,"\n"; > close(NUC); > > my $prog = 'blastn'; > my $db = 'refseq_rna'; > my $e_val= '1e-10'; > my $organism= $organ; > > $gb = new Bio::DB::GenBank; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO', > '-Organism' => $organism ); > > open(OUTFILE,'>',$debugfile); > print OUTFILE $inpu1; > close(OUTFILE); > > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => > '$organ[ORGN]'); > > #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > #change a paramter > > #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma > Brucei[ORGN]'; > > #change a paramter > # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; > > my $v = 1; > #$v is just to turn on and off the messages > > my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , > '-organism' => $organ ); > > > while (my $input = $str->next_seq()) > { > #Blast a sequence against a database: > #Alternatively, you could pass in a file with many > #sequences rather than loop through sequence one at a time > #Remove the loop starting 'while (my $input = $str->next_seq())' > #and swap the two lines below for an example of that. > > #open(OUTFILE,'>',$debugfile); > # print OUTFILE $input; > #close(OUTFILE); > > > my $r = $factory->submit_blast($input); > > open(OUTFILE,'>',$debugfile); > # print OUTFILE $r; > close(OUTFILE); > > print STDERR "waiting...." 
if($v>0); > > while ( my @rids = $factory->each_rid ) { > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "while entered"; > # close(OUTFILE); > foreach my $rid ( @rids ) { > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "foreach entered"; > # close(OUTFILE); > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > open(OUTFILE,'>',$debugfile); > # print OUTFILE "if entered"; > close(OUTFILE); > print STDERR "." if ( $v > 0 ); > sleep 5; > } > else { > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "else entered"; > # close(OUTFILE); > > my $result = $rc->next_result(); > #save the output > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $result->next_hit(); > close(BLASTDEBUGFILE); > > my $filename = > $serverpath."/blastdata_".time().$result->query_name()."\.out"; > > # open(DEBUGFILE,'>',$debugfile); > # open(new,'>',$filename); > # @arra=; > # print DEBUGFILE @arra; > # close(DEBUGFILE); > # close(new); > > $factory->save_output($filename); > # open(BLASTDEBUGFILE,'>',$debugfile); > # print BLASTDEBUGFILE "Hello $rid"; > # close(BLASTDEBUGFILE); > > $factory->remove_rid($rid); > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $organism; > close(BLASTDEBUGFILE); > > # open(OUTFILE,'>',$outfile); > # print OUTFILE "Test2 $result->database_name()"; > # close(OUTFILE); > > #$hit = $result->next_hit; > #open(new,'>',$debugfile); > #print $hit; > #close(new); > > while ( my $hit = $result->next_hit ) { > > next unless ( $v > 0); > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "$hit in while hits"; > # close(OUTFILE); > > my $sequ = $gb->get_Seq_by_version($hit->name); > my $dna = $sequ->seq(); # get the sequence as a string > push(@seqs,$dna); > } > } > } > } > } > > #open(OUTFILE,'>',$debugfile); > #print OUTFILE $seqs[0]; > #close(OUTFILE); > > return(@seqs); > > } > > Regards, > 
Roopa. > > > On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen wrote: > >> Hi Roopa-- >> >> I got your code to work with the following changes: >> >> +# the input should be a valid FASTA file... >> ... >> open(NUC,'>',$nuc); >> +print NUC ">seq (need a name line for valid fasta)\n"; >> print NUC $inpu1, "\n"; >> close(NUC); >> ... >> >> +# you can set these header parms in the call itself... >> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => >> ''Trypanosoma Brucei[ORGN]'); >> >> #change a paramter >> +# commented this out... >> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >> Brucei[ORGN]'; >> >> MAJ >> ----- Original Message ----- From: "Roopa Raghuveer" > > >> To: >> Sent: Friday, January 08, 2010 10:00 AM >> Subject: [Bioperl-l] Regarding blast in Bioperl >> >> >> Hello all, >>> >>> I was trying Remote blast using Bioperl. My input data is a Trypanosoma >>> brucei sequence in Fasta format. When I was trying to submit to BLAST >>> using >>> the step >>> $r=$factory->submit_blast($input) >>> It was not returning anything which I checked by debugging the code. It is >>> not blasting my input sequence even though I mentioned all the >>> parameters.I >>> would paste the code below. >>> >>> Please help me in solving put this problem. It is very urgent. >>> >>> Regards >>> Roopa. 
>>> >>> #!/usr/bin/perl >>> >>> #path for extra camel module >>> use lib "/srv/www/htdocs/rain/RNAi/"; >>> use Roopablast; >>> >>> >>> use Bio::SearchIO; >>> use Bio::Search::Result::BlastResult; >>> use Bio::Perl; >>> use Bio::Tools::Run::RemoteBlast; >>> use Bio::Seq; >>> use Bio::SeqIO; >>> use Bio::DB::GenBank; >>> >>> $serverpath = "/srv/www/htdocs/rain/RNAi"; >>> $serverurl = "http://141.84.66.66/rain/RNAi"; >>> $outfile = $serverpath."/rnairesult_".time().".html"; >>> $nuc = $serverpath."/nuc".time().".txt"; >>> $debugfile = $serverpath."/debug_".time().".txt"; >>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>> >>> my $outstring =""; >>> >>> &parse_form; >>> >>> print "Content-type: text/html\n\n"; >>> print "\n"; >>> print "RNAi Result"; >>> print ">> URL=$serverurl/rnairesult_".time().".html\"> \n"; >>> print "\n"; >>> print "\n"; >>> print " Your results will appear >> href=$serverurl/rnairesult_".time().".html>here
"; >>> print " Please be patient, runtime can be up to 5 minutes
"; >>> print " This page will automatically reload in 30 seconds. Roopa"; >>> print "\n"; >>> print "\n"; >>> >>> defined(my $pid = fork) or die "Can't fork: $!"; >>> exit if $pid; >>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >>> >>> >>> >>> open(OUTFILE, '>',$outfile); >>> >>> print OUTFILE "\n >>> RNAi Result >>> >> URL=$serverurl//rnairesult_".time().".html\"> \n >>> >>> \n >>> \n >>> Your results will appear >> href=$serverurl/rnairesult_".time().".html>here
>>> Please be patient, runtime can be up to 5 minutes wait wait >>> wait......
>>> This page will automatically reload in 30 seconds Roopa
>>> \n >>> \n"; >>> >>> close(OUTFILE); >>> >>> >>> @compseqs = blastcode($in{'Inputseq'}); >>> >>> $in{'Inputseq'} =~ s/>.*$//m; >>> $in{'Inputseq'} =~ s/[^TAGC]//gim; >>> $in{'Inputseq'} =~ tr/actg/ACTG/; >>> >>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >>> $in{'Threshold'}); >>> >>> >>> sub blastcode >>> { >>> >>> $inpu1= $_[0]; >>> >>> #$organ= $_[1]; >>> >>> open(NUC,'>',$nuc); >>> print NUC $inpu1; >>> close(NUC); >>> >>> my $prog = 'blastn'; >>> my $db = 'refseq_rna'; >>> my $e_val= '1e-10'; >>> my $organism= 'Trypanosoma Brucei'; >>> >>> $gb = new Bio::DB::GenBank; >>> >>> my @params = ( '-prog' => $prog, >>> '-data' => $db, >>> '-expect' => $e_val, >>> '-readmethod' => 'SearchIO', >>> '-Organism' => $organism ); >>> >>> # open(OUTFILE,'>',$debugfile); >>> # print OUTFILE @params; >>> # close(OUTFILE); >>> >>> >>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>> >>> #change a paramter >>> >>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>> Brucei[ORGN]'; >>> >>> #change a paramter >>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; >>> >>> my $v = 1; >>> #$v is just to turn on and off the messages >>> >>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >>> '-organism' => 'Trypanosoma Brucei' ); >>> >>> >>> while (my $input = $str->next_seq()) >>> { >>> #Blast a sequence against a database: >>> #Alternatively, you could pass in a file with many >>> #sequences rather than loop through sequence one at a time >>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>> #and swap the two lines below for an example of that. 
>>> >>> open(OUTFILE,'>',$debugfile); >>> print OUTFILE $input; >>> close(OUTFILE); >>> >>> >>> my $r = $factory->submit_blast($input); #The program stops here it >>> does not return any value and it does not enter the While loop,Please help >>> me in this regard.# >>> open(OUTFILE,'>',$debugfile); >>> print OUTFILE $r; >>> close(OUTFILE); >>> >>> >>> print STDERR "waiting...." if($v>0); >>> >>> while ( my @rids = $factory->each_rid ) { >>> open(OUTFILE,'>',$debugfile); >>> print OUTFILE "while entered"; >>> close(OUTFILE); >>> foreach my $rid ( @rids ) { >>> >>> open(OUTFILE,'>',$debugfile); >>> print OUTFILE "foreach entered"; >>> close(OUTFILE); >>> >>> my $rc = $factory->retrieve_blast($rid); >>> >>> if( !ref($rc) ) >>> { >>> if( $rc < 0 ) >>> { >>> $factory->remove_rid($rid); >>> } >>> open(OUTFILE,'>',$debugfile); >>> print OUTFILE "if entered"; >>> close(OUTFILE); >>> print STDERR "." if ( $v > 0 ); >>> sleep 5; >>> } >>> else { >>> open(OUTFILE,'>',$debugfile); >>> print OUTFILE "else entered"; >>> close(OUTFILE); >>> >>> my $result = $rc->next_result(); >>> #save the output >>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>> >>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>> print BLASTDEBUGFILE $result->next_hit(); >>> close(BLASTDEBUGFILE); >>> >>> my $filename = >>> $serverpath."/blastdata_".time().$result->query_name()."\.out"; >>> >>> # open(DEBUGFILE,'>',$debugfile); >>> # open(new,'>',$filename); >>> # @arra=; >>> # print DEBUGFILE @arra; >>> # close(DEBUGFILE); >>> # close(new); >>> >>> $factory->save_output($filename); >>> >>> # open(BLASTDEBUGFILE,'>',$debugfile); >>> # print BLASTDEBUGFILE "Hello $rid"; >>> # close(BLASTDEBUGFILE); >>> >>> $factory->remove_rid($rid); >>> >>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>> print BLASTDEBUGFILE $organism; >>> close(BLASTDEBUGFILE); >>> >>> # open(OUTFILE,'>',$outfile); >>> # print OUTFILE "Test2 $result->database_name()"; >>> # close(OUTFILE); >>> >>> #$hit = $result->next_hit; >>> 
#open(new,'>',$debugfile); >>> #print $hit; >>> #close(new); >>> >>> while ( my $hit = $result->next_hit ) { >>> >>> next unless ( $v > 0); >>> >>> # open(OUTFILE,'>',$debugfile); >>> # print OUTFILE "$hit in while hits"; >>> # close(OUTFILE); >>> >>> my $sequ = $gb->get_Seq_by_version($hit->name); >>> my $dna = $sequ->seq(); # get the sequence as a string >>> push(@seqs,$dna); >>> } >>> } >>> } >>> } >>> } >>> >>> #open(OUTFILE,'>',$debugfile); >>> #print OUTFILE $seqs[0]; >>> #close(OUTFILE); >>> >>> return(@seqs); >>> >>> } >>> >>> open(OUTFILE, '>',$outfile) || die ; >>> >>> print OUTFILE "\n >>> RNAi Result >>> \n >>> \n >>>

>>> Inputsequence:
"; >>> >>> for ($i=0; $i>> >>> print OUTFILE substr ($in{'Inputseq'}, $i, 1); >>> >>> if ( ($i+1)%10==0){ >>> print OUTFILE " "; >>> } >>> if ( ($i+1)%60==0){ >>> print OUTFILE "
\n"; >>> } >>> } >>> >>> >>> >>> print OUTFILE "

"; >>> >>> $z=@compseqs; >>> >>> for($k=1;$k<$z;$k++) { >>> print OUTFILE "

Compare >>> Sequence:
"; >>> >>> for ($i=0; $i>> >>> print OUTFILE substr ($compseqs[$k], $i, 1); >>> >>> if ( ($i+1)%10==0){ >>> print OUTFILE " "; >>> } >>> if ( ($i+1)%60==0){ >>> print OUTFILE "
\n"; >>> } >>> } >>> print OUTFILE "

"; >>> } >>> >>> print OUTFILE "

>>> Window:
$in{'Windowsize'} >>>

>>>

>>> Threshold:
$in{'Threshold'} >>>

"; >>> my $j=0; >>> >>> for ($i=0; $i>> >>> if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){ >>> if ($out[$i]->{similar}<=$in{'Threshold'}){ >>> $j=$in{'Windowsize'}; >>> } >>> $height=$out[$i]->{similar}*5; >>> } >>> >>> if ($j>0) { >>> print OUTFILE ">> height=\"5\">"; >>> $outstring .= "".substr ($in{'Inputseq'}, $i, >>> 1).""; >>> $j--; >>> } >>> else { >>> print OUTFILE ">> height=\"5\">"; >>> $outstring .= "".substr ($in{'Inputseq'}, $i, >>> 1).""; >>> } >>> >>> if ( ($i+1)%10==0){ >>> $outstring .= " "; >>> } >>> if ( ($i+1)%60==0){ >>> $outstring .= "
\n"; >>> >>> } >>> if ( ($i+1)%800==0){ >>> print OUTFILE "

\n"; >>> >>> } >>> } >>> >>> print OUTFILE "

>> set\">$outstring"; >>> >>> #foreach (@out) { >>> #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} matchs

"; >>> #if ($_->{similar}<=$in{'Threshold'}){ >>> >>> # } >>> #} >>> >>> print OUTFILE "\n\n"; >>> >>> close OUTFILE; >>> >>> #nameprint(); >>> >>> sub parse_form { >>> local ($buffer, @pairs, $pair, $name, $value); >>> # Read in text >>> $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/; >>> if ($ENV{'REQUEST_METHOD'} eq "POST") >>> { >>> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); >>> } >>> else >>> { >>> $buffer = $ENV{'QUERY_STRING'}; >>> } >>> @pairs = split(/&/, $buffer); >>> foreach $pair (@pairs) >>> { >>> ($name, $value) = split(/=/, $pair); >>> $value =~ tr/+/ /; >>> $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; >>> $in{$name} = $value; >>> } >>> } >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From robert.bradbury at gmail.com Sat Jan 9 14:52:53 2010 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Sat, 9 Jan 2010 14:52:53 -0500 Subject: [Bioperl-l] Regarding blast in Bioperl In-Reply-To: References: Message-ID: Roopa, Mark is correct: you have to be very careful about single vs. double quotes in Perl. As I currently understand it, double-quoted strings are interpolated while single-quoted strings are taken literally. I tried to run your script (with fixes), but without the supporting files it appears to be impossible. What I am curious about is what it is trying to do. I was particularly intrigued by some apparent efforts to parse blast results into color-enhanced HTML, and rather than work through the code in detail it seems easier to simply ask: what are you trying to do?
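To make the quoting point concrete, here is a minimal sketch; the value of $organ is only an illustrative assumption standing in for the form field. One further wrinkle worth knowing: inside double quotes Perl reads "$organ[ORGN]" as an element of the array @organ, not as the scalar $organ followed by the text [ORGN], so the bracket has to be escaped or the tag appended by concatenation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical value standing in for $in{'Organism'} from the form.
my $organ = 'Trypanosoma brucei';

# Single quotes: no interpolation at all, so the literal
# characters $organ[ORGN] would be sent as the Entrez query.
my $literal = '$organ[ORGN]';

# Double quotes with the bracket escaped: $organ is interpolated
# and [ORGN] stays plain text. (An unescaped "$organ[ORGN]" would be
# parsed as an element of @organ and does not compile under strict.)
my $escaped = "$organ\[ORGN]";

# Concatenation sidesteps the ambiguity entirely.
my $joined = $organ . '[ORGN]';

print "$literal\n";
print "$escaped\n";
print "$joined\n";
```

Either $escaped or $joined is the kind of string that could then be passed as the -ENTREZ_QUERY value in the RemoteBlast constructor call discussed earlier in this thread.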
I find "classical" blast results particularly tedious, and I long for blast results that display concise information as the NCBI HomoloGene cross-species comparisons do. Unfortunately NCBI has deemed their methods (I have asked them) "too complex to disclose" (for a person comfortable in dealing with assembly language, or even gate level electronics, "too complex" is a very relative concept). One has the option of using NCBI with a limited number of species but good display methodologies, or Ensembl with many more species but less desirable display methodologies (phylogenetic tree derived from cross-species comparisons). And for the WRN protein, which may play a key role in aging (through the activity of its exonuclease domain mutating DNA sequences and inducing microdeletions and microinsertions), this gets important because it appears that the *C. elegans* genome is missing the exonuclease domain (so it may be useless from the perspective of studying aging), and the other 4 nematode species which have been sequenced aren't even in the NCBI or the Ensembl databases. Needless to say, if we manage in the near future, given the drop in sequencing costs, to sequence the nematodes which are freeze/thaw tolerant (freezing and thawing induces DSBs that have to be repaired), those genomes will be unlikely to be in the NCBI/Ensembl databases either. So there is a requirement for the user to develop the ability to mix and match public and obscure databases in creative ways to provide easy-to-interpret information. Robert Bradbury From robert.bradbury at gmail.com Sat Jan 9 15:27:54 2010 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Sat, 9 Jan 2010 15:27:54 -0500 Subject: [Bioperl-l] Ensembl problems Message-ID: I am trying to get the examples provided by EMBL/Ensembl to work and am encountering problems. For example, about 1/3 of the way through the Compara API tutorial [1] there is what is supposed to be a completely functional script. It does not work.
This is in contrast to some of the earlier simple scripts (listing the species in Ensembl etc.), which do work on my machine, so I have all the libraries and so on installed correctly. Very poor form to document scripts which do not function on a properly set up system. I have modified my invocation of the script slightly: Align.pl --set_of_species \ "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis familiaris:Sus scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae" which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" on an undefined value at ./Align.pl line 132." (Align.pl is my slightly modified example of the Compara Tutorial code.) As these are slightly modified perl scripts from the documentation, the line numbers may be variable. I can print out the genome_dbs, and it gives me a list of genome names (hash tables), though it appears that is problematic in the Align.pl script, in spite of the fact that just before that call I dumped "genome_dbs" and got back some 25 hash tables (expected). I believe this occurs whether one is comparing "human:mouse" or the more complex species set I have outlined above. Has anyone else attempted to run the code documented in the Ensembl API Tutorial? Any suggestions as to what direction to go in would be appreciated (when one is trying to copy code out of a tutorial and it fails, it's kind of hard to know where to go). There do appear to be some problems in the specifications of a Compara version/database and there don't appear to be a lot of resources informing one of what resources are currently available. Robert 1.
http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html From ak at ebi.ac.uk Sat Jan 9 17:01:21 2010 From: ak at ebi.ac.uk (Andreas Kähäri) Date: Sat, 9 Jan 2010 22:01:21 +0000 Subject: [Bioperl-l] Ensembl problems In-Reply-To: References: Message-ID: <20100109220121.GA9521@quux.windows.ebi.ac.uk> On Sat, Jan 09, 2010 at 03:27:54PM -0500, Robert Bradbury wrote: > I am trying to get the examples provided by EMBL/Ensembl to work and am > encountering problems. Hi Robert, The ensembl-dev list is the appropriate forum for this type of question as it has nothing to do with bioperl. There is also the Ensembl helpdesk. If you send your problem to the helpdesk, I'm sure that it will be picked up by the appropriate people (I myself do not know enough about the Compara API to be able to diagnose this problem straight away, I'm afraid). Be sure to submit a minimal script that still exhibits the problem, and information about what version of the APIs you're using (we will assume that you're not mixing a newer version of the API with older databases or vice versa). We are generally very happy to have bugs in documentation or code pointed out to us, and will correct errors as we are made aware of them. Kind regards, Andreas > For example, about 1/3 of the way through the Compara API tutorial [1] there > is what is supposed to be a completely functional script. It does not > work. This is in contrast to some of the earlier simple scripts (listing > the species in Ensmbl etc.) which do work on my machine, so I have all the > libraries do dah installed correctly). > > Very poor form to document scripts which do not function on a properly setup > system.
> > I have modified my invocation of the script slightly: > Align.pl --set_of_species \ > "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur > garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta > africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis familiaris:Sus > scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus > tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia > belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae" > > which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" on > an undefined value at ./Align.pl line 132." (Align.pl is my slightly > modified example of the Compara Tutoraial code.) > As these are slightly modified perl scripts from the documantation, the line > numbers may be variable. > > I can print out the genome_dbs, and it gives me a list of genome names (hash > tables) though it appears that is problematic in the Align.pl script. > in spite of the fact that just previously to that call I dumped "genome_dbs" > and got back some 25 hash tables (expected). I believe this occurs whether > one is comparing "human:mouse" or the more complex species set I have > outlined above. > > > > Has anyone else attempted to run the code documented in the Ensembl API > Tutorial? > Any suggestions as to what direction to go in would be appreciated -- when > one is trying to copy code out of a tutorial and it fails its kind of hard > to know where to go.) > > There do appear to be some problems in the specifications of a Compara > version/database and there don't appear to be a lot of resources informing > one of what resources are currently available. > > Robert > > > 1. 
http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Andreas Kähäri, Ensembl Software Developer European Bioinformatics Institute (EMBL-EBI) Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, United Kingdom From cjfields at illinois.edu Sat Jan 9 17:01:19 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 9 Jan 2010 16:01:19 -0600 Subject: [Bioperl-l] Ensembl problems In-Reply-To: References: Message-ID: <743C998D-BBB5-4832-BA25-24D7D7288F78@illinois.edu> Robert, Ensembl errors probably should be redirected to the ensembl mail list. I can't speak to the problems with it (they appear specific to the Ensembl tool set). chris On Jan 9, 2010, at 2:27 PM, Robert Bradbury wrote: > I am trying to get the examples provided by EMBL/Ensembl to work and am > encountering problems. > > For example, about 1/3 of the way through the Compara API tutorial [1] there > is what is supposed to be a completely functional script. It does not > work. This is in contrast to some of the earlier simple scripts (listing > the species in Ensmbl etc.) which do work on my machine, so I have all the > libraries do dah installed correctly). > > Very poor form to document scripts which do not function on a properly setup > system.
> > I have modified my invocation of the script slightly: > Align.pl --set_of_species \ > "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur > garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta > africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis familiaris:Sus > scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus > tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia > belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae" > > which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" on > an undefined value at ./Align.pl line 132." (Align.pl is my slightly > modified example of the Compara Tutoraial code.) > As these are slightly modified perl scripts from the documantation, the line > numbers may be variable. > > I can print out the genome_dbs, and it gives me a list of genome names (hash > tables) though it appears that is problematic in the Align.pl script. > in spite of the fact that just previously to that call I dumped "genome_dbs" > and got back some 25 hash tables (expected). I believe this occurs whether > one is comparing "human:mouse" or the more complex species set I have > outlined above. > > > > Has anyone else attempted to run the code documented in the Ensembl API > Tutorial? > Any suggestions as to what direction to go in would be appreciated -- when > one is trying to copy code out of a tutorial and it fails its kind of hard > to know where to go.) > > There do appear to be some problems in the specifications of a Compara > version/database and there don't appear to be a lot of resources informing > one of what resources are currently available. > > Robert > > > 1. 
http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From robert.bradbury at gmail.com Sun Jan 10 14:47:00 2010 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Sun, 10 Jan 2010 14:47:00 -0500 Subject: [Bioperl-l] Ensembl problems In-Reply-To: <20100109220121.GA9521@quux.windows.ebi.ac.uk> References: <20100109220121.GA9521@quux.windows.ebi.ac.uk> Message-ID: As it turns out, the example from the file I cited (the Compara API tutorial) does work. The code that I started with may have been from an "MS-WORD" document distributed with the documentation (which could quite well be out of date). But even the corrected code does not work for various uncommon comparisons between species (which they may not have archived in Ensembl). I also don't yet understand enough about the functions to tell whether they are comparing the same regions from the same chromosomes that just happen to be identical, or whether they are comparing the same region with a homologous region on a different chromosome (i.e. conserved genes). I'm going to have to dig into this some more to figure out what is going on. Thanks for the pointers; I'll refer future questions to the Ensembl list/helpdesk. However, if anyone knows Ensembl very well, the database has in it some of these interspecies comparisons already. They are accessed when one does a phylogeny tree for specific genes (and generally for highly conserved genes you will get a tree that includes nearly all 50 species in the database). As I don't think they are computed on the fly, the information must be precomputed and stored someplace in the database. I would very much like to know how to access this information.
Thanks, Robert On 1/9/10, Andreas Kähäri wrote: > On Sat, Jan 09, 2010 at 03:27:54PM -0500, Robert Bradbury wrote: >> I am trying to get the examples provided by EMBL/Ensembl to work and am >> encountering problems. > > Hi Robert, > > The ensembl-dev list is the appropriate forum for this type of questions > as it has nothing to do with bioperl. > > There is also the Ensembl helpdesk. If you send your problem to > I'm sure that it will be picked up by the > appropriate people (I do myself not know enough about the Compara API to > be able to diagnose this problem straight away I'm afraid). > > Be sure to submit a minimal script that still exhibit the problem, and > information about what version of the APIs you're using (we will assume > that you're not mixing newer version of the API with older databases or > vice versa). > > We are generally very happy to have bugs in documentation or code > pointed out to us, and will correct errors as we are made aware of them. > > > Kind regards, > Andreas > >> For example, about 1/3 of the way through the Compara API tutorial [1] >> there >> is what is supposed to be a completely functional script. It does not >> work. This is in contrast to some of the earlier simple scripts (listing >> the species in Ensmbl etc.) which do work on my machine, so I have all >> the >> libraries do dah installed correctly). >> >> Very poor form to document scripts which do not function on a properly >> setup >> system.
>> >> I have modified my invocation of the script slightly: >> Align.pl --set_of_species \ >> "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur >> garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta >> africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis >> familiaris:Sus >> scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus >> tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia >> belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae" >> >> which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" >> on >> an undefined value at ./Align.pl line 132." (Align.pl is my slightly >> modified example of the Compara Tutoraial code.) >> As these are slightly modified perl scripts from the documantation, the >> line >> numbers may be variable. >> >> I can print out the genome_dbs, and it gives me a list of genome names >> (hash >> tables) though it appears that is problematic in the Align.pl script. >> in spite of the fact that just previously to that call I dumped >> "genome_dbs" >> and got back some 25 hash tables (expected). I believe this occurs >> whether >> one is comparing "human:mouse" or the more complex species set I have >> outlined above. >> >> >> >> Has anyone else attempted to run the code documented in the Ensembl API >> Tutorial? >> Any suggestions as to what direction to go in would be appreciated -- when >> one is trying to copy code out of a tutorial and it fails its kind of hard >> to know where to go.) >> >> There do appear to be some problems in the specifications of a Compara >> version/database and there don't appear to be a lot of resources informing >> one of what resources are currently available. >> >> Robert >> >> >> 1. 
http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > Andreas Kähäri, Ensembl Software Developer > European Bioinformatics Institute (EMBL-EBI) > Wellcome Trust Genome Campus, Hinxton > Cambridge CB10 1SD, United Kingdom > From Russell.Smithies at agresearch.co.nz Sun Jan 10 15:34:39 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Mon, 11 Jan 2010 09:34:39 +1300 Subject: [Bioperl-l] how to retrieve organism name from accession number? In-Reply-To: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> An alternative non-BioPerly way (that may be faster given NCBI's flakiness lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and do lookups. In that same dir, taxdump.tar.gz contains a file called names.dmp which lists taxids and descriptions (and synonyms). If it were me, I'd split gi_taxid_nucl and names.dmp into hashes so I could do this: my $taxid = $gi_taxid_nucl{$accession}; my $org_name = $names{$taxid}; --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > Sent: Saturday, 26 December 2009 4:52 p.m. > To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > number?
> > Bhakti, > The following example (using EUtilities) may serve your purpose: > > use Bio::DB::EUtilities; > > my (%taxa, @taxa); > my (%names, %idmap); > > # these are protein ids; nuc ids will work by changing -dbfrom => > 'nucleotide', > # (probably) > > my @ids = qw(1621261 89318838 68536103 20807972 730439); > > my $factory = Bio::DB::EUtilities->new(-eutil => 'elink', > -db => 'taxonomy', > -dbfrom => 'protein', > -correspondence => 1, > -id => \@ids); > > # iterate through the LinkSet objects > while (my $ds = $factory->next_LinkSet) { > $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0] > } > > @taxa = @taxa{@ids}; > > $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', > -db => 'taxonomy', > -id => \@taxa ); > > while (local $_ = $factory->next_DocSum) { > $names{($_->get_contents_by_name('TaxId'))[0]} = > ($_->get_contents_by_name('ScientificName'))[0]; > } > > foreach (@ids) { > $idmap{$_} = $names{$taxa{$_}}; > } > > # %idmap is > # 1621261 => 'Mycobacterium tuberculosis H37Rv' > # 20807972 => 'Thermoanaerobacter tengcongensis MB4' > # 68536103 => 'Corynebacterium jeikeium K411' > # 730439 => 'Bacillus caldolyticus' > # 89318838 => undef (this record has been removed from the db) > > 1; > > You probably will need to break up your 30000 into chunks > (say, 1000-3000 each), and do the above on each chunk with a > > sleep 3; > > or so separating the queries. > MAJ > ----- Original Message ----- > From: "Bhakti Dwivedi" > To: > Sent: Friday, December 25, 2009 9:46 PM > Subject: [Bioperl-l] how to retrieve organism name from accession number? > > > > Hi, > > > > Does anyone know how to retrieve the "Source" or the "Species name" > given > > the accession number using Bioperl. I have these 30,000 accession > numbers > > for which I need to get the source organisms. Any kind of help will be > > appreciated. 
> > > > Thanks > > > > BD > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at illinois.edu Sun Jan 10 15:49:40 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 10 Jan 2010 14:49:40 -0600 Subject: [Bioperl-l] how to retrieve organism name from accession number? In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> Message-ID: One could also use Bio::DB::Taxonomy, which indexes the same files or (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the details). chris On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote: > An alternate non-BioPerly way (that may be faster given NCBI's flakiness lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and do lookups. 
> In that same dir, taxdump.tar.gz contains a file called names.dmp which lists taxids and descriptions (and synonyms) > > If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I could do this: > > my $taxid = $gi_taxid_nucl{$accession}; > my $org_name = $names{$taxid}; > > --Russell > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen >> Sent: Saturday, 26 December 2009 4:52 p.m. >> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession >> number? >> >> Bhakti, >> The following example (using EUtilities) may serve your purpose: >> >> use Bio::DB::EUtilities; >> >> my (%taxa, @taxa); >> my (%names, %idmap); >> >> # these are protein ids; nuc ids will work by changing -dbfrom => >> 'nucleotide', >> # (probably) >> >> my @ids = qw(1621261 89318838 68536103 20807972 730439); >> >> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink', >> -db => 'taxonomy', >> -dbfrom => 'protein', >> -correspondence => 1, >> -id => \@ids); >> >> # iterate through the LinkSet objects >> while (my $ds = $factory->next_LinkSet) { >> $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0] >> } >> >> @taxa = @taxa{@ids}; >> >> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', >> -db => 'taxonomy', >> -id => \@taxa ); >> >> while (local $_ = $factory->next_DocSum) { >> $names{($_->get_contents_by_name('TaxId'))[0]} = >> ($_->get_contents_by_name('ScientificName'))[0]; >> } >> >> foreach (@ids) { >> $idmap{$_} = $names{$taxa{$_}}; >> } >> >> # %idmap is >> # 1621261 => 'Mycobacterium tuberculosis H37Rv' >> # 20807972 => 'Thermoanaerobacter tengcongensis MB4' >> # 68536103 => 'Corynebacterium jeikeium K411' >> # 730439 => 'Bacillus caldolyticus' >> # 89318838 => undef (this record has been removed from the db) >> >> 1; >> >> You probably will need to break up your 30000 into chunks >> (say, 
1000-3000 each), and do the above on each chunk with a >> >> sleep 3; >> >> or so separating the queries. >> MAJ >> ----- Original Message ----- >> From: "Bhakti Dwivedi" >> To: >> Sent: Friday, December 25, 2009 9:46 PM >> Subject: [Bioperl-l] how to retrieve organism name from accession number? >> >> >>> Hi, >>> >>> Does anyone know how to retrieve the "Source" or the "Species name" >> given >>> the accession number using Bioperl. I have these 30,000 accession >> numbers >>> for which I need to get the source organisms. Any kind of help will be >>> appreciated. >>> >>> Thanks >>> >>> BD >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Sun Jan 10 16:05:06 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Mon, 11 Jan 2010 10:05:06 +1300 Subject: [Bioperl-l] how to retrieve organism name from accession number? In-Reply-To: References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> I've started to go off eUtils recently (not BioPerl's fault) as I've often been finding that with large queries, chunks of the resulting data are missing. For example, before Xmas I was creating species-specific databases by using eUtils to get a list of GI numbers back for a taxid, then retrieving the fasta sequences in chunks of 500. Very regularly, in the middle of the fasta there would be a message about the resource being unavailable, e.g.: >test_sequence_1 TACGATCATCGCTResource UnavailableTACGACTCTGCT >test_sequence_2 TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT Often this wasn't detected until formatdb complained about invalid characters. Inquiries to NCBI as to why this was happening and what to do about it returned stupid answers ("do each sequence manually through the web interface", or "use eUtils"). As we have a nice fast network connection, I now prefer to download very large gzip files (e.g. all of RefSeq) and extract what I need. I can't help but think that NCBI could solve a lot of problems if they gzipped the output from eUtils queries - it's something I've requested regularly for the last 5 years or so!! --Russell > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Monday, 11 January 2010 9:50 a.m.
> To: Smithies, Russell > Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org' > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > number? > > One could also use Bio::DB::Taxonomy, which indexes the same files or > (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the > details). > > chris > > On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote: > > > An alternate non-BioPerly way (that may be faster given NCBI's flakiness > lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip > files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and > do lookups. > > In that same dir, taxdump.tar.gz contains a file called names.dmp which > lists taxids and descriptions (and synonyms) > > > > If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I > could do this: > > > > my $taxid = $gi_taxid_nucl{$accession}; > > my $org_name = $names{$taxid}; > > > > --Russell > > > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > >> Sent: Saturday, 26 December 2009 4:52 p.m. > >> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org > >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession > >> number? 
> >> > >> Bhakti, > >> The following example (using EUtilities) may serve your purpose: > >> > >> use Bio::DB::EUtilities; > >> > >> my (%taxa, @taxa); > >> my (%names, %idmap); > >> > >> # these are protein ids; nuc ids will work by changing -dbfrom => > >> 'nucleotide', > >> # (probably) > >> > >> my @ids = qw(1621261 89318838 68536103 20807972 730439); > >> > >> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink', > >> -db => 'taxonomy', > >> -dbfrom => 'protein', > >> -correspondence => 1, > >> -id => \@ids); > >> > >> # iterate through the LinkSet objects > >> while (my $ds = $factory->next_LinkSet) { > >> $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0] > >> } > >> > >> @taxa = @taxa{@ids}; > >> > >> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', > >> -db => 'taxonomy', > >> -id => \@taxa ); > >> > >> while (local $_ = $factory->next_DocSum) { > >> $names{($_->get_contents_by_name('TaxId'))[0]} = > >> ($_->get_contents_by_name('ScientificName'))[0]; > >> } > >> > >> foreach (@ids) { > >> $idmap{$_} = $names{$taxa{$_}}; > >> } > >> > >> # %idmap is > >> # 1621261 => 'Mycobacterium tuberculosis H37Rv' > >> # 20807972 => 'Thermoanaerobacter tengcongensis MB4' > >> # 68536103 => 'Corynebacterium jeikeium K411' > >> # 730439 => 'Bacillus caldolyticus' > >> # 89318838 => undef (this record has been removed from the db) > >> > >> 1; > >> > >> You probably will need to break up your 30000 into chunks > >> (say, 1000-3000 each), and do the above on each chunk with a > >> > >> sleep 3; > >> > >> or so separating the queries. > >> MAJ > >> ----- Original Message ----- > >> From: "Bhakti Dwivedi" > >> To: > >> Sent: Friday, December 25, 2009 9:46 PM > >> Subject: [Bioperl-l] how to retrieve organism name from accession > number? > >> > >> > >>> Hi, > >>> > >>> Does anyone know how to retrieve the "Source" or the "Species name" > >> given > >>> the accession number using Bioperl. 
I have these 30,000 accession > >> numbers > >>> for which I need to get the source organisms. Any kind of help will > be > >>> appreciated. > >>> > >>> Thanks > >>> > >>> BD > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > ======================================================================= > > Attention: The information contained in this message and/or attachments > > from AgResearch Limited is intended only for the persons or entities > > to which it is addressed and may contain confidential and/or privileged > > material. Any review, retransmission, dissemination or other use of, or > > taking of any action in reliance upon, this information by persons or > > entities other than the intended recipients is prohibited by AgResearch > > Limited. If you have received this message in error, please notify the > > sender immediately. > > ======================================================================= > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l From avilella at gmail.com Sun Jan 10 16:05:13 2010 From: avilella at gmail.com (Albert Vilella) Date: Sun, 10 Jan 2010 21:05:13 +0000 Subject: [Bioperl-l] Ensembl problems In-Reply-To: References: <20100109220121.GA9521@quux.windows.ebi.ac.uk> Message-ID: <358f4d651001101305q1b75cfe3q558a245ab1ab1238@mail.gmail.com> > However, if anyone knows Ensembl very well, the database has in it > some of these interspecies comparisons already. 
They are accessed > when one does a phylogeny tree for specific genes (and generally for > highly conserved gene you will get a tree that includes nearly all 50 > species in the database). As I don't think they are computed > on-the-fly, the information must be precomputed and stored someplace > in the database. I would very much like to know how to access this > information. Yes, they are. You can access the data programmatically by installing the ensembl and ensembl-compara Perl APIs. There are a few example scripts for the GeneTrees: ensembl-compara/scripts/examples/homology*.pl Cheers, Albert. > Thanks, > Robert > > > > > On 1/9/10, Andreas Kähäri wrote: >> On Sat, Jan 09, 2010 at 03:27:54PM -0500, Robert Bradbury wrote: >>> I am trying to get the examples provided by EMBL/Ensembl to work and am >>> encountering problems. >> >> Hi Robert, >> >> The ensembl-dev list is the appropriate forum for this type of questions >> as it has nothing to do with bioperl. >> >> There is also the Ensembl helpdesk. If you send your problem to >> I'm sure that it will be picked up by the >> appropriate people (I do myself not know enough about the Compara API to >> be able to diagnose this problem straight away I'm afraid). >> >> Be sure to submit a minimal script that still exhibit the problem, and >> information about what version of the APIs you're using (we will assume >> that you're not mixing newer version of the API with older databases or >> vice versa). >> >> We are generally very happy to have bugs in documentation or code >> pointed out to us, and will correct errors as we are made aware of them. >> >> >> Kind regards, >> Andreas >> >>> For example, about 1/3 of the way through the Compara API tutorial [1] >>> there >>> is what is supposed to be a completely functional script. It does not >>> work. This is in contrast to some of the earlier simple scripts (listing >>> the species in Ensmbl etc.)
which do work on my machine, so I have all >>> the >>> libraries do dah installed correctly). >>> >>> Very poor form to document scripts which do not function on a properly >>> setup >>> system. >>> >>> I have modified my invocation of the script slightly: >>> Align.pl --set_of_species \ >>> "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur >>> garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta >>> africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis >>> familiaris:Sus >>> scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus >>> tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia >>> belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae" >>> >>> which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" >>> on >>> an undefined value at ./Align.pl line 132." (Align.pl is my slightly >>> modified example of the Compara Tutoraial code.) >>> As these are slightly modified perl scripts from the documantation, the >>> line >>> numbers may be variable. >>> >>> I can print out the genome_dbs, and it gives me a list of genome names >>> (hash >>> tables) though it appears that is problematic in the Align.pl script. >>> in spite of the fact that just previously to that call I dumped >>> "genome_dbs" >>> and got back some 25 hash tables (expected). I believe this occurs >>> whether >>> one is comparing "human:mouse" or the more complex species set I have >>> outlined above. >>> >>> >>> >>> Has anyone else attempted to run the code documented in the Ensembl API >>> Tutorial? >>> Any suggestions as to what direction to go in would be appreciated -- when >>> one is trying to copy code out of a tutorial and it fails its kind of hard >>> to know where to go.) >>> >>> There do appear to be some problems in the specifications of a Compara >>> version/database and there don't appear to be a lot of resources informing >>> one of what resources are currently available. >>> >>> Robert >>> >>> >>> 1.
http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> -- >> Andreas Kähäri, Ensembl Software Developer >> European Bioinformatics Institute (EMBL-EBI) >> Wellcome Trust Genome Campus, Hinxton >> Cambridge CB10 1SD, United Kingdom >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From alessandra.bilardi at gmail.com Sun Jan 10 18:21:12 2010 From: alessandra.bilardi at gmail.com (Alessandra) Date: Mon, 11 Jan 2010 00:21:12 +0100 Subject: [Bioperl-l] GBrowse.org project In-Reply-To: References: Message-ID: Hi all, I'm Alessandra and I run GBrowse.org. GBrowse.org is a resource for using and setting up GBrowse genome browsers. The site provides one location where biologists and bioinformaticians can find: 1. Genome browser web sites for any organism that has them. If a species has a genome browser anywhere on the web, then we aim to link to it. 2. Links to sequence and annotation files that are available online. 3. Links to genome browser configuration files, when available. 4. An FTP site containing genome annotation and configuration files for each annotated genome that does not have its own web site. GBrowse.org emphasizes the GBrowse genome browser in its organization, but also links to sites that use other browser packages such as UCSC, Ensembl, and JBrowse. Also, we are currently conducting a survey seeking input on future project direction. Please take a few minutes now to provide your feedback. Survey link: http://gbrowse.org/survey/index.php?sid=64264&lang=en GBrowse.org introduction link: http://gmod.org/wiki/August_2009_GMOD_Meeting#GBrowse.org Thank you for your help, Alessandra Bilardi.
http://gbrowse.org/
CRIBI Genomics, University of Padua
http://genomics.cribi.unipd.it/

From cjfields at illinois.edu Sun Jan 10 22:04:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 10 Jan 2010 21:04:13 -0600
Subject: [Bioperl-l] GMOD BioPerl Meeting
Message-ID: <7D72ECC2-E856-4C09-B67A-62AFFB59B377@illinois.edu>

Just a quick reminder that we're having a BioPerl satellite meeting after the PAG Conference (just prior to the GMOD Meeting). The meeting is this Wednesday, Jan. 13, starting at 11:30am, at the Best Western Seven Seas in San Diego.

I will update the relevant BioPerl and GMOD pages with more details as they become available. At the moment, we will be meeting in the hotel lobby prior to starting the meeting and possible hackathon.

http://www.bioperl.org/wiki/GMOD_2010_Meeting
http://gmod.org/wiki/January_2010_GMOD_Meeting#Satellite_Meetings

Thanks!

chris

From bernd.jagla at pasteur.fr Mon Jan 11 05:11:16 2010
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Mon, 11 Jan 2010 11:11:16 +0100
Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java
Message-ID: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>

Hi,

First off, I am not sure if this is supposed to be addressed to the Bioperl or Gbrowse mailing list, so apologies if this is the wrong list and please let me know.

I am writing a program in Java that needs to access genome annotation data. Since I am using Gbrowse already I was thinking that I could combine both approaches making life eventually easier for me. I am mainly interested in getting a gene/feature name for a given position. The position is stored in the feature table and through linking typelist, locationlist, (maybe sequence), and feature I can get all the information I need. Unfortunately it seems that the feature name is stored in the object blob of the feature table.
That is a bit suspicious to me because I don't understand why searching for a name can be so fast if it is not indexed through mysql when searching using GBrowse.

So my question is how do I parse the Bio::DB::SeqFeature object in Java correctly to get the name of the feature and possibly also any further information.

Any suggestions are greatly appreciated. Maybe there is a better solution than parsing Perl code with Java?

Thanks a lot,

Bernd

From biopython at maubp.freeserve.co.uk Mon Jan 11 05:48:52 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Jan 2010 10:48:52 +0000
Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java
In-Reply-To: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>
References: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>
Message-ID: <320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com>

On Mon, Jan 11, 2010 at 10:11 AM, Bernd Jagla wrote:
> Hi,
>
> First off, I am not sure if this is supposed to be addressed to the Bioperl
> or Gbrowse mailing list, so apologies if this is the wrong list and please
> let me know.
>
> I am writing a program in Java that needs to access genome annotation data.
> Since I am using Gbrowse already I was thinking that I could combine both
> approaches making life eventually easier for me. I am mainly interested in
> getting a gene/feature name for a given position. The position is stored in
> the feature table and through linking typelist, locationlist, (maybe
> sequence), and feature I can get all the information I need. Unfortunately
> it seems that the feature name is stored in the object blob of the feature
> table.

How are you storing the data in Gbrowse? There are several back ends, and this will make a big difference for accessing the raw data.

One option would be to use Gbrowse with BioSQL as the backend. You can then use BioJava (or BioPerl, or BioPython, etc) to access the database.
The only downside is Gbrowse isn't working 100% on top of BioSQL right now (I'd like to see this fixed, but I don't know Perl). There is an open bug on this [ gmod-Bugs-2168597 ]. Peter From bernd.jagla at pasteur.fr Mon Jan 11 05:53:20 2010 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Mon, 11 Jan 2010 11:53:20 +0100 Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java In-Reply-To: <320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com> References: <6D85585C10F94E25898249D2D7CAC0D7@zillumina> <320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com> Message-ID: <9056164A8A744A77B6CD1E8E4E20B104@zillumina> I am using bp_seqfeature_load.pl to load my features. That is using Bio:DB:SeqFeature(Store) and MySql as a backend... That's all I understood... B > -----Original Message----- > From: p.j.a.cock at googlemail.com [mailto:p.j.a.cock at googlemail.com] On > Behalf Of Peter > Sent: Monday, January 11, 2010 11:49 AM > To: Bernd Jagla > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java > > On Mon, Jan 11, 2010 at 10:11 AM, Bernd Jagla > wrote: > > Hi, > > > > First off, I am not sure if this is supposed to be addressed to the > Bioperl > > or Gbrowse mailing list, so apologies if this is the wrong list and > please > > let me know. > > > > I am writing a program in Java that needs to access genome annotation > data. > > Since I am using Gbrowse already I was thinking that I could combine > both > > approaches making life eventually easier for me. I am mainly interested > in > > getting a gene/feature name for a given position. The position is stored > in > > the feature table and through linking typelist, locationlist, (maybe > > sequence), and feature I can get all the information I need. > Unfortunately > > it seems that the feature name is stored in the object blog of the > feature > > table. > > How are you storing the data in Gbrowse? 
There are several back ends,
> and this will make a big difference for accessing the raw data.
>
> One option would be to use Gbrowse with BioSQL as the backend.
> You can then use BioJava (or BioPerl, or BioPython, etc) to access the
> database. The only downside is Gbrowse isn't working 100% on top
> of BioSQL right now (I'd like to see this fixed, but I don't know Perl).
> There is an open bug on this [ gmod-Bugs-2168597 ].
>
> Peter

From awitney at sgul.ac.uk Mon Jan 11 07:21:07 2010
From: awitney at sgul.ac.uk (Adam Witney)
Date: Mon, 11 Jan 2010 12:21:07 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
Message-ID:

Hi,

I am writing a script to automate the running of Phylip Pars. In the process i have to create a Bio::AlignIO object from a set of data that i have in a hash.

I could write the hash data into a phylip file and then load the Bio::AlignIO from that file, but i wondered if i could skip the writing and then reading of a temporary file ?

thanks for any help

adam

From roy.chaudhuri at gmail.com Mon Jan 11 08:54:25 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 11 Jan 2010 13:54:25 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
In-Reply-To: <4B4B2A51.9040602@gmail.com>
References: <4B4B2A51.9040602@gmail.com>
Message-ID: <4B4B2D91.70906@gmail.com>

Actually, I guess some sample code would be more helpful:

use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::AlignIO;
my $seq1=Bio::LocatableSeq->new(-id=>'one', -seq=>'AT-CG', -start=>1, -end=>4);
my $seq2=Bio::LocatableSeq->new(-id=>'two', -seq=>'A--CG', -start=>1, -end=>3);
my $seq3=Bio::LocatableSeq->new(-id=>'three', -seq=>'ATTCG', -start=>1, -end=>5);
my $aln=Bio::SimpleAlign->new(-seqs=>[$seq1,$seq2,$seq3]);
Bio::AlignIO->new(-format=>'phylip')->write_aln($aln);

Cheers,
Roy.
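
[Editorial sketch, not part of the original thread.] Adam's question started from a hash, so the sample above can be extended to loop over one. The following is a minimal sketch under stated assumptions: the hash `%aligned` and its layout (name => gapped sequence string) are hypothetical, the sequences are assumed to be already aligned and of equal length, and the PHYLIP text is captured in a scalar through an in-memory filehandle rather than a temporary file. Only calls believed to exist in BioPerl are used (`add_seq`, `set_displayname_flat`, `each_seq`, `length`, `write_aln`); please check them against the POD of your installed version.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::AlignIO;

# Hypothetical input: sequence name => gapped (already aligned) sequence.
my %aligned = (
    one   => 'AT-CG',
    two   => 'A--CG',
    three => 'ATTCG',
);

my $aln = Bio::SimpleAlign->new();

for my $id (sort keys %aligned) {
    my $gapped = $aligned{$id};

    # -end has to agree with the number of non-gap residues,
    # so count them after deleting the gap characters from a copy.
    (my $ungapped = $gapped) =~ tr/-//d;

    $aln->add_seq(
        Bio::LocatableSeq->new(
            -id    => $id,
            -seq   => $gapped,
            -start => 1,
            -end   => length $ungapped,
        )
    );
}

# Use plain names ("one") instead of name/start-end ("one/1-4") on output.
$aln->set_displayname_flat();

# Write the alignment as PHYLIP text into a scalar via an in-memory
# filehandle, so no temporary file is created by this script.
my $phylip_text = '';
open my $fh, '>', \$phylip_text
    or die "Cannot open in-memory filehandle: $!";
my $out = Bio::AlignIO->new(-fh => $fh, -format => 'phylip');
$out->write_aln($aln);
$out->close;

print $phylip_text;
```

If the goal is only to hand the alignment to a run wrapper, the text step may be unnecessary: the Phylip wrappers in bioperl-run are documented as accepting an alignment object, though that should be confirmed for the specific module in use.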
On 11/01/2010 13:40, Roy Chaudhuri wrote: > Hi Adam, > > I'm guessing you actually want to create a Bio::SimpleAlign object > (representing an alignment), rather than a Bio::AlignIO object (which is > just for reading/writing alignment files). Bio::SimpleAlign has a > documented new method that allows you to construct an alignment from > Bio::LocatableSeq objects, which are similar to Bio::Seq objects but > include gaps and start/end coordinates to describe their relationship to > other sequences in the alignment. > > Roy. > > On 11/01/2010 12:21, Adam Witney wrote: >> Hi, >> >> I am writing a script to automate the running of Phylip Pars. In the >> process i have to create a Bio::AlignIO object from a set of data >> that i have in a hash. >> >> I could write the hash data into a phylip file and then load the >> Bio::AlignIO from that file, but i wondered if i could skip the >> writing and then reading of a temporary file ? >> >> thanks for any help >> >> adam _______________________________________________ Bioperl-l >> mailing list Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From roy.chaudhuri at gmail.com Mon Jan 11 08:40:33 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 11 Jan 2010 13:40:33 +0000 Subject: [Bioperl-l] create Bio::AlignIO object from hash In-Reply-To: References: Message-ID: <4B4B2A51.9040602@gmail.com> Hi Adam, I'm guessing you actually want to create a Bio::SimpleAlign object (representing an alignment), rather than a Bio::AlignIO object (which is just for reading/writing alignment files). Bio::SimpleAlign has a documented new method that allows you to construct an alignment from Bio::LocatableSeq objects, which are similar to Bio::Seq objects but include gaps and start/end coordinates to describe their relationship to other sequences in the alignment. Roy. On 11/01/2010 12:21, Adam Witney wrote: > Hi, > > I am writing a script to automate the running of Phylip Pars. 
In the > process i have to create a Bio::AlignIO object from a set of data > that i have in a hash. > > I could write the hash data into a phylip file and then load the > Bio::AlignIO from that file, but i wondered if i could skip the > writing and then reading of a temporary file ? > > thanks for any help > > adam _______________________________________________ Bioperl-l > mailing list Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Mon Jan 11 09:16:45 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 11 Jan 2010 14:16:45 +0000 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records Message-ID: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> Hi, I'm running bioperl-live from SVN, just updated to revision 16648. $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' 1.0069 I am trying to get Bio::SeqIO to convert a multiple record EMBL file into GenBank format, piping the data via stdin/stdout using the following trivial Perl script: #!/usr/bin/env perl use Bio::SeqIO; my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl'); my $out = Bio::SeqIO->new(-format => 'genbank'); while (my $seq = $in->next_seq) { $out->write_seq($seq) }; This only seems to find the first EMBL record in my example files. For example, this simple file has just two contig records: http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl This is just the first two records taken from a much larger EMBL file rel_con_hum_01_r102.dat downloaded and uncompressed from: ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz Trying both these examples as input, BioPerl just gives a single GenBank record as output (the first EMBL entry in the input). Is this a BioPerl bug, or am I missing something? Peter From maj at fortinbras.us Mon Jan 11 10:04:00 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Mon, 11 Jan 2010 10:04:00 -0500 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> Message-ID: Hi Peter, I found the issue-- there are no SQ lines in the data, and having them is a key stop condition in the parser (line 438 embl.pm). We evidently need to be more liberal in what we accept, even as we are strict in what we emit. Could you make a bug report? thanks for the heads-up-- MAJ ----- Original Message ----- From: "Peter" To: "bioperl-l list" Sent: Monday, January 11, 2010 9:16 AM Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records > Hi, > > I'm running bioperl-live from SVN, just updated to revision 16648. > > $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' > 1.0069 > > I am trying to get Bio::SeqIO to convert a multiple record EMBL > file into GenBank format, piping the data via stdin/stdout using > the following trivial Perl script: > > #!/usr/bin/env perl > use Bio::SeqIO; > my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl'); > my $out = Bio::SeqIO->new(-format => 'genbank'); > while (my $seq = $in->next_seq) { $out->write_seq($seq) }; > > This only seems to find the first EMBL record in my example > files. For example, this simple file has just two contig records: > http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl > > This is just the first two records taken from a much larger EMBL file > rel_con_hum_01_r102.dat downloaded and uncompressed from: > ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz > > Trying both these examples as input, BioPerl just gives a single > GenBank record as output (the first EMBL entry in the input). > > Is this a BioPerl bug, or am I missing something? 
> > Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From biopython at maubp.freeserve.co.uk Mon Jan 11 10:17:37 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 11 Jan 2010 15:17:37 +0000 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> Message-ID: <320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com> On Mon, Jan 11, 2010 at 3:04 PM, Mark A. Jensen wrote: > > Hi Peter, I found the issue-- there are no SQ lines in the data, and having > them is a key stop condition in the parser (line 438 embl.pm). > We evidently need to be more liberal in what we accept, even as we are > strict in what we emit. Could you make a bug report? > thanks for the heads-up-- > MAJ Done: http://bugzilla.open-bio.org/show_bug.cgi?id=2982 These are EMBL contig records, so they don't have SQ lines, but instead CO lines. Peter From cjfields at illinois.edu Mon Jan 11 10:24:24 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 11 Jan 2010 09:24:24 -0600 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: <320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com> References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> <320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com> Message-ID: On Jan 11, 2010, at 9:17 AM, Peter wrote: > On Mon, Jan 11, 2010 at 3:04 PM, Mark A. Jensen wrote: >> >> Hi Peter, I found the issue-- there are no SQ lines in the data, and having >> them is a key stop condition in the parser (line 438 embl.pm). >> We evidently need to be more liberal in what we accept, even as we are >> strict in what we emit. Could you make a bug report? 
>> thanks for the heads-up-- >> MAJ > > Done: http://bugzilla.open-bio.org/show_bug.cgi?id=2982 > > These are EMBL contig records, so they don't have SQ lines, > but instead CO lines. > > Peter Peter, Just curious, but have you tried the experimental EMBL parser 'embldriver'? I don't think it's bound to the same strictures, but I may be mistaken. chris From cjfields at illinois.edu Mon Jan 11 10:23:00 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 11 Jan 2010 09:23:00 -0600 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> Message-ID: <0D0D9DB5-56FA-414E-8D1D-3FE18198F7EC@illinois.edu> Just saw that mark responded, so if possible submit a bug. We may be doing a mini-hackathon this Wednesday, so we can probably tackle it in the process (possibly along with a few other pressing issues). chris On Jan 11, 2010, at 8:16 AM, Peter wrote: > Hi, > > I'm running bioperl-live from SVN, just updated to revision 16648. > > $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' > 1.0069 > > I am trying to get Bio::SeqIO to convert a multiple record EMBL > file into GenBank format, piping the data via stdin/stdout using > the following trivial Perl script: > > #!/usr/bin/env perl > use Bio::SeqIO; > my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl'); > my $out = Bio::SeqIO->new(-format => 'genbank'); > while (my $seq = $in->next_seq) { $out->write_seq($seq) }; > > This only seems to find the first EMBL record in my example > files. 
For example, this simple file has just two contig records: > http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl > > This is just the first two records taken from a much larger EMBL file > rel_con_hum_01_r102.dat downloaded and uncompressed from: > ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz > > Trying both these examples as input, BioPerl just gives a single > GenBank record as output (the first EMBL entry in the input). > > Is this a BioPerl bug, or am I missing something? > > Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Mon Jan 11 10:55:26 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 11 Jan 2010 15:55:26 +0000 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> Message-ID: <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com> On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf wrote: > > These entries form the CON data class, see: > http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14 > and they don't contain any sequence information. I know - GenBank files have a similar system with CONTIG lines instead of sequences. I was expecting BioPerl to be able to convert these EMBL files with CO lines into GenBank files with CONTIG lines. > If you take the 'expanded' entries from > ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz > your script will work. That's a useful tip - thanks. 
Peter

From hrh at fmi.ch Mon Jan 11 10:42:22 2010
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Mon, 11 Jan 2010 16:42:22 +0100
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
Message-ID:

On 1/11/10 3:16 PM, "Peter" wrote:

> Hi,
>
> I'm running bioperl-live from SVN, just updated to revision 16648.
>
> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.0069
>
> I am trying to get Bio::SeqIO to convert a multiple record EMBL
> file into GenBank format, piping the data via stdin/stdout using
> the following trivial Perl script:
>
> #!/usr/bin/env perl
> use Bio::SeqIO;
> my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl');
> my $out = Bio::SeqIO->new(-format => 'genbank');
> while (my $seq = $in->next_seq) { $out->write_seq($seq) };
>
> This only seems to find the first EMBL record in my example
> files. For example, this simple file has just two contig records:
> http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl
>
> This is just the first two records taken from a much larger EMBL file
> rel_con_hum_01_r102.dat downloaded and uncompressed from:
> ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz

These entries form the CON data class, see:
http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14
and they don't contain any sequence information.

If you take the 'expanded' entries from
ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz
your script will work.

Hans

> Trying both these examples as input, BioPerl just gives a single
> GenBank record as output (the first EMBL entry in the input).
>
> Is this a BioPerl bug, or am I missing something?
> > Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From awitney at sgul.ac.uk Mon Jan 11 11:27:15 2010 From: awitney at sgul.ac.uk (Adam Witney) Date: Mon, 11 Jan 2010 16:27:15 +0000 Subject: [Bioperl-l] create Bio::AlignIO object from hash In-Reply-To: <4B4B2D91.70906@gmail.com> References: <4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com> Message-ID: Ah excellent, thanks Roy. I was indeed thinking about it the wrong way. In the process of writing this i have created a Bio::Tools::Run::Phylo::Phylip::Pars class which is essentially just a modified copy of ProtPars. I have also fixed a few typos and possible bugs in Bio/Tools/Run/Phylo/Phylip/Base.pm Bio/Tools/Run/Phylo/Phylip/PhylipConf.pm Bio/AlignIO/phylip.pm Bio/Tools/Run/Alignment/Clustalw.pm I am of course happy to send these back in to the project... how would i best do this? Cheers adam On 11 Jan 2010, at 13:54, Roy Chaudhuri wrote: > Actually, I guess some sample code would be more helpful: > > use Bio::LocatableSeq; > use Bio::SimpleAlign; > use Bio::AlignIO; > my $seq1=Bio::LocatableSeq->new(-id=>'one', -seq=>'AT-CG', -start=>1, -end=>4); > my $seq2=Bio::LocatableSeq->new(-id=>'two', -seq=>'A--CG', -start=>1, -end=>3); > my $seq3=Bio::LocatableSeq->new(-id=>'three', -seq=>'ATTCG', -start=>1, -end=>5); > my $aln=Bio::SimpleAlign->new(-seqs=>[$seq1,$seq2,$seq3]); > Bio::AlignIO->new(-format=>'phylip')->write_aln($aln); > > Cheers, > Roy. > > > On 11/01/2010 13:40, Roy Chaudhuri wrote: >> Hi Adam, >> >> I'm guessing you actually want to create a Bio::SimpleAlign object >> (representing an alignment), rather than a Bio::AlignIO object (which is >> just for reading/writing alignment files). 
Bio::SimpleAlign has a >> documented new method that allows you to construct an alignment from >> Bio::LocatableSeq objects, which are similar to Bio::Seq objects but >> include gaps and start/end coordinates to describe their relationship to >> other sequences in the alignment. >> >> Roy. >> >> On 11/01/2010 12:21, Adam Witney wrote: >>> Hi, >>> >>> I am writing a script to automate the running of Phylip Pars. In the >>> process i have to create a Bio::AlignIO object from a set of data >>> that i have in a hash. >>> >>> I could write the hash data into a phylip file and then load the >>> Bio::AlignIO from that file, but i wondered if i could skip the >>> writing and then reading of a temporary file ? >>> >>> thanks for any help >>> >>> adam _______________________________________________ Bioperl-l >>> mailing list Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From Russell.Smithies at agresearch.co.nz Mon Jan 11 22:41:02 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 12 Jan 2010 16:41:02 +1300 Subject: [Bioperl-l] BioPerl version? In-Reply-To: References: <4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz> Probably a dumb question but how do I find the version of an existing BioPerl installation without resorting to reading thru Bio/Root/Version.pm ? --Russell ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. 
If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at illinois.edu Mon Jan 11 22:59:44 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 11 Jan 2010 21:59:44 -0600 Subject: [Bioperl-l] BioPerl version? In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz> References: <4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz> Message-ID: <795BD926-4AE9-4478-AAD5-E36558350745@illinois.edu> Not dumb, but a frequently asked one: that's a FAQ question ;> http://www.bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' chris On Jan 11, 2010, at 9:41 PM, Smithies, Russell wrote: > Probably a dumb question but how do I find the version of an existing BioPerl installation without resorting to reading thru Bio/Root/Version.pm ? > > --Russell > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Jan 12 11:02:02 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 Jan 2010 10:02:02 -0600 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com> References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com> Message-ID: On Jan 11, 2010, at 9:55 AM, Peter wrote: > On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf wrote: >> >> These entries form the CON data class, see: >> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14 >> and they don't contain any sequence information. > > I know - GenBank files have a similar system with CONTIG > lines instead of sequences. I was expecting BioPerl to be > able to convert these EMBL files with CO lines into GenBank > files with CONTIG lines. IIRC the contig information for GenBank is stored in annotation. We can try to ensure the data is carried over to EMBL properly. >> If you take the 'expanded' entries from >> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz >> your script will work. > > That's a useful tip - thanks. > > Peter NCBI's eutil option 'gbwithparts' is similar (always retrieves the sequence). 
chris From biopython at maubp.freeserve.co.uk Tue Jan 12 11:19:32 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 12 Jan 2010 16:19:32 +0000 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com> Message-ID: <320fb6e01001120819u50e73fa8k9bde8aa1abdf942d@mail.gmail.com> On Tue, Jan 12, 2010 at 4:02 PM, Chris Fields wrote: > On Jan 11, 2010, at 9:55 AM, Peter wrote: > >> On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf wrote: >>> >>> These entries form the CON data class, see: >>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14 >>> and they don't contain any sequence information. >> >> I know - GenBank files have a similar system with CONTIG >> lines instead of sequences. I was expecting BioPerl to be >> able to convert these EMBL files with CO lines into GenBank >> files with CONTIG lines. > > IIRC the contig information for GenBank is stored in annotation. > We can try to ensure the data is carried over to EMBL properly. For contig records (where there is no sequence) I think we just need to map the GenBank CONTIG lines to the EMBL CO lines, and vice versa. At least, that's what Biopython now does (trunk code, not yet released). >>> If you take the 'expanded' entries from >>> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz >>> your script will work. >> >> That's a useful tip - thanks. >> >> Peter > > NCBI's eutil option 'gbwithparts' is similar (always retrieves the sequence). Indeed. This is a useful work around for when a parser couldn't cope with the contig version of a GenBank file for some reason, e.g. http://bugzilla.open-bio.org/show_bug.cgi?id=2745 Peter From maj at fortinbras.us Tue Jan 12 12:33:30 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Tue, 12 Jan 2010 12:33:30 -0500 Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service Message-ID: <231A8D9473704E7697F7A486A0CDA86A@NewLife> Hi All-- The beta of Bio::DB::SoapEUtilities is now available in the bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web service. The system is fully WSDL based, and all eutils are available. The best thing (IMHO) are the result adaptors, which provide conversion and iteration of SOAP results into BioPerl objects. Schau, mal: use Bio::DB::EUtilities; my $fac = Bio::DB::EUtilities->new(); # step 1 my $seqio = $fac->esearch( -db => 'nucleotide', -term => 'HIV1 and CCR5 and Brazil' )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3 # yes, it's already done the efetch under the hood... while ( my $seq = $seqio->next_seq ) { # step 4 # do something with $seq, a Bio::Seq object... } or this: my $links = $fac->elink( -db => 'protein', -dbfrom => 'nucleotide', -id => \@nucids )->run( -auto_adapt => 1 ); # maybe more than one associated id... my @prot_0 = $links->id_map( $nucids[0] ); while ( my $ls = $links->next_linkset ) { @ids = $ls->ids; @submitted_ids = $ls->submitted_ids; # etc. } and much, much more. See http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service and of course, the POD, for all the details, including download/installation. Tests in bioperl-run/t. cheers, MAJ -- No new dependencies were added or animals mistreated -- during the making of these modules. From sheldon.mckay at gmail.com Tue Jan 12 13:02:53 2010 From: sheldon.mckay at gmail.com (Sheldon McKay) Date: Tue, 12 Jan 2010 10:02:53 -0800 Subject: [Bioperl-l] code.open-bio.org timing out? Message-ID: Hi all, I keep timing out trying to do an svn checkout of bioperl-live from code.open-bio.org. Any suggestions? 
Thanks,
Sheldon

----
Sheldon McKay, PhD
Lead, iPlant Tree of Life Engagement Team; Research Investigator
Cold Spring Harbor Laboratory
http://mckay.cshl.edu
Google Voice: (203) 701-9204

On Tue, Nov 3, 2009 at 9:09 AM, Aaron Mackey wrote:
> [ajm6q at lc4 bioperl-live]$ svn update
> svn: Decompression of svndiff data failed
>
>
> I'll admit to not having svn updated in awhile; A clean, anonymous svn co
> failed with the same message:
>
> [...]
> A    bioperl-live/Bio/Structure/StructureI.pm
> A    bioperl-live/Bio/Structure/IO
> svn: Decompression of svndiff data failed
>
> -Aaron
>
> P.S. I used this command:
> svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From biopython at maubp.freeserve.co.uk Tue Jan 12 13:12:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 12 Jan 2010 18:12:46 +0000
Subject: [Bioperl-l] code.open-bio.org timing out?
In-Reply-To:
References:
Message-ID: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com>

On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay wrote:
> Hi all,
>
> I keep timing out trying to do an svn checkout of bioperl-live from
> code.open-bio.org.  Any suggestions?
>
> Thanks,
> Sheldon

The OBF team know about this (it's being discussed on root-l), hopefully they'll have it fixed before too long.

Peter

From cjfields at illinois.edu Tue Jan 12 13:18:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 12 Jan 2010 12:18:45 -0600
Subject: [Bioperl-l] code.open-bio.org timing out?
In-Reply-To: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com> References: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com> Message-ID: <8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu> On Jan 12, 2010, at 12:12 PM, Peter wrote: > On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay wrote: >> Hi all, >> >> I keep timing out trying to do an svn checkout of bioperl-live from >> code.open-bio.org. Any suggestions? >> >> Thanks, >> Sheldon > > The OBF team know about this (its being discussed on root-l), > hopefully they'll have it fixed before too long. > > Peter We probably need to set up some automatic syncing of our read-only code.google.com repo as a backup. Jason had originally set that up, hopefully he'll respond. chris From jason at bioperl.org Tue Jan 12 13:27:55 2010 From: jason at bioperl.org (Jason Stajich) Date: Tue, 12 Jan 2010 10:27:55 -0800 Subject: [Bioperl-l] code.open-bio.org timing out? In-Reply-To: <8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu> References: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com> <8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu> Message-ID: Hi - I had setup the google code sync, but then the unfortunately realization that the revision numbers are shared among the wiki and the code SVN (all 1 repo) so when I added a wiki page on the site I screwed up the numbering and it wasn't possible to sync anymore (that I could figure out) without resetting it and I haven't gone back to that. Sorry - I wasn't sure if we had figured out what we wanted to for repositories so I sort of stopped worrying about it. -jason On Jan 12, 2010, at 10:18 AM, Chris Fields wrote: > On Jan 12, 2010, at 12:12 PM, Peter wrote: > >> On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay > > wrote: >>> Hi all, >>> >>> I keep timing out trying to do an svn checkout of bioperl-live from >>> code.open-bio.org. Any suggestions? 
>>> >>> Thanks, >>> Sheldon >> >> The OBF team know about this (its being discussed on root-l), >> hopefully they'll have it fixed before too long. >> >> Peter > > We probably need to set up some automatic syncing of our read-only > code.google.com repo as a backup. Jason had originally set that up, > hopefully he'll respond. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From virajj at gmail.com Wed Jan 6 13:20:39 2010 From: virajj at gmail.com (Vijayaraj Nagarajan) Date: Wed, 6 Jan 2010 13:20:39 -0500 Subject: [Bioperl-l] targetp request Message-ID: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com> Hi, I am trying to use targetP in bioperl. the documentation at the bioperl site is a bit confusing to me... I would appreciate if you could give a very small example, as to how to use "Bio::Tools::TargetP" to predict the localization of a protein sequence that i have stored as a string. Thanks, Vijay From cjfields at illinois.edu Tue Jan 12 18:36:53 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 Jan 2010 17:36:53 -0600 Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service In-Reply-To: <231A8D9473704E7697F7A486A0CDA86A@NewLife> References: <231A8D9473704E7697F7A486A0CDA86A@NewLife> Message-ID: Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's Bio::DB::SoapEUtilities)? Otherwise this would be a serious namespace and API conflict with the current EUtilities tools. chris On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote: > Hi All-- > > The beta of Bio::DB::SoapEUtilities is now available in the > bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web > service. The system is fully WSDL based, and all eutils are > available. 
The best thing (IMHO) are the result adaptors, which > provide conversion and iteration of SOAP results into BioPerl > objects. Schau, mal: > > use Bio::DB::EUtilities; > my $fac = Bio::DB::EUtilities->new(); # step 1 > my $seqio = $fac->esearch( > -db => 'nucleotide', > -term => 'HIV1 and CCR5 and Brazil' > )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3 > # yes, it's already done the efetch under the hood... > while ( my $seq = $seqio->next_seq ) { # step 4 > # do something with $seq, a Bio::Seq object... > } > > or this: > > my $links = $fac->elink( -db => 'protein', > -dbfrom => 'nucleotide', > -id => \@nucids )->run( -auto_adapt => 1 ); > > # maybe more than one associated id... > my @prot_0 = $links->id_map( $nucids[0] ); > > while ( my $ls = $links->next_linkset ) { > @ids = $ls->ids; > @submitted_ids = $ls->submitted_ids; > # etc. > } > > and much, much more. See > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service > > and of course, the POD, for all the details, including > download/installation. Tests in bioperl-run/t. > > cheers, > MAJ > > -- No new dependencies were added or animals mistreated > -- during the making of these modules. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Jan 12 19:22:10 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 Jan 2010 18:22:10 -0600 Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service In-Reply-To: <5AD210CB0C444A57881BBDD34DE99149@NewLife> References: <231A8D9473704E7697F7A486A0CDA86A@NewLife> <5AD210CB0C444A57881BBDD34DE99149@NewLife> Message-ID: Okay, just making sure (I was getting a bit paranoid). Great work on the SOAP interface, BTW! chris On Jan 12, 2010, at 6:08 PM, Mark A. Jensen wrote: > Um, yeah. > ----- Original Message ----- From: "Chris Fields" > To: "Mark A. 
Jensen" > Cc: "BioPerl List" > Sent: Tuesday, January 12, 2010 6:36 PM > Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service > > > Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's Bio::DB::SoapEUtilities)? Otherwise this would be a serious namespace and API conflict with the current EUtilities tools. > > chris > > On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote: > >> Hi All-- >> >> The beta of Bio::DB::SoapEUtilities is now available in the >> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web >> service. The system is fully WSDL based, and all eutils are >> available. The best thing (IMHO) are the result adaptors, which >> provide conversion and iteration of SOAP results into BioPerl >> objects. Schau, mal: >> >> use Bio::DB::EUtilities; >> my $fac = Bio::DB::EUtilities->new(); # step 1 >> my $seqio = $fac->esearch( >> -db => 'nucleotide', >> -term => 'HIV1 and CCR5 and Brazil' >> )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3 >> # yes, it's already done the efetch under the hood... >> while ( my $seq = $seqio->next_seq ) { # step 4 >> # do something with $seq, a Bio::Seq object... >> } >> >> or this: >> >> my $links = $fac->elink( -db => 'protein', >> -dbfrom => 'nucleotide', >> -id => \@nucids )->run( -auto_adapt => 1 ); >> >> # maybe more than one associated id... >> my @prot_0 = $links->id_map( $nucids[0] ); >> >> while ( my $ls = $links->next_linkset ) { >> @ids = $ls->ids; >> @submitted_ids = $ls->submitted_ids; >> # etc. >> } >> >> and much, much more. See >> >> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service >> >> and of course, the POD, for all the details, including >> download/installation. Tests in bioperl-run/t. >> >> cheers, >> MAJ >> >> -- No new dependencies were added or animals mistreated >> -- during the making of these modules. 
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Tue Jan 12 19:08:12 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 12 Jan 2010 19:08:12 -0500 Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service In-Reply-To: References: <231A8D9473704E7697F7A486A0CDA86A@NewLife> Message-ID: <5AD210CB0C444A57881BBDD34DE99149@NewLife> Um, yeah. ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "BioPerl List" Sent: Tuesday, January 12, 2010 6:36 PM Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's Bio::DB::SoapEUtilities)? Otherwise this would be a serious namespace and API conflict with the current EUtilities tools. chris On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote: > Hi All-- > > The beta of Bio::DB::SoapEUtilities is now available in the > bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web > service. The system is fully WSDL based, and all eutils are > available. The best thing (IMHO) are the result adaptors, which > provide conversion and iteration of SOAP results into BioPerl > objects. Schau, mal: > > use Bio::DB::EUtilities; > my $fac = Bio::DB::EUtilities->new(); # step 1 > my $seqio = $fac->esearch( > -db => 'nucleotide', > -term => 'HIV1 and CCR5 and Brazil' > )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3 > # yes, it's already done the efetch under the hood... > while ( my $seq = $seqio->next_seq ) { # step 4 > # do something with $seq, a Bio::Seq object... > } > > or this: > > my $links = $fac->elink( -db => 'protein', > -dbfrom => 'nucleotide', > -id => \@nucids )->run( -auto_adapt => 1 ); > > # maybe more than one associated id... 
> my @prot_0 = $links->id_map( $nucids[0] ); > > while ( my $ls = $links->next_linkset ) { > @ids = $ls->ids; > @submitted_ids = $ls->submitted_ids; > # etc. > } > > and much, much more. See > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service > > and of course, the POD, for all the details, including > download/installation. Tests in bioperl-run/t. > > cheers, > MAJ > > -- No new dependencies were added or animals mistreated > -- during the making of these modules. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Tue Jan 12 20:09:28 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 12 Jan 2010 20:09:28 -0500 Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP webservice In-Reply-To: <5AD210CB0C444A57881BBDD34DE99149@NewLife> References: <231A8D9473704E7697F7A486A0CDA86A@NewLife> <5AD210CB0C444A57881BBDD34DE99149@NewLife> Message-ID: corrected: use Bio::DB::SoapEUtilities; my $fac = Bio::DB::SoapEUtilities->new(); # step 1 my $seqio = $fac->esearch( -db => 'nucleotide', -term => 'HIV1 and CCR5 and Brazil' )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3 # yes, it's already done the efetch under the hood... while ( my $seq = $seqio->next_seq ) { # step 4 # do something with $seq, a Bio::Seq object... } ----- Original Message ----- From: "Mark A. Jensen" To: "Chris Fields" Cc: "BioPerl List" Sent: Tuesday, January 12, 2010 7:08 PM Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP webservice > Um, yeah. > ----- Original Message ----- > From: "Chris Fields" > To: "Mark A. Jensen" > Cc: "BioPerl List" > Sent: Tuesday, January 12, 2010 6:36 PM > Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web > service > > > Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's > Bio::DB::SoapEUtilities)? 
Otherwise this would be a serious namespace and API > conflict with the current EUtilities tools. > > chris > > On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote: > >> Hi All-- >> >> The beta of Bio::DB::SoapEUtilities is now available in the >> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web >> service. The system is fully WSDL based, and all eutils are >> available. The best thing (IMHO) are the result adaptors, which >> provide conversion and iteration of SOAP results into BioPerl >> objects. Schau, mal: >> >> use Bio::DB::EUtilities; >> my $fac = Bio::DB::EUtilities->new(); # step 1 >> my $seqio = $fac->esearch( >> -db => 'nucleotide', >> -term => 'HIV1 and CCR5 and Brazil' >> )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3 >> # yes, it's already done the efetch under the hood... >> while ( my $seq = $seqio->next_seq ) { # step 4 >> # do something with $seq, a Bio::Seq object... >> } >> >> or this: >> >> my $links = $fac->elink( -db => 'protein', >> -dbfrom => 'nucleotide', >> -id => \@nucids )->run( -auto_adapt => 1 ); >> >> # maybe more than one associated id... >> my @prot_0 = $links->id_map( $nucids[0] ); >> >> while ( my $ls = $links->next_linkset ) { >> @ids = $ls->ids; >> @submitted_ids = $ls->submitted_ids; >> # etc. >> } >> >> and much, much more. See >> >> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service >> >> and of course, the POD, for all the details, including >> download/installation. Tests in bioperl-run/t. >> >> cheers, >> MAJ >> >> -- No new dependencies were added or animals mistreated >> -- during the making of these modules. 
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From tuco at pasteur.fr Wed Jan 13 05:24:34 2010 From: tuco at pasteur.fr (Emmanuel Quevillon) Date: Wed, 13 Jan 2010 11:24:34 +0100 Subject: [Bioperl-l] targetp request In-Reply-To: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com> References: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com> Message-ID: <4B4D9F62.5010306@pasteur.fr> On 1/6/10 7:20 PM, Vijayaraj Nagarajan wrote: > Hi, > > I am trying to use targetP in bioperl. > the documentation at the bioperl site is a bit confusing to me... > > I would appreciate if you could give a very small example, as to how to use > "Bio::Tools::TargetP" to predict the localization of a protein sequence that > i have stored as a string. > > Thanks, > Vijay > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Dear Vivay, Bio::Tools::TargetP is not intended to run targetp on a sequence but to read and parse results from targetp run. From the Pod doc : DESCRIPTION TargetP modules will provides parsed informations about protein localization. It reads in a targetp output file. It parses the results, and returns a Bio::SeqFeature::Generic object for each sequences found to have a subcellular localization So to analyze your sequence, you'll first need to run targetp on your sequence file to create a targetp result output file. Then use Bio::Tools::TargetP module to parse this result file and get only informations you want/need from the result to be display as shown in the SYNOPSIS of the Pod documentation of the module. 
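For what it's worth, the parse step described above might look roughly like the sketch below. It assumes targetp has already been run and its output saved to a file. The sub name print_targetp_predictions, the example file name and the lazy require are only there for illustration and are not part of BioPerl. The tags attached to each feature may differ between module versions, so the sketch just prints whatever tags it finds rather than assuming particular names.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): parse an existing targetp
# output file and print one tab-separated line per prediction.
sub print_targetp_predictions {
    my ($targetp_file) = @_;

    # Loaded lazily so this file still compiles where BioPerl is absent.
    require Bio::Tools::TargetP;

    open( my $fh, '<', $targetp_file )
        or die "Cannot open '$targetp_file' for reading: $!";

    # The module SYNOPSIS passes an open filehandle via -fh.
    my $parser = Bio::Tools::TargetP->new( -fh => $fh );

    # Each prediction comes back as a Bio::SeqFeature::Generic object.
    while ( my $feat = $parser->next_prediction ) {
        my $id = $feat->seq_id;
        $id = '?' unless defined $id;

        my @pairs;
        for my $tag ( $feat->get_all_tags ) {
            push @pairs, "$tag=" . join( ',', $feat->get_tag_values($tag) );
        }
        print join( "\t", $id, @pairs ), "\n";
    }

    close $fh;
    return;
}

# Usage, assuming 'my_proteins.targetp' was produced by running targetp:
# print_targetp_predictions('my_proteins.targetp');
```

A sequence held only as a string would first have to be written out (for example as FASTA) and run through targetp itself; the module only reads the resulting report.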
HTH Regards Emmanuel From roy.chaudhuri at gmail.com Wed Jan 13 07:52:58 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Wed, 13 Jan 2010 12:52:58 +0000 Subject: [Bioperl-l] create Bio::AlignIO object from hash In-Reply-To: References: <4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com> Message-ID: <4B4DC22A.8080701@gmail.com> Upload them to Bugzilla as patches, and one of the devs will review your changes and incorporate them into bioperl-live: http://www.bioperl.org/wiki/HOWTO:SubmitPatch Roy. On 11/01/2010 16:27, Adam Witney wrote: > > Ah excellent, thanks Roy. I was indeed thinking about it the wrong > way. > > In the process of writing this i have created a > > Bio::Tools::Run::Phylo::Phylip::Pars class > > which is essentially just a modified copy of ProtPars. I have also > fixed a few typos and possible bugs in > > Bio/Tools/Run/Phylo/Phylip/Base.pm > Bio/Tools/Run/Phylo/Phylip/PhylipConf.pm Bio/AlignIO/phylip.pm > Bio/Tools/Run/Alignment/Clustalw.pm > > I am of course happy to send these back in to the project... how > would i best do this? > > Cheers > > adam > > > On 11 Jan 2010, at 13:54, Roy Chaudhuri wrote: > >> Actually, I guess some sample code would be more helpful: >> >> use Bio::LocatableSeq; use Bio::SimpleAlign; use Bio::AlignIO; my >> $seq1=Bio::LocatableSeq->new(-id=>'one', -seq=>'AT-CG', -start=>1, >> -end=>4); my $seq2=Bio::LocatableSeq->new(-id=>'two', >> -seq=>'A--CG', -start=>1, -end=>3); my >> $seq3=Bio::LocatableSeq->new(-id=>'three', -seq=>'ATTCG', >> -start=>1, -end=>5); my >> $aln=Bio::SimpleAlign->new(-seqs=>[$seq1,$seq2,$seq3]); >> Bio::AlignIO->new(-format=>'phylip')->write_aln($aln); >> >> Cheers, Roy. >> >> >> On 11/01/2010 13:40, Roy Chaudhuri wrote: >>> Hi Adam, >>> >>> I'm guessing you actually want to create a Bio::SimpleAlign >>> object (representing an alignment), rather than a Bio::AlignIO >>> object (which is just for reading/writing alignment files). 
>>> Bio::SimpleAlign has a documented new method that allows you to >>> construct an alignment from Bio::LocatableSeq objects, which are >>> similar to Bio::Seq objects but include gaps and start/end >>> coordinates to describe their relationship to other sequences in >>> the alignment. >>> >>> Roy. >>> >>> On 11/01/2010 12:21, Adam Witney wrote: >>>> Hi, >>>> >>>> I am writing a script to automate the running of Phylip Pars. >>>> In the process i have to create a Bio::AlignIO object from a >>>> set of data that i have in a hash. >>>> >>>> I could write the hash data into a phylip file and then load >>>> the Bio::AlignIO from that file, but i wondered if i could skip >>>> the writing and then reading of a temporary file ? >>>> >>>> thanks for any help >>>> >>>> adam _______________________________________________ Bioperl-l >>>> mailing list Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> > From marcelo011982 at gmail.com Wed Jan 13 13:12:04 2010 From: marcelo011982 at gmail.com (Marcelo Iwata) Date: Wed, 13 Jan 2010 16:12:04 -0200 Subject: [Bioperl-l] Blast to Clustalw Format Message-ID: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com> Hi.. I have an simple Blast result, such as blastn. Is there an scrip to transform such result to Clustalw format in Bioperl ?(.aln) Thanx for any help. From Kevin.M.Brown at asu.edu Wed Jan 13 13:01:42 2010 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 13 Jan 2010 11:01:42 -0700 Subject: [Bioperl-l] targetp request In-Reply-To: <4B4D9F62.5010306@pasteur.fr> References: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com> <4B4D9F62.5010306@pasteur.fr> Message-ID: <1A4207F8295607498283FE9E93B775B4067C133E@EX02.asurite.ad.asu.edu> Sounds like this module might be in the wrong place then. Sounds more like a SeqIO or AlignIO module, heheh. Also looks like the docs might need to be cleaned up a bit for english readability (at least that initial sentence). 
Kevin Brown Center for Innovations in Medicine Biodesign Institute Arizona State University > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Emmanuel Quevillon > Sent: Wednesday, January 13, 2010 3:25 AM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] targetp request > > On 1/6/10 7:20 PM, Vijayaraj Nagarajan wrote: > > Hi, > > > > I am trying to use targetP in bioperl. > > the documentation at the bioperl site is a bit confusing to me... > > > > I would appreciate if you could give a very small example, > as to how to use > > "Bio::Tools::TargetP" to predict the localization of a > protein sequence that > > i have stored as a string. > > > > Thanks, > > Vijay > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Dear Vivay, > > Bio::Tools::TargetP is not intended to run targetp on a > sequence but to > read and parse results from targetp run. > > From the Pod doc : > > DESCRIPTION > TargetP modules will provides parsed informations > about protein > localization. It > reads in a targetp output file. It parses the results, and > returns a > Bio::SeqFeature::Generic object for each sequences > found to have > a subcellular > localization > > > So to analyze your sequence, you'll first need to run targetp on your > sequence file to create a targetp result output file. Then use > Bio::Tools::TargetP module to parse this result file and get only > informations you want/need from the result to be display as > shown in the > SYNOPSIS of the Pod documentation of the module. 
> > HTH > > Regards > > Emmanuel > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Wed Jan 13 13:44:36 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 13 Jan 2010 13:44:36 -0500 Subject: [Bioperl-l] Blast to Clustalw Format In-Reply-To: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com> References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com> Message-ID: Marcelo- Yes-- look at the code snip at http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO combined with the snip at http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods (using -format => 'clustalw') cheers MAJ ----- Original Message ----- From: "Marcelo Iwata" To: Sent: Wednesday, January 13, 2010 1:12 PM Subject: [Bioperl-l] Blast to Clustalw Format > Hi.. > I have an simple Blast result, such as blastn. > Is there an scrip to transform such result to Clustalw format in Bioperl > ?(.aln) > > Thanx for any help. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From dan.kortschak at adelaide.edu.au Wed Jan 13 23:26:46 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Thu, 14 Jan 2010 14:56:46 +1030 Subject: [Bioperl-l] not able to use Bio::Root::IO method Message-ID: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> Hi All, I'm having a stupid problem that for some reason I just can't figure out. I'm putting together a B:A:IO:bowtie module to wrap around the B:A:IO:sam module so bowtie output can be used as an assembly start point. For some reason that is escaping me I can't create tempfiles! 
What should be the relevant code in the module: package Bio::Assembly::IO::bowtie; use strict; use warnings; # Object preamble - inherits from Bio::Root::Root use Bio::SeqIO; use Bio::Tools::Run::Samtools; use Bio::Assembly::IO; use Carp; use Bio::Root::Root; use Bio::Root::IO; use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO ); and the line (there are a couple of others that are like to fail in the same way, but I've not got that far) my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' ); Which dies with: Can't locate object method "io" via package "Bio::Assembly::IO::bowtie" at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175. Relevant environment vars: DB<10> x @ISA 0 'Bio::Root::Root' 1 'Bio::Root::IO' 2 'Bio::Assembly::IO' DB<11> x $self 0 Bio::Assembly::IO::bowtie=HASH(0x2d226d8) '_no_head' => undef '_no_sq' => undef '_root_verbose' => 0 Can someone suggest what I'm missing? cheers Dan From maj at fortinbras.us Thu Jan 14 00:11:01 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 14 Jan 2010 00:11:01 -0500 Subject: [Bioperl-l] not able to use Bio::Root::IO method In-Reply-To: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <84196F01FF584C64A79B89FECE2DD86F@NewLife> Hey Dan-- what does your constructor look like? I wonder if something's getting lost in new() and _initialize() chaining spaghetti- MAJ ----- Original Message ----- From: "Dan Kortschak" To: Sent: Wednesday, January 13, 2010 11:26 PM Subject: [Bioperl-l] not able to use Bio::Root::IO method > Hi All, > > I'm having a stupid problem that for some reason I just can't figure > out. I'm putting together a B:A:IO:bowtie module to wrap around the > B:A:IO:sam module so bowtie output can be used as an assembly start > point. > > For some reason that is escaping me I can't create tempfiles! 
> > What should be the relevant code in the module: > > package Bio::Assembly::IO::bowtie; > use strict; > use warnings; > > # Object preamble - inherits from Bio::Root::Root > > use Bio::SeqIO; > use Bio::Tools::Run::Samtools; > use Bio::Assembly::IO; > use Carp; > use Bio::Root::Root; > use Bio::Root::IO; > use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO ); > > > and the line (there are a couple of others that are like to fail in the > same way, but I've not got that far) > > my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => > $self->tempdir(), -suffix => '.sam' ); > > Which dies with: > Can't locate object method "io" via package "Bio::Assembly::IO::bowtie" > at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175. > > Relevant environment vars: > DB<10> x @ISA > 0 'Bio::Root::Root' > 1 'Bio::Root::IO' > 2 'Bio::Assembly::IO' > > DB<11> x $self > 0 Bio::Assembly::IO::bowtie=HASH(0x2d226d8) > '_no_head' => undef > '_no_sq' => undef > '_root_verbose' => 0 > > > > Can someone suggest what I'm missing? > > cheers > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From dan.kortschak at adelaide.edu.au Thu Jan 14 00:35:35 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Thu, 14 Jan 2010 16:05:35 +1030 Subject: [Bioperl-l] not able to use Bio::Root::IO method In-Reply-To: <84196F01FF584C64A79B89FECE2DD86F@NewLife> References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> <84196F01FF584C64A79B89FECE2DD86F@NewLife> Message-ID: <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au> Thanks Mark, I'm not sure about that since @ISA still includes Bio::Root:IO when it's at the call, but it might be. 
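One quick way to check this sort of thing is to ask Perl directly which methods an object can reach. The sketch below uses made-up packages (My::Base, My::IOish, My::Child), not the real BioPerl classes. The point it tries to show is that having a class in @ISA only helps if that class, or one of its parents, actually defines the method being called.

```perl
use strict;
use warnings;

# Made-up stand-ins for illustration; these are not BioPerl classes.
package My::Base;
sub new { my ($class) = @_; return bless {}, $class }

package My::IOish;
# Defines tempfile(), but no io() accessor.
sub tempfile { return 'tmp-handle' }

package My::Child;
our @ISA = ( 'My::Base', 'My::IOish' );

package main;

my $obj = My::Child->new;

# can() walks @ISA, so it reports exactly what a method call would find.
my %reachable = map { $_ => ( $obj->can($_) ? 1 : 0 ) } qw(new tempfile io);

for my $method ( sort keys %reachable ) {
    printf "%-8s %s\n", $method, $reachable{$method} ? 'found' : 'not found';
}

# Calling the missing method dies with "Can't locate object method ...".
my $error = eval { $obj->io; 1 } ? '' : $@;
print "error: $error" if $error;
```

In the module under discussion the analogous check would be to compare $self->can('io') with $self->can('tempfile') in the debugger, on the assumption that tempfile is the method Bio::Root::IO itself provides.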
cheers
Dan

Here is the entirety of the code (it reasonably short):

package Bio::Assembly::IO::bowtie;
use strict;
use warnings;

# Object preamble - inherits from Bio::Root::Root

use Bio::SeqIO;
use Bio::Tools::Run::Samtools;
use Bio::Assembly::IO;
use Carp;
use Bio::Root::Root;
use Bio::Root::IO;
use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );

our $HD = "\@HD\tVN:1.0\tSO:unsorted\n";
our $PG = "\@PG\tID=Bowtie\n";

our $HAVE_IO_UNCOMPRESS;
BEGIN {
    # check requirements
    unless ( eval "require Bio::Tools::Run::Bowtie;") {
        Bio::Root::Root->throw("Bio::Tools::Run::Bowtie is not available - cannot extract refdb from index.");
    }
    unless ( eval "require IO::Uncompress::Gunzip; \$HAVE_IO_UNCOMPRESS = 1") {
        Bio::Root::Root->warn("IO::Uncompress::Gunzip is not available; you'll have to do your decompression by hand.");
    }
}

sub new {
    my $class = shift;
    my @args = @_;
    my $self = $class->SUPER::new(@args);
    my ($file, $index, $no_head, $no_sq) = $self->_rearrange([qw(FILE INDEX NO_HEAD NO_SQ)], @args);
    $file =~ s/^<//;
    $self->{'_no_head'} = $no_head;
    $self->{'_no_sq'} = $no_sq;

    # get the sequence so samtools can work with it
    my $inspector = Bio::Tools::Run::Bowtie->new( -command => 'inspect' );
    my $refdb = $inspector->run($index);

    my $bam_file = $self->_make_bam($self->_bowtie_to_sam($file, $refdb));

    my $sam = Bio::Assembly::IO->new( -file => "<$bam_file", -refdb => $refdb , -format => 'sam' );

    return $sam;
}

sub _bowtie_to_sam {
    my ($self, $file, $refdb) = @_;

    $self->throw("'$file' does not exist or is not readable.")
        unless ( -e $file && -r $file );
    my $guesser = Bio::Tools::GuessSeqFormat->new(-file=>$file);
    $self->throw("'$file' is not a bowtie formatted file.") unless $guesser->guess =~ m/^bowtie$/;

    my %SQ;
    my $mapq = 255;
    my $in_pair;
    my @mate_line;
    my $mlen;

    if ($file =~ m/\.gz[^.]*$/) {
        unless ($HAVE_IO_UNCOMPRESS) {
            croak( "IO::Uncompress::Gunzip not available, can't expand '$_'" );
        }
        my ($tfh, $tf) = $self->io->tempfile;
        my $z = IO::Uncompress::Gunzip->new($_);
        while (<$z>) { print $tfh $_ }
        close $tfh;
        $file = $tf;
    }

    open(my $fh, $file) or $self->throw("Can not open '$file' for reading: $!");

    # create temp file for working
    my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );

    while ($fh) {
        chomp;
        my ($qname,$strand,$rname,$pos,$seq,$qual,$m,$details)=split("\t",$_);
        $SQ{$rname} = 1;

        my $paired_f = ($qname =~ m#/[12]#) ? 0x03 : 0;
        my $strand_f = ($strand eq '-') ? 0x10 : 0;
        my $op_strand_f = ($strand eq '+' && $paired_f) ? 0x20 : 0;
        my $first_f = ($qname =~ m#/1#) ? 0x40 : 0;
        my $second_f = ($qname =~ m#/2#) ? 0x80 : 0;
        my $flag = $paired_f | $strand_f | $op_strand_f | $first_f | $second_f;

        $pos++;
        my $len = length $seq;
        die unless $len == length $qual;
        my $cigar = $len.'M';
        my @detail = split(',',$details);
        my $dist = 'NM:i:'.scalar @detail;

        my @mismatch;
        my $last_pos = 0;
        for (@detail) {
            m/(\d+):(\w)>\w/;
            my $err = ($1-$last_pos);
            $last_pos = $1+1;
            push @mismatch,($err,$2);
        }
        push @mismatch, $len-$last_pos;
        @mismatch = reverse @mismatch if $strand eq '-';
        my $mismatch = join('',('MD:Z:',@mismatch));

        if ($paired_f) {
            my $mrnm = '=';
            if ($in_pair) {
                my $mpos = $mate_line[3];
                $mate_line[7] = $pos;
                my $isize = $mpos-$pos-$len;
                $mate_line[8] = -$isize;
                print $sam_tmp_h join("\t",@mate_line),"\n";
                print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n";
                $in_pair = 0;
            } else {
                $mlen = $len;
                @mate_line = ($qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, undef, undef, $seq, $qual, $mismatch, $dist);
                $in_pair = 1;
            }
        } else {
            my $mrnm = '*';
            my $mpos = 0;
            my $isize = 0;
            print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n";
        }
    }

    close($fh);
    $sam_tmp_h->close;

    return $sam_tmp_f if $self->{'_no_head'};

    my ($samh, $samf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );

    # print header
    print $samh $HD;

    # print sequence dictionary
    unless ($self->{'_no_sq'}) {
        my $db = Bio::SeqIO->new( -file => $refdb, -format => 'fasta' );
        while ( my $seq = $db->next_seq() ) {
            $SQ{$seq->id} = $seq->length if $SQ{$seq->id};
        }

        map { print $samh join("\t", ('@SQ', "SN:$_", "LN:$SQ{$_}")), "\n" } keys %SQ;
    }

    # print program
    print $samh $PG;

    open($sam_tmp_h, $sam_tmp_f) or $self->throw("Can not open '$sam_tmp_f' for reading: $!");

    print $samh $_ while ($sam_tmp_h);

    close($sam_tmp_h);
    $samh->close;

    return $samf;
}

sub _make_bam {
    my ($self, $file) = @_;

    $self->throw("'$file' does not exist or is not readable")
        unless ( -e $file && -r $file );

    # make a sorted bam file from a sam file input
    my ($bamh, $bamf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.bam' );
    my ($srth, $srtf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.srt' );
    $_->close for ($bamh, $srth);

    my $samt = Bio::Tools::Run::Samtools->new( -command => 'view',
                                               -sam_input => 1,
                                               -bam_output => 1 );

    $samt->run( -bam => $file, -out => $bamf );

    $samt = Bio::Tools::Run::Samtools->new( -command => 'sort' );

    $samt->run( -bam => $bamf, -pfx => $srtf);

    return $srtf.'.bam'
}

1;

On Thu, 2010-01-14 at 00:11 -0500, Mark A. Jensen wrote:
> Hey Dan-- what does your constructor look like? I wonder if
> something's getting
> lost in new() and _initialize() chaining spaghetti- MAJ
>

From dan.kortschak at adelaide.edu.au Thu Jan 14 00:35:48 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 16:05:48 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To:
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> <84196F01FF584C64A79B89FECE2DD86F@NewLife> <1263446261.8016.50.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au>

I've had a bit of a play with that, but no luck.

Dan

On Thu, 2010-01-14 at 00:26 -0500, Mark A. 
Jensen wrote: > I've found that rearranging the items in the 'use base' array can > sometimes > recover > lost methods. I don't know enough of the arcana to know why it works. > (Sometimes, > java starts looking pretty good from here...) > From maj at fortinbras.us Thu Jan 14 00:38:00 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 14 Jan 2010 00:38:00 -0500 Subject: [Bioperl-l] Fw: not able to use Bio::Root::IO method Message-ID: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife> up to list ----- Original Message ----- From: "Mark A. Jensen" To: "Dan Kortschak" Sent: Thursday, January 14, 2010 12:36 AM Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method > Aha-- check out the pod for Bio::Root::IO: > > "This module provides methods that will usually be needed for any sort > of file- or stream-related input/output, e.g., keeping track of a file > handle, transient printing and reading from the file handle, a close > method, automatically closing the handle on garbage collection, etc. > > To use this for your own code you will either want to inherit from > this module, or instantiate an object for every file or stream you are > dealing with. In the first case this module will most likely not be > the first class off which your class inherits; therefore you need to > call _initialize_io() with the named parameters in order to set file > handle, open file, etc automatically." > > I think you're wanting a call to $self->_initialize_io(). (There is no io() > method explicitly defined in any of the base classes.) > MAJ > ----- Original Message ----- > From: "Dan Kortschak" > To: > Sent: Wednesday, January 13, 2010 11:26 PM > Subject: [Bioperl-l] not able to use Bio::Root::IO method > > >> Hi All, >> >> I'm having a stupid problem that for some reason I just can't figure >> out. I'm putting together a B:A:IO:bowtie module to wrap around the >> B:A:IO:sam module so bowtie output can be used as an assembly start >> point. 
>> >> For some reason that is escaping me I can't create tempfiles! >> >> What should be the relevant code in the module: >> >> package Bio::Assembly::IO::bowtie; >> use strict; >> use warnings; >> >> # Object preamble - inherits from Bio::Root::Root >> >> use Bio::SeqIO; >> use Bio::Tools::Run::Samtools; >> use Bio::Assembly::IO; >> use Carp; >> use Bio::Root::Root; >> use Bio::Root::IO; >> use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO ); >> >> >> and the line (there are a couple of others that are like to fail in the >> same way, but I've not got that far) >> >> my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => >> $self->tempdir(), -suffix => '.sam' ); >> >> Which dies with: >> Can't locate object method "io" via package "Bio::Assembly::IO::bowtie" >> at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175. >> >> Relevant environment vars: >> DB<10> x @ISA >> 0 'Bio::Root::Root' >> 1 'Bio::Root::IO' >> 2 'Bio::Assembly::IO' >> >> DB<11> x $self >> 0 Bio::Assembly::IO::bowtie=HASH(0x2d226d8) >> '_no_head' => undef >> '_no_sq' => undef >> '_root_verbose' => 0 >> >> >> >> Can someone suggest what I'm missing? >> >> cheers >> Dan >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> From maj at fortinbras.us Thu Jan 14 00:50:11 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 14 Jan 2010 00:50:11 -0500 Subject: [Bioperl-l] not able to use Bio::Root::IO method In-Reply-To: <1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au> References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> <84196F01FF584C64A79B89FECE2DD86F@NewLife> <1263446261.8016.50.camel@zoidberg.mbs.adelaide.edu.au> <1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <82BFF47099684EF496DB3875D39DCA14@NewLife> For the benefit of the list, I categorically deny ever making the statement about java below.... 
MAJ
----- Original Message -----
From: "Dan Kortschak"
To: "Mark A. Jensen"
Cc:
Sent: Thursday, January 14, 2010 12:35 AM
Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method


> I've had a bit of a play with that, but no luck.
>
> Dan
>
> On Thu, 2010-01-14 at 00:26 -0500, Mark A. Jensen wrote:
>> I've found that rearranging the items in the 'use base' array can
>> sometimes recover
>> lost methods. I don't know enough of the arcana to know why it works.
>> (Sometimes,
>> java starts looking pretty good from here...)
>>
>
>

From cjfields at illinois.edu Thu Jan 14 02:23:41 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 14 Jan 2010 01:23:41 -0600
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> <84196F01FF584C64A79B89FECE2DD86F@NewLife> <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID:

You can remove separate 'use' directives if they are declared with
'use base' (they will be imported then). Also, Bio::Root::IO inherits
Bio::Root::Root, and Bio::Assembly::IO should inherit from
Bio::Root::IO, so the only base module you should need is
Bio::Assembly::IO. It's possible having all three is confusing the
interpreter.

chris

On Jan 13, 2010, at 11:35 PM, Dan Kortschak wrote:

> Thanks Mark, I'm not sure about that since @ISA still includes
> Bio::Root:IO when it's at the call, but it might be.
> > cheers > Dan > > Here is the entirety of the code (it reasonably short): > > package Bio::Assembly::IO::bowtie; > use strict; > use warnings; > > # Object preamble - inherits from Bio::Root::Root > > use Bio::SeqIO; > use Bio::Tools::Run::Samtools; > use Bio::Assembly::IO; > use Carp; > use Bio::Root::Root; > use Bio::Root::IO; > use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO ); > > our $HD = "\@HD\tVN:1.0\tSO:unsorted\n"; > our $PG = "\@PG\tID=Bowtie\n"; > > our $HAVE_IO_UNCOMPRESS; > BEGIN { > # check requirements > unless ( eval "require Bio::Tools::Run::Bowtie;") { > Bio::Root::Root->throw("Bio::Tools::Run::Bowtie is not available - cannot extract refdb from index."); > } > unless ( eval "require IO::Uncompress::Gunzip; \$HAVE_IO_UNCOMPRESS = 1") { > Bio::Root::Root->warn("IO::Uncompress::Gunzip is not available; you'll have to do your decompression by hand."); > } > } > > sub new { > my $class = shift; > my @args = @_; > my $self = $class->SUPER::new(@args); > my ($file, $index, $no_head, $no_sq) = $self->_rearrange([qw(FILE INDEX NO_HEAD NO_SQ)], @args); > $file =~ s/^ $self->{'_no_head'} = $no_head; > $self->{'_no_sq'} = $no_sq; > # get the sequence so samtools can work with it > my $inspector = Bio::Tools::Run::Bowtie->new( -command => 'inspect' ); > my $refdb = $inspector->run($index); > my $bam_file = $self->_make_bam($self->_bowtie_to_sam($file, $refdb)); > my $sam = Bio::Assembly::IO->new( -file => "<$bam_file", -refdb => $refdb , -format => 'sam' ); > return $sam; > } > > sub _bowtie_to_sam { > my ($self, $file, $refdb) = @_; > > $self->throw("'$file' does not exist or is not readable.") > unless ( -e $file && -r $file ); > my $guesser = Bio::Tools::GuessSeqFormat->new(-file=>$file); > $self->throw("'$file' is not a bowtie formatted file.") unless $guesser->guess =~ m/^bowtie$/; > > my %SQ; > my $mapq = 255; > my $in_pair; > my @mate_line; > my $mlen; > > if ($file =~ m/\.gz[^.]*$/) { > unless ($HAVE_IO_UNCOMPRESS) { > croak( 
"IO::Uncompress::Gunzip not available, can't expand '$_'" ); > } > my ($tfh, $tf) = $self->io->tempfile; > my $z = IO::Uncompress::Gunzip->new($_); > while (<$z>) { print $tfh $_ } > close $tfh; > $file = $tf; > } > > open(my $fh, $file) or > $self->throw("Can not open '$file' for reading: $!"); > > # create temp file for working > my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' ); > > while ($fh) { > chomp; > my ($qname,$strand,$rname,$pos,$seq,$qual,$m,$details)=split("\t",$_); > $SQ{$rname} = 1; > > my $paired_f = ($qname =~ m#/[12]#) ? 0x03 : 0; > my $strand_f = ($strand eq '-') ? 0x10 : 0; > my $op_strand_f = ($strand eq '+' && $paired_f) ? 0x20 : 0; > my $first_f = ($qname =~ m#/1#) ? 0x40 : 0; > my $second_f = ($qname =~ m#/2#) ? 0x80 : 0; > my $flag = $paired_f | $strand_f | $op_strand_f | $first_f | $second_f; > > $pos++; > my $len = length $seq; > die unless $len == length $qual; > my $cigar = $len.'M'; > my @detail = split(',',$details); > my $dist = 'NM:i:'.scalar @detail; > > my @mismatch; > my $last_pos = 0; > for (@detail) { > m/(\d+):(\w)>\w/; > my $err = ($1-$last_pos); > $last_pos = $1+1; > push @mismatch,($err,$2); > } > push @mismatch, $len-$last_pos; > @mismatch = reverse @mismatch if $strand eq '-'; > my $mismatch = join('',('MD:Z:', at mismatch)); > > if ($paired_f) { > my $mrnm = '='; > if ($in_pair) { > my $mpos = $mate_line[3]; > $mate_line[7] = $pos; > my $isize = $mpos-$pos-$len; > $mate_line[8] = -$isize; > print $sam_tmp_h join("\t", at mate_line),"\n"; > print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n"; > $in_pair = 0; > } else { > $mlen = $len; > @mate_line = ($qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, undef, undef, $seq, $qual, $mismatch, $dist); > $in_pair = 1; > } > } else { > my $mrnm = '*'; > my $mpos = 0; > my $isize = 0; > print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, 
$mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n"; > } > } > > close($fh); > $sam_tmp_h->close; > > return $sam_tmp_f if $self->{'_no_head'}; > > my ($samh, $samf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' ); > > # print header > print $samh $HD; > > # print sequence dictionary > unless ($self->{'_no_sq'}) { > my $db = Bio::SeqIO->new( -file => $refdb, -format => 'fasta' ); > while ( my $seq = $db->next_seq() ) { > $SQ{$seq->id} = $seq->length if $SQ{$seq->id}; > } > > map { print $samh join("\t", ('@SQ', "SN:$_", "LN:$SQ{$_}")), "\n" } keys %SQ; > } > > # print program > print $samh $PG; > > open($sam_tmp_h, $sam_tmp_f) or > $self->throw("Can not open '$sam_tmp_f' for reading: $!"); > > print $samh $_ while ($sam_tmp_h); > > close($sam_tmp_h); > $samh->close; > > return $samf; > } > > sub _make_bam { > my ($self, $file) = @_; > > $self->throw("'$file' does not exist or is not readable") > unless ( -e $file && -r $file ); > > # make a sorted bam file from a sam file input > my ($bamh, $bamf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.bam' ); > my ($srth, $srtf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.srt' ); > $_->close for ($bamh, $srth); > > my $samt = Bio::Tools::Run::Samtools->new( -command => 'view', > -sam_input => 1, > -bam_output => 1 ); > > $samt->run( -bam => $file, -out => $bamf ); > > $samt = Bio::Tools::Run::Samtools->new( -command => 'sort' ); > > $samt->run( -bam => $bamf, -pfx => $srtf); > > return $srtf.'.bam' > } > > 1; > > > On Thu, 2010-01-14 at 00:11 -0500, Mark A. Jensen wrote: >> Hey Dan-- what does your constructor look like? 
I wonder if >> something's getting >> lost in new() and _initialize() chaining spaghetti- MAJ >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Jan 14 02:25:05 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 14 Jan 2010 01:25:05 -0600 Subject: [Bioperl-l] Fw: not able to use Bio::Root::IO method In-Reply-To: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife> References: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife> Message-ID: <1DB926E1-9C6F-4B96-8D7E-28317DD7DE42@illinois.edu> Yes, that's true. The call to an io() is a Bio::Tools::Run::WrapperBase thing (the io() is a Bio::Root::IO instance). chris On Jan 13, 2010, at 11:38 PM, Mark A. Jensen wrote: > up to list > ----- Original Message ----- From: "Mark A. Jensen" > To: "Dan Kortschak" > Sent: Thursday, January 14, 2010 12:36 AM > Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method > > >> Aha-- check out the pod for Bio::Root::IO: >> "This module provides methods that will usually be needed for any sort >> of file- or stream-related input/output, e.g., keeping track of a file >> handle, transient printing and reading from the file handle, a close >> method, automatically closing the handle on garbage collection, etc. >> To use this for your own code you will either want to inherit from >> this module, or instantiate an object for every file or stream you are >> dealing with. In the first case this module will most likely not be >> the first class off which your class inherits; therefore you need to >> call _initialize_io() with the named parameters in order to set file >> handle, open file, etc automatically." >> I think you're wanting a call to $self->_initialize_io(). (There is no io() method explicitly defined in any of the base classes.) 
>> MAJ >> ----- Original Message ----- From: "Dan Kortschak" >> To: >> Sent: Wednesday, January 13, 2010 11:26 PM >> Subject: [Bioperl-l] not able to use Bio::Root::IO method >>> Hi All, >>> I'm having a stupid problem that for some reason I just can't figure >>> out. I'm putting together a B:A:IO:bowtie module to wrap around the >>> B:A:IO:sam module so bowtie output can be used as an assembly start >>> point. >>> For some reason that is escaping me I can't create tempfiles! >>> What should be the relevant code in the module: >>> package Bio::Assembly::IO::bowtie; >>> use strict; >>> use warnings; >>> # Object preamble - inherits from Bio::Root::Root >>> use Bio::SeqIO; >>> use Bio::Tools::Run::Samtools; >>> use Bio::Assembly::IO; >>> use Carp; >>> use Bio::Root::Root; >>> use Bio::Root::IO; >>> use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO ); >>> and the line (there are a couple of others that are like to fail in the >>> same way, but I've not got that far) >>> my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => >>> $self->tempdir(), -suffix => '.sam' ); >>> Which dies with: >>> Can't locate object method "io" via package "Bio::Assembly::IO::bowtie" >>> at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175. >>> Relevant environment vars: >>> DB<10> x @ISA 0 'Bio::Root::Root' >>> 1 'Bio::Root::IO' >>> 2 'Bio::Assembly::IO' >>> DB<11> x $self >>> 0 Bio::Assembly::IO::bowtie=HASH(0x2d226d8) >>> '_no_head' => undef >>> '_no_sq' => undef >>> '_root_verbose' => 0 >>> Can someone suggest what I'm missing? 
>>> cheers
>>> Dan
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From dan.kortschak at adelaide.edu.au Thu Jan 14 02:59:20 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 18:29:20 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To:
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> <84196F01FF584C64A79B89FECE2DD86F@NewLife> <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <1263455960.4630.3.camel@epistle>

Thanks Chris,

I've done that, and since the inheritance is direct (rather than being a
constructed attribute in the object hash) the calls are $obj->temp*
rather than the $obj->io->temp* that I was using.

It works now and is much clearer, having got rid of many of the
declarations.

cheers
Dan

On Thu, 2010-01-14 at 01:23 -0600, Chris Fields wrote:
> You can remove separate 'use' directives if they are declared with
> 'use base' (they will be imported then). Also, Bio::Root::IO inherits
> Bio::Root::Root, and Bio::Assembly::IO should inherit from
> Bio::Root::IO, so the only base module you should need is
> Bio::Assembly::IO. It's possible having all three is confusing the
> interpreter.
>
> chris

From marcelo011982 at gmail.com Thu Jan 14 08:44:25 2010
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Thu, 14 Jan 2010 11:44:25 -0200
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To:
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
Message-ID: <1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>

Thanks Mark.
I think that most of you already know it.
But I'll post it here for new users:


#!/usr/bin/perl -w

use strict;
use Bio::SearchIO;
use Bio::AlignIO;

# read the BLAST report
my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => '../../fontes/exemplos/blat/teste2/output.blast');
my $aln;
my $alnIO;
# every HSP alignment is appended to hsp.aln in ClustalW format
$alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln");
while ( my $result = $in->next_result ) {
    ## $result is a Bio::Search::Result::ResultI compliant object
    while ( my $hit = $result->next_hit ) {
        ## $hit is a Bio::Search::Hit::HitI compliant object
        while ( my $hsp = $hit->next_hsp ) {
            ## $hsp is a Bio::Search::HSP::HSPI compliant object
            $aln = $hsp->get_aln;
            $alnIO->write_aln($aln);
        }
    }
}


On Wed, Jan 13, 2010 at 4:44 PM, Mark A. Jensen wrote:

> Marcelo-
> Yes-- look at the code snip at
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
> combined with the snip at
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> (using -format => 'clustalw')
> cheers MAJ
> ----- Original Message ----- From: "Marcelo Iwata" <
> marcelo011982 at gmail.com>
> To:
> Sent: Wednesday, January 13, 2010 1:12 PM
> Subject: [Bioperl-l] Blast to Clustalw Format
>
>
> Hi..
>> I have an simple Blast result, such as blastn.
>> Is there an scrip to transform such result to Clustalw format in Bioperl
>> ?(.aln)
>>
>> Thanx for any help.
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> From marcelo011982 at gmail.com Thu Jan 14 08:46:21 2010 From: marcelo011982 at gmail.com (Marcelo Iwata) Date: Thu, 14 Jan 2010 11:46:21 -0200 Subject: [Bioperl-l] Blast to Clustalw Format In-Reply-To: <1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com> References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com> <1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com> Message-ID: <1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com> Sorry , the correct code is: #!/usr/bin/perl -w use strict; use Bio::SearchIO; use Bio::AlignIO; my $in = new Bio::SearchIO(-format => 'blast', -file => ' ../../fontes/exemplos/blat/teste2/output.blast '); my $aln; my $alnIO; $alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln"); while ( my $result = $in->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while ( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while ( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object $aln = $hsp->get_aln; $alnIO->write_aln($aln); } } } On Thu, Jan 14, 2010 at 11:44 AM, Marcelo Iwata wrote: > Thanks Mark. > I think that most of you already know it. 
> But , i'll put it for new users: > > > #!/usr/bin/perl -w > > use strict; > use Bio::SearchIO; > use Bio::AlignIO; > > my $in = new Bio::SearchIO(-format => 'blast', > -file => ' > ../../fontes/exemplos/blat/teste2/output.blast '); > my $aln; > my $alnIO; > $alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln"); > while ( my $result = $in->next_result ) { > ## $result is a Bio::Search::Result::ResultI compliant object > while ( my $hit = $result->next_hit ) { > ## $hit is a Bio::Search::Hit::HitI compliant object > while ( my $hsp = $hit->next_hsp ) { > ## $hsp is a Bio::Search::HSP::HSPI compliant object > $aln = $hsp->get_aln; > $alnIO->write_aln($aln); > > > } > } > } > > > On Wed, Jan 13, 2010 at 4:44 PM, Mark A. Jensen wrote: > >> Marcelo- >> Yes-- look at the code snip at >> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO >> combined with the snip at >> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods >> (using -format => 'clustalw') >> cheers MAJ >> ----- Original Message ----- From: "Marcelo Iwata" < >> marcelo011982 at gmail.com> >> To: >> Sent: Wednesday, January 13, 2010 1:12 PM >> Subject: [Bioperl-l] Blast to Clustalw Format >> >> >> Hi.. >>> I have an simple Blast result, such as blastn. >>> Is there an scrip to transform such result to Clustalw format in >>> Bioperl >>> ?(.aln) >>> >>> Thanx for any help. >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> > From maj at fortinbras.us Thu Jan 14 08:54:31 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Thu, 14 Jan 2010 08:54:31 -0500 Subject: [Bioperl-l] Blast to Clustalw Format In-Reply-To: <1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com> References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com><1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com> <1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com> Message-ID: <1B8891488AA746F49BCAAB531FBE4D0B@NewLife> Thanks Marcelo-- code snips always appreciated! MAJ ----- Original Message ----- From: "Marcelo Iwata" To: Sent: Thursday, January 14, 2010 8:46 AM Subject: Re: [Bioperl-l] Blast to Clustalw Format > Sorry , the correct code is: > > > > #!/usr/bin/perl -w > > use strict; > use Bio::SearchIO; > use Bio::AlignIO; > > my $in = new Bio::SearchIO(-format => 'blast', > -file => ' > ../../fontes/exemplos/blat/teste2/output.blast '); > my $aln; > my $alnIO; > $alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln"); > while ( my $result = $in->next_result ) { > ## $result is a Bio::Search::Result::ResultI compliant object > while ( my $hit = $result->next_hit ) { > ## $hit is a Bio::Search::Hit::HitI compliant object > while ( my $hsp = $hit->next_hsp ) { > ## $hsp is a Bio::Search::HSP::HSPI compliant object > $aln = $hsp->get_aln; > $alnIO->write_aln($aln); > > } > } > } > > > On Thu, Jan 14, 2010 at 11:44 AM, Marcelo Iwata > wrote: > >> Thanks Mark. >> I think that most of you already know it. 
>> But , i'll put it for new users: >> >> >> #!/usr/bin/perl -w >> >> use strict; >> use Bio::SearchIO; >> use Bio::AlignIO; >> >> my $in = new Bio::SearchIO(-format => 'blast', >> -file => ' >> ../../fontes/exemplos/blat/teste2/output.blast '); >> my $aln; >> my $alnIO; >> $alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln"); >> while ( my $result = $in->next_result ) { >> ## $result is a Bio::Search::Result::ResultI compliant object >> while ( my $hit = $result->next_hit ) { >> ## $hit is a Bio::Search::Hit::HitI compliant object >> while ( my $hsp = $hit->next_hsp ) { >> ## $hsp is a Bio::Search::HSP::HSPI compliant object >> $aln = $hsp->get_aln; >> $alnIO->write_aln($aln); >> >> >> } >> } >> } >> >> >> On Wed, Jan 13, 2010 at 4:44 PM, Mark A. Jensen wrote: >> >>> Marcelo- >>> Yes-- look at the code snip at >>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO >>> combined with the snip at >>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods >>> (using -format => 'clustalw') >>> cheers MAJ >>> ----- Original Message ----- From: "Marcelo Iwata" < >>> marcelo011982 at gmail.com> >>> To: >>> Sent: Wednesday, January 13, 2010 1:12 PM >>> Subject: [Bioperl-l] Blast to Clustalw Format >>> >>> >>> Hi.. >>>> I have an simple Blast result, such as blastn. >>>> Is there an scrip to transform such result to Clustalw format in >>>> Bioperl >>>> ?(.aln) >>>> >>>> Thanx for any help. 
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From sidd.basu at gmail.com Thu Jan 14 14:15:04 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Thu, 14 Jan 2010 13:15:04 -0600
Subject: [Bioperl-l] reading blast report
Message-ID: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>

Hi,
I have a script that reads a tblastn report (13000 records) and loads it
into a chado database (Bio::Chado::Schema module); however, the machine
runs out of memory. Apart from the database loading, I am trying to
figure out whether reading the report with the SearchIO module could
itself consume a lot of memory. So, when I am reading a blast file and
getting the result object ....

   while (my $result = $searchio->next_result)

* Does the searchio object load a huge chunk of the file into memory, or
  does each iteration read only part of the report?

* Would indexing the blast report and then reading from the index be
  much faster, and why? And is there any way I could iterate through
  each record in the index; would that be helpful?

-siddhartha


From jason at bioperl.org Thu Jan 14 14:53:29 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Jan 2010 11:53:29 -0800
Subject: [Bioperl-l] reading blast report
In-Reply-To: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
Message-ID: <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>

What aspects of the report are you loading? You might consider
producing the blast report as tab-delimited (-m 8 format) if you are
only interested in start/end positions and scores of alignments; that
is a simpler, reduced dataset with a lower memory footprint in the
parser.
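
A minimal sketch of the tab-delimited route, for anyone reading this in the archive. It assumes legacy blastall -m 8 output with its usual 12 tab-separated columns; parse_m8_line and the sample record are made up for illustration and are not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column names for legacy blastall -m 8 output (12 tab-separated fields).
my @COLS = qw(query subject pct_identity aln_length mismatches gap_opens
              q_start q_end s_start s_end evalue bit_score);

# parse_m8_line() is a hypothetical helper, not a BioPerl function.
# It turns one tabular line into a hash ref; blank lines, '#' comment
# lines (as written by -m 9) and malformed lines give undef.
sub parse_m8_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*$/ || $line =~ /^#/;
    my @fields = split /\t/, $line;
    return unless @fields == @COLS;
    my %hit;
    @hit{@COLS} = @fields;
    return \%hit;
}

# Made-up example record. A real script would read the report line by
# line, so only one line is held in memory at a time:
#   while (my $line = <$fh>) { my $hit = parse_m8_line($line) or next; ... }
my $sample = join "\t", qw(q1 chr2 98.50 200 3 0 1 200 5001 5600 1e-100 380);
my $hit = parse_m8_line($sample);
print join("\t", @{$hit}{qw(query subject s_start s_end bit_score)}), "\n";
```

Only the start/end/score fields need to be kept per line, which is the reduced dataset mentioned above.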
SearchIO's default is -format => blast; you can try -format =>
blast_pull instead, which parses lazily as objects are created and
will reduce memory consumption.

-jason
On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote:

> Hi,
> I have a script that reads a tblastn report(13000 records) and loads in
> a chado database(Bio::Chado::Schema module), however the machine
> runs of memory. I am trying to figure
> out other than loading the database stuff
> if it the reading of SearchIO module could consume a lot of memory. So,
> when i am reading a blast file and getting the result object ....
>
> while (my $result = $searchio->next_result)
>
> * Does the searchio object loads a huge chunk of file in the memory or
> for each iteration it only reads a part of the result.
>
> * Does doing an index on blast report and then reading from it be much
> faster and why. And is there any way i could iterate through each
> record in the index, will that be helpful.
>
> -siddhartha
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/

From sidd.basu at gmail.com Thu Jan 14 15:15:45 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Thu, 14 Jan 2010 14:15:45 -0600
Subject: [Bioperl-l] Re: reading blast report
In-Reply-To: <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com> <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
Message-ID: <4b4f7b74.5744f10a.7087.4813@mx.google.com>

On Thu, 14 Jan 2010, Jason Stajich wrote:

> What aspects of the report are you loading? You might consider the blast
> report as tab-delimited (-m 8 format) if you only are interested in
> start/end positions and scores of alignments which is a simpler and reduced
> dataset that has lower memory footprint by the parser.

I think this would be the better approach; I am mostly interested in
start/end/score data only.

>
> Searchio (default) -format => blast - you can try the BLAST -format =>
> blast_pull instead which lazy parses to create objects and will reduce
> memory consumption.

That's another good option, though. But just out of curiosity: does the
regular blast parser load the entire file into memory, considering that
the output consists of multiple Results concatenated together into a
single file? Could anybody clarify?

thanks,
-siddhartha

>
> -jason
> On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote:
>
> > Hi,
> > I have a script that reads a tblastn report(13000 records) and loads in
> > a chado database(Bio::Chado::Schema module), however the machine runs of
> > memory. I am trying to figure
> > out other than loading the database stuff
> > if it the reading of SearchIO module could consume a lot of memory. So,
> > when i am reading a blast file and getting the result object ....
> >
> > while (my $result = $searchio->next_result)
> >
> > * Does the searchio object loads a huge chunk of file in the memory or
> > for each iteration it only reads a part of the result.
> >
> > * Does doing an index on blast report and then reading from it be much
> > faster and why. And is there any way i could iterate through each
> > record in the index, will that be helpful.
> >
> > -siddhartha
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
>

From jason at bioperl.org Thu Jan 14 16:28:29 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Jan 2010 13:28:29 -0800
Subject: [Bioperl-l] reading blast report
In-Reply-To: <4b4f7b74.5744f10a.7087.4813@mx.google.com>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com> <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org> <4b4f7b74.5744f10a.7087.4813@mx.google.com>
Message-ID:


On Jan 14, 2010, at 12:15 PM, Siddhartha Basu wrote:

> On Thu, 14 Jan 2010, Jason Stajich wrote:
>
>> What aspects of the report are you loading? You might consider the blast
>> report as tab-delimited (-m 8 format) if you only are interested in
>> start/end positions and scores of alignments which is a simpler and reduced
>> dataset that has lower memory footprint by the parser.
>
> I think this would be a better approach i am mostly interested in
> start/end/score data only.
>
>>
>> Searchio (default) -format => blast - you can try the BLAST -format =>
>> blast_pull instead which lazy parses to create objects and will reduce
>> memory consumption.
>
> It's another good option though. But just out of curiosity, so the
> regular blast parser do load the entire file in the memory consider the
> output consist of multiple Results concatenated together into a
> single file. Could anybody clarify.
>
> thanks,
> -siddhartha

Each result is parsed (1 result per query), and all of its hits and HSPs
are parsed and brought into memory with the standard (non-pull) approach.
SearchIO iterates at the level of the result; that is why you call
next_result, which parses one result at a time.
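
The result-at-a-time behaviour described above can be pictured with a small plain-Perl sketch. It illustrates the idea only and is not how Bio::SearchIO is implemented; make_result_iterator is a hypothetical helper, the input is assumed to be tabular lines already grouped by query, and the sample data is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# make_result_iterator() is a hypothetical helper, for illustration only.
# Given a filehandle on tabular lines grouped by query (first column),
# it returns a closure that yields one query's lines per call and
# undef at end of file, so only one "result" is in memory at a time.
sub make_result_iterator {
    my ($fh) = @_;
    my $pending;    # first line of the next query, read ahead by one
    return sub {
        my (@lines, $query);
        if (defined $pending) {
            @lines = ($pending);
            ($query) = split /\t/, $pending;
            undef $pending;
        }
        while (my $line = <$fh>) {
            chomp $line;
            next if $line eq '' || $line =~ /^#/;
            my ($q) = split /\t/, $line;
            if (defined $query && $q ne $query) {
                $pending = $line;    # belongs to the next result
                last;
            }
            $query = $q;
            push @lines, $line;
        }
        return @lines ? { query => $query, lines => \@lines } : undef;
    };
}

# Made-up data, read through an in-memory filehandle.
my $data = join '', map { "$_\n" }
    "q1\tchrA\t99.0", "q1\tchrB\t91.0", "q2\tchrA\t88.0";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";

my $next_result = make_result_iterator($fh);
while (my $result = $next_result->()) {
    printf "%s: %d hit line(s)\n",
        $result->{query}, scalar @{ $result->{lines} };
}
```

Each pass through the loop replaces the previous result, so memory use tracks the largest single query rather than the whole file.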
> > >> >> -jason >> On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote: >> >>> Hi, >>> I have a script that reads a tblastn report(13000 records) and >>> loads in >>> a chado database(Bio::Chado::Schema module), however the machine >>> runs of >>> memory. I am trying to figure >>> out other than loading the database stuff >>> if it the reading of SearchIO module could consume a lot of >>> memory. So, >>> when i am reading a blast file and getting the result object .... >>> >>> while (my $result = $searchio->next_result) >>> >>> * Does the searchio object loads a huge chunk of file in the >>> memory or >>> for each iteration it only reads a part of the result. >>> >>> * Does doing an index on blast report and then reading from it be >>> much >>> faster and why. And is there any way i could iterate through each >>> record in the index, will that be helpful. >>> >>> -siddhartha >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> http://fungalgenomes.org/ >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From sidd.basu at gmail.com Thu Jan 14 16:40:42 2010 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Thu, 14 Jan 2010 15:40:42 -0600 Subject: [Bioperl-l] Re: reading blast report In-Reply-To: References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com> <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org> <4b4f7b74.5744f10a.7087.4813@mx.google.com> Message-ID: <4b4f8f5d.5644f10a.2be2.47dc@mx.google.com> Thanks jason for clarification. 
On Thu, 14 Jan 2010, Jason Stajich wrote: > > On Jan 14, 2010, at 12:15 PM, Siddhartha Basu wrote: > > > On Thu, 14 Jan 2010, Jason Stajich wrote: > > > >> What aspects of the report are you loading? You might consider the blast > >> report as tab-delimited (-m 8 format) if you only are interested in > >> start/end positions and scores of ailgnments which is a simpler and > >> reduced > >> dataset that has lower memory footprint by the parser. > > > > I think this would be a better approach i am mostly interested in > > start/end/score data only. > > > >> > >> Searchio (default) -format => blast - you can try the BLAST -format => > >> blast_pull instead which lazy parses to create objects and will reduce > >> memory consumption. > > > > It's another good option though. But just out of curosity, so the > > regular blast parser do load the entire file in the memory consider the > > output consist of multiple Results concatenated together into a > > single file. Could anybody clarify. > > > > thanks, > > -siddhartha > > Each result is parsed (1 result per query) and all the hits and HSPs are > parsed and brought into memory with the standard (non-pull) approach. > The SearchIO iterates at the level of result - that is why you call > next_result which parses each one at a time. > > > > > > >> > >> -jason > >> On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote: > >> > >>> Hi, > >>> I have a script that reads a tblastn report(13000 records) and loads in > >>> a chado database(Bio::Chado::Schema module), however the machine runs > >>> of > >>> memory. I am trying to figure > >>> out other than loading the database stuff > >>> if it the reading of SearchIO module could consume a lot of memory. So, > >>> when i am reading a blast file and getting the result object .... > >>> > >>> while (my $result = $searchio->next_result) > >>> > >>> * Does the searchio object loads a huge chunk of file in the memory or > >>> for each iteration it only reads a part of the result. 
> >>> > >>> * Does doing an index on blast report and then reading from it be much > >>> faster and why. And is there any way i could iterate through each > >>> record in the index, will that be helpful. > >>> > >>> -siddhartha > >>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> -- > >> Jason Stajich > >> jason.stajich at gmail.com > >> jason at bioperl.org > >> http://fungalgenomes.org/ > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > From SMarkel at accelrys.com Thu Jan 14 17:58:06 2010 From: SMarkel at accelrys.com (Scott Markel) Date: Thu, 14 Jan 2010 14:58:06 -0800 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback from our customers. Due to network irregularities (not sure what else to call it) users see the getting of remote BLAST results as somewhat random. When results come back the hits are fine, but sometimes no information comes back at all. Retrying helps. In looking at RemoteBlast.pm there are four "return -1" cases. * $status eq 'ERROR' (return on line 614) * $line =~ /ERROR/I (return on line 628) * !$got_content (return on line 648) * !$response->is_success (return on line 655) In the case of no content we'd like to retry remote BLAST. We're happy to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl module, but we only want to retry in that case, not the other three. What would happen if that third "return -1" changed to a different return value? Scott Scott Markel, Ph.D. 
Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics From nickjd at gmail.com Wed Jan 13 08:18:12 2010 From: nickjd at gmail.com (NickJD) Date: Wed, 13 Jan 2010 05:18:12 -0800 (PST) Subject: [Bioperl-l] Parsing PSI-BLAST results with SearchIO Message-ID: <65554589-081b-4297-ab68-9ddfbd3d9944@c34g2000yqn.googlegroups.com> I am trying to parse PSI-BLAST results using SearchIO and some very basic code just to read the number of hits, number of HSPs, etc. I have done 10 rounds on 1 input sequence and parsed the output, but the parser seems to treat each round as a separate result, so round/iteration is always 1 and new_hits is always the total list, not the hits that are new to that round. Does anyone have any experience of this? Thanks, Nick From dsidote at waksman.rutgers.edu Wed Jan 13 10:08:48 2010 From: dsidote at waksman.rutgers.edu (David J Sidote) Date: Wed, 13 Jan 2010 10:08:48 -0500 Subject: [Bioperl-l] Bioinformatician position - Waksman Institute Message-ID: <4b42af671001130708i703ecce0u47348484321714f@mail.gmail.com> Bioinformatician - Research Assistant Professor The Waksman Institute of Microbiology, located on the New Brunswick campus of Rutgers University, is seeking a highly motivated and talented bioinformatics scientist for a Research Assistant Professor appointment. The successful candidate will analyze genome, transcriptome, and epigenome data generated on the Life Sciences 454, Illumina, and AB SOLiD high-throughput sequencing platforms.
Excellent communication and teamwork skills are essential as the successful candidate will work closely with individual research groups to develop software to facilitate the visualization, quantification, and interpretation of the data. The successful candidate will be expected to contribute to the publication of scientific literature and to present at seminars and conferences. Qualifications: - PhD in molecular biology, genetics, bioinformatics, systems biology or other related fields; candidates with a PhD in physics, mathematics, or computer science with some working knowledge of biology and experience are encouraged to apply. - Demonstrated scientific track record - Highly proficient in perl, python, or ruby programming, linux/unix scripting, and SQL. - Experience with R is desirable but not required - Experience with high-throughput sequencing, microarrays, or other high-throughput biological platforms - Excellent communication and organizational skills How to Apply: Please send a cover letter stating your current research interests, why you are interested in this position, and how your skill set complements this position along with a curriculum vitae, and the names and contact information of three references to hr at waksman.rutgers.edu. Please include "Bioinformatics Assistant Research Professor" in the subject line. Rutgers is an equal opportunity employer. For more information about this position please contact: Dr. 
David Sidote (dsidote at waksman.rutgers.edu) From albezg at gmail.com Wed Jan 13 20:57:27 2010 From: albezg at gmail.com (albezg) Date: Wed, 13 Jan 2010 20:57:27 -0500 Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with negative PDB ranges In-Reply-To: <49C405F0.5050100@gmail.com> References: <49C2B97B.7070304@gmail.com> <49C405F0.5050100@gmail.com> Message-ID: <4B4E7A07.7070805@gmail.com> Hi all, I have a problem using AlignIO to read Pfam database: ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz The database is in STOCKHOLM 1.0 format. AlignIO can read the alignment OK until the alignment PF00331.13. There it crashes with the following message: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: '1-344' is not an integer. STACK: Error::throw STACK: Bio::Root::Root::throw /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368 STACK: Bio::Range::end /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228 STACK: Bio::Annotation::Target::new /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82 STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293 STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73 STACK: Bio::AlignIO::stockholm::next_aln /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471 STACK: /home/albezg/scripts/pfam2fasta.pl:22 ----------------------------------------------------------- It appears this is caused by this entry: #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344; I don't care about residues in PDB, so I have just removed minus signs from the ranges. This seems to have fixed the crashing. Is it a known problem? Is there a solution for it? 
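The workaround described above (dropping the minus signs from PDB ranges) can be sketched as a small line filter. This is only an illustration under stated assumptions: the function name strip_negative_pdb_range is made up, the pattern assumes the range is the last semicolon-delimited field of a "#=GS ... DR PDB;" line, and the rewrite changes the PDB coordinates, so it is only suitable when those residues are not needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove leading minus signs from the residue range
# of a Stockholm "#=GS <seq> DR PDB; <id> <chain>; <start>-<end>;" line.
# Any other line is returned unchanged.
sub strip_negative_pdb_range {
    my ($line) = @_;
    return $line unless $line =~ /^#=GS\s+\S+\s+DR\s+PDB;/;
    # The range is the final field; an optional "-" may precede either number.
    $line =~ s/(;\s*)-?(\d+)--?(\d+)(;\s*)$/$1$2-$3$4/;
    return $line;
}

# The entry quoted in the message above.
my $entry = '#=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344;';
print strip_negative_pdb_range($entry), "\n";
# prints: #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; 1-344;
```

Applying the function to every line of the seed file and writing the result to a new file would give AlignIO an input without negative coordinates, at the cost of slightly wrong PDB ranges for the affected entries.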
Thanks, Alexandr On 03/20/2009 05:09 PM, albezg wrote: > > I'm trying to change FASTA header(display_id) for a sequence in an > alignment(SimpleAlign). > > There are no issues when I print it, however when I use AlignIO to write > the alignment to a FASTA file, it does not work. Is this behavior intended? > > Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug > > The error: > ------------- EXCEPTION ------------- > MSG: No sequence with name [1/1-11] > STACK Bio::SimpleAlign::displayname > /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659 > STACK Bio::AlignIO::fasta::write_aln > /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200 > STACK toplevel ./demo.pl:14 > ------------------------------------- > > Alexandr From mitch_skinner at berkeley.edu Thu Jan 14 17:10:53 2010 From: mitch_skinner at berkeley.edu (Mitch Skinner) Date: Thu, 14 Jan 2010 14:10:53 -0800 Subject: [Bioperl-l] filter_by_location in Bio::DB::SeqFeature::Store::memory Message-ID: <4B4F966D.3030300@berkeley.edu> Hi, Some people haven't been getting all of the features in their GFF3 into JBrowse, and a nice test case that James Casbon posted to the list helped me track it down. Here's an example of the behavior I was seeing with BioPerl 1.6.1 (using Devel::REPL): ============== $ use Bio::DB::SeqFeature::Store $ my $db = Bio::DB::SeqFeature::Store->new(-adaptor=>"memory", -dsn=>"casbon.gff3") $Bio_DB_SeqFeature_Store_memory1 = Bio::DB::SeqFeature::Store::memory=HASH(0xa27ceec); $ $db->features(-seq_id=>"CYP2C8") $ARRAY1 = [ Feature:src(41), region(CYP2C8), Feature:src(37), Feature:src(39), Feature:src(42), Feature:src(40), Feature:src(38) ]; ============== I expected to also see the features with IDs 43 and 44 (the gff3 file is attached). I think there's a problem in the filter_by_location method. If start and end parameters aren't passed to the method, it sets default start and end values that lead it to examine all of the bins in its index. 
But the end value that it creates is at the beginning of the last bin, and I think it should be at the end of the last bin instead. The attached patch changes it to be at the end of the last bin. Regards, Mitch -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: casbon.gff3 URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: bdsfsm-filter_by_location.patch URL: From jason at bioperl.org Thu Jan 14 19:20:43 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 14 Jan 2010 16:20:43 -0800 Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with negative PDB ranges In-Reply-To: <4B4E7A07.7070805@gmail.com> References: <49C2B97B.7070304@gmail.com> <49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com> Message-ID: <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org> This looks like improper data, really: "-1" is not a valid coordinate as far as the parser is concerned. You may want to tell Pfam that there is a possible error in their dumper, since that was the only record that had this problem? -jason On Jan 13, 2010, at 5:57 PM, albezg wrote: > Hi all, > > I have a problem using AlignIO to read Pfam database: > ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz > The database is in STOCKHOLM 1.0 format. AlignIO can read the > alignment OK until the alignment PF00331.13. There it crashes with > the following message: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: '1-344' is not an integer.
> > STACK: Error::throw > STACK: Bio::Root::Root::throw /home/albezg/lib/perl5/site_perl/ > 5.10.0/Bio/Root/Root.pm:368 > STACK: Bio::Range::end /home/albezg/lib/perl5/site_perl/5.10.0/Bio/ > Range.pm:228 > STACK: Bio::Annotation::Target::new /home/albezg/lib/perl5/site_perl/ > 5.10.0/Bio/Annotation/Target.pm:82 > STACK: > Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target /home/ > albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ > GenericAlignHandler.pm:293 > STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler / > home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ > GenericAlignHandler.pm:73 > STACK: Bio::AlignIO::stockholm::next_aln /home/albezg/lib/perl5/ > site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471 > STACK: /home/albezg/scripts/pfam2fasta.pl:22 > ----------------------------------------------------------- > > It appears this is caused by this entry: > #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344; > > I don't care about residues in PDB, so I have just removed minus > signs from the ranges. This seems to have fixed the crashing. > > Is it a known problem? Is there a solution for it? > > Thanks, > Alexandr > > > On 03/20/2009 05:09 PM, albezg wrote: >> >> I'm trying to change FASTA header(display_id) for a sequence in an >> alignment(SimpleAlign). >> >> There are no issues when I print it, however when I use AlignIO to >> write >> the alignment to a FASTA file, it does not work. Is this behavior >> intended? 
>> >> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug >> >> The error: >> ------------- EXCEPTION ------------- >> MSG: No sequence with name [1/1-11] >> STACK Bio::SimpleAlign::displayname >> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659 >> STACK Bio::AlignIO::fasta::write_aln >> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200 >> STACK toplevel ./demo.pl:14 >> ------------------------------------- >> >> Alexandr > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From maj at fortinbras.us Thu Jan 14 21:00:31 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 14 Jan 2010 21:00:31 -0500 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> Message-ID: How about returning 1, 2, 4 for the non-zero cases, with some error constants set for convenience? MAJ ----- Original Message ----- From: "Scott Markel" To: Sent: Thursday, January 14, 2010 5:58 PM Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes > We've been looking at Bio::Tools::Run::RemoteBlast after some feedback > from our customers. Due to network irregularities (not sure what else > to call it) users see the getting of remote BLAST results as somewhat > random. When results come back the hits are fine, but sometimes no > information comes back at all. Retrying helps. > > In looking at RemoteBlast.pm there are four "return -1" cases. > > * $status eq 'ERROR' (return on line 614) > * $line =~ /ERROR/I (return on line 628) > * !$got_content (return on line 648) > * !$response->is_success (return on line 655) > > In the case of no content we'd like to retry remote BLAST. 
We're happy > to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl > module, but we only want to retry in that case, not the other three. > > What would happen if that third "return -1" changed to a different > return value? > > Scott > > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel at accelrys.com > Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 799 5222 > USA web: http://www.accelrys.com > > http://www.linkedin.com/in/smarkel > Vice President, Board of Directors: > International Society for Computational Biology > Chair: ISCB Publications Committee > Associate Editor: PLoS Computational Biology > Editorial Board: Briefings in Bioinformatics > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu Jan 14 19:42:31 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 14 Jan 2010 18:42:31 -0600 Subject: [Bioperl-l] reading blast report In-Reply-To: <4b4f7b74.5744f10a.7087.4813@mx.google.com> References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com> <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org> <4b4f7b74.5744f10a.7087.4813@mx.google.com> Message-ID: <0B76CCA7-C37C-4E24-BBDF-C8FD805DBBF2@illinois.edu> On Jan 14, 2010, at 2:15 PM, Siddhartha Basu wrote: > On Thu, 14 Jan 2010, Jason Stajich wrote: > >> What aspects of the report are you loading? You might consider the blast >> report as tab-delimited (-m 8 format) if you only are interested in >> start/end positions and scores of ailgnments which is a simpler and reduced >> dataset that has lower memory footprint by the parser. > > I think this would be a better approach i am mostly interested in > start/end/score data only. 
> >> Searchio (default) -format => blast - you can try the BLAST -format => >> blast_pull instead which lazy parses to create objects and will reduce >> memory consumption. > > It's another good option though. But just out of curosity, so the > regular blast parser do load the entire file in the memory consider the > output consist of multiple Results concatenated together into a > single file. Could anybody clarify. Yes, the original SearchIO parsers all load the data into objects. This was based on the presumption that one wouldn't want very large BLAST reports, but that assumption probably no longer holds today. The pull parser is one answer to that, in that it pulls the data only upon request (creating the objects on the fly), so it should be better suited to parsing very large BLAST reports. > thanks, > -siddhartha > >> -jason chris From cjfields at illinois.edu Fri Jan 15 01:33:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 15 Jan 2010 00:33:50 -0600 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> Message-ID: Scott, I think this is fine (to change the third condition and retry with a specific code). The other possibility is to simply throw different exceptions under each of these circumstances, which can be caught via eval to allow a retry under only certain conditions (no content, for instance). One interesting bit: I think (though I'm not sure) the new BLAST+ allows remote BLAST queries from the command line, similar to the legacy blastcl3. Mark just wrote up a BLAST+ wrapper, so it might be worth testing that theory out. chris PS - BTW, nice to finally meet you at GMOD! On Jan 14, 2010, at 4:58 PM, Scott Markel wrote: > We've been looking at Bio::Tools::Run::RemoteBlast after some feedback > from our customers.
Due to network irregularities (not sure what else > to call it) users see the getting of remote BLAST results as somewhat > random. When results come back the hits are fine, but sometimes no > information comes back at all. Retrying helps. > > In looking at RemoteBlast.pm there are four "return -1" cases. > > * $status eq 'ERROR' (return on line 614) > * $line =~ /ERROR/I (return on line 628) > * !$got_content (return on line 648) > * !$response->is_success (return on line 655) > > In the case of no content we'd like to retry remote BLAST. We're happy > to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl > module, but we only want to retry in that case, not the other three. > > What would happen if that third "return -1" changed to a different > return value? > > Scott > > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel at accelrys.com > Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 799 5222 > USA web: http://www.accelrys.com > > http://www.linkedin.com/in/smarkel > Vice President, Board of Directors: > International Society for Computational Biology > Chair: ISCB Publications Committee > Associate Editor: PLoS Computational Biology > Editorial Board: Briefings in Bioinformatics > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields1 at gmail.com Fri Jan 15 01:35:35 2010 From: cjfields1 at gmail.com (Christopher Fields) Date: Fri, 15 Jan 2010 00:35:35 -0600 Subject: [Bioperl-l] filter_by_location in Bio::DB::SeqFeature::Store::memory In-Reply-To: <4B4F966D.3030300@berkeley.edu> References: <4B4F966D.3030300@berkeley.edu> Message-ID: <992796AC-B85B-4555-88A1-36000C0A2002@gmail.com> An HTML attachment was scrubbed... 
URL: From David.Messina at sbc.su.se Fri Jan 15 10:17:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 15 Jan 2010 16:17:14 +0100 Subject: [Bioperl-l] getting/setting species names with Bio::Species Message-ID: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>

Hi everybody,

I'm having a little trouble with names in Bio::Species objects.

According to the Bio::Species documentation, if I have a species name as a string, like "Homo sapiens", I can get and set that using the species method:

    my $my_species_obj = Bio::Species->new();
    $my_species_obj->species('Homo sapiens');

    print $my_species_obj->species;    # 'Homo sapiens'

That works fine if I create the Bio::Species object myself.

But if I try to get that string back out from a Bio::Species object created by SeqIO from a GenBank file, I get just 'sapiens' back:

    my $io = Bio::SeqIO->new('-format' => 'genbank',
                             '-file'   => 'hoxa2.gb');
    my $seq_obj = $io->next_seq;
    my $io_species_obj = $seq_obj->species;

    print $io_species_obj->species;    # 'sapiens'

I think that happens because GenBank records have more taxonomic info about the species name, like the genus (and in fact the whole taxonomic categorization: kingdom, phylum, order, etc.). So the genus is stored separately.

Poking around a bit more in Bio::Species, I turned up the method 'binomial', which appears to do the right thing, returning genus and species in both cases. Except, as you can see, the space is stripped out for my species-name-is-just-a-string object:

    print $my_species_obj->binomial;   # 'Homosapiens'
    print $io_species_obj->binomial;   # 'Homo sapiens'

I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I using it correctly above, or is there a better way?

If not, this kinda looks like a bug to me. I've got a patch which works and passes the BioPerl test suite.

Thanks,
Dave

From maj at fortinbras.us Fri Jan 15 10:31:16 2010 From: maj at fortinbras.us (Mark A.
Jensen) Date: Fri, 15 Jan 2010 10:31:16 -0500 Subject: [Bioperl-l] getting/setting species names with Bio::Species In-Reply-To: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> Message-ID: I'm not that familiar with Bio::Species either, but this looks like conflicting semantics between Bio::Species and Bio::SeqIO. Bio::SeqIO sets the species accessor to the 'species' element of the lineage array, I believe. FWIW, I'd prefer "binomial" = "genus" . "species" MAJ ----- Original Message ----- From: "Dave Messina" To: "BioPerl List" Sent: Friday, January 15, 2010 10:17 AM Subject: [Bioperl-l] getting/setting species names with Bio::Species > Hi everybody, > > I'm having a little trouble with names in Bio::Species objects. > > According to the Bio::Species documentation, if I have a species name as a > string, like "Homo sapiens", I can get and set that using the species method: > > my $my_species_obj = Bio::Species->new(); > $my_species_obj->species('Homo sapiens'); > > print $my_species_obj->species; # 'Homo sapiens' > > > That works fine if I create the Bio::Species object myself. > > But if I try to get that string back out from a BIo::Species object created by > SeqIO from a genbank file, I get just 'sapiens' back: > > my $io = Bio::SeqIO->new('-format' => 'genbank', > '-file' => 'hoxa2.gb'); > my $seq_obj = $io->next_seq; > my $io_species_obj = $seq_obj->species; > > print $io_species_obj->species; # 'sapiens' > > > I think that happens because genbank records have more taxonomic info about > the species name, like the genus (and in fact the whole taxonomic > categorization: kingdom phylum order, etc). So the genus is stored separately. > > Poking around a bit more in Bio::Species, I turned up the method 'binomial', > which appears to do the right thing, returning genus and species in both > cases.
Except, as you can see, the space is stripped out for my > species-name-is-just-a-string object: > > print $my_species_obj->binomial; # 'Homosapiens' > print $io_species_obj->binomial; # 'Homo sapiens' > > > I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I > using it correctly above, or is there a better way? > > If not, this kinda looks like a bug to me. I've got a patch which works and > passes the BioPerl test suite. > > > Thanks, > Dave > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l >
From maj at fortinbras.us Fri Jan 15 10:24:06 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 15 Jan 2010 10:24:06 -0500 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes In-Reply-To: References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> Message-ID:

True: BLAST+ allows remote DBs. I just committed a patch that makes this easy in StandAloneBlastPlus: specify '-remote => 1' in the factory, and downstream command calls will take care of it.
MAJ

    # ex...
    use Bio::Tools::Run::StandAloneBlastPlus;
    use Bio::Seq;
    $ENV{BLASTPLUSDIR} = $where_it_is;
    my $fac = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => 'wgs',
                                                         -remote  => 1 );
    my $result = $fac->blastn( -query => Bio::Seq->new(
        -seq => 'ggcaacaaacctggtaaagaagacggcaacaagcctggtaaagaagatggcaacaagcct',
        -id  => "proteinA") );
    1;

----- Original Message ----- From: "Chris Fields" To: "Scott Markel" Cc: Sent: Friday, January 15, 2010 1:33 AM Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes > Scott, > > I think this is fine (to change the third condition and retry with a specific > code). The other possibility is to simply throw different exceptions under > each of these circumstances, which can be caught via eval to allow a retry > under only certain conditions (no content, for instance).
> > One interesting bit: I think (though I'm not sure) the new BLAST+ allows > remote BLAST queries from command line, similar to the legacy blastcl3. Mark > just wrote up a BLAST+ wrapper, so it might be worth testing that theory out. > > chris > > PS - BTW, nice to finally meet you at GMOD! > > On Jan 14, 2010, at 4:58 PM, Scott Markel wrote: > >> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback >> from our customers. Due to network irregularities (not sure what else >> to call it) users see the getting of remote BLAST results as somewhat >> random. When results come back the hits are fine, but sometimes no >> information comes back at all. Retrying helps. >> >> In looking at RemoteBlast.pm there are four "return -1" cases. >> >> * $status eq 'ERROR' (return on line 614) >> * $line =~ /ERROR/I (return on line 628) >> * !$got_content (return on line 648) >> * !$response->is_success (return on line 655) >> >> In the case of no content we'd like to retry remote BLAST. We're happy >> to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl >> module, but we only want to retry in that case, not the other three. >> >> What would happen if that third "return -1" changed to a different >> return value? >> >> Scott >> >> Scott Markel, Ph.D. 
>> Principal Bioinformatics Architect email: smarkel at accelrys.com >> Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 >> 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 >> San Diego, CA 92121 fax: +1 858 799 5222 >> USA web: http://www.accelrys.com >> >> http://www.linkedin.com/in/smarkel >> Vice President, Board of Directors: >> International Society for Computational Biology >> Chair: ISCB Publications Committee >> Associate Editor: PLoS Computational Biology >> Editorial Board: Briefings in Bioinformatics >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From SMarkel at accelrys.com Fri Jan 15 10:40:31 2010 From: SMarkel at accelrys.com (Scott Markel) Date: Fri, 15 Jan 2010 07:40:31 -0800 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes In-Reply-To: References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net> Chris, It was nice meeting you and Scott C., too. And seeing Jason again. If you and Mark > How about returning 1, 2, 4 for the non-zero cases, with some > error constants set for convenience? MAJ are okay with adding more return values, that works best for us in Pipeline Pilot. I'll add a Bugzilla entry. Scott -----Original Message----- From: Chris Fields [mailto:cjfields at illinois.edu] Sent: Thursday, 14 January 2010 10:34 PM To: Scott Markel Cc: Bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes Scott, I think this is fine (to change the third condition and retry with a specific code). 
The other possibility is to simply throw different exceptions under each of these circumstances, which can be caught via eval to allow a retry under only certain conditions (no content, for instance). One interesting bit: I think (though I'm not sure) the new BLAST+ allows remote BLAST queries from command line, similar to the legacy blastcl3. Mark just wrote up a BLAST+ wrapper, so it might be worth testing that theory out. chris PS - BTW, nice to finally meet you at GMOD! On Jan 14, 2010, at 4:58 PM, Scott Markel wrote: > We've been looking at Bio::Tools::Run::RemoteBlast after some feedback > from our customers. Due to network irregularities (not sure what else > to call it) users see the getting of remote BLAST results as somewhat > random. When results come back the hits are fine, but sometimes no > information comes back at all. Retrying helps. > > In looking at RemoteBlast.pm there are four "return -1" cases. > > * $status eq 'ERROR' (return on line 614) > * $line =~ /ERROR/I (return on line 628) > * !$got_content (return on line 648) > * !$response->is_success (return on line 655) > > In the case of no content we'd like to retry remote BLAST. We're happy > to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl > module, but we only want to retry in that case, not the other three. > > What would happen if that third "return -1" changed to a different > return value? > > Scott > > Scott Markel, Ph.D. 
From cjfields at illinois.edu Fri Jan 15 11:00:21 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 15 Jan 2010 10:00:21 -0600
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To:
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
Message-ID: <16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu>

> FWIW, I'd prefer "binomial" = "genus" . "species"

That's the way Bio::Species is supposed to work, at least when it was refactored by Sendu. But just a note: Bio::Species was considered deprecated (scheduled for the 1.7 release IIRC) for many very good reasons in favor of Bio::Taxon. First and foremost among these is the fact that we cannot consistently parse out the genus/species/strain/variant/etc. for every organism in GenBank w/o knowing its full lineage, which means including some taxonomic information. And even then it's highly problematic.

We've had several heated discussions on list about how to handle this in a somewhat backwards-compatible way, and the main solution was to forgo compatibility issues and eventually deprecate Bio::Species altogether in favor of Bio::Taxon, a class that doesn't make the same assumptions. Bio::Species, in the interim, is-a Bio::Taxon. You'll note that a minimal Bio::DB::Taxonomy instance is constructed from the classification scheme in some instances, but if one had a proper DB link one could link to Entrez Taxonomy or a local indexed flat-file DB and grab the info. Bio::Taxon (correct me if I'm wrong on this Sendu, if you're out there) eschews various methods (species, etc.) for simpler, consistent ones based on Taxonomy, and doesn't force us to handle every exception to getting the genus/species out of a name. That is left up to the user, at their peril.

For either one, if you are reproducing the fully qualified name, you probably should use something like node_name() for consistency. Bio::Species also has scientific_name(). With a true Bio::Taxon one would need to check that this is performed on the species node.

chris

On Jan 15, 2010, at 9:31 AM, Mark A. Jensen wrote:

> I'm not that familiar with Bio::Species either, but this looks
> like conflicting semantics between Bio::Species and Bio::SeqIO.
> Bio::SeqIO sets the species accessor to the 'species' element of
> the lineage array, I believe.
> FWIW, I'd prefer "binomial" = "genus" . "species"
> MAJ
> ----- Original Message ----- From: "Dave Messina"
> To: "BioPerl List"
> Sent: Friday, January 15, 2010 10:17 AM
> Subject: [Bioperl-l] getting/setting species names with Bio::Species
>
>> Hi everybody,
>>
>> I'm having a little trouble with names in Bio::Species objects.
>>
>> According to the Bio::Species documentation, if I have a species name as a string, like "Homo sapiens", I can get and set that using the species method:
>>
>> my $my_species_obj = Bio::Species->new();
>> $my_species_obj->species('Homo sapiens');
>>
>> print $my_species_obj->species; # 'Homo sapiens'
>>
>> That works fine if I create the Bio::Species object myself.
>>
>> But if I try to get that string back out from a Bio::Species object created by SeqIO from a genbank file, I get just 'sapiens' back:
>>
>> my $io = Bio::SeqIO->new('-format' => 'genbank',
>>                          '-file' => 'hoxa2.gb');
>> my $seq_obj = $io->next_seq;
>> my $io_species_obj = $seq_obj->species;
>>
>> print $io_species_obj->species; # 'sapiens'
>>
>> I think that happens because genbank records have more taxonomic info about the species name, like the genus (and in fact the whole taxonomic categorization: kingdom, phylum, order, etc.). So the genus is stored separately.
>>
>> Poking around a bit more in Bio::Species, I turned up the method 'binomial', which appears to do the right thing, returning genus and species in both cases. Except, as you can see, the space is stripped out for my species-name-is-just-a-string object:
>>
>> print $my_species_obj->binomial; # 'Homosapiens'
>> print $io_species_obj->binomial; # 'Homo sapiens'
>>
>> I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I using it correctly above, or is there a better way?
>>
>> If not, this kinda looks like a bug to me. I've got a patch which works and passes the BioPerl test suite.
>>
>> Thanks,
>> Dave

From SMarkel at accelrys.com Fri Jan 15 11:10:34 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Fri, 15 Jan 2010 08:10:34 -0800
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To:
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B30A7@EXCH1-COLO.accelrys.net>

Mark,

Thank you.

Scott

-----Original Message-----
From: Mark A. Jensen [mailto:maj at fortinbras.us]
Sent: Friday, 15 January 2010 8:10 AM
To: Scott Markel; Chris Fields
Cc: Bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes

can do Scott--
cheers MAJ

From maj at fortinbras.us Fri Jan 15 11:09:38 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 15 Jan 2010 11:09:38 -0500
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>
Message-ID:

can do Scott--
cheers MAJ

From maj at fortinbras.us Fri Jan 15 11:10:02 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 15 Jan 2010 11:10:02 -0500
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> <16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu>
Message-ID:

excellent summary--thanks!!

From hlapp at drycafe.net Fri Jan 15 12:04:43 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Fri, 15 Jan 2010 12:04:43 -0500
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
Message-ID: <2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net>

On Jan 15, 2010, at 10:17 AM, Dave Messina wrote:

> According to the Bio::Species documentation, if I have a species
> name as a string, like "Homo sapiens", I can get and set that using
> the species method:
>
> my $my_species_obj = Bio::Species->new();
> $my_species_obj->species('Homo sapiens');

If that's really what the documentation says, it's wrong. It is the binomial() method that does this (as getter and setter).
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From David.Messina at sbc.su.se Fri Jan 15 13:37:17 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 15 Jan 2010 19:37:17 +0100
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> <2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net>
Message-ID: <24798E45-CF24-47D9-AB39-E66C35A5FA8B@sbc.su.se>

Thanks guys.

Well, looks like I ignored the deprecation warnings at my own peril. :)

I'll reimplement my code using Bio::Taxon directly instead. I made a little test using the node_name() method as Chris suggested, and it seems to do the trick nicely.

> If that's really what the documentation says, it's wrong.

I'm afraid so. In the POD

> Title   : species
> Usage   : $self->species( $species );
>           $species = $self->species();
> Function: Get or set the scientific species name.
> Example : $self->species('Homo sapiens');
> Returns : Scientific species name as string
> Args    : Scientific species name as string

and the HOWTO

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#The_Species_Object

> # legible and long
> my $species_object = $seq_object->species;
> my $species_string = $species_object->species;
>
> # Perlish
> my $species_string = $seq_object->species->species;
> # either way, $species_string is "Homo sapiens"

Unless there's objection, I'll fix both of those.

> It is the binomial() method that does this (as getter and setter).

Great, thanks for the clarification, Hilmar.

From bhakti.dwivedi at gmail.com Sun Jan 17 11:02:47 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Sun, 17 Jan 2010 11:02:47 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
Message-ID:

Hi

Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1 && hit1 -> query1) from a blast table report?

Thanks

BD

From cjfields at illinois.edu Sun Jan 17 12:45:08 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 17 Jan 2010 11:45:08 -0600
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To:
References:
Message-ID: <4FC546A8-079F-4A17-AB96-D4A0060904D6@illinois.edu>

It's probably not best to use BioPerl directly for this. Have you tried OrthoMCL, or InParanoid?

chris

From maj at fortinbras.us Sun Jan 17 16:03:24 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 17 Jan 2010 16:03:24 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To:
References:
Message-ID:

re Chris's answer, check out this archived post:
http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html
cheers MAJ

From bhakti.dwivedi at gmail.com Sun Jan 17 16:10:03 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Sun, 17 Jan 2010 16:10:03 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To:
References:
Message-ID:

Thank you!
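For readers who only need the simple definition asked about in this thread (query1 -> hit1 and hit1 -> query1), the following is a minimal plain-Perl sketch rather than a BioPerl module or anything posted on the list. It assumes a 12-column tab-delimited BLAST table (the legacy "-m 8" layout) from an all-vs-all search, takes the highest bit score as "best", and keeps the first hit seen on ties. The sub names best_hits and reciprocal_best_hits are made up for illustration, and no inparalog handling of the kind OrthoMCL or InParanoid provide is attempted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map each query ID to its best-scoring subject ID.
# Assumed columns: query, subject, %identity, length, mismatches,
# gap opens, qstart, qend, sstart, send, evalue, bitscore.
sub best_hits {
    my (@lines) = @_;
    my (%best, %best_score);
    for my $line (@lines) {
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        chomp $line;
        my @f = split /\t/, $line;
        next if @f < 12;                   # not a full tabular row
        my ($query, $subject, $bits) = @f[0, 1, 11];
        next if $query eq $subject;        # ignore self hits
        if (!exists $best_score{$query} || $bits > $best_score{$query}) {
            $best{$query}       = $subject;
            $best_score{$query} = $bits;
        }
    }
    return %best;
}

# Keep the pairs where each member is the other's best hit.
sub reciprocal_best_hits {
    my (%best) = @_;
    my @pairs;
    for my $q (sort keys %best) {
        my $h = $best{$q};
        next unless exists $best{$h} && $best{$h} eq $q;
        push @pairs, [$q, $h] if $q lt $h;    # report each pair once
    }
    return @pairs;
}

# Only runs when table files are named on the command line.
if (@ARGV) {
    my @pairs = reciprocal_best_hits(best_hits(<>));
    print join("\t", @$_), "\n" for @pairs;
}
```

Saved under a hypothetical name such as rbh.pl, it could be run as "perl rbh.pl all_vs_all.tab" to print one tab-separated pair per line. Choosing by bit score rather than e-value is a design choice of this sketch; e-values often tie at zero for strong hits.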
On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen wrote: > re Chris's answer, check out this archived post: > http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html > cheers MAJ > ----- Original Message ----- From: "Bhakti Dwivedi" < > bhakti.dwivedi at gmail.com> > To: > Sent: Sunday, January 17, 2010 11:02 AM > Subject: [Bioperl-l] Reciprocal best hits using Bioperl? > > > Hi >> >> Is there a Bio-perl module to parse the reciprocal best hits (query1-> >> hit1 >> && hit1 -> query1) from a blast table report? >> >> Thanks >> >> BD >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> From cjfields at illinois.edu Sun Jan 17 17:00:02 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 17 Jan 2010 16:00:02 -0600 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: References: Message-ID: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu> OrthoMCL has updated to v2 and no longer uses BioPerl, just plain perl. Database is available here: http://orthomcl.org/cgi-bin/OrthoMclWeb.cgi Package (you'll need a few other things to get it working): http://orthomcl.org/common/downloads/software/ chris On Jan 17, 2010, at 3:10 PM, Bhakti Dwivedi wrote: > Thank you! > > > On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen wrote: > >> re Chris's answer, check out this archived post: >> http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html >> cheers MAJ >> ----- Original Message ----- From: "Bhakti Dwivedi" < >> bhakti.dwivedi at gmail.com> >> To: >> Sent: Sunday, January 17, 2010 11:02 AM >> Subject: [Bioperl-l] Reciprocal best hits using Bioperl? >> >> >> Hi >>> >>> Is there a Bio-perl module to parse the reciprocal best hits (query1-> >>> hit1 >>> && hit1 -> query1) from a blast table report? 
>>> >>> Thanks >>> >>> BD >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From tristan.lefebure at gmail.com Sun Jan 17 18:12:56 2010 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Sun, 17 Jan 2010 18:12:56 -0500 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu> References: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu> Message-ID: <201001171812.56238.tristan.lefebure@gmail.com> The transition to orthoMCL v2 being a bit painful (you need a MySQL database), I recently switched directly to MCL and the accompanying mclblastline and co programs. Modular, simple and very fast. Following some simulations, It gives better results with incomplete genomes than orthoMCL v1.x ... http://micans.org/mcl/ --Tristan On Sunday 17 January 2010 17:00:02 Chris Fields wrote: > OrthoMCL has updated to v2 and no longer uses BioPerl, > just plain perl. Database is available here: > > http://orthomcl.org/cgi-bin/OrthoMclWeb.cgi > > Package (you'll need a few other things to get it > working): > > http://orthomcl.org/common/downloads/software/ > > chris > > On Jan 17, 2010, at 3:10 PM, Bhakti Dwivedi wrote: > > Thank you! > > > > On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen wrote: > >> re Chris's answer, check out this archived post: > >> http://bioperl.org/pipermail/bioperl-l/2008-March/0273 > >>57.html cheers MAJ > >> ----- Original Message ----- From: "Bhakti Dwivedi" < > >> bhakti.dwivedi at gmail.com> > >> To: > >> Sent: Sunday, January 17, 2010 11:02 AM > >> Subject: [Bioperl-l] Reciprocal best hits using > >> Bioperl? 
> >> > >> > >> Hi > >> > >>> Is there a Bio-perl module to parse the reciprocal > >>> best hits (query1-> hit1 > >>> && hit1 -> query1) from a blast table report? > >>> > >>> Thanks > >>> > >>> BD > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Sun Jan 17 18:59:05 2010 From: jason at bioperl.org (Jason Stajich) Date: Sun, 17 Jan 2010 15:59:05 -0800 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: <201001171812.56238.tristan.lefebure@gmail.com> References: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu> <201001171812.56238.tristan.lefebure@gmail.com> Message-ID: <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org> yes - but mcl alone is something slightly different in that it doesn't correct for inparalogs, but for incomplete genomes this is probably okay. orthomcl2 does correct the major memory hog problem and efficiencies in the parsing in the previous version by relying on the db for the indexing and looking of the reciprocal hits. -jason On Jan 17, 2010, at 3:12 PM, Tristan Lefebure wrote: > The transition to orthoMCL v2 being a bit painful (you need > a MySQL database), I recently switched directly to MCL and > the accompanying mclblastline and co programs. Modular, > simple and very fast. Following some simulations, It gives > better results with incomplete genomes than orthoMCL v1.x > ... 
> > http://micans.org/mcl/ > > --Tristan > > On Sunday 17 January 2010 17:00:02 Chris Fields wrote: >> OrthoMCL has updated to v2 and no longer uses BioPerl, >> just plain perl. Database is available here: >> >> http://orthomcl.org/cgi-bin/OrthoMclWeb.cgi >> >> Package (you'll need a few other things to get it >> working): >> >> http://orthomcl.org/common/downloads/software/ >> >> chris >> >> On Jan 17, 2010, at 3:10 PM, Bhakti Dwivedi wrote: >>> Thank you! >>> >>> On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen > wrote: >>>> re Chris's answer, check out this archived post: >>>> http://bioperl.org/pipermail/bioperl-l/2008-March/0273 >>>> 57.html cheers MAJ >>>> ----- Original Message ----- From: "Bhakti Dwivedi" < >>>> bhakti.dwivedi at gmail.com> >>>> To: >>>> Sent: Sunday, January 17, 2010 11:02 AM >>>> Subject: [Bioperl-l] Reciprocal best hits using >>>> Bioperl? >>>> >>>> >>>> Hi >>>> >>>>> Is there a Bio-perl module to parse the reciprocal >>>>> best hits (query1-> hit1 >>>>> && hit1 -> query1) from a blast table report? 
>>>>> >>>>> Thanks >>>>> >>>>> BD >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From tristan.lefebure at gmail.com Sun Jan 17 20:36:38 2010 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Sun, 17 Jan 2010 20:36:38 -0500 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org> References: <201001171812.56238.tristan.lefebure@gmail.com> <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org> Message-ID: <201001172036.39032.tristan.lefebure@gmail.com> On Sunday 17 January 2010 18:59:05 Jason Stajich wrote: > yes - but mcl alone is something slightly different in > that it doesn't correct for inparalogs, but for > incomplete genomes this is probably okay. interestingly, my experience with not too divergent bacterial genomes (same genera) does not support the normalization used in the orthoMCL (which, as far as I understand, is a standardization of the -Log10(evalue) per taxa combination, including a taxa with itself). MCL, which does not do any normalization (just -Log10(evalue)) gives about the same number of false negative (i.e. missed orthologs), but a lot less false positive (false orthologs). In other words, you get many fake singletons. 
I don't known exactly if the problem lies in the normalization process or the fact that orthoMCLv1.x is using a very old version of MCL. What I do known is that many false positive are made of short or incomplete proteins that are very common in draft genomes and automatic annotations... Things might be completely different with more divergent and globally longer proteins. Testing orthoMCLv2 on the same data set would probably give the answer. --Tristan From robert.bradbury at gmail.com Mon Jan 18 05:20:33 2010 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Mon, 18 Jan 2010 05:20:33 -0500 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: <201001172036.39032.tristan.lefebure@gmail.com> References: <201001171812.56238.tristan.lefebure@gmail.com> <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org> <201001172036.39032.tristan.lefebure@gmail.com> Message-ID: My comment might be that the problem with OrthoMCL is that it is primarily lower organisms. The problem with Ensembl (and some other databases) is that it is primarliy higher organisms (though they do include Drosophila, C. elegans and Yeast). The problem arises when one wants to cross those boundaries. For example the 5-10 antioxidant proteins, the ~150 DNA repair proteins, many of the mitochondrial (ETC) proteins, the ribosomal rRNA's & tRNAs, and the fundamental biochemistry (EC) proteins are homologous all the way from the most ancient bacteria through H. sapiens. The only way to play in the mixed arena of prokaryotes and eukaryotes involving fundamental vectors in evolution is to either construct ones own databases (which presumably means getting involved with MySQL, and probably spending some $$$ on hardware) or to develop some BioPerl modules that can do the SpeciesX vs. SpeciesY comparisons on demand using some part of the cloud. 
This problem isn't going to get smaller; it's only going to get larger, now that the cost of sequencing (pseudo-resequencing) a vertebrate genome is starting to come in under $10,000 and people are starting to seriously talk about 10,000 vertebrate genomes. 10,000 x 10,000 x 20,000 (genes) isn't something people are going to undertake very soon. Robert On 1/17/10, Tristan Lefebure wrote: > On Sunday 17 January 2010 18:59:05 Jason Stajich wrote: >> yes - but mcl alone is something slightly different in >> that it doesn't correct for inparalogs, but for >> incomplete genomes this is probably okay. > > interestingly, my experience with not too divergent > bacterial genomes (same genera) does not support the > normalization used in the orthoMCL (which, as far as I > understand, is a standardization of the -Log10(evalue) per > taxa combination, including a taxa with itself). MCL, which > does not do any normalization (just -Log10(evalue)) gives > about the same number of false negative (i.e. missed > orthologs), but a lot less false positive (false orthologs). > In other words, you get many fake singletons. I don't known > exactly if the problem lies in the normalization process or > the fact that orthoMCLv1.x is using a very old version of > MCL. What I do known is that many false positive are made of > short or incomplete proteins that are very common in draft > genomes and automatic annotations... Things might be > completely different with more divergent and globally longer > proteins. Testing orthoMCLv2 on the same data set would > probably give the answer.
> > --Tristan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ghhu at sibs.ac.cn Sun Jan 17 21:34:23 2010 From: ghhu at sibs.ac.cn (Guohong Hu) Date: Mon, 18 Jan 2010 10:34:23 +0800 Subject: [Bioperl-l] Bioperl 1.6 Message-ID: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> Hi there, I was trying to install BioPerl in windows using ppm, by following the instruction in "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up the repositories, and did the search of Bioperl packages. The latest version available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to install it, a number of prerequisite modules were being installed too, which include Bioperl 1.4. Then an error message showed up during installation: "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package BioPerl has already installed a file that package bioperl wants to install." It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 wanted to install again. I don't know why bioperl 1.4 was one of the prerequisites for 1.6.1. If I just install 1.4, it will be installed without errors. But I need a newer version, because some modules (like Bio::Tools::HMM) is not included in 1.4. I saw on internet that somebody had the same problem when he was trying to install BioPerl 1.5, but I didn't find the solution. Anybody has a clue on that? Thank you for your time. GH From cjfields at illinois.edu Mon Jan 18 10:30:20 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 09:30:20 -0600 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> Message-ID: Guohong, 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed first. 
Make sure the repos are set according to the Windows installation instructions on the BioPerl wiki: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows IIRC the actual order of the PPM repository can be critical (PPM pulls based on highest version, first repo, but sometimes it gets confused). Just curious but where is the v 1.4 PPM located? If it is local to our PPM repo I can physically remove it to prevent this from happening. chris On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: > Hi there, > > > > I was trying to install BioPerl in windows using ppm, by following the > instruction in > "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up > the repositories, and did the search of Bioperl packages. The latest version > available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to > install it, a number of prerequisite modules were being installed too, which > include Bioperl 1.4. Then an error message showed up during installation: > > > > "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package > BioPerl has already installed a file that package bioperl wants to install." > > > > It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 > wanted to install again. I don't know why bioperl 1.4 was one of the > prerequisites for 1.6.1. If I just install 1.4, it will be installed without > errors. But I need a newer version, because some modules (like > > Bio::Tools::HMM) is not included in 1.4. > > > > I saw on internet that somebody had the same problem when he was trying to > install BioPerl 1.5, but I didn't find the solution. > > > > Anybody has a clue on that? Thank you for your time. 
> > > > GH > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Jan 18 11:12:08 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 10:12:08 -0600 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: References: <201001171812.56238.tristan.lefebure@gmail.com> <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org> <201001172036.39032.tristan.lefebure@gmail.com> Message-ID: (my small rant on this) On Jan 18, 2010, at 4:20 AM, Robert Bradbury wrote: > My comment might be that the problem with OrthoMCL is that it is > primarily lower organisms. The problem with Ensembl (and some other > databases) is that it is primarliy higher organisms (though they do > include Drosophila, C. elegans and Yeast). OrthoMCL v2 handles both lower and higher organism; I've used it for both, with decent success. Most other ortholog tools do as well (if I'm not mistaken, ensembl also uses MCL under the hood, unless that's changed). I don't believe one should be completely bound to one toolset, particularly in this case (there are lots of nice ortholog clustering tools using various moeans of comparison out there), but I do think OrthoMCL is very good as an initial pass. If anything, I would like a set of (possibly bioperl-based, definitely DB-based) modules that can deal with this information. The more imperative issue in my opinion is that one is prisoner to the gene models for those specific organisms of interest, and this may vary widely depending on the source of those gene models (Ensembl, UCSC, NCBI, EBI, centralized MODs like FlyBase, etc). For instance, if gene models are poorly curated or rarely updated, the comparisons may be significantly flawed. 
Some of these issues may also be (somewhat) alleviated once more transcriptome data is available that helps clear up gene model ambiguities, but that won't be true for all organisms, at least initially. Note this isn't meant as a slam on any specific DBs or MODs in general, the problem is one born of the fact that there isn't a single, centralized, trusted, consistently updated source for this data, specifically something that will handle moderated third-party annotation. That's a very difficult problem to solve effectively. Some of these very issues crept up at the GMOD conference, and there appears to be consensus that a real attempt is needed to address this. I don't know, maybe it's just unicorns and rainbows. Personally I do think the situation will improve, as there seems to be great demand for it, but it requires time, resources, manpower, money, cat herding, etc. > The problem arises when one wants to cross those boundaries. For > example the 5-10 antioxidant proteins, the ~150 DNA repair proteins, > many of the mitochondrial (ETC) proteins, the ribosomal rRNA's & > tRNAs, and the fundamental biochemistry (EC) proteins are homologous > all the way from the most ancient bacteria through H. sapiens. The > only way to play in the mixed arena of prokaryotes and eukaryotes > involving fundamental vectors in evolution is to either construct ones > own databases (which presumably means getting involved with MySQL, and > probably spending some $$$ on hardware) or to develop some BioPerl > modules that can do the SpeciesX vs. SpeciesY comparisons on demand > using some part of the cloud. This problem isn't going to get smaller > its only going to get larger, now that the cost of sequencing > (pseudo-resequencing) a vertebrate genome is starting to come in under > $10,000 and people are starting to seriously talk about 10,000 > vertebrate genomes. 10,000 x 10,000 x 20,000 (genes) isn't something > people are going to undertake very soon. 
> > Robert They're already undertaking it now using a broad range of organisms, in and out of the cloud. In most cases one can amend a prior recip. comparative analysis with new data fairly easily, if one takes care to do so early on (i.e. set up the BLAST databases with a specified defined size for comparative stats between separate analyses). OrthoMCL v2 describes a procedure to do this, and I believe others have similar methodology. I could also see possible ways one can further optimize this, for instance in cases where two very closely-related organisms are compared, where translated seqs are 100% identical, etc. IIRC, the OrthoMCL DB site already has a way to upload custom sets of protein data for mapping to (already pre-run) clusters. Just the fact that the tools are available as OS, they're semi-automated, and can be generically applied to data of personal interest is a great boon. Not sure I see the downside of that, and I'm pretty confident the scalability issues will be addressed in some way. chris From maj at fortinbras.us Mon Jan 18 11:33:12 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 18 Jan 2010 11:33:12 -0500 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> Message-ID: <6093E45F17B543438AC02E6C626439E1@NewLife> this issue's come up before, see this thread http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031400.html MAJ ----- Original Message ----- From: "Chris Fields" To: "Guohong Hu" Cc: Sent: Monday, January 18, 2010 10:30 AM Subject: Re: [Bioperl-l] Bioperl 1.6 > Guohong, > > 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed > first. Make sure the repos are set according to the Windows installation > instructions on the BioPerl wiki: > > http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows > > IIRC the actual order of the PPM repository can be critical (PPM pulls based > on highest version, first repo, but sometimes it gets confused). 
Just curious > but where is the v 1.4 PPM located? If it is local to our PPM repo I can > physically remove it to prevent this from happening. > > chris > > On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: > >> Hi there, >> >> >> >> I was trying to install BioPerl in windows using ppm, by following the >> instruction in >> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up >> the repositories, and did the search of Bioperl packages. The latest version >> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to >> install it, a number of prerequisite modules were being installed too, which >> include Bioperl 1.4. Then an error message showed up during installation: >> >> >> >> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package >> BioPerl has already installed a file that package bioperl wants to install." >> >> >> >> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 >> wanted to install again. I don't know why bioperl 1.4 was one of the >> prerequisites for 1.6.1. If I just install 1.4, it will be installed without >> errors. But I need a newer version, because some modules (like >> >> Bio::Tools::HMM) is not included in 1.4. >> >> >> >> I saw on internet that somebody had the same problem when he was trying to >> install BioPerl 1.5, but I didn't find the solution. >> >> >> >> Anybody has a clue on that? Thank you for your time. 
>> >> >> >> GH >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Jan 18 12:18:34 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 11:18:34 -0600 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: <6093E45F17B543438AC02E6C626439E1@NewLife> References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> <6093E45F17B543438AC02E6C626439E1@NewLife> Message-ID: Mark, Odd issue, maybe it's a dependency like Bio::ASN1::EntrezGene that's causing this? Regardless, it's problematic for me to test this out directly, at least for the next few days. Maybe someone could try it? Also, there is the Strawberry Perl alternative, which uses CPAN (I think ActiveState also supports this). chris On Jan 18, 2010, at 10:33 AM, Mark A. Jensen wrote: > this issue's come up before, see this thread > http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031400.html > MAJ > ----- Original Message ----- From: "Chris Fields" > To: "Guohong Hu" > Cc: > Sent: Monday, January 18, 2010 10:30 AM > Subject: Re: [Bioperl-l] Bioperl 1.6 > > >> Guohong, >> >> 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed first. Make sure the repos are set according to the Windows installation instructions on the BioPerl wiki: >> >> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows >> >> IIRC the actual order of the PPM repository can be critical (PPM pulls based on highest version, first repo, but sometimes it gets confused). Just curious but where is the v 1.4 PPM located? If it is local to our PPM repo I can physically remove it to prevent this from happening. 
>> >> chris >> >> On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: >> >>> Hi there, >>> >>> >>> >>> I was trying to install BioPerl in windows using ppm, by following the >>> instruction in >>> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up >>> the repositories, and did the search of Bioperl packages. The latest version >>> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to >>> install it, a number of prerequisite modules were being installed too, which >>> include Bioperl 1.4. Then an error message showed up during installation: >>> >>> >>> >>> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package >>> BioPerl has already installed a file that package bioperl wants to install." >>> >>> >>> >>> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 >>> wanted to install again. I don't know why bioperl 1.4 was one of the >>> prerequisites for 1.6.1. If I just install 1.4, it will be installed without >>> errors. But I need a newer version, because some modules (like >>> >>> Bio::Tools::HMM) is not included in 1.4. >>> >>> >>> >>> I saw on internet that somebody had the same problem when he was trying to >>> install BioPerl 1.5, but I didn't find the solution. >>> >>> >>> >>> Anybody has a clue on that? Thank you for your time. >>> >>> >>> >>> GH >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From clarsen at vecna.com Mon Jan 18 12:42:13 2010 From: clarsen at vecna.com (Chris Larsen) Date: Mon, 18 Jan 2010 12:42:13 -0500 Subject: [Bioperl-l] Reciprocal best blast hits using BioPerl? 
In-Reply-To: References: Message-ID: Bhakti, (and Chris, Mark)-- Yes, there is some perl available to parse reciprocal best blast hits. Mark's referenced / archived post was mine; we were looking to do what you wanted. Here we proceed with the thread. We ended up implementing OrthoMCL 1.4 as Chris F pointed to, and then made a simple perl parser that would take the raw OrthoMCL output, do splits, and spit out a delimited table of all the orthologs in a group, for say Mycobacterium Genus, so you could stuff it into DBLoader. The link to the script, SOP, and method is at: http://www.biohealthbase.org/brcDocs/documents/BHB_ORTHOLOG_SOP.pdf Giving e.g.:
Francisella 1 110321310
Francisella 1 110321361
Francisella 1 56707275
Francisella 1 56707366
Francisella 1 56707462
Five members of Ortholog Group 1, with just their gi number. And you can see the results of that parsing, supported by a database, being used to load BioHealthbase with all the reciprocal best blast hits plus other OrthoMCL parsing, for mycobacterial PolA at: http://www.biohealthbase.org/brc/details.do?locus=MAV_3155&decorator=mycobacterium See? Pretty? We were just interested in making ortholog groups on the basis of paralog-conscious reciprocal blast stuff. Like you. This package and doc I've made does what you want I think, as long as you stay in prokaryotes. But--careful...garbage in, garbage out. We started with clean Genuses. (. o O Genii?). You'll get more junky HUGE and TINY ortholog groups if you put in different Orders of microbes. It's taxa sensitive. OrthoMCL author David Roos is great at it though and designed it in mind of higher unicellular euks too...comb the docs for that; sorry I was doing bacterial work at the time and can't guide you if that's what you want. If you end up installing OrthoMCL 1.4, you can pipe the output to this method and get out usable stuff. Hope it works for you. Cheers, Chris L -- Christopher Larsen, Ph.D. Sr.
Scientist / Grants Manager Vecna Technologies 6404 Ivy Lane #500 Greenbelt, MD 20770 Phone: (240) 965-4525 Fax: (240) 547-6133 240-737-4525 From maj at fortinbras.us Mon Jan 18 14:37:43 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 18 Jan 2010 14:37:43 -0500 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> <6093E45F17B543438AC02E6C626439E1@NewLife> Message-ID: <61F331117B7C4E2282684FA240B9710F@NewLife> I will play around with it-- in the meantime, Guohong, please look at the following http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation where there is a workaround for this issue, using the ppm-shell-- cheers, Mark ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "Guohong Hu" ; Sent: Monday, January 18, 2010 12:18 PM Subject: Re: [Bioperl-l] Bioperl 1.6 Mark, Odd issue, maybe it's a dependency like Bio::ASN1::EntrezGene that's causing this? Regardless, it's problematic for me to test this out directly, at least for the next few days. Maybe someone could try it? Also, there is the Strawberry Perl alternative, which uses CPAN (I think ActiveState also supports this). chris On Jan 18, 2010, at 10:33 AM, Mark A. Jensen wrote: > this issue's come up before, see this thread > http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031400.html > MAJ > ----- Original Message ----- From: "Chris Fields" > To: "Guohong Hu" > Cc: > Sent: Monday, January 18, 2010 10:30 AM > Subject: Re: [Bioperl-l] Bioperl 1.6 > > >> Guohong, >> >> 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed >> first. Make sure the repos are set according to the Windows installation >> instructions on the BioPerl wiki: >> >> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows >> >> IIRC the actual order of the PPM repository can be critical (PPM pulls based >> on highest version, first repo, but sometimes it gets confused). 
Just >> curious but where is the v 1.4 PPM located? If it is local to our PPM repo I >> can physically remove it to prevent this from happening. >> >> chris >> >> On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: >> >>> Hi there, >>> >>> >>> >>> I was trying to install BioPerl in windows using ppm, by following the >>> instruction in >>> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up >>> the repositories, and did the search of Bioperl packages. The latest version >>> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to >>> install it, a number of prerequisite modules were being installed too, which >>> include Bioperl 1.4. Then an error message showed up during installation: >>> >>> >>> >>> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package >>> BioPerl has already installed a file that package bioperl wants to install." >>> >>> >>> >>> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 >>> wanted to install again. I don't know why bioperl 1.4 was one of the >>> prerequisites for 1.6.1. If I just install 1.4, it will be installed without >>> errors. But I need a newer version, because some modules (like >>> >>> Bio::Tools::HMM) is not included in 1.4. >>> >>> >>> >>> I saw on internet that somebody had the same problem when he was trying to >>> install BioPerl 1.5, but I didn't find the solution. >>> >>> >>> >>> Anybody has a clue on that? Thank you for your time. 
>>> >>> >>> >>> GH >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Mon Jan 18 15:24:33 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 18 Jan 2010 12:24:33 -0800 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: References: <201001171812.56238.tristan.lefebure@gmail.com> <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org> <201001172036.39032.tristan.lefebure@gmail.com> Message-ID: <68DF70A5-63A6-428D-A7F1-7B3D01528375@bioperl.org> On Jan 18, 2010, at 8:12 AM, Chris Fields wrote: > (my small rant on this) > > On Jan 18, 2010, at 4:20 AM, Robert Bradbury wrote: > >> My comment might be that the problem with OrthoMCL is that it is >> primarily lower organisms. The problem with Ensembl (and some other >> databases) is that it is primarliy higher organisms (though they do >> include Drosophila, C. elegans and Yeast). > > OrthoMCL v2 handles both lower and higher organism; I've used it for > both, with decent success. Most other ortholog tools do as well (if > I'm not mistaken, ensembl also uses MCL under the hood, unless > that's changed). I don't believe one should be completely bound to > one toolset, particularly in this case (there are lots of nice > ortholog clustering tools using various moeans of comparison out > there), but I do think OrthoMCL is very good as an initial pass. If > anything, I would like a set of (possibly bioperl-based, definitely > DB-based) modules that can deal with this information. 
> > The more imperative issue in my opinion is that one is prisoner to > the gene models for those specific organisms of interest, and this > may vary widely depending on the source of those gene models > (Ensembl, UCSC, NCBI, EBI, centralized MODs like FlyBase, etc). For > instance, if gene models are poorly curated or rarely updated, the > comparisons may be significantly flawed. Some of these issues may > also be (somewhat) alleviated once more transcriptome data is > available that helps clear up gene model ambiguities, but that won't > be true for all organisms, at least initially. > > Note this isn't meant as a slam on any specific DBs or MODs in > general, the problem is one born of the fact that there isn't a > single, centralized, trusted, consistently updated source for this > data, specifically something that will handle moderated third-party > annotation. That's a very difficult problem to solve effectively. > Some of these very issues crept up at the GMOD conference, and there > appears to be consensus that a real attempt is needed to address this. > > I don't know, maybe it's just unicorns and rainbows. Personally I > do think the situation will improve, as there seems to be great > demand for it, but it requires time, resources, manpower, money, cat > herding, etc. > >> The problem arises when one wants to cross those boundaries. For >> example the 5-10 antioxidant proteins, the ~150 DNA repair proteins, >> many of the mitochondrial (ETC) proteins, the ribosomal rRNA's & >> tRNAs, and the fundamental biochemistry (EC) proteins are homologous >> all the way from the most ancient bacteria through H. sapiens. The >> only way to play in the mixed arena of prokaryotes and eukaryotes >> involving fundamental vectors in evolution is to either construct >> ones >> own databases (which presumably means getting involved with MySQL, >> and >> probably spending some $$$ on hardware) or to develop some BioPerl >> modules that can do the SpeciesX vs. 
SpeciesY comparisons on demand >> using some part of the cloud. This problem isn't going to get >> smaller >> its only going to get larger, now that the cost of sequencing >> (pseudo-resequencing) a vertebrate genome is starting to come in >> under >> $10,000 and people are starting to seriously talk about 10,000 >> vertebrate genomes. 10,000 x 10,000 x 20,000 (genes) isn't something >> people are going to undertake very soon. >> >> Robert > > They're already undertaking it now using a broad range of organisms, > in and out of the cloud. In most cases one can amend a prior recip. > comparative analysis with new data fairly easily, if one takes care > to do so early on (i.e. set up the BLAST databases with a specified > defined size for comparative stats between separate analyses). > OrthoMCL v2 describes a procedure to do this, and I believe others > have similar methodology. > > I could also see possible ways one can further optimize this, for > instance in cases where two very closely-related organisms are > compared, where translated seqs are 100% identical, etc. IIRC, the > OrthoMCL DB site already has a way to upload custom sets of protein > data for mapping to (already pre-run) clusters. Just the fact that > the tools are available as OS, they're semi-automated, and can be > generically applied to data of personal interest is a great boon. > Not sure I see the downside of that, and I'm pretty confident the > scalability issues will be addressed in some way. I think that the approach that Paul Thomas's group at SRI http://www.ai.sri.com/esb/ is doing is really what you'd want to focus on if you are only interested in a particular set of gene families rather than de novo clustering. That or the PhyloFacts approach http://phylogenomics.berkeley.edu/phylofacts/ . That is where HMMs are more appropriate, focusing on your initial seed set of families of proteins. HMMs for your families with some automated clustering initially to get better resolution. 
Once you start throwing multiple 10^6 proteins at it, the unsupervised clustering approach may not be able to give as accurate or timely results but can be a good initial filtering step depending on how much initial knowledge you are starting with. Using HMM models won't be as computationally expensive either if you are compute limited. TreeFam is also providing curated phylogenies of gene families http://www.treefam.org/ that span the opisthokonts in that a few fungi are sprinkled in. Also things like http://boinc.bio.wzw.tum.de/boincsimap/ provide ways to use distributed computing to calculate the matrix of similarities among proteins if you are interested in the exhaustive approach. -jason > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From jay at jays.net Mon Jan 18 18:36:20 2010 From: jay at jays.net (Jay Hannah) Date: Mon, 18 Jan 2010 17:36:20 -0600 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: References: Message-ID: <9AA13F94-3336-4CC1-89C4-249D0EB7C857@jays.net> On Jan 17, 2010, at 10:02 AM, Bhakti Dwivedi wrote: > Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1 > && hit1 -> query1) from a blast table report? If all the advice and resources in this thread have not dissuaded you from writing your own, you could glance at cross_blast() here as reference: https://clabsvn.ist.unomaha.edu/anonsvn/user/jhannah/UNO/seqlab/seqlab/tutorial.pod About the (abandoned) project: http://clab.ist.unomaha.edu/CLAB/index.php/SeqLab_%28Perl%29 I wrote that in 2006 for clustering a few hundred proteins based on custom criteria.
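For anyone who does end up rolling their own, the reciprocal test asked about above (query1 -> hit1 && hit1 -> query1) can be sketched in plain Perl without any BioPerl modules. This is only a minimal sketch under stated assumptions: the input is two 12-column tabular BLAST reports (A vs. B and B vs. A, as produced by the -m 8 style table), the "best" hit is simply the one with the highest bit score, and the sub names best_hits, reciprocal_best_hits and row are made up for this illustration. They are not part of BioPerl or of the cross_blast() code linked above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map each query to its highest-scoring subject.
# Expects 12-column BLAST tabular lines; column 12 is the bit score.
# On a tie, the first hit seen for a query is kept.
sub best_hits {
    my ($lines) = @_;
    my (%best, %score);
    for my $line (@$lines) {
        next if $line =~ /^\s*#/ or $line !~ /\S/;   # comments, blank lines
        my @f = split /\t/, $line;
        next unless @f >= 12;
        my ($query, $subject, $bits) = @f[0, 1, 11];
        $bits =~ s/\s+$//;                            # drop trailing newline
        if (!exists $score{$query} or $bits > $score{$query}) {
            $score{$query} = $bits;
            $best{$query}  = $subject;
        }
    }
    return \%best;
}

# Hypothetical helper: keep [query, hit] pairs where each is the
# other's best hit in the opposite search direction.
sub reciprocal_best_hits {
    my ($fwd, $rev) = @_;
    my @pairs;
    for my $query (sort keys %$fwd) {
        my $hit  = $fwd->{$query};
        my $back = $rev->{$hit};
        push @pairs, [$query, $hit] if defined $back and $back eq $query;
    }
    return @pairs;
}

# Toy data: only the two IDs and the bit score matter; the rest is filler.
sub row {
    my ($q, $s, $bits) = @_;
    return join "\t", $q, $s, 99, 100, 1, 0, 1, 100, 1, 100, '1e-50', $bits;
}

my @a_vs_b = (row('a1', 'b1', 200), row('a1', 'b2', 150),
              row('a2', 'b2', 90),  row('a3', 'b1', 80));
my @b_vs_a = (row('b1', 'a1', 210), row('b2', 'a2', 95),
              row('b2', 'a1', 60));

print join("\t", @$_), "\n"
    for reciprocal_best_hits(best_hits(\@a_vs_b), best_hits(\@b_vs_a));
```

In the toy data a3's best hit is b1, but b1's best hit is a1, so a3 is not reported. As noted earlier in the thread, a bare reciprocal test like this does nothing about inparalogs, which is what OrthoMCL-style clustering tries to handle.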
Cheers,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

From jay at jays.net Mon Jan 18 19:22:48 2010
From: jay at jays.net (Jay Hannah)
Date: Mon, 18 Jan 2010 18:22:48 -0600
Subject: [Bioperl-l] Bio::BroodComb - RFC
Message-ID: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>

I formalized a little framework so I could stop re-writing little programs that do some things people frequently ask me to do.

http://github.com/jhannah/bio-broodcomb

It stores everything in SQLite so users can write custom reports if they want to. It uses BioPerl and some shiny CPAN beads (DBIx::Class, Moose::Role). Tests included.

The first two functions I stuck in the framework:

Find subsequences (Bio::BroodComb::SubSeq):

   use Bio::BroodComb;
   my $bc = Bio::BroodComb->new();
   $bc->load_large_seq(file => "large_seq.fasta");
   $bc->load_small_seq(file => "small_seq.fasta");
   $bc->find_subseqs();
   print $bc->subseq_report1;

In-silico PCR (Bio::BroodComb::PCR):

   use Bio::BroodComb;
   my $bc = Bio::BroodComb->new();
   $bc->load_large_seq(file => "large_seq.fasta");
   $bc->add_primerset(
      description    => "U5/R",   # however you want it reported
      forward_primer => 'GCGGGCAGCAATACTGCTTTGTAA',
      reverse_primer => 'ACCAGCGTTCAGCATATGGAGGAT',
   );
   $bc->find_pcr_hits();
   $bc->find_pcr_products();
   print $bc->pcr_report1;

I find this rather handy, so will probably be adding all my applicable future work to it instead of writing stand-alone programs.

Not sure if it should be renamed for eventual CPAN / wherever. Suggestions, contributions welcome.
:) http://github.com/jhannah/bio-broodcomb Jay Hannah http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From ocornejo at gmail.com Mon Jan 18 19:46:10 2010 From: ocornejo at gmail.com (Omar Cornejo) Date: Mon, 18 Jan 2010 16:46:10 -0800 (PST) Subject: [Bioperl-l] installing bioperl for mac Message-ID: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> Dear People, I have tried to install Bioperl on my new MacBook, which carries the latest perl distribution (5.10.0), and for some reason I can't (using fink) make it recognize this version of perl. I have tried: fink install bioperl-pm510 fink install bioperl-pm5100 but neither one works. Is it fine installing bioperl for perl v 5.9? Thank you, Omar Cornejo From jason at bioperl.org Mon Jan 18 20:04:31 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 18 Jan 2010 17:04:31 -0800 Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with negative PDB ranges In-Reply-To: <4B5502D9.2010706@gmail.com> References: <49C2B97B.7070304@gmail.com> <49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com> <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org> <4B5502D9.2010706@gmail.com> Message-ID: Alexandr - Thanks for getting back to us - I am guessing the parser needs to recognize negative coordinates around line 370 in Bio/AlignIO/Handler/GenericAlignHandler.pm, which assumes a split on '-' will be sufficient. Can you post it as a bug to bugzilla and attach a record and script that replicate the problem, so a test can be written for this? http://bugzilla.open-bio.org/ -jason On Jan 18, 2010, at 4:54 PM, Alexandr Bezginov wrote: > I have contacted Pfam, and I have been told that The PDB file actually > does include a reference to residue "-1": > > DBREF 1E5N A -1 347 UNP P14768 XYNA_PSEFL 264 611 > > DBREF 1E5N B -1 347 UNP P14768 XYNA_PSEFL 264 611 > > > Since negative numbers are allowed in PDB, the data should probably be > considered valid.
> > There are quite a few records like this, so this is not an isolated > issue. > > Alexandr > > On 1/14/2010 7:20 PM, Jason Stajich wrote: >> Seems like improper data really -- "-1" is an improper coordinate >> as far >> as the parser is concerned. You may want to tell Pfam that there is >> possible error in the dumper since that was the only record that had >> this problem? >> >> -jason >> On Jan 13, 2010, at 5:57 PM, albezg wrote: >> >>> Hi all, >>> >>> I have a problem using AlignIO to read Pfam database: >>> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz >>> The database is in STOCKHOLM 1.0 format. AlignIO can read the >>> alignment OK until the alignment PF00331.13. There it crashes with >>> the >>> following message: >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: '1-344' is not an integer. >>> >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368 >>> STACK: Bio::Range::end >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228 >>> STACK: Bio::Annotation::Target::new >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82 >>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ >>> GenericAlignHandler.pm:293 >>> >>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ >>> GenericAlignHandler.pm:73 >>> >>> STACK: Bio::AlignIO::stockholm::next_aln >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471 >>> STACK: /home/albezg/scripts/pfam2fasta.pl:22 >>> ----------------------------------------------------------- >>> >>> It appears this is caused by this entry: >>> #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344; >>> >>> I don't care about residues in PDB, so I have just removed minus >>> signs >>> from the ranges. 
This seems to have fixed the crashing. >>> >>> Is it a known problem? Is there a solution for it? >>> >>> Thanks, >>> Alexandr >>> >>> >>> On 03/20/2009 05:09 PM, albezg wrote: >>>> >>>> I'm trying to change FASTA header(display_id) for a sequence in an >>>> alignment(SimpleAlign). >>>> >>>> There are no issues when I print it, however when I use AlignIO >>>> to write >>>> the alignment to a FASTA file, it does not work. Is this behavior >>>> intended? >>>> >>>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug >>>> >>>> The error: >>>> ------------- EXCEPTION ------------- >>>> MSG: No sequence with name [1/1-11] >>>> STACK Bio::SimpleAlign::displayname >>>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659 >>>> STACK Bio::AlignIO::fasta::write_aln >>>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200 >>>> STACK toplevel ./demo.pl:14 >>>> ------------------------------------- >>>> >>>> Alexandr >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> http://fungalgenomes.org/ >> > -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From cjfields at illinois.edu Mon Jan 18 21:19:30 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 20:19:30 -0600 Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with negative PDB ranges In-Reply-To: References: <49C2B97B.7070304@gmail.com> <49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com> <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org> <4B5502D9.2010706@gmail.com> Message-ID: <46FD172A-69C0-436C-A005-AC38668C3347@illinois.edu> Alexandr, Posting the bug report would be great, should be an easy enough fix. 
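For what it's worth, here is a rough sketch of the kind of parsing that would cope with a range like "-1-344". It assumes the range arrives as a plain string of the form start-end. The name parse_pdb_range is made up for illustration: it is not an existing BioPerl function, and this is not the actual code in GenericAlignHandler.pm. A two-way split on '-' would turn "-1-344" into an empty start and '1-344' as the end, which would be consistent with the "'1-344' is not an integer" message in the report.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a PDB residue range such
# as "-1-344" into (start, end), allowing a leading minus on either number.
sub parse_pdb_range {
    my ($range) = @_;

    # optional sign + digits, the separating '-', optional sign + digits
    if ($range =~ /^\s*(-?\d+)\s*-\s*(-?\d+)\s*$/) {
        return ($1, $2);
    }
    return;    # empty list: not a recognizable start-end range
}

# Try it on a plain range, the range from the Pfam record, and a range
# where both coordinates are negative.
for my $r ('1-344', '-1-344', '-5--2') {
    my ($start, $end) = parse_pdb_range($r);
    print "$r => start=$start end=$end\n";
}
```

Anchoring the pattern at both ends means a string that is not a simple start-end pair yields an empty list instead of a half-parsed range, so the caller can decide whether to warn or skip the record.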
chris On Jan 18, 2010, at 7:04 PM, Jason Stajich wrote: > Alexandr - > > Thanks for getting back to us - I am guessing the parser needs to recognize negative coordinates around about line 370 in Bio/AlignIO/Handler/GenericAlignHandler.pm which assumes a split on '-' will be sufficient. > > Can you post it as a bug to bugzilla along with attaching a record and script that replicates the problem so a test can be written for this. http://bugzilla.open-bio.org/ > > -jason > On Jan 18, 2010, at 4:54 PM, Alexandr Bezginov wrote: > >> I have contacted Pfam, and I have been told that The PDB file actually >> does include a reference to residue "-1": >> >> DBREF 1E5N A -1 347 UNP P14768 XYNA_PSEFL 264 611 >> >> DBREF 1E5N B -1 347 UNP P14768 XYNA_PSEFL 264 611 >> >> >> Since negative numbers are allowed in PDB, the data should probably be >> considered valid. >> >> There are quite a few records like this, so this is not an isolated issue. >> >> Alexandr >> >> On 1/14/2010 7:20 PM, Jason Stajich wrote: >>> Seems like improper data really -- "-1" is an improper coordinate as far >>> as the parser is concerned. You may want to tell Pfam that there is >>> possible error in the dumper since that was the only record that had >>> this problem? >>> >>> -jason >>> On Jan 13, 2010, at 5:57 PM, albezg wrote: >>> >>>> Hi all, >>>> >>>> I have a problem using AlignIO to read Pfam database: >>>> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz >>>> The database is in STOCKHOLM 1.0 format. AlignIO can read the >>>> alignment OK until the alignment PF00331.13. There it crashes with the >>>> following message: >>>> >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>> MSG: '1-344' is not an integer. 
>>>> >>>> STACK: Error::throw >>>> STACK: Bio::Root::Root::throw >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368 >>>> STACK: Bio::Range::end >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228 >>>> STACK: Bio::Annotation::Target::new >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82 >>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293 >>>> >>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73 >>>> >>>> STACK: Bio::AlignIO::stockholm::next_aln >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471 >>>> STACK: /home/albezg/scripts/pfam2fasta.pl:22 >>>> ----------------------------------------------------------- >>>> >>>> It appears this is caused by this entry: >>>> #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344; >>>> >>>> I don't care about residues in PDB, so I have just removed minus signs >>>> from the ranges. This seems to have fixed the crashing. >>>> >>>> Is it a known problem? Is there a solution for it? >>>> >>>> Thanks, >>>> Alexandr >>>> >>>> >>>> On 03/20/2009 05:09 PM, albezg wrote: >>>>> >>>>> I'm trying to change FASTA header(display_id) for a sequence in an >>>>> alignment(SimpleAlign). >>>>> >>>>> There are no issues when I print it, however when I use AlignIO to write >>>>> the alignment to a FASTA file, it does not work. Is this behavior >>>>> intended? 
>>>>> >>>>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug >>>>> >>>>> The error: >>>>> ------------- EXCEPTION ------------- >>>>> MSG: No sequence with name [1/1-11] >>>>> STACK Bio::SimpleAlign::displayname >>>>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659 >>>>> STACK Bio::AlignIO::fasta::write_aln >>>>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200 >>>>> STACK toplevel ./demo.pl:14 >>>>> ------------------------------------- >>>>> >>>>> Alexandr >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> Jason Stajich >>> jason.stajich at gmail.com >>> jason at bioperl.org >>> http://fungalgenomes.org/ >>> >> > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Jan 18 21:20:31 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 20:20:31 -0600 Subject: [Bioperl-l] installing bioperl for mac In-Reply-To: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> Message-ID: <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote: > Dear People, > I have tried to install Bioperl in my new Mac Book, which carries > the latest perl distribution (5.10.0) and for some reason I can't > (using fink) make it recognize this version or perl. > I have tried: > fink install bioperl-pm510 > fink install bioperl-pm5100 > > but neither one works. Is it fine installing bioperl for perl v 5.9? > > thank you, > Omar Cornejo fink doesn't have a package for perl 5.10. 
You can install it using CPAN, however (it's pure perl), or use other UNIX-y options. See the UNIX installation instructions on the wiki: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix chris From dan.kortschak at adelaide.edu.au Mon Jan 18 21:47:47 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Tue, 19 Jan 2010 13:17:47 +1030 Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA Message-ID: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> Hi All, A wrapper and output parser for bowtie 'ultrafast, memory-efficient short read aligner' are now available in the bioperl-live and bioperl-run subversion repositories (bioperl-live/trunk at 16727 and bioperl-run/trunk at 16726). Bowtie details are available here: http://bowtie-bio.sourceforge.net/index.shtml The modules can return a Bio::Assembly::Scaffold object (operating via the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam uses large amounts of memory - the test suite works for me with >=2GB but not with 1GB due to this. (Is there a disk file system based tool for this for large projects?) Bowtie (>0.12.0) can align in colour space, but this is not currently supported by the wrapper though it should not be difficult to add. If someone can point me to a small set of colour space reads and a reference sequence I will be able to use these for testing. Thanks to the core devs for helping me with many of my problems in putting this together. Dan From maj at fortinbras.us Mon Jan 18 22:31:36 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 18 Jan 2010 22:31:36 -0500 Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: Excellent Dan! 
Thanks for all this work-- MAJ ----- Original Message ----- From: "Dan Kortschak" To: Sent: Monday, January 18, 2010 9:47 PM Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA > Hi All, > > A wrapper and output parser for bowtie 'ultrafast, memory-efficient > short read aligner' are now available in the bioperl-live and > bioperl-run subversion repositories (bioperl-live/trunk at 16727 and > bioperl-run/trunk at 16726). Bowtie details are available here: > > http://bowtie-bio.sourceforge.net/index.shtml > > The modules can return a Bio::Assembly::Scaffold object (operating via > the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk > which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam > uses large amounts of memory - the test suite works for me with >=2GB > but not with 1GB due to this. (Is there a disk file system based tool > for this for large projects?) > > Bowtie (>0.12.0) can align in colour space, but this is not currently > supported by the wrapper though it should not be difficult to add. If > someone can point me to a small set of colour space reads and a > reference sequence I will be able to use these for testing. > > Thanks to the core devs for helping me with many of my problems in > putting this together. 
> > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Jan 18 22:36:12 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 21:36:12 -0600 Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: On Jan 18, 2010, at 8:47 PM, Dan Kortschak wrote: > Hi All, > > A wrapper and output parser for bowtie 'ultrafast, memory-efficient > short read aligner' are now available in the bioperl-live and > bioperl-run subversion repositories (bioperl-live/trunk at 16727 and > bioperl-run/trunk at 16726). Bowtie details are available here: > > http://bowtie-bio.sourceforge.net/index.shtml > > The modules can return a Bio::Assembly::Scaffold object (operating via > the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk > which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam > uses large amounts of memory - the test suite works for me with >=2GB > but not with 1GB due to this. (Is there a disk file system based tool > for this for large projects?) > > Bowtie (>0.12.0) can align in colour space, but this is not currently > supported by the wrapper though it should not be difficult to add. If > someone can point me to a small set of colour space reads and a > reference sequence I will be able to use these for testing. > > Thanks to the core devs for helping me with many of my problems in > putting this together. > > Dan And (on behalf of the core devs) thank you for putting this together! 
chris From scott at scottcain.net Mon Jan 18 22:41:43 2010 From: scott at scottcain.net (Scott Cain) Date: Mon, 18 Jan 2010 22:41:43 -0500 Subject: [Bioperl-l] installing bioperl for mac In-Reply-To: <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu> References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu> Message-ID: <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com> But make sure you have the developers tools installed before the first time you run the cpan shell; it will make your life easier. Scott On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields wrote: > On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote: > >> Dear People, >> I have tried to install Bioperl in my new Mac Book, which carries >> the latest perl distribution (5.10.0) and for some reason I can't >> (using fink) make it recognize this version or perl. >> I have tried: >> fink install bioperl-pm510 >> fink install bioperl-pm5100 >> >> but neither one works. Is it fine installing bioperl for perl v 5.9? >> >> thank you, >> Omar Cornejo > > fink doesn't have a package for perl 5.10. You can install it using CPAN, however (it's pure perl), or use other UNIX-y options. See the UNIX installation instructions on the wiki: > > http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Mon Jan 18 23:04:57 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 22:04:57 -0600 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: <009801c8b957$2af4f8d0$80deea70$@ac.cn> References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> <009801c8b957$2af4f8d0$80deea70$@ac.cn> Message-ID: <79D53148-1FDA-4025-99A6-77A7F124E6BD@illinois.edu> Hmm, the trouchelle repo is the only one that had a working DB_File for perl 5.10 (not sure but I think 5.8.9 was fine). Probably worth contacting them about this to see if they can drop the (way out-of-date) 1.4 distribution. chris On May 18, 2008, at 9:22 PM, Guohong Hu wrote: > Thanks to you all. The problem is solved. The bioperl 1.4 version is from > the Trouchelle repo, but 1.6 is in the Bioperl Regular Releases repo. When I > added all the repos according to the bioperl wiki instruction, somehow 1.4 > became a prerequisite for 1.6. But Chris's question reminded me, so I > removed the Trouchelle repo, and the installation proceeded without errors. I > suggest we put a note on the wiki page since it looks like an odd issue > not just for me. > > Best, > Guohong > > > > _________________________________________ > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: 18 January 2010 23:30 > To: Guohong Hu > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bioperl 1.6 > > Guohong, > > 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed > first. Make sure the repos are set according to the Windows installation > instructions on the BioPerl wiki: > > http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows > > IIRC the actual order of the PPM repository can be critical (PPM pulls based > on highest version, first repo, but sometimes it gets confused). Just > curious but where is the v 1.4 PPM located?
If it is local to our PPM repo > I can physically remove it to prevent this from happening. > > chris > > On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: > >> Hi there, >> >> >> >> I was trying to install BioPerl in windows using ppm, by following the >> instruction in >> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up >> the repositories, and did the search of Bioperl packages. The latest > version >> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to >> install it, a number of prerequisite modules were being installed too, > which >> include Bioperl 1.4. Then an error message showed up during installation: >> >> >> >> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package >> BioPerl has already installed a file that package bioperl wants to > install." >> >> >> >> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 >> wanted to install again. I don't know why bioperl 1.4 was one of the >> prerequisites for 1.6.1. If I just install 1.4, it will be installed > without >> errors. But I need a newer version, because some modules (like >> >> Bio::Tools::HMM) is not included in 1.4. >> >> >> >> I saw on internet that somebody had the same problem when he was trying to >> install BioPerl 1.5, but I didn't find the solution. >> >> >> >> Anybody has a clue on that? Thank you for your time. 
>> >> >> >> GH >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From ocornejo at gmail.com Mon Jan 18 23:18:00 2010 From: ocornejo at gmail.com (Omar Eduardo Cornejo Ordaz) Date: Mon, 18 Jan 2010 23:18:00 -0500 Subject: [Bioperl-l] installing bioperl for mac In-Reply-To: <5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu> References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu> <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com> <5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu> Message-ID: I see. thank you Scott and Chris. I had already installed the latest version of the Xcode Developer Tools. I will go the cpan way then. have a nice one, Omar On Mon, Jan 18, 2010 at 10:58 PM, Chris Fields wrote: > Yes, definitely! > > -c > > On Jan 18, 2010, at 9:41 PM, Scott Cain wrote: > > > But make sure you have the developers tools installed before the first > > time you run the cpan shell; it will make your life easier. > > > > Scott > > > > > > On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields > wrote: > >> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote: > >> > >>> Dear People, > >>> I have tried to install Bioperl in my new Mac Book, which carries > >>> the latest perl distribution (5.10.0) and for some reason I can't > >>> (using fink) make it recognize this version or perl. > >>> I have tried: > >>> fink install bioperl-pm510 > >>> fink install bioperl-pm5100 > >>> > >>> but neither one works. Is it fine installing bioperl for perl v 5.9? > >>> > >>> thank you, > >>> Omar Cornejo > >> > >> fink doesn't have a package for perl 5.10. You can install it using > CPAN, however (it's pure perl), or use other UNIX-y options. 
See the UNIX > installation instructions on the wiki: > >> > >> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix > >> > >> chris > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. scott at scottcain > dot net > > GMOD Coordinator (http://gmod.org/) 216-392-3087 > > Ontario Institute for Cancer Research > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon Jan 18 22:58:36 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 21:58:36 -0600 Subject: [Bioperl-l] installing bioperl for mac In-Reply-To: <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com> References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu> <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com> Message-ID: <5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu> Yes, definitely! -c On Jan 18, 2010, at 9:41 PM, Scott Cain wrote: > But make sure you have the developers tools installed before the first > time you run the cpan shell; it will make your life easier. > > Scott > > > On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields wrote: >> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote: >> >>> Dear People, >>> I have tried to install Bioperl in my new Mac Book, which carries >>> the latest perl distribution (5.10.0) and for some reason I can't >>> (using fink) make it recognize this version or perl. >>> I have tried: >>> fink install bioperl-pm510 >>> fink install bioperl-pm5100 >>> >>> but neither one works. Is it fine installing bioperl for perl v 5.9? 
>>> >>> thank you, >>> Omar Cornejo >> >> fink doesn't have a package for perl 5.10. You can install it using CPAN, however (it's pure perl), or use other UNIX-y options. See the UNIX installation instructions on the wiki: >> >> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From albezg at gmail.com Mon Jan 18 19:54:49 2010 From: albezg at gmail.com (Alexandr Bezginov) Date: Mon, 18 Jan 2010 19:54:49 -0500 Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with negative PDB ranges In-Reply-To: <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org> References: <49C2B97B.7070304@gmail.com> <49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com> <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org> Message-ID: <4B5502D9.2010706@gmail.com> I have contacted Pfam, and I have been told that The PDB file actually does include a reference to residue "-1": DBREF 1E5N A -1 347 UNP P14768 XYNA_PSEFL 264 611 DBREF 1E5N B -1 347 UNP P14768 XYNA_PSEFL 264 611 Since negative numbers are allowed in PDB, the data should probably be considered valid. There are quite a few records like this, so this is not an isolated issue. Alexandr On 1/14/2010 7:20 PM, Jason Stajich wrote: > Seems like improper data really -- "-1" is an improper coordinate as far > as the parser is concerned. 
You may want to tell Pfam that there is > possible error in the dumper since that was the only record that had > this problem? > > -jason > On Jan 13, 2010, at 5:57 PM, albezg wrote: > >> Hi all, >> >> I have a problem using AlignIO to read Pfam database: >> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz >> The database is in STOCKHOLM 1.0 format. AlignIO can read the >> alignment OK until the alignment PF00331.13. There it crashes with the >> following message: >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: '1-344' is not an integer. >> >> STACK: Error::throw >> STACK: Bio::Root::Root::throw >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368 >> STACK: Bio::Range::end >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228 >> STACK: Bio::Annotation::Target::new >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82 >> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293 >> >> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73 >> >> STACK: Bio::AlignIO::stockholm::next_aln >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471 >> STACK: /home/albezg/scripts/pfam2fasta.pl:22 >> ----------------------------------------------------------- >> >> It appears this is caused by this entry: >> #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344; >> >> I don't care about residues in PDB, so I have just removed minus signs >> from the ranges. This seems to have fixed the crashing. >> >> Is it a known problem? Is there a solution for it? >> >> Thanks, >> Alexandr >> >> >> On 03/20/2009 05:09 PM, albezg wrote: >>> >>> I'm trying to change FASTA header(display_id) for a sequence in an >>> alignment(SimpleAlign). 
>>> >>> There are no issues when I print it, however when I use AlignIO to write >>> the alignment to a FASTA file, it does not work. Is this behavior >>> intended? >>> >>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug >>> >>> The error: >>> ------------- EXCEPTION ------------- >>> MSG: No sequence with name [1/1-11] >>> STACK Bio::SimpleAlign::displayname >>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659 >>> STACK Bio::AlignIO::fasta::write_aln >>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200 >>> STACK toplevel ./demo.pl:14 >>> ------------------------------------- >>> >>> Alexandr >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > From ghhu at sibs.ac.cn Mon Jan 18 21:22:19 2010 From: ghhu at sibs.ac.cn (Guohong Hu) Date: Tue, 19 Jan 2010 02:22:19 -0000 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> Message-ID: <009801c8b957$2af4f8d0$80deea70$@ac.cn> Thanks to you all. The problem is solved. The bioperl 1.4 version is from the Trouchelle repo, but 1.6 is in the Bioperl Regular Releases repo. When I added all the repos according to the bioperl wiki instruction, somehow 1.4 became a prerequisite for 1.6. But Chris's question reminded me, so I removed the Trouchelle repo, and the installation proceeded without errors. I suggest we put a note on the wiki page since it looks like an odd issue not just for me. Best, Guohong _________________________________________ From: Chris Fields [mailto:cjfields at illinois.edu] Sent: 18 January 2010 23:30 To: Guohong Hu Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Bioperl 1.6 Guohong, 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed first.
Make sure the repos are set according to the Windows installation instructions on the BioPerl wiki: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows IIRC the actual order of the PPM repository can be critical (PPM pulls based on highest version, first repo, but sometimes it gets confused). Just curious but where is the v 1.4 PPM located? If it is local to our PPM repo I can physically remove it to prevent this from happening. chris On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: > Hi there, > > > > I was trying to install BioPerl in windows using ppm, by following the > instruction in > "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up > the repositories, and did the search of Bioperl packages. The latest version > available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to > install it, a number of prerequisite modules were being installed too, which > include Bioperl 1.4. Then an error message showed up during installation: > > > > "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package > BioPerl has already installed a file that package bioperl wants to install." > > > > It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 > wanted to install again. I don't know why bioperl 1.4 was one of the > prerequisites for 1.6.1. If I just install 1.4, it will be installed without > errors. But I need a newer version, because some modules (like > > Bio::Tools::HMM) is not included in 1.4. > > > > I saw on internet that somebody had the same problem when he was trying to > install BioPerl 1.5, but I didn't find the solution. > > > > Anybody has a clue on that? Thank you for your time. 
> > > > GH > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jw12 at sanger.ac.uk Tue Jan 19 05:41:12 2010 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Tue, 19 Jan 2010 10:41:12 +0000 Subject: [Bioperl-l] DAS Workshop Registrations now Open (workshop date 7-9 April 2010) Message-ID: <9EDF4E46-15F8-434E-B557-2DE5906C4182@sanger.ac.uk> If you don't know about DAS and wish to know how to distribute your latest biological annotation to the world then the upcoming DAS workshop maybe for you. If you know about DAS and are maybe a DAS client developer then the upcoming DAS workshop is for you (as you will need to know about the upcoming DAS 1.6 Specification and how it may affect your software). For information on the workshop and registration please go to: http://www.ebi.ac.uk/training/handson/DAS_070410.html Jonathan Warren Senior Developer and DAS coordinator jw12 at sanger.ac.uk -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From SMarkel at accelrys.com Tue Jan 19 13:00:22 2010 From: SMarkel at accelrys.com (Scott Markel) Date: Tue, 19 Jan 2010 10:00:22 -0800 Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net> Dan, Life Tech has sample data for E. coli at http://solidsoftwaretools.com/gf/project/ecoli2x50/ and http://solidsoftwaretools.com/gf/project/dh10bfrag/. Reference sequences are included. Scott Scott Markel, Ph.D. 
Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dan Kortschak Sent: Monday, 18 January 2010 6:48 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA Hi All, A wrapper and output parser for bowtie 'ultrafast, memory-efficient short read aligner' are now available in the bioperl-live and bioperl-run subversion repositories (bioperl-live/trunk at 16727 and bioperl-run/trunk at 16726). Bowtie details are available here: http://bowtie-bio.sourceforge.net/index.shtml The modules can return a Bio::Assembly::Scaffold object (operating via the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam uses large amounts of memory - the test suite works for me with >=2GB but not with 1GB due to this. (Is there a disk file system based tool for this for large projects?) Bowtie (>0.12.0) can align in colour space, but this is not currently supported by the wrapper though it should not be difficult to add. If someone can point me to a small set of colour space reads and a reference sequence I will be able to use these for testing. Thanks to the core devs for helping me with many of my problems in putting this together. 
Dan _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From dan.kortschak at adelaide.edu.au Tue Jan 19 16:18:20 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Wed, 20 Jan 2010 07:48:20 +1030 Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net> References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> <5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net> Message-ID: <1263935900.4813.0.camel@epistle> Great. Thanks, Scott. Dan On Tue, 2010-01-19 at 10:00 -0800, Scott Markel wrote: > Dan, > > Life Tech has sample data for E. coli at > > http://solidsoftwaretools.com/gf/project/ecoli2x50/ > > and > > http://solidsoftwaretools.com/gf/project/dh10bfrag/. > > Reference sequences are included. > > Scott > > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel at accelrys.com > Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 799 5222 > USA web: http://www.accelrys.com > > http://www.linkedin.com/in/smarkel > Vice President, Board of Directors: > International Society for Computational Biology > Chair: ISCB Publications Committee > Associate Editor: PLoS Computational Biology > Editorial Board: Briefings in Bioinformatics > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dan Kortschak > Sent: Monday, 18 January 2010 6:48 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA > > Hi All, > > A wrapper and output parser for bowtie 'ultrafast, memory-efficient > short read aligner' are now available in the bioperl-live and > 
bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
> bioperl-run/trunk at 16726). Bowtie details are available here:
>
> http://bowtie-bio.sourceforge.net/index.shtml
>
> The modules can return a Bio::Assembly::Scaffold object (operating via
> the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk
> which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
> uses large amounts of memory - the test suite works for me with >=2GB
> but not with 1GB due to this. (Is there a disk file system based tool
> for this for large projects?)
>
> Bowtie (>0.12.0) can align in colour space, but this is not currently
> supported by the wrapper though it should not be difficult to add. If
> someone can point me to a small set of colour space reads and a
> reference sequence I will be able to use these for testing.
>
> Thanks to the core devs for helping me with many of my problems in
> putting this together.
>
> Dan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From dan.kortschak at adelaide.edu.au Wed Jan 20 00:32:05 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Wed, 20 Jan 2010 16:02:05 +1030
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
Message-ID: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>

Hi Chris (or others),

I've been looking at ways to do large assemblies (really rnaseq/readseq
comparisons for coverage) with maq/bowtie output and it's clear that for
the size of project that I'm working on the space complexity is too
nasty with Bio::DB::Sam. So I thought Bio::DB::GFF might be the way to
go.

I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> B:DB:GFF

This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've
read through the docs, and it's not entirely clear (I'm hoping I've
interpreted it the right way), but does this result in the return of
features such that overlapping features are returned as a single feature
while non-overlapping features come back separately? If this is the
case, it would satisfy my requirements perfectly.

thanks for your time
Dan

From jason at bioperl.org Wed Jan 20 01:35:24 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 19 Jan 2010 22:35:24 -0800
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
In-Reply-To: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID:

Are you looking at the bowtie features file or the SAM?
-jason
On Jan 19, 2010, at 9:32 PM, Dan Kortschak wrote:

> Hi Chris (or others),
>
> I've been looking at ways to do large assemblies (really rnaseq/
> readseq
> comparisons for coverage) with maq/bowtie output and it's clear that
> for
> the size of project that I'm working on the space complexity is too
> nasty with Bio::DB::Sam. So I thought Bio::DB:GFF might be the way to
> go.
>
> I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF ->
> B:DB:GFF
>
> This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've
> read through the docs, and it's not entirely clear (I'm hoping I've
> interpreted it the right way), but does this result in the return of
> features such that overlapping features are returned as a single
> feature
> while non-overlapping features come back separately. If this is the
> case, it would satisfy my requirements perfectly.
> > thanks for your time > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From dan.kortschak at adelaide.edu.au Wed Jan 20 02:19:05 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Wed, 20 Jan 2010 17:49:05 +1030 Subject: [Bioperl-l] using Bio::DB::GFF for aggregation In-Reply-To: References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <1263971945.4582.2.camel@epistle> It doesn't really matter, they are largely inter-convertible. The problem is not really the upstream processing, but the aggregation of reads into read-assigned regions (unless I've misunderstood your question). Dan On Tue, 2010-01-19 at 22:35 -0800, Jason Stajich wrote: > Are you looking at the bowtie features file or the SAM? > -jason > On Jan 19, 2010, at 9:32 PM, Dan Kortschak wrote: > > > Hi Chris (or others), > > > > I've been looking at ways to do large assemblies (really rnaseq/ > > readseq > > comparisons for coverage) with maq/bowtie output and it's clear that > > for > > the size of project that I'm working on the space complexity is too > > nasty with Bio::DB::Sam. So I thought Bio::DB:GFF might be the way to > > go. > > > > I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> > > B:DB:GFF > > > > This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've > > read through the docs, and it's not entirely clear (I'm hoping I've > > interpreted it the right way), but does this result in the return of > > features such that overlapping features are returned as a single > > feature > > while non-overlapping features come back separately. If this is the > > case, it would satisfy my requirements perfectly. 
> > > > thanks for your time > > Dan > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ -- Dan Kortschak From ajmackey at gmail.com Wed Jan 20 07:59:38 2010 From: ajmackey at gmail.com (Aaron Mackey) Date: Wed, 20 Jan 2010 07:59:38 -0500 Subject: [Bioperl-l] using Bio::DB::GFF for aggregation In-Reply-To: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com> I would advise using BEDtools or the R IRanges package for this kind of aggregation/merging work, rather than trying to reinvent this particular wheel. -Aaron On Wed, Jan 20, 2010 at 12:32 AM, Dan Kortschak < dan.kortschak at adelaide.edu.au> wrote: > Hi Chris (or others), > > I've been looking at ways to do large assemblies (really rnaseq/readseq > comparisons for coverage) with maq/bowtie output and it's clear that for > the size of project that I'm working on the space complexity is too > nasty with Bio::DB::Sam. So I thought Bio::DB:GFF might be the way to > go. > > I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> B:DB:GFF > > This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've > read through the docs, and it's not entirely clear (I'm hoping I've > interpreted it the right way), but does this result in the return of > features such that overlapping features are returned as a single feature > while non-overlapping features come back separately. If this is the > case, it would satisfy my requirements perfectly. 
> > thanks for your time > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From dan.kortschak at adelaide.edu.au Wed Jan 20 16:16:39 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Thu, 21 Jan 2010 07:46:39 +1030 Subject: [Bioperl-l] using Bio::DB::GFF for aggregation In-Reply-To: <24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com> References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> <24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com> Message-ID: <1264022199.4688.29.camel@epistle> Thanks for that, I'll look into those. BEDtools looks like what I want. cheers Dan On Wed, 2010-01-20 at 07:59 -0500, Aaron Mackey wrote: > I would advise using BEDtools or the R IRanges package for this kind > of aggregation/merging work, rather than trying to reinvent this > particular wheel. > > -Aaron From biopython at maubp.freeserve.co.uk Thu Jan 21 07:33:53 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 21 Jan 2010 12:33:53 +0000 Subject: [Bioperl-l] SwissProt DE lines and UniProt XML / TagTree as XML in BioSQL Message-ID: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com> Hi all, This is cross posted to try and ensure relevant people see it. I suggest we continue the discussion on the BioSQL list (for how to serialise structured annotation to BioSQL), and/or the OpenBio list (for things like file format naming conventions). I am hoping we (Bio*) can be consistent in how we parse and load into BioSQL the SwissProt DE lines (known as "swiss" format in both BioPerl and Biopython's SeqIO, and by EMBOSS) or the equivalent UniProt XML tags (which we are tentatively going to call the "uniprot" format in Biopython's SeqIO - comments?). Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss") files and load them into BioSQL. 
Biopython currently treats the DE comment lines as a long string, as
BioPerl used to:

http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html
http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html

I understand that BioPerl now turns the SwissProt DE lines into a
TagTree, and for storing this in BioSQL this gets serialised as XML.
I would like Biopython to handle this the same way (although rather
than a Perl TagTree, we'd use a Python structure of course), and
would appreciate clarification of what exactly was implemented
(e.g. which bit of the BioPerl source code should we look at,
and could you show a worked example?).

Andrea Pierlenoin (CC'd - not sure if he is on the BioSQL or
Open-Bio lists yet) has started work on parsing UniProt XML
files for Biopython. Here the DE comment lines are already
provided broken up with XML markup. Hopefully their nested
structure matches what BioPerl was doing with the SwissProt
DE lines.

Regards,

Peter

From cjfields at illinois.edu Thu Jan 21 08:34:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Jan 2010 07:34:12 -0600
Subject: [Bioperl-l] [Open-bio-l] SwissProt DE lines and UniProt XML / TagTree as XML in BioSQL
In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
Message-ID:

Peter,

The relevant code is in Bio::Annotation::TagTree in bioperl-live, which
is a decorator for Data::Stag:

http://search.cpan.org/~cmungall/Data-Stag-0.11/Data/Stag.pm

This is where the text output is derived from. It's a bit of a
heavyweight solution to the problem, but it's capable of round-tripping
the DE data and parses out the data in a way that's approachable. We
could probably abstract out the serialization backend there and allow a
pure bioperl solution (or the current solution) as a fallback.
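
As a rough illustration of what a lightweight, pure-Perl serialization
of nested DE data might look like, here is a minimal sketch using only
core Perl. It is not the Bio::Annotation::TagTree or Data::Stag code:
the [ tag => value ] node layout, the to_xml and xml_escape helpers,
and the DE values (Example protein, ExP, 1.1.1.1) are all hypothetical
and exist only for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: escape the five XML special characters.
sub xml_escape {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
                  '"' => '&quot;', "'" => '&apos;');
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

# Hypothetical helper: recursively serialise a [ tag => value ] node.
# The value is either a plain string (leaf) or an array reference
# holding child nodes of the same shape.
sub to_xml {
    my ($node, $depth) = @_;
    $depth ||= 0;
    my ($tag, $value) = @$node;
    my $pad = '  ' x $depth;
    if (ref($value) eq 'ARRAY') {
        my $inner = join '', map { to_xml($_, $depth + 1) } @$value;
        return "$pad<$tag>\n$inner$pad</$tag>\n";
    }
    return "$pad<$tag>" . xml_escape($value) . "</$tag>\n";
}

# Made-up DE block, laid out like
#   DE   RecName: Full=...; Short=...; EC=...;
#   DE   AltName: Full=...;
my $de = [ description => [
    [ RecName => [
        [ Full  => 'Example protein' ],
        [ Short => 'ExP' ],
        [ EC    => '1.1.1.1' ],
    ] ],
    [ AltName => [
        [ Full => 'Example protein, alpha & beta form' ],
    ] ],
] ];

print to_xml($de);
```

Running the script prints an indented XML document with a description
root element containing RecName and AltName children; the ampersand in
the made-up AltName value comes out as &amp;. Keeping the tree as plain
nested arrays means the same structure could be handed to a different
backend without changing the parser side.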
If the plain-text DE info is represented in a hierarchy already in UniProt XML, we should probably conform as closely as possible to that (using a standard format like XML, JSON, etc.). chris On Jan 21, 2010, at 6:33 AM, Peter wrote: > Hi all, > > This is cross posted to try and ensure relevant people see it. > I suggest we continue the discussion on the BioSQL list > (for how to serialise structured annotation to BioSQL), and/or > the OpenBio list (for things like file format naming conventions). > > I am hoping we (Bio*) can be consistent in how we parse and load > into BioSQL the SwissProt DE lines (known as "swiss" format in > both BioPerl and Biopython's SeqIO, and by EMBOSS) or the > equivalent UniProt XML tags (which we are tentatively going to > call the "uniprot" format in Biopython's SeqIO - comments?). > > Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss") > files and load them into BioSQL. Biopython currently treats the DE > comment lines as a long string, as BioPerl used to: > > http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html > http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html > > I understand that BioPerl now turns the SwissProt DE lines into a > TagTree, and for storing this in BioSQL this gets serialised as XML. > I would like Biopython to handle this the same way (although rather > than a Perl TagTree, we'd use a Python structure of course), and > would appreciate clarification of what exactly was implemented > (e.g. which bit of the BioPerl source code should be look at, > and could you show a worked example?). > > Andrea Pierlenoin (CC'd - not sure if he is on the BioSQL or > Open-Bio lists yet) has started work on parsing UniProt XML > files for Biopython. Here the DE comment lines are already > provided broken up with XML markup. Hopefully their nested > structure matches what BioPerl was doing with the SwissProt > DE lines. 
>
> Regards,
>
> Peter

From sharmashalu.bio at gmail.com Thu Jan 21 09:25:44 2010
From: sharmashalu.bio at gmail.com (shalu sharma)
Date: Thu, 21 Jan 2010 09:25:44 -0500
Subject: [Bioperl-l] sequence orientation
Message-ID: <465b5a661001210625j3d84a165u69d8c8d21d2fe7ac@mail.gmail.com>

Hi All,
This is not a Perl/BioPerl query, but I thought this would be the best
place to ask.
I have some pyro reads (from CAMERA) and I want to find out their 5' and
3' ends. Is there any way I can do this?
I would really appreciate it if anyone could help me out.

Thanks
Shalu

From rtbio.2009 at gmail.com Thu Jan 21 13:28:43 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Thu, 21 Jan 2010 19:28:43 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <196889DF87964224ACDB948681BA7F86@NewLife>
References: <4C2E8133F916495B876628EF3E8FCBB2@NewLife> <9D8A1428463C4D5E9C416521C35E254C@NewLife> <196889DF87964224ACDB948681BA7F86@NewLife>
Message-ID:

Hello Mark,

This is Roopa again. I have another small problem. I am working on
remote BLAST. The program works well: it accesses the server and gets
the output correctly. However, when I try to collect the result
sequences into an array, the first sequence among the result sequences
is always missing.

The code is

my $str = Bio::SeqIO->new('-file' => $nuc ,
                          '-format' => 'fasta' ,
                          '-organism' => "$organ\[ORGN]");

while (my $input = $str->next_seq())
{
  #Blast a sequence against a database:
  #Alternatively, you could pass in a file with many
  #sequences rather than loop through sequence one at a time
  #Remove the loop starting 'while (my $input = $str->next_seq())'
  #and swap the two lines below for an example of that.

  open(OUTFILE,'>',$debugfile);
  print OUTFILE $input;
  close(OUTFILE);

  my $r = $factory->submit_blast($input);

  open(OUTFILE,'>',$debugfile);
  # print OUTFILE $r;
  close(OUTFILE);

  print STDERR "waiting...."
if($v>0); while ( my @rids = $factory->each_rid ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "while entered"; close(OUTFILE); foreach my $rid ( @rids ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "foreach entered"; close(OUTFILE); my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } open(OUTFILE,'>',$debugfile); # print OUTFILE "if entered"; close(OUTFILE); print STDERR "." if ( $v > 0 ); sleep 5; } else { open(OUTFILE,'>',$debugfile); # print OUTFILE "else entered"; close(OUTFILE); my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $result->next_hit(); close(BLASTDEBUGFILE); my $filename = $serverpath."/blastdata_".time()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $organism; close(BLASTDEBUGFILE); # open(OUTFILE,'>',$outfile); # print OUTFILE "Test2 $result->database_name()"; # close(OUTFILE); #$hit = $result->next_hit; #open(new,'>',$debugfile); #print $hit; #close(new); $dummy=0; while ( my $hit = $result->next_hit ) { next unless ( $v >= 0); # open(OUTFILE,'>',$debugfile); # print OUTFILE "$hit in while hits"; # close(OUTFILE); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string $dummy++; open(OUTFILE,'>',$debugfile); # print OUTFILE $dummy; close(OUTFILE); push(@seqs,$dna); } } } } } $warum=@seqs; open(OUTFILE,'>',$debugfile); # print OUTFILE $warum; print OUTFILE @seqs; close(OUTFILE); return(@seqs); } open(OUTFILE, '>',$outfile) || die ; print OUTFILE "\n RNAi Result \n \n

Inputsequence:
"; Here in the above code, I was trying to debug the code and trying to get the count of the array and even the sequence. But when the output data was giving 1 sequence, the count of the array was 0 and when I tried to print the output sequence I could not get any. It was the same when the no of output sequences was 3, I tried to print the sequences but was getting the count of the array as 2 and was printing only two sequences. Please help me in sorting out this problem. Regards, Roopa. On Sun, Jan 10, 2010 at 1:03 AM, Mark A. Jensen wrote: > Excellent Roopa- it's my pleasure-- MAJ > > ----- Original Message ----- > *From:* Roopa Raghuveer > *To:* Mark A. Jensen > *Sent:* Saturday, January 09, 2010 6:41 PM > *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl > > Hi Mark, > > Thank you very very much. The code is working now. Thanks for the support > and time you have spent on me. > > Thanks in advance > Roopa. > > On Sat, Jan 9, 2010 at 10:56 PM, Mark A. Jensen wrote: > >> There is still a bug with the double quotes. Use "$organ\[ORGN]", which >> prevents perl from >> looking for a member of an array called @organ. This would have shown up >> if 'use strict;' had >> been in place. Still don't know whether this would work precisely; can you >> send me the query >> sequence so I can reproduce your ouput? >> thanks MAJ >> >> ----- Original Message ----- >> *From:* Roopa Raghuveer >> *To:* Mark A. Jensen >> *Sent:* Saturday, January 09, 2010 2:02 PM >> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >> >> Hi Mark, >> >> I tried it with double quotes but still i got the same o/p with sequences >> from different species. >> >> ref|XM_822292.1| Trypanosoma brucei TREU927 protein kinase A ... 1813 >> 0.0 >> ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A ... 1622 >> 0.0 >> ref|XM_816530.1| Trypanosoma cruzi strain CL Brener protein k... 773 >> 0.0 >> ref|XM_816527.1| Trypanosoma cruzi strain CL Brener protein k... 
749 >> 0.0 >> ref|XM_838414.1| Leishmania major strain Friedlin protein kin... 551 >> 3e-154 >> ref|XM_838409.1| Leishmania major strain Friedlin protein kin... 551 >> 3e-154 >> ref|XM_001568451.1| Leishmania braziliensis MHOM/BR/75/M2904 ... 542 >> 2e-151 >> ref|XM_001469171.1| Leishmania infantum protein kinase A cata... 538 >> 2e-150 >> ref|XM_001469166.1| Leishmania infantum protein kinase A cata... 538 >> 2e-150 >> ref|XM_001682462.1| Leishmania major protein kinase A catalyt... 196 >> 3e-47 >> ref|XM_804361.1| Trypanosoma cruzi strain CL Brener protein k... 190 >> 1e-45 >> ref|XM_002065851.1| Drosophila willistoni GK20594 (Dwil\GK205... 181 >> 7e-43 >> ref|XM_822694.1| Trypanosoma brucei TREU927 protein kinase A ... 179 >> 2e-42 >> ref|XM_001563990.1| Leishmania braziliensis MHOM/BR/75/M2904 ... 178 >> 8e-42 >> ref|XM_814844.1| Trypanosoma cruzi strain CL Brener protein k... 170 >> 1e-39 >> ref|XM_001763039.1| Physcomitrella patens subsp. patens predi... 169 >> 4e-39 >> ref|XM_001464886.1| Leishmania infantum JPCM5 protein kinase ... 167 >> 1e-38 >> ref|XM_001377302.1| PREDICTED: Monodelphis domestica similar ... 150 >> 1e-33 >> ref|XM_001603485.1| PREDICTED: Nasonia vitripennis similar to... 145 >> 5e-32 >> ref|XM_416852.2| PREDICTED: Gallus gallus protein kinase, X-l... 143 >> 2e-31 >> ref|NM_001016403.2| Xenopus (Silurana) tropicalis protein kin... 143 >> 2e-31 >> ref|XM_002009291.1| Drosophila mojavensis GI11297 (Dmoj\GI112... 138 >> 7e-30 >> ref|NM_016979.1| Mus musculus protein kinase, X-linked (Prkx)... 138 >> 7e-30 >> ref|XM_001495664.2| PREDICTED: Equus caballus similar to Seri... 136 >> 2e-29 >> ref|XM_001111571.1| PREDICTED: Macaca mulatta cAMP-dependent ... 136 >> 2e-29 >> ref|XM_001611655.1| Babesia bovis protein kinase domain conta... 134 >> 9e-29 >> ref|NR_028062.1| Homo sapiens protein kinase, Y-linked (PRKY)... 132 >> 3e-28 >> ref|XM_001517795.1| PREDICTED: Ornithorhynchus anatinus simil... 
132 >> 3e-28 >> ref|XM_685338.2| PREDICTED: Danio rerio similar to Serine/thr... 132 >> 3e-28 >> ref|XM_002189865.1| PREDICTED: Taeniopygia guttata protein ki... 131 >> 1e-27 >> ref|XM_001362299.1| PREDICTED: Monodelphis domestica similar ... 131 >> 1e-27 >> ref|NM_001093198.1| Xenopus laevis protein kinase, cAMP-depen... 129 >> 4e-27 >> ref|XM_001461322.1| Paramecium tetraurelia hypothetical prote... 129 >> 4e-27 >> ref|NM_001099869.1| Xenopus laevis cAMP-dependent protein kin... 129 >> 4e-27 >> ref|NM_001003470.1| Danio rerio protein kinase, cAMP-dependen... 129 >> 4e-27 >> ref|XM_001141503.1| PREDICTED: Pan troglodytes verus protein ... 127 >> 1e-26 >> ref|XM_001145269.1| PREDICTED: Pan troglodytes protein kinase... 127 >> 1e-26 >> ref|XM_512434.2| PREDICTED: Pan troglodytes cAMP-dependent pr... 127 >> 1e-26 >> ref|XM_001171457.1| PREDICTED: Pan troglodytes cAMP-dependent... 127 >> 1e-26 >> ref|XM_001171437.1| PREDICTED: Pan troglodytes cAMP-dependent... 127 >> 1e-26 >> ref|XM_847420.1| PREDICTED: Canis familiaris similar to Serin... 127 >> 1e-26 >> ref|NM_207518.1| Homo sapiens protein kinase, cAMP-dependent,... 127 >> 1e-26 >> ref|NM_002730.3| Homo sapiens protein kinase, cAMP-dependent,... 127 >> 1e-26 >> >> >> Thanks in advance. >> >> Roopa. >> >> On Sat, Jan 9, 2010 at 7:46 PM, Mark A. Jensen wrote: >> >>> I understand you. Put in the double quotes and see what happens. >>> >>> ----- Original Message ----- >>> *From:* Roopa Raghuveer >>> *To:* Mark A. Jensen >>> *Sent:* Saturday, January 09, 2010 1:40 PM >>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >>> >>> Hi Mark, >>> >>> Thanks for your reply. It was working when I specifically use the name of >>> the organism as Trypanosoma brucei in the code,but my idea is to introduce a >>> $organ which takes the organism given by the user i.e., let it be anything >>> >>> Pseudomonas, Drosophila, Trypanosoma, Leishmania etc., I should get the >>> sequences related to only those organisms. 
>>> >>> i.e., If the user enters Pseudomonas,the $organ parameter of the code >>> takes Pseudomonas ,does BLAST and returns only those sequences that produce >>> significant alignment with Pseudomonas(only).But this is not happening like >>> that . >>> >>> Please help me in this regard. >>> >>> Thanks in advance >>> Roopa >>> >>> On Sat, Jan 9, 2010 at 7:29 PM, Mark A. Jensen wrote: >>> >>>> Hi Roopa-- You may get what you want if you make the change. >>>> With single quotes, ENTREZ_QUERY is set to the literal string >>>> >>>> $organ[ORGN] >>>> >>>> while, with double quotes, the variable value will be substituted, >>>> and the parameter should be set to >>>> >>>> Trypanosoma brucei[ORGN] >>>> >>>> I'm guess that it worked because the database ignored the strange >>>> parameter, >>>> and returned all the matches. Try this and if it doesn't work I look >>>> harder. >>>> cheers, >>>> Mark >>>> >>>> ----- Original Message ----- >>>> *From:* Roopa Raghuveer >>>> *To:* Mark A. Jensen >>>> *Sent:* Saturday, January 09, 2010 1:24 PM >>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >>>> >>>> hello Mark, >>>> >>>> Thanks for your reply.It was working without enclosing $organ[ORGN] in >>>> double quotations,but. I would like to have only those specific sequences >>>> which are specific for my Organism i.e., I need sequences only from the >>>> organism that I entered. >>>> >>>> When the organism is Trypanosoma brucei,I could get even Leishmania and >>>> other species as the similar sequences. But I want to get only trypanosoma >>>> brucei sequences. >>>> >>>> Could you please help me out in this regard? >>>> >>>> Roopa. >>>> >>>> My output >>>> >>>> I/P organism: Trypanosoma brucei >>>> >>>> O/P:- >>>> ref|XM_822292.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 1813 0.0 >>>> ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 1622 0.0 >>>> ref|XM_816530.1| Trypanosoma cruzi strain CL Brener protein k... 
>>>> 773 0.0 >>>> ref|XM_816527.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 749 0.0 >>>> ref|XM_838414.1| Leishmania major strain Friedlin protein kin... >>>> 551 3e-154 >>>> ref|XM_838409.1| Leishmania major strain Friedlin protein kin... >>>> 551 3e-154 >>>> ref|XM_001568451.1| Leishmania braziliensis MHOM/BR/75/M2904 ... >>>> 542 2e-151 >>>> ref|XM_001469171.1| Leishmania infantum protein kinase A cata... >>>> 538 2e-150 >>>> ref|XM_001469166.1| Leishmania infantum protein kinase A cata... >>>> 538 2e-150 >>>> ref|XM_001682462.1| Leishmania major protein kinase A catalyt... >>>> 196 3e-47 >>>> ref|XM_804361.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 190 1e-45 >>>> ref|XM_002065851.1| Drosophila willistoni GK20594 (Dwil\GK205... >>>> 181 7e-43 >>>> ref|XM_822694.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 179 2e-42 >>>> ref|XM_001563990.1| Leishmania braziliensis MHOM/BR/75/M2904 ... >>>> 178 8e-42 >>>> ref|XM_814844.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 170 1e-39 >>>> ref|XM_001763039.1| Physcomitrella patens subsp. patens predi... >>>> 168 4e-39 >>>> ref|XM_001464886.1| Leishmania infantum JPCM5 protein kinase ... >>>> 167 1e-38 >>>> ref|XM_001377302.1| PREDICTED: Monodelphis domestica similar ... >>>> 150 1e-33 >>>> ref|XM_001603485.1| PREDICTED: Nasonia vitripennis similar to... >>>> 145 5e-32 >>>> ref|XM_416852.2| PREDICTED: Gallus gallus protein kinase, X-l... >>>> 143 2e-31 >>>> ref|NM_001016403.2| Xenopus (Silurana) tropicalis protein kin... >>>> 143 2e-31 >>>> ref|XM_002009291.1| Drosophila mojavensis GI11297 (Dmoj\GI112... >>>> 138 7e-30 >>>> ref|NM_016979.1| Mus musculus protein kinase, X-linked (Prkx)... >>>> 138 7e-30 >>>> ref|XM_001495664.2| PREDICTED: Equus caballus similar to Seri... >>>> 136 2e-29 >>>> ref|XM_001111571.1| PREDICTED: Macaca mulatta cAMP-dependent ... >>>> 136 2e-29 >>>> ref|XM_001611655.1| Babesia bovis protein kinase domain conta... 
>>>> 134 9e-29 >>>> ref|NR_028062.1| Homo sapiens protein kinase, Y-linked (PRKY)... >>>> 132 3e-28 >>>> ref|XM_001517795.1| PREDICTED: Ornithorhynchus anatinus simil... >>>> 132 3e-28 >>>> ref|XM_685338.2| PREDICTED: Danio rerio similar to Serine/thr... >>>> 132 3e-28 >>>> ref|XM_002189865.1| PREDICTED: Taeniopygia guttata protein ki... >>>> 131 1e-27 >>>> ref|XM_001362299.1| PREDICTED: Monodelphis domestica similar ... >>>> 131 1e-27 >>>> ref|NM_001093198.1| Xenopus laevis protein kinase, cAMP-depen... >>>> 129 4e-27 >>>> ref|XM_001461322.1| Paramecium tetraurelia hypothetical prote... >>>> 129 4e-27 >>>> ref|NM_001099869.1| Xenopus laevis cAMP-dependent protein kin... >>>> 129 4e-27 >>>> >>>> Roopa. >>>> >>>> On Sat, Jan 9, 2010 at 7:05 PM, Mark A. Jensen wrote: >>>> >>>>> I see it immediately (from making same bug many times) : >>>>> >>>>> >>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY >>>>> => >>>>> - '$organ[ORGN]'); >>>>> +"$organ[ORGN]"); >>>>> >>>>> >>>>> MAJ >>>>> >>>>> ----- Original Message ----- From: "Roopa Raghuveer" < >>>>> rtbio.2009 at gmail.com> >>>>> To: "Mark A. Jensen" >>>>> Cc: >>>>> Sent: Saturday, January 09, 2010 11:57 AM >>>>> Subject: Re: [Bioperl-l] Regarding blast in Bioperl >>>>> >>>>> >>>>> >>>>> Hello all, >>>>>> >>>>>> Thanks alot for your reply Mark. It was working for Trypanosoma brucei >>>>>> as >>>>>> the organism parameter,but when I tried to use the Organism parameter >>>>>> from >>>>>> the user,it was not working i.e., I was unable to get the target >>>>>> sequences. >>>>>> Please help me in this regard. 
My code is >>>>>> >>>>>> #!/usr/bin/perl >>>>>> >>>>>> #path for extra camel module >>>>>> use lib "/srv/www/htdocs/rain/RNAi/"; >>>>>> use Roopablast; >>>>>> >>>>>> >>>>>> use Bio::SearchIO; >>>>>> use Bio::Search::Result::BlastResult; >>>>>> use Bio::Perl; >>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>> use Bio::Seq; >>>>>> use Bio::SeqIO; >>>>>> use Bio::DB::GenBank; >>>>>> >>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi"; >>>>>> $serverurl = "http://141.84.66.66/rain/RNAi"; >>>>>> $outfile = $serverpath."/rnairesult_".time().".html"; >>>>>> $nuc = $serverpath."/nuc".time().".txt"; >>>>>> $debugfile = $serverpath."/debug_".time().".txt"; >>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>> >>>>>> my $outstring =""; >>>>>> >>>>>> &parse_form; >>>>>> >>>>>> print "Content-type: text/html\n\n"; >>>>>> print "\n"; >>>>>> print "RNAi Result"; >>>>>> print ">>>>> URL=$serverurl/rnairesult_".time().".html\"> \n"; >>>>>> print "\n"; >>>>>> print "\n"; >>>>>> print " Your results will appear >>>>> href=$serverurl/rnairesult_".time().".html>here
"; >>>>>> print " Please be patient, runtime can be up to 5 minutes
"; >>>>>> print " This page will automatically reload in 30 seconds. Roopa"; >>>>>> print "\n"; >>>>>> print "\n"; >>>>>> >>>>>> defined(my $pid = fork) or die "Can't fork: $!"; >>>>>> exit if $pid; >>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >>>>>> >>>>>> open(OUTFILE, '>',$outfile); >>>>>> >>>>>> print OUTFILE "\n >>>>>> RNAi Result >>>>>> >>>>> URL=$serverurl//rnairesult_".time().".html\"> \n >>>>>> >>>>>> \n >>>>>> \n >>>>>> Your results will appear >>>>> href=$serverurl/rnairesult_".time().".html>here
>>>>>> Please be patient, runtime can be up to 5 minutes wait wait >>>>>> wait......
>>>>>> This page will automatically reload in 30 seconds Roopa
>>>>>> \n >>>>>> \n"; >>>>>> >>>>>> close(OUTFILE); >>>>>> >>>>>> >>>>>> @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); >>>>>> >>>>>> $in{'Inputseq'} =~ s/>.*$//m; >>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim; >>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/; >>>>>> >>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >>>>>> $in{'Threshold'}); >>>>>> >>>>>> >>>>>> sub blastcode >>>>>> { >>>>>> >>>>>> $inpu1= $_[0]; >>>>>> >>>>>> $organ= $_[1]; >>>>>> >>>>>> open(NUC,'>',$nuc); >>>>>> print NUC $inpu1,"\n"; >>>>>> close(NUC); >>>>>> >>>>>> my $prog = 'blastn'; >>>>>> my $db = 'refseq_rna'; >>>>>> my $e_val= '1e-10'; >>>>>> my $organism= $organ; >>>>>> >>>>>> $gb = new Bio::DB::GenBank; >>>>>> >>>>>> my @params = ( '-prog' => $prog, >>>>>> '-data' => $db, >>>>>> '-expect' => $e_val, >>>>>> '-readmethod' => 'SearchIO', >>>>>> '-Organism' => $organism ); >>>>>> >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> print OUTFILE $inpu1; >>>>>> close(OUTFILE); >>>>>> >>>>>> >>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY >>>>>> => >>>>>> '$organ[ORGN]'); >>>>>> >>>>>> #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>> >>>>>> #change a paramter >>>>>> >>>>>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>>>>> Brucei[ORGN]'; >>>>>> >>>>>> #change a paramter >>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>> '$input2[ORGN]'; >>>>>> >>>>>> my $v = 1; >>>>>> #$v is just to turn on and off the messages >>>>>> >>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >>>>>> '-organism' => $organ ); >>>>>> >>>>>> >>>>>> while (my $input = $str->next_seq()) >>>>>> { >>>>>> #Blast a sequence against a database: >>>>>> #Alternatively, you could pass in a file with many >>>>>> #sequences rather than loop through sequence one at a time >>>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>>>> #and swap the two lines below for an example of that. 
>>>>>> >>>>>> #open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE $input; >>>>>> #close(OUTFILE); >>>>>> >>>>>> >>>>>> my $r = $factory->submit_blast($input); >>>>>> >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE $r; >>>>>> close(OUTFILE); >>>>>> >>>>>> print STDERR "waiting...." if($v>0); >>>>>> >>>>>> while ( my @rids = $factory->each_rid ) { >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "while entered"; >>>>>> # close(OUTFILE); >>>>>> foreach my $rid ( @rids ) { >>>>>> >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "foreach entered"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>> >>>>>> if( !ref($rc) ) >>>>>> { >>>>>> if( $rc < 0 ) >>>>>> { >>>>>> $factory->remove_rid($rid); >>>>>> } >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "if entered"; >>>>>> close(OUTFILE); >>>>>> print STDERR "." if ( $v > 0 ); >>>>>> sleep 5; >>>>>> } >>>>>> else { >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "else entered"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $result = $rc->next_result(); >>>>>> #save the output >>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>> >>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>> print BLASTDEBUGFILE $result->next_hit(); >>>>>> close(BLASTDEBUGFILE); >>>>>> >>>>>> my $filename = >>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out"; >>>>>> >>>>>> # open(DEBUGFILE,'>',$debugfile); >>>>>> # open(new,'>',$filename); >>>>>> # @arra=; >>>>>> # print DEBUGFILE @arra; >>>>>> # close(DEBUGFILE); >>>>>> # close(new); >>>>>> >>>>>> $factory->save_output($filename); >>>>>> # open(BLASTDEBUGFILE,'>',$debugfile); >>>>>> # print BLASTDEBUGFILE "Hello $rid"; >>>>>> # close(BLASTDEBUGFILE); >>>>>> >>>>>> $factory->remove_rid($rid); >>>>>> >>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>> print BLASTDEBUGFILE $organism; >>>>>> close(BLASTDEBUGFILE); >>>>>> >>>>>> # open(OUTFILE,'>',$outfile); >>>>>> 
# print OUTFILE "Test2 $result->database_name()";
>>>>>> # close(OUTFILE);
>>>>>>
>>>>>> #$hit = $result->next_hit;
>>>>>> #open(new,'>',$debugfile);
>>>>>> #print $hit;
>>>>>> #close(new);
>>>>>>
>>>>>> while ( my $hit = $result->next_hit ) {
>>>>>>
>>>>>>     next unless ( $v > 0);
>>>>>>
>>>>>>     # open(OUTFILE,'>',$debugfile);
>>>>>>     # print OUTFILE "$hit in while hits";
>>>>>>     # close(OUTFILE);
>>>>>>
>>>>>>     my $sequ = $gb->get_Seq_by_version($hit->name);
>>>>>>     my $dna = $sequ->seq(); # get the sequence as a string
>>>>>>     push(@seqs,$dna);
>>>>>> }
>>>>>> }
>>>>>> }
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> #open(OUTFILE,'>',$debugfile);
>>>>>> #print OUTFILE $seqs[0];
>>>>>> #close(OUTFILE);
>>>>>>
>>>>>> return(@seqs);
>>>>>>
>>>>>> }
>>>>>>
>>>>>> Regards,
>>>>>> Roopa.
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen wrote:
>>>>>>
>>>>>>> Hi Roopa--
>>>>>>>
>>>>>>> I got your code to work with the following changes:
>>>>>>>
>>>>>>> +# the input should be a valid FASTA file...
>>>>>>> ...
>>>>>>> open(NUC,'>',$nuc);
>>>>>>> +print NUC ">seq (need a name line for valid fasta)\n";
>>>>>>> print NUC $inpu1, "\n";
>>>>>>> close(NUC);
>>>>>>> ...
>>>>>>>
>>>>>>> +# you can set these header parms in the call itself...
>>>>>>> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params,
>>>>>>> +     -ENTREZ_QUERY => 'Trypanosoma Brucei[ORGN]');
>>>>>>>
>>>>>>> #change a parameter
>>>>>>> +# commented this out...
>>>>>>> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
>>>>>>>
>>>>>>> MAJ
>>>>>>> ----- Original Message -----
>>>>>>> From: "Roopa Raghuveer" <rtbio.2009 at gmail.com>
>>>>>>> To:
>>>>>>> Sent: Friday, January 08, 2010 10:00 AM
>>>>>>> Subject: [Bioperl-l] Regarding blast in Bioperl
>>>>>>>
>>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>>>
>>>>>>>> I was trying Remote blast using Bioperl.
My input data is a >>>>>>>> Trypanosoma >>>>>>>> brucei sequence in Fasta format. When I was trying to submit to >>>>>>>> BLAST >>>>>>>> using >>>>>>>> the step >>>>>>>> $r=$factory->submit_blast($input) >>>>>>>> It was not returning anything which I checked by debugging the code. >>>>>>>> It is >>>>>>>> not blasting my input sequence even though I mentioned all the >>>>>>>> parameters.I >>>>>>>> would paste the code below. >>>>>>>> >>>>>>>> Please help me in solving put this problem. It is very urgent. >>>>>>>> >>>>>>>> Regards >>>>>>>> Roopa. >>>>>>>> >>>>>>>> #!/usr/bin/perl >>>>>>>> >>>>>>>> #path for extra camel module >>>>>>>> use lib "/srv/www/htdocs/rain/RNAi/"; >>>>>>>> use Roopablast; >>>>>>>> >>>>>>>> >>>>>>>> use Bio::SearchIO; >>>>>>>> use Bio::Search::Result::BlastResult; >>>>>>>> use Bio::Perl; >>>>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>>>> use Bio::Seq; >>>>>>>> use Bio::SeqIO; >>>>>>>> use Bio::DB::GenBank; >>>>>>>> >>>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi"; >>>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi"; >>>>>>>> $outfile = $serverpath."/rnairesult_".time().".html"; >>>>>>>> $nuc = $serverpath."/nuc".time().".txt"; >>>>>>>> $debugfile = $serverpath."/debug_".time().".txt"; >>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>>>> >>>>>>>> my $outstring =""; >>>>>>>> >>>>>>>> &parse_form; >>>>>>>> >>>>>>>> print "Content-type: text/html\n\n"; >>>>>>>> print "\n"; >>>>>>>> print "RNAi Result"; >>>>>>>> print ">>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n"; >>>>>>>> print "\n"; >>>>>>>> print "\n"; >>>>>>>> print " Your results will appear >>>>>>> href=$serverurl/rnairesult_".time().".html>here
"; >>>>>>>> print " Please be patient, runtime can be up to 5 minutes
"; >>>>>>>> print " This page will automatically reload in 30 seconds. Roopa"; >>>>>>>> print "\n"; >>>>>>>> print "\n"; >>>>>>>> >>>>>>>> defined(my $pid = fork) or die "Can't fork: $!"; >>>>>>>> exit if $pid; >>>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >>>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >>>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> open(OUTFILE, '>',$outfile); >>>>>>>> >>>>>>>> print OUTFILE "\n >>>>>>>> RNAi Result >>>>>>>> >>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n >>>>>>>> >>>>>>>> \n >>>>>>>> \n >>>>>>>> Your results will appear >>>>>>> href=$serverurl/rnairesult_".time().".html>here
>>>>>>>> Please be patient, runtime can be up to 5 minutes wait wait >>>>>>>> wait......
>>>>>>>> This page will automatically reload in 30 seconds Roopa
>>>>>>>> \n >>>>>>>> \n"; >>>>>>>> >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> @compseqs = blastcode($in{'Inputseq'}); >>>>>>>> >>>>>>>> $in{'Inputseq'} =~ s/>.*$//m; >>>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim; >>>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/; >>>>>>>> >>>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >>>>>>>> $in{'Threshold'}); >>>>>>>> >>>>>>>> >>>>>>>> sub blastcode >>>>>>>> { >>>>>>>> >>>>>>>> $inpu1= $_[0]; >>>>>>>> >>>>>>>> #$organ= $_[1]; >>>>>>>> >>>>>>>> open(NUC,'>',$nuc); >>>>>>>> print NUC $inpu1; >>>>>>>> close(NUC); >>>>>>>> >>>>>>>> my $prog = 'blastn'; >>>>>>>> my $db = 'refseq_rna'; >>>>>>>> my $e_val= '1e-10'; >>>>>>>> my $organism= 'Trypanosoma Brucei'; >>>>>>>> >>>>>>>> $gb = new Bio::DB::GenBank; >>>>>>>> >>>>>>>> my @params = ( '-prog' => $prog, >>>>>>>> '-data' => $db, >>>>>>>> '-expect' => $e_val, >>>>>>>> '-readmethod' => 'SearchIO', >>>>>>>> '-Organism' => $organism ); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$debugfile); >>>>>>>> # print OUTFILE @params; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>>> >>>>>>>> #change a paramter >>>>>>>> >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>>>>>>> Brucei[ORGN]'; >>>>>>>> >>>>>>>> #change a paramter >>>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>>>> '$input2[ORGN]'; >>>>>>>> >>>>>>>> my $v = 1; >>>>>>>> #$v is just to turn on and off the messages >>>>>>>> >>>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >>>>>>>> '-organism' => 'Trypanosoma Brucei' ); >>>>>>>> >>>>>>>> >>>>>>>> while (my $input = $str->next_seq()) >>>>>>>> { >>>>>>>> #Blast a sequence against a database: >>>>>>>> #Alternatively, you could pass in a file with many >>>>>>>> #sequences rather than loop through sequence one at a time >>>>>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>>>>>> #and swap the two lines below 
for an example of that. >>>>>>>> >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE $input; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> my $r = $factory->submit_blast($input); #The program stops here >>>>>>>> it >>>>>>>> does not return any value and it does not enter the While >>>>>>>> loop,Please help >>>>>>>> me in this regard.# >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE $r; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> print STDERR "waiting...." if($v>0); >>>>>>>> >>>>>>>> while ( my @rids = $factory->each_rid ) { >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "while entered"; >>>>>>>> close(OUTFILE); >>>>>>>> foreach my $rid ( @rids ) { >>>>>>>> >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "foreach entered"; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>>>> >>>>>>>> if( !ref($rc) ) >>>>>>>> { >>>>>>>> if( $rc < 0 ) >>>>>>>> { >>>>>>>> $factory->remove_rid($rid); >>>>>>>> } >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "if entered"; >>>>>>>> close(OUTFILE); >>>>>>>> print STDERR "." 
if ( $v > 0 ); >>>>>>>> sleep 5; >>>>>>>> } >>>>>>>> else { >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "else entered"; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> my $result = $rc->next_result(); >>>>>>>> #save the output >>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>>>> >>>>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>>>> print BLASTDEBUGFILE $result->next_hit(); >>>>>>>> close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> my $filename = >>>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out"; >>>>>>>> >>>>>>>> # open(DEBUGFILE,'>',$debugfile); >>>>>>>> # open(new,'>',$filename); >>>>>>>> # @arra=; >>>>>>>> # print DEBUGFILE @arra; >>>>>>>> # close(DEBUGFILE); >>>>>>>> # close(new); >>>>>>>> >>>>>>>> $factory->save_output($filename); >>>>>>>> >>>>>>>> # open(BLASTDEBUGFILE,'>',$debugfile); >>>>>>>> # print BLASTDEBUGFILE "Hello $rid"; >>>>>>>> # close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> $factory->remove_rid($rid); >>>>>>>> >>>>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>>>> print BLASTDEBUGFILE $organism; >>>>>>>> close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$outfile); >>>>>>>> # print OUTFILE "Test2 $result->database_name()"; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> #$hit = $result->next_hit; >>>>>>>> #open(new,'>',$debugfile); >>>>>>>> #print $hit; >>>>>>>> #close(new); >>>>>>>> >>>>>>>> while ( my $hit = $result->next_hit ) { >>>>>>>> >>>>>>>> next unless ( $v > 0); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$debugfile); >>>>>>>> # print OUTFILE "$hit in while hits"; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> my $sequ = $gb->get_Seq_by_version($hit->name); >>>>>>>> my $dna = $sequ->seq(); # get the sequence as a >>>>>>>> string >>>>>>>> push(@seqs,$dna); >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> #open(OUTFILE,'>',$debugfile); >>>>>>>> #print OUTFILE $seqs[0]; >>>>>>>> #close(OUTFILE); >>>>>>>> >>>>>>>> return(@seqs); >>>>>>>> >>>>>>>> } >>>>>>>> 
>>>>>>>> open(OUTFILE, '>',$outfile) || die ; >>>>>>>> >>>>>>>> print OUTFILE "\n >>>>>>>> RNAi Result >>>>>>>> \n >>>>>>>> \n >>>>>>>>

>>>>>>>> Inputsequence:
"; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> print OUTFILE substr ($in{'Inputseq'}, $i, 1); >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> print OUTFILE " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> print OUTFILE "
\n"; >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> print OUTFILE "

"; >>>>>>>> >>>>>>>> $z=@compseqs; >>>>>>>> >>>>>>>> for($k=1;$k<$z;$k++) { >>>>>>>> print OUTFILE ">>>>>>> set\">

Compare >>>>>>>> Sequence:
"; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> print OUTFILE substr ($compseqs[$k], $i, 1); >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> print OUTFILE " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> print OUTFILE "
\n"; >>>>>>>> } >>>>>>>> } >>>>>>>> print OUTFILE "

"; >>>>>>>> } >>>>>>>> >>>>>>>> print OUTFILE "

>>>>>>>> Window:
$in{'Windowsize'} >>>>>>>>

>>>>>>>>

>>>>>>>> Threshold:
$in{'Threshold'} >>>>>>>>

"; >>>>>>>> my $j=0; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){ >>>>>>>> if ($out[$i]->{similar}<=$in{'Threshold'}){ >>>>>>>> $j=$in{'Windowsize'}; >>>>>>>> } >>>>>>>> $height=$out[$i]->{similar}*5; >>>>>>>> } >>>>>>>> >>>>>>>> if ($j>0) { >>>>>>>> print OUTFILE ">>>>>>> height=\"5\">"; >>>>>>>> $outstring .= "".substr ($in{'Inputseq'}, >>>>>>>> $i, >>>>>>>> 1).""; >>>>>>>> $j--; >>>>>>>> } >>>>>>>> else { >>>>>>>> print OUTFILE ">>>>>>> height=\"5\">"; >>>>>>>> $outstring .= "".substr ($in{'Inputseq'}, >>>>>>>> $i, >>>>>>>> 1).""; >>>>>>>> } >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> $outstring .= " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> $outstring .= "
\n"; >>>>>>>> >>>>>>>> } >>>>>>>> if ( ($i+1)%800==0){ >>>>>>>> print OUTFILE "

\n"; >>>>>>>> >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> print OUTFILE "

>>>>>>> set\">$outstring"; >>>>>>>> >>>>>>>> #foreach (@out) { >>>>>>>> #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} >>>>>>>> matchs

"; >>>>>>>> #if ($_->{similar}<=$in{'Threshold'}){ >>>>>>>> >>>>>>>> # } >>>>>>>> #} >>>>>>>> >>>>>>>> print OUTFILE "\n\n"; >>>>>>>> >>>>>>>> close OUTFILE; >>>>>>>> >>>>>>>> #nameprint(); >>>>>>>> >>>>>>>> sub parse_form { >>>>>>>> local ($buffer, @pairs, $pair, $name, $value); >>>>>>>> # Read in text >>>>>>>> $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/; >>>>>>>> if ($ENV{'REQUEST_METHOD'} eq "POST") >>>>>>>> { >>>>>>>> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); >>>>>>>> } >>>>>>>> else >>>>>>>> { >>>>>>>> $buffer = $ENV{'QUERY_STRING'}; >>>>>>>> } >>>>>>>> @pairs = split(/&/, $buffer); >>>>>>>> foreach $pair (@pairs) >>>>>>>> { >>>>>>>> ($name, $value) = split(/=/, $pair); >>>>>>>> $value =~ tr/+/ /; >>>>>>>> $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; >>>>>>>> $in{$name} = $value; >>>>>>>> } >>>>>>>> } >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>> >>> >> > From bernd.web at gmail.com Thu Jan 21 13:37:18 2010 From: bernd.web at gmail.com (Bernd Web) Date: Thu, 21 Jan 2010 19:37:18 +0100 Subject: [Bioperl-l] Regarding blast in Bioperl In-Reply-To: References: <9D8A1428463C4D5E9C416521C35E254C@NewLife> <196889DF87964224ACDB948681BA7F86@NewLife> Message-ID: <716af09c1001211037p59b19a29l1967f1e514469e79@mail.gmail.com> Hi, Regarding RemoteBlast, my I add a query? It seems that Bio::Tools::Run::RemoteBlast is sending each sequence seperately to the NCBI (at least in BP 1.5.2). This means that for each Sequence a RID is to be checked. Is this indeed the case? The BLAST URL-API or batch interface supports sending multiple sequences at once. 
Regards,
Bernd

On Thu, Jan 21, 2010 at 7:28 PM, Roopa Raghuveer wrote:
> Hello Mark,
>
> This is Roopa again. I have a small problem again. I am working on Remote
> blast. The program works well. But the problem is this. The program
> accesses the server and gets the output correctly. I am trying to send the
> result sequences into an array and I found that always the first sequence
> among the Result sequences is missing. The code is
>
> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
> '-organism' => "$organ\[ORGN]");

From cjfields at illinois.edu Thu Jan 21 23:31:25 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Jan 2010 22:31:25 -0600
Subject: [Bioperl-l] Bio::BroodComb - RFC
In-Reply-To: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>
References: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>
Message-ID:

Jay,

Did you want to release it to CPAN? I'll take a closer look at the docs to get an idea of what you are doing with it, but from my perspective I can see this becoming a nice general-use tool akin to Bio::Perl, maybe a bit more lightweight.

chris

On Jan 18, 2010, at 6:22 PM, Jay Hannah wrote:

> I formalized a little framework so I could stop re-writing little programs that do some things people frequently ask me to do.
>
> http://github.com/jhannah/bio-broodcomb
>
> It stores everything in SQLite so users can write custom reports if they want to. It uses BioPerl and some shiny CPAN beads (DBIx::Class, Moose::Role). Tests included.
>
> The first two functions I stuck in the framework:
>
> Find subsequences (Bio::BroodComb::SubSeq):
>
>    use Bio::BroodComb;
>    my $bc = Bio::BroodComb->new();
>    $bc->load_large_seq(file => "large_seq.fasta");
>    $bc->load_small_seq(file => "small_seq.fasta");
>    $bc->find_subseqs();
>    print $bc->subseq_report1;
>
> In-silico PCR (Bio::BroodComb::PCR):
>
>    use Bio::BroodComb;
>    my $bc = Bio::BroodComb->new();
>    $bc->load_large_seq(file => "large_seq.fasta");
>    $bc->add_primerset(
>       description    => "U5/R",   # however you want it reported
>       forward_primer => 'GCGGGCAGCAATACTGCTTTGTAA',
>       reverse_primer => 'ACCAGCGTTCAGCATATGGAGGAT',
>    );
>    $bc->find_pcr_hits();
>    $bc->find_pcr_products();
>    print $bc->pcr_report1;
>
> I find this rather handy, so will probably be adding all my applicable future work to it instead of writing stand-alone programs. Not sure if it should be renamed for eventual CPAN / wherever.
>
> Suggestions, contributions welcome. :)
>
> http://github.com/jhannah/bio-broodcomb
>
> Jay Hannah
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

From jason at bioperl.org Fri Jan 22 01:17:14 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 21 Jan 2010 22:17:14 -0800
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
Message-ID:

I'm considering adding an initialization parameter (and get/set accessor) to Bio::AlignIO that would allow setting the alphabet. This is then passed to Bio::LocatableSeq creation so that _guess_alphabet isn't called. This will allow removal of warnings about empty sequences, because _guess_alphabet won't be called on a sequence if we have explicitly set the alphabet.

This worked great on my local install and tests pass. Any objections or concerns?

Basically it means that when you make an AlignIO you can specify the alphabet, i.e.
my $in = Bio::AlignIO->new(-format   => 'fasta',
                           -alphabet => 'dna',
                           -file     => 'genome.fasaln');

I have some alignments with empty sequences and I think turning off the warnings is appropriate where I force the alphabet choice. It should also give a very modest speedup.

-jason
--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip

From rtbio.2009 at gmail.com Fri Jan 22 04:54:32 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Fri, 22 Jan 2010 10:54:32 +0100
Subject: [Bioperl-l] Fwd: Regarding blast in Bioperl
In-Reply-To:
References: <9D8A1428463C4D5E9C416521C35E254C@NewLife> <196889DF87964224ACDB948681BA7F86@NewLife>
Message-ID:

---------- Forwarded message ----------
From: Roopa Raghuveer
Date: Thu, Jan 21, 2010 at 7:28 PM
Subject: Re: [Bioperl-l] Regarding blast in Bioperl
To: "Mark A. Jensen"
Cc: bioperl-l at lists.open-bio.org

Hello Mark,

This is Roopa again. I have a small problem again. I am working on Remote blast. The program works well. But the problem is this. The program accesses the server and gets the output correctly. I am trying to send the result sequences into an array and I found that always the first sequence among the Result sequences is missing. The code is

my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
                          '-organism' => "$organ\[ORGN]");

while (my $input = $str->next_seq())
{
    #Blast a sequence against a database:
    #Alternatively, you could pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

    open(OUTFILE,'>',$debugfile);
    print OUTFILE $input;
    close(OUTFILE);

    my $r = $factory->submit_blast($input);

    open(OUTFILE,'>',$debugfile);
    # print OUTFILE $r;
    close(OUTFILE);

    print STDERR "waiting...."
if($v>0);

    while ( my @rids = $factory->each_rid ) {
        open(OUTFILE,'>',$debugfile);
        # print OUTFILE "while entered";
        close(OUTFILE);
        foreach my $rid ( @rids ) {

            open(OUTFILE,'>',$debugfile);
            # print OUTFILE "foreach entered";
            close(OUTFILE);

            my $rc = $factory->retrieve_blast($rid);

            if( !ref($rc) )
            {
                if( $rc < 0 )
                {
                    $factory->remove_rid($rid);
                }
                open(OUTFILE,'>',$debugfile);
                # print OUTFILE "if entered";
                close(OUTFILE);
                print STDERR "." if ( $v > 0 );
                sleep 5;
            }
            else {
                open(OUTFILE,'>',$debugfile);
                # print OUTFILE "else entered";
                close(OUTFILE);

                my $result = $rc->next_result();
                #save the output
                $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

                open(BLASTDEBUGFILE,'>',$blastdebugfile);
                print BLASTDEBUGFILE $result->next_hit();
                close(BLASTDEBUGFILE);

                my $filename = $serverpath."/blastdata_".time()."\.out";

                # open(DEBUGFILE,'>',$debugfile);
                # open(new,'>',$filename);
                # @arra=;
                # print DEBUGFILE @arra;
                # close(DEBUGFILE);
                # close(new);

                $factory->save_output($filename);
                # open(BLASTDEBUGFILE,'>',$debugfile);
                # print BLASTDEBUGFILE "Hello $rid";
                # close(BLASTDEBUGFILE);

                $factory->remove_rid($rid);

                open(BLASTDEBUGFILE,'>',$blastdebugfile);
                print BLASTDEBUGFILE $organism;
                close(BLASTDEBUGFILE);

                # open(OUTFILE,'>',$outfile);
                # print OUTFILE "Test2 $result->database_name()";
                # close(OUTFILE);

                #$hit = $result->next_hit;
                #open(new,'>',$debugfile);
                #print $hit;
                #close(new);
                $dummy=0;
                while ( my $hit = $result->next_hit ) {

                    next unless ( $v >= 0);

                    # open(OUTFILE,'>',$debugfile);
                    # print OUTFILE "$hit in while hits";
                    # close(OUTFILE);

                    my $sequ = $gb->get_Seq_by_version($hit->name);
                    my $dna = $sequ->seq(); # get the sequence as a string
                    $dummy++;
                    open(OUTFILE,'>',$debugfile);
                    # print OUTFILE $dummy;
                    close(OUTFILE);
                    push(@seqs,$dna);
                }
            }
        }
    }
}
$warum=@seqs;
open(OUTFILE,'>',$debugfile);
# print OUTFILE $warum;
print OUTFILE @seqs;
close(OUTFILE);

return(@seqs);

}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "\n RNAi Result \n \n

Inputsequence:
"; Here in the above code, I was trying to debug the code and trying to get the count of the array and even the sequence. But when the output data was giving 1 sequence, the count of the array was 0 and when I tried to print the output sequence I could not get any. It was the same when the no of output sequences was 3, I tried to print the sequences but was getting the count of the array as 2 and was printing only two sequences. Please help me in sorting out this problem. Regards, Roopa. On Sun, Jan 10, 2010 at 1:03 AM, Mark A. Jensen wrote: > Excellent Roopa- it's my pleasure-- MAJ > > ----- Original Message ----- > *From:* Roopa Raghuveer > *To:* Mark A. Jensen > *Sent:* Saturday, January 09, 2010 6:41 PM > *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl > > Hi Mark, > > Thank you very very much. The code is working now. Thanks for the support > and time you have spent on me. > > Thanks in advance > Roopa. > > On Sat, Jan 9, 2010 at 10:56 PM, Mark A. Jensen wrote: > >> There is still a bug with the double quotes. Use "$organ\[ORGN]", which >> prevents perl from >> looking for a member of an array called @organ. This would have shown up >> if 'use strict;' had >> been in place. Still don't know whether this would work precisely; can you >> send me the query >> sequence so I can reproduce your ouput? >> thanks MAJ >> >> ----- Original Message ----- >> *From:* Roopa Raghuveer >> *To:* Mark A. Jensen >> *Sent:* Saturday, January 09, 2010 2:02 PM >> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >> >> Hi Mark, >> >> I tried it with double quotes but still i got the same o/p with sequences >> from different species. >> >> ref|XM_822292.1| Trypanosoma brucei TREU927 protein kinase A ... 1813 >> 0.0 >> ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A ... 1622 >> 0.0 >> ref|XM_816530.1| Trypanosoma cruzi strain CL Brener protein k... 773 >> 0.0 >> ref|XM_816527.1| Trypanosoma cruzi strain CL Brener protein k... 
749 >> 0.0 >> ref|XM_838414.1| Leishmania major strain Friedlin protein kin... 551 >> 3e-154 >> ref|XM_838409.1| Leishmania major strain Friedlin protein kin... 551 >> 3e-154 >> ref|XM_001568451.1| Leishmania braziliensis MHOM/BR/75/M2904 ... 542 >> 2e-151 >> ref|XM_001469171.1| Leishmania infantum protein kinase A cata... 538 >> 2e-150 >> ref|XM_001469166.1| Leishmania infantum protein kinase A cata... 538 >> 2e-150 >> ref|XM_001682462.1| Leishmania major protein kinase A catalyt... 196 >> 3e-47 >> ref|XM_804361.1| Trypanosoma cruzi strain CL Brener protein k... 190 >> 1e-45 >> ref|XM_002065851.1| Drosophila willistoni GK20594 (Dwil\GK205... 181 >> 7e-43 >> ref|XM_822694.1| Trypanosoma brucei TREU927 protein kinase A ... 179 >> 2e-42 >> ref|XM_001563990.1| Leishmania braziliensis MHOM/BR/75/M2904 ... 178 >> 8e-42 >> ref|XM_814844.1| Trypanosoma cruzi strain CL Brener protein k... 170 >> 1e-39 >> ref|XM_001763039.1| Physcomitrella patens subsp. patens predi... 169 >> 4e-39 >> ref|XM_001464886.1| Leishmania infantum JPCM5 protein kinase ... 167 >> 1e-38 >> ref|XM_001377302.1| PREDICTED: Monodelphis domestica similar ... 150 >> 1e-33 >> ref|XM_001603485.1| PREDICTED: Nasonia vitripennis similar to... 145 >> 5e-32 >> ref|XM_416852.2| PREDICTED: Gallus gallus protein kinase, X-l... 143 >> 2e-31 >> ref|NM_001016403.2| Xenopus (Silurana) tropicalis protein kin... 143 >> 2e-31 >> ref|XM_002009291.1| Drosophila mojavensis GI11297 (Dmoj\GI112... 138 >> 7e-30 >> ref|NM_016979.1| Mus musculus protein kinase, X-linked (Prkx)... 138 >> 7e-30 >> ref|XM_001495664.2| PREDICTED: Equus caballus similar to Seri... 136 >> 2e-29 >> ref|XM_001111571.1| PREDICTED: Macaca mulatta cAMP-dependent ... 136 >> 2e-29 >> ref|XM_001611655.1| Babesia bovis protein kinase domain conta... 134 >> 9e-29 >> ref|NR_028062.1| Homo sapiens protein kinase, Y-linked (PRKY)... 132 >> 3e-28 >> ref|XM_001517795.1| PREDICTED: Ornithorhynchus anatinus simil... 
132 >> 3e-28 >> ref|XM_685338.2| PREDICTED: Danio rerio similar to Serine/thr... 132 >> 3e-28 >> ref|XM_002189865.1| PREDICTED: Taeniopygia guttata protein ki... 131 >> 1e-27 >> ref|XM_001362299.1| PREDICTED: Monodelphis domestica similar ... 131 >> 1e-27 >> ref|NM_001093198.1| Xenopus laevis protein kinase, cAMP-depen... 129 >> 4e-27 >> ref|XM_001461322.1| Paramecium tetraurelia hypothetical prote... 129 >> 4e-27 >> ref|NM_001099869.1| Xenopus laevis cAMP-dependent protein kin... 129 >> 4e-27 >> ref|NM_001003470.1| Danio rerio protein kinase, cAMP-dependen... 129 >> 4e-27 >> ref|XM_001141503.1| PREDICTED: Pan troglodytes verus protein ... 127 >> 1e-26 >> ref|XM_001145269.1| PREDICTED: Pan troglodytes protein kinase... 127 >> 1e-26 >> ref|XM_512434.2| PREDICTED: Pan troglodytes cAMP-dependent pr... 127 >> 1e-26 >> ref|XM_001171457.1| PREDICTED: Pan troglodytes cAMP-dependent... 127 >> 1e-26 >> ref|XM_001171437.1| PREDICTED: Pan troglodytes cAMP-dependent... 127 >> 1e-26 >> ref|XM_847420.1| PREDICTED: Canis familiaris similar to Serin... 127 >> 1e-26 >> ref|NM_207518.1| Homo sapiens protein kinase, cAMP-dependent,... 127 >> 1e-26 >> ref|NM_002730.3| Homo sapiens protein kinase, cAMP-dependent,... 127 >> 1e-26 >> >> >> Thanks in advance. >> >> Roopa. >> >> On Sat, Jan 9, 2010 at 7:46 PM, Mark A. Jensen wrote: >> >>> I understand you. Put in the double quotes and see what happens. >>> >>> ----- Original Message ----- >>> *From:* Roopa Raghuveer >>> *To:* Mark A. Jensen >>> *Sent:* Saturday, January 09, 2010 1:40 PM >>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >>> >>> Hi Mark, >>> >>> Thanks for your reply. It was working when I specifically use the name of >>> the organism as Trypanosoma brucei in the code,but my idea is to introduce a >>> $organ which takes the organism given by the user i.e., let it be anything >>> >>> Pseudomonas, Drosophila, Trypanosoma, Leishmania etc., I should get the >>> sequences related to only those organisms. 
>>> >>> i.e., If the user enters Pseudomonas,the $organ parameter of the code >>> takes Pseudomonas ,does BLAST and returns only those sequences that produce >>> significant alignment with Pseudomonas(only).But this is not happening like >>> that . >>> >>> Please help me in this regard. >>> >>> Thanks in advance >>> Roopa >>> >>> On Sat, Jan 9, 2010 at 7:29 PM, Mark A. Jensen wrote: >>> >>>> Hi Roopa-- You may get what you want if you make the change. >>>> With single quotes, ENTREZ_QUERY is set to the literal string >>>> >>>> $organ[ORGN] >>>> >>>> while, with double quotes, the variable value will be substituted, >>>> and the parameter should be set to >>>> >>>> Trypanosoma brucei[ORGN] >>>> >>>> I'm guess that it worked because the database ignored the strange >>>> parameter, >>>> and returned all the matches. Try this and if it doesn't work I look >>>> harder. >>>> cheers, >>>> Mark >>>> >>>> ----- Original Message ----- >>>> *From:* Roopa Raghuveer >>>> *To:* Mark A. Jensen >>>> *Sent:* Saturday, January 09, 2010 1:24 PM >>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >>>> >>>> hello Mark, >>>> >>>> Thanks for your reply.It was working without enclosing $organ[ORGN] in >>>> double quotations,but. I would like to have only those specific sequences >>>> which are specific for my Organism i.e., I need sequences only from the >>>> organism that I entered. >>>> >>>> When the organism is Trypanosoma brucei,I could get even Leishmania and >>>> other species as the similar sequences. But I want to get only trypanosoma >>>> brucei sequences. >>>> >>>> Could you please help me out in this regard? >>>> >>>> Roopa. >>>> >>>> My output >>>> >>>> I/P organism: Trypanosoma brucei >>>> >>>> O/P:- >>>> ref|XM_822292.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 1813 0.0 >>>> ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 1622 0.0 >>>> ref|XM_816530.1| Trypanosoma cruzi strain CL Brener protein k... 
>>>> 773 0.0 >>>> ref|XM_816527.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 749 0.0 >>>> ref|XM_838414.1| Leishmania major strain Friedlin protein kin... >>>> 551 3e-154 >>>> ref|XM_838409.1| Leishmania major strain Friedlin protein kin... >>>> 551 3e-154 >>>> ref|XM_001568451.1| Leishmania braziliensis MHOM/BR/75/M2904 ... >>>> 542 2e-151 >>>> ref|XM_001469171.1| Leishmania infantum protein kinase A cata... >>>> 538 2e-150 >>>> ref|XM_001469166.1| Leishmania infantum protein kinase A cata... >>>> 538 2e-150 >>>> ref|XM_001682462.1| Leishmania major protein kinase A catalyt... >>>> 196 3e-47 >>>> ref|XM_804361.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 190 1e-45 >>>> ref|XM_002065851.1| Drosophila willistoni GK20594 (Dwil\GK205... >>>> 181 7e-43 >>>> ref|XM_822694.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 179 2e-42 >>>> ref|XM_001563990.1| Leishmania braziliensis MHOM/BR/75/M2904 ... >>>> 178 8e-42 >>>> ref|XM_814844.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 170 1e-39 >>>> ref|XM_001763039.1| Physcomitrella patens subsp. patens predi... >>>> 168 4e-39 >>>> ref|XM_001464886.1| Leishmania infantum JPCM5 protein kinase ... >>>> 167 1e-38 >>>> ref|XM_001377302.1| PREDICTED: Monodelphis domestica similar ... >>>> 150 1e-33 >>>> ref|XM_001603485.1| PREDICTED: Nasonia vitripennis similar to... >>>> 145 5e-32 >>>> ref|XM_416852.2| PREDICTED: Gallus gallus protein kinase, X-l... >>>> 143 2e-31 >>>> ref|NM_001016403.2| Xenopus (Silurana) tropicalis protein kin... >>>> 143 2e-31 >>>> ref|XM_002009291.1| Drosophila mojavensis GI11297 (Dmoj\GI112... >>>> 138 7e-30 >>>> ref|NM_016979.1| Mus musculus protein kinase, X-linked (Prkx)... >>>> 138 7e-30 >>>> ref|XM_001495664.2| PREDICTED: Equus caballus similar to Seri... >>>> 136 2e-29 >>>> ref|XM_001111571.1| PREDICTED: Macaca mulatta cAMP-dependent ... >>>> 136 2e-29 >>>> ref|XM_001611655.1| Babesia bovis protein kinase domain conta... 
>>>> 134 9e-29 >>>> ref|NR_028062.1| Homo sapiens protein kinase, Y-linked (PRKY)... >>>> 132 3e-28 >>>> ref|XM_001517795.1| PREDICTED: Ornithorhynchus anatinus simil... >>>> 132 3e-28 >>>> ref|XM_685338.2| PREDICTED: Danio rerio similar to Serine/thr... >>>> 132 3e-28 >>>> ref|XM_002189865.1| PREDICTED: Taeniopygia guttata protein ki... >>>> 131 1e-27 >>>> ref|XM_001362299.1| PREDICTED: Monodelphis domestica similar ... >>>> 131 1e-27 >>>> ref|NM_001093198.1| Xenopus laevis protein kinase, cAMP-depen... >>>> 129 4e-27 >>>> ref|XM_001461322.1| Paramecium tetraurelia hypothetical prote... >>>> 129 4e-27 >>>> ref|NM_001099869.1| Xenopus laevis cAMP-dependent protein kin... >>>> 129 4e-27 >>>> >>>> Roopa. >>>> >>>> On Sat, Jan 9, 2010 at 7:05 PM, Mark A. Jensen wrote: >>>> >>>>> I see it immediately (from making same bug many times) : >>>>> >>>>> >>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY >>>>> => >>>>> - '$organ[ORGN]'); >>>>> +"$organ[ORGN]"); >>>>> >>>>> >>>>> MAJ >>>>> >>>>> ----- Original Message ----- From: "Roopa Raghuveer" < >>>>> rtbio.2009 at gmail.com> >>>>> To: "Mark A. Jensen" >>>>> Cc: >>>>> Sent: Saturday, January 09, 2010 11:57 AM >>>>> Subject: Re: [Bioperl-l] Regarding blast in Bioperl >>>>> >>>>> >>>>> >>>>> Hello all, >>>>>> >>>>>> Thanks alot for your reply Mark. It was working for Trypanosoma brucei >>>>>> as >>>>>> the organism parameter,but when I tried to use the Organism parameter >>>>>> from >>>>>> the user,it was not working i.e., I was unable to get the target >>>>>> sequences. >>>>>> Please help me in this regard. 
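A note on the quoting fix in the diff above: inside a double-quoted Perl string, `$organ[ORGN]` is parsed as an element of an array `@organ`, not as the scalar `$organ` followed by the literal text `[ORGN]`. Without `use strict` that element is silently empty, which would be consistent with the unfiltered hit list reported in this thread. The following is a minimal sketch that sidesteps the ambiguity by concatenating; the helper name `entrez_organism_query` is hypothetical and not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): build an Entrez organism
# filter such as 'Trypanosoma brucei[ORGN]' from user-supplied text.
sub entrez_organism_query {
    my ($organism) = @_;
    die "organism name is required\n"
        unless defined $organism && $organism =~ /\S/;

    # Trim surrounding whitespace that a web form may leave behind.
    $organism =~ s/^\s+|\s+$//g;

    # Concatenate instead of interpolating: "$organism[ORGN]" would be
    # read as an element of @organism, not as $organism . '[ORGN]'.
    return $organism . '[ORGN]';
}

my $query = entrez_organism_query('Trypanosoma brucei');
print "$query\n";    # prints: Trypanosoma brucei[ORGN]
```

The resulting string can then be passed as `-ENTREZ_QUERY => $query` in the `Bio::Tools::Run::RemoteBlast->new(...)` call shown in the thread. Writing `"${organ}[ORGN]"` or `"$organ\[ORGN]"` should have the same effect if interpolation is preferred.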
My code is >>>>>> >>>>>> #!/usr/bin/perl >>>>>> >>>>>> #path for extra camel module >>>>>> use lib "/srv/www/htdocs/rain/RNAi/"; >>>>>> use Roopablast; >>>>>> >>>>>> >>>>>> use Bio::SearchIO; >>>>>> use Bio::Search::Result::BlastResult; >>>>>> use Bio::Perl; >>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>> use Bio::Seq; >>>>>> use Bio::SeqIO; >>>>>> use Bio::DB::GenBank; >>>>>> >>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi"; >>>>>> $serverurl = "http://141.84.66.66/rain/RNAi"; >>>>>> $outfile = $serverpath."/rnairesult_".time().".html"; >>>>>> $nuc = $serverpath."/nuc".time().".txt"; >>>>>> $debugfile = $serverpath."/debug_".time().".txt"; >>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>> >>>>>> my $outstring =""; >>>>>> >>>>>> &parse_form; >>>>>> >>>>>> print "Content-type: text/html\n\n"; >>>>>> print "\n"; >>>>>> print "RNAi Result"; >>>>>> print ">>>>> URL=$serverurl/rnairesult_".time().".html\"> \n"; >>>>>> print "\n"; >>>>>> print "\n"; >>>>>> print " Your results will appear >>>>> href=$serverurl/rnairesult_".time().".html>here
"; >>>>>> print " Please be patient, runtime can be up to 5 minutes
"; >>>>>> print " This page will automatically reload in 30 seconds. Roopa"; >>>>>> print "\n"; >>>>>> print "\n"; >>>>>> >>>>>> defined(my $pid = fork) or die "Can't fork: $!"; >>>>>> exit if $pid; >>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >>>>>> >>>>>> open(OUTFILE, '>',$outfile); >>>>>> >>>>>> print OUTFILE "\n >>>>>> RNAi Result >>>>>> >>>>> URL=$serverurl//rnairesult_".time().".html\"> \n >>>>>> >>>>>> \n >>>>>> \n >>>>>> Your results will appear >>>>> href=$serverurl/rnairesult_".time().".html>here
>>>>>> Please be patient, runtime can be up to 5 minutes wait wait >>>>>> wait......
>>>>>> This page will automatically reload in 30 seconds Roopa
>>>>>> \n >>>>>> \n"; >>>>>> >>>>>> close(OUTFILE); >>>>>> >>>>>> >>>>>> @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); >>>>>> >>>>>> $in{'Inputseq'} =~ s/>.*$//m; >>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim; >>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/; >>>>>> >>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >>>>>> $in{'Threshold'}); >>>>>> >>>>>> >>>>>> sub blastcode >>>>>> { >>>>>> >>>>>> $inpu1= $_[0]; >>>>>> >>>>>> $organ= $_[1]; >>>>>> >>>>>> open(NUC,'>',$nuc); >>>>>> print NUC $inpu1,"\n"; >>>>>> close(NUC); >>>>>> >>>>>> my $prog = 'blastn'; >>>>>> my $db = 'refseq_rna'; >>>>>> my $e_val= '1e-10'; >>>>>> my $organism= $organ; >>>>>> >>>>>> $gb = new Bio::DB::GenBank; >>>>>> >>>>>> my @params = ( '-prog' => $prog, >>>>>> '-data' => $db, >>>>>> '-expect' => $e_val, >>>>>> '-readmethod' => 'SearchIO', >>>>>> '-Organism' => $organism ); >>>>>> >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> print OUTFILE $inpu1; >>>>>> close(OUTFILE); >>>>>> >>>>>> >>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY >>>>>> => >>>>>> '$organ[ORGN]'); >>>>>> >>>>>> #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>> >>>>>> #change a paramter >>>>>> >>>>>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>>>>> Brucei[ORGN]'; >>>>>> >>>>>> #change a paramter >>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>> '$input2[ORGN]'; >>>>>> >>>>>> my $v = 1; >>>>>> #$v is just to turn on and off the messages >>>>>> >>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >>>>>> '-organism' => $organ ); >>>>>> >>>>>> >>>>>> while (my $input = $str->next_seq()) >>>>>> { >>>>>> #Blast a sequence against a database: >>>>>> #Alternatively, you could pass in a file with many >>>>>> #sequences rather than loop through sequence one at a time >>>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>>>> #and swap the two lines below for an example of that. 
>>>>>> >>>>>> #open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE $input; >>>>>> #close(OUTFILE); >>>>>> >>>>>> >>>>>> my $r = $factory->submit_blast($input); >>>>>> >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE $r; >>>>>> close(OUTFILE); >>>>>> >>>>>> print STDERR "waiting...." if($v>0); >>>>>> >>>>>> while ( my @rids = $factory->each_rid ) { >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "while entered"; >>>>>> # close(OUTFILE); >>>>>> foreach my $rid ( @rids ) { >>>>>> >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "foreach entered"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>> >>>>>> if( !ref($rc) ) >>>>>> { >>>>>> if( $rc < 0 ) >>>>>> { >>>>>> $factory->remove_rid($rid); >>>>>> } >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "if entered"; >>>>>> close(OUTFILE); >>>>>> print STDERR "." if ( $v > 0 ); >>>>>> sleep 5; >>>>>> } >>>>>> else { >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "else entered"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $result = $rc->next_result(); >>>>>> #save the output >>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>> >>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>> print BLASTDEBUGFILE $result->next_hit(); >>>>>> close(BLASTDEBUGFILE); >>>>>> >>>>>> my $filename = >>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out"; >>>>>> >>>>>> # open(DEBUGFILE,'>',$debugfile); >>>>>> # open(new,'>',$filename); >>>>>> # @arra=; >>>>>> # print DEBUGFILE @arra; >>>>>> # close(DEBUGFILE); >>>>>> # close(new); >>>>>> >>>>>> $factory->save_output($filename); >>>>>> # open(BLASTDEBUGFILE,'>',$debugfile); >>>>>> # print BLASTDEBUGFILE "Hello $rid"; >>>>>> # close(BLASTDEBUGFILE); >>>>>> >>>>>> $factory->remove_rid($rid); >>>>>> >>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>> print BLASTDEBUGFILE $organism; >>>>>> close(BLASTDEBUGFILE); >>>>>> >>>>>> # open(OUTFILE,'>',$outfile); >>>>>> 
# print OUTFILE "Test2 $result->database_name()"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> #$hit = $result->next_hit; >>>>>> #open(new,'>',$debugfile); >>>>>> #print $hit; >>>>>> #close(new); >>>>>> >>>>>> while ( my $hit = $result->next_hit ) { >>>>>> >>>>>> next unless ( $v > 0); >>>>>> >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "$hit in while hits"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $sequ = $gb->get_Seq_by_version($hit->name); >>>>>> my $dna = $sequ->seq(); # get the sequence as a string >>>>>> push(@seqs,$dna); >>>>>> } >>>>>> } >>>>>> } >>>>>> } >>>>>> } >>>>>> >>>>>> #open(OUTFILE,'>',$debugfile); >>>>>> #print OUTFILE $seqs[0]; >>>>>> #close(OUTFILE); >>>>>> >>>>>> return(@seqs); >>>>>> >>>>>> } >>>>>> >>>>>> Regards, >>>>>> Roopa. >>>>>> >>>>>> >>>>>> On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen >>>>>> wrote: >>>>>> >>>>>> Hi Roopa-- >>>>>>> >>>>>>> I got your code to work with the following changes: >>>>>>> >>>>>>> +# the input should be a valid FASTA file... >>>>>>> ... >>>>>>> open(NUC,'>',$nuc); >>>>>>> +print NUC ">seq (need a name line for valid fasta)\n"; >>>>>>> print NUC $inpu1, "\n"; >>>>>>> close(NUC); >>>>>>> ... >>>>>>> >>>>>>> +# you can set these header parms in the call itself... >>>>>>> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, >>>>>>> -ENTREZ_QUERY => >>>>>>> ''Trypanosoma Brucei[ORGN]'); >>>>>>> >>>>>>> #change a paramter >>>>>>> +# commented this out... >>>>>>> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>>> 'Trypanosoma >>>>>>> Brucei[ORGN]'; >>>>>>> >>>>>>> MAJ >>>>>>> ----- Original Message ----- From: "Roopa Raghuveer" < >>>>>>> rtbio.2009 at gmail.com >>>>>>> > >>>>>>> To: >>>>>>> Sent: Friday, January 08, 2010 10:00 AM >>>>>>> Subject: [Bioperl-l] Regarding blast in Bioperl >>>>>>> >>>>>>> >>>>>>> Hello all, >>>>>>> >>>>>>>> >>>>>>>> I was trying Remote blast using Bioperl. 
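The first change in Mark's reply above, writing a name line before the sequence so that the temporary file is valid FASTA, can be sketched as a small helper that adds a header only when the submitted text lacks one. This is an illustration under assumptions: `ensure_fasta` and the default identifier `query` are hypothetical names, and the input is assumed to be a single pasted sequence.

```perl
use strict;
use warnings;

# Hypothetical helper: return $text as FASTA, adding a '>' name line
# only if the input does not already start with one.
sub ensure_fasta {
    my ($text, $default_id) = @_;
    $default_id = 'query' unless defined $default_id && length $default_id;

    $text = '' unless defined $text;
    $text =~ s/\r\n?/\n/g;    # normalise line endings from web forms
    $text =~ s/^\s+//;        # drop leading blank lines and spaces
    $text =~ s/\s+$//;        # drop trailing whitespace

    die "no sequence supplied\n" unless length $text;

    # Prepend a name line only when the text has none.
    $text = ">$default_id\n$text" unless $text =~ /^>/;
    return "$text\n";
}

print ensure_fasta("ATGCATGC\nGGCC");
```

Writing the return value to the `$nuc` file, instead of the raw form field, would give `Bio::SeqIO` a parsable record whether or not the user pasted a header.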
My input data is a >>>>>>>> Trypanosoma >>>>>>>> brucei sequence in Fasta format. When I was trying to submit to >>>>>>>> BLAST >>>>>>>> using >>>>>>>> the step >>>>>>>> $r=$factory->submit_blast($input) >>>>>>>> It was not returning anything which I checked by debugging the code. >>>>>>>> It is >>>>>>>> not blasting my input sequence even though I mentioned all the >>>>>>>> parameters.I >>>>>>>> would paste the code below. >>>>>>>> >>>>>>>> Please help me in solving put this problem. It is very urgent. >>>>>>>> >>>>>>>> Regards >>>>>>>> Roopa. >>>>>>>> >>>>>>>> #!/usr/bin/perl >>>>>>>> >>>>>>>> #path for extra camel module >>>>>>>> use lib "/srv/www/htdocs/rain/RNAi/"; >>>>>>>> use Roopablast; >>>>>>>> >>>>>>>> >>>>>>>> use Bio::SearchIO; >>>>>>>> use Bio::Search::Result::BlastResult; >>>>>>>> use Bio::Perl; >>>>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>>>> use Bio::Seq; >>>>>>>> use Bio::SeqIO; >>>>>>>> use Bio::DB::GenBank; >>>>>>>> >>>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi"; >>>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi"; >>>>>>>> $outfile = $serverpath."/rnairesult_".time().".html"; >>>>>>>> $nuc = $serverpath."/nuc".time().".txt"; >>>>>>>> $debugfile = $serverpath."/debug_".time().".txt"; >>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>>>> >>>>>>>> my $outstring =""; >>>>>>>> >>>>>>>> &parse_form; >>>>>>>> >>>>>>>> print "Content-type: text/html\n\n"; >>>>>>>> print "\n"; >>>>>>>> print "RNAi Result"; >>>>>>>> print ">>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n"; >>>>>>>> print "\n"; >>>>>>>> print "\n"; >>>>>>>> print " Your results will appear >>>>>>> href=$serverurl/rnairesult_".time().".html>here
"; >>>>>>>> print " Please be patient, runtime can be up to 5 minutes
"; >>>>>>>> print " This page will automatically reload in 30 seconds. Roopa"; >>>>>>>> print "\n"; >>>>>>>> print "\n"; >>>>>>>> >>>>>>>> defined(my $pid = fork) or die "Can't fork: $!"; >>>>>>>> exit if $pid; >>>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >>>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >>>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> open(OUTFILE, '>',$outfile); >>>>>>>> >>>>>>>> print OUTFILE "\n >>>>>>>> RNAi Result >>>>>>>> >>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n >>>>>>>> >>>>>>>> \n >>>>>>>> \n >>>>>>>> Your results will appear >>>>>>> href=$serverurl/rnairesult_".time().".html>here
>>>>>>>> Please be patient, runtime can be up to 5 minutes wait wait >>>>>>>> wait......
>>>>>>>> This page will automatically reload in 30 seconds Roopa
>>>>>>>> \n >>>>>>>> \n"; >>>>>>>> >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> @compseqs = blastcode($in{'Inputseq'}); >>>>>>>> >>>>>>>> $in{'Inputseq'} =~ s/>.*$//m; >>>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim; >>>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/; >>>>>>>> >>>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >>>>>>>> $in{'Threshold'}); >>>>>>>> >>>>>>>> >>>>>>>> sub blastcode >>>>>>>> { >>>>>>>> >>>>>>>> $inpu1= $_[0]; >>>>>>>> >>>>>>>> #$organ= $_[1]; >>>>>>>> >>>>>>>> open(NUC,'>',$nuc); >>>>>>>> print NUC $inpu1; >>>>>>>> close(NUC); >>>>>>>> >>>>>>>> my $prog = 'blastn'; >>>>>>>> my $db = 'refseq_rna'; >>>>>>>> my $e_val= '1e-10'; >>>>>>>> my $organism= 'Trypanosoma Brucei'; >>>>>>>> >>>>>>>> $gb = new Bio::DB::GenBank; >>>>>>>> >>>>>>>> my @params = ( '-prog' => $prog, >>>>>>>> '-data' => $db, >>>>>>>> '-expect' => $e_val, >>>>>>>> '-readmethod' => 'SearchIO', >>>>>>>> '-Organism' => $organism ); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$debugfile); >>>>>>>> # print OUTFILE @params; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>>> >>>>>>>> #change a paramter >>>>>>>> >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>>>>>>> Brucei[ORGN]'; >>>>>>>> >>>>>>>> #change a paramter >>>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>>>> '$input2[ORGN]'; >>>>>>>> >>>>>>>> my $v = 1; >>>>>>>> #$v is just to turn on and off the messages >>>>>>>> >>>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >>>>>>>> '-organism' => 'Trypanosoma Brucei' ); >>>>>>>> >>>>>>>> >>>>>>>> while (my $input = $str->next_seq()) >>>>>>>> { >>>>>>>> #Blast a sequence against a database: >>>>>>>> #Alternatively, you could pass in a file with many >>>>>>>> #sequences rather than loop through sequence one at a time >>>>>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>>>>>> #and swap the two lines below 
for an example of that. >>>>>>>> >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE $input; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> my $r = $factory->submit_blast($input); #The program stops here >>>>>>>> it >>>>>>>> does not return any value and it does not enter the While >>>>>>>> loop,Please help >>>>>>>> me in this regard.# >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE $r; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> print STDERR "waiting...." if($v>0); >>>>>>>> >>>>>>>> while ( my @rids = $factory->each_rid ) { >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "while entered"; >>>>>>>> close(OUTFILE); >>>>>>>> foreach my $rid ( @rids ) { >>>>>>>> >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "foreach entered"; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>>>> >>>>>>>> if( !ref($rc) ) >>>>>>>> { >>>>>>>> if( $rc < 0 ) >>>>>>>> { >>>>>>>> $factory->remove_rid($rid); >>>>>>>> } >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "if entered"; >>>>>>>> close(OUTFILE); >>>>>>>> print STDERR "." 
if ( $v > 0 ); >>>>>>>> sleep 5; >>>>>>>> } >>>>>>>> else { >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "else entered"; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> my $result = $rc->next_result(); >>>>>>>> #save the output >>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>>>> >>>>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>>>> print BLASTDEBUGFILE $result->next_hit(); >>>>>>>> close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> my $filename = >>>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out"; >>>>>>>> >>>>>>>> # open(DEBUGFILE,'>',$debugfile); >>>>>>>> # open(new,'>',$filename); >>>>>>>> # @arra=; >>>>>>>> # print DEBUGFILE @arra; >>>>>>>> # close(DEBUGFILE); >>>>>>>> # close(new); >>>>>>>> >>>>>>>> $factory->save_output($filename); >>>>>>>> >>>>>>>> # open(BLASTDEBUGFILE,'>',$debugfile); >>>>>>>> # print BLASTDEBUGFILE "Hello $rid"; >>>>>>>> # close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> $factory->remove_rid($rid); >>>>>>>> >>>>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>>>> print BLASTDEBUGFILE $organism; >>>>>>>> close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$outfile); >>>>>>>> # print OUTFILE "Test2 $result->database_name()"; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> #$hit = $result->next_hit; >>>>>>>> #open(new,'>',$debugfile); >>>>>>>> #print $hit; >>>>>>>> #close(new); >>>>>>>> >>>>>>>> while ( my $hit = $result->next_hit ) { >>>>>>>> >>>>>>>> next unless ( $v > 0); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$debugfile); >>>>>>>> # print OUTFILE "$hit in while hits"; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> my $sequ = $gb->get_Seq_by_version($hit->name); >>>>>>>> my $dna = $sequ->seq(); # get the sequence as a >>>>>>>> string >>>>>>>> push(@seqs,$dna); >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> #open(OUTFILE,'>',$debugfile); >>>>>>>> #print OUTFILE $seqs[0]; >>>>>>>> #close(OUTFILE); >>>>>>>> >>>>>>>> return(@seqs); >>>>>>>> >>>>>>>> } >>>>>>>> 
>>>>>>>> open(OUTFILE, '>',$outfile) || die ; >>>>>>>> >>>>>>>> print OUTFILE "\n >>>>>>>> RNAi Result >>>>>>>> \n >>>>>>>> \n >>>>>>>>

>>>>>>>> Inputsequence:
"; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> print OUTFILE substr ($in{'Inputseq'}, $i, 1); >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> print OUTFILE " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> print OUTFILE "
\n"; >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> print OUTFILE "

"; >>>>>>>> >>>>>>>> $z=@compseqs; >>>>>>>> >>>>>>>> for($k=1;$k<$z;$k++) { >>>>>>>> print OUTFILE ">>>>>>> set\">

Compare >>>>>>>> Sequence:
"; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> print OUTFILE substr ($compseqs[$k], $i, 1); >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> print OUTFILE " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> print OUTFILE "
\n"; >>>>>>>> } >>>>>>>> } >>>>>>>> print OUTFILE "

"; >>>>>>>> } >>>>>>>> >>>>>>>> print OUTFILE "

>>>>>>>> Window:
$in{'Windowsize'} >>>>>>>>

>>>>>>>>

>>>>>>>> Threshold:
$in{'Threshold'} >>>>>>>>

"; >>>>>>>> my $j=0; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){ >>>>>>>> if ($out[$i]->{similar}<=$in{'Threshold'}){ >>>>>>>> $j=$in{'Windowsize'}; >>>>>>>> } >>>>>>>> $height=$out[$i]->{similar}*5; >>>>>>>> } >>>>>>>> >>>>>>>> if ($j>0) { >>>>>>>> print OUTFILE ">>>>>>> height=\"5\">"; >>>>>>>> $outstring .= "".substr ($in{'Inputseq'}, >>>>>>>> $i, >>>>>>>> 1).""; >>>>>>>> $j--; >>>>>>>> } >>>>>>>> else { >>>>>>>> print OUTFILE ">>>>>>> height=\"5\">"; >>>>>>>> $outstring .= "".substr ($in{'Inputseq'}, >>>>>>>> $i, >>>>>>>> 1).""; >>>>>>>> } >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> $outstring .= " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> $outstring .= "
\n"; >>>>>>>> >>>>>>>> } >>>>>>>> if ( ($i+1)%800==0){ >>>>>>>> print OUTFILE "

\n"; >>>>>>>> >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> print OUTFILE "

>>>>>>> set\">$outstring"; >>>>>>>> >>>>>>>> #foreach (@out) { >>>>>>>> #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} >>>>>>>> matchs

"; >>>>>>>> #if ($_->{similar}<=$in{'Threshold'}){ >>>>>>>> >>>>>>>> # } >>>>>>>> #} >>>>>>>> >>>>>>>> print OUTFILE "\n\n"; >>>>>>>> >>>>>>>> close OUTFILE; >>>>>>>> >>>>>>>> #nameprint(); >>>>>>>> >>>>>>>> sub parse_form { >>>>>>>> local ($buffer, @pairs, $pair, $name, $value); >>>>>>>> # Read in text >>>>>>>> $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/; >>>>>>>> if ($ENV{'REQUEST_METHOD'} eq "POST") >>>>>>>> { >>>>>>>> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); >>>>>>>> } >>>>>>>> else >>>>>>>> { >>>>>>>> $buffer = $ENV{'QUERY_STRING'}; >>>>>>>> } >>>>>>>> @pairs = split(/&/, $buffer); >>>>>>>> foreach $pair (@pairs) >>>>>>>> { >>>>>>>> ($name, $value) = split(/=/, $pair); >>>>>>>> $value =~ tr/+/ /; >>>>>>>> $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; >>>>>>>> $in{$name} = $value; >>>>>>>> } >>>>>>>> } >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>> >>> >> > From maj at fortinbras.us Fri Jan 22 07:34:59 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 22 Jan 2010 07:34:59 -0500 Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO In-Reply-To: References: Message-ID: I'm down with that. ----- Original Message ----- From: "Jason Stajich" To: "BioPerl List" Sent: Friday, January 22, 2010 1:17 AM Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO > I'm considering putting in allowable initialization parameter (and get/ > set) for Bio::AlignIO that would allow setting of the alphabet. This > is then passed to Bio::LocatableSeq creation so that _guess_alphabet > isn't called. 
This will allow removal of warnings about empty
> sequences because _guess_alphabet won't be called on a sequence if we
> have explicitly set the alphabet.
>
> This worked great on my local install and tests pass. Any objections
> or concerns?
>
> basically it means when you make an AlignIO you can specify the
> alphabet i.e.
>
> my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna', -file => 'genome.fasaln');
>
> I have some alignments with empty sequences and I think turning off
> the warnings is appropriate where I force the alphabet choice. It
> should also have a very modest speedup benefit too.
>
> -jason
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> http://twitter.com/hyphaltip
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From avilella at gmail.com Fri Jan 22 08:07:26 2010
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 22 Jan 2010 13:07:26 +0000
Subject: [Bioperl-l] Merging fragments in a simplealign
Message-ID: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>

Hi,

I would like to write a script that merges fragments in a Bio::SimpleAlign
object on the basis of some $seq->display_name rule.

I basically want to start with something like this:

seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM
seq2.234 QWERTYU-------------------
seq2.345 ----------ASDFGH----------
seq2.456 -------------------ZXCVBNM

And end with something like this:

seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM
seq2.mrg QWERTYU---ASDFGH---ZXCVBNM

Can people suggest any Bio::SimpleAlign methods that would help here?

Cheers,

Albert.

From maj at fortinbras.us Fri Jan 22 08:31:54 2010
From: maj at fortinbras.us (Mark A.
Jensen)
Date: Fri, 22 Jan 2010 08:31:54 -0500
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
Message-ID:

Here's one of my favorite tricks for this: XOR mask on gap symbol.
MAJ

use Bio::SeqIO;
use Bio::Seq;
use strict;
my $seqio = Bio::SeqIO->new( -fh => \*DATA );

my $acc = $seqio->next_seq->seq ^ '-';
while ($_ = $seqio->next_seq ) {
    $acc ^= ($_->seq ^ '-');
}
my $mrg = Bio::Seq->new( -id => 'merged',
                         -seq => $acc ^ '-' );
1;


__END__
>seq2.234
QWERTYU-------------------
>seq2.345
----------ASDFGH----------
>seq2.456
-------------------ZXCVBNM

----- Original Message -----
From: "Albert Vilella"
To:
Sent: Friday, January 22, 2010 8:07 AM
Subject: [Bioperl-l] Merging fragments in a simplealign

> Hi,
>
> I would like to write a script that merges fragments in a Bio::SimpleAlign
> object on the basis of
> some $seq->display_name rule.
>
> I basically want to start with something like this:
>
> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM
> seq2.234 QWERTYU-------------------
> seq2.345 ----------ASDFGH----------
> seq2.456 -------------------ZXCVBNM
>
> And end with something like this:
>
> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM
> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM
>
> Can people suggest any Bio::SimpleAlign methods that would help here?
>
> Cheers,
>
> Albert.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From cjfields at illinois.edu Fri Jan 22 08:34:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 22 Jan 2010 07:34:07 -0600
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
In-Reply-To:
References:
Message-ID: <55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu>

Sounds good to me. The warnings are a bit too tight on this module anyway.
I still think we have plans towards refactoring some of this, not sure how
far along they are:

http://www.bioperl.org/wiki/Align_Refactor

chris

On Jan 22, 2010, at 12:17 AM, Jason Stajich wrote:

> I'm considering putting in allowable initialization parameter (and get/set) for Bio::AlignIO that would allow setting of the alphabet. This is then passed to Bio::LocatableSeq creation so that _guess_alphabet isn't called. This will allow removal of warnings about empty sequences because _guess_alphabet won't be called on a sequence if we have explicitly set the alphabet.
>
> This worked great on my local install and tests pass. Any objections or concerns?
>
> basically it means when you make an AlignIO you can specify the alphabet i.e.
>
> my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna', -file => 'genome.fasaln');
>
> I have some alignments with empty sequences and I think turning off the warnings is appropriate where I force the alphabet choice. It should also have a very modest speedup benefit too.
>
> -jason
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> http://twitter.com/hyphaltip
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Fri Jan 22 08:40:57 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 22 Jan 2010 07:40:57 -0600
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To:
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
Message-ID: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>

Maybe something for the cook/scrapbook?

chris

On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote:

> Here's one of my favorite tricks for this: XOR mask on gap symbol.
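The XOR trick relies on Perl's string bitwise operators: a residue XOR-ed with '-' twice comes back unchanged, and '-' XOR '-' is a NUL byte. One caveat is that Perl pads the shorter operand with NUL bytes, so a one-character '-' mask only touches the first column, and the quoted script then gives the expected result only for an odd number of fragments. The following is a minimal sketch on plain strings with a full-length mask; `merge_fragments` is a hypothetical name, and the rows are assumed to be equal-length with non-overlapping residues.

```perl
use strict;
use warnings;

# Note: this relies on the classic string behaviour of ^, & and |, so it
# assumes the 'bitwise' feature (part of "use v5.28" and later bundles)
# is not enabled in this scope.

# Hypothetical helper: merge equal-length gapped rows whose residues do
# not overlap, e.g. fragments of one gene aligned to a common reference.
sub merge_fragments {
    my @rows = @_;
    die "need at least one row\n" unless @rows;

    my $len      = length $rows[0];
    my $mask     = '-' x $len;     # full-length gap mask
    my $acc      = "\0" x $len;    # accumulates residue ^ '-' per column
    my $occupied = "\0" x $len;    # \xFF where a residue was already seen

    for my $row (@rows) {
        die "rows differ in length\n" unless length($row) == $len;

        # Gaps become NUL bytes, residues become (residue ^ '-').
        my $bits = $row ^ $mask;

        # Mark this row's residue columns and refuse overlapping rows,
        # because XOR would silently scramble overlapping residues.
        (my $used = $bits) =~ tr/\0/\xFF/c;
        die "fragments overlap\n" if ($occupied & $used) =~ /[^\0]/;

        $occupied |= $used;
        $acc      ^= $bits;
    }

    # Undo the mask: NUL turns back into '-', the rest into residues.
    return $acc ^ $mask;
}

print merge_fragments(
    'QWERTYU-------------------',
    '----------ASDFGH----------',
    '-------------------ZXCVBNM',
), "\n";    # prints: QWERTYU---ASDFGH---ZXCVBNM
```

In a Bio::SimpleAlign setting the rows would be the gapped sequence strings of the fragments selected by whatever name rule is in use; that selection step is left out here.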
> MAJ
>
> use Bio::SeqIO;
> use Bio::Seq;
> use strict;
> my $seqio = Bio::SeqIO->new( -fh => \*DATA );
>
> my $acc = $seqio->next_seq->seq ^ '-';
> while ($_ = $seqio->next_seq ) {
>     $acc ^= ($_->seq ^ '-');
> }
> my $mrg = Bio::Seq->new( -id => 'merged',
>                          -seq => $acc ^ '-' );
> 1;
>
>
> __END__
> >seq2.234
> QWERTYU-------------------
> >seq2.345
> ----------ASDFGH----------
> >seq2.456
> -------------------ZXCVBNM
>
> ----- Original Message ----- From: "Albert Vilella"
> To:
> Sent: Friday, January 22, 2010 8:07 AM
> Subject: [Bioperl-l] Merging fragments in a simplealign
>
>
>> Hi,
>> I would like to write a script that merges fragments in a Bio::SimpleAlign
>> object on the basis of
>> some $seq->display_name rule.
>> I basically want to start with something like this:
>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM
>> seq2.234 QWERTYU-------------------
>> seq2.345 ----------ASDFGH----------
>> seq2.456 -------------------ZXCVBNM
>> And end with something like this:
>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM
>> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM
>> Can people suggest any Bio::SimpleAlign methods that would help here?
>> Cheers,
>> Albert.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From holland at eaglegenomics.com Fri Jan 22 05:51:52 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 22 Jan 2010 10:51:52 +0000
Subject: [Bioperl-l] [BioSQL-l] SwissProt DE lines and UniProt XML / TagTree as XML in BioSQL
In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
Message-ID: <8FECCBDE-2DE1-40EE-B5A4-73BDAC893E2D@eaglegenomics.com>

Nice idea.
Currently, BioJava just stores the complete section as a string without parsing it, but it provides a parser module for converting it into useful tag/value format within a user's program (but not to be stored in BioSQL). On 21 Jan 2010, at 12:33, Peter wrote: > Hi all, > > This is cross posted to try and ensure relevant people see it. > I suggest we continue the discussion on the BioSQL list > (for how to serialise structured annotation to BioSQL), and/or > the OpenBio list (for things like file format naming conventions). > > I am hoping we (Bio*) can be consistent in how we parse and load > into BioSQL the SwissProt DE lines (known as "swiss" format in > both BioPerl and Biopython's SeqIO, and by EMBOSS) or the > equivalent UniProt XML tags (which we are tentatively going to > call the "uniprot" format in Biopython's SeqIO - comments?). > > Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss") > files and load them into BioSQL. Biopython currently treats the DE > comment lines as a long string, as BioPerl used to: > > http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html > http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html > > I understand that BioPerl now turns the SwissProt DE lines into a > TagTree, and for storing this in BioSQL this gets serialised as XML. > I would like Biopython to handle this the same way (although rather > than a Perl TagTree, we'd use a Python structure of course), and > would appreciate clarification of what exactly was implemented > (e.g. which bit of the BioPerl source code should we look at, > and could you show a worked example?). > > Andrea Pierleoni (CC'd - not sure if he is on the BioSQL or > Open-Bio lists yet) has started work on parsing UniProt XML > files for Biopython. Here the DE comment lines are already > provided broken up with XML markup. Hopefully their nested > structure matches what BioPerl was doing with the SwissProt > DE lines.
> > Regards, > > Peter > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From andrea at biocomp.unibo.it Fri Jan 22 07:18:32 2010 From: andrea at biocomp.unibo.it (Andrea Pierleoni) Date: Fri, 22 Jan 2010 13:18:32 +0100 (CET) Subject: [Bioperl-l] SwissProt DE lines and UniProt XML / TagTree as XML in BioSQL In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com> References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com> Message-ID: <2b6e30c4628585042366646a7b46386e.squirrel@lipid.biocomp.unibo.it> I think that the point here can be a little broader, since not only the swissprot DE lines carry complex and structured data. To define a common, language-independent way to store structured data into the comment and *_qualifier_value tables of the actual BioSQL schema could be very useful. XML looks like a good candidate to me, and the UniprotXML format can be used as reference or as a template to start from. Each Bio* project will then parse and report this structured data in its own programming language data structure. Andrea > Hi all, > > This is cross posted to try and ensure relevant people see it. > I suggest we continue the discussion on the BioSQL list > (for how to serialise structured annotation to BioSQL), and/or > the OpenBio list (for things like file format naming conventions). > > I am hoping we (Bio*) can be consistent in how we parse and load > into BioSQL the SwissProt DE lines (known as "swiss" format in > both BioPerl and Biopython's SeqIO, and by EMBOSS) or the > equivalent UniProt XML tags (which we are tentatively going to > call the "uniprot" format in Biopython's SeqIO - comments?). 
> > Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss") > files and load them into BioSQL. Biopython currently treats the DE > comment lines as a long string, as BioPerl used to: > > http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html > http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html > > I understand that BioPerl now turns the SwissProt DE lines into a > TagTree, and for storing this in BioSQL this gets serialised as XML. > I would like Biopython to handle this the same way (although rather > than a Perl TagTree, we'd use a Python structure of course), and > would appreciate clarification of what exactly was implemented > (e.g. which bit of the BioPerl source code should be look at, > and could you show a worked example?). > > Andrea Pierlenoin (CC'd - not sure if he is on the BioSQL or > Open-Bio lists yet) has started work on parsing UniProt XML > files for Biopython. Here the DE comment lines are already > provided broken up with XML markup. Hopefully their nested > structure matches what BioPerl was doing with the SwissProt > DE lines. > > Regards, > > Peter > From avilella at gmail.com Fri Jan 22 11:04:13 2010 From: avilella at gmail.com (Albert Vilella) Date: Fri, 22 Jan 2010 16:04:13 +0000 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com> <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> Message-ID: <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com> Is there/should be a 'have_pairwise_overlap' method similar to this? # $seq1 and $seq3 have matching ids my $seq1 = $aln->each_seq_by_id($seq1->display_id); my $seq3 = $aln->each_seq_by_id($seq3->display_id); my $ret = $aln->have_pairwise_overlap($seq1,$seq3); On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields wrote: > May be something for the cook/scrapbook? > > chris > > On Jan 22, 2010, at 7:31 AM, Mark A. 
Jensen wrote: > > > Here's one of my favorite tricks for this: XOR mask on gap symbol. > > MAJ > > > > use Bio::SeqIO; > > use Bio::Seq; > > use strict; > > my $seqio = Bio::SeqIO->new( -fh => \*DATA ); > > > > my $acc = $seqio->next_seq->seq ^ '-'; > > while ($_ = $seqio->next_seq ) { > > $acc ^= ($_->seq ^ '-'); > > } > > my $mrg = Bio::Seq->new( -id => 'merged', > > -seq => $acc ^ '-' ); > > 1; > > > > > > __END__ > >> seq2.234 > > QWERTYU------------------- > >> seq2.345 > > ----------ASDFGH---------- > >> seq2.456 > > -------------------ZXCVBNM > > > > ----- Original Message ----- From: "Albert Vilella" > > To: > > Sent: Friday, January 22, 2010 8:07 AM > > Subject: [Bioperl-l] Merging fragments in a simplealign > > > > > >> Hi, > >> I would like to write a script that merges fragments in a > Bio::SimpleAlign > >> object on the basis of > >> some $seq->display_name rule. > >> I basically want to start with something like this: > >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM > >> seq2.234 QWERTYU------------------- > >> seq2.345 ----------ASDFGH---------- > >> seq2.456 -------------------ZXCVBNM > >> And end with something like this: > >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM > >> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM > >> Can people suggest any Bio::SimpleAlign methods that would help here? > >> Cheers, > >> Albert. > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Fri Jan 22 11:02:55 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Fri, 22 Jan 2010 11:02:55 -0500 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com> <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> Message-ID: http://www.bioperl.org/wiki/Merge_gapped_sequences_across_a_common_region ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "Albert Vilella" ; Sent: Friday, January 22, 2010 8:40 AM Subject: Re: [Bioperl-l] Merging fragments in a simplealign > May be something for the cook/scrapbook? > > chris > > On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote: > >> Here's one of my favorite tricks for this: XOR mask on gap symbol. >> MAJ >> >> use Bio::SeqIO; >> use Bio::Seq; >> use strict; >> my $seqio = Bio::SeqIO->new( -fh => \*DATA ); >> >> my $acc = $seqio->next_seq->seq ^ '-'; >> while ($_ = $seqio->next_seq ) { >> $acc ^= ($_->seq ^ '-'); >> } >> my $mrg = Bio::Seq->new( -id => 'merged', >> -seq => $acc ^ '-' ); >> 1; >> >> >> __END__ >>> seq2.234 >> QWERTYU------------------- >>> seq2.345 >> ----------ASDFGH---------- >>> seq2.456 >> -------------------ZXCVBNM >> >> ----- Original Message ----- From: "Albert Vilella" >> To: >> Sent: Friday, January 22, 2010 8:07 AM >> Subject: [Bioperl-l] Merging fragments in a simplealign >> >> >>> Hi, >>> I would like to write a script that merges fragments in a Bio::SimpleAlign >>> object on the basis of >>> some $seq->display_name rule. >>> I basically want to start with something like this: >>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>> seq2.234 QWERTYU------------------- >>> seq2.345 ----------ASDFGH---------- >>> seq2.456 -------------------ZXCVBNM >>> And end with something like this: >>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM >>> Can people suggest any Bio::SimpleAlign methods that would help here? >>> Cheers, >>> Albert. 
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From avilella at gmail.com Fri Jan 22 12:50:57 2010 From: avilella at gmail.com (Albert Vilella) Date: Fri, 22 Jan 2010 17:50:57 +0000 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com> References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com> <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com> Message-ID: <358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com> Or to rephrase my answer, what is the closest way for the code below that already exists? On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella wrote: > Is there/should be a 'have_pairwise_overlap' method similar to this? > > # $seq1 and $seq3 have matching ids > my $seq1 = $aln->each_seq_by_id($seq1->display_id); > my $seq3 = $aln->each_seq_by_id($seq3->display_id); > > my $ret = $aln->have_pairwise_overlap($seq1,$seq3); > > > On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields wrote: > >> May be something for the cook/scrapbook? >> >> chris >> >> On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote: >> >> > Here's one of my favorite tricks for this: XOR mask on gap symbol. 
>> > MAJ >> > >> > use Bio::SeqIO; >> > use Bio::Seq; >> > use strict; >> > my $seqio = Bio::SeqIO->new( -fh => \*DATA ); >> > >> > my $acc = $seqio->next_seq->seq ^ '-'; >> > while ($_ = $seqio->next_seq ) { >> > $acc ^= ($_->seq ^ '-'); >> > } >> > my $mrg = Bio::Seq->new( -id => 'merged', >> > -seq => $acc ^ '-' ); >> > 1; >> > >> > >> > __END__ >> >> seq2.234 >> > QWERTYU------------------- >> >> seq2.345 >> > ----------ASDFGH---------- >> >> seq2.456 >> > -------------------ZXCVBNM >> > >> > ----- Original Message ----- From: "Albert Vilella" > > >> > To: >> > Sent: Friday, January 22, 2010 8:07 AM >> > Subject: [Bioperl-l] Merging fragments in a simplealign >> > >> > >> >> Hi, >> >> I would like to write a script that merges fragments in a >> Bio::SimpleAlign >> >> object on the basis of >> >> some $seq->display_name rule. >> >> I basically want to start with something like this: >> >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >> >> seq2.234 QWERTYU------------------- >> >> seq2.345 ----------ASDFGH---------- >> >> seq2.456 -------------------ZXCVBNM >> >> And end with something like this: >> >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >> >> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM >> >> Can people suggest any Bio::SimpleAlign methods that would help here? >> >> Cheers, >> >> Albert. >> >> _______________________________________________ >> >> Bioperl-l mailing list >> >> Bioperl-l at lists.open-bio.org >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > From jay at jays.net Fri Jan 22 13:30:57 2010 From: jay at jays.net (Jay Hannah) Date: Fri, 22 Jan 2010 12:30:57 -0600 Subject: [Bioperl-l] Bio::BroodComb - RFC In-Reply-To: References: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net> Message-ID: On Jan 21, 2010, at 10:31 PM, Chris Fields wrote: > Did you want to release it to CPAN? 
I'll take a closer look at the docs to get an idea of what you are doing with it, but from my perspective I can see this becoming a nice general use tool akin to Bio::Perl, maybe a bit more lightweight. Yes, I was thinking I would. No one has (yet) told me it's the worst idea ever, so I'm feeling encouraged. :) Given smallish inputs / databases (up to a few million rows) where some lightweight schema + SQLite + BioPerl can get the job done, it's nice to have a little easy-to-run toolbox. New tables and Roles bolt on easily, so I'll be adding them as they surface at $work[1]. Thanks for your interest. :) Jay Hannah http://github.com/jhannah/bio-broodcomb http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From dalalhina at gmail.com Fri Jan 22 12:31:09 2010 From: dalalhina at gmail.com (hina dalal) Date: Fri, 22 Jan 2010 17:31:09 +0000 Subject: [Bioperl-l] Bioperl installation failed Message-ID: <425f75df1001220931t49f5c768j97d91d2dd1757f19@mail.gmail.com> Hi I have installed PERL from Activesate and now trying to install bioperl but can not do it . Neither from PPM (it is showing error "Ppm install failed: 404 not found") nor from CPAN / manual installation. It is not allowing me to download nmake, showing that "the version of this file is not compatible with the version of windows you are running. Check your computer system information to see whether you need 32 bit or 64 bit of this program." I am using windows VISTA. Please help. Regards Hina From H.Dalal at sms.ed.ac.uk Fri Jan 22 12:34:55 2010 From: H.Dalal at sms.ed.ac.uk (Hina Dalal) Date: Fri, 22 Jan 2010 17:34:55 +0000 Subject: [Bioperl-l] BioPerl installation failed: please help Message-ID: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk> Hi I have installed PERL from Activesate and now trying to install bioperl but can not do it . Neither from PPM (it is showing error "Ppm install failed: 404 not found") nor from CPAN manual installation.
It is not allowing me to download nmake, showing that "the version of this file is not compatible with the version of windows you are running. Check your computer system information to see whether you need 32 bit or 64 bit of this program." Please help. Regards Hina -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From jason at bioperl.org Fri Jan 22 14:18:30 2010 From: jason at bioperl.org (Jason Stajich) Date: Fri, 22 Jan 2010 11:18:30 -0800 Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO In-Reply-To: <55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu> References: <55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu> Message-ID: <59EC9331-FB2F-4338-AD58-2D501A528A18@bioperl.org> Done, as of r16739. Look forward to the refactor work too. -jason On Jan 22, 2010, at 5:34 AM, Chris Fields wrote: > Sounds good to me. The warnings are a bit too tight on this module > anyway. > > I still think we have plans towards refactoring some of this, not > sure how far along they are: > > http://www.bioperl.org/wiki/Align_Refactor > > chris > > On Jan 22, 2010, at 12:17 AM, Jason Stajich wrote: > >> I'm considering putting in allowable initialization parameter (and >> get/set) for Bio::AlignIO that would allow setting of the >> alphabet. This is then passed to Bio::LocatableSeq creation so >> that _guess_alphabet isn't called. This will allow removal of >> warnings about empty sequences because _guess_alphabet won't be >> called on a sequence if we have explicitly set the alphabet. >> >> This worked great on my local install and tests pass. Any >> objections or concerns? >> >> basically it means when you make an AlignIO you can specify the >> alphabet i.e. >> >> my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna', >> -file => 'genome.fasaln'); >> >> I have some alignments with empty sequences and I think turning off >> the warnings is appropriate where I force the alphabet choice.
It >> should also have a very modest speedup benefit too. >> >> -jason >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> http://fungalgenomes.org/ >> http://twitter.com/hyphaltip >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ http://twitter.com/hyphaltip From cjfields at illinois.edu Fri Jan 22 14:22:43 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 22 Jan 2010 13:22:43 -0600 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: <358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com> References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com> <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com> <358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com> Message-ID: <14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu> This could exist, but should go into a general Utilities module. Part of the Align refactoring was to pull a good number of the methods into a general utilities module, so this would fit into that category. chris On Jan 22, 2010, at 11:50 AM, Albert Vilella wrote: > Or to rephrase my answer, what is the closest way for the code below that > already exists? > > On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella wrote: > >> Is there/should be a 'have_pairwise_overlap' method similar to this? >> >> # $seq1 and $seq3 have matching ids >> my $seq1 = $aln->each_seq_by_id($seq1->display_id); >> my $seq3 = $aln->each_seq_by_id($seq3->display_id); >> >> my $ret = $aln->have_pairwise_overlap($seq1,$seq3); >> >> >> On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields wrote: >> >>> May be something for the cook/scrapbook? >>> >>> chris >>> >>> On Jan 22, 2010, at 7:31 AM, Mark A. 
Jensen wrote: >>> >>>> Here's one of my favorite tricks for this: XOR mask on gap symbol. >>>> MAJ >>>> >>>> use Bio::SeqIO; >>>> use Bio::Seq; >>>> use strict; >>>> my $seqio = Bio::SeqIO->new( -fh => \*DATA ); >>>> >>>> my $acc = $seqio->next_seq->seq ^ '-'; >>>> while ($_ = $seqio->next_seq ) { >>>> $acc ^= ($_->seq ^ '-'); >>>> } >>>> my $mrg = Bio::Seq->new( -id => 'merged', >>>> -seq => $acc ^ '-' ); >>>> 1; >>>> >>>> >>>> __END__ >>>>> seq2.234 >>>> QWERTYU------------------- >>>>> seq2.345 >>>> ----------ASDFGH---------- >>>>> seq2.456 >>>> -------------------ZXCVBNM >>>> >>>> ----- Original Message ----- From: "Albert Vilella" >>> >>>> To: >>>> Sent: Friday, January 22, 2010 8:07 AM >>>> Subject: [Bioperl-l] Merging fragments in a simplealign >>>> >>>> >>>>> Hi, >>>>> I would like to write a script that merges fragments in a >>> Bio::SimpleAlign >>>>> object on the basis of >>>>> some $seq->display_name rule. >>>>> I basically want to start with something like this: >>>>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>>>> seq2.234 QWERTYU------------------- >>>>> seq2.345 ----------ASDFGH---------- >>>>> seq2.456 -------------------ZXCVBNM >>>>> And end with something like this: >>>>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>>>> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM >>>>> Can people suggest any Bio::SimpleAlign methods that would help here? >>>>> Cheers, >>>>> Albert. 
>>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Fri Jan 22 14:29:07 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 22 Jan 2010 14:29:07 -0500 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: <14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu> References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com><058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu><358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com><358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com> <14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu> Message-ID: <0F7B7E5FE70D4C5CB34B27045561823C@NewLife> I'd recommend making an enhancement request via Bugzilla, so we don't forget- MAJ ----- Original Message ----- From: "Chris Fields" To: "Albert Vilella" Cc: "bioperl-l" Sent: Friday, January 22, 2010 2:22 PM Subject: Re: [Bioperl-l] Merging fragments in a simplealign > This could exist, but should go into a general Utilities module. Part of the > Align refactoring was to pull a good number of the methods into a general > utilities module, so this would fit into that category. > > chris > > On Jan 22, 2010, at 11:50 AM, Albert Vilella wrote: > >> Or to rephrase my answer, what is the closest way for the code below that >> already exists? >> >> On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella wrote: >> >>> Is there/should be a 'have_pairwise_overlap' method similar to this? 
>>> >>> # $seq1 and $seq3 have matching ids >>> my $seq1 = $aln->each_seq_by_id($seq1->display_id); >>> my $seq3 = $aln->each_seq_by_id($seq3->display_id); >>> >>> my $ret = $aln->have_pairwise_overlap($seq1,$seq3); >>> >>> >>> On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields wrote: >>> >>>> May be something for the cook/scrapbook? >>>> >>>> chris >>>> >>>> On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote: >>>> >>>>> Here's one of my favorite tricks for this: XOR mask on gap symbol. >>>>> MAJ >>>>> >>>>> use Bio::SeqIO; >>>>> use Bio::Seq; >>>>> use strict; >>>>> my $seqio = Bio::SeqIO->new( -fh => \*DATA ); >>>>> >>>>> my $acc = $seqio->next_seq->seq ^ '-'; >>>>> while ($_ = $seqio->next_seq ) { >>>>> $acc ^= ($_->seq ^ '-'); >>>>> } >>>>> my $mrg = Bio::Seq->new( -id => 'merged', >>>>> -seq => $acc ^ '-' ); >>>>> 1; >>>>> >>>>> >>>>> __END__ >>>>>> seq2.234 >>>>> QWERTYU------------------- >>>>>> seq2.345 >>>>> ----------ASDFGH---------- >>>>>> seq2.456 >>>>> -------------------ZXCVBNM >>>>> >>>>> ----- Original Message ----- From: "Albert Vilella" >>>> >>>>> To: >>>>> Sent: Friday, January 22, 2010 8:07 AM >>>>> Subject: [Bioperl-l] Merging fragments in a simplealign >>>>> >>>>> >>>>>> Hi, >>>>>> I would like to write a script that merges fragments in a >>>> Bio::SimpleAlign >>>>>> object on the basis of >>>>>> some $seq->display_name rule. >>>>>> I basically want to start with something like this: >>>>>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>>>>> seq2.234 QWERTYU------------------- >>>>>> seq2.345 ----------ASDFGH---------- >>>>>> seq2.456 -------------------ZXCVBNM >>>>>> And end with something like this: >>>>>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>>>>> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM >>>>>> Can people suggest any Bio::SimpleAlign methods that would help here? >>>>>> Cheers, >>>>>> Albert. 
>>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Fri Jan 22 14:33:41 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 22 Jan 2010 14:33:41 -0500 Subject: [Bioperl-l] BioPerl installation failed: please help In-Reply-To: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk> References: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk> Message-ID: <2ABEC492CD49450EAE4BFC7BA763E3DB@NewLife> Hina-- See the protocol at http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation for ActiveState installation. If it doesn't work, please let us know at which step the failure happened. cheers, MAJ ----- Original Message ----- From: "Hina Dalal" To: Sent: Friday, January 22, 2010 12:34 PM Subject: [Bioperl-l] BioPerl installation failed: please help Hi I have installed PERL from Activesate and now trying to install bioperl but can not do it . Neither from PPM (it is showing error "Ppm install failed: 404 not found") nor from CPAN manual installation. It is not allowing me to download nmake, showing that "the version of this file is not compatible with the version of windows you are running. Check your computer system information to see whether you need 32 bit or 64 bit of this program." Please help. 
Regards Hina -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Fri Jan 22 15:13:15 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 22 Jan 2010 15:13:15 -0500 Subject: [Bioperl-l] BioPerl installation failed: please help In-Reply-To: <20100122200118.053j5kc36skow0wg@www.sms.ed.ac.uk> References: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk><2ABEC492CD49450EAE4BFC7BA763E3DB@NewLife> <20100122200118.053j5kc36skow0wg@www.sms.ed.ac.uk> Message-ID: <9E5DE384E2C8416B8373E390ABDB7DFE@NewLife> Ok Hina, I'm not seeing any issues with the presence or availability of http://bioperl.org/DIST from my machine. Can you access that url in a browser? If not, the king of the King's Buildings may not be allowing access. Also, can you do the following: C:> ppm-shell ppm> repo list Note the number of the repo that corresponds to bioperl (if any) and do ppm> repo describe n where 'n' is that number, and send the output along. cheers, MAJ ----- Original Message ----- From: "Hina Dalal" To: "Mark A. Jensen" Sent: Friday, January 22, 2010 3:01 PM Subject: Re: [Bioperl-l] BioPerl installation failed: please help Hi Mark warm regards I was following that protocol only , but the problem is when I tried to do it from PPM, and when I reach at the stem install BioPerl, it is showing error "Ppm install failed: 404 not found" in the end. and when I tried it by CPAN /manual installation, I couldn't download nmake,its showing that "the version of this file is not compatible with the version of windows you are running. Check your computer system information to see whether you need 32 bit or 64 bit of this program and than contact the software publisher." What should I do? Please help. Regards Hina Quoting "Mark A. 
Jensen" : > Hina-- See the protocol at > http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation > for ActiveState installation. If it doesn't work, please let us know at > which step the failure happened. > cheers, MAJ > ----- Original Message ----- From: "Hina Dalal" > To: > Sent: Friday, January 22, 2010 12:34 PM > Subject: [Bioperl-l] BioPerl installation failed: please help > > > Hi > > I have installed PERL from Activesate and now trying to install > bioperl but can not do it . Neither from PPM (it is showing error "Ppm > install failed: 404 not found") nor from CPAN manual installation. It > is not allowing me to download nmake, showing that "the version of > this file is not compatible with the version of windows you are > running. Check your computer system information to see whether you > need 32 bit or 64 bit of this program." > > Please help. > > Regards > > Hina > > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From pengyu.ut at gmail.com Sun Jan 24 20:29:59 2010 From: pengyu.ut at gmail.com (Peng Yu) Date: Sun, 24 Jan 2010 19:29:59 -0600 Subject: [Bioperl-l] Transcribe in bioperl Message-ID: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> I found the function 'translate' in bioperl. But I don't find 'transcribe'. Is there such a function? 
From jason at bioperl.org Sun Jan 24 21:06:48 2010 From: jason at bioperl.org (Jason Stajich) Date: Sun, 24 Jan 2010 18:06:48 -0800 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> Message-ID: What exactly do you want to do? spliced_seq for a feature would be the closest thing... -jason On Jan 24, 2010, at 5:29 PM, Peng Yu wrote: > I found the function 'translate' in bioperl. But I don't find > 'transcribe'. Is there such a function? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ http://twitter.com/hyphaltip From pengyu.ut at gmail.com Sun Jan 24 21:22:12 2010 From: pengyu.ut at gmail.com (Peng Yu) Date: Sun, 24 Jan 2010 20:22:12 -0600 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> Message-ID: <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> To convert from T to U. I could use perl's builtin function. But it is semantically far away from 'transcribe'. If there is a function with name 'transcribe', it will be better. On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich wrote: > What exactly do you want to do? > spliced_seq for a feature would be the closest thing... > > -jason > On Jan 24, 2010, at 5:29 PM, Peng Yu wrote: > >> I found the function 'translate' in bioperl. But I don't find >> 'transcribe'. Is there such a function? 
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > http://twitter.com/hyphaltip > > From maj at fortinbras.us Sun Jan 24 21:48:33 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 24 Jan 2010 21:48:33 -0500 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> Message-ID: Not a bad idea, a semantics-preserving/checking thing. transcribe() could return an object with alphabet == 'rna' and the T's flipped, or bork if called against an object with alphbet != 'dna'. I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to be stashed), if desired. ----- Original Message ----- From: "Peng Yu" To: "Jason Stajich" Cc: Sent: Sunday, January 24, 2010 9:22 PM Subject: Re: [Bioperl-l] Transcribe in bioperl > To convert from T to U. I could use perl's builtin function. But it is > semantically far away from 'transcribe'. If there is a function with > name 'transcribe', it will be better. > > On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich wrote: >> What exactly do you want to do? >> spliced_seq for a feature would be the closest thing... >> >> -jason >> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote: >> >>> I found the function 'translate' in bioperl. But I don't find >>> 'transcribe'. Is there such a function? 
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> http://fungalgenomes.org/ >> http://twitter.com/hyphaltip >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Sun Jan 24 23:39:43 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 24 Jan 2010 22:39:43 -0600 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> Message-ID: I think the main reason there hasn't been a transcribe() is that very few users ask for it. Most just use a quick '$seq =~ tr/T/U/', or use spliced_seq() and/or translate() (i.e. they don't care about the intermediate mRNA). I don't have a problem with adding a transcribe method to PrimarySeq, but (and Mark has already picked up on this) it should be constrained to DNA only and return RNA. And there might be a case for adding the analogous reverse_translate(). Also worth adding this to the proper interface class (PrimarySeqI, I think) so all Seq/PrimarySeq will have it (or have to implement their own). chris On Jan 24, 2010, at 8:48 PM, Mark A. Jensen wrote: > Not a bad idea, a semantics-preserving/checking thing. transcribe() could return an object with alphabet == 'rna' > and the T's flipped, or bork if called against an object with alphbet != 'dna'. > I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to be stashed), if desired. > > ----- Original Message ----- From: "Peng Yu" > To: "Jason Stajich" > Cc: > Sent: Sunday, January 24, 2010 9:22 PM > Subject: Re: [Bioperl-l] Transcribe in bioperl > > >> To convert from T to U. 
I could use perl's builtin function. But it is >> semantically far away from 'transcribe'. If there is a function with >> name 'transcribe', it will be better. >> On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich wrote: >>> What exactly do you want to do? >>> spliced_seq for a feature would be the closest thing... >>> >>> -jason >>> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote: >>> >>>> I found the function 'translate' in bioperl. But I don't find >>>> 'transcribe'. Is there such a function? >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> Jason Stajich >>> jason.stajich at gmail.com >>> jason at bioperl.org >>> http://fungalgenomes.org/ >>> http://twitter.com/hyphaltip >>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sun Jan 24 23:43:07 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 24 Jan 2010 22:43:07 -0600 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> Message-ID: <489E0B85-0BC3-45DB-8660-494CF69F35FF@illinois.edu> On Jan 24, 2010, at 10:39 PM, Chris Fields wrote: > ...And there might be a case for adding the analogous reverse_translate(). Bah. Meant reverse_transcribe(). Ah well. 
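The behaviour discussed in this thread (transcribe() accepts DNA only and returns RNA; reverse_transcribe() does the opposite) can be sketched in plain Perl. This is only an illustration under stated assumptions: the function names transcribe_dna and reverse_transcribe_rna and the simple "contains U / contains T" checks are hypothetical, and this is not the code committed to Bio::PrimarySeqI.

```perl
use strict;
use warnings;

# Hypothetical helper (not the BioPerl method): DNA string in, RNA string out.
# Refuses input that contains U, since that cannot be DNA.
sub transcribe_dna {
    my ($dna) = @_;
    die "not a DNA sequence: contains U\n" if $dna =~ /u/i;
    (my $rna = $dna) =~ tr/Tt/Uu/;    # copy, then swap T for U, keeping case
    return $rna;
}

# Hypothetical inverse: RNA string in, DNA string out.
# Refuses input that contains T, since that cannot be RNA.
sub reverse_transcribe_rna {
    my ($rna) = @_;
    die "not an RNA sequence: contains T\n" if $rna =~ /t/i;
    (my $dna = $rna) =~ tr/Uu/Tt/;    # copy, then swap U for T, keeping case
    return $dna;
}

print transcribe_dna('ATGTTTtaa'), "\n";            # AUGUUUuaa
print reverse_transcribe_rna('AUGUUUuaa'), "\n";    # ATGTTTtaa
```

Compared with a bare tr/T/U/, the alphabet check is what keeps the call meaningful: handing an RNA string to transcribe_dna dies instead of silently returning its input.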
chris

From dan.kortschak at adelaide.edu.au Mon Jan 25 00:33:28 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Mon, 25 Jan 2010 16:03:28 +1030
Subject: [Bioperl-l] BEDTools module
Message-ID: <1264397608.4898.9.camel@epistle>

Hi All,

A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan and Ira Hall is now available in the bioperl-run subversion repository (bioperl-run/trunk r16754).

Using BEDTools you can, among other things:

 * Intersect two BED files in search of overlapping features.
 * Merge overlapping features.
 * Screen for paired-end (PE) overlaps between PE sequences and existing genomic features.
 * Calculate the depth and breadth of sequence coverage across defined "windows" in a genome.

(see for manuals and downloads).

BEDTools is a suite of 17 command-line executables. The module attempts to provide and options comprehensively and can return Bio::SeqIO or Bio::SeqFeature::Collection objects where appropriate (or Bio::Root::IO where specific handling has not been implemented - please give feedback on desired features for this).

cheers
Dan

From cjfields at illinois.edu Mon Jan 25 00:35:06 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 24 Jan 2010 23:35:06 -0600
Subject: [Bioperl-l] Distance between non-overlapping sequences in DNAStatistics
Message-ID: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu>

Just a quick question for those using DNAStatistics. I just fixed a bug in Bio::Align::DNAStatistics that failed with a div by zero error (bug 2901) on this data:

>seq1
GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>seq2
GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>seq3
GGTACCAGCAGGTGGTCCGCCTA------------------------------
>seq4
--------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC

Since seq3 and seq4 don't overlap, the distance can't be calculated. In our case, I replace the score with 'NA' as a placeholder, but I'm worried about downstream app breakage.
Anyone have an objection to using 'NA' here, or know of ways this may lead to problems elsewhere? chris From jason at bioperl.org Mon Jan 25 00:58:03 2010 From: jason at bioperl.org (Jason Stajich) Date: Sun, 24 Jan 2010 21:58:03 -0800 Subject: [Bioperl-l] Distance between non-overlapping sequences in DNAStatistics In-Reply-To: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu> References: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu> Message-ID: It could also return -1 which is used as place holder for NA in other programs that generate distance matrices. -jason On Jan 24, 2010, at 9:35 PM, Chris Fields wrote: > Just a quick question for those using DNAStatistics. I just fixed a > bug in Bio::Align::DNAStatistics that failed with a div by zero > error (bug 2901) on this data: > >> seq1 > GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC >> seq2 > GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC >> seq3 > GGTACCAGCAGGTGGTCCGCCTA------------------------------ >> seq4 > --------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC > > Since seq3 and seq4 don't overlap, the distance can't be > calculated. In our case, I replace the score with 'NA' as a > placeholder, but I'm worried about downstream app breakage. Anyone > have an objection to using 'NA' here, or know of ways this may lead > to problems elsewhere? > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ http://twitter.com/hyphaltip From maj at fortinbras.us Mon Jan 25 08:17:54 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Mon, 25 Jan 2010 08:17:54 -0500 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com><366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> Message-ID: transcribe() and rev_transcribe added to Bio::PrimarySeqI, plus tests in t/Seq.t, @ r16757 MAJ ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: ; "Peng Yu" Sent: Sunday, January 24, 2010 11:39 PM Subject: Re: [Bioperl-l] Transcribe in bioperl >I think the main reason there hasn't been a transcribe() is that very few users >ask for it. Most just use a quick '$seq =~ tr/T/U/', or use spliced_seq() >and/or translate() (i.e. they don't care about the intermediate mRNA). I don't >have a problem with adding a transcribe method to PrimarySeq, but (and Mark has >already picked up on this) it should be constrained to DNA only and return RNA. >And there might be a case for adding the analogous reverse_translate(). > > Also worth adding this to the proper interface class (PrimarySeqI, I think) so > all Seq/PrimarySeq will have it (or have to implement their own). > > chris > > On Jan 24, 2010, at 8:48 PM, Mark A. Jensen wrote: > >> Not a bad idea, a semantics-preserving/checking thing. transcribe() could >> return an object with alphabet == 'rna' >> and the T's flipped, or bork if called against an object with alphbet != >> 'dna'. >> I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to >> be stashed), if desired. >> >> ----- Original Message ----- From: "Peng Yu" >> To: "Jason Stajich" >> Cc: >> Sent: Sunday, January 24, 2010 9:22 PM >> Subject: Re: [Bioperl-l] Transcribe in bioperl >> >> >>> To convert from T to U. I could use perl's builtin function. But it is >>> semantically far away from 'transcribe'. If there is a function with >>> name 'transcribe', it will be better. >>> On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich wrote: >>>> What exactly do you want to do? 
>>>> spliced_seq for a feature would be the closest thing... >>>> >>>> -jason >>>> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote: >>>> >>>>> I found the function 'translate' in bioperl. But I don't find >>>>> 'transcribe'. Is there such a function? >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> -- >>>> Jason Stajich >>>> jason.stajich at gmail.com >>>> jason at bioperl.org >>>> http://fungalgenomes.org/ >>>> http://twitter.com/hyphaltip >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon Jan 25 08:23:12 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 25 Jan 2010 07:23:12 -0600 Subject: [Bioperl-l] BEDTools module In-Reply-To: <1264397608.4898.9.camel@epistle> References: <1264397608.4898.9.camel@epistle> Message-ID: <0F5CE93E-0E6C-4317-806B-A463A9B0917E@illinois.edu> Great work Dan! chris On Jan 24, 2010, at 11:33 PM, Dan Kortschak wrote: > Hi All, > > A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan > and Ira Hall is now available in the bioperl-run subversion repository > (bioperl-run/trunk r16754). > > Using BEDTools you can, among other things: > > * Intersecting two BED files in search of overlapping features. > * Merging overlapping features. > * Screening for paired-end (PE) overlaps between PE sequences and > existing genomic features. 
> * Calculating the depth and breadth of sequence coverage across > defined "windows" in a genome. > > (see for manuals and downloads). > > BEDTools is a suite of 17 commandline executable. The module attempts to > provide and options comprehensively and can return Bio::SeqIO or > Bio::SeqFeature::Collection object where appropriate (or Bio::Root::IO > where specific handling has not been implemented - please give feedback > on desired features for this). > > cheers > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Jan 25 08:27:26 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 25 Jan 2010 07:27:26 -0600 Subject: [Bioperl-l] Distance between non-overlapping sequences in DNAStatistics In-Reply-To: References: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu> Message-ID: That works for me, just want to ensure we're DTRT. I'll change it over. chris On Jan 24, 2010, at 11:58 PM, Jason Stajich wrote: > It could also return -1 which is used as place holder for NA in other programs that generate distance matrices. > -jason > On Jan 24, 2010, at 9:35 PM, Chris Fields wrote: > >> Just a quick question for those using DNAStatistics. I just fixed a bug in Bio::Align::DNAStatistics that failed with a div by zero error (bug 2901) on this data: >> >>> seq1 >> GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC >>> seq2 >> GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC >>> seq3 >> GGTACCAGCAGGTGGTCCGCCTA------------------------------ >>> seq4 >> --------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC >> >> Since seq3 and seq4 don't overlap, the distance can't be calculated. In our case, I replace the score with 'NA' as a placeholder, but I'm worried about downstream app breakage. Anyone have an objection to using 'NA' here, or know of ways this may lead to problems elsewhere? 
>> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > http://twitter.com/hyphaltip > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Mon Jan 25 08:41:38 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 25 Jan 2010 08:41:38 -0500 Subject: [Bioperl-l] BEDTools module In-Reply-To: <1264397608.4898.9.camel@epistle> References: <1264397608.4898.9.camel@epistle> Message-ID: <8D494783F87E4C32BD797008E260C3C2@NewLife> Rock 'n' roll, Dan! ----- Original Message ----- From: "Dan Kortschak" To: Sent: Monday, January 25, 2010 12:33 AM Subject: [Bioperl-l] BEDTools module > Hi All, > > A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan > and Ira Hall is now available in the bioperl-run subversion repository > (bioperl-run/trunk r16754). > > Using BEDTools you can, among other things: > > * Intersecting two BED files in search of overlapping features. > * Merging overlapping features. > * Screening for paired-end (PE) overlaps between PE sequences and > existing genomic features. > * Calculating the depth and breadth of sequence coverage across > defined "windows" in a genome. > > (see for manuals and downloads). > > BEDTools is a suite of 17 commandline executable. The module attempts to > provide and options comprehensively and can return Bio::SeqIO or > Bio::SeqFeature::Collection object where appropriate (or Bio::Root::IO > where specific handling has not been implemented - please give feedback > on desired features for this). 
> > cheers > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rtbio.2009 at gmail.com Mon Jan 25 08:43:19 2010 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Mon, 25 Jan 2010 14:43:19 +0100 Subject: [Bioperl-l] Regarding blast in Bioperl Message-ID: Hello Mark,Chris and all, This is Roopa again. I have a small problem again. I am working on Remote blast. The program works well. But the problem is this. The program accesses the server and gets the output correctly. I am trying to send the result sequences into an array and I found that always the first sequence among the Result sequences is missing. The code is my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]"); - Show quoted text - while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. open(OUTFILE,'>',$debugfile); print OUTFILE $input; close(OUTFILE); my $r = $factory->submit_blast($input); open(OUTFILE,'>',$debugfile); # print OUTFILE $r; close(OUTFILE); print STDERR "waiting...." if($v>0); while ( my @rids = $factory->each_rid ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "while entered"; close(OUTFILE); foreach my $rid ( @rids ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "foreach entered"; close(OUTFILE); my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } open(OUTFILE,'>',$debugfile); # print OUTFILE "if entered"; close(OUTFILE); print STDERR "." 
if ( $v > 0 ); sleep 5; } else { open(OUTFILE,'>',$debugfile); # print OUTFILE "else entered"; close(OUTFILE); my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $result->next_hit(); close(BLASTDEBUGFILE); my $filename = $serverpath."/blastdata_". time()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $organism; close(BLASTDEBUGFILE); # open(OUTFILE,'>',$outfile); # print OUTFILE "Test2 $result->database_name()"; # close(OUTFILE); #$hit = $result->next_hit; #open(new,'>',$debugfile); #print $hit; #close(new); $dummy=0; while ( my $hit = $result->next_hit ) { next unless ( $v >= 0); # open(OUTFILE,'>',$debugfile); # print OUTFILE "$hit in while hits"; # close(OUTFILE); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string $dummy++; open(OUTFILE,'>',$debugfile); # print OUTFILE $dummy; close(OUTFILE); push(@seqs,$dna); } } } } } $warum=@seqs; open(OUTFILE,'>',$debugfile); # print OUTFILE $warum; print OUTFILE @seqs; close(OUTFILE); return(@seqs); } open(OUTFILE, '>',$outfile) || die ; print OUTFILE "\n RNAi Result \n \n

Inputsequence:
"; Here in the above code, I was trying to debug the code and trying to get the count of the array and even the sequence. But when the output data was giving 1 sequence, the count of the array was 0 and when I tried to print the output sequence I could not get any. It was the same when the no of output sequences was 3, I tried to print the sequences but was getting the count of the array as 2 and was printing only two sequences. Please help me in sorting out this problem. Regards, Roopa. From rtbio.2009 at gmail.com Mon Jan 25 08:44:57 2010 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Mon, 25 Jan 2010 14:44:57 +0100 Subject: [Bioperl-l] remote blast bioperl Message-ID: Hello all, I have a small problem again. I am working on Remote blast. The program works well. But the problem is this. The program accesses the server and gets the output correctly. I am trying to send the result sequences into an array and I found that always the first sequence among the Result sequences is missing. The code is my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]"); - Show quoted text - while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. open(OUTFILE,'>',$debugfile); print OUTFILE $input; close(OUTFILE); my $r = $factory->submit_blast($input); open(OUTFILE,'>',$debugfile); # print OUTFILE $r; close(OUTFILE); print STDERR "waiting...." 
if($v>0); while ( my @rids = $factory->each_rid ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "while entered"; close(OUTFILE); foreach my $rid ( @rids ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "foreach entered"; close(OUTFILE); my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } open(OUTFILE,'>',$debugfile); # print OUTFILE "if entered"; close(OUTFILE); print STDERR "." if ( $v > 0 ); sleep 5; } else { open(OUTFILE,'>',$debugfile); # print OUTFILE "else entered"; close(OUTFILE); my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $result->next_hit(); close(BLASTDEBUGFILE); my $filename = $serverpath."/blastdata_". time()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $organism; close(BLASTDEBUGFILE); # open(OUTFILE,'>',$outfile); # print OUTFILE "Test2 $result->database_name()"; # close(OUTFILE); #$hit = $result->next_hit; #open(new,'>',$debugfile); #print $hit; #close(new); $dummy=0; while ( my $hit = $result->next_hit ) { next unless ( $v >= 0); # open(OUTFILE,'>',$debugfile); # print OUTFILE "$hit in while hits"; # close(OUTFILE); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string $dummy++; open(OUTFILE,'>',$debugfile); # print OUTFILE $dummy; close(OUTFILE); push(@seqs,$dna); } } } } } $warum=@seqs; open(OUTFILE,'>',$debugfile); # print OUTFILE $warum; print OUTFILE @seqs; close(OUTFILE); return(@seqs); } open(OUTFILE, '>',$outfile) || die ; print OUTFILE "\n RNAi Result \n \n

Inputsequence:
"; Here in the above code, I was trying to debug the code and trying to get the count of the array and even the sequence. But when the output data was giving 1 sequence, the count of the array was 0 and when I tried to print the output sequence I could not get any. It was the same when the no of output sequences was 3, I tried to print the sequences but was getting the count of the array as 2 and was printing only two sequences. Please help me in sorting out this problem. Regards, Roopa. From cjfields at illinois.edu Mon Jan 25 09:05:44 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 25 Jan 2010 08:05:44 -0600 Subject: [Bioperl-l] remote blast bioperl In-Reply-To: References: Message-ID: <7E402CC5-9C66-4315-B437-7C4EC2317371@illinois.edu> Roopa, We have received all 4+ of your posts. There is absolutely no need for you to keep repeatedly posting the same thing to the list. Be patient, we'll try to get to you as soon as we can! chris On Jan 25, 2010, at 7:44 AM, Roopa Raghuveer wrote: > Hello all, > > I have a small problem again. I am working on Remote blast. The program works well. But the problem is this. The program accesses the server and gets the output correctly. I am trying to send the result sequences into an array and I found that always the first sequence among the Result sequences is missing. The code is > > my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]"); > - Show quoted text - > > > while (my $input = $str->next_seq()) > { > #Blast a sequence against a database: > #Alternatively, you could pass in a file with many > #sequences rather than loop through sequence one at a time > #Remove the loop starting 'while (my $input = $str->next_seq())' > #and swap the two lines below for an example of that. 
> > open(OUTFILE,'>',$debugfile); > print OUTFILE $input; > close(OUTFILE); > > > my $r = $factory->submit_blast($input); > > open(OUTFILE,'>',$debugfile); > # print OUTFILE $r; > close(OUTFILE); > > > print STDERR "waiting...." if($v>0); > > while ( my @rids = $factory->each_rid ) { > open(OUTFILE,'>',$debugfile); > # print OUTFILE "while entered"; > close(OUTFILE); > foreach my $rid ( @rids ) { > > open(OUTFILE,'>',$debugfile); > # print OUTFILE "foreach entered"; > close(OUTFILE); > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > open(OUTFILE,'>',$debugfile); > # print OUTFILE "if entered"; > close(OUTFILE); > print STDERR "." if ( $v > 0 ); > sleep 5; > } > else { > open(OUTFILE,'>',$debugfile); > # print OUTFILE "else entered"; > close(OUTFILE); > > my $result = $rc->next_result(); > #save the output > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $result->next_hit(); > close(BLASTDEBUGFILE); > > my $filename = $serverpath."/blastdata_". 
> time()."\.out"; > > > # open(DEBUGFILE,'>',$debugfile); > # open(new,'>',$filename); > # @arra=; > # print DEBUGFILE @arra; > # close(DEBUGFILE); > # close(new); > > $factory->save_output($filename); > > # open(BLASTDEBUGFILE,'>',$debugfile); > # print BLASTDEBUGFILE "Hello $rid"; > # close(BLASTDEBUGFILE); > > $factory->remove_rid($rid); > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $organism; > close(BLASTDEBUGFILE); > > # open(OUTFILE,'>',$outfile); > # print OUTFILE "Test2 $result->database_name()"; > # close(OUTFILE); > > #$hit = $result->next_hit; > #open(new,'>',$debugfile); > #print $hit; > #close(new); > $dummy=0; > > while ( my $hit = $result->next_hit ) { > > next unless ( $v >= 0); > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "$hit in while hits"; > # close(OUTFILE); > my $sequ = $gb->get_Seq_by_version($hit->name); > my $dna = $sequ->seq(); # get the sequence as a string > $dummy++; > open(OUTFILE,'>',$debugfile); > # print OUTFILE $dummy; > close(OUTFILE); > push(@seqs,$dna); > } > } > } > } > } > > $warum=@seqs; > open(OUTFILE,'>',$debugfile); > # print OUTFILE $warum; > print OUTFILE @seqs; > > close(OUTFILE); > return(@seqs); > } > > open(OUTFILE, '>',$outfile) || die ; > > print OUTFILE "\n > RNAi Result > \n > \n >

> Inputsequence:
";
>
>
> Here in the above code, I was trying to debug the code and trying to get the count of the array and even the sequence. But when the output data was giving 1 sequence, the count of the array was 0 and when I tried to print the output sequence I could not get any. It was the same when the no of output sequences was 3, I tried to print the sequences but was getting the count of the array as 2 and was printing only two sequences.
>
> Please help me in sorting out this problem.
>
> Regards,
> Roopa.

From jiann-jy at hotmail.com Sun Jan 24 21:03:55 2010
From: jiann-jy at hotmail.com (JY)
Date: Sun, 24 Jan 2010 18:03:55 -0800 (PST)
Subject: [Bioperl-l] how to retrieve accession number by taxon id??
Message-ID: <4cef88b5-fa53-4e63-9167-30075c10a058@k19g2000yqc.googlegroups.com>

I need to retrieve accession numbers and sequences to complete one part of my project. How can I retrieve accession numbers by taxon ID?

From lpaulet at ual.es Mon Jan 25 15:25:55 2010
From: lpaulet at ual.es (Lorenzo Carretero-Paulet)
Date: Mon, 25 Jan 2010 21:25:55 +0100
Subject: [Bioperl-l] HTMLResultWriter
Message-ID: <4B5DFE53.2000201@ual.es>

Hi all,

I'm trying to generate a subroutine that performs a BLAST search and returns the corresponding reports in txt, xml and html format. I'm experiencing problems with the latter, as the program returns the following error message: "Can't call method "next_result" without a package or object reference at..."
sub blasting {
    my ($query, $E_value) = @_;
    my ($outputfilenameB, $outputfilenameX, $outputfilenameH);
    $outputfilenameB=$query.".BLAST.txt";
    $outputfilenameX=$query.".BLAST.xml";
    $outputfilenameH=$query.".BLAST.html";
    #legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin
    print qx(du -s /tmp);
    my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/;
    my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -m 7 -b 20000 -o $outputfilenameX/;
    my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter();
    my $outhtml = new Bio::SearchIO(-writer => $writerhtml, -file => ">$outputfilenameH");
    while( my $result = _$blast_report_->next_result ) {
        # get a result from Bio::SearchIO parsing or build it up in memory
        $outhtml->write_result($result);
    }
}

Can anyone see where the problem is?

Cheers!
Lorenzo

From lpaulet at ual.es Mon Jan 25 15:31:08 2010
From: lpaulet at ual.es (lpaulet at ual.es)
Date: Mon, 25 Jan 2010 21:31:08 +0100
Subject: [Bioperl-l] HTMLResultWriter
Message-ID: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es>

Hi all,

I'm trying to generate a subroutine that performs a BLAST search and returns the corresponding reports in txt, xml and html format. I'm experiencing problems with the latter, as the program returns the following error message: "Can't call method "next_result" without a package or object reference at..."
sub blasting {
    my ($query, $E_value) = @_;
    my ($outputfilenameB, $outputfilenameX, $outputfilenameH);
    $outputfilenameB=$query.".BLAST.txt";
    $outputfilenameX=$query.".BLAST.xml";
    $outputfilenameH=$query.".BLAST.html";
    #legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin
    print qx(du -s /tmp);
    my $blast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/;
    my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -m 7 -b 20000 -o $outputfilenameX/;
    my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter();
    my $outhtml = new Bio::SearchIO(-writer => $writerhtml, -file => ">$outputfilenameH");
    while( my $result = $blast_report->next_result ) {
        # get a result from Bio::SearchIO parsing or build it up in memory
        $outhtml->write_result($result);
    }
}

Can anyone see where the problem is?

Cheers!
Lorenzo

From dan.kortschak at adelaide.edu.au Mon Jan 25 16:00:37 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 26 Jan 2010 07:30:37 +1030
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To:
References:
Message-ID: <1264453237.4552.3.camel@epistle>

A reverse_translate to IUPAC degenerate codes is not a bad idea, particularly for PCR primer design.

Dan

On Mon, 2010-01-25 at 09:05 -0500, bioperl-l-request at lists.open-bio.org wrote:
> On Jan 24, 2010, at 10:39 PM, Chris Fields wrote:
>
> > ...And there might be a case for adding the analogous
> reverse_translate().
>
> Bah. Meant reverse_transcribe(). Ah well.
>
> chris

From maj at fortinbras.us Mon Jan 25 16:07:49 2010
From: maj at fortinbras.us (Mark A.
Jensen)
Date: Mon, 25 Jan 2010 16:07:49 -0500
Subject: [Bioperl-l] HTMLResultWriter
In-Reply-To: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es>
References: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es>
Message-ID:

Lorenzo--

Your $blast_report is set to be (some of) the text returned by a system call of a blast program; this isn't going to be an object of any kind, and so no functions can be called from it (as at "$blast_report->next_result"). You need to parse the text generated by the blast call using Bio::SearchIO to get a Bio::Search::Result::BlastResult object. You could do

    @blast_lines = qx/ ...your blast call... /;
    open my $bf, ">my.blast";
    print $bf @blast_lines;
    close $bf;
    $blast_result = Bio::SearchIO->new(-file=>'my.blast', -format => 'blast');

and carry on from there. But why not look at Bio::Tools::Run::StandAloneBlast or Bio::Tools::Run::StandAloneBlastPlus to run your blasts within perl? These wrap the blast programs and deliver BioPerl objects, rather than plain text output.

cheers
MAJ

----- Original Message -----
From:
To:
Sent: Monday, January 25, 2010 3:31 PM
Subject: [Bioperl-l] HTMLResultWriter

Hi all,

I'm trying to generate a subroutine that performs a BLAST search and returns the corresponding reports in txt, xml and html format. I'm experiencing problems with the latter, as the program returns the following error message: "Can't call method "next_result" without a package or object reference at..."
sub blasting { my ($query, $E_value) = @_; my ($outputfilenameB, $outputfilenameX, $outputfilenameH); $outputfilenameB=$query.".BLAST.txt"; $outputfilenameX=$query.".BLAST.xml"; $outputfilenameH=$query.".BLAST.html"; #legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin print qx(du -s /tmp); my $blast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/; my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -m 7 -b 20000 -o $outputfilenameX/; my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter(); my $outhtml = new Bio::SearchIO(-writer => $writerhtml, -file => ">$outputfilenameH"); while( my $result = $blast_report->next_result ) { # get a result from Bio::SearchIO parsing or build it up in memory $outhtml->write_result($result); } } Can anyone see where the problem is? Cheers! Lorenzo _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Mon Jan 25 16:09:24 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 25 Jan 2010 22:09:24 +0100 Subject: [Bioperl-l] HTMLResultWriter In-Reply-To: <4B5DFE53.2000201@ual.es> References: <4B5DFE53.2000201@ual.es> Message-ID: > my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/; > while( my $result = _$blast_report_->next_result ) { # get a result from Bio::SearchIO parsing or build it up in memory _$blast_report_ is not a valid variable name, as far as I know. Plus there's a space between report and the final '_' in the first of the above two lines. Does this code compile? 
Dave From Russell.Smithies at agresearch.co.nz Mon Jan 25 16:14:15 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 26 Jan 2010 10:14:15 +1300 Subject: [Bioperl-l] Regarding blast in Bioperl In-Reply-To: References: Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz> That's a fair mix of incomplete code you've supplied!! Did you read the documentation for RemoteBlast? The example there will do 99% of what you want. http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm I'm not entirely sure what you're trying to do (as you've left out a bit of your code) but I assume you're trying to retrieve and print the sequence for each hit. Here's something that works, not sure exactly what/why you want to print but it should get you a bit further. --Russell ================================ #!perl -w use Bio::Tools::Run::RemoteBlast; use Bio::DB::GenBank; use CGI ':standard'; use strict; my $q = new CGI; my @params = ( -prog => 'blastn', -data => 'nr', -expect => '1e-30', -entrez_query => 'Homo sapiens [ORGN]', -readmethod => 'SearchIO' ); my $gb = Bio::DB::GenBank->new; my $factory = Bio::Tools::Run::RemoteBlast->new(@params); #$v is just to turn on and off the messages my $v = 1; my $str = Bio::SeqIO->new( -file => 'test.faa', -format => "fasta" ); while ( my $input = $str->next_seq() ) { my $r = $factory->submit_blast($input); print STDERR "waiting..." if ( $v > 0 ); while ( my @rids = $factory->each_rid ) { foreach my $rid (@rids) { my @seqs = (); my $rc = $factory->retrieve_blast($rid); if ( !ref($rc) ) { if ( $rc < 0 ) { $factory->remove_rid($rid); } print STDERR "." if ( $v > 0 ); sleep 5; } else { my $result = $rc->next_result(); #save the blast output my $filename = $result->query_accession . 
'.out'; $factory->save_output($filename); $factory->remove_rid($rid); print "\nQuery Name: ", $result->query_name(), "\n"; while ( my $hit = $result->next_hit ) { # store the hit sequences push @seqs, $gb->get_Seq_by_version( $hit->name ); next unless ( $v > 0 ); print "\thit name is ", $hit->name, "\n"; while ( my $hsp = $hit->next_hsp ) { print "\t\tscore is ", $hsp->score, "\n"; } } ## print the seqs you've retrieved?? open( OUTFILE, '>', $result->query_accession . '.htm' ); print OUTFILE $q->start_html('RNAi Result'), $q->h1('RNAi Result'), $q->h2('Input'), $q->pre( toString($input) ), $q->h2('Output'); foreach (@seqs) { #there's probably a better way of printing the seq print OUTFILE $q->pre( toString($_) ); } print OUTFILE $q->end_html; close OUTFILE; } } } } sub toString { my $s = shift; return '>' . $s->display_id . " " . $s->desc . "\n" . $s->seq; } ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From biopython at maubp.freeserve.co.uk Mon Jan 25 16:24:33 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 25 Jan 2010 21:24:33 +0000 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: <1264453237.4552.3.camel@epistle> References: <1264453237.4552.3.camel@epistle> Message-ID: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com> On Mon, Jan 25, 2010 at 9:00 PM, Dan Kortschak wrote: > A reverse_translate to IUPAC degenerate codes is not a bad idea, > particularly for PCR primer design. I would say it could be a bad idea. For any protein string there are multiple possible back translations, and this cannot be captured fully as a nucleotide string even using the IUPAC ambiguity chars. We debated this back and forth for Biopython, and decided to leave it out. It wasn't possible for a simple back translate to a simple string to handle the use cases we considered, and other options like returning a regular expression covering all possible back translations were too complex (for a core sequence method/function). Peter From jason at bioperl.org Mon Jan 25 16:26:55 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 25 Jan 2010 13:26:55 -0800 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com> References: <1264453237.4552.3.camel@epistle> <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com> Message-ID: <98995830-DC7F-4404-A216-874EF5799DB6@bioperl.org> It was already implemented several years ago -- reverse_translate Bio::Tools::CodonTable -> revtranslate my $seqobj = Bio::PrimarySeq->new(-seq => 'FHGERHEL'); my $iupac_str = $myCodonTable->reverse_translate_all($seqobj); Chris had meant to say reverse_transcribe of RNA -> DNA FWIW.
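Peter's objection can be illustrated with a small standalone sketch. It does not use or reproduce Bio::Tools::CodonTable; collapse_codons and %codons_for are names made up for this illustration, and the position-by-position collapse is only one assumed way of building a degenerate triplet. For amino acids whose codons fall into two families (serine, leucine), the single IUPAC triplet that covers every real codon also matches codons for other residues.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# IUPAC nucleotide ambiguity codes, keyed by the alphabetically
# sorted set of bases each code stands for.
my %IUPAC = (
    A   => 'A', C   => 'C', G   => 'G', T   => 'T',
    AG  => 'R', CT  => 'Y', CG  => 'S', AT  => 'W', GT => 'K', AC => 'M',
    CGT => 'B', AGT => 'D', ACT => 'H', ACG => 'V', ACGT => 'N',
);

# Hypothetical helper: collapse a list of codons into one IUPAC triplet,
# position by position. Returns the triplet and the number of concrete
# codons that triplet matches.
sub collapse_codons {
    my @codons = @_;
    my ( $triplet, $matches ) = ( '', 1 );
    for my $pos ( 0 .. 2 ) {
        my %seen;
        $seen{ substr( $_, $pos, 1 ) } = 1 for @codons;
        my $key = join '', sort keys %seen;
        $triplet .= $IUPAC{$key};
        $matches *= length $key;
    }
    return ( $triplet, $matches );
}

# A few entries from the standard genetic code (DNA alphabet).
my %codons_for = (
    Phe => [qw(TTT TTC)],
    Ser => [qw(TCT TCC TCA TCG AGT AGC)],
    Leu => [qw(TTA TTG CTT CTC CTA CTG)],
);

for my $aa ( sort keys %codons_for ) {
    my @real = @{ $codons_for{$aa} };
    my ( $triplet, $matches ) = collapse_codons(@real);
    printf "%s: %d real codons -> %s, which matches %d codons\n",
        $aa, scalar @real, $triplet, $matches;
}
```

Phenylalanine collapses cleanly to TTY, but serine becomes WSN, which matches 16 codons including TGG (Trp) and ACA (Thr). A string built this way is safe for a "could this DNA encode the protein?" filter only if false positives are acceptable.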
-jason On Jan 25, 2010, at 1:24 PM, Peter wrote: > On Mon, Jan 25, 2010 at 9:00 PM, Dan Kortschak > wrote: >> A reverse_translate to IUPAC degenerate codes is not a bad idea, >> particularly for PCR primer design. > > I would say it could be a bad idea. For any protein string there are > multiple possible back translations, and this cannot be captured > fully as a nucleotide string even using the IUPAC ambiguity chars. > > We debated this back and forth for Biopython, and decided to leave it > out. It wasn't possible for a simple back translate to a simple > string to > handle the use cases we considered, and other options like returning > a regular expression covering all possible back translations were too > complex (for a core sequence method/function). > > Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ http://twitter.com/hyphaltip From maj at fortinbras.us Mon Jan 25 16:19:24 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 25 Jan 2010 16:19:24 -0500 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: <1264453237.4552.3.camel@epistle> References: <1264453237.4552.3.camel@epistle> Message-ID: <72B106F0D5FF4F1E858CC9BD1EF33142@NewLife> I think we have that functionality in Bio::Tools::SeqPattern, courtesy of Bruno V--- ----- Original Message ----- From: "Dan Kortschak" To: Sent: Monday, January 25, 2010 4:00 PM Subject: Re: [Bioperl-l] Transcribe in bioperl >A reverse_translate to IUPAC degenerate codes is not a bad idea, > particularly for PCR primer design. > > Dan > > On Mon, 2010-01-25 at 09:05 -0500, bioperl-l-request at lists.open-bio.org > wrote: >> On Jan 24, 2010, at 10:39 PM, Chris Fields wrote: >> >> > ...And there might be a case for adding the analogous >> reverse_translate(). >> >> Bah. Meant reverse_transcribe(). Ah well. 
>> >> chris From dan.kortschak at adelaide.edu.au Mon Jan 25 16:38:44 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Tue, 26 Jan 2010 08:08:44 +1030 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com> References: <1264453237.4552.3.camel@epistle> <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com> Message-ID: <1264455524.4552.23.camel@epistle> Good to see that these ideas have been considered. I'd be interested to see this discussion, or at least the point dealing with the problems that might arise. I'm at a loss as to how ambiguity codes can't completely describe all possible coding sequences for any given codon table (via Bio::Tools::CodonTable - in fact this already has the revtranslate that could be fitted into a Bio::PrimarySeq method - to answer Mark and Jason's comments, I think that /if/ a reverse_translate method exists, it makes logical sense to have it tied to a sequence object, calling the B:T:CT method on the seq object itself rather than only in Bio::Tools, 2?). Pete, can you provide an example of the problems? thanks Dan On Mon, 2010-01-25 at 21:24 +0000, Peter wrote: > I would say it could be a bad idea. For any protein string there are > multiple possible back translations, and this cannot be captured > fully as a nucleotide string even using the IUPAC ambiguity chars. From lpaulet at ual.es Mon Jan 25 16:53:07 2010 From: lpaulet at ual.es (lpaulet at ual.es) Date: Mon, 25 Jan 2010 22:53:07 +0100 Subject: [Bioperl-l] HTMLResultWriter In-Reply-To: References: <4B5DFE53.2000201@ual.es> Message-ID: <20100125225307.2zl2cn2hkcsgccso@webmail.ual.es> Thanks Dave and Mark.
Quoting Dave Messina : >> my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e >> $E_value -b 20000 -o $outputfilenameB/; > >> while( my $result = _$blast_report_->next_result ) { # get a result >> from Bio::SearchIO parsing or build it up in memory > > > _$blast_report_ is not a valid variable name, as far as I know. Plus > there's a space between report and the final '_' in the first of > the above two lines. > > Does this code compile? > > Dave From rtbio.2009 at gmail.com Mon Jan 25 17:35:32 2010 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Mon, 25 Jan 2010 23:35:32 +0100 Subject: [Bioperl-l] Regarding blast in Bioperl In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz> Message-ID: Hello Russell, Thank you very much for your reply. My problem is this: RemoteBlast runs fine with my code, and I get the .out file with the sequences producing significant alignments. But when I try to retrieve the sequences into an array @seqs, I get all of them except the first hit. If the .out file lists 3 hits, I can retrieve only 2 of them, i.e., I get only 2 sequences. If there is only one significant hit for my sequence, its name and description appear in the .out file, but it never reaches the array: the array count shows 0 and the array holds no sequence. I hope the problem is clear now.
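One possible reading of this symptom, offered as a guess rather than a confirmed diagnosis: the script that follows calls $result->next_hit() once for a debug print before its while loop, and an iterator-style method would not hand that hit out a second time. The sketch below uses a stand-in class (MockResult, hypothetical) instead of a real BioPerl result object, so it only shows the general behaviour of "read one item, then loop over the rest".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a result object: next_hit() returns each
# stored hit exactly once, then undef.
package MockResult;

sub new {
    my ( $class, @hits ) = @_;
    return bless { hits => [@hits] }, $class;
}

sub next_hit {
    my $self = shift;
    return shift @{ $self->{hits} };
}

package main;

# Collect hits in a loop; optionally read one hit beforehand,
# the way a debug print placed before the loop would.
sub collect_hits {
    my ( $result, $peek_first ) = @_;
    $result->next_hit if $peek_first;    # this hit is used up here
    my @seqs;
    while ( my $hit = $result->next_hit ) {
        push @seqs, $hit;
    }
    return @seqs;
}

my @with_peek = collect_hits( MockResult->new(qw(hit1 hit2 hit3)), 1 );
my @no_peek   = collect_hits( MockResult->new(qw(hit1 hit2 hit3)), 0 );
my @single    = collect_hits( MockResult->new('hit1'), 1 );

print "peek before loop: @with_peek\n";
print "no peek:          @no_peek\n";
print "single hit, peek: ", scalar(@single), " collected\n";
```

With three hits and a read before the loop, two are collected; with one hit, none are. That matches the counts described above. If this is the cause, dropping the debug call or moving the print inside the loop would be the simplest thing to try.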
Here comes my code, use Bio::SearchIO; use Bio::Search::Result::BlastResult; use Bio::Perl; use Bio::Tools::Run::RemoteBlast; use Bio::Seq; use Bio::SeqIO; use Bio::DB::GenBank; $serverpath = "/srv/www/htdocs/rain/RNAi"; $serverurl = "http://141.84.66.66/rain/RNAi"; $outfile = $serverpath."/rnairesult_".time().".html"; $nuc = $serverpath."/nuc".time().".txt"; $debugfile = $serverpath."/debug_".time().".txt"; $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; my $outstring =""; &parse_form; print "Content-type: text/html\n\n"; print "\n"; print "RNAi Result"; print " \n"; print "\n"; print "\n"; print " Your results will appear here
"; print " Please be patient, runtime can be up to 5 minutes
"; print " This page will automatically reload in 30 seconds."; print "\n"; print "\n"; defined(my $pid = fork) or die "Can't fork: $!"; exit if $pid; open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; open(OUTFILE, '>',$outfile); print OUTFILE "\n RNAi Result \n \n \n Your results will appear here
Please be patient, runtime can be up to 5 minutes
This page will automatically reload in 30 seconds
\n \n"; close(OUTFILE); @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); $in{'Inputseq'} =~ s/>.*$//m; $in{'Inputseq'} =~ s/[^TAGC]//gim; $in{'Inputseq'} =~ tr/actg/ACTG/; @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, $in{'Threshold'}); sub blastcode { $inpu1= $_[0]; $organ= $_[1]; open(NUC,'>',$nuc); print NUC $inpu1,"\n"; close(NUC); my $prog = 'blastn'; my $db = 'refseq_rna'; my $e_val= '1e-10'; my $organism= $organ; $gb = new Bio::DB::GenBank; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO', '-Organism' => $organism ); # open(OUTFILE,'>',$debugfile); # print OUTFILE @params; # close(OUTFILE); my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => "$organ\[ORGN]"); #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); #change a paramter #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]'; #change a paramter # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; my $v = 1; #$v is just to turn on and off the messages my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]"); while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. open(OUTFILE,'>',$debugfile); print OUTFILE $input; close(OUTFILE); my $r = $factory->submit_blast($input); open(OUTFILE,'>',$debugfile); # print OUTFILE $r; close(OUTFILE); print STDERR "waiting...." if($v>0); while ( my @rids = $factory->each_rid ) { foreach my $rid ( @rids ) { my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } print STDERR "." 
if ( $v > 0 ); sleep 5; } else { my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $result->next_hit(); close(BLASTDEBUGFILE); my $filename = $serverpath."/blastdata_".time()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); while ( my $hit = $result->next_hit ) { next unless ( $v >= 0); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string $dummy++; open(OUTFILE,'>',$debugfile); open(OUTFILE,'>',$debugfile); # print OUTFILE $dna; close(OUTFILE); push(@seqs,$dna); } } } } } $warum=scalar(@seqs); open(OUTFILE,'>',$debugfile); print OUTFILE $warum; # print OUTFILE @seqs; close(OUTFILE); return(@seqs); } open(OUTFILE, '>',$outfile) || die ; print OUTFILE "\n RNAi Result \n \n

Inputsequence:
"; for ($i=0; $i\n"; } } print OUTFILE "

"; $z=@compseqs; for($k=0;$k<$z;$k++) { print OUTFILE "

Compare Sequence:
"; for ($i=0; $i\n"; } } print OUTFILE "

"; } print OUTFILE "

Window:
$in{'Windowsize'}

Threshold:
$in{'Threshold'}

"; my $j=0; for ($i=0; $i{similar}<=$in{'Threshold'}){ $j=$in{'Windowsize'}; } $height=$out[$i]->{similar}*5; } if ($j>0) { print OUTFILE ""; $outstring .= "".substr ($in{'Inputseq'}, $i, 1).""; $j--; } else { print OUTFILE ""; $outstring .= "".substr ($in{'Inputseq'}, $i, 1).""; } if ( ($i+1)%10==0){ $outstring .= " "; } if ( ($i+1)%60==0){ $outstring .= "
\n"; } if ( ($i+1)%800==0){ print OUTFILE "

\n"; } } print OUTFILE "

$outstring"; #foreach (@out) { #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} matchs

"; #if ($_->{similar}<=$in{'Threshold'}){ # } #} print OUTFILE "\n\n"; close OUTFILE; #nameprint(); sub parse_form { local ($buffer, @pairs, $pair, $name, $value); # Read in text $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/; if ($ENV{'REQUEST_METHOD'} eq "POST") { read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); } else { $buffer = $ENV{'QUERY_STRING'}; } @pairs = split(/&/, $buffer); foreach $pair (@pairs) { ($name, $value) = split(/=/, $pair); $value =~ tr/+/ /; $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; $in{$name} = $value; } } Regards, Roopa. On Mon, Jan 25, 2010 at 10:14 PM, Smithies, Russell < Russell.Smithies at agresearch.co.nz> wrote: > That's a fair mix of incomplete code you've supplied!! > Did you read the documentation for RemoteBlast? The example there will do > 99% of what you want. > http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm > > I'm not entirely sure what you're trying to do (as you've left out a bit of > your code) but I assume you're trying to retrieve and print the sequence for > each hit. > > Here's something that works, not sure exactly what/why you want to print > but it should get you a bit further. > > --Russell > > > ================================ > #!perl -w > > use Bio::Tools::Run::RemoteBlast; > use Bio::DB::GenBank; > > use CGI ':standard'; > > use strict; > > my $q = new CGI; > > my @params = ( > -prog => 'blastn', > -data => 'nr', > -expect => '1e-30', > -entrez_query => 'Homo sapiens [ORGN]', > -readmethod => 'SearchIO' > ); > > my $gb = Bio::DB::GenBank->new; > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > #$v is just to turn on and off the messages > my $v = 1; > > my $str = Bio::SeqIO->new( -file => 'test.faa', -format => "fasta" ); > > while ( my $input = $str->next_seq() ) { > > my $r = $factory->submit_blast($input); > > print STDERR "waiting..." 
if ( $v > 0 ); > while ( my @rids = $factory->each_rid ) { > foreach my $rid (@rids) { > my @seqs = (); > my $rc = $factory->retrieve_blast($rid); > if ( !ref($rc) ) { > if ( $rc < 0 ) { > $factory->remove_rid($rid); > } > print STDERR "." if ( $v > 0 ); > sleep 5; > } > else { > my $result = $rc->next_result(); > > #save the blast output > my $filename = $result->query_accession . '.out'; > $factory->save_output($filename); > $factory->remove_rid($rid); > print "\nQuery Name: ", $result->query_name(), "\n"; > while ( my $hit = $result->next_hit ) { > > # store the hit sequences > push @seqs, $gb->get_Seq_by_version( $hit->name ); > > next unless ( $v > 0 ); > print "\thit name is ", $hit->name, "\n"; > while ( my $hsp = $hit->next_hsp ) { > print "\t\tscore is ", $hsp->score, "\n"; > } > } > > ## print the seqs you've retrieved?? > open( OUTFILE, '>', $result->query_accession . '.htm' ); > print OUTFILE $q->start_html('RNAi Result'), > $q->h1('RNAi Result'), > $q->h2('Input'), > $q->pre( toString($input) ), > $q->h2('Output'); > > foreach (@seqs) { > > #there's probably a better way of printing the seq > print OUTFILE $q->pre( toString($_) ); > } > print OUTFILE $q->end_html; > close OUTFILE; > } > } > } > } > > sub toString { > my $s = shift; > return '>' . $s->display_id . " " . $s->desc . "\n" . $s->seq; > } > > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> ======================================================================= > From ajmackey at gmail.com Tue Jan 26 08:24:43 2010 From: ajmackey at gmail.com (Aaron Mackey) Date: Tue, 26 Jan 2010 08:24:43 -0500 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: <1264455524.4552.23.camel@epistle> References: <1264453237.4552.3.camel@epistle> <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com> <1264455524.4552.23.camel@epistle> Message-ID: <24c96eca1001260524s3d46e850hfdcc461e22210972@mail.gmail.com> There's also Bio::Tools::IUPAC; given a sequence with IUPAC ambiguity codes, it provides a SeqIO stream that enumerates all the possible unambiguous realizations. Not the right solution for every situation, but quite useful when you need it. -Aaron On Mon, Jan 25, 2010 at 4:38 PM, Dan Kortschak < dan.kortschak at adelaide.edu.au> wrote: > Good to see that these ideas have been considered. > > I'd be interested to see this discussion, or at least the point dealing > with the problems that might arise. I'm at a loss as to how ambiguity > codes can't completely describe all possible coding sequences for any > given codon table (via Bio::Tools::CodonTable - in fact this already has > the revtranslate that could be fitted into a Bio::PrimarySeq method - to > answer Mark and Jason's comments, I think that /if/ a reverse_translate > method exists, it makes logical sense to have it tied to a sequence > object, calling the B:T:CT method on the seq object itself rather than > only in Bio::Tools, 2?). Pete, tcn you provide an example of the > problems? > > thanks > Dan > > On Mon, 2010-01-25 at 21:24 +0000, Peter wrote: > > I would say it could be a bad idea. For any protein string there are > > multiple possible back translations, and this cannot be captured > > fully as a nucleotide string even using the IUPAC ambiguity chars. 
> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From nml5566 at gmail.com Tue Jan 26 16:10:54 2010 From: nml5566 at gmail.com (Nathan Liles) Date: Tue, 26 Jan 2010 15:10:54 -0600 Subject: [Bioperl-l] SVN access Message-ID: <4B5F5A5E.2070406@gmail.com> Does anyone know who I need to talk to for getting developer access for the Bioperl SVN? I want to submit a patch to the genbank2gff3 converter. Thanks, Nathan From Russell.Smithies at agresearch.co.nz Tue Jan 26 20:40:40 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 27 Jan 2010 14:40:40 +1300 Subject: [Bioperl-l] how to retrieve organism name from accession number? In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> Grrrrrr, I hate eutils!!!! 
------------- EXCEPTION: Bio::Root::Exception ------------- MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused) STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 STACK: get_desc.pl:32 ----------------------------------------------------------- Nice error message though :-) --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Smithies, Russell > Sent: Monday, 11 January 2010 10:05 a.m. > To: 'Chris Fields' > Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org' > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > number? > > I've started to go off eUtils recently (not BioPerl's fault) as I've often > been finding that with large queries, chunks of the resulting data is > missing. > For example, before Xmas I was creating species-specific databases by > using eUtils to get a list of GI numbers back for a taxid, then retrieving > the fasta sequences in chunks of 500. > Very regularly, in the middle of the fasta there would be a message about > resource unavailable eg. > >test_sequence_1 > TACGATCATCGCTResource UnavailableTACGACTCTGCT > >test_sequence_2 > TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT > > Often this wasn't detected until formatdb complained about invalid > characters. > Inquiries to NCBI as to why this was happening and what to do about it > returned stupid answers ("do each sequence manually thru the web > interface", or "use eUtils"). > As we have a nice fast network connection, I now prefer to download very > large gzip files (i.e. 
all of refseq) and extract what I need. > > I can't help but think that NCBI could solve a lot of problems if they > gzipped the output from eUtils queries - it's something I've requested > regularly for the last 5 years or so!! > > --Russell > > > > -----Original Message----- > > From: Chris Fields [mailto:cjfields at illinois.edu] > > Sent: Monday, 11 January 2010 9:50 a.m. > > To: Smithies, Russell > > Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org' > > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > > number? > > > > One could also use Bio::DB::Taxonomy, which indexes the same files or > > (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the > > details). > > > > chris > > > > On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote: > > > > > An alternate non-BioPerly way (that may be faster given NCBI's > flakiness > > lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip > > files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash > and > > do lookups. > > > In that same dir, taxdump.tar.gz contains a file called names.dmp > which > > lists taxids and descriptions (and synonyms) > > > > > > If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I > > could do this: > > > > > > my $taxid = $gi_taxid_nucl{$accession}; > > > my $org_name = $names{$taxid}; > > > > > > --Russell > > > > > > > > >> -----Original Message----- > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > > >> Sent: Saturday, 26 December 2009 4:52 p.m. > > >> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org > > >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession > > >> number? 
> > >> > > >> Bhakti, > > >> The following example (using EUtilities) may serve your purpose: > > >> > > >> use Bio::DB::EUtilities; > > >> > > >> my (%taxa, @taxa); > > >> my (%names, %idmap); > > >> > > >> # these are protein ids; nuc ids will work by changing -dbfrom => > > >> 'nucleotide', > > >> # (probably) > > >> > > >> my @ids = qw(1621261 89318838 68536103 20807972 730439); > > >> > > >> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink', > > >> -db => 'taxonomy', > > >> -dbfrom => 'protein', > > >> -correspondence => 1, > > >> -id => \@ids); > > >> > > >> # iterate through the LinkSet objects > > >> while (my $ds = $factory->next_LinkSet) { > > >> $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0] > > >> } > > >> > > >> @taxa = @taxa{@ids}; > > >> > > >> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', > > >> -db => 'taxonomy', > > >> -id => \@taxa ); > > >> > > >> while (local $_ = $factory->next_DocSum) { > > >> $names{($_->get_contents_by_name('TaxId'))[0]} = > > >> ($_->get_contents_by_name('ScientificName'))[0]; > > >> } > > >> > > >> foreach (@ids) { > > >> $idmap{$_} = $names{$taxa{$_}}; > > >> } > > >> > > >> # %idmap is > > >> # 1621261 => 'Mycobacterium tuberculosis H37Rv' > > >> # 20807972 => 'Thermoanaerobacter tengcongensis MB4' > > >> # 68536103 => 'Corynebacterium jeikeium K411' > > >> # 730439 => 'Bacillus caldolyticus' > > >> # 89318838 => undef (this record has been removed from the db) > > >> > > >> 1; > > >> > > >> You probably will need to break up your 30000 into chunks > > >> (say, 1000-3000 each), and do the above on each chunk with a > > >> > > >> sleep 3; > > >> > > >> or so separating the queries. > > >> MAJ > > >> ----- Original Message ----- > > >> From: "Bhakti Dwivedi" > > >> To: > > >> Sent: Friday, December 25, 2009 9:46 PM > > >> Subject: [Bioperl-l] how to retrieve organism name from accession > > number? 
> > >> > > >> > > >>> Hi, > > >>> > > >>> Does anyone know how to retrieve the "Source" or the "Species name" > > >> given > > >>> the accession number using Bioperl. I have these 30,000 accession > > >> numbers > > >>> for which I need to get the source organisms. Any kind of help will > > be > > >>> appreciated. > > >>> > > >>> Thanks > > >>> > > >>> BD > > >>> _______________________________________________ > > >>> Bioperl-l mailing list > > >>> Bioperl-l at lists.open-bio.org > > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >>> > > >>> > > >> > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > ======================================================================= > > > Attention: The information contained in this message and/or > attachments > > > from AgResearch Limited is intended only for the persons or entities > > > to which it is addressed and may contain confidential and/or > privileged > > > material. Any review, retransmission, dissemination or other use of, > or > > > taking of any action in reliance upon, this information by persons or > > > entities other than the intended recipients is prohibited by > AgResearch > > > Limited. If you have received this message in error, please notify the > > > sender immediately. 
> > > > ======================================================================= > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Jan 26 20:46:26 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 26 Jan 2010 19:46:26 -0600 Subject: [Bioperl-l] how to retrieve organism name from accession number? In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> Message-ID: <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu> It's unfortunate but I have heard this problem popping up quite a bit more frequently lately. Not to push too many buttons but NCBI isn't very forthcoming with help these days; they have become quite insular. Not sure if they're short-staffed due to budget or if there are other issues. chris On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote: > Grrrrrr, I hate eutils!!!! 
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> STACK: get_desc.pl:32
> -----------------------------------------------------------
>
> Nice error message though :-)
>
> --Russell
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
>> Sent: Monday, 11 January 2010 10:05 a.m.
>> To: 'Chris Fields'
>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
>>
>> I've started to go off eUtils recently (not BioPerl's fault) as I've often
>> been finding that with large queries, chunks of the resulting data is
>> missing.
>> For example, before Xmas I was creating species-specific databases by
>> using eUtils to get a list of GI numbers back for a taxid, then retrieving
>> the fasta sequences in chunks of 500.
>> Very regularly, in the middle of the fasta there would be a message about
>> resource unavailable eg.
>> >test_sequence_1
>> TACGATCATCGCTResource UnavailableTACGACTCTGCT
>> >test_sequence_2
>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
>>
>> Often this wasn't detected until formatdb complained about invalid
>> characters.
>> Inquiries to NCBI as to why this was happening and what to do about it
>> returned stupid answers ("do each sequence manually thru the web
>> interface", or "use eUtils").
>> As we have a nice fast network connection, I now prefer to download very
>> large gzip files (i.e. all of refseq) and extract what I need.
>>
>> I can't help but think that NCBI could solve a lot of problems if they
>> gzipped the output from eUtils queries - it's something I've requested
>> regularly for the last 5 years or so!!
>>
>> --Russell
>>
>>> -----Original Message-----
>>> From: Chris Fields [mailto:cjfields at illinois.edu]
>>> Sent: Monday, 11 January 2010 9:50 a.m.
>>> To: Smithies, Russell
>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
>>>
>>> One could also use Bio::DB::Taxonomy, which indexes the same files or
>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the
>>> details).
>>>
>>> chris
>>>
>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
>>>
>>>> An alternate non-BioPerly way (that may be faster given NCBI's flakiness
>>>> lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip
>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and
>>>> do lookups.
>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp which
>>>> lists taxids and descriptions (and synonyms)
>>>>
>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
>>>> could do this:
>>>>
>>>> my $taxid = $gi_taxid_nucl{$accession};
>>>> my $org_name = $names{$taxid};
>>>>
>>>> --Russell
>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
>>>>>
>>>>> Bhakti,
>>>>> The following example (using EUtilities) may serve your purpose:
>>>>>
>>>>> use Bio::DB::EUtilities;
>>>>>
>>>>> my (%taxa, @taxa);
>>>>> my (%names, %idmap);
>>>>>
>>>>> # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
>>>>> # (probably)
>>>>>
>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
>>>>>
>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>>>>>                                        -db => 'taxonomy',
>>>>>                                        -dbfrom => 'protein',
>>>>>                                        -correspondence => 1,
>>>>>                                        -id => \@ids);
>>>>>
>>>>> # iterate through the LinkSet objects
>>>>> while (my $ds = $factory->next_LinkSet) {
>>>>>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
>>>>> }
>>>>>
>>>>> @taxa = @taxa{@ids};
>>>>>
>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>>>>>                                     -db => 'taxonomy',
>>>>>                                     -id => \@taxa );
>>>>>
>>>>> while (local $_ = $factory->next_DocSum) {
>>>>>     $names{($_->get_contents_by_name('TaxId'))[0]} =
>>>>>         ($_->get_contents_by_name('ScientificName'))[0];
>>>>> }
>>>>>
>>>>> foreach (@ids) {
>>>>>     $idmap{$_} = $names{$taxa{$_}};
>>>>> }
>>>>>
>>>>> # %idmap is
>>>>> # 1621261 => 'Mycobacterium tuberculosis H37Rv'
>>>>> # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
>>>>> # 68536103 => 'Corynebacterium jeikeium K411'
>>>>> # 730439 => 'Bacillus caldolyticus'
>>>>> # 89318838 => undef (this record has been removed from the db)
>>>>>
>>>>> 1;
>>>>>
>>>>> You probably will need to break up your 30000 into chunks
>>>>> (say, 1000-3000 each), and do the above on each chunk with a
>>>>>
>>>>> sleep 3;
>>>>>
>>>>> or so separating the queries.
>>>>> MAJ
>>>>> ----- Original Message -----
>>>>> From: "Bhakti Dwivedi"
>>>>> To:
>>>>> Sent: Friday, December 25, 2009 9:46 PM
>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession number?
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Does anyone know how to retrieve the "Source" or the "Species name" given
>>>>>> the accession number using Bioperl. I have these 30,000 accession numbers
>>>>>> for which I need to get the source organisms. Any kind of help will be
>>>>>> appreciated.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> BD

From Russell.Smithies at agresearch.co.nz Tue Jan 26 20:59:15 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 27 Jan 2010 14:59:15 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
 <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
 <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
 <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
 <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>

I've had a wide selection of errors lately:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
STACK: get_desc.pl:32
-----------------------------------------------------------

And I never get a good explanation from NCBI or suggestions on how to avoid it.

--Russell

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Wednesday, 27 January 2010 2:46 p.m.
> To: Smithies, Russell
> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
>
> It's unfortunate but I have heard this problem popping up quite a bit more
> frequently lately. Not to push too many buttons but NCBI isn't very
> forthcoming with help these days; they have become quite insular. Not
> sure if they're short-staffed due to budget or if there are other issues.
>
> chris
>
> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
>
> > Grrrrrr, I hate eutils!!!!
From cjfields at illinois.edu Tue Jan 26 21:42:22 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 26 Jan 2010 20:42:22 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
 <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
 <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
 <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
 <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
 <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
Message-ID: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>

Makes me wonder if they're pushing more users towards the SOAP-based
services and away from eutils.
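
Whatever the cause of the esearch failures quoted in this thread, one
client-side workaround is to combine the chunking suggested earlier
(1000-3000 ids per request, with a sleep between requests) with a simple
retry. What follows is only a minimal sketch in plain Perl, not anything
from BioPerl itself: chunk_ids, fetch_with_retry and the stand-in fetcher
are hypothetical names made up for illustration. In real use the fetcher
would presumably wrap the elink/esummary calls from the example quoted
above and die (or return undef) when NCBI reports an error.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a list of ids into array refs holding at most $size ids each.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "chunk size must be positive\n" unless $size > 0;
    my @chunks;
    push @chunks, [ splice(@ids, 0, $size) ] while @ids;
    return @chunks;
}

# Call $fetch->($chunk) up to $max_tries times, pausing $pause seconds
# after each failed attempt. $fetch is a caller-supplied code ref
# (hypothetical here) that returns a hash ref or dies on error.
sub fetch_with_retry {
    my ($fetch, $chunk, $max_tries, $pause) = @_;
    my $last_error = '';
    for my $try (1 .. $max_tries) {
        my $result = eval { $fetch->($chunk) };
        return $result if defined $result;
        $last_error = $@ || "fetch returned undef\n";
        warn "attempt $try of $max_tries failed: $last_error";
        sleep $pause if $try < $max_tries;
    }
    die "giving up after $max_tries attempts: $last_error";
}

# Stand-in fetcher for demonstration only: the first call fails the way
# a transient eutils error might, every later call succeeds.
my $calls = 0;
my $fake_fetch = sub {
    my ($chunk) = @_;
    die "Resource temporarily unavailable\n" if ++$calls == 1;
    return { map { $_ => "organism for $_" } @$chunk };
};

my @ids = (1 .. 7);
my %idmap;
for my $chunk (chunk_ids(3, @ids)) {
    # a pause of 0 keeps the demo fast; use a few seconds against NCBI
    my $names = fetch_with_retry($fake_fetch, $chunk, 3, 0);
    %idmap = (%idmap, %$names);
    # sleep 3;    # pause between real queries, as suggested in the thread
}

printf "%d ids mapped in %d calls\n", scalar(keys %idmap), $calls;
# prints: 7 ids mapped in 4 calls
```

The retry only helps with transient failures such as "Resource temporarily
unavailable"; it does nothing about output that arrives with an error
string embedded in the middle of it, so results still need checking.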
chris

On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:

> I've had a wide selection of errors lately:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> STACK: get_desc.pl:32
> -----------------------------------------------------------
>
> And I never get a good explanation from NCBI or suggestions on how to avoid it.
>
> --Russell
>
>> -----Original Message-----
>> From: Chris Fields [mailto:cjfields at illinois.edu]
>> Sent: Wednesday, 27 January 2010 2:46 p.m.
>> To: Smithies, Russell
>> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
>>
>> It's unfortunate but I have heard this problem popping up quite a bit more
>> frequently lately. Not to push too many buttons but NCBI isn't very
>> forthcoming with help these days; they have become quite insular. Not
>> sure if they're short-staffed due to budget or if there are other issues.
>>
>> chris
>>
>> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
>>
>>> Grrrrrr, I hate eutils!!!!
From Russell.Smithies at agresearch.co.nz Tue Jan 26 21:45:58 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 27 Jan 2010 15:45:58 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
 <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
 <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
 <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
 <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
 <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
 <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>

Batch-entrez
http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi
still works if you don't mind a bit of manual button clicking.
It's handling chunks of 100,000 records OK (today).

--Russell

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Wednesday, 27 January 2010 3:42 p.m.
> To: Smithies, Russell
> Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
>
> Makes me wonder if they're pushing more users towards the SOAP-based
> services and away from eutils.
>
> chris
>
> On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
>
> > I've had a wide selection of errors lately:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > STACK: get_desc.pl:32
> > -----------------------------------------------------------
> >
> > And I never get a good explanation from NCBI or suggestions on how to avoid it.
> >
> > --Russell
> >
> >> -----Original Message-----
> >> From: Chris Fields [mailto:cjfields at illinois.edu]
> >> Sent: Wednesday, 27 January 2010 2:46 p.m.
> >> To: Smithies, Russell
> >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> >>
> >> It's unfortunate but I have heard this problem popping up quite a bit more
> >> frequently lately. Not to push too many buttons but NCBI isn't very
> >> forthcoming with help these days; they have become quite insular. Not
> >> sure if they're short-staffed due to budget or if there are other issues.
> >>
> >> chris
> >>
> >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
> >>
> >>> Grrrrrr, I hate eutils!!!!
foreach (@ids) { > >>>>>>> $idmap{$_} = $names{$taxa{$_}}; > >>>>>>> } > >>>>>>> > >>>>>>> # %idmap is > >>>>>>> # 1621261 => 'Mycobacterium tuberculosis H37Rv' > >>>>>>> # 20807972 => 'Thermoanaerobacter tengcongensis MB4' > >>>>>>> # 68536103 => 'Corynebacterium jeikeium K411' > >>>>>>> # 730439 => 'Bacillus caldolyticus' > >>>>>>> # 89318838 => undef (this record has been removed from the > db) > >>>>>>> > >>>>>>> 1; > >>>>>>> > >>>>>>> You probably will need to break up your 30000 into chunks > >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a > >>>>>>> > >>>>>>> sleep 3; > >>>>>>> > >>>>>>> or so separating the queries. > >>>>>>> MAJ > >>>>>>> ----- Original Message ----- > >>>>>>> From: "Bhakti Dwivedi" > >>>>>>> To: > >>>>>>> Sent: Friday, December 25, 2009 9:46 PM > >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession > >>>>> number? > >>>>>>> > >>>>>>> > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species > name" > >>>>>>> given > >>>>>>>> the accession number using Bioperl. I have these 30,000 > accession > >>>>>>> numbers > >>>>>>>> for which I need to get the source organisms. Any kind of help > >> will > >>>>> be > >>>>>>>> appreciated. 
> >>>>>>>> > >>>>>>>> Thanks > >>>>>>>> > >>>>>>>> BD > >>>>>>>> _______________________________________________ > >>>>>>>> Bioperl-l mailing list > >>>>>>>> Bioperl-l at lists.open-bio.org > >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> Bioperl-l mailing list > >>>>>>> Bioperl-l at lists.open-bio.org > >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>> > >>>> > ======================================================================= > >>>>>> Attention: The information contained in this message and/or > >>>> attachments > >>>>>> from AgResearch Limited is intended only for the persons or > entities > >>>>>> to which it is addressed and may contain confidential and/or > >>>> privileged > >>>>>> material. Any review, retransmission, dissemination or other use > of, > >>>> or > >>>>>> taking of any action in reliance upon, this information by persons > or > >>>>>> entities other than the intended recipients is prohibited by > >>>> AgResearch > >>>>>> Limited. If you have received this message in error, please notify > >> the > >>>>>> sender immediately. > >>>>>> > >>>> > ======================================================================= > >>>>>> > >>>>>> _______________________________________________ > >>>>>> Bioperl-l mailing list > >>>>>> Bioperl-l at lists.open-bio.org > >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>>> > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Wed Jan 27 10:14:22 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Wed, 27 Jan 2010 10:14:22 -0500 Subject: [Bioperl-l] how to retrieve organism name from accession number? In-Reply-To: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife><18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz><18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz><18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz><4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu><18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz> <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu> Message-ID: Precisely the MO behind SoapEU...get the jump on 'em. ----- Original Message ----- From: "Chris Fields" To: "Smithies, Russell" Cc: ; "'Mark A. Jensen'" Sent: Tuesday, January 26, 2010 9:42 PM Subject: Re: [Bioperl-l] how to retrieve organism name from accession number? > Makes me wonder if they're pushing more users towards the SOAP-based services > and away from eutils. > > chris > > On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote: > >> I've had a wide selection of errors lately: >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource >> temporarily unavailable) >> STACK: Error::throw >> STACK: Bio::Root::Root::throw >> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 >> STACK: Bio::Tools::EUtilities::parse_data >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 >> STACK: Bio::Tools::EUtilities::get_ids >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 >> STACK: Bio::DB::EUtilities::get_ids >> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 >> STACK: get_desc.pl:32 >> ----------------------------------------------------------- >> >> And I never get a good explanation from NCBI or suggestions on how to avoid >> it. 
>> >> >> --Russell >> >> >>> -----Original Message----- >>> From: Chris Fields [mailto:cjfields at illinois.edu] >>> Sent: Wednesday, 27 January 2010 2:46 p.m. >>> To: Smithies, Russell >>> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org' >>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession >>> number? >>> >>> It's unfortunate but I have heard this problem popping up quite a bit more >>> frequently lately. Not to push too many buttons but NCBI isn't very >>> forthcoming with help these days; they have become quite insular. Not >>> sure if they're short-staffed due to budget or if there are other issues. >>> >>> chris >>> >>> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote: >>> >>>> Grrrrrr, I hate eutils!!!! >>>> >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 >>> (Connection refused) >>>> STACK: Error::throw >>>> STACK: Bio::Root::Root::throw >>> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 >>>> STACK: Bio::Tools::EUtilities::parse_data >>> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 >>>> STACK: Bio::Tools::EUtilities::get_ids >>> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 >>>> STACK: Bio::DB::EUtilities::get_ids >>> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 >>>> STACK: get_desc.pl:32 >>>> ----------------------------------------------------------- >>>> >>>> >>>> Nice error message though :-) >>>> >>>> >>>> --Russell >>>> >>>>> -----Original Message----- >>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell >>>>> Sent: Monday, 11 January 2010 10:05 a.m. >>>>> To: 'Chris Fields' >>>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org' >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession >>>>> number? 
>>>>> >>>>> I've started to go off eUtils recently (not BioPerl's fault) as I've >>> often >>>>> been finding that with large queries, chunks of the resulting data is >>>>> missing. >>>>> For example, before Xmas I was creating species-specific databases by >>>>> using eUtils to get a list of GI numbers back for a taxid, then >>> retrieving >>>>> the fasta sequences in chunks of 500. >>>>> Very regularly, in the middle of the fasta there would be a message >>> about >>>>> resource unavailable eg. >>>>>> test_sequence_1 >>>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT >>>>>> test_sequence_2 >>>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT >>>>> >>>>> Often this wasn't detected until formatdb complained about invalid >>>>> characters. >>>>> Inquiries to NCBI as to why this was happening and what to do about it >>>>> returned stupid answers ("do each sequence manually thru the web >>>>> interface", or "use eUtils"). >>>>> As we have a nice fast network connection, I now prefer to download >>> very >>>>> large gzip files (i.e. all of refseq) and extract what I need. >>>>> >>>>> I can't help but think that NCBI could solve a lot of problems if they >>>>> gzipped the output from eUtils queries - it's something I've requested >>>>> regularly for the last 5 years or so!! >>>>> >>>>> --Russell >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>>>> Sent: Monday, 11 January 2010 9:50 a.m. >>>>>> To: Smithies, Russell >>>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org' >>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession >>>>>> number? >>>>>> >>>>>> One could also use Bio::DB::Taxonomy, which indexes the same files or >>>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for >>> the >>>>>> details). 
>>>>>> >>>>>> chris >>>>>> >>>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote: >>>>>> >>>>>>> An alternate non-BioPerly way (that may be faster given NCBI's >>>>> flakiness >>>>>> lately) would be to download the gi_taxid_nucl.zip or >>> gi_taxid_prot.zip >>>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash >>>>> and >>>>>> do lookups. >>>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp >>>>> which >>>>>> lists taxids and descriptions (and synonyms) >>>>>>> >>>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I >>>>>> could do this: >>>>>>> >>>>>>> my $taxid = $gi_taxid_nucl{$accession}; >>>>>>> my $org_name = $names{$taxid}; >>>>>>> >>>>>>> --Russell >>>>>>> >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen >>>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m. >>>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org >>>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from >>> accession >>>>>>>> number? 
>>>>>>>> >>>>>>>> Bhakti, >>>>>>>> The following example (using EUtilities) may serve your purpose: >>>>>>>> >>>>>>>> use Bio::DB::EUtilities; >>>>>>>> >>>>>>>> my (%taxa, @taxa); >>>>>>>> my (%names, %idmap); >>>>>>>> >>>>>>>> # these are protein ids; nuc ids will work by changing -dbfrom => >>>>>>>> 'nucleotide', >>>>>>>> # (probably) >>>>>>>> >>>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439); >>>>>>>> >>>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink', >>>>>>>> -db => 'taxonomy', >>>>>>>> -dbfrom => 'protein', >>>>>>>> -correspondence => 1, >>>>>>>> -id => \@ids); >>>>>>>> >>>>>>>> # iterate through the LinkSet objects >>>>>>>> while (my $ds = $factory->next_LinkSet) { >>>>>>>> $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0] >>>>>>>> } >>>>>>>> >>>>>>>> @taxa = @taxa{@ids}; >>>>>>>> >>>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', >>>>>>>> -db => 'taxonomy', >>>>>>>> -id => \@taxa ); >>>>>>>> >>>>>>>> while (local $_ = $factory->next_DocSum) { >>>>>>>> $names{($_->get_contents_by_name('TaxId'))[0]} = >>>>>>>> ($_->get_contents_by_name('ScientificName'))[0]; >>>>>>>> } >>>>>>>> >>>>>>>> foreach (@ids) { >>>>>>>> $idmap{$_} = $names{$taxa{$_}}; >>>>>>>> } >>>>>>>> >>>>>>>> # %idmap is >>>>>>>> # 1621261 => 'Mycobacterium tuberculosis H37Rv' >>>>>>>> # 20807972 => 'Thermoanaerobacter tengcongensis MB4' >>>>>>>> # 68536103 => 'Corynebacterium jeikeium K411' >>>>>>>> # 730439 => 'Bacillus caldolyticus' >>>>>>>> # 89318838 => undef (this record has been removed from the db) >>>>>>>> >>>>>>>> 1; >>>>>>>> >>>>>>>> You probably will need to break up your 30000 into chunks >>>>>>>> (say, 1000-3000 each), and do the above on each chunk with a >>>>>>>> >>>>>>>> sleep 3; >>>>>>>> >>>>>>>> or so separating the queries. 
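The chunk-and-pause advice above can be sketched in plain Perl. This is only a minimal illustration, assuming the per-chunk lookup is wrapped in a callback; the helper names `chunk_ids` and `process_in_chunks` are hypothetical and are not part of BioPerl, and the chunk size and pause are the values suggested in the message, not limits confirmed by NCBI.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list of IDs into array refs holding
# at most $size elements each.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "chunk size must be positive\n" unless $size > 0;
    my @chunks;
    push @chunks, [ splice @ids, 0, $size ] while @ids;
    return @chunks;
}

# Hypothetical helper: call $callback once per chunk and sleep $pause
# seconds between calls (no sleep after the last chunk).
sub process_in_chunks {
    my ($callback, $size, $pause, @ids) = @_;
    my @chunks = chunk_ids($size, @ids);
    my @results;
    for my $i (0 .. $#chunks) {
        push @results, $callback->($chunks[$i]);
        sleep $pause if $pause && $i < $#chunks;
    }
    return @results;
}

# Assumed usage: lookup_names() would wrap the elink/esummary code from
# the message and return (id => name) pairs for one chunk.
# my %idmap = process_in_chunks(\&lookup_names, 1000, 3, @all_ids);
```

Keeping the pause inside the loop means a failed chunk can be retried on its own without restarting the whole run.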
>>>>>>>> MAJ >>>>>>>> ----- Original Message ----- >>>>>>>> From: "Bhakti Dwivedi" >>>>>>>> To: >>>>>>>> Sent: Friday, December 25, 2009 9:46 PM >>>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession >>>>>> number? >>>>>>>> >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> Does anyone know how to retrieve the "Source" or the "Species name" >>>>>>>> given >>>>>>>>> the accession number using Bioperl. I have these 30,000 accession >>>>>>>> numbers >>>>>>>>> for which I need to get the source organisms. Any kind of help >>> will >>>>>> be >>>>>>>>> appreciated. >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> >>>>>>>>> BD >>>>>>>>> _______________________________________________ >>>>>>>>> Bioperl-l mailing list >>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>> ======================================================================= >>>>>>> Attention: The information contained in this message and/or >>>>> attachments >>>>>>> from AgResearch Limited is intended only for the persons or entities >>>>>>> to which it is addressed and may contain confidential and/or >>>>> privileged >>>>>>> material. Any review, retransmission, dissemination or other use of, >>>>> or >>>>>>> taking of any action in reliance upon, this information by persons or >>>>>>> entities other than the intended recipients is prohibited by >>>>> AgResearch >>>>>>> Limited. If you have received this message in error, please notify >>> the >>>>>>> sender immediately. 
>>>>>>> >>>>> ======================================================================= >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bhakti.dwivedi at gmail.com Wed Jan 27 14:42:06 2010 From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi) Date: Wed, 27 Jan 2010 14:42:06 -0500 Subject: [Bioperl-l] Designing primers from multiple sequence alignment of amino acid sequences Message-ID: Hi, I have to design primers from the multiple sequence alignments of amino acid sequences. The sequences I am working with are quite diverged and often the available primer design programs (such as CODEHOP/iCODEHOP) fail to find any primer sets. But, when I look at the alignment manually, I could see the regions that I could use to make primers. So I designed the degenerate primers the old-fashioned way, starting from selecting the conserved regions (6-10aa long) from the alignment to translating the selected regions to DNA using the appropriate codon usage table, and then finally checking the primer sets (potential forward and reverse primers) using tools like OLIGOANALYZER. In the end, I did find few good primer sets, but getting them to work in reality is something I will have to wait and see. 
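The manual route described above (pick a conserved peptide, back-translate it, then check the candidate) can be sketched in plain Perl. This is a rough illustration under stated assumptions, not the CODEHOP method: the codon table is the standard genetic code written as compact IUPAC codons, the function names `degenerate_primer` and `degeneracy` are hypothetical, and the example peptide is invented. A real script would more likely take its codon tables from Bio::Tools::CodonTable and would also need reverse-complementing for the reverse primer.

```perl
use strict;
use warnings;

# Assumption: standard genetic code, one compact IUPAC codon per residue.
# L, R and S have two codon families; the compact forms used here
# (YTN, MGN, WSN) also match a few codons of other residues.
my %DEGENERATE_CODON = (
    A => 'GCN', C => 'TGY', D => 'GAY', E => 'GAR', F => 'TTY',
    G => 'GGN', H => 'CAY', I => 'ATH', K => 'AAR', L => 'YTN',
    M => 'ATG', N => 'AAY', P => 'CCN', Q => 'CAR', R => 'MGN',
    S => 'WSN', T => 'ACN', V => 'GTN', W => 'TGG', Y => 'TAY',
);

# Number of concrete bases each IUPAC symbol stands for.
my %BASE_COUNT = (
    A => 1, C => 1, G => 1, T => 1,
    R => 2, Y => 2, M => 2, K => 2, S => 2, W => 2,
    H => 3, B => 3, V => 3, D => 3,
    N => 4,
);

# Back-translate a peptide into a degenerate DNA string.
sub degenerate_primer {
    my ($peptide) = @_;
    my $dna = '';
    for my $aa (split //, uc $peptide) {
        my $codon = $DEGENERATE_CODON{$aa}
            or die "No codon known for residue '$aa'\n";
        $dna .= $codon;
    }
    return $dna;
}

# Count how many distinct sequences a degenerate primer represents.
sub degeneracy {
    my ($dna) = @_;
    my $n = 1;
    for my $base (split //, uc $dna) {
        my $count = $BASE_COUNT{$base}
            or die "Unknown IUPAC symbol '$base'\n";
        $n *= $count;
    }
    return $n;
}

my $peptide = 'MKWDE';    # invented conserved block, not from the thread
my $primer  = degenerate_primer($peptide);
printf "%s -> %s (degeneracy %d)\n", $peptide, $primer, degeneracy($primer);
# prints: MKWDE -> ATGAARTGGGAYGAR (degeneracy 8)
```

Ranking candidate regions by degeneracy is one simple way to prefer blocks rich in M, W and two-codon residues before passing them to an external checker.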
While doing this process manually, I really felt the need to automate it (it was not just one alignment I did, I worked with several of those). I was wondering if there is any way bioperl can help me here, or if making a perl script is the only way to go.

I would appreciate your suggestions/comments. Thanks! (Apologies for the long email.)

Regards
Bhakti

From Kevin.M.Brown at asu.edu Wed Jan 27 15:23:57 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 27 Jan 2010 13:23:57 -0700
Subject: [Bioperl-l] Designing primers from multiple sequence alignment of amino acid sequences
In-Reply-To:
References:
Message-ID: <1A4207F8295607498283FE9E93B775B4068498DB@EX02.asurite.ad.asu.edu>

Bioperl is just a collection of tools, not a full-blown application. Most of what you want can be done with the objects available from within the toolkit, but the application (perl script) would still need to be written to put the objects to use.

You could use clustalw from within perl to align the sequences (Bio::Tools::Run::Alignment::Clustalw), find the conserved regions (Bio::SimpleAlign), reverse translate them (Bio::Tools::CodonTable), then come up with an algorithm for primer analysis and selection (or even use other apps like primer3 (Bio::Tools::Run::Primer3) from within perl).

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University

From mike.stubbington at bbsrc.ac.uk Thu Jan 28 10:41:49 2010
From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI))
Date: Thu, 28 Jan 2010 15:41:49 +0000
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn
Message-ID: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>

Dear all,

I am attempting to blast some primers against the mouse genome. I have created a local mouse genome blast database and I can search against it using 'blastn' at the command line.

I have perl code that creates an array of bioperl sequence objects called @primers.

I then create a StandAloneBlastPlus factory using the following code:

    my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
        -db_dir  => '/Users/stubbing/localBlast/',
        -db_name => 'MouseGenome'
    );

and then attempt to blast my primers using this:

    my @shortPrimers;
    my $count=1;
    foreach (@primers) {
        my $currentSeq = $_;
        print "Checking primer $count/$primerNumber ";
        if ($_->length < 40) {
            push(@shortPrimers,$_);
            print "Too short!\n";
        }
        else {
            print "BLASTing...";
            my $blastResult = $blastFactory->blastn(-query => $currentSeq);
        }
        $count++;
    }

This fails with the following error:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, line 532.
STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

Line 63 in my code is (as you might expect) the one that calls blastn on my factory object.

I'd appreciate any help you might be able to provide to shed light on this.

Thanks in advance,

Mike

From maj at fortinbras.us Thu Jan 28 10:56:14 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 10:56:14 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn
In-Reply-To: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
Message-ID: <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>

Mike - please try updating your bioperl-live (the core) to the latest code (revision 16761 or so). CommandExts is a work in progress; from the stack errors it looks like you've got an older version.
Try it then ping us back, if you would--
Thanks
Mark

From mike.stubbington at bbsrc.ac.uk Thu Jan 28 11:18:12 2010
From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI))
Date: Thu, 28 Jan 2010 16:18:12 +0000
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn
In-Reply-To: <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk> <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
Message-ID: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>

Hi,

Thanks for the suggestion. Unfortunately it still fails - error as follows:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, line 532.
STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

M

From maj at fortinbras.us Thu Jan 28 11:28:52 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 11:28:52 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn
In-Reply-To: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk> <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife> <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
Message-ID:

Thanks Mike-- will have a look asap-
cheers
MAJ

From cjfields at illinois.edu Thu Jan 28 13:26:27 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 28 Jan 2010 12:26:27 -0600 Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz> <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz> Message-ID: <1264703187.5473.10.camel@cjfields.igb.uiuc.edu> Russell, Just curious, but have you tried setting the return email parameter (-email)? NCBI recently stated that all queries would eventually require a return email of some sort (not sure if it's validated or not). I think that was set for around late spring. I'm changing the code in svn to require it for that very purpose. chris Wed, 2010-01-27 at 15:45 +1300, Smithies, Russell wrote: > Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi still works if you don't mind a bit of manual button clicking. It's handling chunks of 100,000 records OK (today). > > --Russell > > > -----Original Message----- > > From: Chris Fields [mailto:cjfields at illinois.edu] > > Sent: Wednesday, 27 January 2010 3:42 p.m. > > To: Smithies, Russell > > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen' > > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > > number? > > > > Makes me wonder if they're pushing more users towards the SOAP-based > > services and away from eutils. 
> > > > chris > > > > On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote: > > > > > I've had a wide selection of errors lately: > > > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource > > temporarily unavailable) > > > STACK: Error::throw > > > STACK: Bio::Root::Root::throw > > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 > > > STACK: Bio::Tools::EUtilities::parse_data > > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 > > > STACK: Bio::Tools::EUtilities::get_ids > > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 > > > STACK: Bio::DB::EUtilities::get_ids > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 > > > STACK: get_desc.pl:32 > > > ----------------------------------------------------------- > > > > > > And I never get a good explanation from NCBI or suggestions on how to > > avoid it. > > > > > > > > > --Russell > > > > > > > > >> -----Original Message----- > > >> From: Chris Fields [mailto:cjfields at illinois.edu] > > >> Sent: Wednesday, 27 January 2010 2:46 p.m. > > >> To: Smithies, Russell > > >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org' > > >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession > > >> number? > > >> > > >> It's unfortunate but I have heard this problem popping up quite a bit > > more > > >> frequently lately. Not to push too many buttons but NCBI isn't very > > >> forthcoming with help these days; they have become quite insular. Not > > >> sure if they're short-staffed due to budget or if there are other > > issues. > > >> > > >> chris > > >> > > >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote: > > >> > > >>> Grrrrrr, I hate eutils!!!! 
> > >>> > > >>> ------------- EXCEPTION: Bio::Root::Exception ------------- > > >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 > > >> (Connection refused) > > >>> STACK: Error::throw > > >>> STACK: Bio::Root::Root::throw > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 > > >>> STACK: Bio::Tools::EUtilities::parse_data > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 > > >>> STACK: Bio::Tools::EUtilities::get_ids > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 > > >>> STACK: Bio::DB::EUtilities::get_ids > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 > > >>> STACK: get_desc.pl:32 > > >>> ----------------------------------------------------------- > > >>> > > >>> > > >>> Nice error message though :-) > > >>> > > >>> > > >>> --Russell > > >>> > > >>>> -----Original Message----- > > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell > > >>>> Sent: Monday, 11 January 2010 10:05 a.m. > > >>>> To: 'Chris Fields' > > >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open- > > bio.org' > > >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession > > >>>> number? > > >>>> > > >>>> I've started to go off eUtils recently (not BioPerl's fault) as I've > > >> often > > >>>> been finding that with large queries, chunks of the resulting data is > > >>>> missing. > > >>>> For example, before Xmas I was creating species-specific databases by > > >>>> using eUtils to get a list of GI numbers back for a taxid, then > > >> retrieving > > >>>> the fasta sequences in chunks of 500. > > >>>> Very regularly, in the middle of the fasta there would be a message > > >> about > > >>>> resource unavailable eg. 
> > >>>>> test_sequence_1 > > >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT > > >>>>> test_sequence_2 > > >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT > > >>>> > > >>>> Often this wasn't detected until formatdb complained about invalid > > >>>> characters. > > >>>> Inquiries to NCBI as to why this was happening and what to do about > > it > > >>>> returned stupid answers ("do each sequence manually thru the web > > >>>> interface", or "use eUtils"). > > >>>> As we have a nice fast network connection, I now prefer to download > > >> very > > >>>> large gzip files (i.e. all of refseq) and extract what I need. > > >>>> > > >>>> I can't help but think that NCBI could solve a lot of problems if > > they > > >>>> gzipped the output from eUtils queries - it's something I've > > requested > > >>>> regularly for the last 5 years or so!! > > >>>> > > >>>> --Russell > > >>>> > > >>>> > > >>>>> -----Original Message----- > > >>>>> From: Chris Fields [mailto:cjfields at illinois.edu] > > >>>>> Sent: Monday, 11 January 2010 9:50 a.m. > > >>>>> To: Smithies, Russell > > >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open- > > bio.org' > > >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from > > accession > > >>>>> number? > > >>>>> > > >>>>> One could also use Bio::DB::Taxonomy, which indexes the same files > > or > > >>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for > > >> the > > >>>>> details). > > >>>>> > > >>>>> chris > > >>>>> > > >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote: > > >>>>> > > >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's > > >>>> flakiness > > >>>>> lately) would be to download the gi_taxid_nucl.zip or > > >> gi_taxid_prot.zip > > >>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a > > hash > > >>>> and > > >>>>> do lookups. 
> > >>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp > > >>>> which > > >>>>> lists taxids and descriptions (and synonyms) > > >>>>>> > > >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so > > I > > >>>>> could do this: > > >>>>>> > > >>>>>> my $taxid = $gi_taxid_nucl{$accession}; > > >>>>>> my $org_name = $names{$taxid}; > > >>>>>> > > >>>>>> --Russell > > >>>>>> > > >>>>>> > > >>>>>>> -----Original Message----- > > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > > >>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m. > > >>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org > > >>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from > > >> accession > > >>>>>>> number? > > >>>>>>> > > >>>>>>> Bhakti, > > >>>>>>> The following example (using EUtilities) may serve your purpose: > > >>>>>>> > > >>>>>>> use Bio::DB::EUtilities; > > >>>>>>> > > >>>>>>> my (%taxa, @taxa); > > >>>>>>> my (%names, %idmap); > > >>>>>>> > > >>>>>>> # these are protein ids; nuc ids will work by changing -dbfrom => > > >>>>>>> 'nucleotide', > > >>>>>>> # (probably) > > >>>>>>> > > >>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439); > > >>>>>>> > > >>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink', > > >>>>>>> -db => 'taxonomy', > > >>>>>>> -dbfrom => 'protein', > > >>>>>>> -correspondence => 1, > > >>>>>>> -id => \@ids); > > >>>>>>> > > >>>>>>> # iterate through the LinkSet objects > > >>>>>>> while (my $ds = $factory->next_LinkSet) { > > >>>>>>> $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0] > > >>>>>>> } > > >>>>>>> > > >>>>>>> @taxa = @taxa{@ids}; > > >>>>>>> > > >>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', > > >>>>>>> -db => 'taxonomy', > > >>>>>>> -id => \@taxa ); > > >>>>>>> > > >>>>>>> while (local $_ = $factory->next_DocSum) { > > >>>>>>> 
$names{($_->get_contents_by_name('TaxId'))[0]} = > > >>>>>>> ($_->get_contents_by_name('ScientificName'))[0]; > > >>>>>>> } > > >>>>>>> > > >>>>>>> foreach (@ids) { > > >>>>>>> $idmap{$_} = $names{$taxa{$_}}; > > >>>>>>> } > > >>>>>>> > > >>>>>>> # %idmap is > > >>>>>>> # 1621261 => 'Mycobacterium tuberculosis H37Rv' > > >>>>>>> # 20807972 => 'Thermoanaerobacter tengcongensis MB4' > > >>>>>>> # 68536103 => 'Corynebacterium jeikeium K411' > > >>>>>>> # 730439 => 'Bacillus caldolyticus' > > >>>>>>> # 89318838 => undef (this record has been removed from the > > db) > > >>>>>>> > > >>>>>>> 1; > > >>>>>>> > > >>>>>>> You probably will need to break up your 30000 into chunks > > >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a > > >>>>>>> > > >>>>>>> sleep 3; > > >>>>>>> > > >>>>>>> or so separating the queries. > > >>>>>>> MAJ > > >>>>>>> ----- Original Message ----- > > >>>>>>> From: "Bhakti Dwivedi" > > >>>>>>> To: > > >>>>>>> Sent: Friday, December 25, 2009 9:46 PM > > >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession > > >>>>> number? > > >>>>>>> > > >>>>>>> > > >>>>>>>> Hi, > > >>>>>>>> > > >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species > > name" > > >>>>>>> given > > >>>>>>>> the accession number using Bioperl. I have these 30,000 > > accession > > >>>>>>> numbers > > >>>>>>>> for which I need to get the source organisms. Any kind of help > > >> will > > >>>>> be > > >>>>>>>> appreciated. 
> > >>>>>>>> > > >>>>>>>> Thanks > > >>>>>>>> > > >>>>>>>> BD > > >>>>>>>> _______________________________________________ > > >>>>>>>> Bioperl-l mailing list > > >>>>>>>> Bioperl-l at lists.open-bio.org > > >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >>>>>>>> > > >>>>>>>> > > >>>>>>> > > >>>>>>> _______________________________________________ > > >>>>>>> Bioperl-l mailing list > > >>>>>>> Bioperl-l at lists.open-bio.org > > >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >>>>>> > > >>>> > > ======================================================================= > > >>>>>> Attention: The information contained in this message and/or > > >>>> attachments > > >>>>>> from AgResearch Limited is intended only for the persons or > > entities > > >>>>>> to which it is addressed and may contain confidential and/or > > >>>> privileged > > >>>>>> material. Any review, retransmission, dissemination or other use > > of, > > >>>> or > > >>>>>> taking of any action in reliance upon, this information by persons > > or > > >>>>>> entities other than the intended recipients is prohibited by > > >>>> AgResearch > > >>>>>> Limited. If you have received this message in error, please notify > > >> the > > >>>>>> sender immediately. 
> > >>>>>> > > >>>> > > ======================================================================= > > >>>>>> > > >>>>>> _______________________________________________ > > >>>>>> Bioperl-l mailing list > > >>>>>> Bioperl-l at lists.open-bio.org > > >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >>>> > > >>>> > > >>>> _______________________________________________ > > >>>> Bioperl-l mailing list > > >>>> Bioperl-l at lists.open-bio.org > > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Thu Jan 28 13:47:04 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 28 Jan 2010 13:47:04 -0500 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn In-Reply-To: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk> References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife> <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk> Message-ID: Hi Mike, Believe I found the real bug causing the problem (was not accounting for the db_dir parameter). Crashes should now also throw much more helpful errors. Please try the code at r16774, and shout back. thanks -- MAJ
From cjfields at illinois.edu Thu Jan 28 14:00:26 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 28 Jan 2010 13:00:26 -0600 Subject: [Bioperl-l] EUtilities policy change Message-ID: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu> All, Per NCBI's recent change in eutils user policy (effective June 1): http://bioperl.org/pipermail/bioperl-l/2009-December/031698.html Both the tool and email parameters ('-tool', '-email') are now required when making requests. Note this will significantly break all modules requiring remote access to eutils (Bio::DB::GenBank/GenPept, some Biblio and Taxonomy stuff as well, IIRC). This also applies to web services (SOAP-based access). Mark, not sure how this affects your SOAP-based modules. I have reconfigured Bio::DB::EUtilities to follow this policy; the default tool setting has been 'bioperl' and will remain that way. However, there has been no default email, therefore setting this is now required for future requests unless we (the bioperl devs) decide there is a safe default email to utilize. My gut tells me, however, that falling back to a default email opens up a can of worms for the devs and is very likely a 'BAD IDEA'(TM). Regardless, be aware that, after June 1, NCBI will very likely exclude requests with no email and will notify users who are considered to be violating their policies. I will likely make further changes to Bio::DB::EUtilities in the meantime to ensure that using the tools by default will not violate NCBI's policy (e.g. override this at your own risk).
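As a rough illustration of the policy described above (and not the Bio::DB::EUtilities implementation), here is a minimal plain-Perl sketch that refuses to build an esearch URL unless an email is supplied, and falls back to 'bioperl' for the tool name. The helper name build_esearch_url, the example address, the query term and the choice of the https endpoint are all assumptions made for this example only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: build an esearch URL that
# always carries both the 'tool' and 'email' parameters.
sub build_esearch_url {
    my (%arg) = @_;

    # No default email: the caller has to supply one.
    for my $required (qw(db term email)) {
        die "missing required parameter '$required'\n"
            unless defined $arg{$required} && length $arg{$required};
    }

    my %param = (
        db    => $arg{db},
        term  => $arg{term},
        email => $arg{email},
        # The tool name falls back to 'bioperl' when none is given.
        tool  => (defined $arg{tool} ? $arg{tool} : 'bioperl'),
    );

    # Assumed endpoint for this sketch.
    my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';

    my @pairs;
    for my $key (sort keys %param) {
        my $value = $param{$key};
        # Percent-encode everything outside the URI "unreserved" set.
        # This simple version assumes plain ASCII input.
        $value =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
        push @pairs, "$key=$value";
    }
    return $base . '?' . join('&', @pairs);
}

# Example call; the address is a placeholder, not a real contact.
print build_esearch_url(
    db    => 'nucest',
    term  => 'clupeocephala[Organism]',
    email => 'you@example.org',
), "\n";
```

With Bio::DB::EUtilities itself, the corresponding constructor arguments are the '-tool' and '-email' parameters named above.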
chris From maj at fortinbras.us Thu Jan 28 14:05:43 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 28 Jan 2010 14:05:43 -0500 Subject: [Bioperl-l] EUtilities policy change In-Reply-To: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu> References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu> Message-ID: <8F49B5ED151143FA86E977B4D4F44265@NewLife> Thanks Chris-- The soap modules currently set tool to "SoapEUtilities(BioPerl)". I agree that a default email is a bad idea (tm) (unless maybe it's hilmar's...?). I'd say a warning on unset email parameters is a responsible "there be dragons" sort of treatment. MAJ ----- Original Message ----- From: "Chris Fields" To: "BioPerl-l" Cc: "Mark A. Jensen" Sent: Thursday, January 28, 2010 2:00 PM Subject: EUtilities policy change > All, > > Per NCBI's recent change in eutils user policy (effective June 1): > > http://bioperl.org/pipermail/bioperl-l/2009-December/031698.html > > Both the tool and email parameters ('-tool', '-email') are now required > when making requests. Note this will significantly break all modules > requiring remote access to eutils (Bio::DB::GenBank/GenPept, some Biblio > and Taxonomy stuff as well, IIRC). This also applies to web services > (SOAP-based access). Mark, not sure how this affects your SOAP-based > modules. > > I have reconfigured Bio::DB::EUtilities to follow this policy; the > default tool setting has been 'bioperl' and will remain that way. > However, there has been no default email, therefore setting this is now > required for future requests unless we (the bioperl devs) decide there > is a safe default email to utilize. My gut tells me, however, that > falling back to a default email opens up a can of worms for the devs and > is very likely a 'BAD IDEA'(TM). > > Regardless, be aware that, after June 1, NCBI will very likely exclude > requests with no email and will notify users who are considered to be > violating their policies. 
> > I will likely make further changes to Bio::DB::EUtilities in the > meantime to ensure that using the tools by default will not violate > NCBI's policy (e.g. override this at your own risk). > > chris > > > From cjfields at illinois.edu Thu Jan 28 14:18:22 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 28 Jan 2010 13:18:22 -0600 Subject: [Bioperl-l] EUtilities policy change In-Reply-To: <8F49B5ED151143FA86E977B4D4F44265@NewLife> References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu> <8F49B5ED151143FA86E977B4D4F44265@NewLife> Message-ID: <1264706302.5473.48.camel@cjfields.igb.uiuc.edu> I think warning is fine for now. I've reimplemented that so it occurs lazily (warns only when a request is actually made). Will also change the tool to 'BioPerl' (currently 'bioperl', all lc). We'll obviously have to address this in the test suite as well in some way, maybe ask for an email if network tests are requested. chris On Thu, 2010-01-28 at 14:05 -0500, Mark A. Jensen wrote: > Thanks Chris-- > The soap modules currently set tool to "SoapEUtilities(BioPerl)". > I agree that a default email is a bad idea (tm) (unless maybe it's > hilmar's...?). I'd say a warning on unset email parameters is a responsible > "there be dragons" sort of treatment. > MAJ > ----- Original Message ----- > From: "Chris Fields" > To: "BioPerl-l" > Cc: "Mark A. Jensen" > Sent: Thursday, January 28, 2010 2:00 PM > Subject: EUtilities policy change > > > > All, > > > > Per NCBI's recent change in eutils user policy (effective June 1): > > > > http://bioperl.org/pipermail/bioperl-l/2009-December/031698.html > > > > Both the tool and email parameters ('-tool', '-email') are now required > > when making requests. Note this will significantly break all modules > > requiring remote access to eutils (Bio::DB::GenBank/GenPept, some Biblio > > and Taxonomy stuff as well, IIRC). This also applies to web services > > (SOAP-based access). 
Mark, not sure how this affects your SOAP-based > > modules. > > > > I have reconfigured Bio::DB::EUtilities to follow this policy; the > > default tool setting has been 'bioperl' and will remain that way. > > However, there has been no default email, therefore setting this is now > > required for future requests unless we (the bioperl devs) decide there > > is a safe default email to utilize. My gut tells me, however, that > > falling back to a default email opens up a can of worms for the devs and > > is very likely a 'BAD IDEA'(TM). > > > > Regardless, be aware that, after June 1, NCBI will very likely exclude > > requests with no email and will notify users who are considered to be > > violating their policies. > > > > I will likely make further changes to Bio::DB::EUtilities in the > > meantime to ensure that using the tools by default will not violate > > NCBI's policy (e.g. override this at your own risk). > > > > chris > > > > > > From Russell.Smithies at agresearch.co.nz Thu Jan 28 14:25:38 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 29 Jan 2010 08:25:38 +1300 Subject: [Bioperl-l] how to retrieve organism name from accession number? In-Reply-To: <1264703187.5473.10.camel@cjfields.igb.uiuc.edu> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz> <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz> <1264703187.5473.10.camel@cjfields.igb.uiuc.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz> Yes, I usually set the 'tool' and 'email' parameters. 
I went to NCBI back in 2006 and did their "PowerScripting" course where they pointed out a lot of the requirements for using eUtils. I think I requested results returned gzipped back then as well... --Russell > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Friday, 29 January 2010 7:26 a.m. > To: Smithies, Russell > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen' > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > number? > > Russell, > > Just curious, but have you tried setting the return email parameter > (-email)? NCBI recently stated that all queries would eventually > require a return email of some sort (not sure if it's validated or not). > I think that was set for around late spring. I'm changing the code in > svn to require it for that very purpose. > > chris > > > Wed, 2010-01-27 at 15:45 +1300, Smithies, Russell wrote: > > Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi > still works if you don't mind a bit of manual button clicking. It's > handling chunks of 100,000 records OK (today). > > > > --Russell > > > > > -----Original Message----- > > > From: Chris Fields [mailto:cjfields at illinois.edu] > > > Sent: Wednesday, 27 January 2010 3:42 p.m. > > > To: Smithies, Russell > > > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen' > > > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > > > number? > > > > > > Makes me wonder if they're pushing more users towards the SOAP-based > > > services and away from eutils. 
> > > > > > chris > > > > > > On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote: > > > > > > > I've had a wide selection of errors lately: > > > > > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > > > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 > (Resource > > > temporarily unavailable) > > > > STACK: Error::throw > > > > STACK: Bio::Root::Root::throw > > > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 > > > > STACK: Bio::Tools::EUtilities::parse_data > > > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 > > > > STACK: Bio::Tools::EUtilities::get_ids > > > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 > > > > STACK: Bio::DB::EUtilities::get_ids > > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 > > > > STACK: get_desc.pl:32 > > > > ----------------------------------------------------------- > > > > > > > > And I never get a good explanation from NCBI or suggestions on how > to > > > avoid it. > > > > > > > > > > > > --Russell > > > > > > > > > > > >> -----Original Message----- > > > >> From: Chris Fields [mailto:cjfields at illinois.edu] > > > >> Sent: Wednesday, 27 January 2010 2:46 p.m. > > > >> To: Smithies, Russell > > > >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org' > > > >> Subject: Re: [Bioperl-l] how to retrieve organism name from > accession > > > >> number? > > > >> > > > >> It's unfortunate but I have heard this problem popping up quite a > bit > > > more > > > >> frequently lately. Not to push too many buttons but NCBI isn't > very > > > >> forthcoming with help these days; they have become quite insular. > Not > > > >> sure if they're short-staffed due to budget or if there are other > > > issues. > > > >> > > > >> chris > > > >> > > > >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote: > > > >> > > > >>> Grrrrrr, I hate eutils!!!! 
> > > >>> > > > >>> ------------- EXCEPTION: Bio::Root::Exception ------------- > > > >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 > > > >> (Connection refused) > > > >>> STACK: Error::throw > > > >>> STACK: Bio::Root::Root::throw > > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 > > > >>> STACK: Bio::Tools::EUtilities::parse_data > > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 > > > >>> STACK: Bio::Tools::EUtilities::get_ids > > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 > > > >>> STACK: Bio::DB::EUtilities::get_ids > > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 > > > >>> STACK: get_desc.pl:32 > > > >>> ----------------------------------------------------------- > > > >>> > > > >>> > > > >>> Nice error message though :-) > > > >>> > > > >>> > > > >>> --Russell > > > >>> > > > >>>> -----Original Message----- > > > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > >>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell > > > >>>> Sent: Monday, 11 January 2010 10:05 a.m. > > > >>>> To: 'Chris Fields' > > > >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open- > > > bio.org' > > > >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from > accession > > > >>>> number? > > > >>>> > > > >>>> I've started to go off eUtils recently (not BioPerl's fault) as > I've > > > >> often > > > >>>> been finding that with large queries, chunks of the resulting > data is > > > >>>> missing. > > > >>>> For example, before Xmas I was creating species-specific > databases by > > > >>>> using eUtils to get a list of GI numbers back for a taxid, then > > > >> retrieving > > > >>>> the fasta sequences in chunks of 500. > > > >>>> Very regularly, in the middle of the fasta there would be a > message > > > >> about > > > >>>> resource unavailable eg. 
> > > >>>> >test_sequence_1
> > > >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT
> > > >>>> >test_sequence_2
> > > >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
> > > >>>>
> > > >>>> Often this wasn't detected until formatdb complained about invalid
> > > >>>> characters.
> > > >>>> Inquiries to NCBI as to why this was happening and what to do about it
> > > >>>> returned stupid answers ("do each sequence manually thru the web
> > > >>>> interface", or "use eUtils").
> > > >>>> As we have a nice fast network connection, I now prefer to download very
> > > >>>> large gzip files (i.e. all of refseq) and extract what I need.
> > > >>>>
> > > >>>> I can't help but think that NCBI could solve a lot of problems if they
> > > >>>> gzipped the output from eUtils queries - it's something I've requested
> > > >>>> regularly for the last 5 years or so!!
> > > >>>>
> > > >>>> --Russell
> > > >>>>
> > > >>>>
> > > >>>>> -----Original Message-----
> > > >>>>> From: Chris Fields [mailto:cjfields at illinois.edu]
> > > >>>>> Sent: Monday, 11 January 2010 9:50 a.m.
> > > >>>>> To: Smithies, Russell
> > > >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
> > > >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > > >>>>> number?
> > > >>>>>
> > > >>>>> One could also use Bio::DB::Taxonomy, which indexes the same files or
> > > >>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the
> > > >>>>> details).
> > > >>>>>
> > > >>>>> chris
> > > >>>>>
> > > >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> > > >>>>>
> > > >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's flakiness
> > > >>>>>> lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip
> > > >>>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and
> > > >>>>>> do lookups.
> > > >>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp which
> > > >>>>>> lists taxids and descriptions (and synonyms)
> > > >>>>>>
> > > >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
> > > >>>>>> could do this:
> > > >>>>>>
> > > >>>>>> my $taxid = $gi_taxid_nucl{$accession};
> > > >>>>>> my $org_name = $names{$taxid};
> > > >>>>>>
> > > >>>>>> --Russell
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>> -----Original Message-----
> > > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org
> > > >>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> > > >>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
> > > >>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> > > >>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > > >>>>>>> number?
> > > >>>>>>>
> > > >>>>>>> Bhakti,
> > > >>>>>>> The following example (using EUtilities) may serve your purpose:
> > > >>>>>>>
> > > >>>>>>> use Bio::DB::EUtilities;
> > > >>>>>>>
> > > >>>>>>> my (%taxa, @taxa);
> > > >>>>>>> my (%names, %idmap);
> > > >>>>>>>
> > > >>>>>>> # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
> > > >>>>>>> # (probably)
> > > >>>>>>>
> > > >>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> > > >>>>>>>
> > > >>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> > > >>>>>>>                                        -db => 'taxonomy',
> > > >>>>>>>                                        -dbfrom => 'protein',
> > > >>>>>>>                                        -correspondence => 1,
> > > >>>>>>>                                        -id => \@ids);
> > > >>>>>>>
> > > >>>>>>> # iterate through the LinkSet objects
> > > >>>>>>> while (my $ds = $factory->next_LinkSet) {
> > > >>>>>>>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> > > >>>>>>> }
> > > >>>>>>>
> > > >>>>>>> @taxa = @taxa{@ids};
> > > >>>>>>>
> > > >>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> > > >>>>>>>                                     -db => 'taxonomy',
> > > >>>>>>>                                     -id => \@taxa );
> > > >>>>>>>
> > > >>>>>>> while (local $_ = $factory->next_DocSum) {
> > > >>>>>>>     $names{($_->get_contents_by_name('TaxId'))[0]} =
> > > >>>>>>>         ($_->get_contents_by_name('ScientificName'))[0];
> > > >>>>>>> }
> > > >>>>>>>
> > > >>>>>>> foreach (@ids) {
> > > >>>>>>>     $idmap{$_} = $names{$taxa{$_}};
> > > >>>>>>> }
> > > >>>>>>>
> > > >>>>>>> # %idmap is
> > > >>>>>>> # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> > > >>>>>>> # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> > > >>>>>>> # 68536103 => 'Corynebacterium jeikeium K411'
> > > >>>>>>> # 730439 => 'Bacillus caldolyticus'
> > > >>>>>>> # 89318838 => undef (this record has been removed from the db)
> > > >>>>>>>
> > > >>>>>>> 1;
> > > >>>>>>>
> > > >>>>>>> You probably will need to break up your 30000 into chunks
> > > >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a
> > > >>>>>>>
> > > >>>>>>> sleep 3;
> > > >>>>>>>
> > > >>>>>>> or so separating the queries.
> > > >>>>>>> MAJ
> > > >>>>>>> ----- Original Message -----
> > > >>>>>>> From: "Bhakti Dwivedi"
> > > >>>>>>> To:
> > > >>>>>>> Sent: Friday, December 25, 2009 9:46 PM
> > > >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession
> > > >>>>>>> number?
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> Hi,
> > > >>>>>>>>
> > > >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species name" given
> > > >>>>>>>> the accession number using Bioperl. I have these 30,000 accession numbers
> > > >>>>>>>> for which I need to get the source organisms. Any kind of help will be
> > > >>>>>>>> appreciated.
> > > >>>>>>>>
> > > >>>>>>>> Thanks
> > > >>>>>>>>
> > > >>>>>>>> BD
> > > >>>>>>>> _______________________________________________
> > > >>>>>>>> Bioperl-l mailing list
> > > >>>>>>>> Bioperl-l at lists.open-bio.org
> > > >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>> _______________________________________________
> > > >>>>>>> Bioperl-l mailing list
> > > >>>>>>> Bioperl-l at lists.open-bio.org
> > > >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > >>>>>>
> > > >>>>>> =======================================================================
> > > >>>>>> Attention: The information contained in this message and/or attachments
> > > >>>>>> from AgResearch Limited is intended only for the persons or entities
> > > >>>>>> to which it is addressed and may contain confidential and/or privileged
> > > >>>>>> material.
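
The chunk-and-pause advice quoted above (1000-3000 IDs per request, with a "sleep 3" between requests) could be sketched roughly as follows. This is only an illustration under stated assumptions: chunk_ids and for_each_chunk are hypothetical helper names, not BioPerl functions, and the callback is the place where a real Bio::DB::EUtilities request, such as the one in the quoted script, would go.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a list of IDs into array refs holding
# at most $size elements each.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "chunk size must be a positive integer\n" unless $size && $size > 0;
    my @chunks;
    push @chunks, [ splice @ids, 0, $size ] while @ids;
    return @chunks;
}

# Hypothetical helper: call $callback once per chunk and pause $pause
# seconds between calls (no pause after the last chunk).
# Returns the number of chunks processed.
sub for_each_chunk {
    my ($size, $pause, $callback, @ids) = @_;
    my @chunks = chunk_ids($size, @ids);
    for my $i (0 .. $#chunks) {
        $callback->($chunks[$i]);
        sleep $pause if $pause && $i < $#chunks;
    }
    return scalar @chunks;
}

# Demo with placeholder IDs and no network access; the pause is set to 0
# here, whereas the quoted advice suggests about 3 seconds.
my @ids = (1 .. 2500);
for_each_chunk(1000, 0, sub {
    my ($chunk) = @_;
    printf "would query %d ids (%s..%s)\n",
        scalar @$chunk, $chunk->[0], $chunk->[-1];
}, @ids);
```

In real use the callback would build one request per chunk and merge the per-chunk results into a single hash; whether 1000 or 3000 IDs per request is the better size is something to test against the server rather than assume.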

From cjfields at illinois.edu  Thu Jan 28 14:30:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 13:30:12 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
    <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
    <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
    <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
    <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
    <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
    <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
    <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
    <1264703187.5473.10.camel@cjfields.igb.uiuc.edu>
    <18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz>
Message-ID: <1264707012.5473.51.camel@cjfields.igb.uiuc.edu>

Russell,

Okay, just wanted to make sure. The email/tool requirements weren't
actually enforced up until now, which is forcing us to do a bit of
re-work on the various tools that don't have it set by default (at least
warn users unaware of it).

And I agree, gzipped archives would be nice!

chris

On Fri, 2010-01-29 at 08:25 +1300, Smithies, Russell wrote:
> Yes, I usually set the 'tool' and 'email' parameters.
> I went to NCBI back in 2006 and did their "PowerScripting" course where they pointed out a lot of the requirements for using eUtils. I think I requested results returned gzipped back then as well...
>
> --Russell
>
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at illinois.edu]
> > Sent: Friday, 29 January 2010 7:26 a.m.
> > To: Smithies, Russell
> > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> > Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > number?
> >
> > Russell,
> >
> > Just curious, but have you tried setting the return email parameter
> > (-email)? NCBI recently stated that all queries would eventually
> > require a return email of some sort (not sure if it's validated or not).
> > I think that was set for around late spring.
> > I'm changing the code in svn to require it for that very purpose.
> >
> > chris

From maj at fortinbras.us  Thu Jan 28 14:55:31 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 14:55:31 -0500
Subject: [Bioperl-l] EUtilities policy change
In-Reply-To: <1264706302.5473.48.camel@cjfields.igb.uiuc.edu>
References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>
    <8F49B5ED151143FA86E977B4D4F44265@NewLife>
    <1264706302.5473.48.camel@cjfields.igb.uiuc.edu>
Message-ID:

Ok, SoapEU now warns on no email; passes email onto the fetch stage during
autofetch -- cheers
MAJ
----- Original Message -----
From: "Chris Fields"
To: "Mark A. Jensen"
Cc: "BioPerl-l"
Sent: Thursday, January 28, 2010 2:18 PM
Subject: Re: [Bioperl-l] EUtilities policy change


> I think warning is fine for now. I've reimplemented that so it occurs
> lazily (warns only when a request is actually made).
>
> Will also change the tool to 'BioPerl' (currently 'bioperl', all lc).
> We'll obviously have to address this in the test suite as well in some
> way, maybe ask for an email if network tests are requested.
>
> chris
>
> On Thu, 2010-01-28 at 14:05 -0500, Mark A. Jensen wrote:
>> Thanks Chris--
>> The soap modules currently set tool to "SoapEUtilities(BioPerl)".
>> I agree that a default email is a bad idea (tm) (unless maybe it's
>> hilmar's...?). I'd say a warning on unset email parameters is a responsible
>> "there be dragons" sort of treatment.
>> MAJ
>> ----- Original Message -----
>> From: "Chris Fields"
>> To: "BioPerl-l"
>> Cc: "Mark A. Jensen"
>> Sent: Thursday, January 28, 2010 2:00 PM
>> Subject: EUtilities policy change
>>
>>
>> > All,
>> >
>> > Per NCBI's recent change in eutils user policy (effective June 1):
>> >
>> > http://bioperl.org/pipermail/bioperl-l/2009-December/031698.html
>> >
>> > Both the tool and email parameters ('-tool', '-email') are now required
>> > when making requests. Note this will significantly break all modules
>> > requiring remote access to eutils (Bio::DB::GenBank/GenPept, some Biblio
>> > and Taxonomy stuff as well, IIRC). This also applies to web services
>> > (SOAP-based access). Mark, not sure how this affects your SOAP-based
>> > modules.
>> >
>> > I have reconfigured Bio::DB::EUtilities to follow this policy; the
>> > default tool setting has been 'bioperl' and will remain that way.
>> > However, there has been no default email, therefore setting this is now
>> > required for future requests unless we (the bioperl devs) decide there
>> > is a safe default email to utilize. My gut tells me, however, that
>> > falling back to a default email opens up a can of worms for the devs and
>> > is very likely a 'BAD IDEA'(TM).
>> >
>> > Regardless, be aware that, after June 1, NCBI will very likely exclude
>> > requests with no email and will notify users who are considered to be
>> > violating their policies.
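
For readers unsure what the two required parameters amount to on the wire, here is a rough sketch of a raw E-utilities request URL that carries them. It is an illustration only, not how Bio::DB::EUtilities is implemented: eutils_url is a hypothetical helper, the tool name and address are placeholders, and the base URL is the present-day HTTPS one. In BioPerl itself the same values are passed as '-tool' and '-email', as the quoted message says.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): build a raw E-utilities URL
# and refuse to do so unless both 'tool' and 'email' are supplied.
sub eutils_url {
    my (%arg) = @_;
    for my $required (qw(eutil db tool email)) {
        die "missing required parameter '$required'\n"
            unless defined $arg{$required} && length $arg{$required};
    }
    my $eutil = delete $arg{eutil};

    # Minimal percent-encoding of everything outside the unreserved set
    # (assumes plain ASCII input).
    my $escape = sub {
        my ($s) = @_;
        $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
        return $s;
    };

    my $query = join '&',
        map { $escape->($_) . '=' . $escape->($arg{$_}) } sort keys %arg;
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/$eutil.fcgi?$query";
}

print eutils_url(
    eutil => 'esummary',
    db    => 'taxonomy',
    id    => '9606',
    tool  => 'my_script',           # placeholder tool name
    email => 'user@example.org',    # placeholder address
), "\n";
```

Failing early on a missing address is one possible design; the thread instead settles on warning lazily, when a request is actually made.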
>> >
>> > I will likely make further changes to Bio::DB::EUtilities in the
>> > meantime to ensure that using the tools by default will not violate
>> > NCBI's policy (e.g. override this at your own risk).
>> >
>> > chris
>> >
>> >
>> >
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From chapmanb at 50mail.com  Thu Jan 28 15:35:05 2010
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 28 Jan 2010 15:35:05 -0500
Subject: [Bioperl-l] OpenBio solution challenge: Project updates at BOSC 2010
Message-ID: <20100128203505.GG40046@sobchak.mgh.harvard.edu>

Hello all;
The BOSC 2010 organizing committee is hard at work getting prepared for this
July's meeting in Boston:

http://www.open-bio.org/wiki/BOSC_2010

One of the items we've traditionally had at the conference is a project
update from each of the OpenBio affiliated groups. This year, we're thinking
about organizing these talks around a central theme: the OpenBio solution
challenge. We start with a biological question of general interest, and each
of the project talks would focus around how you would solve that problem
using your toolkit and programming language.

This is meant to provide a challenge for OpenBio contributors, a nice tutorial
style overview of various projects and approaches for other programmers, and a
fun opportunity to compete and learn from other projects. Conference attendees
will vote on their favorite solution, with the winner receiving fame and
fortune (warning: fortune not guaranteed).

For this to be successful, it of course requires interest and enthusiasm from
y'all fine folks involved with the projects. Specifically:

- Is there interest from your group in participating in the challenge? You'll
  want at least a few people to work on it, and someone to give a presentation
  at BOSC.

- Do you have suggestions on a good theme or specific biological problem to
  tackle?
  We'll hope to pick something in a sweet spot that is challenging
  enough to be of interest, yet reasonable for presentation and preparation.

Let's discuss ideas and get this together. Since the schedule for BOSC is
developing rapidly, please give us an idea if you're interested by
February 12th, and copy responses to the BOSC mailing list as a central
place for discussion.

bosc at open-bio.org

Thanks,
Brad, Michael, and the BOSC organizing committee

From markw at illuminae.com  Thu Jan 28 16:17:44 2010
From: markw at illuminae.com (Mark Wilkinson)
Date: Thu, 28 Jan 2010 13:17:44 -0800
Subject: [Bioperl-l] [MOBY-dev] OpenBio solution challenge: Project updates at BOSC 2010
In-Reply-To: <20100128203505.GG40046@sobchak.mgh.harvard.edu>
References: <20100128203505.GG40046@sobchak.mgh.harvard.edu>
Message-ID:

Brad, this sounds exciting!

One thing strikes me, though - by asking for the sub-projects to propose
the "grand challenge" themselves the one thing you can guarantee is that
the "grand challenge" is solvable (or more likely, already solved!)

Other "grand challenge" kinds of meetings have an independent third party
pose the problem that has to be solved, and then all groups work toward a
solution and compare their results. This would, IMO, be more revealing of
the "state of the art" in each Open-Bio project, and point out where the
weaknesses are that we should be focusing on... Someone (for example,
you!) could act as the moderator to ensure that the "grand challenge" was
at least a reasonable one, within the scope of what an Open-Bio project
*should* be able to solve...

Just my CAD $0.02

Mark



On Thu, 28 Jan 2010 12:35:05 -0800, Brad Chapman wrote:

> Hello all;
> The BOSC 2010 organizing committee is hard at work getting prepared for this
> July's meeting in Boston:
>
> http://www.open-bio.org/wiki/BOSC_2010
>
> One of the items we've traditionally had at the conference is a project
> update from each of the OpenBio affiliated groups.
> This year, we're thinking
> about organizing these talks around a central theme: the OpenBio solution
> challenge. We start with a biological question of general interest, and each
> of the project talks would focus around how you would solve that problem
> using your toolkit and programming language.
>
> This is meant to provide a challenge for OpenBio contributors, a nice tutorial
> style overview of various projects and approaches for other programmers, and a
> fun opportunity to compete and learn from other projects. Conference attendees
> will vote on their favorite solution, with the winner receiving fame and
> fortune (warning: fortune not guaranteed).
>
> For this to be successful, it of course requires interest and enthusiasm from
> y'all fine folks involved with the projects. Specifically:
>
> - Is there interest from your group in participating in the challenge? You'll
>   want at least a few people to work on it, and someone to give a presentation
>   at BOSC.
>
> - Do you have suggestions on a good theme or specific biological problem to
>   tackle? We'll hope to pick something in a sweet spot that is challenging
>   enough to be of interest, yet reasonable for presentation and preparation.
>
> Let's discuss ideas and get this together. Since the schedule for BOSC is
> developing rapidly, please give us an idea if you're interested by
> February 12th, and copy responses to the BOSC mailing list as a central
> place for discussion.
>
> bosc at open-bio.org
>
> Thanks,
> Brad, Michael, and the BOSC organizing committee
> _______________________________________________
> MOBY-dev mailing list
> MOBY-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/moby-dev


-- 
Mark D Wilkinson, PI Bioinformatics
Assistant Professor, Medical Genetics
The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research
Providence Heart + Lung Institute
University of British Columbia - St. Paul's Hospital
Vancouver, BC, Canada

From HWillis at scripps.edu  Thu Jan 28 20:03:10 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 28 Jan 2010 20:03:10 -0500
Subject: [Bioperl-l] [Biojava-dev] [MOBY-dev] OpenBio solution challenge: Project updates at BOSC 2010
In-Reply-To:
References: <20100128203505.GG40046@sobchak.mgh.harvard.edu>
Message-ID: <716E205A-5196-409F-A7BC-EF0F52AA997A@scripps.edu>

Brad

I agree with Mark that a particular problem may be biased towards a
toolkit/language. Another approach would be to list a collection of problems
and each group would then pick a problem to present. Could be a little more
interesting to the audience as you are exposed to different problems and the
various strengths of each toolkit. This could also help guide future
development in the other toolkits as you would benefit from learning about
the api and/or programming language.

Each group would register a problem that they are going to present. From the
group of problems not picked that becomes the surprise challenge where each
group has 24 hours to either put together a presentation or an actual
solution.

Scooter

On Jan 28, 2010, at 4:17 PM, Mark Wilkinson wrote:

>
> Brad, this sounds exciting!
>
> One thing strikes me, though - by asking for the sub-projects to propose
> the "grand challenge" themselves the one thing you can guarantee is that
> the "grand challenge" is solvable (or more likely, already solved!)
>
> Other "grand challenge" kinds of meetings have an independent third party
> pose the problem that has to be solved, and then all groups work toward a
> solution and compare their results. This would, IMO, be more revealing of
> the "state of the art" in each Open-Bio project, and point out where the
> weaknesses are that we should be focusing on... Someone (for example,
> you!)
could act as the moderator to ensure that the "grand challenge" was > at least a reasonable one, within the scope of what an Open-Bio project > *should* be able to solve... > > Just my CAD $0.02 > > Mark > > > > On Thu, 28 Jan 2010 12:35:05 -0800, Brad Chapman > wrote: > >> Hello all; >> The BOSC 2010 organizing committee is hard at work getting prepared for >> this >> July's meeting in Boston: >> >> http://www.open-bio.org/wiki/BOSC_2010 >> >> One of the items we've traditionally had at the conference is a project >> update from each of the OpenBio affiliated groups. This year, we're >> thinking >> about organizing these talks around a central theme: the OpenBio solution >> challenge. We start with a biological question of general interest, and >> each >> of the project talks would focus around how you would solve that problem >> using your toolkit and programming language. >> >> This is meant to provide a challenge for OpenBio contributors, a nice >> tutorial >> style overview of various projects and approaches for other programmers, >> and a >> fun opportunity to compete and learn from other projects. Conference >> attendees >> will vote on their favorite solution, with the winner receiving fame and >> fortune (warning: fortune not guaranteed). >> >> For this to be successful, it of course requires interest and enthusiasm >> from >> y'all fine folks involved with the projects. Specifically: >> >> - Is there interest from your group in participating in the challenge? >> You'll >> want at least a few people to work on it, and someone to give a >> presentation >> at BOSC. >> >> - Do you have suggestions on a good theme or specific biological problem >> to >> tackle? We'll hope to pick something in a sweet spot that is >> challenging >> enough to be of interest, yet reasonable for presentation and >> preparation. >> >> Let's discuss ideas and get this together. 
Since the schedule for BOSC is >> developing rapidly, please give us an idea if you're interested by >> February 12th, and copy responses to the BOSC mailing list as a central >> place for discussion. >> >> bosc at open-bio.org >> >> Thanks, >> Brad, Michael, and the BOSC organizing committee >> _______________________________________________ >> MOBY-dev mailing list >> MOBY-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/moby-dev > > > -- > Mark D Wilkinson, PI Bioinformatics > Assistant Professor, Medical Genetics > The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research > Providence Heart + Lung Institute > University of British Columbia - St. Paul's Hospital > Vancouver, BC, Canada > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From biopython at maubp.freeserve.co.uk Fri Jan 29 05:36:40 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 29 Jan 2010 10:36:40 +0000 Subject: [Bioperl-l] [MOBY-dev] OpenBio solution challenge: Project updates at BOSC 2010 In-Reply-To: References: <20100128203505.GG40046@sobchak.mgh.harvard.edu> Message-ID: <320fb6e01001290236l1ad02515w403a19f94dbb6d15@mail.gmail.com> Hi all, This is a great topic but should we continue it on just the one mailing list? Is there a suitable BOSC list, or how about the general Open Bio list? On Thu, Jan 28, 2010 at 9:17 PM, Mark Wilkinson wrote: > > Brad, this sounds exciting! > > One thing strikes me, though - by asking for the sub-projects to propose > the "grand challenge" themselves the one thing you can guarantee is that > the "grand challenge" is solvable (or more likely, already solved!) > > Other "grand challenge" kinds of meetings have an independent third party > pose the problem that has to be solved, and then all groups work toward a > solution and compare their results.
This would, IMO, be more revealing of > the "state of the art" in each Open-Bio project, and point out where the > weaknesses are that we should be focusing on... Someone (for example, > you!) could act as the moderator to ensure that the "grand challenge" was > at least a reasonable one, within the scope of what an Open-Bio project > *should* be able to solve... > > Just my CAD $0.02 > > Mark One possible problem with having Brad act as moderator is his ties to Biopython (plus it would be a shame if we'd be one man down for trying to solve the challenges - grin). Having a project representative "sign off" on the challenge might work - or simply the whole of the BOSC committee which is quite balanced. Alternatively some kind of panel of challenges does seem a good way to reduce individual project bias (as suggested by Scooter), but there will still need to be a judging committee. I'm curious what kind of challenges the BOSC committee had in mind - would something like taking a newly sequenced bacterium and producing an automated annotation as a GenBank, EMBL, or GFF file be too ambitious for example? There are already several major projects to do this e.g. RAST http://rast.nmpdr.org/ Peter (@Biopython) From mike.stubbington at bbsrc.ac.uk Fri Jan 29 08:25:25 2010 From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI)) Date: Fri, 29 Jan 2010 13:25:25 +0000 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn In-Reply-To: References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife> <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk> Message-ID: Hi Mark, Thanks for your continued help.
It now fails with this: ------------- EXCEPTION ------------- MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : BLAST Database error: No alias or index file found for nucleotide database [MouseGenome] in search path [/Volumes/stubbing/PerlScripts/5CTest/trunk::] STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1004 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 STACK toplevel ./5CTest.pl:63 ------------------------------------- If I change the factory creation to: my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => '/Users/stubbing/localBlast/MouseGenome' ); it fails with ------------- EXCEPTION ------------- MSG: DB name not valid STACK Bio::Tools::Run::StandAloneBlastPlus::new /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:516 STACK toplevel ./5CTest.pl:45 ------------------------------------- However I can run the following successfully from the command line: blastn -db /Users/stubbing/localBlast/MouseGenome -query querySequence.fasta Is there something wrong with how I'm referring to the blast database when I construct my factory? Thanks again, M On 28 Jan 2010, at 18:47, Mark A. Jensen wrote: > Hi Mike, > Believe I found the real bug causing the problem (was not accounting for > the db_dir parameter). Crashes should now also throw much more helpful > errors. Please try the code at r16774, and shout back. > thanks -- > MAJ > ----- Original Message ----- > From: "mike stubbington (BI)" > To: "Mark A. 
Jensen" > Cc: > Sent: Thursday, January 28, 2010 11:18 AM > Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek > error running blastn > > > Hi, > > Thanks for the suggestion. Unfortunately it still fails - error as follows: > > ------------- EXCEPTION ------------- > MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running > /usr/local/ncbi/blast/bin/blastn : Illegal seek at > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, > line 532. > > STACK Bio::Tools::Run::WrapperBase::_run > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 > STACK Bio::Tools::Run::StandAloneBlastPlus::run > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 > STACK toplevel ./5CTest.pl:63 > ------------------------------------- > > M > > On 28 Jan 2010, at 15:56, Mark A. Jensen wrote: > >> Mike - please try updating your bioperl-live (the core) to the latest code >> (revision 16761 or so). >> CommandExts is a work in progress; from the stack errors it looks like you've >> got an older version. >> Try it then ping us back, if you would-- >> Thanks >> Mark >> ----- Original Message ----- >> From: "mike stubbington (BI)" >> To: >> Sent: Thursday, January 28, 2010 10:41 AM >> Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error >> running blastn >> >> >> Dear all, >> >> I am attempting to blast some primers against the mouse genome. I have created >> a >> local mouse genome blast database and I can search against it using 'blastn' >> at >> the command line. 
>> >> I have perl code that creates an array of bioperl sequence objects called >> @primers >> >> I then create a StandAloneBlastPlus factory using the following code... >> >> my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new( >> -db_dir => '/Users/stubbing/localBlast/', >> -db_name => 'MouseGenome' >> ); >> >> and then attempt to blast my primers using this... >> >> my @shortPrimers; >> my $count=1; >> foreach (@primers) { >> my $currentSeq = $_; >> print "Checking primer $count/$primerNumber "; >> if ($_->length < 40) { >> push(@shortPrimers,$_); >> print "Too short!\n"; >> } >> else { >> print "BLASTing..."; >> my $blastResult = $blastFactory->blastn(-query => $currentSeq); >> } >> $count++; >> } >> >> This fails with the following error... >> >> ------------- EXCEPTION ------------- >> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem >> running >> /usr/local/ncbi/blast/bin/blastn : Illegal seek at >> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, >> line 532. >> >> STACK Bio::Tools::Run::WrapperBase::_run >> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994 >> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD >> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 >> STACK Bio::Tools::Run::StandAloneBlastPlus::run >> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 >> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD >> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 >> STACK toplevel ./5CTest.pl:63 >> ------------------------------------- >> >> Line 63 in my code is (as you might expect) the one that calls blastn on my >> factory object. >> >> I'd appreciate any help you might be able to provide to shed light on this.
>> >> Thanks in advance, >> >> Mike >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Fri Jan 29 08:36:54 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 29 Jan 2010 08:36:54 -0500 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn In-Reply-To: References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife> <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk> Message-ID: Hi Mike- Well, at least we're getting more informative errors. I think it's still my bad; will look again. Both of your calls should work. (thanks for the positive control too) Thanks for your patience and the help-- MAJ From maj at fortinbras.us Fri Jan 29 08:47:48 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 29 Jan 2010 08:47:48 -0500 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn In-Reply-To: References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife><05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk> Message-ID: <2B7BF6CD46AE441AB24203E169D9C503@NewLife> Mike et al-- I've entered this as Bug #3003 on http://bugzilla.bioperl.org; we'll do further ping-pongs on this issue via the comment facility there-- cheers MAJ From help at gmod.org Fri Jan 29 17:03:48 2010 From: help at gmod.org (Dave Clements, GMOD Help Desk) Date: Fri, 29 Jan 2010 14:03:48 -0800 Subject: [Bioperl-l] 2010 GMOD Summer School - Americas In-Reply-To: <71ee57c71001291402h2ec67300r4fc7a3b2375f4080@mail.gmail.com> References: <71ee57c71001291351q47994b82w10dffb390dbf2837@mail.gmail.com> <71ee57c71001291354m68548823s3e3fbd2e49e9b332@mail.gmail.com> <71ee57c71001291356p5e7f1aadi2bf437c93014a393@mail.gmail.com> <71ee57c71001291357h67112e2fkcf835687e59f66ae@mail.gmail.com> <71ee57c71001291358k74781b08n232534d8895c5ec1@mail.gmail.com> <71ee57c71001291400y28e40eb6i112ea91df977dc67@mail.gmail.com> <71ee57c71001291400n6133982eh3a02293ff741900b@mail.gmail.com> <71ee57c71001291401y505b56baic61c11754d88a444@mail.gmail.com> <71ee57c71001291402s23e3f2e9w2562d6acf85bd4ae@mail.gmail.com> <71ee57c71001291402h2ec67300r4fc7a3b2375f4080@mail.gmail.com> Message-ID: <71ee57c71001291403s19be18f3s3a1d5a314c74def@mail.gmail.com> Hello all, I am pleased to announce that we are now accepting applications for: 2010 GMOD Summer School - Americas, 6-9 May 2010, NESCent, Durham, NC, USA http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas This will be a hands-on multi-day course aimed at teaching new GMOD users/administrators how to get GMOD Components up and running.
The course will introduce participants to the GMOD project and then focus on installation, configuration and integration of popular GMOD Components. The course will be held May 6-9, at NESCent in Durham, NC. These components will be covered:
 * Apollo - genome annotation editor
 * Chado - a modular and extensible database schema
 * Galaxy - workflow system
 * GBrowse - the Generic Genome Browser
 * GBrowse_syn - A generic synteny browser
 * JBrowse - genome browser
 * MAKER - genome annotation pipeline
 * Tripal - web front end for Chado
The deadline for applying is the end of Friday, February 22. Admission is competitive and is based on the strength of the application (especially the statement of interest). In 2009 there were over 50 applications for the 25 slots. Any applications received after the deadline will be placed on the waiting list. See the course page for details and an application link: http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas Thanks, Dave Clements GMOD Help Desk PS: We are also investigating holding a GMOD course in the Asia/Pacific region, sometime this fall. Watch the GMOD mailing lists and the GMOD News page/RSS feed for updates. -- Please keep responses on the list! http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas http://gmod.org/wiki/GMOD_News Was this helpful? http://gmod.org/wiki/Help_Desk_Feedback From bhakti.dwivedi at gmail.com Sat Jan 30 17:38:40 2010 From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi) Date: Sat, 30 Jan 2010 17:38:40 -0500 Subject: [Bioperl-l] how to map blast results on to the genome? Message-ID: Does anyone know how I can graphically map the blast results (m -8 format) to the genome using bio-perl? Thanks Bhakti From jason at bioperl.org Sat Jan 30 18:56:14 2010 From: jason at bioperl.org (Jason Stajich) Date: Sat, 30 Jan 2010 15:56:14 -0800 Subject: [Bioperl-l] how to map blast results on to the genome?
In-Reply-To: References: Message-ID: <68937A7D-291F-419A-9ED7-7A87D9B4C78A@bioperl.org> Did you try BioGraphics and read the HOWTO on it -- http://bioperl.org/wiki/HOWTO:Graphics On Jan 30, 2010, at 2:38 PM, Bhakti Dwivedi wrote: > Does anyone know how I can graphically map the blast results (m -8 > format) > to the genome using bio-perl? > > Thanks > > Bhakti > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ http://twitter.com/hyphaltip From David.Messina at sbc.su.se Sun Jan 31 12:43:52 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sun, 31 Jan 2010 18:43:52 +0100 Subject: [Bioperl-l] question about a PAML module In-Reply-To: <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu> References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu> <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu> <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu> Message-ID: Hey Rui, My apologies for keeping you waiting on this. I started looking at it on Friday, and while I believe it'll be a relatively easy fix, I haven't got to the bottom of it yet. I'll look at it some more tomorrow and hopefully get it sorted it in the next day or two. Dave From bluecurio at gmail.com Sun Jan 31 22:22:37 2010 From: bluecurio at gmail.com (Daniel Renfro) Date: Sun, 31 Jan 2010 21:22:37 -0600 Subject: [Bioperl-l] New package to compare two SeqI-implementing objects Message-ID: <25c0f63d1001311922u134e9321s10f490a335f4a6e@mail.gmail.com> Hello all, A colleague and I have been working on a (Bio)Perl package to compare two Seq objects. This is in response to a need we found in our lab -- we wanted to see the changes to GenBank files through time, but wanted an automated way to do this. This led to what I'm calling the SeqDiff.pm package. 
I thought it would be a good idea to inform the community and get some feedback. The package takes two Seq objects as arguments, arbitrarily called "old" and "new." It then matches the features from the old object with the new object. This is done based on some criteria -- in our case we decided the features must be of the same type (have the same primary_tag) and have at least one database cross-reference (db_xref) in common. The left-over features (ones that did not have a match) are dropped into arrays called "lost" and "gained." The matching is done in about N log N time, as each matching pair is removed from subsequent searches. The matched features are then iterated through and the differences are calculated. Each feature is examined recursively and any differences are reported. Optionally you can give the new() method a flag so that everything is returned (differences and similarities.) You can set callbacks for different types of objects (like anything that isa('Bio::LocationI')) if you want a custom comparison for specific BioPerl objects. This comparison step is the computationally slow part, and currently everything is held in memory. I think it'd be better to do this piecemeal, using the BioPerl-ish next() and last() methods. Maybe this was a little verbose, but that is the SeqDiff package in a nutshell. I hope to soon release v1.0. If you have any questions or comments I'd love to hear them. -Daniel Renfro Hu Lab Research Associate Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4055 From maj at fortinbras.us Sun Jan 31 22:47:05 2010 From: maj at fortinbras.us (Mark A.
Jensen) Date: Sun, 31 Jan 2010 22:47:05 -0500 Subject: [Bioperl-l] New package to compare two SeqI-implementing objects In-Reply-To: <25c0f63d1001311922u134e9321s10f490a335f4a6e@mail.gmail.com> References: <25c0f63d1001311922u134e9321s10f490a335f4a6e@mail.gmail.com> Message-ID: <5DC96D65B6A447C3802AF5D745FF4AA4@NewLife> Daniel-- this sounds interesting and useful, I +1 it. Your intuition about in-memory vs streaming sounds correct to me; features can be many, and diffing many (MANY) sequences may bork. Maybe our feature-rich users can chime in. (...however, I did just hear about a magic spell called 'File::Map', might check that out on CPAN.) cheers- MAJ ----- Original Message ----- From: "Daniel Renfro" To: Sent: Sunday, January 31, 2010 10:22 PM Subject: [Bioperl-l] New package to compare two SeqI-implementing objects > Hello all, > > A colleague and I have been working on a (Bio)Perl package to compare two > Seq objects. This is in response to a need we found in our lab -- we wanted > to see the changes to GenBank files through time, but wanted an automated > way to do this. This led to what I'm calling the SeqDiff.pm package. I > thought it would be a good idea to inform the community and get some > feedback. > > The package takes two Seq objects as arguments, arbitrarily called "old" and > "new." It then matches the features from the old object with the new object. > This is done based on some criteria -- in our case we decided the features > must be of the same type (have the same primary_tag) and have at least one > matching database cross-reference (db_xref) in common. The left-over > features (ones that did not have a match) are dropped into arrays called > "lost" and "gained." The matching is done in about NlogN time, as each > matching pair are removed from subsequent searches. > > The matched features and iterated through and the differences are > calculated. Each feature is examined recursively and any differences are > reported. 
Optionally you can give the new() method a flag so that everything > is returned (differences and similarities.) You can set callbacks for > different types of objects (like anything that isa('Bio::LocationI')) if you > want a custom comparison for specific BioPerl objects. This comparison step > is the computationally slow part, and currently everything is held in > memory. I think it'd be better to do this piece-meal, using the BioPerl-ish > next() and last() methods. > > Maybe this was a little verbose, but that is the SeqDiff package in a > nutshell. I hope to soon release v1.0. If you have any questions or comments > I'd love to hear them. > > -Daniel Renfro > > Hu Lab Research Associate > Dept. of Biochemistry and Biophysics > 2128 TAMU > Texas A&M Univ. > College Station, TX 77843-2128 > 979-862-4055 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rui.faria at upf.edu Sun Jan 31 12:17:09 2010 From: rui.faria at upf.edu (Rui Faria) Date: Sun, 31 Jan 2010 18:17:09 +0100 (CET) Subject: [Bioperl-l] question about a PAML module In-Reply-To: References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu> <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu> Message-ID: <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu> Hi Dave, we reported the bug on codeml about errors when the user gives its own tree file, some time ago. Did you have any chances to look at it? We basically wanted to know your opinion on where the problem may be, since we are not the most experienced "perlers" on the planet :) I'm asking this because we have to deal with that right now. If someone could check where is the problem, to understand if it has an easy solution, that would be of great help. 
Best, Rui -----Original Message----- From Dave Messina Sent Thu 31/12/2009 11:55 AM To Rui Faria Cc Jason Stajich ; sandraneto_ at hotmail.com; bioperl-l List Subject Re: question about a PAML module Hi Rui and Sandra, Could you file this as a bug report at http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl ? Once you've created the bug report with a brief description of the problem and submitted it, please attach the following to the bug report:
- sample input files (a sequence file and a tree file, probably)
- a script which reproduces the problem
- the output (error messages) like you show below
When I updated the code to work with the current version, I didn't exhaustively test all of the different modes of running codeml, so I appreciate you reporting this. There was another, similar issue reported a few days ago. I will try to take a look at both of these bug reports soon. Dave From rui.faria at upf.edu Sun Jan 31 13:56:56 2010 From: rui.faria at upf.edu (Rui Faria) Date: Sun, 31 Jan 2010 19:56:56 +0100 (CET) Subject: [Bioperl-l] question about a PAML module In-Reply-To: References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu> <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu> <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu> Message-ID: <11398434.1264964216856.JavaMail.oracle@rif1.s.upf.edu> Many thanks! We hope that one day, when we become experts, we can return the favor! Rui -----Original Message----- From Dave Messina Sent Sun 31/01/2010 06:43 PM To Rui Faria Cc Jason Stajich ; sandraneto_ at hotmail.com; bioperl-l List Subject Re: question about a PAML module Hey Rui, My apologies for keeping you waiting on this. I started looking at it on Friday, and while I believe it'll be a relatively easy fix, I haven't got to the bottom of it yet. I'll look at it some more tomorrow and hopefully get it sorted in the next day or two.
Dave From avilella at gmail.com Sat Jan 2 03:57:28 2010 From: avilella at gmail.com (Albert Vilella) Date: Sat, 2 Jan 2010 08:57:28 +0000 Subject: [Bioperl-l] Downloading from dbEST by taxon range Message-ID: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com> Hi all and happy 2010 for those that follow the Gregorian calendar, A question that is a bit in between bioperl and NCBI. I would like to use bioperl to download sequences fom dbEST. For that, my idea is to use Bio::DB::Genbank and get the sequences by gi id. Now, I want my script to download sequences for a given NCBI taxonomy clade. For example, if I want to download all fish (clupeocephala) sequences in dbEST, I can browse it around with the dbEST webpage using "clupeocephala[taxonomy]", so I am thinking there should be a way to do it programmatically. How can I query NCBI dbEST through bioperl to give me the list of GI ids I am looking for given a taxon id? Thanks in advance, Albert. From jason at bioperl.org Sat Jan 2 11:35:22 2010 From: jason at bioperl.org (Jason Stajich) Date: Sat, 2 Jan 2010 08:35:22 -0800 Subject: [Bioperl-l] Downloading from dbEST by taxon range In-Reply-To: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com> References: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com> Message-ID: DId you try Bio::DB::Query::GenBank ? You'd want to use -db => 'nucest' and then you just put in an Entrez query as per the example. you can include dates in the query so you can do updates to your locally retrieved data in a script that runs periodically. -jason On Jan 2, 2010, at 12:57 AM, Albert Vilella wrote: > Hi all and happy 2010 for those that follow the Gregorian calendar, > > A question that is a bit in between bioperl and NCBI. I would like > to use > bioperl to download sequences fom dbEST. For that, my idea is to use > Bio::DB::Genbank and get the sequences by gi id. > > Now, I want my script to download sequences for a given NCBI > taxonomy clade. 
> > For example, if I want to download all fish (clupeocephala) > sequences in dbEST, > I can browse it around with the dbEST webpage using > "clupeocephala[taxonomy]", > so I am thinking there should be a way to do it programmatically. > > How can I query NCBI dbEST through bioperl to give me the list of GI > ids I am > looking for given a taxon id? > > Thanks in advance, > > Albert. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From avilella at gmail.com Sun Jan 3 04:08:33 2010 From: avilella at gmail.com (Albert Vilella) Date: Sun, 3 Jan 2010 09:08:33 +0000 Subject: [Bioperl-l] Downloading from dbEST by taxon range In-Reply-To: References: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com> Message-ID: <358f4d651001030108p6a92fb27k5fa39be6bebb3a9c@mail.gmail.com> Thanks Jason! For the sake of completeness, here is the script I needed:
---------------------
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::DB::Taxonomy;
use Bio::DB::Query::GenBank;
use Bio::DB::GenBank;
use Getopt::Long;

my $keyword_type = 'EST';
my $outdir = '.';
my $taxon_name = undef;
my $db_type = 'nucest';
GetOptions('keyword_type:s' => \$keyword_type,
           't|taxon_name:s' => \$taxon_name,
           'db_type:s' => \$db_type,
           'outdir:s' => \$outdir);
die "Please give a taxon name with -t\n" unless defined $taxon_name;

# Entrez query: <taxon_name>[Organism] AND <keyword_type>[Keyword]
my $query_string = $taxon_name ."[Organism] AND ". $keyword_type ."[Keyword]";
my $db = Bio::DB::Query::GenBank->new (-db => $db_type,
                                       -query => $query_string,
                                       -mindate => '2007',
                                       -maxdate => '2010');

# Output file: <outdir>/<taxon_name>.<db_type>.fasta, spaces replaced by underscores
my $taxon_name_string = $taxon_name;
$taxon_name_string =~ s/\ /\_/g;
my $outfile = $outdir . "/" . $taxon_name_string . ".". $db_type . ".fasta";
my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'fasta');

# Report the number of hits, then stream the sequences
print $db->count,"\n";
my $gb = Bio::DB::GenBank->new();
my $stream = $gb->get_Stream_by_query($db);
while (my $seq = $stream->next_seq) {
    # Skip reads of 800 bases or fewer
    next unless (length($seq->seq) > 800);
    $out->write_seq($seq);
}
$out->close;
---------------------
On Sat, Jan 2, 2010 at 4:35 PM, Jason Stajich wrote: > DId you try Bio::DB::Query::GenBank ? > You'd want to use -db => 'nucest' and then you just put in an Entrez query > as per the example. you can include dates in the query so you can do > updates to your locally retrieved data in a script that runs periodically. > > -jason > On Jan 2, 2010, at 12:57 AM, Albert Vilella wrote: > >> Hi all and happy 2010 for those that follow the Gregorian calendar, >> >> A question that is a bit in between bioperl and NCBI. I would like to use >> bioperl to download sequences fom dbEST. For that, my idea is to use >> Bio::DB::Genbank and get the sequences by gi id. >> >> Now, I want my script to download sequences for a given NCBI taxonomy >> clade. >> >> For example, if I want to download all fish (clupeocephala) sequences in >> dbEST, >> I can browse it around with the dbEST webpage using >> "clupeocephala[taxonomy]", >> so I am thinking there should be a way to do it programmatically. >> >> How can I query NCBI dbEST through bioperl to give me the list of GI ids I >> am >> looking for given a taxon id? >> >> Thanks in advance, >> >> Albert.
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > > From Jean-Marc.Frigerio at pierroton.inra.fr Mon Jan 4 09:12:18 2010 From: Jean-Marc.Frigerio at pierroton.inra.fr (Jean-Marc Frigerio INRA) Date: Mon, 04 Jan 2010 15:12:18 +0100 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? In-Reply-To: References: Message-ID: <4B41F742.2030209@pierroton.inra.fr> > Message: 1 > Date: Thu, 31 Dec 2009 11:26:45 +1800 > From: Peng Yu > Subject: [Bioperl-l] How to read in the whole fasta file in the > memory? > To: bioperl-l at lists.open-bio.org > Message-ID: > <366c6f340912300926k5af5cc88nc3c3babda541fd1 at mail.gmail.com> > Content-Type: text/plain; charset=ISO-8859-1 > > With Bio::SeqIO, I can only read in the records in a fasta file one by > one. This is preferable if there are many records in a file. > > But I also want to read all the records in. I could use a while loop > to read all records in. But could somebody let me know if there is a > function in bioperl that can read in all the record at once and return > me an object? > > http://www.bioperl.org/wiki/HOWTO:SeqIO > > > ------------------------------ > > Message: 2 > Date: Wed, 30 Dec 2009 13:04:53 -0500 > From: Sean Davis > Subject: Re: [Bioperl-l] How to read in the whole fasta file in the > memory? > To: Peng Yu > Cc: "bioperl-l at lists.open-bio.org" > Message-ID: > <264855a00912301004t396e0d4fwf9d291c5d82c3fb9 at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu wrote: >> With Bio::SeqIO, I can only read in the records in a fasta file one by >> one. This is preferable if there are many records in a file. >> >> But I also want to read all the records in. I could use a while loop >> to read all records in. 
But could somebody let me know if there is a >> function in bioperl that can read in all the record at once and return >> me an object? > > In perl, you can use an array to store the records. You could also > use a hash if you have reasonable keys for the entries. > > Sean > > >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > ------------------------------ > > Message: 3 > Date: Wed, 30 Dec 2009 11:58:54 -0800 > From: Jason Stajich > Subject: Re: [Bioperl-l] How to read in the whole fasta file in the > memory? > To: Peng Yu > Cc: BioPerl List > Message-ID: <3550F192-111F-48A7-B1B7-113FFFAC105B at bioperl.org> > Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes > > or use a database object so you can retrieve sequences that have a > particular id. See Bio::DB::Fasta > On Dec 30, 2009, at 10:04 AM, Sean Davis wrote: > >> On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu wrote: >>> With Bio::SeqIO, I can only read in the records in a fasta file one >>> by >>> one. This is preferable if there are many records in a file. >>> >>> But I also want to read all the records in. I could use a while loop >>> to read all records in. But could somebody let me know if there is a >>> function in bioperl that can read in all the record at once and >>> return >>> me an object? >> In perl, you can use an array to store the records. You could also >> use a hash if you have reasonable keys for the entries. 
>> >> Sean >> >> >>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > > > > ------------------------------ > > Message: 4 > Date: Wed, 30 Dec 2009 16:20:31 -0500 > From: "Mark A. Jensen" > Subject: Re: [Bioperl-l] How to read in the whole fasta file in the > memory? > To: "Peng Yu" , > Message-ID: <2646F627E6D14AADB412A6E6B51E24DA at NewLife> > Content-Type: text/plain; format=flowed; charset="iso-8859-1"; > reply-type=original > > I think you might want Bio::AlignIO:
>
> $alnio = Bio::AlignIO->new(-file => 'my.fas');
> $aln = $alnio->next_aln;
> @seqs = $aln->each_seq;
>
> MAJ > ----- Original Message ----- > From: "Peng Yu" > To: > Sent: Wednesday, December 30, 2009 12:26 PM > Subject: [Bioperl-l] How to read in the whole fasta file in the memory? > > >> With Bio::SeqIO, I can only read in the records in a fasta file one by >> one. This is preferable if there are many records in a file. >> >> But I also want to read all the records in. I could use a while loop >> to read all records in. But could somebody let me know if there is a >> function in bioperl that can read in all the record at once and return >> me an object?
>> >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l Hi, I wrote and currently use a module I named Bio::SeqIO::multifasta, which is basically a copy of Bio::SeqIO::fasta plus a few methods: get_by_id(), get_by_order(), first_seq() and previous_seq() It would need review, validation etc. Do I submit it to Bugzilla ? -- jmf From jason at bioperl.org Mon Jan 4 11:03:45 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 4 Jan 2010 08:03:45 -0800 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? In-Reply-To: <4B41F742.2030209@pierroton.inra.fr> References: <4B41F742.2030209@pierroton.inra.fr> Message-ID: <16D7C8C1-E4BE-406F-9D60-379876178CAB@bioperl.org> We typically think of SeqIO as parsing a stream of data, not being reliant on it being a file which is what these methods would be implying I think. Sounds a lot like a database - does Bio::DB::Fasta not provide some of the functionality you need by these methods? I realize there isn't a by_order() but the get_by_id() is implemented to allow random access. -jason > > Hi, > > I wrote and currently use a module I named Bio::SeqIO::multifasta, > which is basically a copy of Bio::SeqIO::fasta plus a few methods: > get_by_id(), get_by_order(), first_seq() and previous_seq() > > It would need review, validation etc. Do I submit it to Bugzilla ? 
> > -- jmf > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From avilella at gmail.com Mon Jan 4 15:00:24 2010 From: avilella at gmail.com (Albert Vilella) Date: Mon, 4 Jan 2010 20:00:24 +0000 Subject: [Bioperl-l] indexed fastq files Message-ID: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com> Hi all, What is the best way to index fastq files, so that once clustered, I can provide a list of seq_ids and get them back in fastq format from the indexed db? Cheers, Albert. From cjfields at illinois.edu Mon Jan 4 16:59:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 4 Jan 2010 15:59:50 -0600 Subject: [Bioperl-l] indexed fastq files In-Reply-To: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com> References: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com> Message-ID: <07EBA105-6A34-490C-B0B9-4772DF386CBA@illinois.edu> Bio::Index::Fastq, maybe? To tell the truth, I haven't tried it since we refactored FASTQ parsing, so let us know if it doesn't work. chris On Jan 4, 2010, at 2:00 PM, Albert Vilella wrote: > Hi all, > > What is the best way to index fastq files, so that once clustered, I > can provide a list of seq_ids and get > them back in fastq format from the indexed db? > > Cheers, > > Albert. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Jan 4 22:54:03 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 4 Jan 2010 21:54:03 -0600 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? 
In-Reply-To: <4B41F742.2030209@pierroton.inra.fr> References: <4B41F742.2030209@pierroton.inra.fr> Message-ID: <1BAE5508-0DB7-41B4-92E3-49256582131F@illinois.edu> Jean-Marc, You can do that, yes. Just curious, but have you looked at the various flat file indexing modules for FASTA? Bio::DB::Fasta and Bio::Index::Fasta are commonly used and allow lookups by primary ID (and I think in some cases secondary IDs). chris On Jan 4, 2010, at 8:12 AM, Jean-Marc Frigerio INRA wrote: > ... > > Hi, > > I wrote and currently use a module I named Bio::SeqIO::multifasta, which is basically a copy of Bio::SeqIO::fasta plus a few methods: > get_by_id(), get_by_order(), first_seq() and previous_seq() > > It would need review, validation etc. Do I submit it to Bugzilla ? > > -- jmf > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From fs5 at sanger.ac.uk Wed Jan 6 17:16:13 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Wed, 06 Jan 2010 22:16:13 +0000 Subject: [Bioperl-l] Bio::DB::Sam strange behaviour for read pairs Message-ID: <4B450BAD.3050807@sanger.ac.uk> I'm trying to extract paired reads from a BAM file that span a given region. I would then like to get the two read ends of the sequenced clone that spans the region. I use Bio::DB::Sam->get_features_by_location for this and it does give me the correct read pairs as a region match but it doesn't give me both read pairs in all cases. 
Here is the script:

#!/usr/bin/perl
use Bio::DB::Sam;

my $usage = "usage: $0 BAMFILE CHROMOSOME STARTPOS ENDPOS\n" ;
my ($bam_file,$chrom,$start,$end) = @ARGV ;
die $usage unless $bam_file && $chrom && $start && $end;

my $bam = Bio::DB::Sam->new(-bam => $bam_file);
my @pairs = $bam->get_features_by_location(
    -type => 'read_pair',
    -seq_id => $chrom,
    -start => $start,
    -end => $end);

print "region: $chrom:$start..$end\n" ;
foreach my $pair (@pairs) {
    print " pair: id: ".$pair->id.", start".$pair->start.', end:'.$pair->end."\n";
    my ($first_mate,$second_mate) = $pair->get_SeqFeatures;
    print " first_mate: start:".$first_mate->start.', end:'.$first_mate->end."\n";
    if ($second_mate){
        print " second_mate: start:".$second_mate->start.', end:'.$second_mate->end."\n";
    } else {
        print " no second mate\n";
    }
}

And here are the matching pairs that it produces with one of my files for the region tal12:22479..29232:

region: tal12:22479..29232
 pair: id: tal-2446c08, start17496, end:29423
 first_mate: start:28540, end:29423
 no second mate
 pair: id: tal-2463d10, start23534, end:31363
 first_mate: start:23534, end:24448
 no second mate
 pair: id: tal-2371c09, start20860, end:28230
 first_mate: start:27604, end:28230
 no second mate
 pair: id: tal-2440b06, start19232, end:27099
 first_mate: start:26025, end:27099
 no second mate
 pair: id: tal-2327g09, start18909, end:26129
 first_mate: start:25354, end:26129
 no second mate
 pair: id: tal-2381b05, start25658, end:35054
 first_mate: start:25658, end:26295
 no second mate
 pair: id: tal-2377c11, start20898, end:28230
 first_mate: start:27473, end:28230
 no second mate
 pair: id: tal-2426e12, start21975, end:27562
 first_mate: start:21975, end:23008
 second_mate: start:26396, end:27562
 pair: id: tal-2365h10, start22843, end:31944
 first_mate: start:22843, end:23184
 no second mate
 pair: id: tal-2388h09, start19016, end:28238
 first_mate: start:27475, end:28238
 no second mate

So it finds a lot of pairs that span the region and the start/end from the pair is also
correct but it only gives me both individual mates in one case: pair: id: tal-2426e12, start21975, end:27562 first_mate: start:21975, end:23008 second_mate: start:26396, end:27562 In this case, both pairs are actually inside the query region (at least partially) whereas in the other cases, one of the mates is not inside, e.g. this one: pair: id: tal-2388h09, start19016, end:28238 first_mate: start:27475, end:28238 no second mate > get this read pair from the BAM file: $ samtools view clones.bam | grep tal-2388h09 tal-2388h09 99 tal12 19016 205 36H9M1D14M1D664M1D16M1D21M1D28M1D15M1D10M1D12M1D7M1D8M1D5M = 27475 9223 CTTTGGATGAAATAGTTTTTAAATAATACTTATTAAATATTAAATATATAACACATAAATAAGTATTGATGCAAATTTTAAAGTATTATAGAAAACTAGGTTTGATTATATTGTTATACTGTACTTTAAGAGGAGAGAGATAAGATATCTTTGCTCTTTTAATATATAAATTTAGATAAATATTCGTTAAATTTTCTACATAGTTATTTTTTATCTTATATATTATACTGCTATAGTTATCAATGTATATACATTCAAATAATTTATTAAAAATTCTATATTATATTAATTCTATGATAAAATAATCCTGTTTGTGATTTAAAAAATGATGATTCAATAAAAACTAATAATATAATACGAGTTAATATGGAATAATAAAATGGCATTTAACATGAATTTAGTCTTTAACCTTTTCTTTGTTTGTCAAGTTTTTTAAAACATAAAACCACACATTTCAAAATGGATTTTTAGCAAATATATAAAAATTATACATTTATAATGTATTGTTATGCGTCTTTTCGATAGAATCAATATTTAATTATATGAAGTTTCCACAATAAAATAATATTTAATATTATTTATTAGTAGAGTATTTGATTATATATATAGGCATATAATAATAACTCTAGTTCTATCTACCATATTATTTATAATTATTATAACAAAATGTGATATGAAATTTTATTATATACTTATATTATTTTTTTAACTATTTTAAAATATATTTATTTATACCTCAAAACTATAAAATTGAAATTATTAATAATAATCTAATATATACCTTTATAAAAATAAACGTATAAACTAAT ><:4/+1+*)+4>BEH=9-,,66IIIIIIIIEDA>>>>A at 
DDFFIHHHHHITIIIIIHIIHHHHHHIYYYYYTTTYDDDHDDDDDDDIIINNTNHHHHHIYYYYIIIIIINNNNTTIIIIIIIIIIITTNTTTTTYYYTTTTTTYYNNNNNNLLLLLLLLLLLNNNNNNTTTTTTTTTTTTTNNTNNNTTTYYTLLLLLLTTTTTTTTYTTTNNNNNNTTTTTTTNNTTNNTTTTTTTTTTYYTTTTTTTNNNNNNTTTTTTTYYTTTTTTYYYTTTTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTNNINTTYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYTTTTTOOOIFFFIFIIOICC>>II@>>>>>>C>>>>>>CIBECCCHIIOOOOOOOOTTTIIFDDEIQQA:55839AA>99>@IIIIII>>::;;I;>>CC>>>>>@III<::=>AAA<>>>>I>:>>99:>842225006824855;5>68844//.//00:>::338:99<:/-+*-./0)((((+00+..,++(((+-()(*((((()*)***))3)''')*..+*++((*1++--+*''''((+/)*42.((***)+,+('*'''*((''''((,'%%''''''''( AS:i:614 MS:i:50 tal-2388h09 147 tal12 27475 205 1H764M40H = 19016 -9223 ATTAAATCGGTATCGCCAACACAATGAGTATAATCATTGTCAAATATGCGTTTGTAAGTATATTCATTGTCACATTCACGTTTGTAAGTATATTCATTGTCACATTCACGTTTGTAAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTAATTTTTGTGAGTATAATCATTGGAACG 
(((0))*,-1-../2((())03---03266300271+*.-0-*''''+*,+/+))*-05330+)..4>7=77273911**((+20+03688633:93036<8;::5:<99379>>::>>>:57:<:7--)))1435::333228>::>II>::>A>>3/.958677AA=AA:>:==IIII8338<>A>>>>IIIIIIIIYYYYYKKYYYMIFFFFEIIIMI::4..8AIIC>9>=EIQQQMCAAAAAACIIIIAICIIIOOYTIIIMOQQMIIIIC>>AAABCCCCCEAI>C>>IQQIIIIIIIIIIKKYYYYYYYYYYYYYYYYYYYYYTIIIIIIYYYYTNINNNTYYYYYYYYYYYYYTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYYYSSYYYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTTTTTTTTTTTTTYYYYYTTTTTTYYYTTNNNNTTYYYYYYYYYTTTTLLTTNNTTTTYTTTTTTYYYYYYYYYYYTTOOKKKLKOOTYYYYYYYYYYYYYYYYTNNNNNNNNNTTTNYNNNNTNNNNTTYYYYYYYYTTNNNNTTYNNNNNITTTTTYYYYYYYYYYTTNNIIIIIDIIIIHTNNNNTTYYYYTNNNIIIIIITTTINIIIINNNNTTTYYYYIHHHDDHHDDIHDDGDFFFTIIINTTYYYYTTTTHHHHCCIIIHIHHHHCAI9:++**1168>ACCIIDDDDDDI>>>>>?NNN AS:i:688 MS:i:50 So the read in the first line starts before the start of the query region and is not accessible via $pair->get_SeqFeatures although this is a valid pair. Am I doing something wrong, is this the desired behaviour or is it a bug? Thanks for your help! -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
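A minimal sketch, not taken from Bio::DB::Sam, of one possible way to recover both ends of each clone from the text output of `samtools view`, whether or not both mates fall inside the query region. It assumes that the mate fields (RNEXT, PNEXT, TLEN) are filled in as in the two records above and that TLEN runs from the leftmost to the rightmost mapped base of the template. The helper name spanning_pairs and the shortened records (CIGAR, sequence and quality replaced by '*') are illustrative assumptions only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::DB::Sam): given SAM text lines,
# return [name, start, end] for each read pair whose whole template
# overlaps the query region, even when one mate lies outside it.
sub spanning_pairs {
    my ($chrom, $qstart, $qend, @sam_lines) = @_;
    my (%seen, @hits);
    for my $line (@sam_lines) {
        next if $line =~ /^\@/;                      # skip header lines
        my ($qname, $flag, $rname, $pos, undef, undef, $rnext, $pnext, $tlen)
            = split /\t/, $line;
        next unless defined $tlen && $rname eq $chrom;
        next if $flag & 0x4 or $flag & 0x8;          # read or mate unmapped
        next unless $rnext eq '=' or $rnext eq $rname;
        next if $tlen == 0 or $seen{$qname}++;       # report each pair once
        my $start = $pos < $pnext ? $pos : $pnext;   # leftmost mate
        my $end   = $start + abs($tlen) - 1;         # assumes TLEN spans the template
        push @hits, [$qname, $start, $end]
            if $start <= $qend && $end >= $qstart;
    }
    return @hits;
}

# Shortened copies of the two tal-2388h09 records shown above.
my @sam = (
    join("\t", 'tal-2388h09',  99, 'tal12', 19016, 205, '*', '=', 27475,  9223, '*', '*'),
    join("\t", 'tal-2388h09', 147, 'tal12', 27475, 205, '*', '=', 19016, -9223, '*', '*'),
);

for my $hit (spanning_pairs('tal12', 22479, 29232, @sam)) {
    print join(' ', @$hit), "\n";
}
```

With the two shortened records this prints "tal-2388h09 19016 28238", the same pair coordinates that the script above reports. The position of each mate is then available from the POS and PNEXT columns of either record.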
From hlapp at drycafe.net Thu Jan 7 11:55:00 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 7 Jan 2010 11:55:00 -0500 Subject: [Bioperl-l] Data missing into Annotation object using Bio::SeqIO (Genbank) In-Reply-To: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu> References: <4B28EB44.3080006@pasteur.fr> <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu> Message-ID: <240F198A-83FA-4304-ACA8-80A702A68D8C@drycafe.net> I don't know to what extent this was followed up on further and I guess it's too long ago to be of much help, but if it hasn't been mentioned before I wanted to point out Bio::SeqFeature::AnnotationAdaptor which integrates tag/value annotation and Bio::Annotation annotation into one AnnotationCollection, so it doesn't matter whether something is attached as a tag or as an annotation object. -hilmar On Dec 16, 2009, at 10:09 AM, Chris Fields wrote: > Emmanuel, > > The previous behavior in the 1.5.x series was to store feature tags > as Bio::Annotation. The problem had been the way this was > implemented was considered unsatisfactory for various reasons, so we > reverted back to using simple tag-value pairs as the default. You > can get at the data this way (from the Feature/Annotation HOWTO):
>
> for my $feat_object ($seq_object->get_SeqFeatures) {
>     print "primary tag: ", $feat_object->primary_tag, "\n";
>     for my $tag ($feat_object->get_all_tags) {
>         print " tag: ", $tag, "\n";
>         for my $value ($feat_object->get_tag_values($tag)) {
>             print " value: ", $value, "\n";
>         }
>     }
> }
>
> You can also convert all the tag-value data into a > Bio::Annotation::Collection using the > Bio::SeqFeature::AnnotationAdaptor, but this is completely optional. > > chris > > On Dec 16, 2009, at 8:14 AM, Emmanuel Quevillon wrote: > >> Hi, >> >> I've wrote a small Genbank parser few months ago before BioPerl >> release 1.6.0. >> I tried to use my code once again but now the output of my parser >> is empty.
>> It looks like Annotation from seqfeatures is not filled anymore. >> >> Here is the code I used previously: >> >> while(my $seq = $streamer->next_seq()){ >> >> #We only want to retrieve CDS features... >> foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq- >> >get_SeqFeatures()){ >> print $ofh join("#", >> $feat->annotation()- >> >get_Annotations('locus_tag'), # Acc num >> $feat->annotation()->get_Annotations('gene') >> ? $feat->annotation()- >> >get_Annotations('gene') # Gene name >> : $feat->annotation()- >> >get_Annotations('locus_tag'), >> $feat->annotation()- >> >get_Annotations('product'), # Description >> ),"\n"; >> } >> } >> >> $feat is a Bio::SeqFeature::Generic object >> >> If I print Dumper($feat->annotation()) here is the output : >> >> $VAR1 = bless( { >> '_typemap' => bless( { >> '_type' => { >> 'comment' => >> 'Bio::Annotation::Comment', >> 'reference' => >> 'Bio::Annotation::Reference', >> 'dblink' => >> 'Bio::Annotation::DBLink' >> } >> }, >> 'Bio::Annotation::TypeManager' ), >> '_annotation' => {} >> }, 'Bio::Annotation::Collection' ); >> >> Have some changes been made into the way annotation object is >> populated? 
>> >> Thanks for any clue and sorry if my question look stupid >> >> Regards >> >> Emmanuel >> >> -- >> ------------------------- >> Emmanuel Quevillon >> Biological Software and Databases Group >> Institut Pasteur >> +33 1 44 38 95 98 >> tuco at_ pasteur dot fr >> ------------------------- >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From rtbio.2009 at gmail.com Fri Jan 8 10:00:21 2010 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Fri, 8 Jan 2010 16:00:21 +0100 Subject: [Bioperl-l] Regarding blast in Bioperl Message-ID: Hello all, I was trying Remote blast using Bioperl. My input data is a Trypanosoma brucei sequence in Fasta format. When I was trying to submit to BLAST using the step $r=$factory->submit_blast($input) It was not returning anything which I checked by debugging the code. It is not blasting my input sequence even though I mentioned all the parameters.I would paste the code below. Please help me in solving put this problem. It is very urgent. Regards Roopa. 
#!/usr/bin/perl #path for extra camel module use lib "/srv/www/htdocs/rain/RNAi/"; use Roopablast; use Bio::SearchIO; use Bio::Search::Result::BlastResult; use Bio::Perl; use Bio::Tools::Run::RemoteBlast; use Bio::Seq; use Bio::SeqIO; use Bio::DB::GenBank; $serverpath = "/srv/www/htdocs/rain/RNAi"; $serverurl = "http://141.84.66.66/rain/RNAi"; $outfile = $serverpath."/rnairesult_".time().".html"; $nuc = $serverpath."/nuc".time().".txt"; $debugfile = $serverpath."/debug_".time().".txt"; $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; my $outstring =""; &parse_form; print "Content-type: text/html\n\n"; print "\n"; print "RNAi Result"; print " \n"; print "\n"; print "\n"; print " Your results will appear here
"; print " Please be patient, runtime can be up to 5 minutes
"; print " This page will automatically reload in 30 seconds. Roopa"; print "\n"; print "\n"; defined(my $pid = fork) or die "Can't fork: $!"; exit if $pid; open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; open(OUTFILE, '>',$outfile); print OUTFILE "\n RNAi Result \n \n \n Your results will appear here
Please be patient, runtime can be up to 5 minutes wait wait wait......
This page will automatically reload in 30 seconds Roopa
\n \n"; close(OUTFILE); @compseqs = blastcode($in{'Inputseq'}); $in{'Inputseq'} =~ s/>.*$//m; $in{'Inputseq'} =~ s/[^TAGC]//gim; $in{'Inputseq'} =~ tr/actg/ACTG/; @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, $in{'Threshold'}); sub blastcode { $inpu1= $_[0]; #$organ= $_[1]; open(NUC,'>',$nuc); print NUC $inpu1; close(NUC); my $prog = 'blastn'; my $db = 'refseq_rna'; my $e_val= '1e-10'; my $organism= 'Trypanosoma Brucei'; $gb = new Bio::DB::GenBank; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO', '-Organism' => $organism ); # open(OUTFILE,'>',$debugfile); # print OUTFILE @params; # close(OUTFILE); my $factory = Bio::Tools::Run::RemoteBlast->new(@params); #change a paramter $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]'; #change a paramter # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; my $v = 1; #$v is just to turn on and off the messages my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => 'Trypanosoma Brucei' ); while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. open(OUTFILE,'>',$debugfile); print OUTFILE $input; close(OUTFILE); my $r = $factory->submit_blast($input); #The program stops here it does not return any value and it does not enter the While loop,Please help me in this regard.# open(OUTFILE,'>',$debugfile); print OUTFILE $r; close(OUTFILE); print STDERR "waiting...." 
if($v>0); while ( my @rids = $factory->each_rid ) { open(OUTFILE,'>',$debugfile); print OUTFILE "while entered"; close(OUTFILE); foreach my $rid ( @rids ) { open(OUTFILE,'>',$debugfile); print OUTFILE "foreach entered"; close(OUTFILE); my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } open(OUTFILE,'>',$debugfile); print OUTFILE "if entered"; close(OUTFILE); print STDERR "." if ( $v > 0 ); sleep 5; } else { open(OUTFILE,'>',$debugfile); print OUTFILE "else entered"; close(OUTFILE); my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $result->next_hit(); close(BLASTDEBUGFILE); my $filename = $serverpath."/blastdata_".time().$result->query_name()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $organism; close(BLASTDEBUGFILE); # open(OUTFILE,'>',$outfile); # print OUTFILE "Test2 $result->database_name()"; # close(OUTFILE); #$hit = $result->next_hit; #open(new,'>',$debugfile); #print $hit; #close(new); while ( my $hit = $result->next_hit ) { next unless ( $v > 0); # open(OUTFILE,'>',$debugfile); # print OUTFILE "$hit in while hits"; # close(OUTFILE); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string push(@seqs,$dna); } } } } } #open(OUTFILE,'>',$debugfile); #print OUTFILE $seqs[0]; #close(OUTFILE); return(@seqs); } open(OUTFILE, '>',$outfile) || die ; print OUTFILE "\n RNAi Result \n \n

Inputsequence:
"; for ($i=0; $i\n"; } } print OUTFILE "

"; $z=@compseqs; for($k=1;$k<$z;$k++) { print OUTFILE "

Compare Sequence:
"; for ($i=0; $i\n"; } } print OUTFILE "

"; } print OUTFILE "

Window:
$in{'Windowsize'}

Threshold:
$in{'Threshold'}

"; my $j=0; for ($i=0; $i{similar}<=$in{'Threshold'}){ $j=$in{'Windowsize'}; } $height=$out[$i]->{similar}*5; } if ($j>0) { print OUTFILE ""; $outstring .= "".substr ($in{'Inputseq'}, $i, 1).""; $j--; } else { print OUTFILE ""; $outstring .= "".substr ($in{'Inputseq'}, $i, 1).""; } if ( ($i+1)%10==0){ $outstring .= " "; } if ( ($i+1)%60==0){ $outstring .= "
\n"; } if ( ($i+1)%800==0){ print OUTFILE "

\n"; } } print OUTFILE "

$outstring"; #foreach (@out) { #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} matchs

";
#if ($_->{similar}<=$in{'Threshold'}){
# }
#}

print OUTFILE "\n\n";

close OUTFILE;

#nameprint();

sub parse_form {
    local ($buffer, @pairs, $pair, $name, $value);
    # Read in text
    $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
    if ($ENV{'REQUEST_METHOD'} eq "POST")
    {
        read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    }
    else
    {
        $buffer = $ENV{'QUERY_STRING'};
    }
    @pairs = split(/&/, $buffer);
    foreach $pair (@pairs)
    {
        ($name, $value) = split(/=/, $pair);
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $in{$name} = $value;
    }
}

From maj at fortinbras.us Fri Jan 8 10:36:41 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 8 Jan 2010 10:36:41 -0500
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To:
References:
Message-ID:

Hi Roopa--

I got your code to work with the following changes:

+# the input should be a valid FASTA file...
...
 open(NUC,'>',$nuc);
+print NUC ">seq (need a name line for valid fasta)\n";
 print NUC $inpu1, "\n";
 close(NUC);
...

+# you can set these header parms in the call itself...
- my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
+ my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => 'Trypanosoma Brucei[ORGN]');

 #change a paramter
+# commented this out...
+# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';

MAJ

----- Original Message -----
From: "Roopa Raghuveer"
To:
Sent: Friday, January 08, 2010 10:00 AM
Subject: [Bioperl-l] Regarding blast in Bioperl

> Hello all,
>
> I was trying remote BLAST using BioPerl. My input data is a Trypanosoma
> brucei sequence in FASTA format. When I tried to submit it to BLAST using
> the step
> $r=$factory->submit_blast($input)
> it did not return anything, which I checked by debugging the code. It is
> not blasting my input sequence even though I specified all the parameters.
> I have pasted the code below.
>
> Please help me in solving this problem. It is very urgent.
>
> Regards
> Roopa.
> > #!/usr/bin/perl > > #path for extra camel module > use lib "/srv/www/htdocs/rain/RNAi/"; > use Roopablast; > > > use Bio::SearchIO; > use Bio::Search::Result::BlastResult; > use Bio::Perl; > use Bio::Tools::Run::RemoteBlast; > use Bio::Seq; > use Bio::SeqIO; > use Bio::DB::GenBank; > > $serverpath = "/srv/www/htdocs/rain/RNAi"; > $serverurl = "http://141.84.66.66/rain/RNAi"; > $outfile = $serverpath."/rnairesult_".time().".html"; > $nuc = $serverpath."/nuc".time().".txt"; > $debugfile = $serverpath."/debug_".time().".txt"; > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > my $outstring =""; > > &parse_form; > > print "Content-type: text/html\n\n"; > print "\n"; > print "RNAi Result"; > print " URL=$serverurl/rnairesult_".time().".html\"> \n"; > print "\n"; > print "\n"; > print " Your results will appear href=$serverurl/rnairesult_".time().".html>here
"; > print " Please be patient, runtime can be up to 5 minutes
"; > print " This page will automatically reload in 30 seconds. Roopa"; > print "\n"; > print "\n"; > > defined(my $pid = fork) or die "Can't fork: $!"; > exit if $pid; > open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; > open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; > open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; > > > > open(OUTFILE, '>',$outfile); > > print OUTFILE "\n > RNAi Result > URL=$serverurl//rnairesult_".time().".html\"> \n > > \n > \n > Your results will appear href=$serverurl/rnairesult_".time().".html>here
> Please be patient, runtime can be up to 5 minutes wait wait wait......
> This page will automatically reload in 30 seconds Roopa
> \n > \n"; > > close(OUTFILE); > > > @compseqs = blastcode($in{'Inputseq'}); > > $in{'Inputseq'} =~ s/>.*$//m; > $in{'Inputseq'} =~ s/[^TAGC]//gim; > $in{'Inputseq'} =~ tr/actg/ACTG/; > > @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, > $in{'Threshold'}); > > > sub blastcode > { > > $inpu1= $_[0]; > > #$organ= $_[1]; > > open(NUC,'>',$nuc); > print NUC $inpu1; > close(NUC); > > my $prog = 'blastn'; > my $db = 'refseq_rna'; > my $e_val= '1e-10'; > my $organism= 'Trypanosoma Brucei'; > > $gb = new Bio::DB::GenBank; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO', > '-Organism' => $organism ); > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE @params; > # close(OUTFILE); > > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > #change a paramter > > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma > Brucei[ORGN]'; > > #change a paramter > # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; > > my $v = 1; > #$v is just to turn on and off the messages > > my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , > '-organism' => 'Trypanosoma Brucei' ); > > > while (my $input = $str->next_seq()) > { > #Blast a sequence against a database: > #Alternatively, you could pass in a file with many > #sequences rather than loop through sequence one at a time > #Remove the loop starting 'while (my $input = $str->next_seq())' > #and swap the two lines below for an example of that. > > open(OUTFILE,'>',$debugfile); > print OUTFILE $input; > close(OUTFILE); > > > my $r = $factory->submit_blast($input); #The program stops here it > does not return any value and it does not enter the While loop,Please help > me in this regard.# > open(OUTFILE,'>',$debugfile); > print OUTFILE $r; > close(OUTFILE); > > > print STDERR "waiting...." 
if($v>0); > > while ( my @rids = $factory->each_rid ) { > open(OUTFILE,'>',$debugfile); > print OUTFILE "while entered"; > close(OUTFILE); > foreach my $rid ( @rids ) { > > open(OUTFILE,'>',$debugfile); > print OUTFILE "foreach entered"; > close(OUTFILE); > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > open(OUTFILE,'>',$debugfile); > print OUTFILE "if entered"; > close(OUTFILE); > print STDERR "." if ( $v > 0 ); > sleep 5; > } > else { > open(OUTFILE,'>',$debugfile); > print OUTFILE "else entered"; > close(OUTFILE); > > my $result = $rc->next_result(); > #save the output > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $result->next_hit(); > close(BLASTDEBUGFILE); > > my $filename = > $serverpath."/blastdata_".time().$result->query_name()."\.out"; > > # open(DEBUGFILE,'>',$debugfile); > # open(new,'>',$filename); > # @arra=; > # print DEBUGFILE @arra; > # close(DEBUGFILE); > # close(new); > > $factory->save_output($filename); > > # open(BLASTDEBUGFILE,'>',$debugfile); > # print BLASTDEBUGFILE "Hello $rid"; > # close(BLASTDEBUGFILE); > > $factory->remove_rid($rid); > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $organism; > close(BLASTDEBUGFILE); > > # open(OUTFILE,'>',$outfile); > # print OUTFILE "Test2 $result->database_name()"; > # close(OUTFILE); > > #$hit = $result->next_hit; > #open(new,'>',$debugfile); > #print $hit; > #close(new); > > while ( my $hit = $result->next_hit ) { > > next unless ( $v > 0); > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "$hit in while hits"; > # close(OUTFILE); > > my $sequ = $gb->get_Seq_by_version($hit->name); > my $dna = $sequ->seq(); # get the sequence as a string > push(@seqs,$dna); > } > } > } > } > } > > #open(OUTFILE,'>',$debugfile); > #print OUTFILE $seqs[0]; > #close(OUTFILE); > > return(@seqs); > > } > > open(OUTFILE, '>',$outfile) || 
die ; > > print OUTFILE "\n > RNAi Result > \n > \n >

> Inputsequence:
"; > > for ($i=0; $i > print OUTFILE substr ($in{'Inputseq'}, $i, 1); > > if ( ($i+1)%10==0){ > print OUTFILE " "; > } > if ( ($i+1)%60==0){ > print OUTFILE "
\n"; > } > } > > > > print OUTFILE "

"; > > $z=@compseqs; > > for($k=1;$k<$z;$k++) { > print OUTFILE "

Compare > Sequence:
"; > > for ($i=0; $i > print OUTFILE substr ($compseqs[$k], $i, 1); > > if ( ($i+1)%10==0){ > print OUTFILE " "; > } > if ( ($i+1)%60==0){ > print OUTFILE "
\n"; > } > } > print OUTFILE "

"; > } > > print OUTFILE "

> Window:
$in{'Windowsize'} >

>

> Threshold:
$in{'Threshold'} >

"; > my $j=0; > > for ($i=0; $i > if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){ > if ($out[$i]->{similar}<=$in{'Threshold'}){ > $j=$in{'Windowsize'}; > } > $height=$out[$i]->{similar}*5; > } > > if ($j>0) { > print OUTFILE " height=\"5\">"; > $outstring .= "".substr ($in{'Inputseq'}, $i, > 1).""; > $j--; > } > else { > print OUTFILE " height=\"5\">"; > $outstring .= "".substr ($in{'Inputseq'}, $i, > 1).""; > } > > if ( ($i+1)%10==0){ > $outstring .= " "; > } > if ( ($i+1)%60==0){ > $outstring .= "
\n"; > > } > if ( ($i+1)%800==0){ > print OUTFILE "

\n"; > > } > } > > print OUTFILE "

set\">$outstring"; > > #foreach (@out) { > #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} matchs

";
> #if ($_->{similar}<=$in{'Threshold'}){
>
> # }
> #}
>
> print OUTFILE "\n\n";
>
> close OUTFILE;
>
> #nameprint();
>
> sub parse_form {
>     local ($buffer, @pairs, $pair, $name, $value);
>     # Read in text
>     $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
>     if ($ENV{'REQUEST_METHOD'} eq "POST")
>     {
>         read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>     }
>     else
>     {
>         $buffer = $ENV{'QUERY_STRING'};
>     }
>     @pairs = split(/&/, $buffer);
>     foreach $pair (@pairs)
>     {
>         ($name, $value) = split(/=/, $pair);
>         $value =~ tr/+/ /;
>         $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
>         $in{$name} = $value;
>     }
> }

From julian.onions at gmail.com Fri Jan 8 11:53:50 2010
From: julian.onions at gmail.com (Julian Onions)
Date: Fri, 8 Jan 2010 16:53:50 +0000
Subject: [Bioperl-l] Cladogram construction
Message-ID:

Does anyone have sample code for building cladograms from Pars-type input (Pars is one of the PHYLIP tools), or from any other format?
I've got something sort of working, but I get no weights on the tree: everything appears as nan.
I'd also like to set one of the species as an outgroup.
This is the closest sample I've found so far.

#!/usr/bin/perl -w
use strict;
use Bio::AlignIO;
use Bio::Tree::DistanceFactory;
use Bio::Align::ProteinStatistics;
use Bio::TreeIO;
use Bio::Tree::Draw::Cladogram;

my $alnfile = shift @ARGV || die "need a file to run";
my $input= Bio::AlignIO->new(-format => 'fasta', -file => $alnfile);
if( my $aln = $input->next_aln ) {
    my $dfactory = Bio::Tree::DistanceFactory->new(-method => 'NJ');
    my $stats = Bio::Align::ProteinStatistics->new;
    my $distmat = $stats->distance(-align => $aln, -method => 'Kimura');
    my $treeout = Bio::TreeIO->new(-format => 'newick');
    my $tree = $dfactory->make_tree($distmat);
    $treeout->write_tree($tree);
    my $obj1 = Bio::Tree::Draw::Cladogram->new(-tree => $tree, -compact => 0);
    $obj1->print(-file => "tree.eps");
} else {
    die "could not find any alignments in the file $alnfile";
}

Pars input looks like

3 4
Robin 101
Blackbird 100
Sparrow 100

Thanks,
Julian.

From rtbio.2009 at gmail.com Sat Jan 9 11:57:09 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Sat, 9 Jan 2010 17:57:09 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To:
References:
Message-ID:

Hello all,

Thanks a lot for your reply, Mark. It worked with Trypanosoma brucei as the organism parameter, but when I tried to take the Organism parameter from the user, it did not work, i.e., I was unable to get the target sequences. Please help me in this regard.
My code is #!/usr/bin/perl #path for extra camel module use lib "/srv/www/htdocs/rain/RNAi/"; use Roopablast; use Bio::SearchIO; use Bio::Search::Result::BlastResult; use Bio::Perl; use Bio::Tools::Run::RemoteBlast; use Bio::Seq; use Bio::SeqIO; use Bio::DB::GenBank; $serverpath = "/srv/www/htdocs/rain/RNAi"; $serverurl = "http://141.84.66.66/rain/RNAi"; $outfile = $serverpath."/rnairesult_".time().".html"; $nuc = $serverpath."/nuc".time().".txt"; $debugfile = $serverpath."/debug_".time().".txt"; $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; my $outstring =""; &parse_form; print "Content-type: text/html\n\n"; print "\n"; print "RNAi Result"; print " \n"; print "\n"; print "\n"; print " Your results will appear here
"; print " Please be patient, runtime can be up to 5 minutes
"; print " This page will automatically reload in 30 seconds. Roopa"; print "\n"; print "\n"; defined(my $pid = fork) or die "Can't fork: $!"; exit if $pid; open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; open(OUTFILE, '>',$outfile); print OUTFILE "\n RNAi Result \n \n \n Your results will appear here
Please be patient, runtime can be up to 5 minutes wait wait wait......
This page will automatically reload in 30 seconds Roopa
\n \n"; close(OUTFILE); @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); $in{'Inputseq'} =~ s/>.*$//m; $in{'Inputseq'} =~ s/[^TAGC]//gim; $in{'Inputseq'} =~ tr/actg/ACTG/; @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, $in{'Threshold'}); sub blastcode { $inpu1= $_[0]; $organ= $_[1]; open(NUC,'>',$nuc); print NUC $inpu1,"\n"; close(NUC); my $prog = 'blastn'; my $db = 'refseq_rna'; my $e_val= '1e-10'; my $organism= $organ; $gb = new Bio::DB::GenBank; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO', '-Organism' => $organism ); open(OUTFILE,'>',$debugfile); print OUTFILE $inpu1; close(OUTFILE); my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => '$organ[ORGN]'); #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); #change a paramter #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]'; #change a paramter # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; my $v = 1; #$v is just to turn on and off the messages my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => $organ ); while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. #open(OUTFILE,'>',$debugfile); # print OUTFILE $input; #close(OUTFILE); my $r = $factory->submit_blast($input); open(OUTFILE,'>',$debugfile); # print OUTFILE $r; close(OUTFILE); print STDERR "waiting...." 
if($v>0);

  while ( my @rids = $factory->each_rid ) {
# open(OUTFILE,'>',$debugfile);
# print OUTFILE "while entered";
# close(OUTFILE);
    foreach my $rid ( @rids ) {

# open(OUTFILE,'>',$debugfile);
# print OUTFILE "foreach entered";
# close(OUTFILE);

      my $rc = $factory->retrieve_blast($rid);

      if( !ref($rc) )
      {
        if( $rc < 0 )
        {
          $factory->remove_rid($rid);
        }
        open(OUTFILE,'>',$debugfile);
# print OUTFILE "if entered";
        close(OUTFILE);
        print STDERR "." if ( $v > 0 );
        sleep 5;
      }
      else {
# open(OUTFILE,'>',$debugfile);
# print OUTFILE "else entered";
# close(OUTFILE);

        my $result = $rc->next_result();
        #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

        open(BLASTDEBUGFILE,'>',$blastdebugfile);
        print BLASTDEBUGFILE $result->next_hit();
        close(BLASTDEBUGFILE);

        my $filename = $serverpath."/blastdata_".time().$result->query_name()."\.out";

# open(DEBUGFILE,'>',$debugfile);
# open(new,'>',$filename);
# @arra=;
# print DEBUGFILE @arra;
# close(DEBUGFILE);
# close(new);

        $factory->save_output($filename);

# open(BLASTDEBUGFILE,'>',$debugfile);
# print BLASTDEBUGFILE "Hello $rid";
# close(BLASTDEBUGFILE);

        $factory->remove_rid($rid);

        open(BLASTDEBUGFILE,'>',$blastdebugfile);
        print BLASTDEBUGFILE $organism;
        close(BLASTDEBUGFILE);

# open(OUTFILE,'>',$outfile);
# print OUTFILE "Test2 $result->database_name()";
# close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);

        while ( my $hit = $result->next_hit ) {

          next unless ( $v > 0);

# open(OUTFILE,'>',$debugfile);
# print OUTFILE "$hit in while hits";
# close(OUTFILE);

          my $sequ = $gb->get_Seq_by_version($hit->name);
          my $dna = $sequ->seq(); # get the sequence as a string
          push(@seqs,$dna);
        }
      }
    }
  }
}

#open(OUTFILE,'>',$debugfile);
#print OUTFILE $seqs[0];
#close(OUTFILE);

return(@seqs);

}

Regards,
Roopa.

From maj at fortinbras.us Sat Jan 9 13:05:41 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 9 Jan 2010 13:05:41 -0500
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To:
References:
Message-ID: <4C2E8133F916495B876628EF3E8FCBB2@NewLife>

I see it immediately (from making the same bug many times): single quotes do not interpolate $organ.

 my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
-'$organ[ORGN]');
+"${organ}[ORGN]");   # braces keep Perl from reading $organ[ORGN] as an element of @organ

MAJ

----- Original Message -----
From: "Roopa Raghuveer"
To: "Mark A. Jensen"
Cc:
Sent: Saturday, January 09, 2010 11:57 AM
Subject: Re: [Bioperl-l] Regarding blast in Bioperl

> Hello all,
>
> Thanks a lot for your reply, Mark. It worked with Trypanosoma brucei as
> the organism parameter, but when I tried to take the Organism parameter from
> the user, it did not work, i.e., I was unable to get the target sequences.
> Please help me in this regard.
My code is > > #!/usr/bin/perl > > #path for extra camel module > use lib "/srv/www/htdocs/rain/RNAi/"; > use Roopablast; > > > use Bio::SearchIO; > use Bio::Search::Result::BlastResult; > use Bio::Perl; > use Bio::Tools::Run::RemoteBlast; > use Bio::Seq; > use Bio::SeqIO; > use Bio::DB::GenBank; > > $serverpath = "/srv/www/htdocs/rain/RNAi"; > $serverurl = "http://141.84.66.66/rain/RNAi"; > $outfile = $serverpath."/rnairesult_".time().".html"; > $nuc = $serverpath."/nuc".time().".txt"; > $debugfile = $serverpath."/debug_".time().".txt"; > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > my $outstring =""; > > &parse_form; > > print "Content-type: text/html\n\n"; > print "\n"; > print "RNAi Result"; > print " URL=$serverurl/rnairesult_".time().".html\"> \n"; > print "\n"; > print "\n"; > print " Your results will appear href=$serverurl/rnairesult_".time().".html>here
"; > print " Please be patient, runtime can be up to 5 minutes
"; > print " This page will automatically reload in 30 seconds. Roopa"; > print "\n"; > print "\n"; > > defined(my $pid = fork) or die "Can't fork: $!"; > exit if $pid; > open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; > open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; > open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; > > open(OUTFILE, '>',$outfile); > > print OUTFILE "\n > RNAi Result > URL=$serverurl//rnairesult_".time().".html\"> \n > > \n > \n > Your results will appear href=$serverurl/rnairesult_".time().".html>here
> Please be patient, runtime can be up to 5 minutes wait wait wait......
> This page will automatically reload in 30 seconds Roopa
> \n > \n"; > > close(OUTFILE); > > > @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); > > $in{'Inputseq'} =~ s/>.*$//m; > $in{'Inputseq'} =~ s/[^TAGC]//gim; > $in{'Inputseq'} =~ tr/actg/ACTG/; > > @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, > $in{'Threshold'}); > > > sub blastcode > { > > $inpu1= $_[0]; > > $organ= $_[1]; > > open(NUC,'>',$nuc); > print NUC $inpu1,"\n"; > close(NUC); > > my $prog = 'blastn'; > my $db = 'refseq_rna'; > my $e_val= '1e-10'; > my $organism= $organ; > > $gb = new Bio::DB::GenBank; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO', > '-Organism' => $organism ); > > open(OUTFILE,'>',$debugfile); > print OUTFILE $inpu1; > close(OUTFILE); > > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => > '$organ[ORGN]'); > > #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > #change a paramter > > #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma > Brucei[ORGN]'; > > #change a paramter > # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; > > my $v = 1; > #$v is just to turn on and off the messages > > my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , > '-organism' => $organ ); > > > while (my $input = $str->next_seq()) > { > #Blast a sequence against a database: > #Alternatively, you could pass in a file with many > #sequences rather than loop through sequence one at a time > #Remove the loop starting 'while (my $input = $str->next_seq())' > #and swap the two lines below for an example of that. > > #open(OUTFILE,'>',$debugfile); > # print OUTFILE $input; > #close(OUTFILE); > > > my $r = $factory->submit_blast($input); > > open(OUTFILE,'>',$debugfile); > # print OUTFILE $r; > close(OUTFILE); > > print STDERR "waiting...." 
if($v>0); > > while ( my @rids = $factory->each_rid ) { > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "while entered"; > # close(OUTFILE); > foreach my $rid ( @rids ) { > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "foreach entered"; > # close(OUTFILE); > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > open(OUTFILE,'>',$debugfile); > # print OUTFILE "if entered"; > close(OUTFILE); > print STDERR "." if ( $v > 0 ); > sleep 5; > } > else { > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "else entered"; > # close(OUTFILE); > > my $result = $rc->next_result(); > #save the output > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $result->next_hit(); > close(BLASTDEBUGFILE); > > my $filename = > $serverpath."/blastdata_".time().$result->query_name()."\.out"; > > # open(DEBUGFILE,'>',$debugfile); > # open(new,'>',$filename); > # @arra=; > # print DEBUGFILE @arra; > # close(DEBUGFILE); > # close(new); > > $factory->save_output($filename); > # open(BLASTDEBUGFILE,'>',$debugfile); > # print BLASTDEBUGFILE "Hello $rid"; > # close(BLASTDEBUGFILE); > > $factory->remove_rid($rid); > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $organism; > close(BLASTDEBUGFILE); > > # open(OUTFILE,'>',$outfile); > # print OUTFILE "Test2 $result->database_name()"; > # close(OUTFILE); > > #$hit = $result->next_hit; > #open(new,'>',$debugfile); > #print $hit; > #close(new); > > while ( my $hit = $result->next_hit ) { > > next unless ( $v > 0); > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "$hit in while hits"; > # close(OUTFILE); > > my $sequ = $gb->get_Seq_by_version($hit->name); > my $dna = $sequ->seq(); # get the sequence as a string > push(@seqs,$dna); > } > } > } > } > } > > #open(OUTFILE,'>',$debugfile); > #print OUTFILE $seqs[0]; > #close(OUTFILE); > > return(@seqs); > > } > > Regards, > 
Roopa. > > > On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen wrote: > >> Hi Roopa-- >> >> I got your code to work with the following changes: >> >> +# the input should be a valid FASTA file... >> ... >> open(NUC,'>',$nuc); >> +print NUC ">seq (need a name line for valid fasta)\n"; >> print NUC $inpu1, "\n"; >> close(NUC); >> ... >> >> +# you can set these header parms in the call itself... >> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => >> ''Trypanosoma Brucei[ORGN]'); >> >> #change a paramter >> +# commented this out... >> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >> Brucei[ORGN]'; >> >> MAJ >> ----- Original Message ----- From: "Roopa Raghuveer" > > >> To: >> Sent: Friday, January 08, 2010 10:00 AM >> Subject: [Bioperl-l] Regarding blast in Bioperl >> >> >> Hello all, >>> >>> I was trying Remote blast using Bioperl. My input data is a Trypanosoma >>> brucei sequence in Fasta format. When I was trying to submit to BLAST >>> using >>> the step >>> $r=$factory->submit_blast($input) >>> It was not returning anything which I checked by debugging the code. It is >>> not blasting my input sequence even though I mentioned all the >>> parameters.I >>> would paste the code below. >>> >>> Please help me in solving put this problem. It is very urgent. >>> >>> Regards >>> Roopa. 
>>> >>> #!/usr/bin/perl >>> >>> #path for extra camel module >>> use lib "/srv/www/htdocs/rain/RNAi/"; >>> use Roopablast; >>> >>> >>> use Bio::SearchIO; >>> use Bio::Search::Result::BlastResult; >>> use Bio::Perl; >>> use Bio::Tools::Run::RemoteBlast; >>> use Bio::Seq; >>> use Bio::SeqIO; >>> use Bio::DB::GenBank; >>> >>> $serverpath = "/srv/www/htdocs/rain/RNAi"; >>> $serverurl = "http://141.84.66.66/rain/RNAi"; >>> $outfile = $serverpath."/rnairesult_".time().".html"; >>> $nuc = $serverpath."/nuc".time().".txt"; >>> $debugfile = $serverpath."/debug_".time().".txt"; >>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>> >>> my $outstring =""; >>> >>> &parse_form; >>> >>> print "Content-type: text/html\n\n"; >>> print "\n"; >>> print "RNAi Result"; >>> print ">> URL=$serverurl/rnairesult_".time().".html\"> \n"; >>> print "\n"; >>> print "\n"; >>> print " Your results will appear >> href=$serverurl/rnairesult_".time().".html>here
"; >>> print " Please be patient, runtime can be up to 5 minutes
"; >>> print " This page will automatically reload in 30 seconds. Roopa"; >>> print "\n"; >>> print "\n"; >>> >>> defined(my $pid = fork) or die "Can't fork: $!"; >>> exit if $pid; >>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >>> >>> >>> >>> open(OUTFILE, '>',$outfile); >>> >>> print OUTFILE "\n >>> RNAi Result >>> >> URL=$serverurl//rnairesult_".time().".html\"> \n >>> >>> \n >>> \n >>> Your results will appear >> href=$serverurl/rnairesult_".time().".html>here
>>> Please be patient, runtime can be up to 5 minutes wait wait >>> wait......
>>> This page will automatically reload in 30 seconds Roopa
>>> \n >>> \n"; >>> >>> close(OUTFILE); >>> >>> >>> @compseqs = blastcode($in{'Inputseq'}); >>> >>> $in{'Inputseq'} =~ s/>.*$//m; >>> $in{'Inputseq'} =~ s/[^TAGC]//gim; >>> $in{'Inputseq'} =~ tr/actg/ACTG/; >>> >>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >>> $in{'Threshold'}); >>> >>> >>> sub blastcode >>> { >>> >>> $inpu1= $_[0]; >>> >>> #$organ= $_[1]; >>> >>> open(NUC,'>',$nuc); >>> print NUC $inpu1; >>> close(NUC); >>> >>> my $prog = 'blastn'; >>> my $db = 'refseq_rna'; >>> my $e_val= '1e-10'; >>> my $organism= 'Trypanosoma Brucei'; >>> >>> $gb = new Bio::DB::GenBank; >>> >>> my @params = ( '-prog' => $prog, >>> '-data' => $db, >>> '-expect' => $e_val, >>> '-readmethod' => 'SearchIO', >>> '-Organism' => $organism ); >>> >>> # open(OUTFILE,'>',$debugfile); >>> # print OUTFILE @params; >>> # close(OUTFILE); >>> >>> >>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>> >>> #change a paramter >>> >>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>> Brucei[ORGN]'; >>> >>> #change a paramter >>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; >>> >>> my $v = 1; >>> #$v is just to turn on and off the messages >>> >>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >>> '-organism' => 'Trypanosoma Brucei' ); >>> >>> >>> while (my $input = $str->next_seq()) >>> { >>> #Blast a sequence against a database: >>> #Alternatively, you could pass in a file with many >>> #sequences rather than loop through sequence one at a time >>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>> #and swap the two lines below for an example of that. 
>>> >>> open(OUTFILE,'>',$debugfile); >>> print OUTFILE $input; >>> close(OUTFILE); >>> >>> >>> my $r = $factory->submit_blast($input); #The program stops here it >>> does not return any value and it does not enter the While loop,Please help >>> me in this regard.# >>> open(OUTFILE,'>',$debugfile); >>> print OUTFILE $r; >>> close(OUTFILE); >>> >>> >>> print STDERR "waiting...." if($v>0); >>> >>> while ( my @rids = $factory->each_rid ) { >>> open(OUTFILE,'>',$debugfile); >>> print OUTFILE "while entered"; >>> close(OUTFILE); >>> foreach my $rid ( @rids ) { >>> >>> open(OUTFILE,'>',$debugfile); >>> print OUTFILE "foreach entered"; >>> close(OUTFILE); >>> >>> my $rc = $factory->retrieve_blast($rid); >>> >>> if( !ref($rc) ) >>> { >>> if( $rc < 0 ) >>> { >>> $factory->remove_rid($rid); >>> } >>> open(OUTFILE,'>',$debugfile); >>> print OUTFILE "if entered"; >>> close(OUTFILE); >>> print STDERR "." if ( $v > 0 ); >>> sleep 5; >>> } >>> else { >>> open(OUTFILE,'>',$debugfile); >>> print OUTFILE "else entered"; >>> close(OUTFILE); >>> >>> my $result = $rc->next_result(); >>> #save the output >>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>> >>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>> print BLASTDEBUGFILE $result->next_hit(); >>> close(BLASTDEBUGFILE); >>> >>> my $filename = >>> $serverpath."/blastdata_".time().$result->query_name()."\.out"; >>> >>> # open(DEBUGFILE,'>',$debugfile); >>> # open(new,'>',$filename); >>> # @arra=; >>> # print DEBUGFILE @arra; >>> # close(DEBUGFILE); >>> # close(new); >>> >>> $factory->save_output($filename); >>> >>> # open(BLASTDEBUGFILE,'>',$debugfile); >>> # print BLASTDEBUGFILE "Hello $rid"; >>> # close(BLASTDEBUGFILE); >>> >>> $factory->remove_rid($rid); >>> >>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>> print BLASTDEBUGFILE $organism; >>> close(BLASTDEBUGFILE); >>> >>> # open(OUTFILE,'>',$outfile); >>> # print OUTFILE "Test2 $result->database_name()"; >>> # close(OUTFILE); >>> >>> #$hit = $result->next_hit; >>> 
#open(new,'>',$debugfile); >>> #print $hit; >>> #close(new); >>> >>> while ( my $hit = $result->next_hit ) { >>> >>> next unless ( $v > 0); >>> >>> # open(OUTFILE,'>',$debugfile); >>> # print OUTFILE "$hit in while hits"; >>> # close(OUTFILE); >>> >>> my $sequ = $gb->get_Seq_by_version($hit->name); >>> my $dna = $sequ->seq(); # get the sequence as a string >>> push(@seqs,$dna); >>> } >>> } >>> } >>> } >>> } >>> >>> #open(OUTFILE,'>',$debugfile); >>> #print OUTFILE $seqs[0]; >>> #close(OUTFILE); >>> >>> return(@seqs); >>> >>> } >>> >>> open(OUTFILE, '>',$outfile) || die ; >>> >>> print OUTFILE "\n >>> RNAi Result >>> \n >>> \n >>>

>>> Inputsequence:
"; >>> >>> for ($i=0; $i>> >>> print OUTFILE substr ($in{'Inputseq'}, $i, 1); >>> >>> if ( ($i+1)%10==0){ >>> print OUTFILE " "; >>> } >>> if ( ($i+1)%60==0){ >>> print OUTFILE "
\n"; >>> } >>> } >>> >>> >>> >>> print OUTFILE "

"; >>> >>> $z=@compseqs; >>> >>> for($k=1;$k<$z;$k++) { >>> print OUTFILE "

Compare >>> Sequence:
"; >>> >>> for ($i=0; $i>> >>> print OUTFILE substr ($compseqs[$k], $i, 1); >>> >>> if ( ($i+1)%10==0){ >>> print OUTFILE " "; >>> } >>> if ( ($i+1)%60==0){ >>> print OUTFILE "
\n"; >>> } >>> } >>> print OUTFILE "

"; >>> } >>> >>> print OUTFILE "

>>> Window:
$in{'Windowsize'} >>>

>>>

>>> Threshold:
$in{'Threshold'} >>>

"; >>> my $j=0; >>> >>> for ($i=0; $i>> >>> if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){ >>> if ($out[$i]->{similar}<=$in{'Threshold'}){ >>> $j=$in{'Windowsize'}; >>> } >>> $height=$out[$i]->{similar}*5; >>> } >>> >>> if ($j>0) { >>> print OUTFILE ">> height=\"5\">"; >>> $outstring .= "".substr ($in{'Inputseq'}, $i, >>> 1).""; >>> $j--; >>> } >>> else { >>> print OUTFILE ">> height=\"5\">"; >>> $outstring .= "".substr ($in{'Inputseq'}, $i, >>> 1).""; >>> } >>> >>> if ( ($i+1)%10==0){ >>> $outstring .= " "; >>> } >>> if ( ($i+1)%60==0){ >>> $outstring .= "
\n"; >>> >>> } >>> if ( ($i+1)%800==0){ >>> print OUTFILE "

\n"; >>> >>> } >>> } >>> >>> print OUTFILE "

>> set\">$outstring"; >>> >>> #foreach (@out) { >>> #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} matchs

"; >>> #if ($_->{similar}<=$in{'Threshold'}){ >>> >>> # } >>> #} >>> >>> print OUTFILE "\n\n"; >>> >>> close OUTFILE; >>> >>> #nameprint(); >>> >>> sub parse_form { >>> local ($buffer, @pairs, $pair, $name, $value); >>> # Read in text >>> $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/; >>> if ($ENV{'REQUEST_METHOD'} eq "POST") >>> { >>> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); >>> } >>> else >>> { >>> $buffer = $ENV{'QUERY_STRING'}; >>> } >>> @pairs = split(/&/, $buffer); >>> foreach $pair (@pairs) >>> { >>> ($name, $value) = split(/=/, $pair); >>> $value =~ tr/+/ /; >>> $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; >>> $in{$name} = $value; >>> } >>> } >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From robert.bradbury at gmail.com Sat Jan 9 14:52:53 2010 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Sat, 9 Jan 2010 14:52:53 -0500 Subject: [Bioperl-l] Regarding blast in Bioperl In-Reply-To: References: Message-ID: Roopa, Mark is correct, you have to be very careful of single vs. double quotes in perl. Double quoted strings are "interpreted" while single quoted strings are taken literally is my current understanding. I tried to run your script (with fixes) but without the supporting files it appears to be impossible. What I am curious about is what it is trying to do, I was particularly i particularly intrigued by some apparent efforts to parse blast results into color enhanced HTML and without thinking about the code in detail it seems easier to simply ask what you are trying to do? 
I find "classical" BLAST results particularly tedious, and I long for BLAST results that display concise information the way the NCBI HomoloGene cross-species comparisons do. Unfortunately, NCBI has deemed its methods (I have asked them) "too complex to disclose" (for a person comfortable with assembly language, or even gate-level electronics, "too complex" is a very relative concept). One has the option of using NCBI, with a limited number of species but good display methodologies, or Ensembl, with many more species but less desirable display methodologies (a phylogenetic tree derived from cross-species comparisons). For the WRN protein, which may play a key role in aging (through the activity of its exonuclease domain mutating DNA sequences and inducing microdeletions and microinsertions), this becomes important because it appears that the *C. elegans* genome is missing the exonuclease domain (so it may be useless from the perspective of studying aging), and the other 4 nematode species that have been sequenced are in neither the NCBI nor the Ensembl databases. Needless to say, if we manage in the near future, given the drop in sequencing costs, to sequence the nematodes that are freeze/thaw tolerant (freezing and thawing induces DSBs that have to be repaired), those genomes are unlikely to be in the NCBI/Ensembl databases either. So users need to develop the ability to mix and match public and obscure databases in creative ways to produce easy-to-interpret information. Robert Bradbury From robert.bradbury at gmail.com Sat Jan 9 15:27:54 2010 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Sat, 9 Jan 2010 15:27:54 -0500 Subject: [Bioperl-l] Ensembl problems Message-ID: I am trying to get the examples provided by EMBL/Ensembl to work and am encountering problems. For example, about 1/3 of the way through the Compara API tutorial [1] there is what is supposed to be a completely functional script. It does not work.
This is in contrast to some of the earlier simple scripts (listing the species in Ensembl, etc.), which do work on my machine (so I have all the libraries installed correctly). It is very poor form to document scripts which do not function on a properly set-up system. I have modified my invocation of the script slightly: Align.pl --set_of_species \ "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis familiaris:Sus scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae" which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" on an undefined value at ./Align.pl line 132." (Align.pl is my slightly modified version of the Compara tutorial code.) As these are slightly modified Perl scripts from the documentation, the line numbers may vary. This happens in spite of the fact that just before that call I dumped "genome_dbs" and got back some 25 hash tables of genome names, as expected. I believe this occurs whether one is comparing "human:mouse" or the more complex species set I have outlined above. Has anyone else attempted to run the code documented in the Ensembl API Tutorial? Any suggestions as to what direction to go in would be appreciated (when one is trying to copy code out of a tutorial and it fails, it's kind of hard to know where to go). There do appear to be some problems in the specification of a Compara version/database, and there don't appear to be many resources describing what is currently available. Robert 1.
http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html From ak at ebi.ac.uk Sat Jan 9 17:01:21 2010 From: ak at ebi.ac.uk (Andreas Kähäri) Date: Sat, 9 Jan 2010 22:01:21 +0000 Subject: [Bioperl-l] Ensembl problems In-Reply-To: References: Message-ID: <20100109220121.GA9521@quux.windows.ebi.ac.uk> On Sat, Jan 09, 2010 at 03:27:54PM -0500, Robert Bradbury wrote: > I am trying to get the examples provided by EMBL/Ensembl to work and am > encountering problems. Hi Robert, The ensembl-dev list is the appropriate forum for this type of question, as it has nothing to do with BioPerl. There is also the Ensembl helpdesk. If you send your problem to I'm sure that it will be picked up by the appropriate people (I myself do not know enough about the Compara API to be able to diagnose this problem straight away, I'm afraid). Be sure to submit a minimal script that still exhibits the problem, and information about what version of the APIs you're using (we will assume that you're not mixing a newer version of the API with older databases or vice versa). We are generally very happy to have bugs in documentation or code pointed out to us, and will correct errors as we are made aware of them. Kind regards, Andreas > For example, about 1/3 of the way through the Compara API tutorial [1] there > is what is supposed to be a completely functional script. It does not > work. This is in contrast to some of the earlier simple scripts (listing > the species in Ensembl etc.) which do work on my machine, so I have all the > libraries do dah installed correctly). > > Very poor form to document scripts which do not function on a properly setup > system.
> > I have modified my invocation of the script slightly: > Align.pl --set_of_species \ > "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur > garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta > africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis familiaris:Sus > scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus > tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia > belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae" > > which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" on > an undefined value at ./Align.pl line 132." (Align.pl is my slightly > modified example of the Compara Tutoraial code.) > As these are slightly modified perl scripts from the documantation, the line > numbers may be variable. > > I can print out the genome_dbs, and it gives me a list of genome names (hash > tables) though it appears that is problematic in the Align.pl script. > in spite of the fact that just previously to that call I dumped "genome_dbs" > and got back some 25 hash tables (expected). I believe this occurs whether > one is comparing "human:mouse" or the more complex species set I have > outlined above. > > > > Has anyone else attempted to run the code documented in the Ensembl API > Tutorial? > Any suggestions as to what direction to go in would be appreciated -- when > one is trying to copy code out of a tutorial and it fails its kind of hard > to know where to go.) > > There do appear to be some problems in the specifications of a Compara > version/database and there don't appear to be a lot of resources informing > one of what resources are currently available. > > Robert > > > 1. 
http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html -- Andreas Kähäri, Ensembl Software Developer European Bioinformatics Institute (EMBL-EBI) Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, United Kingdom From cjfields at illinois.edu Sat Jan 9 17:01:19 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 9 Jan 2010 16:01:19 -0600 Subject: [Bioperl-l] Ensembl problems In-Reply-To: References: Message-ID: <743C998D-BBB5-4832-BA25-24D7D7288F78@illinois.edu> Robert, Ensembl errors should probably be redirected to the Ensembl mailing list. I can't speak to the problems with it (they appear specific to the Ensembl tool set). chris On Jan 9, 2010, at 2:27 PM, Robert Bradbury wrote: > I am trying to get the examples provided by EMBL/Ensembl to work and am > encountering problems. > > For example, about 1/3 of the way through the Compara API tutorial [1] there > is what is supposed to be a completely functional script. It does not > work. This is in contrast to some of the earlier simple scripts (listing > the species in Ensmbl etc.) which do work on my machine, so I have all the > libraries do dah installed correctly). > > Very poor form to document scripts which do not function on a properly setup > system.
> > I have modified my invocation of the script slightly: > Align.pl --set_of_species \ > "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur > garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta > africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis familiaris:Sus > scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus > tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia > belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae" > > which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" on > an undefined value at ./Align.pl line 132." (Align.pl is my slightly > modified example of the Compara Tutoraial code.) > As these are slightly modified perl scripts from the documantation, the line > numbers may be variable. > > I can print out the genome_dbs, and it gives me a list of genome names (hash > tables) though it appears that is problematic in the Align.pl script. > in spite of the fact that just previously to that call I dumped "genome_dbs" > and got back some 25 hash tables (expected). I believe this occurs whether > one is comparing "human:mouse" or the more complex species set I have > outlined above. > > > > Has anyone else attempted to run the code documented in the Ensembl API > Tutorial? > Any suggestions as to what direction to go in would be appreciated -- when > one is trying to copy code out of a tutorial and it fails its kind of hard > to know where to go.) > > There do appear to be some problems in the specifications of a Compara > version/database and there don't appear to be a lot of resources informing > one of what resources are currently available. > > Robert > > > 1. 
http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html From robert.bradbury at gmail.com Sun Jan 10 14:47:00 2010 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Sun, 10 Jan 2010 14:47:00 -0500 Subject: [Bioperl-l] Ensembl problems In-Reply-To: <20100109220121.GA9521@quux.windows.ebi.ac.uk> References: <20100109220121.GA9521@quux.windows.ebi.ac.uk> Message-ID: As it turns out, the example from the file I cited (the Compara API tutorial) does work. The code that I started with may have been from an MS Word document distributed with the documentation (which could quite well be out of date). But even the corrected code does not work for various uncommon comparisons between species (which may not be archived in Ensembl). I also don't yet understand enough about the functions to know whether they are comparing the same regions from the same chromosomes that just happen to be identical, or whether they are comparing the same region with a homologous region on a different chromosome (i.e. conserved genes). I'm going to have to dig into this some more to figure out what is going on. Thanks for the pointers; I'll refer future questions to the Ensembl list/helpdesk. However, if anyone knows Ensembl very well: the database already contains some of these interspecies comparisons. They are accessed when one builds a phylogeny tree for specific genes (and for highly conserved genes you will generally get a tree that includes nearly all 50 species in the database). As I don't think they are computed on the fly, the information must be precomputed and stored somewhere in the database. I would very much like to know how to access this information.
Thanks, Robert On 1/9/10, Andreas Kähäri wrote: > On Sat, Jan 09, 2010 at 03:27:54PM -0500, Robert Bradbury wrote: >> I am trying to get the examples provided by EMBL/Ensembl to work and am >> encountering problems. > > Hi Robert, > > The ensembl-dev list is the appropriate forum for this type of questions > as it has nothing to do with bioperl. > > There is also the Ensembl helpdesk. If you send your problem to > I'm sure that it will be picked up by the > appropriate people (I do myself not know enough about the Compara API to > be able to diagnose this problem straight away I'm afraid). > > Be sure to submit a minimal script that still exhibit the problem, and > information about what version of the APIs you're using (we will assume > that you're not mixing newer version of the API with older databases or > vice versa). > > We are generally very happy to have bugs in documentation or code > pointed out to us, and will correct errors as we are made aware of them. > > > Kind regards, > Andreas > >> For example, about 1/3 of the way through the Compara API tutorial [1] >> there >> is what is supposed to be a completely functional script. It does not >> work. This is in contrast to some of the earlier simple scripts (listing >> the species in Ensmbl etc.) which do work on my machine, so I have all >> the >> libraries do dah installed correctly). >> >> Very poor form to document scripts which do not function on a properly >> setup >> system.
>> >> I have modified my invocation of the script slightly: >> Align.pl --set_of_species \ >> "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur >> garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta >> africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis >> familiaris:Sus >> scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus >> tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia >> belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae" >> >> which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" >> on >> an undefined value at ./Align.pl line 132." (Align.pl is my slightly >> modified example of the Compara Tutoraial code.) >> As these are slightly modified perl scripts from the documantation, the >> line >> numbers may be variable. >> >> I can print out the genome_dbs, and it gives me a list of genome names >> (hash >> tables) though it appears that is problematic in the Align.pl script. >> in spite of the fact that just previously to that call I dumped >> "genome_dbs" >> and got back some 25 hash tables (expected). I believe this occurs >> whether >> one is comparing "human:mouse" or the more complex species set I have >> outlined above. >> >> >> >> Has anyone else attempted to run the code documented in the Ensembl API >> Tutorial? >> Any suggestions as to what direction to go in would be appreciated -- when >> one is trying to copy code out of a tutorial and it fails its kind of hard >> to know where to go.) >> >> There do appear to be some problems in the specifications of a Compara >> version/database and there don't appear to be a lot of resources informing >> one of what resources are currently available. >> >> Robert >> >> >> 1. 
http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > Andreas K?h?ri, Ensembl Software Developer > European Bioinformatics Institute (EMBL-EBI) > Wellcome Trust Genome Campus, Hinxton > Cambridge CB10 1SD, United Kingdom > From Russell.Smithies at agresearch.co.nz Sun Jan 10 15:34:39 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Mon, 11 Jan 2010 09:34:39 +1300 Subject: [Bioperl-l] how to retrieve organism name from accession number? In-Reply-To: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> An alternate non-BioPerly way (that may be faster given NCBI's flakiness lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and do lookups. In that same dir, taxdump.tar.gz contains a file called names.dmp which lists taxids and descriptions (and synonyms) If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I could do this: my $taxid = $gi_taxid_nucl{$accession}; my $org_name = $names{$taxid}; --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > Sent: Saturday, 26 December 2009 4:52 p.m. > To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > number? 
>
> Bhakti,
> The following example (using EUtilities) may serve your purpose:
>
> use Bio::DB::EUtilities;
>
> my (%taxa, @taxa);
> my (%names, %idmap);
>
> # these are protein ids; nuc ids will work by changing
> # -dbfrom => 'nucleotide' (probably)
>
> my @ids = qw(1621261 89318838 68536103 20807972 730439);
>
> my $factory = Bio::DB::EUtilities->new(-eutil          => 'elink',
>                                        -db             => 'taxonomy',
>                                        -dbfrom         => 'protein',
>                                        -correspondence => 1,
>                                        -id             => \@ids);
>
> # iterate through the LinkSet objects
> while (my $ds = $factory->next_LinkSet) {
>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> }
>
> @taxa = @taxa{@ids};
>
> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>                                     -db    => 'taxonomy',
>                                     -id    => \@taxa );
>
> while (local $_ = $factory->next_DocSum) {
>     $names{($_->get_contents_by_name('TaxId'))[0]} =
>         ($_->get_contents_by_name('ScientificName'))[0];
> }
>
> foreach (@ids) {
>     $idmap{$_} = $names{$taxa{$_}};
> }
>
> # %idmap is
> # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> # 68536103 => 'Corynebacterium jeikeium K411'
> # 730439 => 'Bacillus caldolyticus'
> # 89318838 => undef (this record has been removed from the db)
>
> 1;
>
> You probably will need to break up your 30000 into chunks
> (say, 1000-3000 each), and do the above on each chunk with a
>
> sleep 3;
>
> or so separating the queries.
> MAJ
> ----- Original Message -----
> From: "Bhakti Dwivedi"
> To:
> Sent: Friday, December 25, 2009 9:46 PM
> Subject: [Bioperl-l] how to retrieve organism name from accession number?
>
>
> > Hi,
> >
> > Does anyone know how to retrieve the "Source" or the "Species name" given
> > the accession number using Bioperl. I have these 30,000 accession numbers
> > for which I need to get the source organisms. Any kind of help will be
> > appreciated.
> >
> > Thanks
> >
> > BD
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From cjfields at illinois.edu Sun Jan 10 15:49:40 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 10 Jan 2010 14:49:40 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
Message-ID:

One could also use Bio::DB::Taxonomy, which indexes the same files or
(alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the
details).

chris

From Russell.Smithies at agresearch.co.nz Sun Jan 10 16:05:06 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 11 Jan 2010 10:05:06 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To:
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>

I've started to go off eUtils recently (not BioPerl's fault) as I've often
been finding that with large queries, chunks of the resulting data are
missing.
For example, before Xmas I was creating species-specific databases by using
eUtils to get a list of GI numbers back for a taxid, then retrieving the
fasta sequences in chunks of 500. Very regularly, in the middle of the
fasta there would be a message about resource unavailable, e.g.

>test_sequence_1
TACGATCATCGCTResource UnavailableTACGACTCTGCT
>test_sequence_2
TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT

Often this wasn't detected until formatdb complained about invalid
characters.
Inquiries to NCBI as to why this was happening and what to do about it
returned stupid answers ("do each sequence manually thru the web
interface", or "use eUtils").

As we have a nice fast network connection, I now prefer to download very
large gzip files (i.e. all of refseq) and extract what I need.
I can't help but think that NCBI could solve a lot of problems if they
gzipped the output from eUtils queries - it's something I've requested
regularly for the last 5 years or so!!

--Russell

From avilella at gmail.com Sun Jan 10 16:05:13 2010
From: avilella at gmail.com (Albert Vilella)
Date: Sun, 10 Jan 2010 21:05:13 +0000
Subject: [Bioperl-l] Ensembl problems
In-Reply-To:
References: <20100109220121.GA9521@quux.windows.ebi.ac.uk>
Message-ID: <358f4d651001101305q1b75cfe3q558a245ab1ab1238@mail.gmail.com>

> However, if anyone knows Ensembl very well, the database has in it
> some of these interspecies comparisons already.
> They are accessed when one does a phylogeny tree for specific genes (and
> generally for highly conserved genes you will get a tree that includes
> nearly all 50 species in the database). As I don't think they are computed
> on-the-fly, the information must be precomputed and stored someplace
> in the database. I would very much like to know how to access this
> information.

Yes, they are. You can access the data programmatically by installing
the ensembl and ensembl-compara Perl APIs. There are a few example
scripts for the GeneTrees:

ensembl-compara/scripts/examples/homology*.pl

Cheers,

Albert.

From alessandra.bilardi at gmail.com Sun Jan 10 18:21:12 2010
From: alessandra.bilardi at gmail.com (Alessandra)
Date: Mon, 11 Jan 2010 00:21:12 +0100
Subject: [Bioperl-l] GBrowse.org project
In-Reply-To:
References:
Message-ID:

Hi all,

I'm Alessandra and I run GBrowse.org. GBrowse.org is a resource for using
and setting up GBrowse genome browsers. The site provides one location
where biologists and bioinformaticians can find:

1. Genome browser web sites for any organism that has them. If a species
   has a genome browser anywhere on the web, then we aim to link to it.
2. Links to sequence and annotation files that are available online.
3. Links to genome browser configuration files, when available.
4. An FTP site containing genome annotation and configuration files for
   each annotated genome that does not have its own web site.

GBrowse.org emphasizes the GBrowse genome browser in its organization, but
also links to sites that use other browser packages such as UCSC, Ensembl,
and JBrowse.

Also, we are currently conducting a survey seeking input on future project
direction. Please take a few minutes now to provide your feedback.

Survey link:
http://gbrowse.org/survey/index.php?sid=64264&lang=en

GBrowse.org introduction link:
http://gmod.org/wiki/August_2009_GMOD_Meeting#GBrowse.org

Thank you for your help,

Alessandra Bilardi.
http://gbrowse.org/
CRIBI Genomics, University of Padua
http://genomics.cribi.unipd.it/

From cjfields at illinois.edu Sun Jan 10 22:04:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 10 Jan 2010 21:04:13 -0600
Subject: [Bioperl-l] GMOD BioPerl Meeting
Message-ID: <7D72ECC2-E856-4C09-B67A-62AFFB59B377@illinois.edu>

Just a quick reminder that we're having a BioPerl satellite meeting after
the PAG Conference (just prior to the GMOD Meeting). The meeting is this
Wednesday, Jan. 13, starting at 11:30am, at the Best Western Seven Seas in
San Diego. I will update the relevant BioPerl and GMOD pages with more
details as they become available. At the moment, we will be meeting in the
hotel lobby prior to starting the meeting and possible hackathon.

http://www.bioperl.org/wiki/GMOD_2010_Meeting
http://gmod.org/wiki/January_2010_GMOD_Meeting#Satellite_Meetings

Thanks!

chris

From bernd.jagla at pasteur.fr Mon Jan 11 05:11:16 2010
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Mon, 11 Jan 2010 11:11:16 +0100
Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java
Message-ID: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>

Hi,

First off, I am not sure if this is supposed to be addressed to the Bioperl
or Gbrowse mailing list, so apologies if this is the wrong list and please
let me know.

I am writing a program in Java that needs to access genome annotation data.
Since I am using Gbrowse already I was thinking that I could combine both
approaches making life eventually easier for me. I am mainly interested in
getting a gene/feature name for a given position. The position is stored in
the feature table and through linking typelist, locationlist, (maybe
sequence), and feature I can get all the information I need. Unfortunately
it seems that the feature name is stored in the object blob of the feature
table.
That is a bit suspicious to me because I don't understand why searching for
a name can be so fast if it is not indexed through mysql when searching
using GBrowse.

So my question is how do I parse the Bio::DB::SeqFeature object in JAVA
correctly to get the name of the feature and possibly also any further
information.

Any suggestions are greatly appreciated. Maybe there is a better solution
than parsing Perl code with Java?

Thanks a lot,

Bernd

From biopython at maubp.freeserve.co.uk Mon Jan 11 05:48:52 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Jan 2010 10:48:52 +0000
Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java
In-Reply-To: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>
References: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>
Message-ID: <320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com>

On Mon, Jan 11, 2010 at 10:11 AM, Bernd Jagla wrote:
> Hi,
>
> First off, I am not sure if this is supposed to be addressed to the Bioperl
> or Gbrowse mailing list, so apologies if this is the wrong list and please
> let me know.
>
> I am writing a program in Java that needs to access genome annotation data.
> Since I am using Gbrowse already I was thinking that I could combine both
> approaches making life eventually easier for me. I am mainly interested in
> getting a gene/feature name for a given position. The position is stored in
> the feature table and through linking typelist, locationlist, (maybe
> sequence), and feature I can get all the information I need. Unfortunately
> it seems that the feature name is stored in the object blob of the feature
> table.

How are you storing the data in Gbrowse? There are several back ends,
and this will make a big difference for accessing the raw data.

One option would be to use Gbrowse with BioSQL as the backend.
You can then use BioJava (or BioPerl, or BioPython, etc) to access the
database. The only downside is Gbrowse isn't working 100% on top
of BioSQL right now (I'd like to see this fixed, but I don't know Perl).
There is an open bug on this [ gmod-Bugs-2168597 ].

Peter

From bernd.jagla at pasteur.fr Mon Jan 11 05:53:20 2010
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Mon, 11 Jan 2010 11:53:20 +0100
Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java
In-Reply-To: <320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com>
References: <6D85585C10F94E25898249D2D7CAC0D7@zillumina> <320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com>
Message-ID: <9056164A8A744A77B6CD1E8E4E20B104@zillumina>

I am using bp_seqfeature_load.pl to load my features. That is using
Bio:DB:SeqFeature(Store) and MySql as a backend...
That's all I understood...

B

From awitney at sgul.ac.uk Mon Jan 11 07:21:07 2010
From: awitney at sgul.ac.uk (Adam Witney)
Date: Mon, 11 Jan 2010 12:21:07 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
Message-ID:

Hi,

I am writing a script to automate the running of Phylip Pars. In the
process I have to create a Bio::AlignIO object from a set of data that I
have in a hash.

I could write the hash data into a phylip file and then load the
Bio::AlignIO from that file, but I wondered if I could skip the writing
and then reading of a temporary file?

thanks for any help

adam

From roy.chaudhuri at gmail.com Mon Jan 11 08:54:25 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 11 Jan 2010 13:54:25 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
In-Reply-To: <4B4B2A51.9040602@gmail.com>
References: <4B4B2A51.9040602@gmail.com>
Message-ID: <4B4B2D91.70906@gmail.com>

Actually, I guess some sample code would be more helpful:

use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::AlignIO;

my $seq1 = Bio::LocatableSeq->new(-id => 'one',   -seq => 'AT-CG', -start => 1, -end => 4);
my $seq2 = Bio::LocatableSeq->new(-id => 'two',   -seq => 'A--CG', -start => 1, -end => 3);
my $seq3 = Bio::LocatableSeq->new(-id => 'three', -seq => 'ATTCG', -start => 1, -end => 5);
my $aln  = Bio::SimpleAlign->new(-seqs => [$seq1, $seq2, $seq3]);
Bio::AlignIO->new(-format => 'phylip')->write_aln($aln);

Cheers,
Roy.
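
For the case in Adam's original question, where the aligned sequences
already sit in a hash, the same idea might be sketched as below. This is a
minimal sketch rather than a tested recipe: the %aligned hash and its
contents are hypothetical, every value is assumed to be an equal-length
gapped sequence using '-' or '.' for gaps, and -end is computed as the
ungapped length on the assumption that each sequence starts at position 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::AlignIO;

# Hypothetical input: sequence id => gapped, equal-length aligned sequence.
my %aligned = (
    one   => 'AT-CG',
    two   => 'A--CG',
    three => 'ATTCG',
);

my $aln = Bio::SimpleAlign->new();
for my $id (sort keys %aligned) {
    my $gapped = $aligned{$id};

    # -end is the residue count, i.e. the length once gap characters are removed.
    (my $ungapped = $gapped) =~ tr/-.//d;

    $aln->add_seq(
        Bio::LocatableSeq->new(
            -id    => $id,
            -seq   => $gapped,
            -start => 1,
            -end   => length($ungapped),
        )
    );
}

# With no -file or -fh argument the alignment is written to STDOUT.
Bio::AlignIO->new(-format => 'phylip')->write_aln($aln);
```

Building the Bio::SimpleAlign object directly avoids the temporary PHYLIP
file; the final write_aln call is only there to show the result and can be
dropped if the alignment object is passed straight on to another module.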

From roy.chaudhuri at gmail.com Mon Jan 11 08:40:33 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 11 Jan 2010 13:40:33 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
In-Reply-To:
References:
Message-ID: <4B4B2A51.9040602@gmail.com>

Hi Adam,

I'm guessing you actually want to create a Bio::SimpleAlign object
(representing an alignment), rather than a Bio::AlignIO object (which is
just for reading/writing alignment files). Bio::SimpleAlign has a
documented new method that allows you to construct an alignment from
Bio::LocatableSeq objects, which are similar to Bio::Seq objects but
include gaps and start/end coordinates to describe their relationship to
other sequences in the alignment.

Roy.

From biopython at maubp.freeserve.co.uk Mon Jan 11 09:16:45 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Jan 2010 14:16:45 +0000
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
Message-ID: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>

Hi,

I'm running bioperl-live from SVN, just updated to revision 16648.

$ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
1.0069

I am trying to get Bio::SeqIO to convert a multiple record EMBL
file into GenBank format, piping the data via stdin/stdout using
the following trivial Perl script:

#!/usr/bin/env perl
use Bio::SeqIO;
my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl');
my $out = Bio::SeqIO->new(-format => 'genbank');
while (my $seq = $in->next_seq) { $out->write_seq($seq) };

This only seems to find the first EMBL record in my example
files. For example, this simple file has just two contig records:
http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl

This is just the first two records taken from a much larger EMBL file
rel_con_hum_01_r102.dat downloaded and uncompressed from:
ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz

Trying both these examples as input, BioPerl just gives a single
GenBank record as output (the first EMBL entry in the input).

Is this a BioPerl bug, or am I missing something?

Peter

From maj at fortinbras.us Mon Jan 11 10:04:00 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 11 Jan 2010 10:04:00 -0500
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
Message-ID:

Hi Peter, I found the issue-- there are no SQ lines in the data, and having
them is a key stop condition in the parser (line 438 embl.pm).
We evidently need to be more liberal in what we accept, even as we are
strict in what we emit. Could you make a bug report?
thanks for the heads-up--
MAJ

From biopython at maubp.freeserve.co.uk Mon Jan 11 10:17:37 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Jan 2010 15:17:37 +0000
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To:
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
Message-ID: <320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com>

On Mon, Jan 11, 2010 at 3:04 PM, Mark A. Jensen wrote:
>
> Hi Peter, I found the issue-- there are no SQ lines in the data, and having
> them is a key stop condition in the parser (line 438 embl.pm).
> We evidently need to be more liberal in what we accept, even as we are
> strict in what we emit. Could you make a bug report?
> thanks for the heads-up--
> MAJ

Done: http://bugzilla.open-bio.org/show_bug.cgi?id=2982

These are EMBL contig records, so they don't have SQ lines,
but instead CO lines.

Peter

From cjfields at illinois.edu Mon Jan 11 10:24:24 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Jan 2010 09:24:24 -0600
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> <320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com>
Message-ID:

On Jan 11, 2010, at 9:17 AM, Peter wrote:

> On Mon, Jan 11, 2010 at 3:04 PM, Mark A. Jensen wrote:
>>
>> Hi Peter, I found the issue-- there are no SQ lines in the data, and having
>> them is a key stop condition in the parser (line 438 embl.pm).
>> We evidently need to be more liberal in what we accept, even as we are
>> strict in what we emit. Could you make a bug report?
>> thanks for the heads-up-- >> MAJ > > Done: http://bugzilla.open-bio.org/show_bug.cgi?id=2982 > > These are EMBL contig records, so they don't have SQ lines, > but instead CO lines. > > Peter Peter, Just curious, but have you tried the experimental EMBL parser 'embldriver'? I don't think it's bound to the same strictures, but I may be mistaken. chris From cjfields at illinois.edu Mon Jan 11 10:23:00 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 11 Jan 2010 09:23:00 -0600 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> Message-ID: <0D0D9DB5-56FA-414E-8D1D-3FE18198F7EC@illinois.edu> Just saw that mark responded, so if possible submit a bug. We may be doing a mini-hackathon this Wednesday, so we can probably tackle it in the process (possibly along with a few other pressing issues). chris On Jan 11, 2010, at 8:16 AM, Peter wrote: > Hi, > > I'm running bioperl-live from SVN, just updated to revision 16648. > > $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' > 1.0069 > > I am trying to get Bio::SeqIO to convert a multiple record EMBL > file into GenBank format, piping the data via stdin/stdout using > the following trivial Perl script: > > #!/usr/bin/env perl > use Bio::SeqIO; > my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl'); > my $out = Bio::SeqIO->new(-format => 'genbank'); > while (my $seq = $in->next_seq) { $out->write_seq($seq) }; > > This only seems to find the first EMBL record in my example > files. 
For example, this simple file has just two contig records: > http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl > > This is just the first two records taken from a much larger EMBL file > rel_con_hum_01_r102.dat downloaded and uncompressed from: > ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz > > Trying both these examples as input, BioPerl just gives a single > GenBank record as output (the first EMBL entry in the input). > > Is this a BioPerl bug, or am I missing something? > > Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Mon Jan 11 10:55:26 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 11 Jan 2010 15:55:26 +0000 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> Message-ID: <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com> On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf wrote: > > These entries form the CON data class, see: > http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14 > and they don't contain any sequence information. I know - GenBank files have a similar system with CONTIG lines instead of sequences. I was expecting BioPerl to be able to convert these EMBL files with CO lines into GenBank files with CONTIG lines. > If you take the 'expanded' entries from > ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz > your script will work. That's a useful tip - thanks. 
Peter From hrh at fmi.ch Mon Jan 11 10:42:22 2010 From: hrh at fmi.ch (Hotz, Hans-Rudolf) Date: Mon, 11 Jan 2010 16:42:22 +0100 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> Message-ID: On 1/11/10 3:16 PM, "Peter" wrote: > Hi, > > I'm running bioperl-live from SVN, just updated to revision 16648. > > $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' > 1.0069 > > I am trying to get Bio::SeqIO to convert a multiple record EMBL > file into GenBank format, piping the data via stdin/stdout using > the following trivial Perl script: > > #!/usr/bin/env perl > use Bio::SeqIO; > my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl'); > my $out = Bio::SeqIO->new(-format => 'genbank'); > while (my $seq = $in->next_seq) { $out->write_seq($seq) }; > > This only seems to find the first EMBL record in my example > files. For example, this simple file has just two contig records: > http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl > > This is just the first two records taken from a much larger EMBL file > rel_con_hum_01_r102.dat downloaded and uncompressed from: > ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz These entries form the CON data class, see: http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14 and they don't contain any sequence information. If you take the 'expanded' entries from ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz your script will work. Hans > Trying both these examples as input, BioPerl just gives a single > GenBank record as output (the first EMBL entry in the input). > > Is this a BioPerl bug, or am I missing something?
> > Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From awitney at sgul.ac.uk Mon Jan 11 11:27:15 2010 From: awitney at sgul.ac.uk (Adam Witney) Date: Mon, 11 Jan 2010 16:27:15 +0000 Subject: [Bioperl-l] create Bio::AlignIO object from hash In-Reply-To: <4B4B2D91.70906@gmail.com> References: <4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com> Message-ID: Ah excellent, thanks Roy. I was indeed thinking about it the wrong way. In the process of writing this i have created a Bio::Tools::Run::Phylo::Phylip::Pars class which is essentially just a modified copy of ProtPars. I have also fixed a few typos and possible bugs in Bio/Tools/Run/Phylo/Phylip/Base.pm Bio/Tools/Run/Phylo/Phylip/PhylipConf.pm Bio/AlignIO/phylip.pm Bio/Tools/Run/Alignment/Clustalw.pm I am of course happy to send these back in to the project... how would i best do this? Cheers adam On 11 Jan 2010, at 13:54, Roy Chaudhuri wrote: > Actually, I guess some sample code would be more helpful: > > use Bio::LocatableSeq; > use Bio::SimpleAlign; > use Bio::AlignIO; > my $seq1=Bio::LocatableSeq->new(-id=>'one', -seq=>'AT-CG', -start=>1, -end=>4); > my $seq2=Bio::LocatableSeq->new(-id=>'two', -seq=>'A--CG', -start=>1, -end=>3); > my $seq3=Bio::LocatableSeq->new(-id=>'three', -seq=>'ATTCG', -start=>1, -end=>5); > my $aln=Bio::SimpleAlign->new(-seqs=>[$seq1,$seq2,$seq3]); > Bio::AlignIO->new(-format=>'phylip')->write_aln($aln); > > Cheers, > Roy. > > > On 11/01/2010 13:40, Roy Chaudhuri wrote: >> Hi Adam, >> >> I'm guessing you actually want to create a Bio::SimpleAlign object >> (representing an alignment), rather than a Bio::AlignIO object (which is >> just for reading/writing alignment files). 
Bio::SimpleAlign has a >> documented new method that allows you to construct an alignment from >> Bio::LocatableSeq objects, which are similar to Bio::Seq objects but >> include gaps and start/end coordinates to describe their relationship to >> other sequences in the alignment. >> >> Roy. >> >> On 11/01/2010 12:21, Adam Witney wrote: >>> Hi, >>> >>> I am writing a script to automate the running of Phylip Pars. In the >>> process i have to create a Bio::AlignIO object from a set of data >>> that i have in a hash. >>> >>> I could write the hash data into a phylip file and then load the >>> Bio::AlignIO from that file, but i wondered if i could skip the >>> writing and then reading of a temporary file ? >>> >>> thanks for any help >>> >>> adam _______________________________________________ Bioperl-l >>> mailing list Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From Russell.Smithies at agresearch.co.nz Mon Jan 11 22:41:02 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 12 Jan 2010 16:41:02 +1300 Subject: [Bioperl-l] BioPerl version? In-Reply-To: References: <4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz> Probably a dumb question but how do I find the version of an existing BioPerl installation without resorting to reading thru Bio/Root/Version.pm ? --Russell ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. 
If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at illinois.edu Mon Jan 11 22:59:44 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 11 Jan 2010 21:59:44 -0600 Subject: [Bioperl-l] BioPerl version? In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz> References: <4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz> Message-ID: <795BD926-4AE9-4478-AAD5-E36558350745@illinois.edu> Not dumb, but a frequently asked one: that's a FAQ question ;> http://www.bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' chris On Jan 11, 2010, at 9:41 PM, Smithies, Russell wrote: > Probably a dumb question but how do I find the version of an existing BioPerl installation without resorting to reading thru Bio/Root/Version.pm ? > > --Russell > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Jan 12 11:02:02 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 Jan 2010 10:02:02 -0600 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com> References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com> Message-ID: On Jan 11, 2010, at 9:55 AM, Peter wrote: > On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf wrote: >> >> These entries form the CON data class, see: >> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14 >> and they don't contain any sequence information. > > I know - GenBank files have a similar system with CONTIG > lines instead of sequences. I was expecting BioPerl to be > able to convert these EMBL files with CO lines into GenBank > files with CONTIG lines. IIRC the contig information for GenBank is stored in annotation. We can try to ensure the data is carried over to EMBL properly. >> If you take the 'expanded' entries from >> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz >> your script will work. > > That's a useful tip - thanks. > > Peter NCBI's eutil option 'gbwithparts' is similar (always retrieves the sequence). 
chris From biopython at maubp.freeserve.co.uk Tue Jan 12 11:19:32 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 12 Jan 2010 16:19:32 +0000 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com> Message-ID: <320fb6e01001120819u50e73fa8k9bde8aa1abdf942d@mail.gmail.com> On Tue, Jan 12, 2010 at 4:02 PM, Chris Fields wrote: > On Jan 11, 2010, at 9:55 AM, Peter wrote: > >> On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf wrote: >>> >>> These entries form the CON data class, see: >>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14 >>> and they don't contain any sequence information. >> >> I know - GenBank files have a similar system with CONTIG >> lines instead of sequences. I was expecting BioPerl to be >> able to convert these EMBL files with CO lines into GenBank >> files with CONTIG lines. > > IIRC the contig information for GenBank is stored in annotation. > We can try to ensure the data is carried over to EMBL properly. For contig records (where there is no sequence) I think we just need to map the GenBank CONTIG lines to the EMBL CO lines, and vice versa. At least, that's what Biopython now does (trunk code, not yet released). >>> If you take the 'expanded' entries from >>> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz >>> your script will work. >> >> That's a useful tip - thanks. >> >> Peter > > NCBI's eutil option 'gbwithparts' is similar (always retrieves the sequence). Indeed. This is a useful work around for when a parser couldn't cope with the contig version of a GenBank file for some reason, e.g. http://bugzilla.open-bio.org/show_bug.cgi?id=2745 Peter From maj at fortinbras.us Tue Jan 12 12:33:30 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Tue, 12 Jan 2010 12:33:30 -0500 Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service Message-ID: <231A8D9473704E7697F7A486A0CDA86A@NewLife> Hi All-- The beta of Bio::DB::SoapEUtilities is now available in the bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web service. The system is fully WSDL based, and all eutils are available. The best thing (IMHO) are the result adaptors, which provide conversion and iteration of SOAP results into BioPerl objects. Schau, mal: use Bio::DB::EUtilities; my $fac = Bio::DB::EUtilities->new(); # step 1 my $seqio = $fac->esearch( -db => 'nucleotide', -term => 'HIV1 and CCR5 and Brazil' )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3 # yes, it's already done the efetch under the hood... while ( my $seq = $seqio->next_seq ) { # step 4 # do something with $seq, a Bio::Seq object... } or this: my $links = $fac->elink( -db => 'protein', -dbfrom => 'nucleotide', -id => \@nucids )->run( -auto_adapt => 1 ); # maybe more than one associated id... my @prot_0 = $links->id_map( $nucids[0] ); while ( my $ls = $links->next_linkset ) { @ids = $ls->ids; @submitted_ids = $ls->submitted_ids; # etc. } and much, much more. See http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service and of course, the POD, for all the details, including download/installation. Tests in bioperl-run/t. cheers, MAJ -- No new dependencies were added or animals mistreated -- during the making of these modules. From sheldon.mckay at gmail.com Tue Jan 12 13:02:53 2010 From: sheldon.mckay at gmail.com (Sheldon McKay) Date: Tue, 12 Jan 2010 10:02:53 -0800 Subject: [Bioperl-l] code.open-bio.org timing out? Message-ID: Hi all, I keep timing out trying to do an svn checkout of bioperl-live from code.open-bio.org. Any suggestions? 
Thanks, Sheldon ---- Sheldon McKay, PhD Lead, iPlant Tree of Life Engagement Team; Research Investigator Cold Spring Harbor Laboratory http://mckay.cshl.edu Google Voice: (203) 701-9204 On Tue, Nov 3, 2009 at 9:09 AM, Aaron Mackey wrote: > [ajm6q at lc4 bioperl-live]$ svn update > svn: Decompression of svndiff data failed > > > I'll admit to not having svn updated in awhile; A clean, anonymous svn co > failed with the same message: > > [...] > A bioperl-live/Bio/Structure/StructureI.pm > A bioperl-live/Bio/Structure/IO > svn: Decompression of svndiff data failed > > -Aaron > > P.S. I used this command: svn co > svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From biopython at maubp.freeserve.co.uk Tue Jan 12 13:12:46 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 12 Jan 2010 18:12:46 +0000 Subject: [Bioperl-l] code.open-bio.org timing out? In-Reply-To: References: Message-ID: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com> On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay wrote: > Hi all, > > I keep timing out trying to do an svn checkout of bioperl-live from > code.open-bio.org. Any suggestions? > > Thanks, > Sheldon The OBF team know about this (its being discussed on root-l), hopefully they'll have it fixed before too long. Peter From cjfields at illinois.edu Tue Jan 12 13:18:45 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 Jan 2010 12:18:45 -0600 Subject: [Bioperl-l] code.open-bio.org timing out?
In-Reply-To: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com> References: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com> Message-ID: <8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu> On Jan 12, 2010, at 12:12 PM, Peter wrote: > On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay wrote: >> Hi all, >> >> I keep timing out trying to do an svn checkout of bioperl-live from >> code.open-bio.org. Any suggestions? >> >> Thanks, >> Sheldon > > The OBF team know about this (its being discussed on root-l), > hopefully they'll have it fixed before too long. > > Peter We probably need to set up some automatic syncing of our read-only code.google.com repo as a backup. Jason had originally set that up, hopefully he'll respond. chris From jason at bioperl.org Tue Jan 12 13:27:55 2010 From: jason at bioperl.org (Jason Stajich) Date: Tue, 12 Jan 2010 10:27:55 -0800 Subject: [Bioperl-l] code.open-bio.org timing out? In-Reply-To: <8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu> References: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com> <8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu> Message-ID: Hi - I had setup the google code sync, but then the unfortunately realization that the revision numbers are shared among the wiki and the code SVN (all 1 repo) so when I added a wiki page on the site I screwed up the numbering and it wasn't possible to sync anymore (that I could figure out) without resetting it and I haven't gone back to that. Sorry - I wasn't sure if we had figured out what we wanted to for repositories so I sort of stopped worrying about it. -jason On Jan 12, 2010, at 10:18 AM, Chris Fields wrote: > On Jan 12, 2010, at 12:12 PM, Peter wrote: > >> On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay > > wrote: >>> Hi all, >>> >>> I keep timing out trying to do an svn checkout of bioperl-live from >>> code.open-bio.org. Any suggestions? 
>>> >>> Thanks, >>> Sheldon >> >> The OBF team know about this (its being discussed on root-l), >> hopefully they'll have it fixed before too long. >> >> Peter > > We probably need to set up some automatic syncing of our read-only > code.google.com repo as a backup. Jason had originally set that up, > hopefully he'll respond. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From virajj at gmail.com Wed Jan 6 13:20:39 2010 From: virajj at gmail.com (Vijayaraj Nagarajan) Date: Wed, 6 Jan 2010 13:20:39 -0500 Subject: [Bioperl-l] targetp request Message-ID: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com> Hi, I am trying to use targetP in bioperl. the documentation at the bioperl site is a bit confusing to me... I would appreciate if you could give a very small example, as to how to use "Bio::Tools::TargetP" to predict the localization of a protein sequence that i have stored as a string. Thanks, Vijay From cjfields at illinois.edu Tue Jan 12 18:36:53 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 Jan 2010 17:36:53 -0600 Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service In-Reply-To: <231A8D9473704E7697F7A486A0CDA86A@NewLife> References: <231A8D9473704E7697F7A486A0CDA86A@NewLife> Message-ID: Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's Bio::DB::SoapEUtilities)? Otherwise this would be a serious namespace and API conflict with the current EUtilities tools. chris On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote: > Hi All-- > > The beta of Bio::DB::SoapEUtilities is now available in the > bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web > service. The system is fully WSDL based, and all eutils are > available. 
The best thing (IMHO) are the result adaptors, which > provide conversion and iteration of SOAP results into BioPerl > objects. Schau, mal: > > use Bio::DB::EUtilities; > my $fac = Bio::DB::EUtilities->new(); # step 1 > my $seqio = $fac->esearch( > -db => 'nucleotide', > -term => 'HIV1 and CCR5 and Brazil' > )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3 > # yes, it's already done the efetch under the hood... > while ( my $seq = $seqio->next_seq ) { # step 4 > # do something with $seq, a Bio::Seq object... > } > > or this: > > my $links = $fac->elink( -db => 'protein', > -dbfrom => 'nucleotide', > -id => \@nucids )->run( -auto_adapt => 1 ); > > # maybe more than one associated id... > my @prot_0 = $links->id_map( $nucids[0] ); > > while ( my $ls = $links->next_linkset ) { > @ids = $ls->ids; > @submitted_ids = $ls->submitted_ids; > # etc. > } > > and much, much more. See > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service > > and of course, the POD, for all the details, including > download/installation. Tests in bioperl-run/t. > > cheers, > MAJ > > -- No new dependencies were added or animals mistreated > -- during the making of these modules. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Jan 12 19:22:10 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 Jan 2010 18:22:10 -0600 Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service In-Reply-To: <5AD210CB0C444A57881BBDD34DE99149@NewLife> References: <231A8D9473704E7697F7A486A0CDA86A@NewLife> <5AD210CB0C444A57881BBDD34DE99149@NewLife> Message-ID: Okay, just making sure (I was getting a bit paranoid). Great work on the SOAP interface, BTW! chris On Jan 12, 2010, at 6:08 PM, Mark A. Jensen wrote: > Um, yeah. > ----- Original Message ----- From: "Chris Fields" > To: "Mark A. 
Jensen" > Cc: "BioPerl List" > Sent: Tuesday, January 12, 2010 6:36 PM > Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service > > > Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's Bio::DB::SoapEUtilities)? Otherwise this would be a serious namespace and API conflict with the current EUtilities tools. > > chris > > On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote: > >> Hi All-- >> >> The beta of Bio::DB::SoapEUtilities is now available in the >> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web >> service. The system is fully WSDL based, and all eutils are >> available. The best thing (IMHO) are the result adaptors, which >> provide conversion and iteration of SOAP results into BioPerl >> objects. Schau, mal: >> >> use Bio::DB::EUtilities; >> my $fac = Bio::DB::EUtilities->new(); # step 1 >> my $seqio = $fac->esearch( >> -db => 'nucleotide', >> -term => 'HIV1 and CCR5 and Brazil' >> )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3 >> # yes, it's already done the efetch under the hood... >> while ( my $seq = $seqio->next_seq ) { # step 4 >> # do something with $seq, a Bio::Seq object... >> } >> >> or this: >> >> my $links = $fac->elink( -db => 'protein', >> -dbfrom => 'nucleotide', >> -id => \@nucids )->run( -auto_adapt => 1 ); >> >> # maybe more than one associated id... >> my @prot_0 = $links->id_map( $nucids[0] ); >> >> while ( my $ls = $links->next_linkset ) { >> @ids = $ls->ids; >> @submitted_ids = $ls->submitted_ids; >> # etc. >> } >> >> and much, much more. See >> >> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service >> >> and of course, the POD, for all the details, including >> download/installation. Tests in bioperl-run/t. >> >> cheers, >> MAJ >> >> -- No new dependencies were added or animals mistreated >> -- during the making of these modules. 
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Tue Jan 12 19:08:12 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 12 Jan 2010 19:08:12 -0500 Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service In-Reply-To: References: <231A8D9473704E7697F7A486A0CDA86A@NewLife> Message-ID: <5AD210CB0C444A57881BBDD34DE99149@NewLife> Um, yeah. ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "BioPerl List" Sent: Tuesday, January 12, 2010 6:36 PM Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's Bio::DB::SoapEUtilities)? Otherwise this would be a serious namespace and API conflict with the current EUtilities tools. chris On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote: > Hi All-- > > The beta of Bio::DB::SoapEUtilities is now available in the > bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web > service. The system is fully WSDL based, and all eutils are > available. The best thing (IMHO) are the result adaptors, which > provide conversion and iteration of SOAP results into BioPerl > objects. Schau, mal: > > use Bio::DB::EUtilities; > my $fac = Bio::DB::EUtilities->new(); # step 1 > my $seqio = $fac->esearch( > -db => 'nucleotide', > -term => 'HIV1 and CCR5 and Brazil' > )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3 > # yes, it's already done the efetch under the hood... > while ( my $seq = $seqio->next_seq ) { # step 4 > # do something with $seq, a Bio::Seq object... > } > > or this: > > my $links = $fac->elink( -db => 'protein', > -dbfrom => 'nucleotide', > -id => \@nucids )->run( -auto_adapt => 1 ); > > # maybe more than one associated id... 
> my @prot_0 = $links->id_map( $nucids[0] ); > > while ( my $ls = $links->next_linkset ) { > @ids = $ls->ids; > @submitted_ids = $ls->submitted_ids; > # etc. > } > > and much, much more. See > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service > > and of course, the POD, for all the details, including > download/installation. Tests in bioperl-run/t. > > cheers, > MAJ > > -- No new dependencies were added or animals mistreated > -- during the making of these modules. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Tue Jan 12 20:09:28 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 12 Jan 2010 20:09:28 -0500 Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP webservice In-Reply-To: <5AD210CB0C444A57881BBDD34DE99149@NewLife> References: <231A8D9473704E7697F7A486A0CDA86A@NewLife> <5AD210CB0C444A57881BBDD34DE99149@NewLife> Message-ID: corrected: use Bio::DB::SoapEUtilities; my $fac = Bio::DB::SoapEUtilities->new(); # step 1 my $seqio = $fac->esearch( -db => 'nucleotide', -term => 'HIV1 and CCR5 and Brazil' )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3 # yes, it's already done the efetch under the hood... while ( my $seq = $seqio->next_seq ) { # step 4 # do something with $seq, a Bio::Seq object... } ----- Original Message ----- From: "Mark A. Jensen" To: "Chris Fields" Cc: "BioPerl List" Sent: Tuesday, January 12, 2010 7:08 PM Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP webservice > Um, yeah. > ----- Original Message ----- > From: "Chris Fields" > To: "Mark A. Jensen" > Cc: "BioPerl List" > Sent: Tuesday, January 12, 2010 6:36 PM > Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web > service > > > Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's > Bio::DB::SoapEUtilities)? 
Otherwise this would be a serious namespace and API > conflict with the current EUtilities tools. > > chris > > On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote: > >> Hi All-- >> >> The beta of Bio::DB::SoapEUtilities is now available in the >> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web >> service. The system is fully WSDL based, and all eutils are >> available. The best thing (IMHO) are the result adaptors, which >> provide conversion and iteration of SOAP results into BioPerl >> objects. Schau, mal: >> >> use Bio::DB::EUtilities; >> my $fac = Bio::DB::EUtilities->new(); # step 1 >> my $seqio = $fac->esearch( >> -db => 'nucleotide', >> -term => 'HIV1 and CCR5 and Brazil' >> )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3 >> # yes, it's already done the efetch under the hood... >> while ( my $seq = $seqio->next_seq ) { # step 4 >> # do something with $seq, a Bio::Seq object... >> } >> >> or this: >> >> my $links = $fac->elink( -db => 'protein', >> -dbfrom => 'nucleotide', >> -id => \@nucids )->run( -auto_adapt => 1 ); >> >> # maybe more than one associated id... >> my @prot_0 = $links->id_map( $nucids[0] ); >> >> while ( my $ls = $links->next_linkset ) { >> @ids = $ls->ids; >> @submitted_ids = $ls->submitted_ids; >> # etc. >> } >> >> and much, much more. See >> >> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service >> >> and of course, the POD, for all the details, including >> download/installation. Tests in bioperl-run/t. >> >> cheers, >> MAJ >> >> -- No new dependencies were added or animals mistreated >> -- during the making of these modules. 
From tuco at pasteur.fr Wed Jan 13 05:24:34 2010
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Wed, 13 Jan 2010 11:24:34 +0100
Subject: [Bioperl-l] targetp request
In-Reply-To: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>
References: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>
Message-ID: <4B4D9F62.5010306@pasteur.fr>

On 1/6/10 7:20 PM, Vijayaraj Nagarajan wrote:
> Hi,
>
> I am trying to use targetP in bioperl.
> the documentation at the bioperl site is a bit confusing to me...
>
> I would appreciate if you could give a very small example, as to how to use
> "Bio::Tools::TargetP" to predict the localization of a protein sequence that
> i have stored as a string.
>
> Thanks,
> Vijay

Dear Vijay,

Bio::Tools::TargetP is not intended to run targetp on a sequence, but to
read and parse the results of a targetp run.

From the Pod doc :

DESCRIPTION
    TargetP modules will provides parsed informations about protein
    localization. It reads in a targetp output file. It parses the
    results, and returns a Bio::SeqFeature::Generic object for each
    sequences found to have a subcellular localization

So to analyze your sequence, you'll first need to run targetp on your
sequence file to create a targetp result output file. Then use the
Bio::Tools::TargetP module to parse this result file and pull out only the
information you want or need from the result, as shown in the SYNOPSIS of
the Pod documentation of the module.
HTH Regards Emmanuel From roy.chaudhuri at gmail.com Wed Jan 13 07:52:58 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Wed, 13 Jan 2010 12:52:58 +0000 Subject: [Bioperl-l] create Bio::AlignIO object from hash In-Reply-To: References: <4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com> Message-ID: <4B4DC22A.8080701@gmail.com> Upload them to Bugzilla as patches, and one of the devs will review your changes and incorporate them into bioperl-live: http://www.bioperl.org/wiki/HOWTO:SubmitPatch Roy. On 11/01/2010 16:27, Adam Witney wrote: > > Ah excellent, thanks Roy. I was indeed thinking about it the wrong > way. > > In the process of writing this i have created a > > Bio::Tools::Run::Phylo::Phylip::Pars class > > which is essentially just a modified copy of ProtPars. I have also > fixed a few typos and possible bugs in > > Bio/Tools/Run/Phylo/Phylip/Base.pm > Bio/Tools/Run/Phylo/Phylip/PhylipConf.pm Bio/AlignIO/phylip.pm > Bio/Tools/Run/Alignment/Clustalw.pm > > I am of course happy to send these back in to the project... how > would i best do this? > > Cheers > > adam > > > On 11 Jan 2010, at 13:54, Roy Chaudhuri wrote: > >> Actually, I guess some sample code would be more helpful: >> >> use Bio::LocatableSeq; use Bio::SimpleAlign; use Bio::AlignIO; my >> $seq1=Bio::LocatableSeq->new(-id=>'one', -seq=>'AT-CG', -start=>1, >> -end=>4); my $seq2=Bio::LocatableSeq->new(-id=>'two', >> -seq=>'A--CG', -start=>1, -end=>3); my >> $seq3=Bio::LocatableSeq->new(-id=>'three', -seq=>'ATTCG', >> -start=>1, -end=>5); my >> $aln=Bio::SimpleAlign->new(-seqs=>[$seq1,$seq2,$seq3]); >> Bio::AlignIO->new(-format=>'phylip')->write_aln($aln); >> >> Cheers, Roy. >> >> >> On 11/01/2010 13:40, Roy Chaudhuri wrote: >>> Hi Adam, >>> >>> I'm guessing you actually want to create a Bio::SimpleAlign >>> object (representing an alignment), rather than a Bio::AlignIO >>> object (which is just for reading/writing alignment files). 
>>> Bio::SimpleAlign has a documented new method that allows you to >>> construct an alignment from Bio::LocatableSeq objects, which are >>> similar to Bio::Seq objects but include gaps and start/end >>> coordinates to describe their relationship to other sequences in >>> the alignment. >>> >>> Roy. >>> >>> On 11/01/2010 12:21, Adam Witney wrote: >>>> Hi, >>>> >>>> I am writing a script to automate the running of Phylip Pars. >>>> In the process i have to create a Bio::AlignIO object from a >>>> set of data that i have in a hash. >>>> >>>> I could write the hash data into a phylip file and then load >>>> the Bio::AlignIO from that file, but i wondered if i could skip >>>> the writing and then reading of a temporary file ? >>>> >>>> thanks for any help >>>> >>>> adam _______________________________________________ Bioperl-l >>>> mailing list Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> > From marcelo011982 at gmail.com Wed Jan 13 13:12:04 2010 From: marcelo011982 at gmail.com (Marcelo Iwata) Date: Wed, 13 Jan 2010 16:12:04 -0200 Subject: [Bioperl-l] Blast to Clustalw Format Message-ID: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com> Hi.. I have an simple Blast result, such as blastn. Is there an scrip to transform such result to Clustalw format in Bioperl ?(.aln) Thanx for any help. From Kevin.M.Brown at asu.edu Wed Jan 13 13:01:42 2010 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 13 Jan 2010 11:01:42 -0700 Subject: [Bioperl-l] targetp request In-Reply-To: <4B4D9F62.5010306@pasteur.fr> References: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com> <4B4D9F62.5010306@pasteur.fr> Message-ID: <1A4207F8295607498283FE9E93B775B4067C133E@EX02.asurite.ad.asu.edu> Sounds like this module might be in the wrong place then. Sounds more like a SeqIO or AlignIO module, heheh. Also looks like the docs might need to be cleaned up a bit for english readability (at least that initial sentence). 
Kevin Brown Center for Innovations in Medicine Biodesign Institute Arizona State University > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Emmanuel Quevillon > Sent: Wednesday, January 13, 2010 3:25 AM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] targetp request > > On 1/6/10 7:20 PM, Vijayaraj Nagarajan wrote: > > Hi, > > > > I am trying to use targetP in bioperl. > > the documentation at the bioperl site is a bit confusing to me... > > > > I would appreciate if you could give a very small example, > as to how to use > > "Bio::Tools::TargetP" to predict the localization of a > protein sequence that > > i have stored as a string. > > > > Thanks, > > Vijay > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Dear Vivay, > > Bio::Tools::TargetP is not intended to run targetp on a > sequence but to > read and parse results from targetp run. > > From the Pod doc : > > DESCRIPTION > TargetP modules will provides parsed informations > about protein > localization. It > reads in a targetp output file. It parses the results, and > returns a > Bio::SeqFeature::Generic object for each sequences > found to have > a subcellular > localization > > > So to analyze your sequence, you'll first need to run targetp on your > sequence file to create a targetp result output file. Then use > Bio::Tools::TargetP module to parse this result file and get only > informations you want/need from the result to be display as > shown in the > SYNOPSIS of the Pod documentation of the module. 
> > HTH > > Regards > > Emmanuel > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Wed Jan 13 13:44:36 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 13 Jan 2010 13:44:36 -0500 Subject: [Bioperl-l] Blast to Clustalw Format In-Reply-To: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com> References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com> Message-ID: Marcelo- Yes-- look at the code snip at http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO combined with the snip at http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods (using -format => 'clustalw') cheers MAJ ----- Original Message ----- From: "Marcelo Iwata" To: Sent: Wednesday, January 13, 2010 1:12 PM Subject: [Bioperl-l] Blast to Clustalw Format > Hi.. > I have an simple Blast result, such as blastn. > Is there an scrip to transform such result to Clustalw format in Bioperl > ?(.aln) > > Thanx for any help. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From dan.kortschak at adelaide.edu.au Wed Jan 13 23:26:46 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Thu, 14 Jan 2010 14:56:46 +1030 Subject: [Bioperl-l] not able to use Bio::Root::IO method Message-ID: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> Hi All, I'm having a stupid problem that for some reason I just can't figure out. I'm putting together a B:A:IO:bowtie module to wrap around the B:A:IO:sam module so bowtie output can be used as an assembly start point. For some reason that is escaping me I can't create tempfiles! 
What should be the relevant code in the module: package Bio::Assembly::IO::bowtie; use strict; use warnings; # Object preamble - inherits from Bio::Root::Root use Bio::SeqIO; use Bio::Tools::Run::Samtools; use Bio::Assembly::IO; use Carp; use Bio::Root::Root; use Bio::Root::IO; use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO ); and the line (there are a couple of others that are like to fail in the same way, but I've not got that far) my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' ); Which dies with: Can't locate object method "io" via package "Bio::Assembly::IO::bowtie" at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175. Relevant environment vars: DB<10> x @ISA 0 'Bio::Root::Root' 1 'Bio::Root::IO' 2 'Bio::Assembly::IO' DB<11> x $self 0 Bio::Assembly::IO::bowtie=HASH(0x2d226d8) '_no_head' => undef '_no_sq' => undef '_root_verbose' => 0 Can someone suggest what I'm missing? cheers Dan From maj at fortinbras.us Thu Jan 14 00:11:01 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 14 Jan 2010 00:11:01 -0500 Subject: [Bioperl-l] not able to use Bio::Root::IO method In-Reply-To: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <84196F01FF584C64A79B89FECE2DD86F@NewLife> Hey Dan-- what does your constructor look like? I wonder if something's getting lost in new() and _initialize() chaining spaghetti- MAJ ----- Original Message ----- From: "Dan Kortschak" To: Sent: Wednesday, January 13, 2010 11:26 PM Subject: [Bioperl-l] not able to use Bio::Root::IO method > Hi All, > > I'm having a stupid problem that for some reason I just can't figure > out. I'm putting together a B:A:IO:bowtie module to wrap around the > B:A:IO:sam module so bowtie output can be used as an assembly start > point. > > For some reason that is escaping me I can't create tempfiles! 
> > What should be the relevant code in the module: > > package Bio::Assembly::IO::bowtie; > use strict; > use warnings; > > # Object preamble - inherits from Bio::Root::Root > > use Bio::SeqIO; > use Bio::Tools::Run::Samtools; > use Bio::Assembly::IO; > use Carp; > use Bio::Root::Root; > use Bio::Root::IO; > use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO ); > > > and the line (there are a couple of others that are like to fail in the > same way, but I've not got that far) > > my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => > $self->tempdir(), -suffix => '.sam' ); > > Which dies with: > Can't locate object method "io" via package "Bio::Assembly::IO::bowtie" > at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175. > > Relevant environment vars: > DB<10> x @ISA > 0 'Bio::Root::Root' > 1 'Bio::Root::IO' > 2 'Bio::Assembly::IO' > > DB<11> x $self > 0 Bio::Assembly::IO::bowtie=HASH(0x2d226d8) > '_no_head' => undef > '_no_sq' => undef > '_root_verbose' => 0 > > > > Can someone suggest what I'm missing? > > cheers > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From dan.kortschak at adelaide.edu.au Thu Jan 14 00:35:35 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Thu, 14 Jan 2010 16:05:35 +1030 Subject: [Bioperl-l] not able to use Bio::Root::IO method In-Reply-To: <84196F01FF584C64A79B89FECE2DD86F@NewLife> References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> <84196F01FF584C64A79B89FECE2DD86F@NewLife> Message-ID: <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au> Thanks Mark, I'm not sure about that since @ISA still includes Bio::Root:IO when it's at the call, but it might be. 
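One quick way to check this sort of thing is to ask the object what it can actually do, rather than looking at @ISA. What follows is only a minimal sketch under stated assumptions: My::Root, My::IO and My::Reader are hypothetical stand-in packages, not BioPerl classes, and the tempfile() here is a toy built on the core File::Temp module. It shows that having a class in @ISA makes that class's own methods callable directly on the object, but does not create an accessor named after the class.

```perl
use strict;
use warnings;

# Hypothetical stand-ins, not BioPerl classes: a root class and an IO mix-in.
package My::Root;
sub new { my ($class, %args) = @_; return bless {%args}, $class }

package My::IO;
use File::Temp ();

# tempfile() lives in the mix-in, so a subclass calls it directly on $self.
sub tempfile {
    my ($self, %args) = @_;
    my $fh = File::Temp->new( SUFFIX => $args{-suffix} || '.tmp' );
    return ( $fh, $fh->filename );
}

package My::Reader;
our @ISA = ( 'My::Root', 'My::IO' );

package main;

my $obj = My::Reader->new;

# can() walks @ISA and returns a code ref only if the method really resolves.
my $has_io       = $obj->can('io')       ? 'yes' : 'no';
my $has_tempfile = $obj->can('tempfile') ? 'yes' : 'no';
print "io: $has_io, tempfile: $has_tempfile\n";

# The inherited method is called on the object itself, with no ->io in between.
my ( $fh, $name ) = $obj->tempfile( -suffix => '.sam' );
print "temp file ends in .sam\n" if $name =~ /\.sam\z/;
```

With these toy packages, can('io') comes back false even though My::IO is in @ISA, while can('tempfile') resolves.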
cheers
Dan

Here is the entirety of the code (it's reasonably short):

package Bio::Assembly::IO::bowtie;
use strict;
use warnings;

# Object preamble - inherits from Bio::Root::Root

use Bio::SeqIO;
use Bio::Tools::Run::Samtools;
use Bio::Assembly::IO;
use Carp;
use Bio::Root::Root;
use Bio::Root::IO;
use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );

our $HD = "\@HD\tVN:1.0\tSO:unsorted\n";
our $PG = "\@PG\tID=Bowtie\n";

our $HAVE_IO_UNCOMPRESS;
BEGIN {
    # check requirements
    unless ( eval "require Bio::Tools::Run::Bowtie;") {
        Bio::Root::Root->throw("Bio::Tools::Run::Bowtie is not available - cannot extract refdb from index.");
    }
    unless ( eval "require IO::Uncompress::Gunzip; \$HAVE_IO_UNCOMPRESS = 1") {
        Bio::Root::Root->warn("IO::Uncompress::Gunzip is not available; you'll have to do your decompression by hand.");
    }
}

sub new {
    my $class = shift;
    my @args = @_;
    my $self = $class->SUPER::new(@args);
    my ($file, $index, $no_head, $no_sq) = $self->_rearrange([qw(FILE INDEX NO_HEAD NO_SQ)], @args);
    $file =~ s/^<//;
    $self->{'_no_head'} = $no_head;
    $self->{'_no_sq'} = $no_sq;
    # get the sequence so samtools can work with it
    my $inspector = Bio::Tools::Run::Bowtie->new( -command => 'inspect' );
    my $refdb = $inspector->run($index);
    my $bam_file = $self->_make_bam($self->_bowtie_to_sam($file, $refdb));
    my $sam = Bio::Assembly::IO->new( -file => "<$bam_file", -refdb => $refdb , -format => 'sam' );
    return $sam;
}

sub _bowtie_to_sam {
    my ($self, $file, $refdb) = @_;

    $self->throw("'$file' does not exist or is not readable.")
        unless ( -e $file && -r $file );
    my $guesser = Bio::Tools::GuessSeqFormat->new(-file=>$file);
    $self->throw("'$file' is not a bowtie formatted file.") unless $guesser->guess =~ m/^bowtie$/;

    my %SQ;
    my $mapq = 255;
    my $in_pair;
    my @mate_line;
    my $mlen;

    if ($file =~ m/\.gz[^.]*$/) {
        unless ($HAVE_IO_UNCOMPRESS) {
            croak( "IO::Uncompress::Gunzip not available, can't expand '$_'" );
        }
        my ($tfh, $tf) = $self->io->tempfile;
        my $z = IO::Uncompress::Gunzip->new($_);
        while (<$z>) { print $tfh $_ }
        close $tfh;
        $file = $tf;
    }

    open(my $fh, $file) or
        $self->throw("Can not open '$file' for reading: $!");

    # create temp file for working
    my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );

    while ($fh) {
        chomp;
        my ($qname,$strand,$rname,$pos,$seq,$qual,$m,$details)=split("\t",$_);
        $SQ{$rname} = 1;

        my $paired_f = ($qname =~ m#/[12]#) ? 0x03 : 0;
        my $strand_f = ($strand eq '-') ? 0x10 : 0;
        my $op_strand_f = ($strand eq '+' && $paired_f) ? 0x20 : 0;
        my $first_f = ($qname =~ m#/1#) ? 0x40 : 0;
        my $second_f = ($qname =~ m#/2#) ? 0x80 : 0;
        my $flag = $paired_f | $strand_f | $op_strand_f | $first_f | $second_f;

        $pos++;
        my $len = length $seq;
        die unless $len == length $qual;
        my $cigar = $len.'M';
        my @detail = split(',',$details);
        my $dist = 'NM:i:'.scalar @detail;

        my @mismatch;
        my $last_pos = 0;
        for (@detail) {
            m/(\d+):(\w)>\w/;
            my $err = ($1-$last_pos);
            $last_pos = $1+1;
            push @mismatch,($err,$2);
        }
        push @mismatch, $len-$last_pos;
        @mismatch = reverse @mismatch if $strand eq '-';
        my $mismatch = join('',('MD:Z:',@mismatch));

        if ($paired_f) {
            my $mrnm = '=';
            if ($in_pair) {
                my $mpos = $mate_line[3];
                $mate_line[7] = $pos;
                my $isize = $mpos-$pos-$len;
                $mate_line[8] = -$isize;
                print $sam_tmp_h join("\t",@mate_line),"\n";
                print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n";
                $in_pair = 0;
            } else {
                $mlen = $len;
                @mate_line = ($qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, undef, undef, $seq, $qual, $mismatch, $dist);
                $in_pair = 1;
            }
        } else {
            my $mrnm = '*';
            my $mpos = 0;
            my $isize = 0;
            print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n";
        }
    }

    close($fh);
    $sam_tmp_h->close;

    return $sam_tmp_f if $self->{'_no_head'};

    my ($samh, $samf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );

    # print header
    print $samh $HD;

    # print sequence dictionary
    unless ($self->{'_no_sq'}) {
        my $db = Bio::SeqIO->new( -file => $refdb, -format => 'fasta' );
        while ( my $seq = $db->next_seq() ) {
            $SQ{$seq->id} = $seq->length if $SQ{$seq->id};
        }

        map { print $samh join("\t", ('@SQ', "SN:$_", "LN:$SQ{$_}")), "\n" } keys %SQ;
    }

    # print program
    print $samh $PG;

    open($sam_tmp_h, $sam_tmp_f) or
        $self->throw("Can not open '$sam_tmp_f' for reading: $!");

    print $samh $_ while ($sam_tmp_h);

    close($sam_tmp_h);
    $samh->close;

    return $samf;
}

sub _make_bam {
    my ($self, $file) = @_;

    $self->throw("'$file' does not exist or is not readable")
        unless ( -e $file && -r $file );

    # make a sorted bam file from a sam file input
    my ($bamh, $bamf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.bam' );
    my ($srth, $srtf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.srt' );
    $_->close for ($bamh, $srth);

    my $samt = Bio::Tools::Run::Samtools->new( -command => 'view',
                                               -sam_input => 1,
                                               -bam_output => 1 );

    $samt->run( -bam => $file, -out => $bamf );

    $samt = Bio::Tools::Run::Samtools->new( -command => 'sort' );

    $samt->run( -bam => $bamf, -pfx => $srtf);

    return $srtf.'.bam'
}

1;

On Thu, 2010-01-14 at 00:11 -0500, Mark A. Jensen wrote:
> Hey Dan-- what does your constructor look like? I wonder if
> something's getting
> lost in new() and _initialize() chaining spaghetti- MAJ
>

From dan.kortschak at adelaide.edu.au Thu Jan 14 00:35:48 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 16:05:48 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To:
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> <84196F01FF584C64A79B89FECE2DD86F@NewLife> <1263446261.8016.50.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au>

I've had a bit of a play with that, but no luck.

Dan

On Thu, 2010-01-14 at 00:26 -0500, Mark A.
Jensen wrote: > I've found that rearranging the items in the 'use base' array can > sometimes > recover > lost methods. I don't know enough of the arcana to know why it works. > (Sometimes, > java starts looking pretty good from here...) > From maj at fortinbras.us Thu Jan 14 00:38:00 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 14 Jan 2010 00:38:00 -0500 Subject: [Bioperl-l] Fw: not able to use Bio::Root::IO method Message-ID: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife> up to list ----- Original Message ----- From: "Mark A. Jensen" To: "Dan Kortschak" Sent: Thursday, January 14, 2010 12:36 AM Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method > Aha-- check out the pod for Bio::Root::IO: > > "This module provides methods that will usually be needed for any sort > of file- or stream-related input/output, e.g., keeping track of a file > handle, transient printing and reading from the file handle, a close > method, automatically closing the handle on garbage collection, etc. > > To use this for your own code you will either want to inherit from > this module, or instantiate an object for every file or stream you are > dealing with. In the first case this module will most likely not be > the first class off which your class inherits; therefore you need to > call _initialize_io() with the named parameters in order to set file > handle, open file, etc automatically." > > I think you're wanting a call to $self->_initialize_io(). (There is no io() > method explicitly defined in any of the base classes.) > MAJ > ----- Original Message ----- > From: "Dan Kortschak" > To: > Sent: Wednesday, January 13, 2010 11:26 PM > Subject: [Bioperl-l] not able to use Bio::Root::IO method > > >> Hi All, >> >> I'm having a stupid problem that for some reason I just can't figure >> out. I'm putting together a B:A:IO:bowtie module to wrap around the >> B:A:IO:sam module so bowtie output can be used as an assembly start >> point. 
>> >> For some reason that is escaping me I can't create tempfiles! >> >> What should be the relevant code in the module: >> >> package Bio::Assembly::IO::bowtie; >> use strict; >> use warnings; >> >> # Object preamble - inherits from Bio::Root::Root >> >> use Bio::SeqIO; >> use Bio::Tools::Run::Samtools; >> use Bio::Assembly::IO; >> use Carp; >> use Bio::Root::Root; >> use Bio::Root::IO; >> use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO ); >> >> >> and the line (there are a couple of others that are like to fail in the >> same way, but I've not got that far) >> >> my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => >> $self->tempdir(), -suffix => '.sam' ); >> >> Which dies with: >> Can't locate object method "io" via package "Bio::Assembly::IO::bowtie" >> at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175. >> >> Relevant environment vars: >> DB<10> x @ISA >> 0 'Bio::Root::Root' >> 1 'Bio::Root::IO' >> 2 'Bio::Assembly::IO' >> >> DB<11> x $self >> 0 Bio::Assembly::IO::bowtie=HASH(0x2d226d8) >> '_no_head' => undef >> '_no_sq' => undef >> '_root_verbose' => 0 >> >> >> >> Can someone suggest what I'm missing? >> >> cheers >> Dan >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> From maj at fortinbras.us Thu Jan 14 00:50:11 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 14 Jan 2010 00:50:11 -0500 Subject: [Bioperl-l] not able to use Bio::Root::IO method In-Reply-To: <1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au> References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> <84196F01FF584C64A79B89FECE2DD86F@NewLife> <1263446261.8016.50.camel@zoidberg.mbs.adelaide.edu.au> <1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <82BFF47099684EF496DB3875D39DCA14@NewLife> For the benefit of the list, I categorically deny ever making the statement about java below.... 
MAJ ----- Original Message ----- From: "Dan Kortschak" To: "Mark A. Jensen" Cc: Sent: Thursday, January 14, 2010 12:35 AM Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method > I've had a bit of a play with that, but no luck. > > Dan > > On Thu, 2010-01-14 at 00:26 -0500, Mark A. Jensen wrote: >> I've found that rearranging the items in the 'use base' array can >> sometimes >> recover >> lost methods. I don't know enough of the arcana to know why it works. >> (Sometimes, >> java starts looking pretty good from here...) >> > > From cjfields at illinois.edu Thu Jan 14 02:23:41 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 14 Jan 2010 01:23:41 -0600 Subject: [Bioperl-l] not able to use Bio::Root::IO method In-Reply-To: <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au> References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> <84196F01FF584C64A79B89FECE2DD86F@NewLife> <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: You can remove separate 'use' directives if they are declared with 'use base' (they will be imported then). Also, Bio::Root::IO inherits Bio::Root::Root, and Bio::Assembly::IO should inherit from Bio::Root::IO, so the only base module you should need is Bio::Assembly::IO. It's possible having all three is confusing the interpreter. chris On Jan 13, 2010, at 11:35 PM, Dan Kortschak wrote: > Thanks Mark, I'm not sure about that since @ISA still includes > Bio::Root:IO when it's at the call, but it might be. 
>
> cheers
> Dan
>
> Here is the entirety of the code (it's reasonably short):
>
> package Bio::Assembly::IO::bowtie;
> use strict;
> use warnings;
>
> # Object preamble - inherits from Bio::Root::Root
>
> use Bio::SeqIO;
> use Bio::Tools::Run::Samtools;
> use Bio::Assembly::IO;
> use Carp;
> use Bio::Root::Root;
> use Bio::Root::IO;
> use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );
>
> our $HD = "\@HD\tVN:1.0\tSO:unsorted\n";
> our $PG = "\@PG\tID=Bowtie\n";
>
> our $HAVE_IO_UNCOMPRESS;
> BEGIN {
>     # check requirements
>     unless ( eval "require Bio::Tools::Run::Bowtie;") {
>         Bio::Root::Root->throw("Bio::Tools::Run::Bowtie is not available - cannot extract refdb from index.");
>     }
>     unless ( eval "require IO::Uncompress::Gunzip; \$HAVE_IO_UNCOMPRESS = 1") {
>         Bio::Root::Root->warn("IO::Uncompress::Gunzip is not available; you'll have to do your decompression by hand.");
>     }
> }
>
> sub new {
>     my $class = shift;
>     my @args = @_;
>     my $self = $class->SUPER::new(@args);
>     my ($file, $index, $no_head, $no_sq) = $self->_rearrange([qw(FILE INDEX NO_HEAD NO_SQ)], @args);
>     $file =~ s/^<//;
>     $self->{'_no_head'} = $no_head;
>     $self->{'_no_sq'} = $no_sq;
>     # get the sequence so samtools can work with it
>     my $inspector = Bio::Tools::Run::Bowtie->new( -command => 'inspect' );
>     my $refdb = $inspector->run($index);
>     my $bam_file = $self->_make_bam($self->_bowtie_to_sam($file, $refdb));
>     my $sam = Bio::Assembly::IO->new( -file => "<$bam_file", -refdb => $refdb , -format => 'sam' );
>     return $sam;
> }
>
> sub _bowtie_to_sam {
>     my ($self, $file, $refdb) = @_;
>
>     $self->throw("'$file' does not exist or is not readable.")
>         unless ( -e $file && -r $file );
>     my $guesser = Bio::Tools::GuessSeqFormat->new(-file=>$file);
>     $self->throw("'$file' is not a bowtie formatted file.") unless $guesser->guess =~ m/^bowtie$/;
>
>     my %SQ;
>     my $mapq = 255;
>     my $in_pair;
>     my @mate_line;
>     my $mlen;
>
>     if ($file =~ m/\.gz[^.]*$/) {
>         unless ($HAVE_IO_UNCOMPRESS) {
>             croak( "IO::Uncompress::Gunzip not available, can't expand '$_'" );
>         }
>         my ($tfh, $tf) = $self->io->tempfile;
>         my $z = IO::Uncompress::Gunzip->new($_);
>         while (<$z>) { print $tfh $_ }
>         close $tfh;
>         $file = $tf;
>     }
>
>     open(my $fh, $file) or
>         $self->throw("Can not open '$file' for reading: $!");
>
>     # create temp file for working
>     my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );
>
>     while ($fh) {
>         chomp;
>         my ($qname,$strand,$rname,$pos,$seq,$qual,$m,$details)=split("\t",$_);
>         $SQ{$rname} = 1;
>
>         my $paired_f = ($qname =~ m#/[12]#) ? 0x03 : 0;
>         my $strand_f = ($strand eq '-') ? 0x10 : 0;
>         my $op_strand_f = ($strand eq '+' && $paired_f) ? 0x20 : 0;
>         my $first_f = ($qname =~ m#/1#) ? 0x40 : 0;
>         my $second_f = ($qname =~ m#/2#) ? 0x80 : 0;
>         my $flag = $paired_f | $strand_f | $op_strand_f | $first_f | $second_f;
>
>         $pos++;
>         my $len = length $seq;
>         die unless $len == length $qual;
>         my $cigar = $len.'M';
>         my @detail = split(',',$details);
>         my $dist = 'NM:i:'.scalar @detail;
>
>         my @mismatch;
>         my $last_pos = 0;
>         for (@detail) {
>             m/(\d+):(\w)>\w/;
>             my $err = ($1-$last_pos);
>             $last_pos = $1+1;
>             push @mismatch,($err,$2);
>         }
>         push @mismatch, $len-$last_pos;
>         @mismatch = reverse @mismatch if $strand eq '-';
>         my $mismatch = join('',('MD:Z:',@mismatch));
>
>         if ($paired_f) {
>             my $mrnm = '=';
>             if ($in_pair) {
>                 my $mpos = $mate_line[3];
>                 $mate_line[7] = $pos;
>                 my $isize = $mpos-$pos-$len;
>                 $mate_line[8] = -$isize;
>                 print $sam_tmp_h join("\t",@mate_line),"\n";
>                 print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n";
>                 $in_pair = 0;
>             } else {
>                 $mlen = $len;
>                 @mate_line = ($qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, undef, undef, $seq, $qual, $mismatch, $dist);
>                 $in_pair = 1;
>             }
>         } else {
>             my $mrnm = '*';
>             my $mpos = 0;
>             my $isize = 0;
>             print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n";
>         }
>     }
>
>     close($fh);
>     $sam_tmp_h->close;
>
>     return $sam_tmp_f if $self->{'_no_head'};
>
>     my ($samh, $samf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );
>
>     # print header
>     print $samh $HD;
>
>     # print sequence dictionary
>     unless ($self->{'_no_sq'}) {
>         my $db = Bio::SeqIO->new( -file => $refdb, -format => 'fasta' );
>         while ( my $seq = $db->next_seq() ) {
>             $SQ{$seq->id} = $seq->length if $SQ{$seq->id};
>         }
>
>         map { print $samh join("\t", ('@SQ', "SN:$_", "LN:$SQ{$_}")), "\n" } keys %SQ;
>     }
>
>     # print program
>     print $samh $PG;
>
>     open($sam_tmp_h, $sam_tmp_f) or
>         $self->throw("Can not open '$sam_tmp_f' for reading: $!");
>
>     print $samh $_ while ($sam_tmp_h);
>
>     close($sam_tmp_h);
>     $samh->close;
>
>     return $samf;
> }
>
> sub _make_bam {
>     my ($self, $file) = @_;
>
>     $self->throw("'$file' does not exist or is not readable")
>         unless ( -e $file && -r $file );
>
>     # make a sorted bam file from a sam file input
>     my ($bamh, $bamf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.bam' );
>     my ($srth, $srtf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.srt' );
>     $_->close for ($bamh, $srth);
>
>     my $samt = Bio::Tools::Run::Samtools->new( -command => 'view',
>                                                -sam_input => 1,
>                                                -bam_output => 1 );
>
>     $samt->run( -bam => $file, -out => $bamf );
>
>     $samt = Bio::Tools::Run::Samtools->new( -command => 'sort' );
>
>     $samt->run( -bam => $bamf, -pfx => $srtf);
>
>     return $srtf.'.bam'
> }
>
> 1;
>
>
> On Thu, 2010-01-14 at 00:11 -0500, Mark A. Jensen wrote:
>> Hey Dan-- what does your constructor look like?
I wonder if >> something's getting >> lost in new() and _initialize() chaining spaghetti- MAJ >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Jan 14 02:25:05 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 14 Jan 2010 01:25:05 -0600 Subject: [Bioperl-l] Fw: not able to use Bio::Root::IO method In-Reply-To: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife> References: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife> Message-ID: <1DB926E1-9C6F-4B96-8D7E-28317DD7DE42@illinois.edu> Yes, that's true. The call to an io() is a Bio::Tools::Run::WrapperBase thing (the io() is a Bio::Root::IO instance). chris On Jan 13, 2010, at 11:38 PM, Mark A. Jensen wrote: > up to list > ----- Original Message ----- From: "Mark A. Jensen" > To: "Dan Kortschak" > Sent: Thursday, January 14, 2010 12:36 AM > Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method > > >> Aha-- check out the pod for Bio::Root::IO: >> "This module provides methods that will usually be needed for any sort >> of file- or stream-related input/output, e.g., keeping track of a file >> handle, transient printing and reading from the file handle, a close >> method, automatically closing the handle on garbage collection, etc. >> To use this for your own code you will either want to inherit from >> this module, or instantiate an object for every file or stream you are >> dealing with. In the first case this module will most likely not be >> the first class off which your class inherits; therefore you need to >> call _initialize_io() with the named parameters in order to set file >> handle, open file, etc automatically." >> I think you're wanting a call to $self->_initialize_io(). (There is no io() method explicitly defined in any of the base classes.) 
>> MAJ >> ----- Original Message ----- From: "Dan Kortschak" >> To: >> Sent: Wednesday, January 13, 2010 11:26 PM >> Subject: [Bioperl-l] not able to use Bio::Root::IO method >>> Hi All, >>> I'm having a stupid problem that for some reason I just can't figure >>> out. I'm putting together a B:A:IO:bowtie module to wrap around the >>> B:A:IO:sam module so bowtie output can be used as an assembly start >>> point. >>> For some reason that is escaping me I can't create tempfiles! >>> What should be the relevant code in the module: >>> package Bio::Assembly::IO::bowtie; >>> use strict; >>> use warnings; >>> # Object preamble - inherits from Bio::Root::Root >>> use Bio::SeqIO; >>> use Bio::Tools::Run::Samtools; >>> use Bio::Assembly::IO; >>> use Carp; >>> use Bio::Root::Root; >>> use Bio::Root::IO; >>> use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO ); >>> and the line (there are a couple of others that are like to fail in the >>> same way, but I've not got that far) >>> my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => >>> $self->tempdir(), -suffix => '.sam' ); >>> Which dies with: >>> Can't locate object method "io" via package "Bio::Assembly::IO::bowtie" >>> at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175. >>> Relevant environment vars: >>> DB<10> x @ISA 0 'Bio::Root::Root' >>> 1 'Bio::Root::IO' >>> 2 'Bio::Assembly::IO' >>> DB<11> x $self >>> 0 Bio::Assembly::IO::bowtie=HASH(0x2d226d8) >>> '_no_head' => undef >>> '_no_sq' => undef >>> '_root_verbose' => 0 >>> Can someone suggest what I'm missing? 
>>> cheers >>> Dan >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From dan.kortschak at adelaide.edu.au Thu Jan 14 02:59:20 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Thu, 14 Jan 2010 18:29:20 +1030 Subject: [Bioperl-l] not able to use Bio::Root::IO method In-Reply-To: References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> <84196F01FF584C64A79B89FECE2DD86F@NewLife> <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <1263455960.4630.3.camel@epistle> Thanks Chris, I've done that, and since the inheritance is direct (rather than being a constructed attribute in the object hash) the calls are $obj->temp* rather than the $obj->io->temp* that I was using. It works now and is much clearer having gotten rid of much of the declarations. cheers Dan On Thu, 2010-01-14 at 01:23 -0600, Chris Fields wrote: > You can remove separate 'use' directives if they are declared with > 'use base' (they will be imported then). Also, Bio::Root::IO inherits > Bio::Root::Root, and Bio::Assembly::IO should inherit from > Bio::Root::IO, so the only base module you should need is > Bio::Assembly::IO. It's possible having all three is confusing the > interpreter. > > chris From marcelo011982 at gmail.com Thu Jan 14 08:44:25 2010 From: marcelo011982 at gmail.com (Marcelo Iwata) Date: Thu, 14 Jan 2010 11:44:25 -0200 Subject: [Bioperl-l] Blast to Clustalw Format In-Reply-To: References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com> Message-ID: <1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com> Thanks Mark. I think that most of you already know it. 
But I'll put it here for new users:

#!/usr/bin/perl -w

use strict;
use Bio::SearchIO;
use Bio::AlignIO;

# Read the BLAST report and write every HSP out as a ClustalW alignment.
my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => '../../fontes/exemplos/blat/teste2/output.blast');
my $alnIO = Bio::AlignIO->new(-format => 'clustalw', -file => '>hsp.aln');

while ( my $result = $in->next_result ) {
    ## $result is a Bio::Search::Result::ResultI compliant object
    while ( my $hit = $result->next_hit ) {
        ## $hit is a Bio::Search::Hit::HitI compliant object
        while ( my $hsp = $hit->next_hsp ) {
            ## $hsp is a Bio::Search::HSP::HSPI compliant object
            my $aln = $hsp->get_aln;
            $alnIO->write_aln($aln);
        }
    }
}

On Wed, Jan 13, 2010 at 4:44 PM, Mark A. Jensen wrote:

> Marcelo-
> Yes-- look at the code snip at
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
> combined with the snip at
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> (using -format => 'clustalw')
> cheers MAJ
> ----- Original Message ----- From: "Marcelo Iwata" <marcelo011982 at gmail.com>
> To:
> Sent: Wednesday, January 13, 2010 1:12 PM
> Subject: [Bioperl-l] Blast to Clustalw Format
>
>
> Hi..
>> I have an simple Blast result, such as blastn.
>> Is there an scrip to transform such result to Clustalw format in Bioperl
>> ?(.aln)
>>
>> Thanx for any help.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>

From marcelo011982 at gmail.com  Thu Jan 14 08:46:21 2010
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Thu, 14 Jan 2010 11:46:21 -0200
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To: <1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com> <1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>
Message-ID: <1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com>

Sorry, the correct code is:

#!/usr/bin/perl -w

use strict;
use Bio::SearchIO;
use Bio::AlignIO;

# Read the BLAST report and write every HSP out as a ClustalW alignment.
my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => '../../fontes/exemplos/blat/teste2/output.blast');
my $alnIO = Bio::AlignIO->new(-format => 'clustalw', -file => '>hsp.aln');

while ( my $result = $in->next_result ) {
    ## $result is a Bio::Search::Result::ResultI compliant object
    while ( my $hit = $result->next_hit ) {
        ## $hit is a Bio::Search::Hit::HitI compliant object
        while ( my $hsp = $hit->next_hsp ) {
            ## $hsp is a Bio::Search::HSP::HSPI compliant object
            my $aln = $hsp->get_aln;
            $alnIO->write_aln($aln);
        }
    }
}

On Thu, Jan 14, 2010 at 11:44 AM, Marcelo Iwata wrote:

> Thanks Mark.
> I think that most of you already know it.
> But , i'll put it for new users: > > > #!/usr/bin/perl -w > > use strict; > use Bio::SearchIO; > use Bio::AlignIO; > > my $in = new Bio::SearchIO(-format => 'blast', > -file => ' > ../../fontes/exemplos/blat/teste2/output.blast '); > my $aln; > my $alnIO; > $alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln"); > while ( my $result = $in->next_result ) { > ## $result is a Bio::Search::Result::ResultI compliant object > while ( my $hit = $result->next_hit ) { > ## $hit is a Bio::Search::Hit::HitI compliant object > while ( my $hsp = $hit->next_hsp ) { > ## $hsp is a Bio::Search::HSP::HSPI compliant object > $aln = $hsp->get_aln; > $alnIO->write_aln($aln); > > > } > } > } > > > On Wed, Jan 13, 2010 at 4:44 PM, Mark A. Jensen wrote: > >> Marcelo- >> Yes-- look at the code snip at >> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO >> combined with the snip at >> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods >> (using -format => 'clustalw') >> cheers MAJ >> ----- Original Message ----- From: "Marcelo Iwata" < >> marcelo011982 at gmail.com> >> To: >> Sent: Wednesday, January 13, 2010 1:12 PM >> Subject: [Bioperl-l] Blast to Clustalw Format >> >> >> Hi.. >>> I have an simple Blast result, such as blastn. >>> Is there an scrip to transform such result to Clustalw format in >>> Bioperl >>> ?(.aln) >>> >>> Thanx for any help. >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> > From maj at fortinbras.us Thu Jan 14 08:54:31 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Thu, 14 Jan 2010 08:54:31 -0500 Subject: [Bioperl-l] Blast to Clustalw Format In-Reply-To: <1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com> References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com><1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com> <1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com> Message-ID: <1B8891488AA746F49BCAAB531FBE4D0B@NewLife> Thanks Marcelo-- code snips always appreciated! MAJ ----- Original Message ----- From: "Marcelo Iwata" To: Sent: Thursday, January 14, 2010 8:46 AM Subject: Re: [Bioperl-l] Blast to Clustalw Format > Sorry , the correct code is: > > > > #!/usr/bin/perl -w > > use strict; > use Bio::SearchIO; > use Bio::AlignIO; > > my $in = new Bio::SearchIO(-format => 'blast', > -file => ' > ../../fontes/exemplos/blat/teste2/output.blast '); > my $aln; > my $alnIO; > $alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln"); > while ( my $result = $in->next_result ) { > ## $result is a Bio::Search::Result::ResultI compliant object > while ( my $hit = $result->next_hit ) { > ## $hit is a Bio::Search::Hit::HitI compliant object > while ( my $hsp = $hit->next_hsp ) { > ## $hsp is a Bio::Search::HSP::HSPI compliant object > $aln = $hsp->get_aln; > $alnIO->write_aln($aln); > > } > } > } > > > On Thu, Jan 14, 2010 at 11:44 AM, Marcelo Iwata > wrote: > >> Thanks Mark. >> I think that most of you already know it. 
>> But , i'll put it for new users: >> >> >> #!/usr/bin/perl -w >> >> use strict; >> use Bio::SearchIO; >> use Bio::AlignIO; >> >> my $in = new Bio::SearchIO(-format => 'blast', >> -file => ' >> ../../fontes/exemplos/blat/teste2/output.blast '); >> my $aln; >> my $alnIO; >> $alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln"); >> while ( my $result = $in->next_result ) { >> ## $result is a Bio::Search::Result::ResultI compliant object >> while ( my $hit = $result->next_hit ) { >> ## $hit is a Bio::Search::Hit::HitI compliant object >> while ( my $hsp = $hit->next_hsp ) { >> ## $hsp is a Bio::Search::HSP::HSPI compliant object >> $aln = $hsp->get_aln; >> $alnIO->write_aln($aln); >> >> >> } >> } >> } >> >> >> On Wed, Jan 13, 2010 at 4:44 PM, Mark A. Jensen wrote: >> >>> Marcelo- >>> Yes-- look at the code snip at >>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO >>> combined with the snip at >>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods >>> (using -format => 'clustalw') >>> cheers MAJ >>> ----- Original Message ----- From: "Marcelo Iwata" < >>> marcelo011982 at gmail.com> >>> To: >>> Sent: Wednesday, January 13, 2010 1:12 PM >>> Subject: [Bioperl-l] Blast to Clustalw Format >>> >>> >>> Hi.. >>>> I have an simple Blast result, such as blastn. >>>> Is there an scrip to transform such result to Clustalw format in >>>> Bioperl >>>> ?(.aln) >>>> >>>> Thanx for any help. 
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From sidd.basu at gmail.com  Thu Jan 14 14:15:04 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Thu, 14 Jan 2010 13:15:04 -0600
Subject: [Bioperl-l] reading blast report
Message-ID: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>

Hi,
I have a script that reads a tblastn report (13000 records) and loads it
into a Chado database (Bio::Chado::Schema module); however, the machine
runs out of memory. Apart from the database loading, I am trying to
figure out whether reading the report with the SearchIO module could
itself consume a lot of memory. So, when I am reading a BLAST file and
getting the result object ....

while (my $result = $searchio->next_result)

* Does the SearchIO object load a huge chunk of the file into memory, or
  does it read only a part of the report on each iteration?

* Would indexing the BLAST report and then reading from the index be
  much faster, and why? Is there any way I could iterate through each
  record in the index, and would that be helpful?

-siddhartha

From jason at bioperl.org  Thu Jan 14 14:53:29 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Jan 2010 11:53:29 -0800
Subject: [Bioperl-l] reading blast report
In-Reply-To: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
Message-ID: <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>

What aspects of the report are you loading? You might consider producing
the BLAST report as tab-delimited (-m 8 format) if you are only
interested in start/end positions and scores of alignments; it is a
simpler, reduced dataset that gives the parser a lower memory footprint.
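
A minimal sketch of the tab-delimited route, assuming the usual 12-column layout of -m 8 output. It is plain Perl rather than a BioPerl parser, and the names parse_m8_line and @FIELDS plus the sample row are made up for illustration. Only one row is held in memory at a time.

```perl
use strict;
use warnings;

# Column order of NCBI BLAST tabular (-m 8) output.
my @FIELDS = qw(query subject pct_identity aln_length mismatches gap_opens
                q_start q_end s_start s_end evalue bit_score);

# Hypothetical helper: turn one tab-delimited line into a hash ref.
# Returns undef for blank lines, '#' comment lines and malformed rows.
sub parse_m8_line {
    my ($line) = @_;
    chomp $line;
    return undef if $line =~ /^\s*(?:#|$)/;
    my @cols = split /\t/, $line;
    return undef unless @cols == @FIELDS;
    my %hit;
    @hit{@FIELDS} = @cols;
    return \%hit;
}

# Illustrative row only; in practice $fh would be the report file.
my $sample = join("\t", qw(q1 chr1 98.50 200 3 0 1 200 1001 1200 1e-100 380)) . "\n";
open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
while (my $line = <$fh>) {
    my $hit = parse_m8_line($line) or next;
    # Keep just the start/end/score fields that are of interest here.
    print join("\t", @{$hit}{qw(query subject s_start s_end bit_score)}), "\n";
}
close $fh;
```

Each row could be handed to the database loader inside the loop, so nothing accumulates between rows.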
SearchIO defaults to -format => 'blast'; you can try -format =>
'blast_pull' instead, which parses lazily (objects are created only when
needed) and will reduce memory consumption.

-jason
On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote:

> Hi,
> I have a script that reads a tblastn report(13000 records) and loads in
> a chado database(Bio::Chado::Schema module), however the machine
> runs of memory. I am trying to figure
> out other than loading the database stuff
> if it the reading of SearchIO module could consume a lot of memory. So,
> when i am reading a blast file and getting the result object ....
>
> while (my $result = $searchio->next_result)
>
> * Does the searchio object loads a huge chunk of file in the memory or
> for each iteration it only reads a part of the result.
>
> * Does doing an index on blast report and then reading from it be much
> faster and why. And is there any way i could iterate through each
> record in the index, will that be helpful.
>
> -siddhartha
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/

From sidd.basu at gmail.com  Thu Jan 14 15:15:45 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Thu, 14 Jan 2010 14:15:45 -0600
Subject: [Bioperl-l] Re: reading blast report
In-Reply-To: <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com> <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
Message-ID: <4b4f7b74.5744f10a.7087.4813@mx.google.com>

On Thu, 14 Jan 2010, Jason Stajich wrote:

> What aspects of the report are you loading? You might consider the blast
> report as tab-delimited (-m 8 format) if you only are interested in
> start/end positions and scores of ailgnments which is a simpler and reduced
> dataset that has lower memory footprint by the parser.

I think this would be the better approach; I am mostly interested in
start/end/score data only.

>
> Searchio (default) -format => blast - you can try the BLAST -format =>
> blast_pull instead which lazy parses to create objects and will reduce
> memory consumption.

That's another good option, though. But just out of curiosity: does the
regular BLAST parser load the entire file into memory, considering that
the output consists of multiple Results concatenated together into a
single file? Could anybody clarify?

thanks,
-siddhartha

>
> -jason
> On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote:
>
> > Hi,
> > I have a script that reads a tblastn report(13000 records) and loads in
> > a chado database(Bio::Chado::Schema module), however the machine runs of
> > memory. I am trying to figure
> > out other than loading the database stuff
> > if it the reading of SearchIO module could consume a lot of memory. So,
> > when i am reading a blast file and getting the result object ....
> >
> > while (my $result = $searchio->next_result)
> >
> > * Does the searchio object loads a huge chunk of file in the memory or
> > for each iteration it only reads a part of the result.
> >
> > * Does doing an index on blast report and then reading from it be much
> > faster and why. And is there any way i could iterate through each
> > record in the index, will that be helpful.
> > > > -siddhartha > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > From jason at bioperl.org Thu Jan 14 16:28:29 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 14 Jan 2010 13:28:29 -0800 Subject: [Bioperl-l] reading blast report In-Reply-To: <4b4f7b74.5744f10a.7087.4813@mx.google.com> References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com> <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org> <4b4f7b74.5744f10a.7087.4813@mx.google.com> Message-ID: On Jan 14, 2010, at 12:15 PM, Siddhartha Basu wrote: > On Thu, 14 Jan 2010, Jason Stajich wrote: > >> What aspects of the report are you loading? You might consider the >> blast >> report as tab-delimited (-m 8 format) if you only are interested in >> start/end positions and scores of ailgnments which is a simpler and >> reduced >> dataset that has lower memory footprint by the parser. > > I think this would be a better approach i am mostly interested in > start/end/score data only. > >> >> Searchio (default) -format => blast - you can try the BLAST -format >> => >> blast_pull instead which lazy parses to create objects and will >> reduce >> memory consumption. > > It's another good option though. But just out of curosity, so the > regular blast parser do load the entire file in the memory consider > the > output consist of multiple Results concatenated together into a > single file. Could anybody clarify. > > thanks, > -siddhartha Each result is parsed (1 result per query) and all the hits and HSPs are parsed and brought into memory with the standard (non-pull) approach. The SearchIO iterates at the level of result - that is why you call next_result which parses each one at a time. 
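
The one-result-at-a-time pattern described above can be imitated on tab-delimited output. This is only a sketch under stated assumptions, not how Bio::SearchIO is implemented: make_result_iterator is a hypothetical name, the rows are shortened to two columns for illustration, and all rows for one query are assumed to be contiguous in the file. Only the rows for the current query are held at once.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that yields [query_id, \@rows]
# for the next query on each call, and nothing at end of input.
sub make_result_iterator {
    my ($fh) = @_;
    my $pending;    # first row of the following query, read one row ahead
    return sub {
        my ($query, @rows);
        if (defined $pending) {
            ($query, @rows) = ($pending->[0], $pending);
            undef $pending;
        }
        while (my $line = <$fh>) {
            chomp $line;
            next if $line =~ /^\s*(?:#|$)/;    # skip comments and blanks
            my @cols = split /\t/, $line;
            if (defined $query && $cols[0] ne $query) {
                $pending = \@cols;             # belongs to the next result
                last;
            }
            $query = $cols[0];
            push @rows, \@cols;
        }
        return unless defined $query;
        return [$query, \@rows];
    };
}

# Illustrative data: column 1 is the query id, column 2 the subject id.
my $data = "q1\ts1\nq1\ts2\nq2\ts3\n";
open my $fh, '<', \$data or die "cannot open in-memory sample: $!";
my $next_result = make_result_iterator($fh);
while (my $result = $next_result->()) {
    my ($query, $rows) = @$result;
    printf "%s: %d hit row(s)\n", $query, scalar @$rows;
}
close $fh;
```

Memory use is then bounded by the largest single query rather than by the size of the whole report.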
> > >> >> -jason >> On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote: >> >>> Hi, >>> I have a script that reads a tblastn report(13000 records) and >>> loads in >>> a chado database(Bio::Chado::Schema module), however the machine >>> runs of >>> memory. I am trying to figure >>> out other than loading the database stuff >>> if it the reading of SearchIO module could consume a lot of >>> memory. So, >>> when i am reading a blast file and getting the result object .... >>> >>> while (my $result = $searchio->next_result) >>> >>> * Does the searchio object loads a huge chunk of file in the >>> memory or >>> for each iteration it only reads a part of the result. >>> >>> * Does doing an index on blast report and then reading from it be >>> much >>> faster and why. And is there any way i could iterate through each >>> record in the index, will that be helpful. >>> >>> -siddhartha >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> http://fungalgenomes.org/ >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From sidd.basu at gmail.com Thu Jan 14 16:40:42 2010 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Thu, 14 Jan 2010 15:40:42 -0600 Subject: [Bioperl-l] Re: reading blast report In-Reply-To: References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com> <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org> <4b4f7b74.5744f10a.7087.4813@mx.google.com> Message-ID: <4b4f8f5d.5644f10a.2be2.47dc@mx.google.com> Thanks jason for clarification. 
On Thu, 14 Jan 2010, Jason Stajich wrote: > > On Jan 14, 2010, at 12:15 PM, Siddhartha Basu wrote: > > > On Thu, 14 Jan 2010, Jason Stajich wrote: > > > >> What aspects of the report are you loading? You might consider the blast > >> report as tab-delimited (-m 8 format) if you only are interested in > >> start/end positions and scores of ailgnments which is a simpler and > >> reduced > >> dataset that has lower memory footprint by the parser. > > > > I think this would be a better approach i am mostly interested in > > start/end/score data only. > > > >> > >> Searchio (default) -format => blast - you can try the BLAST -format => > >> blast_pull instead which lazy parses to create objects and will reduce > >> memory consumption. > > > > It's another good option though. But just out of curosity, so the > > regular blast parser do load the entire file in the memory consider the > > output consist of multiple Results concatenated together into a > > single file. Could anybody clarify. > > > > thanks, > > -siddhartha > > Each result is parsed (1 result per query) and all the hits and HSPs are > parsed and brought into memory with the standard (non-pull) approach. > The SearchIO iterates at the level of result - that is why you call > next_result which parses each one at a time. > > > > > > >> > >> -jason > >> On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote: > >> > >>> Hi, > >>> I have a script that reads a tblastn report(13000 records) and loads in > >>> a chado database(Bio::Chado::Schema module), however the machine runs > >>> of > >>> memory. I am trying to figure > >>> out other than loading the database stuff > >>> if it the reading of SearchIO module could consume a lot of memory. So, > >>> when i am reading a blast file and getting the result object .... > >>> > >>> while (my $result = $searchio->next_result) > >>> > >>> * Does the searchio object loads a huge chunk of file in the memory or > >>> for each iteration it only reads a part of the result. 
> >>> > >>> * Does doing an index on blast report and then reading from it be much > >>> faster and why. And is there any way i could iterate through each > >>> record in the index, will that be helpful. > >>> > >>> -siddhartha > >>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> -- > >> Jason Stajich > >> jason.stajich at gmail.com > >> jason at bioperl.org > >> http://fungalgenomes.org/ > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > From SMarkel at accelrys.com Thu Jan 14 17:58:06 2010 From: SMarkel at accelrys.com (Scott Markel) Date: Thu, 14 Jan 2010 14:58:06 -0800 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback from our customers. Due to network irregularities (not sure what else to call it) users see the getting of remote BLAST results as somewhat random. When results come back the hits are fine, but sometimes no information comes back at all. Retrying helps. In looking at RemoteBlast.pm there are four "return -1" cases. * $status eq 'ERROR' (return on line 614) * $line =~ /ERROR/I (return on line 628) * !$got_content (return on line 648) * !$response->is_success (return on line 655) In the case of no content we'd like to retry remote BLAST. We're happy to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl module, but we only want to retry in that case, not the other three. What would happen if that third "return -1" changed to a different return value? Scott Scott Markel, Ph.D. 
Principal Bioinformatics Architect    email:  smarkel at accelrys.com
Accelrys (Pipeline Pilot R&D)         mobile: +1 858 205 3653
10188 Telesis Court, Suite 100        voice:  +1 858 799 5603
San Diego, CA 92121                   fax:    +1 858 799 5222
USA                                   web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors: International Society for Computational Biology
Chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics

From nickjd at gmail.com  Wed Jan 13 08:18:12 2010
From: nickjd at gmail.com (NickJD)
Date: Wed, 13 Jan 2010 05:18:12 -0800 (PST)
Subject: [Bioperl-l] Parsing PSI-BLAST results with SearchIO
Message-ID: <65554589-081b-4297-ab68-9ddfbd3d9944@c34g2000yqn.googlegroups.com>

I am trying to parse PSI-BLAST results using SearchIO and some very
basic code, just to read the number of hits, number of HSPs, etc. I have
run 10 rounds on 1 input sequence and parsed the output, but the parser
seems to treat each round as a separate result, so the round/iteration
is always 1 and new_hits is always the total list, not the hits that are
new to that round. Does anyone have any experience of this?

Thanks,
Nick

From dsidote at waksman.rutgers.edu  Wed Jan 13 10:08:48 2010
From: dsidote at waksman.rutgers.edu (David J Sidote)
Date: Wed, 13 Jan 2010 10:08:48 -0500
Subject: [Bioperl-l] Bioinformatician position - Waksman Institute
Message-ID: <4b42af671001130708i703ecce0u47348484321714f@mail.gmail.com>

Bioinformatician - Research Assistant Professor

The Waksman Institute of Microbiology, located on the New Brunswick
campus of Rutgers University, is seeking a highly motivated and talented
bioinformatics scientist for a Research Assistant Professor appointment.
The successful candidate will analyze genome, transcriptome, and
epigenome data generated on the Life Sciences 454, Illumina, and AB
SOLiD high-throughput sequencing platforms.
Excellent communication and teamwork skills are essential as the successful candidate will work closely with individual research groups to develop software to facilitate the visualization, quantification, and interpretation of the data. The successful candidate will be expected to contribute to the publication of scientific literature and to present at seminars and conferences. Qualifications: - PhD in molecular biology, genetics, bioinformatics, systems biology or other related fields; candidates with a PhD in physics, mathematics, or computer science with some working knowledge of biology and experience are encouraged to apply. - Demonstrated scientific track record - Highly proficient in perl, python, or ruby programming, linux/unix scripting, and SQL. - Experience with R is desirable but not required - Experience with high-throughput sequencing, microarrays, or other high-throughput biological platforms - Excellent communication and organizational skills How to Apply: Please send a cover letter stating your current research interests, why you are interested in this position, and how your skill set complements this position along with a curriculum vitae, and the names and contact information of three references to hr at waksman.rutgers.edu. Please include "Bioinformatics Assistant Research Professor" in the subject line. Rutgers is an equal opportunity employer. For more information about this position please contact: Dr. 
David Sidote (dsidote at waksman.rutgers.edu) From albezg at gmail.com Wed Jan 13 20:57:27 2010 From: albezg at gmail.com (albezg) Date: Wed, 13 Jan 2010 20:57:27 -0500 Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with negative PDB ranges In-Reply-To: <49C405F0.5050100@gmail.com> References: <49C2B97B.7070304@gmail.com> <49C405F0.5050100@gmail.com> Message-ID: <4B4E7A07.7070805@gmail.com> Hi all, I have a problem using AlignIO to read Pfam database: ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz The database is in STOCKHOLM 1.0 format. AlignIO can read the alignment OK until the alignment PF00331.13. There it crashes with the following message: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: '1-344' is not an integer. STACK: Error::throw STACK: Bio::Root::Root::throw /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368 STACK: Bio::Range::end /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228 STACK: Bio::Annotation::Target::new /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82 STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293 STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73 STACK: Bio::AlignIO::stockholm::next_aln /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471 STACK: /home/albezg/scripts/pfam2fasta.pl:22 ----------------------------------------------------------- It appears this is caused by this entry: #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344; I don't care about residues in PDB, so I have just removed minus signs from the ranges. This seems to have fixed the crashing. Is it a known problem? Is there a solution for it? 
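
The workaround described above (removing the minus signs before parsing) can be sketched as a small pre-filter. This is an illustration under assumptions, not a BioPerl fix: strip_negative_pdb_range is a hypothetical name, and the pattern only targets "#=GS ... DR PDB;" lines whose last field is a start-end range. It discards the sign, so the PDB residue numbering becomes wrong; it is only suitable when those coordinates are not used.

```perl
use strict;
use warnings;

# Hypothetical helper: drop minus signs from the residue range at the end
# of a Stockholm "#=GS <seq> DR PDB; <id> <chain>; <start>-<end>;" line.
# Every other line is returned unchanged.
sub strip_negative_pdb_range {
    my ($line) = @_;
    return $line unless $line =~ /^#=GS\s+\S+\s+DR\s+PDB;/;
    # The final field is "start-end;" and either number may carry a sign.
    $line =~ s/;\s*-?(\d+)--?(\d+);(\s*)$/; $1-$2;$3/;
    return $line;
}

# The entry quoted above, before and after filtering.
my $entry = "#=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344;\n";
print strip_negative_pdb_range($entry);
```

Used as a filter, each line of Pfam-A.seed would pass through this function on its way to a cleaned copy, which is then handed to AlignIO.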
Thanks, Alexandr On 03/20/2009 05:09 PM, albezg wrote: > > I'm trying to change FASTA header(display_id) for a sequence in an > alignment(SimpleAlign). > > There are no issues when I print it, however when I use AlignIO to write > the alignment to a FASTA file, it does not work. Is this behavior intended? > > Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug > > The error: > ------------- EXCEPTION ------------- > MSG: No sequence with name [1/1-11] > STACK Bio::SimpleAlign::displayname > /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659 > STACK Bio::AlignIO::fasta::write_aln > /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200 > STACK toplevel ./demo.pl:14 > ------------------------------------- > > Alexandr From mitch_skinner at berkeley.edu Thu Jan 14 17:10:53 2010 From: mitch_skinner at berkeley.edu (Mitch Skinner) Date: Thu, 14 Jan 2010 14:10:53 -0800 Subject: [Bioperl-l] filter_by_location in Bio::DB::SeqFeature::Store::memory Message-ID: <4B4F966D.3030300@berkeley.edu> Hi, Some people haven't been getting all of the features in their GFF3 into JBrowse, and a nice test case that James Casbon posted to the list helped me track it down. Here's an example of the behavior I was seeing with BioPerl 1.6.1 (using Devel::REPL): ============== $ use Bio::DB::SeqFeature::Store $ my $db = Bio::DB::SeqFeature::Store->new(-adaptor=>"memory", -dsn=>"casbon.gff3") $Bio_DB_SeqFeature_Store_memory1 = Bio::DB::SeqFeature::Store::memory=HASH(0xa27ceec); $ $db->features(-seq_id=>"CYP2C8") $ARRAY1 = [ Feature:src(41), region(CYP2C8), Feature:src(37), Feature:src(39), Feature:src(42), Feature:src(40), Feature:src(38) ]; ============== I expected to also see the features with IDs 43 and 44 (the gff3 file is attached). I think there's a problem in the filter_by_location method. If start and end parameters aren't passed to the method, it sets default start and end values that lead it to examine all of the bins in its index. 
But the end value that it creates is at the beginning of the last bin, and I think it should be at the end of the last bin instead. The attached patch changes it to be at the end of the last bin. Regards, Mitch -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: casbon.gff3 URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: bdsfsm-filter_by_location.patch URL: From jason at bioperl.org Thu Jan 14 19:20:43 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 14 Jan 2010 16:20:43 -0800 Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with negative PDB ranges In-Reply-To: <4B4E7A07.7070805@gmail.com> References: <49C2B97B.7070304@gmail.com> <49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com> Message-ID: <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org> Seems like improper data really -- "-1" is an improper coordinate as far as the parser is concerned. You may want to tell Pfam that there is possible error in the dumper since that was the only record that had this problem? -jason On Jan 13, 2010, at 5:57 PM, albezg wrote: > Hi all, > > I have a problem using AlignIO to read Pfam database: > ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz > The database is in STOCKHOLM 1.0 format. AlignIO can read the > alignment OK until the alignment PF00331.13. There it crashes with > the following message: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: '1-344' is not an integer. 
> > STACK: Error::throw > STACK: Bio::Root::Root::throw /home/albezg/lib/perl5/site_perl/ > 5.10.0/Bio/Root/Root.pm:368 > STACK: Bio::Range::end /home/albezg/lib/perl5/site_perl/5.10.0/Bio/ > Range.pm:228 > STACK: Bio::Annotation::Target::new /home/albezg/lib/perl5/site_perl/ > 5.10.0/Bio/Annotation/Target.pm:82 > STACK: > Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target /home/ > albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ > GenericAlignHandler.pm:293 > STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler / > home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ > GenericAlignHandler.pm:73 > STACK: Bio::AlignIO::stockholm::next_aln /home/albezg/lib/perl5/ > site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471 > STACK: /home/albezg/scripts/pfam2fasta.pl:22 > ----------------------------------------------------------- > > It appears this is caused by this entry: > #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344; > > I don't care about residues in PDB, so I have just removed minus > signs from the ranges. This seems to have fixed the crashing. > > Is it a known problem? Is there a solution for it? > > Thanks, > Alexandr > > > On 03/20/2009 05:09 PM, albezg wrote: >> >> I'm trying to change FASTA header(display_id) for a sequence in an >> alignment(SimpleAlign). >> >> There are no issues when I print it, however when I use AlignIO to >> write >> the alignment to a FASTA file, it does not work. Is this behavior >> intended? 
>> >> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug >> >> The error: >> ------------- EXCEPTION ------------- >> MSG: No sequence with name [1/1-11] >> STACK Bio::SimpleAlign::displayname >> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659 >> STACK Bio::AlignIO::fasta::write_aln >> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200 >> STACK toplevel ./demo.pl:14 >> ------------------------------------- >> >> Alexandr > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From maj at fortinbras.us Thu Jan 14 21:00:31 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 14 Jan 2010 21:00:31 -0500 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> Message-ID: How about returning 1, 2, 4 for the non-zero cases, with some error constants set for convenience? MAJ ----- Original Message ----- From: "Scott Markel" To: Sent: Thursday, January 14, 2010 5:58 PM Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes > We've been looking at Bio::Tools::Run::RemoteBlast after some feedback > from our customers. Due to network irregularities (not sure what else > to call it) users see the getting of remote BLAST results as somewhat > random. When results come back the hits are fine, but sometimes no > information comes back at all. Retrying helps. > > In looking at RemoteBlast.pm there are four "return -1" cases. > > * $status eq 'ERROR' (return on line 614) > * $line =~ /ERROR/I (return on line 628) > * !$got_content (return on line 648) > * !$response->is_success (return on line 655) > > In the case of no content we'd like to retry remote BLAST. 
We're happy > to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl > module, but we only want to retry in that case, not the other three. > > What would happen if that third "return -1" changed to a different > return value? > > Scott > > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel at accelrys.com > Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 799 5222 > USA web: http://www.accelrys.com > > http://www.linkedin.com/in/smarkel > Vice President, Board of Directors: > International Society for Computational Biology > Chair: ISCB Publications Committee > Associate Editor: PLoS Computational Biology > Editorial Board: Briefings in Bioinformatics > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu Jan 14 19:42:31 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 14 Jan 2010 18:42:31 -0600 Subject: [Bioperl-l] reading blast report In-Reply-To: <4b4f7b74.5744f10a.7087.4813@mx.google.com> References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com> <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org> <4b4f7b74.5744f10a.7087.4813@mx.google.com> Message-ID: <0B76CCA7-C37C-4E24-BBDF-C8FD805DBBF2@illinois.edu> On Jan 14, 2010, at 2:15 PM, Siddhartha Basu wrote: > On Thu, 14 Jan 2010, Jason Stajich wrote: > >> What aspects of the report are you loading? You might consider the blast >> report as tab-delimited (-m 8 format) if you only are interested in >> start/end positions and scores of ailgnments which is a simpler and reduced >> dataset that has lower memory footprint by the parser. > > I think this would be a better approach i am mostly interested in > start/end/score data only. 
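A minimal sketch of the tab-delimited route discussed above, assuming legacy BLAST `-m 8` output with its standard twelve columns (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score). The helper name `parse_m8_line` and the sample record are illustrative only and are not part of BioPerl. Reading the report one line at a time keeps only a single hit in memory, which is the point of the suggestion.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column names for legacy BLAST -m 8 (tabular) output, in order.
my @M8_FIELDS = qw(query subject pct_identity aln_length mismatches
                   gap_opens q_start q_end s_start s_end evalue bit_score);

# Hypothetical helper (not a BioPerl function): turn one -m 8 line
# into a hash reference keyed by the column names above.
sub parse_m8_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/ or $line !~ /\S/;   # skip comments and blank lines
    my @cols = split /\t/, $line;
    return unless @cols == @M8_FIELDS;          # skip malformed lines
    my %hit;
    @hit{@M8_FIELDS} = @cols;
    return \%hit;
}

# Made-up sample record, used only to show the call.
my $sample = join("\t", qw(q1 s1 98.50 200 3 0 1 200 51 250 1e-100 380)) . "\n";
my $hit = parse_m8_line($sample);
printf "%s vs %s: query %d-%d, score %s\n",
    @{$hit}{qw(query subject q_start q_end bit_score)};
```

In a real script the same sub would be called inside a `while (my $line = <$fh>)` loop over the report file, so memory use stays flat regardless of report size.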
> >> SearchIO (default) -format => blast - you can try the BLAST -format => >> blast_pull instead which lazy parses to create objects and will reduce >> memory consumption. > > It's another good option though. But just out of curiosity, does the > regular blast parser load the entire file into memory, considering the > output consists of multiple Results concatenated together into a > single file? Could anybody clarify? Yes, the original SearchIO parsers all load the data into objects. This was based on the presumption that one wouldn't want very large BLAST reports, but this assumption probably doesn't hold today. The pull parser is one answer to that, in that it pulls the data only upon request (creating the objects on the fly), so it should be more amenable to parsing very large BLAST reports. > thanks, > -siddhartha > >> -jason chris From cjfields at illinois.edu Fri Jan 15 01:33:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 15 Jan 2010 00:33:50 -0600 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> Message-ID: Scott, I think this is fine (to change the third condition and retry with a specific code). The other possibility is to simply throw different exceptions under each of these circumstances, which can be caught via eval to allow a retry under only certain conditions (no content, for instance). One interesting bit: I think (though I'm not sure) the new BLAST+ allows remote BLAST queries from the command line, similar to the legacy blastcl3. Mark just wrote up a BLAST+ wrapper, so it might be worth testing that theory out. chris PS - BTW, nice to finally meet you at GMOD! On Jan 14, 2010, at 4:58 PM, Scott Markel wrote: > We've been looking at Bio::Tools::Run::RemoteBlast after some feedback > from our customers.
Due to network irregularities (not sure what else > to call it) users see the getting of remote BLAST results as somewhat > random. When results come back the hits are fine, but sometimes no > information comes back at all. Retrying helps. > > In looking at RemoteBlast.pm there are four "return -1" cases. > > * $status eq 'ERROR' (return on line 614) > * $line =~ /ERROR/I (return on line 628) > * !$got_content (return on line 648) > * !$response->is_success (return on line 655) > > In the case of no content we'd like to retry remote BLAST. We're happy > to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl > module, but we only want to retry in that case, not the other three. > > What would happen if that third "return -1" changed to a different > return value? > > Scott > > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel at accelrys.com > Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 799 5222 > USA web: http://www.accelrys.com > > http://www.linkedin.com/in/smarkel > Vice President, Board of Directors: > International Society for Computational Biology > Chair: ISCB Publications Committee > Associate Editor: PLoS Computational Biology > Editorial Board: Briefings in Bioinformatics > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields1 at gmail.com Fri Jan 15 01:35:35 2010 From: cjfields1 at gmail.com (Christopher Fields) Date: Fri, 15 Jan 2010 00:35:35 -0600 Subject: [Bioperl-l] filter_by_location in Bio::DB::SeqFeature::Store::memory In-Reply-To: <4B4F966D.3030300@berkeley.edu> References: <4B4F966D.3030300@berkeley.edu> Message-ID: <992796AC-B85B-4555-88A1-36000C0A2002@gmail.com> An HTML attachment was scrubbed... 
URL: From David.Messina at sbc.su.se Fri Jan 15 10:17:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 15 Jan 2010 16:17:14 +0100 Subject: [Bioperl-l] getting/setting species names with Bio::Species Message-ID: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> Hi everybody, I'm having a little trouble with names in Bio::Species objects. According to the Bio::Species documentation, if I have a species name as a string, like "Homo sapiens", I can get and set that using the species method: my $my_species_obj = Bio::Species->new(); $my_species_obj->species('Homo sapiens'); print $my_species_obj->species; # 'Homo sapiens' That works fine if I create the Bio::Species object myself. But if I try to get that string back out from a Bio::Species object created by SeqIO from a genbank file, I get just 'sapiens' back: my $io = Bio::SeqIO->new('-format' => 'genbank', '-file' => 'hoxa2.gb'); my $seq_obj = $io->next_seq; my $io_species_obj = $seq_obj->species; print $io_species_obj->species; # 'sapiens' I think that happens because genbank records have more taxonomic info about the species name, like the genus (and in fact the whole taxonomic categorization: kingdom, phylum, order, etc.). So the genus is stored separately. Poking around a bit more in Bio::Species, I turned up the method 'binomial', which appears to do the right thing, returning genus and species in both cases. Except, as you can see, the space is stripped out for my species-name-is-just-a-string object: print $my_species_obj->binomial; # 'Homosapiens' print $io_species_obj->binomial; # 'Homo sapiens' I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I using it correctly above, or is there a better way? If not, this kinda looks like a bug to me. I've got a patch which works and passes the BioPerl test suite. Thanks, Dave From maj at fortinbras.us Fri Jan 15 10:31:16 2010 From: maj at fortinbras.us (Mark A.
Jensen) Date: Fri, 15 Jan 2010 10:31:16 -0500 Subject: [Bioperl-l] getting/setting species names with Bio::Species In-Reply-To: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> Message-ID: I'm not that familiar with Bio::Species either, but this looks like conflicting semantics between Bio::Species and Bio::SeqIO. Bio::SeqIO sets the species accessor to the 'species' element of the lineage array, I believe. FWIW, I'd prefer "binomial" = "genus" . "species" MAJ ----- Original Message ----- From: "Dave Messina" To: "BioPerl List" Sent: Friday, January 15, 2010 10:17 AM Subject: [Bioperl-l] getting/setting species names with Bio::Species > Hi everybody, > > I'm having a little trouble with names in Bio::Species objects. > > According to the Bio::Species documentation, if I have a species name as a > string, like "Homo sapiens", I can get and set that using the species method: > > my $my_species_obj = Bio::Species->new(); > $my_species_obj->species('Homo sapiens'); > > print $my_species_obj->species; # 'Homo sapiens' > > > That works fine if I create the Bio::Species object myself. > > But if I try to get that string back out from a Bio::Species object created by > SeqIO from a genbank file, I get just 'sapiens' back: > > my $io = Bio::SeqIO->new('-format' => 'genbank', > '-file' => 'hoxa2.gb'); > my $seq_obj = $io->next_seq; > my $io_species_obj = $seq_obj->species; > > print $io_species_obj->species; # 'sapiens' > > > I think that happens because genbank records have more taxonomic info about > the species name, like the genus (and in fact the whole taxonomic > categorization: kingdom, phylum, order, etc.). So the genus is stored separately. > > Poking around a bit more in Bio::Species, I turned up the method 'binomial', > which appears to do the right thing, returning genus and species in both > cases.
Except, as you can see, the space is stripped out for my > species-name-is-just-a-string object: > > print $my_species_obj->binomial; # 'Homosapiens' > print $io_species_obj->binomial; # 'Homo sapiens' > > > I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I > using it correctly above, or is there a better way? > > If not, this kinda looks like a bug to me. I've got a patch which works and > passes the BioPerl test suite. > > > Thanks, > Dave > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Fri Jan 15 10:24:06 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 15 Jan 2010 10:24:06 -0500 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes In-Reply-To: References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> Message-ID: True-- blast+ allows remote dbs. I just committed a patch that makes this easy in StandAloneBlastPlus: specify '-remote => 1' in the factory, and downstream command calls will take care of it- MAJ # ex... use Bio::Tools::Run::StandAloneBlastPlus; use Bio::Seq; $ENV{BLASTPLUSDIR} = $where_it_is; my $fac = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => 'wgs', -remote => 1 ); my $result = $fac->blastn( -query => Bio::Seq->new(-seq=>'ggcaacaaacctggtaaagaagacggcaacaagcctggtaaagaagatggcaacaagcct', -id=>"proteinA") ); 1; ----- Original Message ----- From: "Chris Fields" To: "Scott Markel" Cc: Sent: Friday, January 15, 2010 1:33 AM Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes > Scott, > > I think this is fine (to change the third condition and retry with a specific > code). The other possibility is to simply throw different exceptions under > each of these circumstances, which can be caught via eval to allow a retry > under only certain conditions (no content, for instance).
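A minimal sketch of the "retry only on no content" idea from this thread, assuming distinct non-zero codes along the lines of the 1, 2, 4 suggestion. The constant names, the `fetch_with_retry` helper, and the stand-in fetch routine are all hypothetical; Bio::Tools::Run::RemoteBlast does not define them, and this is not how the module is implemented.

```perl
use strict;
use warnings;

# Hypothetical status codes (not defined by RemoteBlast).
use constant {
    OK             => 0,
    ERR_STATUS     => 1,   # server reported an error
    ERR_NO_CONTENT => 2,   # response arrived but was empty
    ERR_HTTP       => 4,   # HTTP request itself failed
};

# Call $fetch (a code ref returning one of the codes above) and retry
# only when it reports ERR_NO_CONTENT, up to $max_tries attempts.
# Any other code, success or failure, is returned immediately.
sub fetch_with_retry {
    my ($fetch, $max_tries) = @_;
    my $code;
    for my $attempt (1 .. $max_tries) {
        $code = $fetch->();
        last unless $code == ERR_NO_CONTENT;
    }
    return $code;
}

# Stand-in for a remote call: no content twice, then success.
my @replies = (ERR_NO_CONTENT, ERR_NO_CONTENT, OK);
my $calls   = 0;
my $result  = fetch_with_retry(sub { $calls++; return shift @replies }, 5);
print "result=$result after $calls calls\n";
```

The exception-based alternative mentioned above would wrap the call in `eval { ... }` and inspect `$@` instead of comparing return codes; the retry loop itself would look much the same.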
> > One interesting bit: I think (though I'm not sure) the new BLAST+ allows > remote BLAST queries from command line, similar to the legacy blastcl3. Mark > just wrote up a BLAST+ wrapper, so it might be worth testing that theory out. > > chris > > PS - BTW, nice to finally meet you at GMOD! > > On Jan 14, 2010, at 4:58 PM, Scott Markel wrote: > >> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback >> from our customers. Due to network irregularities (not sure what else >> to call it) users see the getting of remote BLAST results as somewhat >> random. When results come back the hits are fine, but sometimes no >> information comes back at all. Retrying helps. >> >> In looking at RemoteBlast.pm there are four "return -1" cases. >> >> * $status eq 'ERROR' (return on line 614) >> * $line =~ /ERROR/I (return on line 628) >> * !$got_content (return on line 648) >> * !$response->is_success (return on line 655) >> >> In the case of no content we'd like to retry remote BLAST. We're happy >> to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl >> module, but we only want to retry in that case, not the other three. >> >> What would happen if that third "return -1" changed to a different >> return value? >> >> Scott >> >> Scott Markel, Ph.D. 
>> Principal Bioinformatics Architect email: smarkel at accelrys.com >> Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 >> 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 >> San Diego, CA 92121 fax: +1 858 799 5222 >> USA web: http://www.accelrys.com >> >> http://www.linkedin.com/in/smarkel >> Vice President, Board of Directors: >> International Society for Computational Biology >> Chair: ISCB Publications Committee >> Associate Editor: PLoS Computational Biology >> Editorial Board: Briefings in Bioinformatics >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From SMarkel at accelrys.com Fri Jan 15 10:40:31 2010 From: SMarkel at accelrys.com (Scott Markel) Date: Fri, 15 Jan 2010 07:40:31 -0800 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes In-Reply-To: References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net> Chris, It was nice meeting you and Scott C., too. And seeing Jason again. If you and Mark > How about returning 1, 2, 4 for the non-zero cases, with some > error constants set for convenience? MAJ are okay with adding more return values, that works best for us in Pipeline Pilot. I'll add a Bugzilla entry. Scott -----Original Message----- From: Chris Fields [mailto:cjfields at illinois.edu] Sent: Thursday, 14 January 2010 10:34 PM To: Scott Markel Cc: Bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes Scott, I think this is fine (to change the third condition and retry with a specific code). 
The other possibility is to simply throw different exceptions under each of these circumstances, which can be caught via eval to allow a retry under only certain conditions (no content, for instance). One interesting bit: I think (though I'm not sure) the new BLAST+ allows remote BLAST queries from command line, similar to the legacy blastcl3. Mark just wrote up a BLAST+ wrapper, so it might be worth testing that theory out. chris PS - BTW, nice to finally meet you at GMOD! On Jan 14, 2010, at 4:58 PM, Scott Markel wrote: > We've been looking at Bio::Tools::Run::RemoteBlast after some feedback > from our customers. Due to network irregularities (not sure what else > to call it) users see the getting of remote BLAST results as somewhat > random. When results come back the hits are fine, but sometimes no > information comes back at all. Retrying helps. > > In looking at RemoteBlast.pm there are four "return -1" cases. > > * $status eq 'ERROR' (return on line 614) > * $line =~ /ERROR/I (return on line 628) > * !$got_content (return on line 648) > * !$response->is_success (return on line 655) > > In the case of no content we'd like to retry remote BLAST. We're happy > to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl > module, but we only want to retry in that case, not the other three. > > What would happen if that third "return -1" changed to a different > return value? > > Scott > > Scott Markel, Ph.D. 
> Principal Bioinformatics Architect email: smarkel at accelrys.com > Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 799 5222 > USA web: http://www.accelrys.com > > http://www.linkedin.com/in/smarkel > Vice President, Board of Directors: > International Society for Computational Biology > Chair: ISCB Publications Committee > Associate Editor: PLoS Computational Biology > Editorial Board: Briefings in Bioinformatics > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Jan 15 11:00:21 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 15 Jan 2010 10:00:21 -0600 Subject: [Bioperl-l] getting/setting species names with Bio::Species In-Reply-To: References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> Message-ID: <16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu> > FWIW, I'd prefer "binomial" = "genus" . "species" That's the way Bio::Species is supposed to work, at least when it was refactored by Sendu. But just a note: Bio::Species was considered deprecated (scheduled for the 1.7 release IIRC) for many very good reasons in favor of Bio::Taxon. First and foremost among these is the fact we cannot consistently parse out the genus/species/strain/variant/etc for every organism in GenBank w/o knowing its full lineage, which means including some taxonomic information. And even then it's highly problematic. We've had several heated discussions on list about how to handle this in a somewhat backwards-compatible way, and the main solution was to forgo compatibility issues and eventually deprecate Bio::Species altogether in favor of Bio::Taxon, a class that doesn't make the same assumptions. Bio::Species, in the interim, is-a Bio::Taxon.
You'll note that a minimal Bio::DB::Taxonomy instance is constructed from the classification scheme in some instances, but if one had a proper DB link one could link to Entrez Taxonomy or a local flat-file indexed DB and grab the info. Bio::Taxon (correct me if I'm wrong on this Sendu, if you're out there) eschews various methods (species, etc) for simpler consistent ones based on Taxonomy, and doesn't force us to handle every exception to getting the genus/species out of a name. That is left up to the user, at their peril. For either one, if you are reproducing the fully qualified name, you probably should use something like node_name() for consistency. Bio::Species also has scientific_name(). With a true Bio::Taxon one would need to check that this is performed on the species node. chris On Jan 15, 2010, at 9:31 AM, Mark A. Jensen wrote: > I'm not that familiar with Bio::Species either, but this looks > like conflicting semantics between Bio::Species and Bio::SeqIO. > Bio::SeqIO sets the species accessor to the 'species' element of > the lineage array, I believe. > FWIW, I'd prefer "binomial" = "genus" . "species" > MAJ > ----- Original Message ----- From: "Dave Messina" > To: "BioPerl List" > Sent: Friday, January 15, 2010 10:17 AM > Subject: [Bioperl-l] getting/setting species names with Bio::Species > > >> Hi everybody, >> >> I'm having a little trouble with names in Bio::Species objects. >> >> According to the Bio::Species documentation, if I have a species name as a string, like "Homo sapiens", I can get and set that using the species method: >> >> my $my_species_obj = Bio::Species->new(); >> $my_species_obj->species('Homo sapiens'); >> >> print $my_species_obj->species; # 'Homo sapiens' >> >> >> That works fine if I create the Bio::Species object myself.
>> >> But if I try to get that string back out from a BIo::Species object created by SeqIO from a genbank file, I get just 'sapiens' back: >> >> my $io = Bio::SeqIO->new('-format' => 'genbank', >> '-file' => 'hoxa2.gb'); >> my $seq_obj = $io->next_seq; >> my $io_species_obj = $seq_obj->species; >> >> print $io_species_obj->species; # 'sapiens' >> >> >> I think that happens because genbank records have more taxonomic info about the species name, like the genus (and in fact the whole taxonomic categorization: kingdom phylum order, etc). So the genus is stored separately. >> >> Poking around a bit more in Bio::Species, I turned up the method 'binomial', which appears to do the right thing, returning genus and species in both cases. Except, as you can see, the space is stripped out for my species-name-is-just-a-string object: >> >> print $my_species_obj->binomial; # 'Homosapiens' >> print $io_species_obj->binomial; # 'Homo sapiens' >> >> >> I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I using it correctly above, or is there a better way? >> >> If not, this kinda looks like a bug to me. I've got a patch which works and passes the BioPerl test suite. 
>> >> >> Thanks, >> Dave >> >> >> >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From SMarkel at accelrys.com Fri Jan 15 11:10:34 2010 From: SMarkel at accelrys.com (Scott Markel) Date: Fri, 15 Jan 2010 08:10:34 -0800 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes In-Reply-To: References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net> Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B30A7@EXCH1-COLO.accelrys.net> Mark, Thank you. Scott -----Original Message----- From: Mark A. Jensen [mailto:maj at fortinbras.us] Sent: Friday, 15 January 2010 8:10 AM To: Scott Markel; Chris Fields Cc: Bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes can do Scott-- cheers MAJ ----- Original Message ----- From: "Scott Markel" To: "Chris Fields" Cc: Sent: Friday, January 15, 2010 10:40 AM Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes > Chris, > > It was nice meeting you and Scott C., too. And seeing Jason again. > > If you and Mark > >> How about returning 1, 2, 4 for the non-zero cases, with some >> error constants set for convenience? MAJ > > are okay with adding more return values, that works best for us in > Pipeline Pilot. > > I'll add a Bugzilla entry. > > Scott > > > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Thursday, 14 January 2010 10:34 PM > To: Scott Markel > Cc: Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes > > Scott, > > I think this is fine (to change the third condition and retry with a specific > code). 
The other possibility is to simply throw different exceptions under > each of these circumstances, which can be caught via eval to allow a retry > under only certain conditions (no content, for instance). > > One interesting bit: I think (though I'm not sure) the new BLAST+ allows > remote BLAST queries from command line, similar to the legacy blastcl3. Mark > just wrote up a BLAST+ wrapper, so it might be worth testing that theory out. > > chris > > PS - BTW, nice to finally meet you at GMOD! > > On Jan 14, 2010, at 4:58 PM, Scott Markel wrote: > >> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback >> from our customers. Due to network irregularities (not sure what else >> to call it) users see the getting of remote BLAST results as somewhat >> random. When results come back the hits are fine, but sometimes no >> information comes back at all. Retrying helps. >> >> In looking at RemoteBlast.pm there are four "return -1" cases. >> >> * $status eq 'ERROR' (return on line 614) >> * $line =~ /ERROR/I (return on line 628) >> * !$got_content (return on line 648) >> * !$response->is_success (return on line 655) >> >> In the case of no content we'd like to retry remote BLAST. We're happy >> to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl >> module, but we only want to retry in that case, not the other three. >> >> What would happen if that third "return -1" changed to a different >> return value? >> >> Scott >> >> Scott Markel, Ph.D. 
>> Principal Bioinformatics Architect email: smarkel at accelrys.com >> Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 >> 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 >> San Diego, CA 92121 fax: +1 858 799 5222 >> USA web: http://www.accelrys.com >> >> http://www.linkedin.com/in/smarkel >> Vice President, Board of Directors: >> International Society for Computational Biology >> Chair: ISCB Publications Committee >> Associate Editor: PLoS Computational Biology >> Editorial Board: Briefings in Bioinformatics >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Fri Jan 15 11:09:38 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 15 Jan 2010 11:09:38 -0500 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net> References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net> Message-ID: can do Scott-- cheers MAJ ----- Original Message ----- From: "Scott Markel" To: "Chris Fields" Cc: Sent: Friday, January 15, 2010 10:40 AM Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes > Chris, > > It was nice meeting you and Scott C., too. And seeing Jason again. > > If you and Mark > >> How about returning 1, 2, 4 for the non-zero cases, with some >> error constants set for convenience? MAJ > > are okay with adding more return values, that works best for us in > Pipeline Pilot. > > I'll add a Bugzilla entry. 
> > Scott > > > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Thursday, 14 January 2010 10:34 PM > To: Scott Markel > Cc: Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes > > Scott, > > I think this is fine (to change the third condition and retry with a specific > code). The other possibility is to simply throw different exceptions under > each of these circumstances, which can be caught via eval to allow a retry > under only certain conditions (no content, for instance). > > One interesting bit: I think (though I'm not sure) the new BLAST+ allows > remote BLAST queries from command line, similar to the legacy blastcl3. Mark > just wrote up a BLAST+ wrapper, so it might be worth testing that theory out. > > chris > > PS - BTW, nice to finally meet you at GMOD! > > On Jan 14, 2010, at 4:58 PM, Scott Markel wrote: > >> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback >> from our customers. Due to network irregularities (not sure what else >> to call it) users see the getting of remote BLAST results as somewhat >> random. When results come back the hits are fine, but sometimes no >> information comes back at all. Retrying helps. >> >> In looking at RemoteBlast.pm there are four "return -1" cases. >> >> * $status eq 'ERROR' (return on line 614) >> * $line =~ /ERROR/I (return on line 628) >> * !$got_content (return on line 648) >> * !$response->is_success (return on line 655) >> >> In the case of no content we'd like to retry remote BLAST. We're happy >> to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl >> module, but we only want to retry in that case, not the other three. >> >> What would happen if that third "return -1" changed to a different >> return value? >> >> Scott >> >> Scott Markel, Ph.D. 
>> Principal Bioinformatics Architect email: smarkel at accelrys.com >> Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 >> 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 >> San Diego, CA 92121 fax: +1 858 799 5222 >> USA web: http://www.accelrys.com >> >> http://www.linkedin.com/in/smarkel >> Vice President, Board of Directors: >> International Society for Computational Biology >> Chair: ISCB Publications Committee >> Associate Editor: PLoS Computational Biology >> Editorial Board: Briefings in Bioinformatics >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Fri Jan 15 11:10:02 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 15 Jan 2010 11:10:02 -0500 Subject: [Bioperl-l] getting/setting species names with Bio::Species In-Reply-To: <16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu> References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> <16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu> Message-ID: excellent summary--thanks!! ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "BioPerl List" Sent: Friday, January 15, 2010 11:00 AM Subject: Re: [Bioperl-l] getting/setting species names with Bio::Species >> FWIW, I'd prefer "binomial" = "genus" . "species" > > > That's the way Bio::Species is supposed to work, at least when it was > refactored by Sendu. But just a note: Bio::Species was considered deprecated > (scheduled for the 1.7 release IIRC) for many very good reasons in favor of > Bio::Taxon. 
First and foremost among these is the fact we cannot consistently > parse out the genus/species/strain/variant/etc for every organism in GenBank > w/o knowing it's full lineage, which means including some taxonomic > information. And even then it's highly problematic. > > We've had several heated discussions on list about how to handle this in a > somewhat backwards-compatible way, and the main solution was to forego > compatibility issues altogether and eventually deprecate Bio::Species > altogether in favor of Bio::Taxon, a class that doesn't make the same > assumptions. Bio::Species, in the interim, is-a Bio::Taxon. You'll note that > a minimal Bio::DB::Taxonomy instance is constructed from the classification > scheme in some instances, but if one had a proper DB link one could link to > Entrez Taxonomy or a local flat file indexes DB and grab the info. Bio::Taxon > (correct me if I'm wrong on this Sendu, if you're out there) eschews various > methods (species, etc) for simpler consistent ones based on Taxonomy, and > doesn't force us to handle every exception to getting the genus/species out of > a name. That is left up to the user, at their peril. > > For either one, if you are reproducing the fully qualified name, you probably > should use something like node_name() for consistency. Bio::Species also has > scientific_name(). With a true Bio::Taxon one would need to be check this is > performed on the species node. > > chris > > On Jan 15, 2010, at 9:31 AM, Mark A. Jensen wrote: > >> I'm not that familiar with Bio::Species either, but this looks >> like conflicting semantics betwen Bio::Species and Bio::SeqIO. >> Bio::SeqIO sets the species accessor to the 'species' element of >> the lineage array, I believe. >> FWIW, I'd prefer "binomial" = "genus" . 
"species" >> MAJ >> ----- Original Message ----- From: "Dave Messina" >> To: "BioPerl List" >> Sent: Friday, January 15, 2010 10:17 AM >> Subject: [Bioperl-l] getting/setting species names with Bio::Species >> >> >>> Hi everybody, >>> >>> I'm having a little trouble with names in Bio::Species objects. >>> >>> According to the Bio::Species documentation, if I have a species name as a >>> string, like "Homo sapiens", I can get and set that using the species >>> method: >>> >>> my $my_species_obj = Bio::Species->new(); >>> $my_species_obj->species('Homo sapiens'); >>> >>> print $my_species_obj->species; # 'Homo sapiens' >>> >>> >>> That works fine if I create the Bio::Species object myself. >>> >>> But if I try to get that string back out from a BIo::Species object created >>> by SeqIO from a genbank file, I get just 'sapiens' back: >>> >>> my $io = Bio::SeqIO->new('-format' => 'genbank', >>> '-file' => 'hoxa2.gb'); >>> my $seq_obj = $io->next_seq; >>> my $io_species_obj = $seq_obj->species; >>> >>> print $io_species_obj->species; # 'sapiens' >>> >>> >>> I think that happens because genbank records have more taxonomic info about >>> the species name, like the genus (and in fact the whole taxonomic >>> categorization: kingdom phylum order, etc). So the genus is stored >>> separately. >>> >>> Poking around a bit more in Bio::Species, I turned up the method 'binomial', >>> which appears to do the right thing, returning genus and species in both >>> cases. Except, as you can see, the space is stripped out for my >>> species-name-is-just-a-string object: >>> >>> print $my_species_obj->binomial; # 'Homosapiens' >>> print $io_species_obj->binomial; # 'Homo sapiens' >>> >>> >>> I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I >>> using it correctly above, or is there a better way? >>> >>> If not, this kinda looks like a bug to me. I've got a patch which works and >>> passes the BioPerl test suite. 
>>> >>> >>> Thanks, >>> Dave >>> >>> >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at drycafe.net Fri Jan 15 12:04:43 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Fri, 15 Jan 2010 12:04:43 -0500 Subject: [Bioperl-l] getting/setting species names with Bio::Species In-Reply-To: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> Message-ID: <2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net> On Jan 15, 2010, at 10:17 AM, Dave Messina wrote: > According to the Bio::Species documentation, if I have a species > name as a string, like "Homo sapiens", I can get and set that using > the species method: > > my $my_species_obj = Bio::Species->new(); > $my_species_obj->species('Homo sapiens'); If that's really what the documentation says, it's wrong. It is the binomial() method that does this (as getter and setter). 
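The semantics discussed in this thread (genus and species epithet stored separately, with the binomial being the two joined by a space) can be sketched in plain Perl. This is only an illustration under stated assumptions: `make_binomial` and `split_binomial` are hypothetical helpers, not BioPerl methods, and they say nothing about how Bio::Species or Bio::Taxon implement binomial() internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): join a separately stored
# genus and species epithet into a binomial, skipping missing parts.
sub make_binomial {
    my ($genus, $epithet) = @_;
    return join ' ', grep { defined($_) && length($_) } ($genus, $epithet);
}

# Hypothetical helper (not part of BioPerl): split a binomial string on
# whitespace into genus and the remainder (the species epithet).
sub split_binomial {
    my ($binomial) = @_;
    my ($genus, $epithet) = split ' ', $binomial, 2;
    return ($genus, $epithet);
}

print make_binomial('Homo', 'sapiens'), "\n";

my ($genus, $epithet) = split_binomial('Homo sapiens');
print "genus=$genus epithet=$epithet\n";
```

Keeping the two parts separate and joining them only on output avoids the 'Homosapiens' versus 'Homo sapiens' discrepancy reported earlier in the thread, at least for names that really are simple binomials.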
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From David.Messina at sbc.su.se Fri Jan 15 13:37:17 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 15 Jan 2010 19:37:17 +0100 Subject: [Bioperl-l] getting/setting species names with Bio::Species In-Reply-To: <2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net> References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> <2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net> Message-ID: <24798E45-CF24-47D9-AB39-E66C35A5FA8B@sbc.su.se> Thanks guys. Well, looks like I ignored the deprecation warnings at my own peril. :) I'll reimplement my code using Bio::Taxon directly instead. I made a little test using the node_name() method as Chris suggested, and it seems to do the trick nicely. > If that's really what the documentation says, it's wrong. I'm afraid so. In the POD > Title : species > Usage : $self->species( $species ); > $species = $self->species(); > Function: Get or set the scientific species name. > Example : $self->species('Homo sapiens'); > Returns : Scientific species name as string > Args : Scientific species name as string and the HOWTO http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#The_Species_Object > # legible and long > my $species_object = $seq_object->species; > my $species_string = $species_object->species; > > # Perlish > my $species_string = $seq_object->species->species; > # either way, $species_string is "Homo sapiens" Unless there's objection, I'll fix both of those. > It is the binomial() method that does this (as getter and setter). Great, thanks for the clarification, Hilmar. From bhakti.dwivedi at gmail.com Sun Jan 17 11:02:47 2010 From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi) Date: Sun, 17 Jan 2010 11:02:47 -0500 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? 
Message-ID: Hi Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1 && hit1 -> query1) from a blast table report? Thanks BD From cjfields at illinois.edu Sun Jan 17 12:45:08 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 17 Jan 2010 11:45:08 -0600 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: References: Message-ID: <4FC546A8-079F-4A17-AB96-D4A0060904D6@illinois.edu> It's probably not best to use BioPerl directly for this. Have you tried OrthoMCL, or InParanoid? chris On Jan 17, 2010, at 10:02 AM, Bhakti Dwivedi wrote: > Hi > > Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1 > && hit1 -> query1) from a blast table report? > > Thanks > > BD > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Sun Jan 17 16:03:24 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 17 Jan 2010 16:03:24 -0500 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: References: Message-ID: re Chris's answer, check out this archived post: http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html cheers MAJ ----- Original Message ----- From: "Bhakti Dwivedi" To: Sent: Sunday, January 17, 2010 11:02 AM Subject: [Bioperl-l] Reciprocal best hits using Bioperl? > Hi > > Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1 > && hit1 -> query1) from a blast table report? > > Thanks > > BD > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bhakti.dwivedi at gmail.com Sun Jan 17 16:10:03 2010 From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi) Date: Sun, 17 Jan 2010 16:10:03 -0500 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: References: Message-ID: Thank you! 
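For the narrow question asked in this thread (query1 -> hit1 && hit1 -> query1 from a BLAST table report), a minimal plain-Perl sketch follows. It is not a BioPerl module and does none of the paralog handling that OrthoMCL or InParanoid provide. It assumes 12-column tab-delimited BLAST output with the bit score in the last column, one report for each search direction, and that the first hit seen wins on ties; the sub names and sample IDs are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map each query ID to its best-scoring subject ID.
# Assumption: 12 tab-separated columns, bit score in column 12.
sub best_hits {
    my (@lines) = @_;
    my (%best, %score);
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;    # skip comment and blank lines
        my @f = split /\t/, $line;
        next unless @f >= 12;
        my ($query, $subject, $bits) = @f[0, 1, 11];
        if (!exists $score{$query} || $bits > $score{$query}) {
            $score{$query} = $bits;
            $best{$query}  = $subject;
        }
    }
    return %best;
}

# Return [query, hit] pairs where each is the other's best hit.
sub reciprocal_best_hits {
    my ($a_vs_b, $b_vs_a) = @_;
    my %ab = best_hits(@$a_vs_b);
    my %ba = best_hits(@$b_vs_a);
    return map  { [ $_, $ab{$_} ] }
           grep { exists $ba{ $ab{$_} } && $ba{ $ab{$_} } eq $_ }
           sort keys %ab;
}

# Made-up sample rows, for illustration only.
my @a_vs_b = (
    join("\t", qw(a1 b1 98.0 100 2 0 1 100 1 100 1e-50 180)),
    join("\t", qw(a1 b2 60.0 100 40 0 1 100 1 100 1e-10 90)),
    join("\t", qw(a2 b2 90.0 100 10 0 1 100 1 100 1e-40 150)),
);
my @b_vs_a = (
    join("\t", qw(b1 a1 98.0 100 2 0 1 100 1 100 1e-50 180)),
    join("\t", qw(b2 a1 60.0 100 40 0 1 100 1 100 1e-10 95)),
    join("\t", qw(b2 a2 55.0 100 45 0 1 100 1 100 1e-08 80)),
);

print join("\t", @$_), "\n" for reciprocal_best_hits(\@a_vs_b, \@b_vs_a);
```

In the sample data only a1 and b1 are each other's best hit; a2 picks b2, but b2 prefers a1, so that pair is not reported.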
On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen wrote: > re Chris's answer, check out this archived post: > http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html > cheers MAJ > ----- Original Message ----- From: "Bhakti Dwivedi" < > bhakti.dwivedi at gmail.com> > To: > Sent: Sunday, January 17, 2010 11:02 AM > Subject: [Bioperl-l] Reciprocal best hits using Bioperl? > > > Hi >> >> Is there a Bio-perl module to parse the reciprocal best hits (query1-> >> hit1 >> && hit1 -> query1) from a blast table report? >> >> Thanks >> >> BD >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> From cjfields at illinois.edu Sun Jan 17 17:00:02 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 17 Jan 2010 16:00:02 -0600 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: References: Message-ID: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu> OrthoMCL has updated to v2 and no longer uses BioPerl, just plain perl. Database is available here: http://orthomcl.org/cgi-bin/OrthoMclWeb.cgi Package (you'll need a few other things to get it working): http://orthomcl.org/common/downloads/software/ chris On Jan 17, 2010, at 3:10 PM, Bhakti Dwivedi wrote: > Thank you! > > > On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen wrote: > >> re Chris's answer, check out this archived post: >> http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html >> cheers MAJ >> ----- Original Message ----- From: "Bhakti Dwivedi" < >> bhakti.dwivedi at gmail.com> >> To: >> Sent: Sunday, January 17, 2010 11:02 AM >> Subject: [Bioperl-l] Reciprocal best hits using Bioperl? >> >> >> Hi >>> >>> Is there a Bio-perl module to parse the reciprocal best hits (query1-> >>> hit1 >>> && hit1 -> query1) from a blast table report? 
>>> >>> Thanks >>> >>> BD >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From tristan.lefebure at gmail.com Sun Jan 17 18:12:56 2010 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Sun, 17 Jan 2010 18:12:56 -0500 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu> References: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu> Message-ID: <201001171812.56238.tristan.lefebure@gmail.com> The transition to orthoMCL v2 being a bit painful (you need a MySQL database), I recently switched directly to MCL and the accompanying mclblastline and co programs. Modular, simple and very fast. Following some simulations, It gives better results with incomplete genomes than orthoMCL v1.x ... http://micans.org/mcl/ --Tristan On Sunday 17 January 2010 17:00:02 Chris Fields wrote: > OrthoMCL has updated to v2 and no longer uses BioPerl, > just plain perl. Database is available here: > > http://orthomcl.org/cgi-bin/OrthoMclWeb.cgi > > Package (you'll need a few other things to get it > working): > > http://orthomcl.org/common/downloads/software/ > > chris > > On Jan 17, 2010, at 3:10 PM, Bhakti Dwivedi wrote: > > Thank you! > > > > On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen wrote: > >> re Chris's answer, check out this archived post: > >> http://bioperl.org/pipermail/bioperl-l/2008-March/0273 > >>57.html cheers MAJ > >> ----- Original Message ----- From: "Bhakti Dwivedi" < > >> bhakti.dwivedi at gmail.com> > >> To: > >> Sent: Sunday, January 17, 2010 11:02 AM > >> Subject: [Bioperl-l] Reciprocal best hits using > >> Bioperl? 
> >> > >> > >> Hi > >> > >>> Is there a Bio-perl module to parse the reciprocal > >>> best hits (query1-> hit1 > >>> && hit1 -> query1) from a blast table report? > >>> > >>> Thanks > >>> > >>> BD > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Sun Jan 17 18:59:05 2010 From: jason at bioperl.org (Jason Stajich) Date: Sun, 17 Jan 2010 15:59:05 -0800 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: <201001171812.56238.tristan.lefebure@gmail.com> References: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu> <201001171812.56238.tristan.lefebure@gmail.com> Message-ID: <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org> yes - but mcl alone is something slightly different in that it doesn't correct for inparalogs, but for incomplete genomes this is probably okay. orthomcl2 does correct the major memory hog problem and efficiencies in the parsing in the previous version by relying on the db for the indexing and looking of the reciprocal hits. -jason On Jan 17, 2010, at 3:12 PM, Tristan Lefebure wrote: > The transition to orthoMCL v2 being a bit painful (you need > a MySQL database), I recently switched directly to MCL and > the accompanying mclblastline and co programs. Modular, > simple and very fast. Following some simulations, It gives > better results with incomplete genomes than orthoMCL v1.x > ... 
> > http://micans.org/mcl/ > > --Tristan > > On Sunday 17 January 2010 17:00:02 Chris Fields wrote: >> OrthoMCL has updated to v2 and no longer uses BioPerl, >> just plain perl. Database is available here: >> >> http://orthomcl.org/cgi-bin/OrthoMclWeb.cgi >> >> Package (you'll need a few other things to get it >> working): >> >> http://orthomcl.org/common/downloads/software/ >> >> chris >> >> On Jan 17, 2010, at 3:10 PM, Bhakti Dwivedi wrote: >>> Thank you! >>> >>> On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen > wrote: >>>> re Chris's answer, check out this archived post: >>>> http://bioperl.org/pipermail/bioperl-l/2008-March/0273 >>>> 57.html cheers MAJ >>>> ----- Original Message ----- From: "Bhakti Dwivedi" < >>>> bhakti.dwivedi at gmail.com> >>>> To: >>>> Sent: Sunday, January 17, 2010 11:02 AM >>>> Subject: [Bioperl-l] Reciprocal best hits using >>>> Bioperl? >>>> >>>> >>>> Hi >>>> >>>>> Is there a Bio-perl module to parse the reciprocal >>>>> best hits (query1-> hit1 >>>>> && hit1 -> query1) from a blast table report? 
>>>>> >>>>> Thanks >>>>> >>>>> BD >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From tristan.lefebure at gmail.com Sun Jan 17 20:36:38 2010 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Sun, 17 Jan 2010 20:36:38 -0500 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org> References: <201001171812.56238.tristan.lefebure@gmail.com> <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org> Message-ID: <201001172036.39032.tristan.lefebure@gmail.com> On Sunday 17 January 2010 18:59:05 Jason Stajich wrote: > yes - but mcl alone is something slightly different in > that it doesn't correct for inparalogs, but for > incomplete genomes this is probably okay. Interestingly, my experience with not too divergent bacterial genomes (same genus) does not support the normalization used in orthoMCL (which, as far as I understand, is a standardization of the -Log10(evalue) per taxon combination, including a taxon with itself). MCL, which does not do any normalization (just -Log10(evalue)), gives about the same number of false negatives (i.e. missed orthologs), but far fewer false positives (false orthologs). In other words, you get many fake singletons.
I don't know exactly if the problem lies in the normalization process or in the fact that orthoMCLv1.x is using a very old version of MCL. What I do know is that many false positives are made of short or incomplete proteins that are very common in draft genomes and automatic annotations... Things might be completely different with more divergent and globally longer proteins. Testing orthoMCLv2 on the same data set would probably give the answer. --Tristan From robert.bradbury at gmail.com Mon Jan 18 05:20:33 2010 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Mon, 18 Jan 2010 05:20:33 -0500 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: <201001172036.39032.tristan.lefebure@gmail.com> References: <201001171812.56238.tristan.lefebure@gmail.com> <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org> <201001172036.39032.tristan.lefebure@gmail.com> Message-ID: My comment might be that the problem with OrthoMCL is that it covers primarily lower organisms. The problem with Ensembl (and some other databases) is that it covers primarily higher organisms (though they do include Drosophila, C. elegans and Yeast). The problem arises when one wants to cross those boundaries. For example the 5-10 antioxidant proteins, the ~150 DNA repair proteins, many of the mitochondrial (ETC) proteins, the ribosomal rRNAs & tRNAs, and the fundamental biochemistry (EC) proteins are homologous all the way from the most ancient bacteria through H. sapiens. The only way to play in the mixed arena of prokaryotes and eukaryotes involving fundamental vectors in evolution is to either construct one's own databases (which presumably means getting involved with MySQL, and probably spending some $$$ on hardware) or to develop some BioPerl modules that can do the SpeciesX vs. SpeciesY comparisons on demand using some part of the cloud.
This problem isn't going to get smaller its only going to get larger, now that the cost of sequencing (pseudo-resequencing) a vertebrate genome is starting to come in under $10,000 and people are starting to seriously talk about 10,000 vertebrate genomes. 10,000 x 10,000 x 20,000 (genes) isn't something people are going to undertake very soon. Robert On 1/17/10, Tristan Lefebure wrote: > On Sunday 17 January 2010 18:59:05 Jason Stajich wrote: >> yes - but mcl alone is something slightly different in >> that it doesn't correct for inparalogs, but for >> incomplete genomes this is probably okay. > > interestingly, my experience with not too divergent > bacterial genomes (same genera) does not support the > normalization used in the orthoMCL (which, as far as I > understand, is a standardization of the -Log10(evalue) per > taxa combination, including a taxa with itself). MCL, which > does not do any normalization (just -Log10(evalue)) gives > about the same number of false negative (i.e. missed > orthologs), but a lot less false positive (false orthologs). > In other words, you get many fake singletons. I don't known > exactly if the problem lies in the normalization process or > the fact that orthoMCLv1.x is using a very old version of > MCL. What I do known is that many false positive are made of > short or incomplete proteins that are very common in draft > genomes and automatic annotations... Things might be > completely different with more divergent and globally longer > proteins. Testing orthoMCLv2 on the same data set would > probably give the answer. 
> > --Tristan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ghhu at sibs.ac.cn Sun Jan 17 21:34:23 2010 From: ghhu at sibs.ac.cn (Guohong Hu) Date: Mon, 18 Jan 2010 10:34:23 +0800 Subject: [Bioperl-l] Bioperl 1.6 Message-ID: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> Hi there, I was trying to install BioPerl in windows using ppm, by following the instruction in "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up the repositories, and did the search of Bioperl packages. The latest version available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to install it, a number of prerequisite modules were being installed too, which include Bioperl 1.4. Then an error message showed up during installation: "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package BioPerl has already installed a file that package bioperl wants to install." It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 wanted to install again. I don't know why bioperl 1.4 was one of the prerequisites for 1.6.1. If I just install 1.4, it will be installed without errors. But I need a newer version, because some modules (like Bio::Tools::HMM) is not included in 1.4. I saw on internet that somebody had the same problem when he was trying to install BioPerl 1.5, but I didn't find the solution. Anybody has a clue on that? Thank you for your time. GH From cjfields at illinois.edu Mon Jan 18 10:30:20 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 09:30:20 -0600 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> Message-ID: Guohong, 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed first. 
Make sure the repos are set according to the Windows installation instructions on the BioPerl wiki: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows IIRC the actual order of the PPM repository can be critical (PPM pulls based on highest version, first repo, but sometimes it gets confused). Just curious but where is the v 1.4 PPM located? If it is local to our PPM repo I can physically remove it to prevent this from happening. chris On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: > Hi there, > > > > I was trying to install BioPerl in windows using ppm, by following the > instruction in > "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up > the repositories, and did the search of Bioperl packages. The latest version > available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to > install it, a number of prerequisite modules were being installed too, which > include Bioperl 1.4. Then an error message showed up during installation: > > > > "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package > BioPerl has already installed a file that package bioperl wants to install." > > > > It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 > wanted to install again. I don't know why bioperl 1.4 was one of the > prerequisites for 1.6.1. If I just install 1.4, it will be installed without > errors. But I need a newer version, because some modules (like > > Bio::Tools::HMM) is not included in 1.4. > > > > I saw on internet that somebody had the same problem when he was trying to > install BioPerl 1.5, but I didn't find the solution. > > > > Anybody has a clue on that? Thank you for your time. 
> > > > GH > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Jan 18 11:12:08 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 10:12:08 -0600 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: References: <201001171812.56238.tristan.lefebure@gmail.com> <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org> <201001172036.39032.tristan.lefebure@gmail.com> Message-ID: (my small rant on this) On Jan 18, 2010, at 4:20 AM, Robert Bradbury wrote: > My comment might be that the problem with OrthoMCL is that it is > primarily lower organisms. The problem with Ensembl (and some other > databases) is that it is primarily higher organisms (though they do > include Drosophila, C. elegans and Yeast). OrthoMCL v2 handles both lower and higher organisms; I've used it for both, with decent success. Most other ortholog tools do as well (if I'm not mistaken, Ensembl also uses MCL under the hood, unless that's changed). I don't believe one should be completely bound to one toolset, particularly in this case (there are lots of nice ortholog clustering tools using various means of comparison out there), but I do think OrthoMCL is very good as an initial pass. If anything, I would like a set of (possibly bioperl-based, definitely DB-based) modules that can deal with this information. The more imperative issue in my opinion is that one is a prisoner of the gene models for those specific organisms of interest, and these may vary widely depending on the source of those gene models (Ensembl, UCSC, NCBI, EBI, centralized MODs like FlyBase, etc). For instance, if gene models are poorly curated or rarely updated, the comparisons may be significantly flawed.
Some of these issues may also be (somewhat) alleviated once more transcriptome data is available that helps clear up gene model ambiguities, but that won't be true for all organisms, at least initially. Note this isn't meant as a slam on any specific DBs or MODs in general, the problem is one born of the fact that there isn't a single, centralized, trusted, consistently updated source for this data, specifically something that will handle moderated third-party annotation. That's a very difficult problem to solve effectively. Some of these very issues crept up at the GMOD conference, and there appears to be consensus that a real attempt is needed to address this. I don't know, maybe it's just unicorns and rainbows. Personally I do think the situation will improve, as there seems to be great demand for it, but it requires time, resources, manpower, money, cat herding, etc. > The problem arises when one wants to cross those boundaries. For > example the 5-10 antioxidant proteins, the ~150 DNA repair proteins, > many of the mitochondrial (ETC) proteins, the ribosomal rRNA's & > tRNAs, and the fundamental biochemistry (EC) proteins are homologous > all the way from the most ancient bacteria through H. sapiens. The > only way to play in the mixed arena of prokaryotes and eukaryotes > involving fundamental vectors in evolution is to either construct ones > own databases (which presumably means getting involved with MySQL, and > probably spending some $$$ on hardware) or to develop some BioPerl > modules that can do the SpeciesX vs. SpeciesY comparisons on demand > using some part of the cloud. This problem isn't going to get smaller > its only going to get larger, now that the cost of sequencing > (pseudo-resequencing) a vertebrate genome is starting to come in under > $10,000 and people are starting to seriously talk about 10,000 > vertebrate genomes. 10,000 x 10,000 x 20,000 (genes) isn't something > people are going to undertake very soon. 
> > Robert They're already undertaking it now using a broad range of organisms, in and out of the cloud. In most cases one can amend a prior recip. comparative analysis with new data fairly easily, if one takes care to do so early on (i.e. set up the BLAST databases with a specified defined size for comparative stats between separate analyses). OrthoMCL v2 describes a procedure to do this, and I believe others have similar methodology. I could also see possible ways one can further optimize this, for instance in cases where two very closely-related organisms are compared, where translated seqs are 100% identical, etc. IIRC, the OrthoMCL DB site already has a way to upload custom sets of protein data for mapping to (already pre-run) clusters. Just the fact that the tools are available as OS, they're semi-automated, and can be generically applied to data of personal interest is a great boon. Not sure I see the downside of that, and I'm pretty confident the scalability issues will be addressed in some way. chris From maj at fortinbras.us Mon Jan 18 11:33:12 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 18 Jan 2010 11:33:12 -0500 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> Message-ID: <6093E45F17B543438AC02E6C626439E1@NewLife> this issue's come up before, see this thread http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031400.html MAJ ----- Original Message ----- From: "Chris Fields" To: "Guohong Hu" Cc: Sent: Monday, January 18, 2010 10:30 AM Subject: Re: [Bioperl-l] Bioperl 1.6 > Guohong, > > 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed > first. Make sure the repos are set according to the Windows installation > instructions on the BioPerl wiki: > > http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows > > IIRC the actual order of the PPM repository can be critical (PPM pulls based > on highest version, first repo, but sometimes it gets confused). 
Just curious > but where is the v 1.4 PPM located? If it is local to our PPM repo I can > physically remove it to prevent this from happening. > > chris > > On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: > >> Hi there, >> >> >> >> I was trying to install BioPerl in windows using ppm, by following the >> instruction in >> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up >> the repositories, and did the search of Bioperl packages. The latest version >> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to >> install it, a number of prerequisite modules were being installed too, which >> include Bioperl 1.4. Then an error message showed up during installation: >> >> >> >> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package >> BioPerl has already installed a file that package bioperl wants to install." >> >> >> >> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 >> wanted to install again. I don't know why bioperl 1.4 was one of the >> prerequisites for 1.6.1. If I just install 1.4, it will be installed without >> errors. But I need a newer version, because some modules (like >> >> Bio::Tools::HMM) is not included in 1.4. >> >> >> >> I saw on internet that somebody had the same problem when he was trying to >> install BioPerl 1.5, but I didn't find the solution. >> >> >> >> Anybody has a clue on that? Thank you for your time. 
>> >> >> >> GH >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Jan 18 12:18:34 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 11:18:34 -0600 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: <6093E45F17B543438AC02E6C626439E1@NewLife> References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> <6093E45F17B543438AC02E6C626439E1@NewLife> Message-ID: Mark, Odd issue, maybe it's a dependency like Bio::ASN1::EntrezGene that's causing this? Regardless, it's problematic for me to test this out directly, at least for the next few days. Maybe someone could try it? Also, there is the Strawberry Perl alternative, which uses CPAN (I think ActiveState also supports this). chris On Jan 18, 2010, at 10:33 AM, Mark A. Jensen wrote: > this issue's come up before, see this thread > http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031400.html > MAJ > ----- Original Message ----- From: "Chris Fields" > To: "Guohong Hu" > Cc: > Sent: Monday, January 18, 2010 10:30 AM > Subject: Re: [Bioperl-l] Bioperl 1.6 > > >> Guohong, >> >> 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed first. Make sure the repos are set according to the Windows installation instructions on the BioPerl wiki: >> >> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows >> >> IIRC the actual order of the PPM repository can be critical (PPM pulls based on highest version, first repo, but sometimes it gets confused). Just curious but where is the v 1.4 PPM located? If it is local to our PPM repo I can physically remove it to prevent this from happening. 
>> >> chris >> >> On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: >> >>> Hi there, >>> >>> >>> >>> I was trying to install BioPerl in windows using ppm, by following the >>> instruction in >>> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up >>> the repositories, and did the search of Bioperl packages. The latest version >>> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to >>> install it, a number of prerequisite modules were being installed too, which >>> include Bioperl 1.4. Then an error message showed up during installation: >>> >>> >>> >>> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package >>> BioPerl has already installed a file that package bioperl wants to install." >>> >>> >>> >>> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 >>> wanted to install again. I don't know why bioperl 1.4 was one of the >>> prerequisites for 1.6.1. If I just install 1.4, it will be installed without >>> errors. But I need a newer version, because some modules (like >>> >>> Bio::Tools::HMM) is not included in 1.4. >>> >>> >>> >>> I saw on internet that somebody had the same problem when he was trying to >>> install BioPerl 1.5, but I didn't find the solution. >>> >>> >>> >>> Anybody has a clue on that? Thank you for your time. >>> >>> >>> >>> GH >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From clarsen at vecna.com Mon Jan 18 12:42:13 2010 From: clarsen at vecna.com (Chris Larsen) Date: Mon, 18 Jan 2010 12:42:13 -0500 Subject: [Bioperl-l] Reciprocal best blast hits using BioPerl? 
In-Reply-To: References: Message-ID: Bhakti, (and Chris, Mark)-- Yes, there is some perl available to parse reciprocal best blast hits. Mark's referenced / archived post was mine; we were looking to do what you wanted. Here we proceed with the thread. We ended up implementing OrthoMCL 1.4 as Chris F pointed to, and then made a simple perl parser that would take the raw OrthoMCL output, do splits, and spit out a delimited table of all the orthologs in a group, for say Mycobacterium Genus, so you could stuff it into DBLoader. The link to the script, SOP, and method is at: http://www.biohealthbase.org/brcDocs/documents/BHB_ORTHOLOG_SOP.pdf Giving e.g.:
Francisella 1 110321310
Francisella 1 110321361
Francisella 1 56707275
Francisella 1 56707366
Francisella 1 56707462
Five members of Ortholog Group 1, with just their gi number. And you can see the results of that parsing, supported by a database, being used to load BioHealthbase with all the reciprocal best blast hits plus other OrthoMCL parsing, for mycobacterial PolA at: http://www.biohealthbase.org/brc/details.do?locus=MAV_3155&decorator=mycobacterium See? Pretty? We were just interested in making ortholog groups on the basis of paralog-conscious reciprocal blast stuff. Like you. This package and doc I've made does what you want I think, as long as you stay in prokaryotes. But--careful...garbage in, garbage out. We started with clean Genuses. (. o O Genii?). You'll get more junky HUGE and TINY ortholog groups if you put in different Orders of microbes. It's taxa-sensitive. OrthoMCL author David Roos is great at it though and designed it with higher unicellular euks in mind too...comb the docs for that; sorry, I was doing bacterial work at the time and can't guide you if that's what you want. If you end up installing OrthoMCL 1.4, you can pipe the output to this method and get out usable stuff. Hope it works for you. Cheers, Chris L -- Christopher Larsen, Ph.D. Sr.
Scientist / Grants Manager Vecna Technologies 6404 Ivy Lane #500 Greenbelt, MD 20770 Phone: (240) 965-4525 Fax: (240) 547-6133 240-737-4525 From maj at fortinbras.us Mon Jan 18 14:37:43 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 18 Jan 2010 14:37:43 -0500 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> <6093E45F17B543438AC02E6C626439E1@NewLife> Message-ID: <61F331117B7C4E2282684FA240B9710F@NewLife> I will play around with it-- in the meantime, Guohong, please look at the following http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation where there is a workaround for this issue, using the ppm-shell-- cheers, Mark ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "Guohong Hu" ; Sent: Monday, January 18, 2010 12:18 PM Subject: Re: [Bioperl-l] Bioperl 1.6 Mark, Odd issue, maybe it's a dependency like Bio::ASN1::EntrezGene that's causing this? Regardless, it's problematic for me to test this out directly, at least for the next few days. Maybe someone could try it? Also, there is the Strawberry Perl alternative, which uses CPAN (I think ActiveState also supports this). chris On Jan 18, 2010, at 10:33 AM, Mark A. Jensen wrote: > this issue's come up before, see this thread > http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031400.html > MAJ > ----- Original Message ----- From: "Chris Fields" > To: "Guohong Hu" > Cc: > Sent: Monday, January 18, 2010 10:30 AM > Subject: Re: [Bioperl-l] Bioperl 1.6 > > >> Guohong, >> >> 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed >> first. Make sure the repos are set according to the Windows installation >> instructions on the BioPerl wiki: >> >> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows >> >> IIRC the actual order of the PPM repository can be critical (PPM pulls based >> on highest version, first repo, but sometimes it gets confused). 
Just >> curious but where is the v 1.4 PPM located? If it is local to our PPM repo I >> can physically remove it to prevent this from happening. >> >> chris >> >> On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: >> >>> Hi there, >>> >>> >>> >>> I was trying to install BioPerl in windows using ppm, by following the >>> instruction in >>> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up >>> the repositories, and did the search of Bioperl packages. The latest version >>> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to >>> install it, a number of prerequisite modules were being installed too, which >>> include Bioperl 1.4. Then an error message showed up during installation: >>> >>> >>> >>> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package >>> BioPerl has already installed a file that package bioperl wants to install." >>> >>> >>> >>> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 >>> wanted to install again. I don't know why bioperl 1.4 was one of the >>> prerequisites for 1.6.1. If I just install 1.4, it will be installed without >>> errors. But I need a newer version, because some modules (like >>> >>> Bio::Tools::HMM) is not included in 1.4. >>> >>> >>> >>> I saw on internet that somebody had the same problem when he was trying to >>> install BioPerl 1.5, but I didn't find the solution. >>> >>> >>> >>> Anybody has a clue on that? Thank you for your time. 
>>> >>> >>> >>> GH >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Mon Jan 18 15:24:33 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 18 Jan 2010 12:24:33 -0800 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: References: <201001171812.56238.tristan.lefebure@gmail.com> <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org> <201001172036.39032.tristan.lefebure@gmail.com> Message-ID: <68DF70A5-63A6-428D-A7F1-7B3D01528375@bioperl.org> On Jan 18, 2010, at 8:12 AM, Chris Fields wrote: > (my small rant on this) > > On Jan 18, 2010, at 4:20 AM, Robert Bradbury wrote: > >> My comment might be that the problem with OrthoMCL is that it is >> primarily lower organisms. The problem with Ensembl (and some other >> databases) is that it is primarliy higher organisms (though they do >> include Drosophila, C. elegans and Yeast). > > OrthoMCL v2 handles both lower and higher organism; I've used it for > both, with decent success. Most other ortholog tools do as well (if > I'm not mistaken, ensembl also uses MCL under the hood, unless > that's changed). I don't believe one should be completely bound to > one toolset, particularly in this case (there are lots of nice > ortholog clustering tools using various moeans of comparison out > there), but I do think OrthoMCL is very good as an initial pass. If > anything, I would like a set of (possibly bioperl-based, definitely > DB-based) modules that can deal with this information. 
> > The more imperative issue in my opinion is that one is prisoner to > the gene models for those specific organisms of interest, and this > may vary widely depending on the source of those gene models > (Ensembl, UCSC, NCBI, EBI, centralized MODs like FlyBase, etc). For > instance, if gene models are poorly curated or rarely updated, the > comparisons may be significantly flawed. Some of these issues may > also be (somewhat) alleviated once more transcriptome data is > available that helps clear up gene model ambiguities, but that won't > be true for all organisms, at least initially. > > Note this isn't meant as a slam on any specific DBs or MODs in > general, the problem is one born of the fact that there isn't a > single, centralized, trusted, consistently updated source for this > data, specifically something that will handle moderated third-party > annotation. That's a very difficult problem to solve effectively. > Some of these very issues crept up at the GMOD conference, and there > appears to be consensus that a real attempt is needed to address this. > > I don't know, maybe it's just unicorns and rainbows. Personally I > do think the situation will improve, as there seems to be great > demand for it, but it requires time, resources, manpower, money, cat > herding, etc. > >> The problem arises when one wants to cross those boundaries. For >> example the 5-10 antioxidant proteins, the ~150 DNA repair proteins, >> many of the mitochondrial (ETC) proteins, the ribosomal rRNA's & >> tRNAs, and the fundamental biochemistry (EC) proteins are homologous >> all the way from the most ancient bacteria through H. sapiens. The >> only way to play in the mixed arena of prokaryotes and eukaryotes >> involving fundamental vectors in evolution is to either construct >> ones >> own databases (which presumably means getting involved with MySQL, >> and >> probably spending some $$$ on hardware) or to develop some BioPerl >> modules that can do the SpeciesX vs. 
SpeciesY comparisons on demand >> using some part of the cloud. This problem isn't going to get >> smaller >> its only going to get larger, now that the cost of sequencing >> (pseudo-resequencing) a vertebrate genome is starting to come in >> under >> $10,000 and people are starting to seriously talk about 10,000 >> vertebrate genomes. 10,000 x 10,000 x 20,000 (genes) isn't something >> people are going to undertake very soon. >> >> Robert > > They're already undertaking it now using a broad range of organisms, > in and out of the cloud. In most cases one can amend a prior recip. > comparative analysis with new data fairly easily, if one takes care > to do so early on (i.e. set up the BLAST databases with a specified > defined size for comparative stats between separate analyses). > OrthoMCL v2 describes a procedure to do this, and I believe others > have similar methodology. > > I could also see possible ways one can further optimize this, for > instance in cases where two very closely-related organisms are > compared, where translated seqs are 100% identical, etc. IIRC, the > OrthoMCL DB site already has a way to upload custom sets of protein > data for mapping to (already pre-run) clusters. Just the fact that > the tools are available as OS, they're semi-automated, and can be > generically applied to data of personal interest is a great boon. > Not sure I see the downside of that, and I'm pretty confident the > scalability issues will be addressed in some way. I think that the approach that Paul Thomas's group at SRI http://www.ai.sri.com/esb/ is doing is really what you'd want to focus on if you are only interested in a particular set of gene families rather than de novo clustering. That or the PhyloFacts approach http://phylogenomics.berkeley.edu/phylofacts/ . That is where HMMs are more appropriate, focusing on your initial seed set of families of proteins. HMMs for your families with some automated clustering initially to get better resolution. 
Once you start throwing multiple 10^6 proteins at it, the unsupervised clustering approach may not be able to give as accurate or timely results, but it can be a good initial filtering step depending on how much initial knowledge you are starting with. Using HMM models won't be as computationally expensive either if you are compute limited. TreeFam is also providing curated phylogenies of gene families http://www.treefam.org/ that span the opisthokonts in that a few fungi are sprinkled in. Also things like http://boinc.bio.wzw.tum.de/boincsimap/ provide ways to use distributed computing to calculate the matrix of similarities among proteins if you are interested in the exhaustive approach. -jason > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From jay at jays.net Mon Jan 18 18:36:20 2010 From: jay at jays.net (Jay Hannah) Date: Mon, 18 Jan 2010 17:36:20 -0600 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: References: Message-ID: <9AA13F94-3336-4CC1-89C4-249D0EB7C857@jays.net> On Jan 17, 2010, at 10:02 AM, Bhakti Dwivedi wrote: > Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1 > && hit1 -> query1) from a blast table report? If all the advice and resources in this thread have not dissuaded you from writing your own, you could glance at cross_blast() here as reference: https://clabsvn.ist.unomaha.edu/anonsvn/user/jhannah/UNO/seqlab/seqlab/tutorial.pod About the (abandoned) project: http://clab.ist.unomaha.edu/CLAB/index.php/SeqLab_%28Perl%29 I wrote that in 2006 for clustering a few hundred proteins based on custom criteria.
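For a sense of what a hand-rolled version involves, here is a minimal sketch of the reciprocal-best-hit idea. It is not the cross_blast() code referenced above. It assumes two tabular BLAST reports (12 tab-separated columns: query id first, subject id second, bit score last), one for A-vs-B and one for B-vs-A; the sub names best_hits and reciprocal_best_hits are made up for illustration, and ties are broken by keeping the first hit seen.

```perl
use strict;
use warnings;

# Keep the highest-scoring hit per query from tabular BLAST lines.
# Assumption: 12 tab-separated columns, query id first, subject id
# second, bit score in the last (12th) column.
sub best_hits {
    my @lines = @_;
    my (%best, %score);
    for my $line (@lines) {
        (my $copy = $line) =~ s/\r?\n\z//;            # strip line ending
        next if $copy =~ /^#/ or $copy !~ /\S/;       # skip comments, blanks
        my @f = split /\t/, $copy;
        next unless @f >= 12;                         # skip malformed rows
        my ($query, $subject, $bits) = @f[0, 1, 11];
        if (!exists $score{$query} or $bits > $score{$query}) {
            $score{$query} = $bits;
            $best{$query}  = $subject;
        }
    }
    return \%best;
}

# A pair (a, b) is reciprocal when a's best hit is b and b's best hit is a.
sub reciprocal_best_hits {
    my ($a_vs_b, $b_vs_a) = @_;
    my @pairs;
    for my $query (sort keys %$a_vs_b) {
        my $hit = $a_vs_b->{$query};
        push @pairs, [$query, $hit]
            if defined $b_vs_a->{$hit} and $b_vs_a->{$hit} eq $query;
    }
    return @pairs;
}
```

In use, each report would be read into a list of lines (one per hit), passed through best_hits(), and the two resulting hash references handed to reciprocal_best_hits(). Real data usually also wants e-value or coverage cutoffs before a hit counts as "best", which this sketch leaves out.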
Cheers, Jay Hannah http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From jay at jays.net Mon Jan 18 19:22:48 2010 From: jay at jays.net (Jay Hannah) Date: Mon, 18 Jan 2010 18:22:48 -0600 Subject: [Bioperl-l] Bio::BroodComb - RFC Message-ID: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net> I formalized a little framework so I could stop re-writing little programs that do some things people frequently ask me to do. http://github.com/jhannah/bio-broodcomb It stores everything in SQLite so users can write custom reports if they want to. It uses BioPerl and some shiny CPAN beads (DBIx::Class, Moose::Role). Tests included. The first two functions I stuck in the framework: Find subsequences (Bio::BroodComb::SubSeq): use Bio::BroodComb; my $bc = Bio::BroodComb->new(); $bc->load_large_seq(file => "large_seq.fasta"); $bc->load_small_seq(file => "small_seq.fasta"); $bc->find_subseqs(); print $bc->subseq_report1; In-silico PCR (Bio::BroodComb::PCR): use Bio::BroodComb; my $bc = Bio::BroodComb->new(); $bc->load_large_seq(file => "large_seq.fasta"); $bc->add_primerset( description => "U5/R", # however you want it reported forward_primer => 'GCGGGCAGCAATACTGCTTTGTAA', reverse_primer => 'ACCAGCGTTCAGCATATGGAGGAT', ); $bc->find_pcr_hits(); $bc->find_pcr_products(); print $bc->pcr_report1; I find this rather handy, so will probably be adding all my applicable future work to it instead of writing stand-alone programs. Not sure if it should be renamed for eventual CPAN / wherever. Suggestions, contributions welcome. 
:) http://github.com/jhannah/bio-broodcomb Jay Hannah http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From ocornejo at gmail.com Mon Jan 18 19:46:10 2010 From: ocornejo at gmail.com (Omar Cornejo) Date: Mon, 18 Jan 2010 16:46:10 -0800 (PST) Subject: [Bioperl-l] installing bioperl for mac Message-ID: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> Dear People, I have tried to install Bioperl on my new MacBook, which carries the latest perl distribution (5.10.0), and for some reason I can't (using fink) make it recognize this version of perl. I have tried: fink install bioperl-pm510 fink install bioperl-pm5100 but neither one works. Is it fine installing bioperl for perl v 5.9? thank you, Omar Cornejo From jason at bioperl.org Mon Jan 18 20:04:31 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 18 Jan 2010 17:04:31 -0800 Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with negative PDB ranges In-Reply-To: <4B5502D9.2010706@gmail.com> References: <49C2B97B.7070304@gmail.com> <49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com> <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org> <4B5502D9.2010706@gmail.com> Message-ID: Alexandr - Thanks for getting back to us - I am guessing the parser needs to recognize negative coordinates around line 370 in Bio/AlignIO/Handler/GenericAlignHandler.pm, which assumes a split on '-' will be sufficient. Can you post it as a bug to bugzilla, along with attaching a record and script that replicates the problem, so a test can be written for this? http://bugzilla.open-bio.org/ -jason On Jan 18, 2010, at 4:54 PM, Alexandr Bezginov wrote: > I have contacted Pfam, and I have been told that the PDB file actually > does include a reference to residue "-1": > > DBREF 1E5N A -1 347 UNP P14768 XYNA_PSEFL 264 611 > > DBREF 1E5N B -1 347 UNP P14768 XYNA_PSEFL 264 611 > > > Since negative numbers are allowed in PDB, the data should probably be > considered valid.
> > There are quite a few records like this, so this is not an isolated > issue. > > Alexandr > > On 1/14/2010 7:20 PM, Jason Stajich wrote: >> Seems like improper data really -- "-1" is an improper coordinate >> as far >> as the parser is concerned. You may want to tell Pfam that there is >> possible error in the dumper since that was the only record that had >> this problem? >> >> -jason >> On Jan 13, 2010, at 5:57 PM, albezg wrote: >> >>> Hi all, >>> >>> I have a problem using AlignIO to read Pfam database: >>> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz >>> The database is in STOCKHOLM 1.0 format. AlignIO can read the >>> alignment OK until the alignment PF00331.13. There it crashes with >>> the >>> following message: >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: '1-344' is not an integer. >>> >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368 >>> STACK: Bio::Range::end >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228 >>> STACK: Bio::Annotation::Target::new >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82 >>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ >>> GenericAlignHandler.pm:293 >>> >>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ >>> GenericAlignHandler.pm:73 >>> >>> STACK: Bio::AlignIO::stockholm::next_aln >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471 >>> STACK: /home/albezg/scripts/pfam2fasta.pl:22 >>> ----------------------------------------------------------- >>> >>> It appears this is caused by this entry: >>> #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344; >>> >>> I don't care about residues in PDB, so I have just removed minus >>> signs >>> from the ranges. 
This seems to have fixed the crashing. >>> >>> Is it a known problem? Is there a solution for it? >>> >>> Thanks, >>> Alexandr >>> >>> >>> On 03/20/2009 05:09 PM, albezg wrote: >>>> >>>> I'm trying to change FASTA header(display_id) for a sequence in an >>>> alignment(SimpleAlign). >>>> >>>> There are no issues when I print it, however when I use AlignIO >>>> to write >>>> the alignment to a FASTA file, it does not work. Is this behavior >>>> intended? >>>> >>>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug >>>> >>>> The error: >>>> ------------- EXCEPTION ------------- >>>> MSG: No sequence with name [1/1-11] >>>> STACK Bio::SimpleAlign::displayname >>>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659 >>>> STACK Bio::AlignIO::fasta::write_aln >>>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200 >>>> STACK toplevel ./demo.pl:14 >>>> ------------------------------------- >>>> >>>> Alexandr >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> http://fungalgenomes.org/ >> > -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From cjfields at illinois.edu Mon Jan 18 21:19:30 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 20:19:30 -0600 Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with negative PDB ranges In-Reply-To: References: <49C2B97B.7070304@gmail.com> <49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com> <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org> <4B5502D9.2010706@gmail.com> Message-ID: <46FD172A-69C0-436C-A005-AC38668C3347@illinois.edu> Alexandr, Posting the bug report would be great, should be an easy enough fix. 
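As a rough sketch of the kind of fix meant here (not the actual code in GenericAlignHandler.pm; parse_range is a made-up helper name): rather than splitting the range on '-', match two optionally signed integers on either side of the separator. A plain split on '-' with a limit of 2 would presumably turn "-1-344" into an empty start and "1-344" as the end, which is consistent with the "'1-344' is not an integer" message in the trace.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a range such as "1-344" or "-1-344" into
# (start, end), allowing a leading minus sign on either coordinate.
sub parse_range {
    my ($range) = @_;
    if ($range =~ /^\s*(-?\d+)\s*-\s*(-?\d+)\s*$/) {
        return ($1, $2);
    }
    return;    # empty list when the string is not a recognizable range
}

# Show how a few range strings are interpreted.
for my $r ('1-344', '-1-344', '-5--2') {
    my ($start, $end) = parse_range($r);
    print "$r => start=$start end=$end\n";
}
```

Whether the downstream Bio::Range / Bio::Annotation::Target code is happy with a negative start is a separate question that the test case attached to the bug report would have to settle.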
chris On Jan 18, 2010, at 7:04 PM, Jason Stajich wrote: > Alexandr - > > Thanks for getting back to us - I am guessing the parser needs to recognize negative coordinates around about line 370 in Bio/AlignIO/Handler/GenericAlignHandler.pm which assumes a split on '-' will be sufficient. > > Can you post it as a bug to bugzilla along with attaching a record and script that replicates the problem so a test can be written for this. http://bugzilla.open-bio.org/ > > -jason > On Jan 18, 2010, at 4:54 PM, Alexandr Bezginov wrote: > >> I have contacted Pfam, and I have been told that The PDB file actually >> does include a reference to residue "-1": >> >> DBREF 1E5N A -1 347 UNP P14768 XYNA_PSEFL 264 611 >> >> DBREF 1E5N B -1 347 UNP P14768 XYNA_PSEFL 264 611 >> >> >> Since negative numbers are allowed in PDB, the data should probably be >> considered valid. >> >> There are quite a few records like this, so this is not an isolated issue. >> >> Alexandr >> >> On 1/14/2010 7:20 PM, Jason Stajich wrote: >>> Seems like improper data really -- "-1" is an improper coordinate as far >>> as the parser is concerned. You may want to tell Pfam that there is >>> possible error in the dumper since that was the only record that had >>> this problem? >>> >>> -jason >>> On Jan 13, 2010, at 5:57 PM, albezg wrote: >>> >>>> Hi all, >>>> >>>> I have a problem using AlignIO to read Pfam database: >>>> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz >>>> The database is in STOCKHOLM 1.0 format. AlignIO can read the >>>> alignment OK until the alignment PF00331.13. There it crashes with the >>>> following message: >>>> >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>> MSG: '1-344' is not an integer. 
>>>> >>>> STACK: Error::throw >>>> STACK: Bio::Root::Root::throw >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368 >>>> STACK: Bio::Range::end >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228 >>>> STACK: Bio::Annotation::Target::new >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82 >>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293 >>>> >>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73 >>>> >>>> STACK: Bio::AlignIO::stockholm::next_aln >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471 >>>> STACK: /home/albezg/scripts/pfam2fasta.pl:22 >>>> ----------------------------------------------------------- >>>> >>>> It appears this is caused by this entry: >>>> #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344; >>>> >>>> I don't care about residues in PDB, so I have just removed minus signs >>>> from the ranges. This seems to have fixed the crashing. >>>> >>>> Is it a known problem? Is there a solution for it? >>>> >>>> Thanks, >>>> Alexandr >>>> >>>> >>>> On 03/20/2009 05:09 PM, albezg wrote: >>>>> >>>>> I'm trying to change FASTA header(display_id) for a sequence in an >>>>> alignment(SimpleAlign). >>>>> >>>>> There are no issues when I print it, however when I use AlignIO to write >>>>> the alignment to a FASTA file, it does not work. Is this behavior >>>>> intended? 
>>>>> >>>>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug >>>>> >>>>> The error: >>>>> ------------- EXCEPTION ------------- >>>>> MSG: No sequence with name [1/1-11] >>>>> STACK Bio::SimpleAlign::displayname >>>>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659 >>>>> STACK Bio::AlignIO::fasta::write_aln >>>>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200 >>>>> STACK toplevel ./demo.pl:14 >>>>> ------------------------------------- >>>>> >>>>> Alexandr >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> Jason Stajich >>> jason.stajich at gmail.com >>> jason at bioperl.org >>> http://fungalgenomes.org/ >>> >> > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Jan 18 21:20:31 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 20:20:31 -0600 Subject: [Bioperl-l] installing bioperl for mac In-Reply-To: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> Message-ID: <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote: > Dear People, > I have tried to install Bioperl in my new Mac Book, which carries > the latest perl distribution (5.10.0) and for some reason I can't > (using fink) make it recognize this version or perl. > I have tried: > fink install bioperl-pm510 > fink install bioperl-pm5100 > > but neither one works. Is it fine installing bioperl for perl v 5.9? > > thank you, > Omar Cornejo fink doesn't have a package for perl 5.10. 
You can install it using CPAN, however (it's pure perl), or use other UNIX-y options. See the UNIX installation instructions on the wiki: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix chris From dan.kortschak at adelaide.edu.au Mon Jan 18 21:47:47 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Tue, 19 Jan 2010 13:17:47 +1030 Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA Message-ID: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> Hi All, A wrapper and output parser for bowtie 'ultrafast, memory-efficient short read aligner' are now available in the bioperl-live and bioperl-run subversion repositories (bioperl-live/trunk at 16727 and bioperl-run/trunk at 16726). Bowtie details are available here: http://bowtie-bio.sourceforge.net/index.shtml The modules can return a Bio::Assembly::Scaffold object (operating via the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam uses large amounts of memory - the test suite works for me with >=2GB but not with 1GB due to this. (Is there a disk file system based tool for this for large projects?) Bowtie (>0.12.0) can align in colour space, but this is not currently supported by the wrapper though it should not be difficult to add. If someone can point me to a small set of colour space reads and a reference sequence I will be able to use these for testing. Thanks to the core devs for helping me with many of my problems in putting this together. Dan From maj at fortinbras.us Mon Jan 18 22:31:36 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 18 Jan 2010 22:31:36 -0500 Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: Excellent Dan! 
Thanks for all this work-- MAJ ----- Original Message ----- From: "Dan Kortschak" To: Sent: Monday, January 18, 2010 9:47 PM Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA > Hi All, > > A wrapper and output parser for bowtie 'ultrafast, memory-efficient > short read aligner' are now available in the bioperl-live and > bioperl-run subversion repositories (bioperl-live/trunk at 16727 and > bioperl-run/trunk at 16726). Bowtie details are available here: > > http://bowtie-bio.sourceforge.net/index.shtml > > The modules can return a Bio::Assembly::Scaffold object (operating via > the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk > which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam > uses large amounts of memory - the test suite works for me with >=2GB > but not with 1GB due to this. (Is there a disk file system based tool > for this for large projects?) > > Bowtie (>0.12.0) can align in colour space, but this is not currently > supported by the wrapper though it should not be difficult to add. If > someone can point me to a small set of colour space reads and a > reference sequence I will be able to use these for testing. > > Thanks to the core devs for helping me with many of my problems in > putting this together. 
> > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Jan 18 22:36:12 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 21:36:12 -0600 Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: On Jan 18, 2010, at 8:47 PM, Dan Kortschak wrote: > Hi All, > > A wrapper and output parser for bowtie 'ultrafast, memory-efficient > short read aligner' are now available in the bioperl-live and > bioperl-run subversion repositories (bioperl-live/trunk at 16727 and > bioperl-run/trunk at 16726). Bowtie details are available here: > > http://bowtie-bio.sourceforge.net/index.shtml > > The modules can return a Bio::Assembly::Scaffold object (operating via > the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk > which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam > uses large amounts of memory - the test suite works for me with >=2GB > but not with 1GB due to this. (Is there a disk file system based tool > for this for large projects?) > > Bowtie (>0.12.0) can align in colour space, but this is not currently > supported by the wrapper though it should not be difficult to add. If > someone can point me to a small set of colour space reads and a > reference sequence I will be able to use these for testing. > > Thanks to the core devs for helping me with many of my problems in > putting this together. > > Dan And (on behalf of the core devs) thank you for putting this together! 
chris From scott at scottcain.net Mon Jan 18 22:41:43 2010 From: scott at scottcain.net (Scott Cain) Date: Mon, 18 Jan 2010 22:41:43 -0500 Subject: [Bioperl-l] installing bioperl for mac In-Reply-To: <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu> References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu> Message-ID: <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com> But make sure you have the developers tools installed before the first time you run the cpan shell; it will make your life easier. Scott On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields wrote: > On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote: > >> Dear People, >> I have tried to install Bioperl in my new Mac Book, which carries >> the latest perl distribution (5.10.0) and for some reason I can't >> (using fink) make it recognize this version or perl. >> I have tried: >> fink install bioperl-pm510 >> fink install bioperl-pm5100 >> >> but neither one works. Is it fine installing bioperl for perl v 5.9? >> >> thank you, >> Omar Cornejo > > fink doesn't have a package for perl 5.10. You can install it using CPAN, however (it's pure perl), or use other UNIX-y options. See the UNIX installation instructions on the wiki: > > http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Mon Jan 18 23:04:57 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 22:04:57 -0600 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: <009801c8b957$2af4f8d0$80deea70$@ac.cn> References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> <009801c8b957$2af4f8d0$80deea70$@ac.cn> Message-ID: <79D53148-1FDA-4025-99A6-77A7F124E6BD@illinois.edu> Hmm, the trouchelle repo is the only one that had a working DB_File for perl 5.10 (not sure but I think 5.8.9 was fine). Probably worth contacting them about this to see if they can drop the (way out-of-date) 1.4 distribution. chris On May 18, 2008, at 9:22 PM, Guohong Hu wrote: > Thank for you all. The problem is solved. The bioperl 1.4 version is from > the Trouchelle repo, but 1.6 is in the Bioperl Regular Releases repo. When I > added all the repo according to the bioperl wiki instruction, somehow 1.4 > became a prerequisite for 1.6. But Chris's question reminded me, so I > removed Trouchelle repo, and the installation proceeded without errors. I > suggested we put a note in the wiki link since it looks like an odd issue > not just for me. > > Best, > Guohong > > > > _________________________________________ > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: January 18, 2010 23:30 > To: Guohong Hu > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bioperl 1.6 > > Guohong, > > 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed > first. Make sure the repos are set according to the Windows installation > instructions on the BioPerl wiki: > > http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows > > IIRC the actual order of the PPM repository can be critical (PPM pulls based > on highest version, first repo, but sometimes it gets confused). Just > curious but where is the v 1.4 PPM located?
If it is local to our PPM repo > I can physically remove it to prevent this from happening. > > chris > > On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: > >> Hi there, >> >> >> >> I was trying to install BioPerl in windows using ppm, by following the >> instruction in >> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up >> the repositories, and did the search of Bioperl packages. The latest > version >> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to >> install it, a number of prerequisite modules were being installed too, > which >> include Bioperl 1.4. Then an error message showed up during installation: >> >> >> >> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package >> BioPerl has already installed a file that package bioperl wants to > install." >> >> >> >> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 >> wanted to install again. I don't know why bioperl 1.4 was one of the >> prerequisites for 1.6.1. If I just install 1.4, it will be installed > without >> errors. But I need a newer version, because some modules (like >> >> Bio::Tools::HMM) is not included in 1.4. >> >> >> >> I saw on internet that somebody had the same problem when he was trying to >> install BioPerl 1.5, but I didn't find the solution. >> >> >> >> Anybody has a clue on that? Thank you for your time. 
>> >> >> >> GH >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From ocornejo at gmail.com Mon Jan 18 23:18:00 2010 From: ocornejo at gmail.com (Omar Eduardo Cornejo Ordaz) Date: Mon, 18 Jan 2010 23:18:00 -0500 Subject: [Bioperl-l] installing bioperl for mac In-Reply-To: <5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu> References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu> <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com> <5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu> Message-ID: I see. thank you Scott and Chris. I had already installed the latest version of the Xcode Developer Tools. I will go the cpan way then. have a nice one, Omar On Mon, Jan 18, 2010 at 10:58 PM, Chris Fields wrote: > Yes, definitely! > > -c > > On Jan 18, 2010, at 9:41 PM, Scott Cain wrote: > > > But make sure you have the developers tools installed before the first > > time you run the cpan shell; it will make your life easier. > > > > Scott > > > > > > On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields > wrote: > >> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote: > >> > >>> Dear People, > >>> I have tried to install Bioperl in my new Mac Book, which carries > >>> the latest perl distribution (5.10.0) and for some reason I can't > >>> (using fink) make it recognize this version or perl. > >>> I have tried: > >>> fink install bioperl-pm510 > >>> fink install bioperl-pm5100 > >>> > >>> but neither one works. Is it fine installing bioperl for perl v 5.9? > >>> > >>> thank you, > >>> Omar Cornejo > >> > >> fink doesn't have a package for perl 5.10. You can install it using > CPAN, however (it's pure perl), or use other UNIX-y options. 
See the UNIX > installation instructions on the wiki: > >> > >> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix > >> > >> chris > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. scott at scottcain > dot net > > GMOD Coordinator (http://gmod.org/) 216-392-3087 > > Ontario Institute for Cancer Research > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon Jan 18 22:58:36 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 21:58:36 -0600 Subject: [Bioperl-l] installing bioperl for mac In-Reply-To: <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com> References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu> <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com> Message-ID: <5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu> Yes, definitely! -c On Jan 18, 2010, at 9:41 PM, Scott Cain wrote: > But make sure you have the developers tools installed before the first > time you run the cpan shell; it will make your life easier. > > Scott > > > On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields wrote: >> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote: >> >>> Dear People, >>> I have tried to install Bioperl in my new Mac Book, which carries >>> the latest perl distribution (5.10.0) and for some reason I can't >>> (using fink) make it recognize this version or perl. >>> I have tried: >>> fink install bioperl-pm510 >>> fink install bioperl-pm5100 >>> >>> but neither one works. Is it fine installing bioperl for perl v 5.9? 
>>> >>> thank you, >>> Omar Cornejo >> >> fink doesn't have a package for perl 5.10. You can install it using CPAN, however (it's pure perl), or use other UNIX-y options. See the UNIX installation instructions on the wiki: >> >> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From albezg at gmail.com Mon Jan 18 19:54:49 2010 From: albezg at gmail.com (Alexandr Bezginov) Date: Mon, 18 Jan 2010 19:54:49 -0500 Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with negative PDB ranges In-Reply-To: <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org> References: <49C2B97B.7070304@gmail.com> <49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com> <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org> Message-ID: <4B5502D9.2010706@gmail.com> I have contacted Pfam, and I have been told that The PDB file actually does include a reference to residue "-1": DBREF 1E5N A -1 347 UNP P14768 XYNA_PSEFL 264 611 DBREF 1E5N B -1 347 UNP P14768 XYNA_PSEFL 264 611 Since negative numbers are allowed in PDB, the data should probably be considered valid. There are quite a few records like this, so this is not an isolated issue. Alexandr On 1/14/2010 7:20 PM, Jason Stajich wrote: > Seems like improper data really -- "-1" is an improper coordinate as far > as the parser is concerned. 
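[Editor's sketch] The crash discussed in this thread comes down to reading a range such as "-1-344", where the start residue is itself negative. The following is a minimal sketch in core Perl, not the code used by Bio::AlignIO or Bio::Range; the function name parse_pdb_range is hypothetical, and it assumes plain integer residue numbers (no PDB insertion codes).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a residue range such as "-1-344" into (start, end).
# Either end may carry a leading minus sign, so a plain split on '-'
# is ambiguous; instead, attach an optional sign to each integer.
sub parse_pdb_range {
    my ($range) = @_;
    if ($range =~ /^\s*(-?\d+)\s*-\s*(-?\d+)\s*$/) {
        return ($1, $2);
    }
    return;    # empty list: not a recognisable range
}

for my $range ('1-344', '-1-344', '-5--1', 'abc') {
    my @ends = parse_pdb_range($range);
    if (@ends) {
        print "$range => start=$ends[0] end=$ends[1]\n";
    }
    else {
        print "$range => not a range\n";
    }
}
```

The optional sign is bound to each number rather than treated as a separator, so "-1-344" yields start -1 and end 344 instead of an empty start and a trailing "1-344".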
You may want to tell Pfam that there is > possible error in the dumper since that was the only record that had > this problem? > > -jason > On Jan 13, 2010, at 5:57 PM, albezg wrote: > >> Hi all, >> >> I have a problem using AlignIO to read Pfam database: >> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz >> The database is in STOCKHOLM 1.0 format. AlignIO can read the >> alignment OK until the alignment PF00331.13. There it crashes with the >> following message: >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: '1-344' is not an integer. >> >> STACK: Error::throw >> STACK: Bio::Root::Root::throw >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368 >> STACK: Bio::Range::end >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228 >> STACK: Bio::Annotation::Target::new >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82 >> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293 >> >> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73 >> >> STACK: Bio::AlignIO::stockholm::next_aln >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471 >> STACK: /home/albezg/scripts/pfam2fasta.pl:22 >> ----------------------------------------------------------- >> >> It appears this is caused by this entry: >> #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344; >> >> I don't care about residues in PDB, so I have just removed minus signs >> from the ranges. This seems to have fixed the crashing. >> >> Is it a known problem? Is there a solution for it? >> >> Thanks, >> Alexandr >> >> >> On 03/20/2009 05:09 PM, albezg wrote: >>> >>> I'm trying to change FASTA header(display_id) for a sequence in an >>> alignment(SimpleAlign). 
>>> >>> There are no issues when I print it, however when I use AlignIO to write >>> the alignment to a FASTA file, it does not work. Is this behavior >>> intended? >>> >>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug >>> >>> The error: >>> ------------- EXCEPTION ------------- >>> MSG: No sequence with name [1/1-11] >>> STACK Bio::SimpleAlign::displayname >>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659 >>> STACK Bio::AlignIO::fasta::write_aln >>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200 >>> STACK toplevel ./demo.pl:14 >>> ------------------------------------- >>> >>> Alexandr >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > From ghhu at sibs.ac.cn Mon Jan 18 21:22:19 2010 From: ghhu at sibs.ac.cn (Guohong Hu) Date: Tue, 19 Jan 2010 02:22:19 -0000 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> Message-ID: <009801c8b957$2af4f8d0$80deea70$@ac.cn> Thank for you all. The problem is solved. The bioperl 1.4 version is from the Trouchelle repo, but 1.6 is in the Bioperl Regular Releases repo. When I added all the repo according to the bioperl wiki instruction, somehow 1.4 became a prerequisite for 1.6. But Chris's question reminded me, so I removed Trouchelle repo, and the installation proceeded without errors. I suggested we put a note in the wiki link since it looks like an odd issue not just for me. Best, Guohong _________________________________________ From: Chris Fields [mailto:cjfields at illinois.edu] Sent: January 18, 2010 23:30 To: Guohong Hu Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Bioperl 1.6 Guohong, 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed first.
Make sure the repos are set according to the Windows installation instructions on the BioPerl wiki: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows IIRC the actual order of the PPM repository can be critical (PPM pulls based on highest version, first repo, but sometimes it gets confused). Just curious but where is the v 1.4 PPM located? If it is local to our PPM repo I can physically remove it to prevent this from happening. chris On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: > Hi there, > > > > I was trying to install BioPerl in windows using ppm, by following the > instruction in > "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up > the repositories, and did the search of Bioperl packages. The latest version > available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to > install it, a number of prerequisite modules were being installed too, which > include Bioperl 1.4. Then an error message showed up during installation: > > > > "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package > BioPerl has already installed a file that package bioperl wants to install." > > > > It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 > wanted to install again. I don't know why bioperl 1.4 was one of the > prerequisites for 1.6.1. If I just install 1.4, it will be installed without > errors. But I need a newer version, because some modules (like > > Bio::Tools::HMM) is not included in 1.4. > > > > I saw on internet that somebody had the same problem when he was trying to > install BioPerl 1.5, but I didn't find the solution. > > > > Anybody has a clue on that? Thank you for your time. 
> > > > GH > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jw12 at sanger.ac.uk Tue Jan 19 05:41:12 2010 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Tue, 19 Jan 2010 10:41:12 +0000 Subject: [Bioperl-l] DAS Workshop Registrations now Open (workshop date 7-9 April 2010) Message-ID: <9EDF4E46-15F8-434E-B557-2DE5906C4182@sanger.ac.uk> If you don't know about DAS and wish to know how to distribute your latest biological annotation to the world then the upcoming DAS workshop maybe for you. If you know about DAS and are maybe a DAS client developer then the upcoming DAS workshop is for you (as you will need to know about the upcoming DAS 1.6 Specification and how it may affect your software). For information on the workshop and registration please go to: http://www.ebi.ac.uk/training/handson/DAS_070410.html Jonathan Warren Senior Developer and DAS coordinator jw12 at sanger.ac.uk -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From SMarkel at accelrys.com Tue Jan 19 13:00:22 2010 From: SMarkel at accelrys.com (Scott Markel) Date: Tue, 19 Jan 2010 10:00:22 -0800 Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net> Dan, Life Tech has sample data for E. coli at http://solidsoftwaretools.com/gf/project/ecoli2x50/ and http://solidsoftwaretools.com/gf/project/dh10bfrag/. Reference sequences are included. Scott Scott Markel, Ph.D. 
Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dan Kortschak Sent: Monday, 18 January 2010 6:48 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA Hi All, A wrapper and output parser for bowtie 'ultrafast, memory-efficient short read aligner' are now available in the bioperl-live and bioperl-run subversion repositories (bioperl-live/trunk at 16727 and bioperl-run/trunk at 16726). Bowtie details are available here: http://bowtie-bio.sourceforge.net/index.shtml The modules can return a Bio::Assembly::Scaffold object (operating via the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam uses large amounts of memory - the test suite works for me with >=2GB but not with 1GB due to this. (Is there a disk file system based tool for this for large projects?) Bowtie (>0.12.0) can align in colour space, but this is not currently supported by the wrapper though it should not be difficult to add. If someone can point me to a small set of colour space reads and a reference sequence I will be able to use these for testing. Thanks to the core devs for helping me with many of my problems in putting this together. 
Dan _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From dan.kortschak at adelaide.edu.au Tue Jan 19 16:18:20 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Wed, 20 Jan 2010 07:48:20 +1030 Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net> References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> <5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net> Message-ID: <1263935900.4813.0.camel@epistle> Great. Thanks, Scott. Dan On Tue, 2010-01-19 at 10:00 -0800, Scott Markel wrote: > Dan, > > Life Tech has sample data for E. coli at > > http://solidsoftwaretools.com/gf/project/ecoli2x50/ > > and > > http://solidsoftwaretools.com/gf/project/dh10bfrag/. > > Reference sequences are included. > > Scott > > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel at accelrys.com > Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 799 5222 > USA web: http://www.accelrys.com > > http://www.linkedin.com/in/smarkel > Vice President, Board of Directors: > International Society for Computational Biology > Chair: ISCB Publications Committee > Associate Editor: PLoS Computational Biology > Editorial Board: Briefings in Bioinformatics > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dan Kortschak > Sent: Monday, 18 January 2010 6:48 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA > > Hi All, > > A wrapper and output parser for bowtie 'ultrafast, memory-efficient > short read aligner' are now available in the bioperl-live and > 
bioperl-run subversion repositories (bioperl-live/trunk at 16727 and > bioperl-run/trunk at 16726). Bowtie details are available here: > > http://bowtie-bio.sourceforge.net/index.shtml > > The modules can return a Bio::Assembly::Scaffold object (operating via > the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk > which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam > uses large amounts of memory - the test suite works for me with >=2GB > but not with 1GB due to this. (Is there a disk file system based tool > for this for large projects?) > > Bowtie (>0.12.0) can align in colour space, but this is not currently > supported by the wrapper though it should not be difficult to add. If > someone can point me to a small set of colour space reads and a > reference sequence I will be able to use these for testing. > > Thanks to the core devs for helping me with many of my problems in > putting this together. > > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From dan.kortschak at adelaide.edu.au Wed Jan 20 00:32:05 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Wed, 20 Jan 2010 16:02:05 +1030 Subject: [Bioperl-l] using Bio::DB::GFF for aggregation Message-ID: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> Hi Chris (or others), I've been looking at ways to do large assemblies (really rnaseq/readseq comparisons for coverage) with maq/bowtie output and it's clear that for the size of project that I'm working on the space complexity is too nasty with Bio::DB::Sam. So I thought Bio::DB:GFF might be the way to go. I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> B:DB:GFF This depends on the behaviour of B:DB:GFF->features(-merge=>1). 
I've read through the docs, and it's not entirely clear (I'm hoping I've interpreted it the right way), but does this result in the return of features such that overlapping features are returned as a single feature while non-overlapping features come back separately. If this is the case, it would satisfy my requirements perfectly. thanks for your time Dan From jason at bioperl.org Wed Jan 20 01:35:24 2010 From: jason at bioperl.org (Jason Stajich) Date: Tue, 19 Jan 2010 22:35:24 -0800 Subject: [Bioperl-l] using Bio::DB::GFF for aggregation In-Reply-To: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: Are you looking at the bowtie features file or the SAM? -jason On Jan 19, 2010, at 9:32 PM, Dan Kortschak wrote: > Hi Chris (or others), > > I've been looking at ways to do large assemblies (really rnaseq/ > readseq > comparisons for coverage) with maq/bowtie output and it's clear that > for > the size of project that I'm working on the space complexity is too > nasty with Bio::DB::Sam. So I thought Bio::DB:GFF might be the way to > go. > > I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> > B:DB:GFF > > This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've > read through the docs, and it's not entirely clear (I'm hoping I've > interpreted it the right way), but does this result in the return of > features such that overlapping features are returned as a single > feature > while non-overlapping features come back separately. If this is the > case, it would satisfy my requirements perfectly. 
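[Editor's sketch] The behaviour asked about above (overlapping features returned as one region, non-overlapping ones kept separate) can be illustrated in core Perl. This is a minimal sketch under stated assumptions: it makes no claim about what Bio::DB::GFF's -merge option actually does, the function name merge_intervals and the read coordinates are hypothetical, and coordinates are taken to be 1-based and inclusive on a single reference sequence.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Merge overlapping [start, end] intervals on one reference sequence.
# Input: list of array refs; output: list of merged array refs,
# sorted by start coordinate.
sub merge_intervals {
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $iv (@sorted) {
        if (@merged && $iv->[0] <= $merged[-1][1]) {
            # Overlaps the current region: extend its end if needed.
            $merged[-1][1] = $iv->[1] if $iv->[1] > $merged[-1][1];
        }
        else {
            # Start a new region (copy, so the input is left untouched).
            push @merged, [ @$iv ];
        }
    }
    return @merged;
}

# Hypothetical read alignments, for illustration only.
my @reads = ( [100, 135], [120, 155], [150, 185], [400, 435], [430, 465] );
print join(' ', map { "$_->[0]-$_->[1]" } merge_intervals(@reads)), "\n";
# prints "100-185 400-465"
```

Sorting first means each interval only has to be compared with the most recent merged region, so memory use is bounded by the number of merged regions rather than the number of reads once the input is sorted.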
> > thanks for your time > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From dan.kortschak at adelaide.edu.au Wed Jan 20 02:19:05 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Wed, 20 Jan 2010 17:49:05 +1030 Subject: [Bioperl-l] using Bio::DB::GFF for aggregation In-Reply-To: References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <1263971945.4582.2.camel@epistle> It doesn't really matter, they are largely inter-convertible. The problem is not really the upstream processing, but the aggregation of reads into read-assigned regions (unless I've misunderstood your question). Dan On Tue, 2010-01-19 at 22:35 -0800, Jason Stajich wrote: > Are you looking at the bowtie features file or the SAM? > -jason > On Jan 19, 2010, at 9:32 PM, Dan Kortschak wrote: > > > Hi Chris (or others), > > > > I've been looking at ways to do large assemblies (really rnaseq/ > > readseq > > comparisons for coverage) with maq/bowtie output and it's clear that > > for > > the size of project that I'm working on the space complexity is too > > nasty with Bio::DB::Sam. So I thought Bio::DB:GFF might be the way to > > go. > > > > I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> > > B:DB:GFF > > > > This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've > > read through the docs, and it's not entirely clear (I'm hoping I've > > interpreted it the right way), but does this result in the return of > > features such that overlapping features are returned as a single > > feature > > while non-overlapping features come back separately. If this is the > > case, it would satisfy my requirements perfectly. 
> > > > thanks for your time > > Dan > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ -- Dan Kortschak From ajmackey at gmail.com Wed Jan 20 07:59:38 2010 From: ajmackey at gmail.com (Aaron Mackey) Date: Wed, 20 Jan 2010 07:59:38 -0500 Subject: [Bioperl-l] using Bio::DB::GFF for aggregation In-Reply-To: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com> I would advise using BEDtools or the R IRanges package for this kind of aggregation/merging work, rather than trying to reinvent this particular wheel. -Aaron On Wed, Jan 20, 2010 at 12:32 AM, Dan Kortschak < dan.kortschak at adelaide.edu.au> wrote: > Hi Chris (or others), > > I've been looking at ways to do large assemblies (really rnaseq/readseq > comparisons for coverage) with maq/bowtie output and it's clear that for > the size of project that I'm working on the space complexity is too > nasty with Bio::DB::Sam. So I thought Bio::DB:GFF might be the way to > go. > > I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> B:DB:GFF > > This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've > read through the docs, and it's not entirely clear (I'm hoping I've > interpreted it the right way), but does this result in the return of > features such that overlapping features are returned as a single feature > while non-overlapping features come back separately. If this is the > case, it would satisfy my requirements perfectly. 
> > thanks for your time > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From dan.kortschak at adelaide.edu.au Wed Jan 20 16:16:39 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Thu, 21 Jan 2010 07:46:39 +1030 Subject: [Bioperl-l] using Bio::DB::GFF for aggregation In-Reply-To: <24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com> References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> <24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com> Message-ID: <1264022199.4688.29.camel@epistle> Thanks for that, I'll look into those. BEDtools looks like what I want. cheers Dan On Wed, 2010-01-20 at 07:59 -0500, Aaron Mackey wrote: > I would advise using BEDtools or the R IRanges package for this kind > of aggregation/merging work, rather than trying to reinvent this > particular wheel. > > -Aaron From biopython at maubp.freeserve.co.uk Thu Jan 21 07:33:53 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 21 Jan 2010 12:33:53 +0000 Subject: [Bioperl-l] SwissProt DE lines and UniProt XML / TagTree as XML in BioSQL Message-ID: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com> Hi all, This is cross posted to try and ensure relevant people see it. I suggest we continue the discussion on the BioSQL list (for how to serialise structured annotation to BioSQL), and/or the OpenBio list (for things like file format naming conventions). I am hoping we (Bio*) can be consistent in how we parse and load into BioSQL the SwissProt DE lines (known as "swiss" format in both BioPerl and Biopython's SeqIO, and by EMBOSS) or the equivalent UniProt XML tags (which we are tentatively going to call the "uniprot" format in Biopython's SeqIO - comments?). Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss") files and load them into BioSQL. 
Biopython currently treats the DE comment lines as a long string, as BioPerl used to: http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html I understand that BioPerl now turns the SwissProt DE lines into a TagTree, and for storing this in BioSQL this gets serialised as XML. I would like Biopython to handle this the same way (although rather than a Perl TagTree, we'd use a Python structure of course), and would appreciate clarification of what exactly was implemented (e.g. which bit of the BioPerl source code should be look at, and could you show a worked example?). Andrea Pierlenoin (CC'd - not sure if he is on the BioSQL or Open-Bio lists yet) has started work on parsing UniProt XML files for Biopython. Here the DE comment lines are already provided broken up with XML markup. Hopefully their nested structure matches what BioPerl was doing with the SwissProt DE lines. Regards, Peter From cjfields at illinois.edu Thu Jan 21 08:34:12 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 21 Jan 2010 07:34:12 -0600 Subject: [Bioperl-l] [Open-bio-l] SwissProt DE lines and UniProt XML / TagTree as XML in BioSQL In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com> References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com> Message-ID: Peter, The relevant code is in Bio::Annotation::TagTree in bioperl-live, which is a decorator for Data::Stag: http://search.cpan.org/~cmungall/Data-Stag-0.11/Data/Stag.pm This is where the text output is derived from. It's a bit of a heavyweight solution to the problem, but it's capable of round-tripping the DE data and parses out the data in a way that's approachable. We could probably abstract out the serialization backend there and allow a pure bioperl solution (or the current solution) as a fallback. 
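[Editor's sketch] To make the idea of a nested structure for DE lines concrete, here is a rough sketch in core Perl. It is not the Bio::Annotation::TagTree or Data::Stag implementation and says nothing about how BioPerl serialises to BioSQL; the function name parse_de_lines and the example protein names are hypothetical. It assumes the "Category: Key=Value;" layout of structured DE lines and stops at the first Includes: or Contains: section.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn structured DE lines into a list of name blocks, e.g.
#   { category => 'RecName', Full => ['...'], Short => ['...'] }
# Only the main protein names are collected; Flags: lines are skipped.
sub parse_de_lines {
    my @blocks;
    for my $line (@_) {
        (my $text = $line) =~ s/^DE\s+//;
        last if $text =~ /^(?:Includes|Contains):/;
        if ($text =~ s/^(RecName|AltName|SubName):\s*//) {
            push @blocks, { category => $1 };    # open a new name block
        }
        next unless @blocks;
        if ($text =~ /^(\w+)=(.+?);\s*$/) {
            push @{ $blocks[-1]{$1} }, $2;       # Key=Value within block
        }
    }
    return \@blocks;
}

# Hypothetical entry, for illustration only.
my @de = (
    'DE   RecName: Full=Example protein;',
    'DE            Short=ExP;',
    'DE   AltName: Full=Hypothetical alias 1;',
    'DE   Flags: Precursor;',
);
my $names = parse_de_lines(@de);
for my $block (@$names) {
    print $block->{category}, "\n";
    for my $key (sort grep { $_ ne 'category' } keys %$block) {
        print "  $key: $_\n" for @{ $block->{$key} };
    }
}
```

A list of hashes like this maps directly onto XML or JSON, which is the kind of round-trippable serialisation being discussed in this thread.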
If the plain-text DE info is represented in a hierarchy already in UniProt XML, we should probably conform as closely as possible to that (using a standard format like XML, JSON, etc.).

chris

On Jan 21, 2010, at 6:33 AM, Peter wrote:

> Hi all,
>
> This is cross posted to try and ensure relevant people see it.
> I suggest we continue the discussion on the BioSQL list
> (for how to serialise structured annotation to BioSQL), and/or
> the OpenBio list (for things like file format naming conventions).
>
> I am hoping we (Bio*) can be consistent in how we parse and load
> into BioSQL the SwissProt DE lines (known as "swiss" format in
> both BioPerl and Biopython's SeqIO, and by EMBOSS) or the
> equivalent UniProt XML tags (which we are tentatively going to
> call the "uniprot" format in Biopython's SeqIO - comments?).
>
> Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss")
> files and load them into BioSQL. Biopython currently treats the DE
> comment lines as a long string, as BioPerl used to:
>
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html
> http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html
>
> I understand that BioPerl now turns the SwissProt DE lines into a
> TagTree, and for storing this in BioSQL this gets serialised as XML.
> I would like Biopython to handle this the same way (although rather
> than a Perl TagTree, we'd use a Python structure of course), and
> would appreciate clarification of what exactly was implemented
> (e.g. which bit of the BioPerl source code should we look at,
> and could you show a worked example?).
>
> Andrea Pierlenoin (CC'd - not sure if he is on the BioSQL or
> Open-Bio lists yet) has started work on parsing UniProt XML
> files for Biopython. Here the DE comment lines are already
> provided broken up with XML markup. Hopefully their nested
> structure matches what BioPerl was doing with the SwissProt
> DE lines.
>
> Regards,
>
> Peter

From sharmashalu.bio at gmail.com Thu Jan 21 09:25:44 2010
From: sharmashalu.bio at gmail.com (shalu sharma)
Date: Thu, 21 Jan 2010 09:25:44 -0500
Subject: [Bioperl-l] sequence orientation
Message-ID: <465b5a661001210625j3d84a165u69d8c8d21d2fe7ac@mail.gmail.com>

Hi All,

This is not a Perl/BioPerl query, but I thought this is the best place to ask.
I have some pyro reads (from CAMERA) and I want to find out their 5' and 3' ends. Is there any way I can do this?
I would really appreciate it if anyone can help me out.

Thanks
Shalu

From rtbio.2009 at gmail.com Thu Jan 21 13:28:43 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Thu, 21 Jan 2010 19:28:43 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <196889DF87964224ACDB948681BA7F86@NewLife>
References: <4C2E8133F916495B876628EF3E8FCBB2@NewLife>
 <9D8A1428463C4D5E9C416521C35E254C@NewLife>
 <196889DF87964224ACDB948681BA7F86@NewLife>
Message-ID:

Hello Mark,

This is Roopa again. I have a small problem again. I am working on Remote blast. The program works well: it accesses the server and gets the output correctly. The problem is this: I am trying to collect the result sequences into an array, and I found that the first sequence among the result sequences is always missing. The code is

my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]");

while (my $input = $str->next_seq())
{
    #Blast a sequence against a database:
    #Alternatively, you could pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

    open(OUTFILE,'>',$debugfile);
    print OUTFILE $input;
    close(OUTFILE);

    my $r = $factory->submit_blast($input);

    open(OUTFILE,'>',$debugfile);
    # print OUTFILE $r;
    close(OUTFILE);

    print STDERR "waiting...." if($v>0);

    while ( my @rids = $factory->each_rid ) {
        open(OUTFILE,'>',$debugfile);
        # print OUTFILE "while entered";
        close(OUTFILE);
        foreach my $rid ( @rids ) {

            open(OUTFILE,'>',$debugfile);
            # print OUTFILE "foreach entered";
            close(OUTFILE);

            my $rc = $factory->retrieve_blast($rid);

            if( !ref($rc) )
            {
                if( $rc < 0 )
                {
                    $factory->remove_rid($rid);
                }
                open(OUTFILE,'>',$debugfile);
                # print OUTFILE "if entered";
                close(OUTFILE);
                print STDERR "." if ( $v > 0 );
                sleep 5;
            }
            else {
                open(OUTFILE,'>',$debugfile);
                # print OUTFILE "else entered";
                close(OUTFILE);

                my $result = $rc->next_result();
                #save the output
                $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

                open(BLASTDEBUGFILE,'>',$blastdebugfile);
                print BLASTDEBUGFILE $result->next_hit();
                close(BLASTDEBUGFILE);

                my $filename = $serverpath."/blastdata_".time()."\.out";

                # open(DEBUGFILE,'>',$debugfile);
                # open(new,'>',$filename);
                # @arra=;
                # print DEBUGFILE @arra;
                # close(DEBUGFILE);
                # close(new);

                $factory->save_output($filename);
                # open(BLASTDEBUGFILE,'>',$debugfile);
                # print BLASTDEBUGFILE "Hello $rid";
                # close(BLASTDEBUGFILE);

                $factory->remove_rid($rid);

                open(BLASTDEBUGFILE,'>',$blastdebugfile);
                print BLASTDEBUGFILE $organism;
                close(BLASTDEBUGFILE);

                # open(OUTFILE,'>',$outfile);
                # print OUTFILE "Test2 $result->database_name()";
                # close(OUTFILE);

                #$hit = $result->next_hit;
                #open(new,'>',$debugfile);
                #print $hit;
                #close(new);

                $dummy=0;
                while ( my $hit = $result->next_hit ) {

                    next unless ( $v >= 0);

                    # open(OUTFILE,'>',$debugfile);
                    # print OUTFILE "$hit in while hits";
                    # close(OUTFILE);

                    my $sequ = $gb->get_Seq_by_version($hit->name);
                    my $dna = $sequ->seq(); # get the sequence as a string
                    $dummy++;
                    open(OUTFILE,'>',$debugfile);
                    # print OUTFILE $dummy;
                    close(OUTFILE);
                    push(@seqs,$dna);
                }
            }
        }
    }
}

$warum=@seqs;
open(OUTFILE,'>',$debugfile);
# print OUTFILE $warum;
print OUTFILE @seqs;
close(OUTFILE);

return(@seqs);

}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "\n RNAi Result \n \n

Inputsequence:
";

In the code above, I was trying to debug by printing the count of the array and the sequences themselves. When the BLAST output contained 1 sequence, the array count was 0 and no sequence was printed. Likewise, when the output contained 3 sequences, the array count was 2 and only two sequences were printed. Please help me sort out this problem.

Regards,
Roopa.

On Sun, Jan 10, 2010 at 1:03 AM, Mark A. Jensen wrote:

> Excellent Roopa- it's my pleasure-- MAJ
>
> ----- Original Message -----
> *From:* Roopa Raghuveer
> *To:* Mark A. Jensen
> *Sent:* Saturday, January 09, 2010 6:41 PM
> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>
> Hi Mark,
>
> Thank you very very much. The code is working now. Thanks for the support
> and time you have spent on me.
>
> Thanks in advance
> Roopa.
>
> On Sat, Jan 9, 2010 at 10:56 PM, Mark A. Jensen wrote:
>
>> There is still a bug with the double quotes. Use "$organ\[ORGN]", which
>> prevents perl from looking for a member of an array called @organ. This
>> would have shown up if 'use strict;' had been in place. Still don't know
>> whether this would work precisely; can you send me the query sequence so
>> I can reproduce your output?
>> thanks MAJ
>>
>> ----- Original Message -----
>> *From:* Roopa Raghuveer
>> *To:* Mark A. Jensen
>> *Sent:* Saturday, January 09, 2010 2:02 PM
>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>
>> Hi Mark,
>>
>> I tried it with double quotes but still I got the same output, with sequences
>> from different species.
>>
>> ref|XM_822292.1| Trypanosoma brucei TREU927 protein kinase A ...  1813  0.0
>> ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A ...  1622  0.0
>> ref|XM_816530.1| Trypanosoma cruzi strain CL Brener protein k...   773  0.0
>> ref|XM_816527.1| Trypanosoma cruzi strain CL Brener protein k...
749 >> 0.0 >> ref|XM_838414.1| Leishmania major strain Friedlin protein kin... 551 >> 3e-154 >> ref|XM_838409.1| Leishmania major strain Friedlin protein kin... 551 >> 3e-154 >> ref|XM_001568451.1| Leishmania braziliensis MHOM/BR/75/M2904 ... 542 >> 2e-151 >> ref|XM_001469171.1| Leishmania infantum protein kinase A cata... 538 >> 2e-150 >> ref|XM_001469166.1| Leishmania infantum protein kinase A cata... 538 >> 2e-150 >> ref|XM_001682462.1| Leishmania major protein kinase A catalyt... 196 >> 3e-47 >> ref|XM_804361.1| Trypanosoma cruzi strain CL Brener protein k... 190 >> 1e-45 >> ref|XM_002065851.1| Drosophila willistoni GK20594 (Dwil\GK205... 181 >> 7e-43 >> ref|XM_822694.1| Trypanosoma brucei TREU927 protein kinase A ... 179 >> 2e-42 >> ref|XM_001563990.1| Leishmania braziliensis MHOM/BR/75/M2904 ... 178 >> 8e-42 >> ref|XM_814844.1| Trypanosoma cruzi strain CL Brener protein k... 170 >> 1e-39 >> ref|XM_001763039.1| Physcomitrella patens subsp. patens predi... 169 >> 4e-39 >> ref|XM_001464886.1| Leishmania infantum JPCM5 protein kinase ... 167 >> 1e-38 >> ref|XM_001377302.1| PREDICTED: Monodelphis domestica similar ... 150 >> 1e-33 >> ref|XM_001603485.1| PREDICTED: Nasonia vitripennis similar to... 145 >> 5e-32 >> ref|XM_416852.2| PREDICTED: Gallus gallus protein kinase, X-l... 143 >> 2e-31 >> ref|NM_001016403.2| Xenopus (Silurana) tropicalis protein kin... 143 >> 2e-31 >> ref|XM_002009291.1| Drosophila mojavensis GI11297 (Dmoj\GI112... 138 >> 7e-30 >> ref|NM_016979.1| Mus musculus protein kinase, X-linked (Prkx)... 138 >> 7e-30 >> ref|XM_001495664.2| PREDICTED: Equus caballus similar to Seri... 136 >> 2e-29 >> ref|XM_001111571.1| PREDICTED: Macaca mulatta cAMP-dependent ... 136 >> 2e-29 >> ref|XM_001611655.1| Babesia bovis protein kinase domain conta... 134 >> 9e-29 >> ref|NR_028062.1| Homo sapiens protein kinase, Y-linked (PRKY)... 132 >> 3e-28 >> ref|XM_001517795.1| PREDICTED: Ornithorhynchus anatinus simil... 
132 >> 3e-28 >> ref|XM_685338.2| PREDICTED: Danio rerio similar to Serine/thr... 132 >> 3e-28 >> ref|XM_002189865.1| PREDICTED: Taeniopygia guttata protein ki... 131 >> 1e-27 >> ref|XM_001362299.1| PREDICTED: Monodelphis domestica similar ... 131 >> 1e-27 >> ref|NM_001093198.1| Xenopus laevis protein kinase, cAMP-depen... 129 >> 4e-27 >> ref|XM_001461322.1| Paramecium tetraurelia hypothetical prote... 129 >> 4e-27 >> ref|NM_001099869.1| Xenopus laevis cAMP-dependent protein kin... 129 >> 4e-27 >> ref|NM_001003470.1| Danio rerio protein kinase, cAMP-dependen... 129 >> 4e-27 >> ref|XM_001141503.1| PREDICTED: Pan troglodytes verus protein ... 127 >> 1e-26 >> ref|XM_001145269.1| PREDICTED: Pan troglodytes protein kinase... 127 >> 1e-26 >> ref|XM_512434.2| PREDICTED: Pan troglodytes cAMP-dependent pr... 127 >> 1e-26 >> ref|XM_001171457.1| PREDICTED: Pan troglodytes cAMP-dependent... 127 >> 1e-26 >> ref|XM_001171437.1| PREDICTED: Pan troglodytes cAMP-dependent... 127 >> 1e-26 >> ref|XM_847420.1| PREDICTED: Canis familiaris similar to Serin... 127 >> 1e-26 >> ref|NM_207518.1| Homo sapiens protein kinase, cAMP-dependent,... 127 >> 1e-26 >> ref|NM_002730.3| Homo sapiens protein kinase, cAMP-dependent,... 127 >> 1e-26 >> >> >> Thanks in advance. >> >> Roopa. >> >> On Sat, Jan 9, 2010 at 7:46 PM, Mark A. Jensen wrote: >> >>> I understand you. Put in the double quotes and see what happens. >>> >>> ----- Original Message ----- >>> *From:* Roopa Raghuveer >>> *To:* Mark A. Jensen >>> *Sent:* Saturday, January 09, 2010 1:40 PM >>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >>> >>> Hi Mark, >>> >>> Thanks for your reply. It was working when I specifically use the name of >>> the organism as Trypanosoma brucei in the code,but my idea is to introduce a >>> $organ which takes the organism given by the user i.e., let it be anything >>> >>> Pseudomonas, Drosophila, Trypanosoma, Leishmania etc., I should get the >>> sequences related to only those organisms. 
>>> >>> i.e., If the user enters Pseudomonas,the $organ parameter of the code >>> takes Pseudomonas ,does BLAST and returns only those sequences that produce >>> significant alignment with Pseudomonas(only).But this is not happening like >>> that . >>> >>> Please help me in this regard. >>> >>> Thanks in advance >>> Roopa >>> >>> On Sat, Jan 9, 2010 at 7:29 PM, Mark A. Jensen wrote: >>> >>>> Hi Roopa-- You may get what you want if you make the change. >>>> With single quotes, ENTREZ_QUERY is set to the literal string >>>> >>>> $organ[ORGN] >>>> >>>> while, with double quotes, the variable value will be substituted, >>>> and the parameter should be set to >>>> >>>> Trypanosoma brucei[ORGN] >>>> >>>> I'm guess that it worked because the database ignored the strange >>>> parameter, >>>> and returned all the matches. Try this and if it doesn't work I look >>>> harder. >>>> cheers, >>>> Mark >>>> >>>> ----- Original Message ----- >>>> *From:* Roopa Raghuveer >>>> *To:* Mark A. Jensen >>>> *Sent:* Saturday, January 09, 2010 1:24 PM >>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >>>> >>>> hello Mark, >>>> >>>> Thanks for your reply.It was working without enclosing $organ[ORGN] in >>>> double quotations,but. I would like to have only those specific sequences >>>> which are specific for my Organism i.e., I need sequences only from the >>>> organism that I entered. >>>> >>>> When the organism is Trypanosoma brucei,I could get even Leishmania and >>>> other species as the similar sequences. But I want to get only trypanosoma >>>> brucei sequences. >>>> >>>> Could you please help me out in this regard? >>>> >>>> Roopa. >>>> >>>> My output >>>> >>>> I/P organism: Trypanosoma brucei >>>> >>>> O/P:- >>>> ref|XM_822292.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 1813 0.0 >>>> ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 1622 0.0 >>>> ref|XM_816530.1| Trypanosoma cruzi strain CL Brener protein k... 
>>>> 773 0.0 >>>> ref|XM_816527.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 749 0.0 >>>> ref|XM_838414.1| Leishmania major strain Friedlin protein kin... >>>> 551 3e-154 >>>> ref|XM_838409.1| Leishmania major strain Friedlin protein kin... >>>> 551 3e-154 >>>> ref|XM_001568451.1| Leishmania braziliensis MHOM/BR/75/M2904 ... >>>> 542 2e-151 >>>> ref|XM_001469171.1| Leishmania infantum protein kinase A cata... >>>> 538 2e-150 >>>> ref|XM_001469166.1| Leishmania infantum protein kinase A cata... >>>> 538 2e-150 >>>> ref|XM_001682462.1| Leishmania major protein kinase A catalyt... >>>> 196 3e-47 >>>> ref|XM_804361.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 190 1e-45 >>>> ref|XM_002065851.1| Drosophila willistoni GK20594 (Dwil\GK205... >>>> 181 7e-43 >>>> ref|XM_822694.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 179 2e-42 >>>> ref|XM_001563990.1| Leishmania braziliensis MHOM/BR/75/M2904 ... >>>> 178 8e-42 >>>> ref|XM_814844.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 170 1e-39 >>>> ref|XM_001763039.1| Physcomitrella patens subsp. patens predi... >>>> 168 4e-39 >>>> ref|XM_001464886.1| Leishmania infantum JPCM5 protein kinase ... >>>> 167 1e-38 >>>> ref|XM_001377302.1| PREDICTED: Monodelphis domestica similar ... >>>> 150 1e-33 >>>> ref|XM_001603485.1| PREDICTED: Nasonia vitripennis similar to... >>>> 145 5e-32 >>>> ref|XM_416852.2| PREDICTED: Gallus gallus protein kinase, X-l... >>>> 143 2e-31 >>>> ref|NM_001016403.2| Xenopus (Silurana) tropicalis protein kin... >>>> 143 2e-31 >>>> ref|XM_002009291.1| Drosophila mojavensis GI11297 (Dmoj\GI112... >>>> 138 7e-30 >>>> ref|NM_016979.1| Mus musculus protein kinase, X-linked (Prkx)... >>>> 138 7e-30 >>>> ref|XM_001495664.2| PREDICTED: Equus caballus similar to Seri... >>>> 136 2e-29 >>>> ref|XM_001111571.1| PREDICTED: Macaca mulatta cAMP-dependent ... >>>> 136 2e-29 >>>> ref|XM_001611655.1| Babesia bovis protein kinase domain conta... 
>>>> 134 9e-29 >>>> ref|NR_028062.1| Homo sapiens protein kinase, Y-linked (PRKY)... >>>> 132 3e-28 >>>> ref|XM_001517795.1| PREDICTED: Ornithorhynchus anatinus simil... >>>> 132 3e-28 >>>> ref|XM_685338.2| PREDICTED: Danio rerio similar to Serine/thr... >>>> 132 3e-28 >>>> ref|XM_002189865.1| PREDICTED: Taeniopygia guttata protein ki... >>>> 131 1e-27 >>>> ref|XM_001362299.1| PREDICTED: Monodelphis domestica similar ... >>>> 131 1e-27 >>>> ref|NM_001093198.1| Xenopus laevis protein kinase, cAMP-depen... >>>> 129 4e-27 >>>> ref|XM_001461322.1| Paramecium tetraurelia hypothetical prote... >>>> 129 4e-27 >>>> ref|NM_001099869.1| Xenopus laevis cAMP-dependent protein kin... >>>> 129 4e-27 >>>> >>>> Roopa. >>>> >>>> On Sat, Jan 9, 2010 at 7:05 PM, Mark A. Jensen wrote: >>>> >>>>> I see it immediately (from making same bug many times) : >>>>> >>>>> >>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY >>>>> => >>>>> - '$organ[ORGN]'); >>>>> +"$organ[ORGN]"); >>>>> >>>>> >>>>> MAJ >>>>> >>>>> ----- Original Message ----- From: "Roopa Raghuveer" < >>>>> rtbio.2009 at gmail.com> >>>>> To: "Mark A. Jensen" >>>>> Cc: >>>>> Sent: Saturday, January 09, 2010 11:57 AM >>>>> Subject: Re: [Bioperl-l] Regarding blast in Bioperl >>>>> >>>>> >>>>> >>>>> Hello all, >>>>>> >>>>>> Thanks alot for your reply Mark. It was working for Trypanosoma brucei >>>>>> as >>>>>> the organism parameter,but when I tried to use the Organism parameter >>>>>> from >>>>>> the user,it was not working i.e., I was unable to get the target >>>>>> sequences. >>>>>> Please help me in this regard. 
My code is >>>>>> >>>>>> #!/usr/bin/perl >>>>>> >>>>>> #path for extra camel module >>>>>> use lib "/srv/www/htdocs/rain/RNAi/"; >>>>>> use Roopablast; >>>>>> >>>>>> >>>>>> use Bio::SearchIO; >>>>>> use Bio::Search::Result::BlastResult; >>>>>> use Bio::Perl; >>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>> use Bio::Seq; >>>>>> use Bio::SeqIO; >>>>>> use Bio::DB::GenBank; >>>>>> >>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi"; >>>>>> $serverurl = "http://141.84.66.66/rain/RNAi"; >>>>>> $outfile = $serverpath."/rnairesult_".time().".html"; >>>>>> $nuc = $serverpath."/nuc".time().".txt"; >>>>>> $debugfile = $serverpath."/debug_".time().".txt"; >>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>> >>>>>> my $outstring =""; >>>>>> >>>>>> &parse_form; >>>>>> >>>>>> print "Content-type: text/html\n\n"; >>>>>> print "\n"; >>>>>> print "RNAi Result"; >>>>>> print ">>>>> URL=$serverurl/rnairesult_".time().".html\"> \n"; >>>>>> print "\n"; >>>>>> print "\n"; >>>>>> print " Your results will appear >>>>> href=$serverurl/rnairesult_".time().".html>here
"; >>>>>> print " Please be patient, runtime can be up to 5 minutes
"; >>>>>> print " This page will automatically reload in 30 seconds. Roopa"; >>>>>> print "\n"; >>>>>> print "\n"; >>>>>> >>>>>> defined(my $pid = fork) or die "Can't fork: $!"; >>>>>> exit if $pid; >>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >>>>>> >>>>>> open(OUTFILE, '>',$outfile); >>>>>> >>>>>> print OUTFILE "\n >>>>>> RNAi Result >>>>>> >>>>> URL=$serverurl//rnairesult_".time().".html\"> \n >>>>>> >>>>>> \n >>>>>> \n >>>>>> Your results will appear >>>>> href=$serverurl/rnairesult_".time().".html>here
>>>>>> Please be patient, runtime can be up to 5 minutes wait wait >>>>>> wait......
>>>>>> This page will automatically reload in 30 seconds Roopa
>>>>>> \n >>>>>> \n"; >>>>>> >>>>>> close(OUTFILE); >>>>>> >>>>>> >>>>>> @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); >>>>>> >>>>>> $in{'Inputseq'} =~ s/>.*$//m; >>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim; >>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/; >>>>>> >>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >>>>>> $in{'Threshold'}); >>>>>> >>>>>> >>>>>> sub blastcode >>>>>> { >>>>>> >>>>>> $inpu1= $_[0]; >>>>>> >>>>>> $organ= $_[1]; >>>>>> >>>>>> open(NUC,'>',$nuc); >>>>>> print NUC $inpu1,"\n"; >>>>>> close(NUC); >>>>>> >>>>>> my $prog = 'blastn'; >>>>>> my $db = 'refseq_rna'; >>>>>> my $e_val= '1e-10'; >>>>>> my $organism= $organ; >>>>>> >>>>>> $gb = new Bio::DB::GenBank; >>>>>> >>>>>> my @params = ( '-prog' => $prog, >>>>>> '-data' => $db, >>>>>> '-expect' => $e_val, >>>>>> '-readmethod' => 'SearchIO', >>>>>> '-Organism' => $organism ); >>>>>> >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> print OUTFILE $inpu1; >>>>>> close(OUTFILE); >>>>>> >>>>>> >>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY >>>>>> => >>>>>> '$organ[ORGN]'); >>>>>> >>>>>> #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>> >>>>>> #change a paramter >>>>>> >>>>>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>>>>> Brucei[ORGN]'; >>>>>> >>>>>> #change a paramter >>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>> '$input2[ORGN]'; >>>>>> >>>>>> my $v = 1; >>>>>> #$v is just to turn on and off the messages >>>>>> >>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >>>>>> '-organism' => $organ ); >>>>>> >>>>>> >>>>>> while (my $input = $str->next_seq()) >>>>>> { >>>>>> #Blast a sequence against a database: >>>>>> #Alternatively, you could pass in a file with many >>>>>> #sequences rather than loop through sequence one at a time >>>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>>>> #and swap the two lines below for an example of that. 
>>>>>> >>>>>> #open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE $input; >>>>>> #close(OUTFILE); >>>>>> >>>>>> >>>>>> my $r = $factory->submit_blast($input); >>>>>> >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE $r; >>>>>> close(OUTFILE); >>>>>> >>>>>> print STDERR "waiting...." if($v>0); >>>>>> >>>>>> while ( my @rids = $factory->each_rid ) { >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "while entered"; >>>>>> # close(OUTFILE); >>>>>> foreach my $rid ( @rids ) { >>>>>> >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "foreach entered"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>> >>>>>> if( !ref($rc) ) >>>>>> { >>>>>> if( $rc < 0 ) >>>>>> { >>>>>> $factory->remove_rid($rid); >>>>>> } >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "if entered"; >>>>>> close(OUTFILE); >>>>>> print STDERR "." if ( $v > 0 ); >>>>>> sleep 5; >>>>>> } >>>>>> else { >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "else entered"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $result = $rc->next_result(); >>>>>> #save the output >>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>> >>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>> print BLASTDEBUGFILE $result->next_hit(); >>>>>> close(BLASTDEBUGFILE); >>>>>> >>>>>> my $filename = >>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out"; >>>>>> >>>>>> # open(DEBUGFILE,'>',$debugfile); >>>>>> # open(new,'>',$filename); >>>>>> # @arra=; >>>>>> # print DEBUGFILE @arra; >>>>>> # close(DEBUGFILE); >>>>>> # close(new); >>>>>> >>>>>> $factory->save_output($filename); >>>>>> # open(BLASTDEBUGFILE,'>',$debugfile); >>>>>> # print BLASTDEBUGFILE "Hello $rid"; >>>>>> # close(BLASTDEBUGFILE); >>>>>> >>>>>> $factory->remove_rid($rid); >>>>>> >>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>> print BLASTDEBUGFILE $organism; >>>>>> close(BLASTDEBUGFILE); >>>>>> >>>>>> # open(OUTFILE,'>',$outfile); >>>>>> 
# print OUTFILE "Test2 $result->database_name()"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> #$hit = $result->next_hit; >>>>>> #open(new,'>',$debugfile); >>>>>> #print $hit; >>>>>> #close(new); >>>>>> >>>>>> while ( my $hit = $result->next_hit ) { >>>>>> >>>>>> next unless ( $v > 0); >>>>>> >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "$hit in while hits"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $sequ = $gb->get_Seq_by_version($hit->name); >>>>>> my $dna = $sequ->seq(); # get the sequence as a string >>>>>> push(@seqs,$dna); >>>>>> } >>>>>> } >>>>>> } >>>>>> } >>>>>> } >>>>>> >>>>>> #open(OUTFILE,'>',$debugfile); >>>>>> #print OUTFILE $seqs[0]; >>>>>> #close(OUTFILE); >>>>>> >>>>>> return(@seqs); >>>>>> >>>>>> } >>>>>> >>>>>> Regards, >>>>>> Roopa. >>>>>> >>>>>> >>>>>> On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen >>>>>> wrote: >>>>>> >>>>>> Hi Roopa-- >>>>>>> >>>>>>> I got your code to work with the following changes: >>>>>>> >>>>>>> +# the input should be a valid FASTA file... >>>>>>> ... >>>>>>> open(NUC,'>',$nuc); >>>>>>> +print NUC ">seq (need a name line for valid fasta)\n"; >>>>>>> print NUC $inpu1, "\n"; >>>>>>> close(NUC); >>>>>>> ... >>>>>>> >>>>>>> +# you can set these header parms in the call itself... >>>>>>> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, >>>>>>> -ENTREZ_QUERY => >>>>>>> ''Trypanosoma Brucei[ORGN]'); >>>>>>> >>>>>>> #change a paramter >>>>>>> +# commented this out... >>>>>>> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>>> 'Trypanosoma >>>>>>> Brucei[ORGN]'; >>>>>>> >>>>>>> MAJ >>>>>>> ----- Original Message ----- From: "Roopa Raghuveer" < >>>>>>> rtbio.2009 at gmail.com >>>>>>> > >>>>>>> To: >>>>>>> Sent: Friday, January 08, 2010 10:00 AM >>>>>>> Subject: [Bioperl-l] Regarding blast in Bioperl >>>>>>> >>>>>>> >>>>>>> Hello all, >>>>>>> >>>>>>>> >>>>>>>> I was trying Remote blast using Bioperl. 
My input data is a >>>>>>>> Trypanosoma >>>>>>>> brucei sequence in Fasta format. When I was trying to submit to >>>>>>>> BLAST >>>>>>>> using >>>>>>>> the step >>>>>>>> $r=$factory->submit_blast($input) >>>>>>>> It was not returning anything which I checked by debugging the code. >>>>>>>> It is >>>>>>>> not blasting my input sequence even though I mentioned all the >>>>>>>> parameters.I >>>>>>>> would paste the code below. >>>>>>>> >>>>>>>> Please help me in solving put this problem. It is very urgent. >>>>>>>> >>>>>>>> Regards >>>>>>>> Roopa. >>>>>>>> >>>>>>>> #!/usr/bin/perl >>>>>>>> >>>>>>>> #path for extra camel module >>>>>>>> use lib "/srv/www/htdocs/rain/RNAi/"; >>>>>>>> use Roopablast; >>>>>>>> >>>>>>>> >>>>>>>> use Bio::SearchIO; >>>>>>>> use Bio::Search::Result::BlastResult; >>>>>>>> use Bio::Perl; >>>>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>>>> use Bio::Seq; >>>>>>>> use Bio::SeqIO; >>>>>>>> use Bio::DB::GenBank; >>>>>>>> >>>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi"; >>>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi"; >>>>>>>> $outfile = $serverpath."/rnairesult_".time().".html"; >>>>>>>> $nuc = $serverpath."/nuc".time().".txt"; >>>>>>>> $debugfile = $serverpath."/debug_".time().".txt"; >>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>>>> >>>>>>>> my $outstring =""; >>>>>>>> >>>>>>>> &parse_form; >>>>>>>> >>>>>>>> print "Content-type: text/html\n\n"; >>>>>>>> print "\n"; >>>>>>>> print "RNAi Result"; >>>>>>>> print ">>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n"; >>>>>>>> print "\n"; >>>>>>>> print "\n"; >>>>>>>> print " Your results will appear >>>>>>> href=$serverurl/rnairesult_".time().".html>here
"; >>>>>>>> print " Please be patient, runtime can be up to 5 minutes
"; >>>>>>>> print " This page will automatically reload in 30 seconds. Roopa"; >>>>>>>> print "\n"; >>>>>>>> print "\n"; >>>>>>>> >>>>>>>> defined(my $pid = fork) or die "Can't fork: $!"; >>>>>>>> exit if $pid; >>>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >>>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >>>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> open(OUTFILE, '>',$outfile); >>>>>>>> >>>>>>>> print OUTFILE "\n >>>>>>>> RNAi Result >>>>>>>> >>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n >>>>>>>> >>>>>>>> \n >>>>>>>> \n >>>>>>>> Your results will appear >>>>>>> href=$serverurl/rnairesult_".time().".html>here
>>>>>>>> Please be patient, runtime can be up to 5 minutes wait wait >>>>>>>> wait......
>>>>>>>> This page will automatically reload in 30 seconds Roopa
>>>>>>>> \n >>>>>>>> \n"; >>>>>>>> >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> @compseqs = blastcode($in{'Inputseq'}); >>>>>>>> >>>>>>>> $in{'Inputseq'} =~ s/>.*$//m; >>>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim; >>>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/; >>>>>>>> >>>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >>>>>>>> $in{'Threshold'}); >>>>>>>> >>>>>>>> >>>>>>>> sub blastcode >>>>>>>> { >>>>>>>> >>>>>>>> $inpu1= $_[0]; >>>>>>>> >>>>>>>> #$organ= $_[1]; >>>>>>>> >>>>>>>> open(NUC,'>',$nuc); >>>>>>>> print NUC $inpu1; >>>>>>>> close(NUC); >>>>>>>> >>>>>>>> my $prog = 'blastn'; >>>>>>>> my $db = 'refseq_rna'; >>>>>>>> my $e_val= '1e-10'; >>>>>>>> my $organism= 'Trypanosoma Brucei'; >>>>>>>> >>>>>>>> $gb = new Bio::DB::GenBank; >>>>>>>> >>>>>>>> my @params = ( '-prog' => $prog, >>>>>>>> '-data' => $db, >>>>>>>> '-expect' => $e_val, >>>>>>>> '-readmethod' => 'SearchIO', >>>>>>>> '-Organism' => $organism ); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$debugfile); >>>>>>>> # print OUTFILE @params; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>>> >>>>>>>> #change a paramter >>>>>>>> >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>>>>>>> Brucei[ORGN]'; >>>>>>>> >>>>>>>> #change a paramter >>>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>>>> '$input2[ORGN]'; >>>>>>>> >>>>>>>> my $v = 1; >>>>>>>> #$v is just to turn on and off the messages >>>>>>>> >>>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >>>>>>>> '-organism' => 'Trypanosoma Brucei' ); >>>>>>>> >>>>>>>> >>>>>>>> while (my $input = $str->next_seq()) >>>>>>>> { >>>>>>>> #Blast a sequence against a database: >>>>>>>> #Alternatively, you could pass in a file with many >>>>>>>> #sequences rather than loop through sequence one at a time >>>>>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>>>>>> #and swap the two lines below 
for an example of that. >>>>>>>> >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE $input; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> my $r = $factory->submit_blast($input); #The program stops here >>>>>>>> it >>>>>>>> does not return any value and it does not enter the While >>>>>>>> loop,Please help >>>>>>>> me in this regard.# >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE $r; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> print STDERR "waiting...." if($v>0); >>>>>>>> >>>>>>>> while ( my @rids = $factory->each_rid ) { >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "while entered"; >>>>>>>> close(OUTFILE); >>>>>>>> foreach my $rid ( @rids ) { >>>>>>>> >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "foreach entered"; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>>>> >>>>>>>> if( !ref($rc) ) >>>>>>>> { >>>>>>>> if( $rc < 0 ) >>>>>>>> { >>>>>>>> $factory->remove_rid($rid); >>>>>>>> } >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "if entered"; >>>>>>>> close(OUTFILE); >>>>>>>> print STDERR "." 
if ( $v > 0 ); >>>>>>>> sleep 5; >>>>>>>> } >>>>>>>> else { >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "else entered"; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> my $result = $rc->next_result(); >>>>>>>> #save the output >>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>>>> >>>>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>>>> print BLASTDEBUGFILE $result->next_hit(); >>>>>>>> close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> my $filename = >>>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out"; >>>>>>>> >>>>>>>> # open(DEBUGFILE,'>',$debugfile); >>>>>>>> # open(new,'>',$filename); >>>>>>>> # @arra=; >>>>>>>> # print DEBUGFILE @arra; >>>>>>>> # close(DEBUGFILE); >>>>>>>> # close(new); >>>>>>>> >>>>>>>> $factory->save_output($filename); >>>>>>>> >>>>>>>> # open(BLASTDEBUGFILE,'>',$debugfile); >>>>>>>> # print BLASTDEBUGFILE "Hello $rid"; >>>>>>>> # close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> $factory->remove_rid($rid); >>>>>>>> >>>>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>>>> print BLASTDEBUGFILE $organism; >>>>>>>> close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$outfile); >>>>>>>> # print OUTFILE "Test2 $result->database_name()"; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> #$hit = $result->next_hit; >>>>>>>> #open(new,'>',$debugfile); >>>>>>>> #print $hit; >>>>>>>> #close(new); >>>>>>>> >>>>>>>> while ( my $hit = $result->next_hit ) { >>>>>>>> >>>>>>>> next unless ( $v > 0); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$debugfile); >>>>>>>> # print OUTFILE "$hit in while hits"; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> my $sequ = $gb->get_Seq_by_version($hit->name); >>>>>>>> my $dna = $sequ->seq(); # get the sequence as a >>>>>>>> string >>>>>>>> push(@seqs,$dna); >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> #open(OUTFILE,'>',$debugfile); >>>>>>>> #print OUTFILE $seqs[0]; >>>>>>>> #close(OUTFILE); >>>>>>>> >>>>>>>> return(@seqs); >>>>>>>> >>>>>>>> } >>>>>>>> 
>>>>>>>> open(OUTFILE, '>',$outfile) || die ; >>>>>>>> >>>>>>>> print OUTFILE "\n >>>>>>>> RNAi Result >>>>>>>> \n >>>>>>>> \n >>>>>>>>

>>>>>>>> Inputsequence:
"; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> print OUTFILE substr ($in{'Inputseq'}, $i, 1); >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> print OUTFILE " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> print OUTFILE "
\n"; >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> print OUTFILE "

"; >>>>>>>> >>>>>>>> $z=@compseqs; >>>>>>>> >>>>>>>> for($k=1;$k<$z;$k++) { >>>>>>>> print OUTFILE ">>>>>>> set\">

Compare >>>>>>>> Sequence:
"; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> print OUTFILE substr ($compseqs[$k], $i, 1); >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> print OUTFILE " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> print OUTFILE "
\n"; >>>>>>>> } >>>>>>>> } >>>>>>>> print OUTFILE "

"; >>>>>>>> } >>>>>>>> >>>>>>>> print OUTFILE "

>>>>>>>> Window:
$in{'Windowsize'} >>>>>>>>

>>>>>>>>

>>>>>>>> Threshold:
$in{'Threshold'} >>>>>>>>

"; >>>>>>>> my $j=0; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){ >>>>>>>> if ($out[$i]->{similar}<=$in{'Threshold'}){ >>>>>>>> $j=$in{'Windowsize'}; >>>>>>>> } >>>>>>>> $height=$out[$i]->{similar}*5; >>>>>>>> } >>>>>>>> >>>>>>>> if ($j>0) { >>>>>>>> print OUTFILE ">>>>>>> height=\"5\">"; >>>>>>>> $outstring .= "".substr ($in{'Inputseq'}, >>>>>>>> $i, >>>>>>>> 1).""; >>>>>>>> $j--; >>>>>>>> } >>>>>>>> else { >>>>>>>> print OUTFILE ">>>>>>> height=\"5\">"; >>>>>>>> $outstring .= "".substr ($in{'Inputseq'}, >>>>>>>> $i, >>>>>>>> 1).""; >>>>>>>> } >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> $outstring .= " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> $outstring .= "
\n"; >>>>>>>> >>>>>>>> } >>>>>>>> if ( ($i+1)%800==0){ >>>>>>>> print OUTFILE "

\n"; >>>>>>>> >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> print OUTFILE "

>>>>>>> set\">$outstring"; >>>>>>>> >>>>>>>> #foreach (@out) { >>>>>>>> #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} >>>>>>>> matchs

"; >>>>>>>> #if ($_->{similar}<=$in{'Threshold'}){ >>>>>>>> >>>>>>>> # } >>>>>>>> #} >>>>>>>> >>>>>>>> print OUTFILE "\n\n"; >>>>>>>> >>>>>>>> close OUTFILE; >>>>>>>> >>>>>>>> #nameprint(); >>>>>>>> >>>>>>>> sub parse_form { >>>>>>>> local ($buffer, @pairs, $pair, $name, $value); >>>>>>>> # Read in text >>>>>>>> $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/; >>>>>>>> if ($ENV{'REQUEST_METHOD'} eq "POST") >>>>>>>> { >>>>>>>> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); >>>>>>>> } >>>>>>>> else >>>>>>>> { >>>>>>>> $buffer = $ENV{'QUERY_STRING'}; >>>>>>>> } >>>>>>>> @pairs = split(/&/, $buffer); >>>>>>>> foreach $pair (@pairs) >>>>>>>> { >>>>>>>> ($name, $value) = split(/=/, $pair); >>>>>>>> $value =~ tr/+/ /; >>>>>>>> $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; >>>>>>>> $in{$name} = $value; >>>>>>>> } >>>>>>>> } >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>> >>> >> > From bernd.web at gmail.com Thu Jan 21 13:37:18 2010 From: bernd.web at gmail.com (Bernd Web) Date: Thu, 21 Jan 2010 19:37:18 +0100 Subject: [Bioperl-l] Regarding blast in Bioperl In-Reply-To: References: <9D8A1428463C4D5E9C416521C35E254C@NewLife> <196889DF87964224ACDB948681BA7F86@NewLife> Message-ID: <716af09c1001211037p59b19a29l1967f1e514469e79@mail.gmail.com> Hi, Regarding RemoteBlast, my I add a query? It seems that Bio::Tools::Run::RemoteBlast is sending each sequence seperately to the NCBI (at least in BP 1.5.2). This means that for each Sequence a RID is to be checked. Is this indeed the case? The BLAST URL-API or batch interface supports sending multiple sequences at once. 
Regards,
Bernd

On Thu, Jan 21, 2010 at 7:28 PM, Roopa Raghuveer wrote:
> Hello Mark,
>
> This is Roopa again. I have a small problem again. I am working on Remote
> blast. The program works well. But the problem is this. The program
> accesses the server and gets the output correctly. I am trying to send the
> result sequences into an array and I found that always the first sequence
> among the Result sequences is missing. The code is
>
> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
> '-organism' => "$organ\[ORGN]");

From cjfields at illinois.edu Thu Jan 21 23:31:25 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Jan 2010 22:31:25 -0600
Subject: [Bioperl-l] Bio::BroodComb - RFC
In-Reply-To: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>
References: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>
Message-ID:

Jay,

Did you want to release it to CPAN? I'll take a closer look at the docs to get an idea of what you are doing with it, but from my perspective I can see this becoming a nice general use tool akin to Bio::Perl, maybe a bit more lightweight.

chris

On Jan 18, 2010, at 6:22 PM, Jay Hannah wrote:

> I formalized a little framework so I could stop re-writing little programs that do some things people frequently ask me to do.
>
> http://github.com/jhannah/bio-broodcomb
>
> It stores everything in SQLite so users can write custom reports if they want to. It uses BioPerl and some shiny CPAN beads (DBIx::Class, Moose::Role). Tests included.
>
> The first two functions I stuck in the framework:
>
> Find subsequences (Bio::BroodComb::SubSeq):
>
>    use Bio::BroodComb;
>    my $bc = Bio::BroodComb->new();
>    $bc->load_large_seq(file => "large_seq.fasta");
>    $bc->load_small_seq(file => "small_seq.fasta");
>    $bc->find_subseqs();
>    print $bc->subseq_report1;
>
> In-silico PCR (Bio::BroodComb::PCR):
>
>    use Bio::BroodComb;
>    my $bc = Bio::BroodComb->new();
>    $bc->load_large_seq(file => "large_seq.fasta");
>    $bc->add_primerset(
>       description    => "U5/R",   # however you want it reported
>       forward_primer => 'GCGGGCAGCAATACTGCTTTGTAA',
>       reverse_primer => 'ACCAGCGTTCAGCATATGGAGGAT',
>    );
>    $bc->find_pcr_hits();
>    $bc->find_pcr_products();
>    print $bc->pcr_report1;
>
> I find this rather handy, so will probably be adding all my applicable future work to it instead of writing stand-alone programs. Not sure if it should be renamed for eventual CPAN / wherever.
>
> Suggestions, contributions welcome. :)
>
> http://github.com/jhannah/bio-broodcomb
>
> Jay Hannah
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

From jason at bioperl.org Fri Jan 22 01:17:14 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 21 Jan 2010 22:17:14 -0800
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
Message-ID:

I'm considering putting in an allowable initialization parameter (and get/set) for Bio::AlignIO that would allow setting of the alphabet. This is then passed to Bio::LocatableSeq creation so that _guess_alphabet isn't called. This will allow removal of warnings about empty sequences because _guess_alphabet won't be called on a sequence if we have explicitly set the alphabet.

This worked great on my local install and tests pass. Any objections or concerns?

Basically it means when you make an AlignIO you can specify the alphabet, i.e.
my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna', -file => 'genome.fasaln');

I have some alignments with empty sequences and I think turning off the warnings is appropriate where I force the alphabet choice. It should also have a very modest speedup benefit.

-jason
--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip

From rtbio.2009 at gmail.com Fri Jan 22 04:54:32 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Fri, 22 Jan 2010 10:54:32 +0100
Subject: [Bioperl-l] Fwd: Regarding blast in Bioperl
In-Reply-To:
References: <9D8A1428463C4D5E9C416521C35E254C@NewLife> <196889DF87964224ACDB948681BA7F86@NewLife>
Message-ID:

---------- Forwarded message ----------
From: Roopa Raghuveer
Date: Thu, Jan 21, 2010 at 7:28 PM
Subject: Re: [Bioperl-l] Regarding blast in Bioperl
To: "Mark A. Jensen"
Cc: bioperl-l at lists.open-bio.org

Hello Mark,

This is Roopa again. I have a small problem again. I am working on Remote blast. The program works well. But the problem is this. The program accesses the server and gets the output correctly. I am trying to send the result sequences into an array and I found that always the first sequence among the Result sequences is missing. The code is

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
 '-organism' => "$organ\[ORGN]");

 while (my $input = $str->next_seq())
 {
   #Blast a sequence against a database:
   #Alternatively, you could pass in a file with many
   #sequences rather than loop through sequence one at a time
   #Remove the loop starting 'while (my $input = $str->next_seq())'
   #and swap the two lines below for an example of that.

   open(OUTFILE,'>',$debugfile);
   print OUTFILE $input;
   close(OUTFILE);

   my $r = $factory->submit_blast($input);

   open(OUTFILE,'>',$debugfile);
   # print OUTFILE $r;
   close(OUTFILE);

   print STDERR "waiting...."
if($v>0); while ( my @rids = $factory->each_rid ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "while entered"; close(OUTFILE); foreach my $rid ( @rids ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "foreach entered"; close(OUTFILE); my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } open(OUTFILE,'>',$debugfile); # print OUTFILE "if entered"; close(OUTFILE); print STDERR "." if ( $v > 0 ); sleep 5; } else { open(OUTFILE,'>',$debugfile); # print OUTFILE "else entered"; close(OUTFILE); my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $result->next_hit(); close(BLASTDEBUGFILE); my $filename = $serverpath."/blastdata_".time()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $organism; close(BLASTDEBUGFILE); # open(OUTFILE,'>',$outfile); # print OUTFILE "Test2 $result->database_name()"; # close(OUTFILE); #$hit = $result->next_hit; #open(new,'>',$debugfile); #print $hit; #close(new); $dummy=0; while ( my $hit = $result->next_hit ) { next unless ( $v >= 0); # open(OUTFILE,'>',$debugfile); # print OUTFILE "$hit in while hits"; # close(OUTFILE); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string $dummy++; open(OUTFILE,'>',$debugfile); # print OUTFILE $dummy; close(OUTFILE); push(@seqs,$dna); } } } } } $warum=@seqs; open(OUTFILE,'>',$debugfile); # print OUTFILE $warum; print OUTFILE @seqs; close(OUTFILE); return(@seqs); } open(OUTFILE, '>',$outfile) || die ; print OUTFILE "\n RNAi Result \n \n

Inputsequence:
"; Here in the above code, I was trying to debug the code and trying to get the count of the array and even the sequence. But when the output data was giving 1 sequence, the count of the array was 0 and when I tried to print the output sequence I could not get any. It was the same when the no of output sequences was 3, I tried to print the sequences but was getting the count of the array as 2 and was printing only two sequences. Please help me in sorting out this problem. Regards, Roopa. On Sun, Jan 10, 2010 at 1:03 AM, Mark A. Jensen wrote: > Excellent Roopa- it's my pleasure-- MAJ > > ----- Original Message ----- > *From:* Roopa Raghuveer > *To:* Mark A. Jensen > *Sent:* Saturday, January 09, 2010 6:41 PM > *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl > > Hi Mark, > > Thank you very very much. The code is working now. Thanks for the support > and time you have spent on me. > > Thanks in advance > Roopa. > > On Sat, Jan 9, 2010 at 10:56 PM, Mark A. Jensen wrote: > >> There is still a bug with the double quotes. Use "$organ\[ORGN]", which >> prevents perl from >> looking for a member of an array called @organ. This would have shown up >> if 'use strict;' had >> been in place. Still don't know whether this would work precisely; can you >> send me the query >> sequence so I can reproduce your ouput? >> thanks MAJ >> >> ----- Original Message ----- >> *From:* Roopa Raghuveer >> *To:* Mark A. Jensen >> *Sent:* Saturday, January 09, 2010 2:02 PM >> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >> >> Hi Mark, >> >> I tried it with double quotes but still i got the same o/p with sequences >> from different species. >> >> ref|XM_822292.1| Trypanosoma brucei TREU927 protein kinase A ... 1813 >> 0.0 >> ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A ... 1622 >> 0.0 >> ref|XM_816530.1| Trypanosoma cruzi strain CL Brener protein k... 773 >> 0.0 >> ref|XM_816527.1| Trypanosoma cruzi strain CL Brener protein k... 
749 >> 0.0 >> ref|XM_838414.1| Leishmania major strain Friedlin protein kin... 551 >> 3e-154 >> ref|XM_838409.1| Leishmania major strain Friedlin protein kin... 551 >> 3e-154 >> ref|XM_001568451.1| Leishmania braziliensis MHOM/BR/75/M2904 ... 542 >> 2e-151 >> ref|XM_001469171.1| Leishmania infantum protein kinase A cata... 538 >> 2e-150 >> ref|XM_001469166.1| Leishmania infantum protein kinase A cata... 538 >> 2e-150 >> ref|XM_001682462.1| Leishmania major protein kinase A catalyt... 196 >> 3e-47 >> ref|XM_804361.1| Trypanosoma cruzi strain CL Brener protein k... 190 >> 1e-45 >> ref|XM_002065851.1| Drosophila willistoni GK20594 (Dwil\GK205... 181 >> 7e-43 >> ref|XM_822694.1| Trypanosoma brucei TREU927 protein kinase A ... 179 >> 2e-42 >> ref|XM_001563990.1| Leishmania braziliensis MHOM/BR/75/M2904 ... 178 >> 8e-42 >> ref|XM_814844.1| Trypanosoma cruzi strain CL Brener protein k... 170 >> 1e-39 >> ref|XM_001763039.1| Physcomitrella patens subsp. patens predi... 169 >> 4e-39 >> ref|XM_001464886.1| Leishmania infantum JPCM5 protein kinase ... 167 >> 1e-38 >> ref|XM_001377302.1| PREDICTED: Monodelphis domestica similar ... 150 >> 1e-33 >> ref|XM_001603485.1| PREDICTED: Nasonia vitripennis similar to... 145 >> 5e-32 >> ref|XM_416852.2| PREDICTED: Gallus gallus protein kinase, X-l... 143 >> 2e-31 >> ref|NM_001016403.2| Xenopus (Silurana) tropicalis protein kin... 143 >> 2e-31 >> ref|XM_002009291.1| Drosophila mojavensis GI11297 (Dmoj\GI112... 138 >> 7e-30 >> ref|NM_016979.1| Mus musculus protein kinase, X-linked (Prkx)... 138 >> 7e-30 >> ref|XM_001495664.2| PREDICTED: Equus caballus similar to Seri... 136 >> 2e-29 >> ref|XM_001111571.1| PREDICTED: Macaca mulatta cAMP-dependent ... 136 >> 2e-29 >> ref|XM_001611655.1| Babesia bovis protein kinase domain conta... 134 >> 9e-29 >> ref|NR_028062.1| Homo sapiens protein kinase, Y-linked (PRKY)... 132 >> 3e-28 >> ref|XM_001517795.1| PREDICTED: Ornithorhynchus anatinus simil... 
132 >> 3e-28 >> ref|XM_685338.2| PREDICTED: Danio rerio similar to Serine/thr... 132 >> 3e-28 >> ref|XM_002189865.1| PREDICTED: Taeniopygia guttata protein ki... 131 >> 1e-27 >> ref|XM_001362299.1| PREDICTED: Monodelphis domestica similar ... 131 >> 1e-27 >> ref|NM_001093198.1| Xenopus laevis protein kinase, cAMP-depen... 129 >> 4e-27 >> ref|XM_001461322.1| Paramecium tetraurelia hypothetical prote... 129 >> 4e-27 >> ref|NM_001099869.1| Xenopus laevis cAMP-dependent protein kin... 129 >> 4e-27 >> ref|NM_001003470.1| Danio rerio protein kinase, cAMP-dependen... 129 >> 4e-27 >> ref|XM_001141503.1| PREDICTED: Pan troglodytes verus protein ... 127 >> 1e-26 >> ref|XM_001145269.1| PREDICTED: Pan troglodytes protein kinase... 127 >> 1e-26 >> ref|XM_512434.2| PREDICTED: Pan troglodytes cAMP-dependent pr... 127 >> 1e-26 >> ref|XM_001171457.1| PREDICTED: Pan troglodytes cAMP-dependent... 127 >> 1e-26 >> ref|XM_001171437.1| PREDICTED: Pan troglodytes cAMP-dependent... 127 >> 1e-26 >> ref|XM_847420.1| PREDICTED: Canis familiaris similar to Serin... 127 >> 1e-26 >> ref|NM_207518.1| Homo sapiens protein kinase, cAMP-dependent,... 127 >> 1e-26 >> ref|NM_002730.3| Homo sapiens protein kinase, cAMP-dependent,... 127 >> 1e-26 >> >> >> Thanks in advance. >> >> Roopa. >> >> On Sat, Jan 9, 2010 at 7:46 PM, Mark A. Jensen wrote: >> >>> I understand you. Put in the double quotes and see what happens. >>> >>> ----- Original Message ----- >>> *From:* Roopa Raghuveer >>> *To:* Mark A. Jensen >>> *Sent:* Saturday, January 09, 2010 1:40 PM >>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >>> >>> Hi Mark, >>> >>> Thanks for your reply. It was working when I specifically use the name of >>> the organism as Trypanosoma brucei in the code,but my idea is to introduce a >>> $organ which takes the organism given by the user i.e., let it be anything >>> >>> Pseudomonas, Drosophila, Trypanosoma, Leishmania etc., I should get the >>> sequences related to only those organisms. 
>>> >>> i.e., If the user enters Pseudomonas,the $organ parameter of the code >>> takes Pseudomonas ,does BLAST and returns only those sequences that produce >>> significant alignment with Pseudomonas(only).But this is not happening like >>> that . >>> >>> Please help me in this regard. >>> >>> Thanks in advance >>> Roopa >>> >>> On Sat, Jan 9, 2010 at 7:29 PM, Mark A. Jensen wrote: >>> >>>> Hi Roopa-- You may get what you want if you make the change. >>>> With single quotes, ENTREZ_QUERY is set to the literal string >>>> >>>> $organ[ORGN] >>>> >>>> while, with double quotes, the variable value will be substituted, >>>> and the parameter should be set to >>>> >>>> Trypanosoma brucei[ORGN] >>>> >>>> I'm guess that it worked because the database ignored the strange >>>> parameter, >>>> and returned all the matches. Try this and if it doesn't work I look >>>> harder. >>>> cheers, >>>> Mark >>>> >>>> ----- Original Message ----- >>>> *From:* Roopa Raghuveer >>>> *To:* Mark A. Jensen >>>> *Sent:* Saturday, January 09, 2010 1:24 PM >>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >>>> >>>> hello Mark, >>>> >>>> Thanks for your reply.It was working without enclosing $organ[ORGN] in >>>> double quotations,but. I would like to have only those specific sequences >>>> which are specific for my Organism i.e., I need sequences only from the >>>> organism that I entered. >>>> >>>> When the organism is Trypanosoma brucei,I could get even Leishmania and >>>> other species as the similar sequences. But I want to get only trypanosoma >>>> brucei sequences. >>>> >>>> Could you please help me out in this regard? >>>> >>>> Roopa. >>>> >>>> My output >>>> >>>> I/P organism: Trypanosoma brucei >>>> >>>> O/P:- >>>> ref|XM_822292.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 1813 0.0 >>>> ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 1622 0.0 >>>> ref|XM_816530.1| Trypanosoma cruzi strain CL Brener protein k... 
>>>> 773 0.0 >>>> ref|XM_816527.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 749 0.0 >>>> ref|XM_838414.1| Leishmania major strain Friedlin protein kin... >>>> 551 3e-154 >>>> ref|XM_838409.1| Leishmania major strain Friedlin protein kin... >>>> 551 3e-154 >>>> ref|XM_001568451.1| Leishmania braziliensis MHOM/BR/75/M2904 ... >>>> 542 2e-151 >>>> ref|XM_001469171.1| Leishmania infantum protein kinase A cata... >>>> 538 2e-150 >>>> ref|XM_001469166.1| Leishmania infantum protein kinase A cata... >>>> 538 2e-150 >>>> ref|XM_001682462.1| Leishmania major protein kinase A catalyt... >>>> 196 3e-47 >>>> ref|XM_804361.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 190 1e-45 >>>> ref|XM_002065851.1| Drosophila willistoni GK20594 (Dwil\GK205... >>>> 181 7e-43 >>>> ref|XM_822694.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 179 2e-42 >>>> ref|XM_001563990.1| Leishmania braziliensis MHOM/BR/75/M2904 ... >>>> 178 8e-42 >>>> ref|XM_814844.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 170 1e-39 >>>> ref|XM_001763039.1| Physcomitrella patens subsp. patens predi... >>>> 168 4e-39 >>>> ref|XM_001464886.1| Leishmania infantum JPCM5 protein kinase ... >>>> 167 1e-38 >>>> ref|XM_001377302.1| PREDICTED: Monodelphis domestica similar ... >>>> 150 1e-33 >>>> ref|XM_001603485.1| PREDICTED: Nasonia vitripennis similar to... >>>> 145 5e-32 >>>> ref|XM_416852.2| PREDICTED: Gallus gallus protein kinase, X-l... >>>> 143 2e-31 >>>> ref|NM_001016403.2| Xenopus (Silurana) tropicalis protein kin... >>>> 143 2e-31 >>>> ref|XM_002009291.1| Drosophila mojavensis GI11297 (Dmoj\GI112... >>>> 138 7e-30 >>>> ref|NM_016979.1| Mus musculus protein kinase, X-linked (Prkx)... >>>> 138 7e-30 >>>> ref|XM_001495664.2| PREDICTED: Equus caballus similar to Seri... >>>> 136 2e-29 >>>> ref|XM_001111571.1| PREDICTED: Macaca mulatta cAMP-dependent ... >>>> 136 2e-29 >>>> ref|XM_001611655.1| Babesia bovis protein kinase domain conta... 
>>>> 134 9e-29 >>>> ref|NR_028062.1| Homo sapiens protein kinase, Y-linked (PRKY)... >>>> 132 3e-28 >>>> ref|XM_001517795.1| PREDICTED: Ornithorhynchus anatinus simil... >>>> 132 3e-28 >>>> ref|XM_685338.2| PREDICTED: Danio rerio similar to Serine/thr... >>>> 132 3e-28 >>>> ref|XM_002189865.1| PREDICTED: Taeniopygia guttata protein ki... >>>> 131 1e-27 >>>> ref|XM_001362299.1| PREDICTED: Monodelphis domestica similar ... >>>> 131 1e-27 >>>> ref|NM_001093198.1| Xenopus laevis protein kinase, cAMP-depen... >>>> 129 4e-27 >>>> ref|XM_001461322.1| Paramecium tetraurelia hypothetical prote... >>>> 129 4e-27 >>>> ref|NM_001099869.1| Xenopus laevis cAMP-dependent protein kin... >>>> 129 4e-27 >>>> >>>> Roopa. >>>> >>>> On Sat, Jan 9, 2010 at 7:05 PM, Mark A. Jensen wrote: >>>> >>>>> I see it immediately (from making same bug many times) : >>>>> >>>>> >>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY >>>>> => >>>>> - '$organ[ORGN]'); >>>>> +"$organ[ORGN]"); >>>>> >>>>> >>>>> MAJ >>>>> >>>>> ----- Original Message ----- From: "Roopa Raghuveer" < >>>>> rtbio.2009 at gmail.com> >>>>> To: "Mark A. Jensen" >>>>> Cc: >>>>> Sent: Saturday, January 09, 2010 11:57 AM >>>>> Subject: Re: [Bioperl-l] Regarding blast in Bioperl >>>>> >>>>> >>>>> >>>>> Hello all, >>>>>> >>>>>> Thanks alot for your reply Mark. It was working for Trypanosoma brucei >>>>>> as >>>>>> the organism parameter,but when I tried to use the Organism parameter >>>>>> from >>>>>> the user,it was not working i.e., I was unable to get the target >>>>>> sequences. >>>>>> Please help me in this regard. 
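The quoting fix discussed in this thread can be checked in isolation, without contacting NCBI. What follows is a minimal sketch, assuming a hypothetical helper named entrez_organism_filter (it is not a BioPerl function) that only builds the string one would pass as -ENTREZ_QUERY. It shows why the single-quoted form sends the literal text $organ[ORGN], while escaping the bracket lets Perl substitute the organism name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): build the Entrez filter
# string that would be passed as -ENTREZ_QUERY when creating the factory.
sub entrez_organism_filter {
    my ($organ) = @_;
    # The backslash stops Perl from parsing "[ORGN]" as a subscript on an
    # array named @organ; the scalar $organ is interpolated as usual.
    return "$organ\[ORGN]";
}

my $organ = 'Trypanosoma brucei';

# Single quotes: nothing is interpolated, so the literal text is kept.
my $literal = '$organ[ORGN]';

# Escaped bracket inside double quotes: the variable value is substituted.
my $escaped = entrez_organism_filter($organ);

# Plain concatenation is an equivalent spelling with nothing to escape.
my $joined = $organ . '[ORGN]';

print "single-quoted: $literal\n";
print "escaped:       $escaped\n";
print "concatenated:  $joined\n";
```

The unescaped double-quoted form "$organ[ORGN]" is deliberately left out of the sketch: with use strict in effect it fails to compile, because Perl reads it as an element of an undeclared array @organ. That is the check Mark refers to when he says the bug would have shown up under 'use strict;'.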
My code is >>>>>> >>>>>> #!/usr/bin/perl >>>>>> >>>>>> #path for extra camel module >>>>>> use lib "/srv/www/htdocs/rain/RNAi/"; >>>>>> use Roopablast; >>>>>> >>>>>> >>>>>> use Bio::SearchIO; >>>>>> use Bio::Search::Result::BlastResult; >>>>>> use Bio::Perl; >>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>> use Bio::Seq; >>>>>> use Bio::SeqIO; >>>>>> use Bio::DB::GenBank; >>>>>> >>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi"; >>>>>> $serverurl = "http://141.84.66.66/rain/RNAi"; >>>>>> $outfile = $serverpath."/rnairesult_".time().".html"; >>>>>> $nuc = $serverpath."/nuc".time().".txt"; >>>>>> $debugfile = $serverpath."/debug_".time().".txt"; >>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>> >>>>>> my $outstring =""; >>>>>> >>>>>> &parse_form; >>>>>> >>>>>> print "Content-type: text/html\n\n"; >>>>>> print "\n"; >>>>>> print "RNAi Result"; >>>>>> print ">>>>> URL=$serverurl/rnairesult_".time().".html\"> \n"; >>>>>> print "\n"; >>>>>> print "\n"; >>>>>> print " Your results will appear >>>>> href=$serverurl/rnairesult_".time().".html>here
"; >>>>>> print " Please be patient, runtime can be up to 5 minutes
"; >>>>>> print " This page will automatically reload in 30 seconds. Roopa"; >>>>>> print "\n"; >>>>>> print "\n"; >>>>>> >>>>>> defined(my $pid = fork) or die "Can't fork: $!"; >>>>>> exit if $pid; >>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >>>>>> >>>>>> open(OUTFILE, '>',$outfile); >>>>>> >>>>>> print OUTFILE "\n >>>>>> RNAi Result >>>>>> >>>>> URL=$serverurl//rnairesult_".time().".html\"> \n >>>>>> >>>>>> \n >>>>>> \n >>>>>> Your results will appear >>>>> href=$serverurl/rnairesult_".time().".html>here
>>>>>> Please be patient, runtime can be up to 5 minutes wait wait >>>>>> wait......
>>>>>> This page will automatically reload in 30 seconds Roopa
>>>>>> \n >>>>>> \n"; >>>>>> >>>>>> close(OUTFILE); >>>>>> >>>>>> >>>>>> @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); >>>>>> >>>>>> $in{'Inputseq'} =~ s/>.*$//m; >>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim; >>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/; >>>>>> >>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >>>>>> $in{'Threshold'}); >>>>>> >>>>>> >>>>>> sub blastcode >>>>>> { >>>>>> >>>>>> $inpu1= $_[0]; >>>>>> >>>>>> $organ= $_[1]; >>>>>> >>>>>> open(NUC,'>',$nuc); >>>>>> print NUC $inpu1,"\n"; >>>>>> close(NUC); >>>>>> >>>>>> my $prog = 'blastn'; >>>>>> my $db = 'refseq_rna'; >>>>>> my $e_val= '1e-10'; >>>>>> my $organism= $organ; >>>>>> >>>>>> $gb = new Bio::DB::GenBank; >>>>>> >>>>>> my @params = ( '-prog' => $prog, >>>>>> '-data' => $db, >>>>>> '-expect' => $e_val, >>>>>> '-readmethod' => 'SearchIO', >>>>>> '-Organism' => $organism ); >>>>>> >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> print OUTFILE $inpu1; >>>>>> close(OUTFILE); >>>>>> >>>>>> >>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY >>>>>> => >>>>>> '$organ[ORGN]'); >>>>>> >>>>>> #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>> >>>>>> #change a paramter >>>>>> >>>>>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>>>>> Brucei[ORGN]'; >>>>>> >>>>>> #change a paramter >>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>> '$input2[ORGN]'; >>>>>> >>>>>> my $v = 1; >>>>>> #$v is just to turn on and off the messages >>>>>> >>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >>>>>> '-organism' => $organ ); >>>>>> >>>>>> >>>>>> while (my $input = $str->next_seq()) >>>>>> { >>>>>> #Blast a sequence against a database: >>>>>> #Alternatively, you could pass in a file with many >>>>>> #sequences rather than loop through sequence one at a time >>>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>>>> #and swap the two lines below for an example of that. 
>>>>>> >>>>>> #open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE $input; >>>>>> #close(OUTFILE); >>>>>> >>>>>> >>>>>> my $r = $factory->submit_blast($input); >>>>>> >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE $r; >>>>>> close(OUTFILE); >>>>>> >>>>>> print STDERR "waiting...." if($v>0); >>>>>> >>>>>> while ( my @rids = $factory->each_rid ) { >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "while entered"; >>>>>> # close(OUTFILE); >>>>>> foreach my $rid ( @rids ) { >>>>>> >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "foreach entered"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>> >>>>>> if( !ref($rc) ) >>>>>> { >>>>>> if( $rc < 0 ) >>>>>> { >>>>>> $factory->remove_rid($rid); >>>>>> } >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "if entered"; >>>>>> close(OUTFILE); >>>>>> print STDERR "." if ( $v > 0 ); >>>>>> sleep 5; >>>>>> } >>>>>> else { >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "else entered"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $result = $rc->next_result(); >>>>>> #save the output >>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>> >>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>> print BLASTDEBUGFILE $result->next_hit(); >>>>>> close(BLASTDEBUGFILE); >>>>>> >>>>>> my $filename = >>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out"; >>>>>> >>>>>> # open(DEBUGFILE,'>',$debugfile); >>>>>> # open(new,'>',$filename); >>>>>> # @arra=; >>>>>> # print DEBUGFILE @arra; >>>>>> # close(DEBUGFILE); >>>>>> # close(new); >>>>>> >>>>>> $factory->save_output($filename); >>>>>> # open(BLASTDEBUGFILE,'>',$debugfile); >>>>>> # print BLASTDEBUGFILE "Hello $rid"; >>>>>> # close(BLASTDEBUGFILE); >>>>>> >>>>>> $factory->remove_rid($rid); >>>>>> >>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>> print BLASTDEBUGFILE $organism; >>>>>> close(BLASTDEBUGFILE); >>>>>> >>>>>> # open(OUTFILE,'>',$outfile); >>>>>> 
# print OUTFILE "Test2 $result->database_name()"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> #$hit = $result->next_hit; >>>>>> #open(new,'>',$debugfile); >>>>>> #print $hit; >>>>>> #close(new); >>>>>> >>>>>> while ( my $hit = $result->next_hit ) { >>>>>> >>>>>> next unless ( $v > 0); >>>>>> >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "$hit in while hits"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $sequ = $gb->get_Seq_by_version($hit->name); >>>>>> my $dna = $sequ->seq(); # get the sequence as a string >>>>>> push(@seqs,$dna); >>>>>> } >>>>>> } >>>>>> } >>>>>> } >>>>>> } >>>>>> >>>>>> #open(OUTFILE,'>',$debugfile); >>>>>> #print OUTFILE $seqs[0]; >>>>>> #close(OUTFILE); >>>>>> >>>>>> return(@seqs); >>>>>> >>>>>> } >>>>>> >>>>>> Regards, >>>>>> Roopa. >>>>>> >>>>>> >>>>>> On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen >>>>>> wrote: >>>>>> >>>>>> Hi Roopa-- >>>>>>> >>>>>>> I got your code to work with the following changes: >>>>>>> >>>>>>> +# the input should be a valid FASTA file... >>>>>>> ... >>>>>>> open(NUC,'>',$nuc); >>>>>>> +print NUC ">seq (need a name line for valid fasta)\n"; >>>>>>> print NUC $inpu1, "\n"; >>>>>>> close(NUC); >>>>>>> ... >>>>>>> >>>>>>> +# you can set these header parms in the call itself... >>>>>>> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, >>>>>>> -ENTREZ_QUERY => >>>>>>> ''Trypanosoma Brucei[ORGN]'); >>>>>>> >>>>>>> #change a paramter >>>>>>> +# commented this out... >>>>>>> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>>> 'Trypanosoma >>>>>>> Brucei[ORGN]'; >>>>>>> >>>>>>> MAJ >>>>>>> ----- Original Message ----- From: "Roopa Raghuveer" < >>>>>>> rtbio.2009 at gmail.com >>>>>>> > >>>>>>> To: >>>>>>> Sent: Friday, January 08, 2010 10:00 AM >>>>>>> Subject: [Bioperl-l] Regarding blast in Bioperl >>>>>>> >>>>>>> >>>>>>> Hello all, >>>>>>> >>>>>>>> >>>>>>>> I was trying Remote blast using Bioperl. 
My input data is a >>>>>>>> Trypanosoma >>>>>>>> brucei sequence in Fasta format. When I was trying to submit to >>>>>>>> BLAST >>>>>>>> using >>>>>>>> the step >>>>>>>> $r=$factory->submit_blast($input) >>>>>>>> It was not returning anything which I checked by debugging the code. >>>>>>>> It is >>>>>>>> not blasting my input sequence even though I mentioned all the >>>>>>>> parameters.I >>>>>>>> would paste the code below. >>>>>>>> >>>>>>>> Please help me in solving put this problem. It is very urgent. >>>>>>>> >>>>>>>> Regards >>>>>>>> Roopa. >>>>>>>> >>>>>>>> #!/usr/bin/perl >>>>>>>> >>>>>>>> #path for extra camel module >>>>>>>> use lib "/srv/www/htdocs/rain/RNAi/"; >>>>>>>> use Roopablast; >>>>>>>> >>>>>>>> >>>>>>>> use Bio::SearchIO; >>>>>>>> use Bio::Search::Result::BlastResult; >>>>>>>> use Bio::Perl; >>>>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>>>> use Bio::Seq; >>>>>>>> use Bio::SeqIO; >>>>>>>> use Bio::DB::GenBank; >>>>>>>> >>>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi"; >>>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi"; >>>>>>>> $outfile = $serverpath."/rnairesult_".time().".html"; >>>>>>>> $nuc = $serverpath."/nuc".time().".txt"; >>>>>>>> $debugfile = $serverpath."/debug_".time().".txt"; >>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>>>> >>>>>>>> my $outstring =""; >>>>>>>> >>>>>>>> &parse_form; >>>>>>>> >>>>>>>> print "Content-type: text/html\n\n"; >>>>>>>> print "\n"; >>>>>>>> print "RNAi Result"; >>>>>>>> print ">>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n"; >>>>>>>> print "\n"; >>>>>>>> print "\n"; >>>>>>>> print " Your results will appear >>>>>>> href=$serverurl/rnairesult_".time().".html>here
"; >>>>>>>> print " Please be patient, runtime can be up to 5 minutes
"; >>>>>>>> print " This page will automatically reload in 30 seconds. Roopa"; >>>>>>>> print "\n"; >>>>>>>> print "\n"; >>>>>>>> >>>>>>>> defined(my $pid = fork) or die "Can't fork: $!"; >>>>>>>> exit if $pid; >>>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >>>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >>>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> open(OUTFILE, '>',$outfile); >>>>>>>> >>>>>>>> print OUTFILE "\n >>>>>>>> RNAi Result >>>>>>>> >>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n >>>>>>>> >>>>>>>> \n >>>>>>>> \n >>>>>>>> Your results will appear >>>>>>> href=$serverurl/rnairesult_".time().".html>here
>>>>>>>> Please be patient, runtime can be up to 5 minutes wait wait >>>>>>>> wait......
>>>>>>>> This page will automatically reload in 30 seconds Roopa
>>>>>>>> \n >>>>>>>> \n"; >>>>>>>> >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> @compseqs = blastcode($in{'Inputseq'}); >>>>>>>> >>>>>>>> $in{'Inputseq'} =~ s/>.*$//m; >>>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim; >>>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/; >>>>>>>> >>>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >>>>>>>> $in{'Threshold'}); >>>>>>>> >>>>>>>> >>>>>>>> sub blastcode >>>>>>>> { >>>>>>>> >>>>>>>> $inpu1= $_[0]; >>>>>>>> >>>>>>>> #$organ= $_[1]; >>>>>>>> >>>>>>>> open(NUC,'>',$nuc); >>>>>>>> print NUC $inpu1; >>>>>>>> close(NUC); >>>>>>>> >>>>>>>> my $prog = 'blastn'; >>>>>>>> my $db = 'refseq_rna'; >>>>>>>> my $e_val= '1e-10'; >>>>>>>> my $organism= 'Trypanosoma Brucei'; >>>>>>>> >>>>>>>> $gb = new Bio::DB::GenBank; >>>>>>>> >>>>>>>> my @params = ( '-prog' => $prog, >>>>>>>> '-data' => $db, >>>>>>>> '-expect' => $e_val, >>>>>>>> '-readmethod' => 'SearchIO', >>>>>>>> '-Organism' => $organism ); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$debugfile); >>>>>>>> # print OUTFILE @params; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>>> >>>>>>>> #change a paramter >>>>>>>> >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>>>>>>> Brucei[ORGN]'; >>>>>>>> >>>>>>>> #change a paramter >>>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>>>> '$input2[ORGN]'; >>>>>>>> >>>>>>>> my $v = 1; >>>>>>>> #$v is just to turn on and off the messages >>>>>>>> >>>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >>>>>>>> '-organism' => 'Trypanosoma Brucei' ); >>>>>>>> >>>>>>>> >>>>>>>> while (my $input = $str->next_seq()) >>>>>>>> { >>>>>>>> #Blast a sequence against a database: >>>>>>>> #Alternatively, you could pass in a file with many >>>>>>>> #sequences rather than loop through sequence one at a time >>>>>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>>>>>> #and swap the two lines below 
for an example of that. >>>>>>>> >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE $input; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> my $r = $factory->submit_blast($input); #The program stops here >>>>>>>> it >>>>>>>> does not return any value and it does not enter the While >>>>>>>> loop,Please help >>>>>>>> me in this regard.# >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE $r; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> print STDERR "waiting...." if($v>0); >>>>>>>> >>>>>>>> while ( my @rids = $factory->each_rid ) { >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "while entered"; >>>>>>>> close(OUTFILE); >>>>>>>> foreach my $rid ( @rids ) { >>>>>>>> >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "foreach entered"; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>>>> >>>>>>>> if( !ref($rc) ) >>>>>>>> { >>>>>>>> if( $rc < 0 ) >>>>>>>> { >>>>>>>> $factory->remove_rid($rid); >>>>>>>> } >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "if entered"; >>>>>>>> close(OUTFILE); >>>>>>>> print STDERR "." 
if ( $v > 0 ); >>>>>>>> sleep 5; >>>>>>>> } >>>>>>>> else { >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "else entered"; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> my $result = $rc->next_result(); >>>>>>>> #save the output >>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>>>> >>>>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>>>> print BLASTDEBUGFILE $result->next_hit(); >>>>>>>> close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> my $filename = >>>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out"; >>>>>>>> >>>>>>>> # open(DEBUGFILE,'>',$debugfile); >>>>>>>> # open(new,'>',$filename); >>>>>>>> # @arra=; >>>>>>>> # print DEBUGFILE @arra; >>>>>>>> # close(DEBUGFILE); >>>>>>>> # close(new); >>>>>>>> >>>>>>>> $factory->save_output($filename); >>>>>>>> >>>>>>>> # open(BLASTDEBUGFILE,'>',$debugfile); >>>>>>>> # print BLASTDEBUGFILE "Hello $rid"; >>>>>>>> # close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> $factory->remove_rid($rid); >>>>>>>> >>>>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>>>> print BLASTDEBUGFILE $organism; >>>>>>>> close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$outfile); >>>>>>>> # print OUTFILE "Test2 $result->database_name()"; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> #$hit = $result->next_hit; >>>>>>>> #open(new,'>',$debugfile); >>>>>>>> #print $hit; >>>>>>>> #close(new); >>>>>>>> >>>>>>>> while ( my $hit = $result->next_hit ) { >>>>>>>> >>>>>>>> next unless ( $v > 0); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$debugfile); >>>>>>>> # print OUTFILE "$hit in while hits"; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> my $sequ = $gb->get_Seq_by_version($hit->name); >>>>>>>> my $dna = $sequ->seq(); # get the sequence as a >>>>>>>> string >>>>>>>> push(@seqs,$dna); >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> #open(OUTFILE,'>',$debugfile); >>>>>>>> #print OUTFILE $seqs[0]; >>>>>>>> #close(OUTFILE); >>>>>>>> >>>>>>>> return(@seqs); >>>>>>>> >>>>>>>> } >>>>>>>> 
>>>>>>>> open(OUTFILE, '>',$outfile) || die ; >>>>>>>> >>>>>>>> print OUTFILE "\n >>>>>>>> RNAi Result >>>>>>>> \n >>>>>>>> \n >>>>>>>>

>>>>>>>> Inputsequence:
"; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> print OUTFILE substr ($in{'Inputseq'}, $i, 1); >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> print OUTFILE " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> print OUTFILE "
\n"; >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> print OUTFILE "

"; >>>>>>>> >>>>>>>> $z=@compseqs; >>>>>>>> >>>>>>>> for($k=1;$k<$z;$k++) { >>>>>>>> print OUTFILE ">>>>>>> set\">

Compare >>>>>>>> Sequence:
"; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> print OUTFILE substr ($compseqs[$k], $i, 1); >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> print OUTFILE " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> print OUTFILE "
\n"; >>>>>>>> } >>>>>>>> } >>>>>>>> print OUTFILE "

"; >>>>>>>> } >>>>>>>> >>>>>>>> print OUTFILE "

>>>>>>>> Window:
$in{'Windowsize'} >>>>>>>>

>>>>>>>>

>>>>>>>> Threshold:
$in{'Threshold'} >>>>>>>>

"; >>>>>>>> my $j=0; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){ >>>>>>>> if ($out[$i]->{similar}<=$in{'Threshold'}){ >>>>>>>> $j=$in{'Windowsize'}; >>>>>>>> } >>>>>>>> $height=$out[$i]->{similar}*5; >>>>>>>> } >>>>>>>> >>>>>>>> if ($j>0) { >>>>>>>> print OUTFILE ">>>>>>> height=\"5\">"; >>>>>>>> $outstring .= "".substr ($in{'Inputseq'}, >>>>>>>> $i, >>>>>>>> 1).""; >>>>>>>> $j--; >>>>>>>> } >>>>>>>> else { >>>>>>>> print OUTFILE ">>>>>>> height=\"5\">"; >>>>>>>> $outstring .= "".substr ($in{'Inputseq'}, >>>>>>>> $i, >>>>>>>> 1).""; >>>>>>>> } >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> $outstring .= " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> $outstring .= "
\n"; >>>>>>>> >>>>>>>> } >>>>>>>> if ( ($i+1)%800==0){ >>>>>>>> print OUTFILE "

\n"; >>>>>>>> >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> print OUTFILE "

>>>>>>> set\">$outstring"; >>>>>>>> >>>>>>>> #foreach (@out) { >>>>>>>> #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} >>>>>>>> matchs

"; >>>>>>>> #if ($_->{similar}<=$in{'Threshold'}){ >>>>>>>> >>>>>>>> # } >>>>>>>> #} >>>>>>>> >>>>>>>> print OUTFILE "\n\n"; >>>>>>>> >>>>>>>> close OUTFILE; >>>>>>>> >>>>>>>> #nameprint(); >>>>>>>> >>>>>>>> sub parse_form { >>>>>>>> local ($buffer, @pairs, $pair, $name, $value); >>>>>>>> # Read in text >>>>>>>> $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/; >>>>>>>> if ($ENV{'REQUEST_METHOD'} eq "POST") >>>>>>>> { >>>>>>>> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); >>>>>>>> } >>>>>>>> else >>>>>>>> { >>>>>>>> $buffer = $ENV{'QUERY_STRING'}; >>>>>>>> } >>>>>>>> @pairs = split(/&/, $buffer); >>>>>>>> foreach $pair (@pairs) >>>>>>>> { >>>>>>>> ($name, $value) = split(/=/, $pair); >>>>>>>> $value =~ tr/+/ /; >>>>>>>> $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; >>>>>>>> $in{$name} = $value; >>>>>>>> } >>>>>>>> } >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>> >>> >> > From maj at fortinbras.us Fri Jan 22 07:34:59 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 22 Jan 2010 07:34:59 -0500 Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO In-Reply-To: References: Message-ID: I'm down with that. ----- Original Message ----- From: "Jason Stajich" To: "BioPerl List" Sent: Friday, January 22, 2010 1:17 AM Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO > I'm considering putting in allowable initialization parameter (and get/ > set) for Bio::AlignIO that would allow setting of the alphabet. This > is then passed to Bio::LocatableSeq creation so that _guess_alphabet > isn't called. 
> This will allow removal of warnings about empty
> sequences because _guess_alphabet won't be called on a sequence if we
> have explicitly set the alphabet.
>
> This worked great on my local install and tests pass. Any objections
> or concerns?
>
> basically it means when you make an AlignIO you can specify the
> alphabet i.e.
>
> my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna',
>                            -file => 'genome.fasaln');
>
> I have some alignments with empty sequences and I think turning off
> the warnings is appropriate where I force the alphabet choice. It
> should also have a very modest speedup benefit too.
>
> -jason
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> http://twitter.com/hyphaltip
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From avilella at gmail.com Fri Jan 22 08:07:26 2010
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 22 Jan 2010 13:07:26 +0000
Subject: [Bioperl-l] Merging fragments in a simplealign
Message-ID: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>

Hi,

I would like to write a script that merges fragments in a Bio::SimpleAlign
object on the basis of some $seq->display_name rule.

I basically want to start with something like this:

seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM
seq2.234 QWERTYU-------------------
seq2.345 ----------ASDFGH----------
seq2.456 -------------------ZXCVBNM

And end with something like this:

seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM
seq2.mrg QWERTYU---ASDFGH---ZXCVBNM

Can people suggest any Bio::SimpleAlign methods that would help here?

Cheers,

Albert.

From maj at fortinbras.us Fri Jan 22 08:31:54 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 08:31:54 -0500
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
Message-ID:

Here's one of my favorite tricks for this: XOR mask on gap symbol.
MAJ

use Bio::SeqIO;
use Bio::Seq;
use strict;
my $seqio = Bio::SeqIO->new( -fh => \*DATA );

my $acc = $seqio->next_seq->seq ^ '-';
while ($_ = $seqio->next_seq ) {
    $acc ^= ($_->seq ^ '-');
}
my $mrg = Bio::Seq->new( -id => 'merged',
                         -seq => $acc ^ '-' );
1;


__END__
>seq2.234
QWERTYU-------------------
>seq2.345
----------ASDFGH----------
>seq2.456
-------------------ZXCVBNM

----- Original Message -----
From: "Albert Vilella"
To:
Sent: Friday, January 22, 2010 8:07 AM
Subject: [Bioperl-l] Merging fragments in a simplealign

> Hi,
>
> I would like to write a script that merges fragments in a Bio::SimpleAlign
> object on the basis of
> some $seq->display_name rule.
>
> I basically want to start with something like this:
>
> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM
> seq2.234 QWERTYU-------------------
> seq2.345 ----------ASDFGH----------
> seq2.456 -------------------ZXCVBNM
>
> And end with something like this:
>
> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM
> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM
>
> Can people suggest any Bio::SimpleAlign methods that would help here?
>
> Cheers,
>
> Albert.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From cjfields at illinois.edu Fri Jan 22 08:34:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 22 Jan 2010 07:34:07 -0600
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
In-Reply-To:
References:
Message-ID: <55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu>

Sounds good to me. The warnings are a bit too tight on this module anyway.
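
For readers following the "Merging fragments in a simplealign" thread, the gap-symbol XOR merge can also be tried on plain Perl strings, without any BioPerl objects. The following is only a minimal sketch under stated assumptions: the name merge_fragments and the equal-length check are made up for illustration and are not part of any BioPerl API, and it assumes every alignment column holds at most one non-gap residue across the fragments. The mask is built at full sequence length because Perl's string XOR treats the shorter operand as if it were padded with zero bytes on the right.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): merge equal-length gapped
# fragments by XOR-ing each one against a full-length mask of gap symbols.
# Assumes at most one fragment carries a residue in any given column.
sub merge_fragments {
    my @frags = @_;
    my $len   = length $frags[0];
    die "fragments must have equal length\n"
        if grep { length($_) != $len } @frags;

    my $mask = '-' x $len;     # one gap symbol per column
    my $acc  = "\0" x $len;    # all-zero string is the XOR identity

    # After masking, gap columns become NUL bytes and residue columns
    # become (residue XOR '-'), so XOR-ing the fragments together keeps
    # the single masked residue in each column.
    $acc ^= ($_ ^ $mask) for @frags;

    return $acc ^ $mask;       # undo the mask: NUL -> '-', rest -> residue
}

print merge_fragments(
    'QWERTYU' . ('-' x 19),
    ('-' x 10) . 'ASDFGH' . ('-' x 10),
    ('-' x 19) . 'ZXCVBNM',
), "\n";
# prints QWERTYU---ASDFGH---ZXCVBNM
```

With a full-length mask the result does not depend on whether the number of fragments is odd or even; for example, merge_fragments('AB--', '--CD') gives 'ABCD'. Columns where two fragments both carry a residue would come out garbled, so overlapping fragments need a different approach.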

I still think we have plans towards refactoring some of this, not sure how far along they are:

http://www.bioperl.org/wiki/Align_Refactor

chris

On Jan 22, 2010, at 12:17 AM, Jason Stajich wrote:

> I'm considering putting in allowable initialization parameter (and get/set) for Bio::AlignIO that would allow setting of the alphabet. This is then passed to Bio::LocatableSeq creation so that _guess_alphabet isn't called. This will allow removal of warnings about empty sequences because _guess_alphabet won't be called on a sequence if we have explicitly set the alphabet.
>
> This worked great on my local install and tests pass. Any objections or concerns?
>
> basically it means when you make an AlignIO you can specify the alphabet i.e.
>
> my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna', -file => 'genome.fasaln');
>
> I have some alignments with empty sequences and I think turning off the warnings is appropriate where I force the alphabet choice. It should also have a very modest speedup benefit too.
>
> -jason
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> http://twitter.com/hyphaltip
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Fri Jan 22 08:40:57 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 22 Jan 2010 07:40:57 -0600
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To:
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
Message-ID: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>

May be something for the cook/scrapbook?

chris

On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote:

> Here's one of my favorite tricks for this: XOR mask on gap symbol.
> MAJ > > use Bio::SeqIO; > use Bio::Seq; > use strict; > my $seqio = Bio::SeqIO->new( -fh => \*DATA ); > > my $acc = $seqio->next_seq->seq ^ '-'; > while ($_ = $seqio->next_seq ) { > $acc ^= ($_->seq ^ '-'); > } > my $mrg = Bio::Seq->new( -id => 'merged', > -seq => $acc ^ '-' ); > 1; > > > __END__ >> seq2.234 > QWERTYU------------------- >> seq2.345 > ----------ASDFGH---------- >> seq2.456 > -------------------ZXCVBNM > > ----- Original Message ----- From: "Albert Vilella" > To: > Sent: Friday, January 22, 2010 8:07 AM > Subject: [Bioperl-l] Merging fragments in a simplealign > > >> Hi, >> I would like to write a script that merges fragments in a Bio::SimpleAlign >> object on the basis of >> some $seq->display_name rule. >> I basically want to start with something like this: >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >> seq2.234 QWERTYU------------------- >> seq2.345 ----------ASDFGH---------- >> seq2.456 -------------------ZXCVBNM >> And end with something like this: >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM >> Can people suggest any Bio::SimpleAlign methods that would help here? >> Cheers, >> Albert. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From holland at eaglegenomics.com Fri Jan 22 05:51:52 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 22 Jan 2010 10:51:52 +0000 Subject: [Bioperl-l] [BioSQL-l] SwissProt DE lines and UniProt XML / TagTree as XML in BioSQL In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com> References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com> Message-ID: <8FECCBDE-2DE1-40EE-B5A4-73BDAC893E2D@eaglegenomics.com> Nice idea. 
Currently, BioJava just stores the complete section as a string without parsing it, but it provides a parser module for converting it into useful tag/value format within a user's program (but not to be stored in BioSQL).

On 21 Jan 2010, at 12:33, Peter wrote:

> Hi all,
>
> This is cross posted to try and ensure relevant people see it.
> I suggest we continue the discussion on the BioSQL list
> (for how to serialise structured annotation to BioSQL), and/or
> the OpenBio list (for things like file format naming conventions).
>
> I am hoping we (Bio*) can be consistent in how we parse and load
> into BioSQL the SwissProt DE lines (known as "swiss" format in
> both BioPerl and Biopython's SeqIO, and by EMBOSS) or the
> equivalent UniProt XML tags (which we are tentatively going to
> call the "uniprot" format in Biopython's SeqIO - comments?).
>
> Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss")
> files and load them into BioSQL. Biopython currently treats the DE
> comment lines as a long string, as BioPerl used to:
>
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html
> http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html
>
> I understand that BioPerl now turns the SwissProt DE lines into a
> TagTree, and for storing this in BioSQL this gets serialised as XML.
> I would like Biopython to handle this the same way (although rather
> than a Perl TagTree, we'd use a Python structure of course), and
> would appreciate clarification of what exactly was implemented
> (e.g. which bit of the BioPerl source code should we look at,
> and could you show a worked example?).
>
> Andrea Pierleoni (CC'd - not sure if he is on the BioSQL or
> Open-Bio lists yet) has started work on parsing UniProt XML
> files for Biopython. Here the DE comment lines are already
> provided broken up with XML markup. Hopefully their nested
> structure matches what BioPerl was doing with the SwissProt
> DE lines.
> > Regards, > > Peter > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From andrea at biocomp.unibo.it Fri Jan 22 07:18:32 2010 From: andrea at biocomp.unibo.it (Andrea Pierleoni) Date: Fri, 22 Jan 2010 13:18:32 +0100 (CET) Subject: [Bioperl-l] SwissProt DE lines and UniProt XML / TagTree as XML in BioSQL In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com> References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com> Message-ID: <2b6e30c4628585042366646a7b46386e.squirrel@lipid.biocomp.unibo.it> I think that the point here can be a little broader, since not only the swissprot DE lines carry complex and structured data. To define a common, language-independent way to store structured data into the comment and *_qualifier_value tables of the actual BioSQL schema could be very useful. XML looks like a good candidate to me, and the UniprotXML format can be used as reference or as a template to start from. Each Bio* project will then parse and report this structured data in its own programming language data structure. Andrea > Hi all, > > This is cross posted to try and ensure relevant people see it. > I suggest we continue the discussion on the BioSQL list > (for how to serialise structured annotation to BioSQL), and/or > the OpenBio list (for things like file format naming conventions). > > I am hoping we (Bio*) can be consistent in how we parse and load > into BioSQL the SwissProt DE lines (known as "swiss" format in > both BioPerl and Biopython's SeqIO, and by EMBOSS) or the > equivalent UniProt XML tags (which we are tentatively going to > call the "uniprot" format in Biopython's SeqIO - comments?). 
> > Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss") > files and load them into BioSQL. Biopython currently treats the DE > comment lines as a long string, as BioPerl used to: > > http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html > http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html > > I understand that BioPerl now turns the SwissProt DE lines into a > TagTree, and for storing this in BioSQL this gets serialised as XML. > I would like Biopython to handle this the same way (although rather > than a Perl TagTree, we'd use a Python structure of course), and > would appreciate clarification of what exactly was implemented > (e.g. which bit of the BioPerl source code should be look at, > and could you show a worked example?). > > Andrea Pierlenoin (CC'd - not sure if he is on the BioSQL or > Open-Bio lists yet) has started work on parsing UniProt XML > files for Biopython. Here the DE comment lines are already > provided broken up with XML markup. Hopefully their nested > structure matches what BioPerl was doing with the SwissProt > DE lines. > > Regards, > > Peter > From avilella at gmail.com Fri Jan 22 11:04:13 2010 From: avilella at gmail.com (Albert Vilella) Date: Fri, 22 Jan 2010 16:04:13 +0000 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com> <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> Message-ID: <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com> Is there/should be a 'have_pairwise_overlap' method similar to this? # $seq1 and $seq3 have matching ids my $seq1 = $aln->each_seq_by_id($seq1->display_id); my $seq3 = $aln->each_seq_by_id($seq3->display_id); my $ret = $aln->have_pairwise_overlap($seq1,$seq3); On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields wrote: > May be something for the cook/scrapbook? > > chris > > On Jan 22, 2010, at 7:31 AM, Mark A. 
Jensen wrote: > > > Here's one of my favorite tricks for this: XOR mask on gap symbol. > > MAJ > > > > use Bio::SeqIO; > > use Bio::Seq; > > use strict; > > my $seqio = Bio::SeqIO->new( -fh => \*DATA ); > > > > my $acc = $seqio->next_seq->seq ^ '-'; > > while ($_ = $seqio->next_seq ) { > > $acc ^= ($_->seq ^ '-'); > > } > > my $mrg = Bio::Seq->new( -id => 'merged', > > -seq => $acc ^ '-' ); > > 1; > > > > > > __END__ > >> seq2.234 > > QWERTYU------------------- > >> seq2.345 > > ----------ASDFGH---------- > >> seq2.456 > > -------------------ZXCVBNM > > > > ----- Original Message ----- From: "Albert Vilella" > > To: > > Sent: Friday, January 22, 2010 8:07 AM > > Subject: [Bioperl-l] Merging fragments in a simplealign > > > > > >> Hi, > >> I would like to write a script that merges fragments in a > Bio::SimpleAlign > >> object on the basis of > >> some $seq->display_name rule. > >> I basically want to start with something like this: > >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM > >> seq2.234 QWERTYU------------------- > >> seq2.345 ----------ASDFGH---------- > >> seq2.456 -------------------ZXCVBNM > >> And end with something like this: > >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM > >> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM > >> Can people suggest any Bio::SimpleAlign methods that would help here? > >> Cheers, > >> Albert. > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Fri Jan 22 11:02:55 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Fri, 22 Jan 2010 11:02:55 -0500 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com> <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> Message-ID: http://www.bioperl.org/wiki/Merge_gapped_sequences_across_a_common_region ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "Albert Vilella" ; Sent: Friday, January 22, 2010 8:40 AM Subject: Re: [Bioperl-l] Merging fragments in a simplealign > May be something for the cook/scrapbook? > > chris > > On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote: > >> Here's one of my favorite tricks for this: XOR mask on gap symbol. >> MAJ >> >> use Bio::SeqIO; >> use Bio::Seq; >> use strict; >> my $seqio = Bio::SeqIO->new( -fh => \*DATA ); >> >> my $acc = $seqio->next_seq->seq ^ '-'; >> while ($_ = $seqio->next_seq ) { >> $acc ^= ($_->seq ^ '-'); >> } >> my $mrg = Bio::Seq->new( -id => 'merged', >> -seq => $acc ^ '-' ); >> 1; >> >> >> __END__ >>> seq2.234 >> QWERTYU------------------- >>> seq2.345 >> ----------ASDFGH---------- >>> seq2.456 >> -------------------ZXCVBNM >> >> ----- Original Message ----- From: "Albert Vilella" >> To: >> Sent: Friday, January 22, 2010 8:07 AM >> Subject: [Bioperl-l] Merging fragments in a simplealign >> >> >>> Hi, >>> I would like to write a script that merges fragments in a Bio::SimpleAlign >>> object on the basis of >>> some $seq->display_name rule. >>> I basically want to start with something like this: >>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>> seq2.234 QWERTYU------------------- >>> seq2.345 ----------ASDFGH---------- >>> seq2.456 -------------------ZXCVBNM >>> And end with something like this: >>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM >>> Can people suggest any Bio::SimpleAlign methods that would help here? >>> Cheers, >>> Albert. 
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From avilella at gmail.com Fri Jan 22 12:50:57 2010 From: avilella at gmail.com (Albert Vilella) Date: Fri, 22 Jan 2010 17:50:57 +0000 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com> References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com> <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com> Message-ID: <358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com> Or to rephrase my answer, what is the closest way for the code below that already exists? On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella wrote: > Is there/should be a 'have_pairwise_overlap' method similar to this? > > # $seq1 and $seq3 have matching ids > my $seq1 = $aln->each_seq_by_id($seq1->display_id); > my $seq3 = $aln->each_seq_by_id($seq3->display_id); > > my $ret = $aln->have_pairwise_overlap($seq1,$seq3); > > > On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields wrote: > >> May be something for the cook/scrapbook? >> >> chris >> >> On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote: >> >> > Here's one of my favorite tricks for this: XOR mask on gap symbol. 
>> > MAJ >> > >> > use Bio::SeqIO; >> > use Bio::Seq; >> > use strict; >> > my $seqio = Bio::SeqIO->new( -fh => \*DATA ); >> > >> > my $acc = $seqio->next_seq->seq ^ '-'; >> > while ($_ = $seqio->next_seq ) { >> > $acc ^= ($_->seq ^ '-'); >> > } >> > my $mrg = Bio::Seq->new( -id => 'merged', >> > -seq => $acc ^ '-' ); >> > 1; >> > >> > >> > __END__ >> >> seq2.234 >> > QWERTYU------------------- >> >> seq2.345 >> > ----------ASDFGH---------- >> >> seq2.456 >> > -------------------ZXCVBNM >> > >> > ----- Original Message ----- From: "Albert Vilella" > > >> > To: >> > Sent: Friday, January 22, 2010 8:07 AM >> > Subject: [Bioperl-l] Merging fragments in a simplealign >> > >> > >> >> Hi, >> >> I would like to write a script that merges fragments in a >> Bio::SimpleAlign >> >> object on the basis of >> >> some $seq->display_name rule. >> >> I basically want to start with something like this: >> >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >> >> seq2.234 QWERTYU------------------- >> >> seq2.345 ----------ASDFGH---------- >> >> seq2.456 -------------------ZXCVBNM >> >> And end with something like this: >> >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >> >> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM >> >> Can people suggest any Bio::SimpleAlign methods that would help here? >> >> Cheers, >> >> Albert. >> >> _______________________________________________ >> >> Bioperl-l mailing list >> >> Bioperl-l at lists.open-bio.org >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > From jay at jays.net Fri Jan 22 13:30:57 2010 From: jay at jays.net (Jay Hannah) Date: Fri, 22 Jan 2010 12:30:57 -0600 Subject: [Bioperl-l] Bio::BroodComb - RFC In-Reply-To: References: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net> Message-ID: On Jan 21, 2010, at 10:31 PM, Chris Fields wrote: > Did you want to release it to CPAN? 
I'll take a closer look at the docs to get an idea of what you are doing with it, but from my perspective I can see this becoming a nice general use tool akin to Bio::Perl, maybe a bit more lightweight. Yes, I was thinking I would. No one has (yet) told me it's the worst idea ever, so I'm feeling encouraged. :) Given smallish inputs / databases (up to a few million rows) where some lightweight schema + SQLite + BioPerl can get the job done, it's nice to have a little easy-to-run toolbox. New tables and Roles bolt on easily, so I'll be adding them as they surface at $work[1]. Thanks for your interest. :) Jay Hannah http://github.com/jhannah/bio-broodcomb http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From dalalhina at gmail.com Fri Jan 22 12:31:09 2010 From: dalalhina at gmail.com (hina dalal) Date: Fri, 22 Jan 2010 17:31:09 +0000 Subject: [Bioperl-l] Bioperl installation failed Message-ID: <425f75df1001220931t49f5c768j97d91d2dd1757f19@mail.gmail.com> Hi I have installed PERL from Activesate and now trying to install bioperl but can not do it . Neither from PPM (it is showing error ?Ppm install failed: 404 not found?) nor from CPAN / manual installation. It is not allowing me to download nmake, showing that ?the version of this file is not compatible with the version of windows you are running. Check your computer system information to see whether you need 32 bit or 64 bit of this program.? I am using windows VISTA. Please help. Regards Hina From H.Dalal at sms.ed.ac.uk Fri Jan 22 12:34:55 2010 From: H.Dalal at sms.ed.ac.uk (Hina Dalal) Date: Fri, 22 Jan 2010 17:34:55 +0000 Subject: [Bioperl-l] BioPerl installation failed: please help Message-ID: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk> Hi I have installed PERL from Activesate and now trying to install bioperl but can not do it . Neither from PPM (it is showing error ?Ppm install failed: 404 not found?) nor from CPAN manual installation. 
It does not allow me to download nmake, reporting that "the version of this file is not compatible with the version of windows you are running. Check your computer system information to see whether you need 32 bit or 64 bit of this program."

Please help.

Regards
Hina

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

From jason at bioperl.org Fri Jan 22 14:18:30 2010
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 22 Jan 2010 11:18:30 -0800
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
In-Reply-To: <55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu>
References: <55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu>
Message-ID: <59EC9331-FB2F-4338-AD58-2D501A528A18@bioperl.org>

Done, as of r16739. Look forward to the refactor work too.

-jason

On Jan 22, 2010, at 5:34 AM, Chris Fields wrote:

> Sounds good to me. The warnings are a bit too tight on this module
> anyway.
>
> I still think we have plans towards refactoring some of this, not
> sure how far along they are:
>
> http://www.bioperl.org/wiki/Align_Refactor
>
> chris
>
> On Jan 22, 2010, at 12:17 AM, Jason Stajich wrote:
>
>> I'm considering putting in an allowable initialization parameter (and
>> get/set) for Bio::AlignIO that would allow setting of the
>> alphabet. This is then passed to Bio::LocatableSeq creation so
>> that _guess_alphabet isn't called. This will allow removal of
>> warnings about empty sequences because _guess_alphabet won't be
>> called on a sequence if we have explicitly set the alphabet.
>>
>> This worked great on my local install and tests pass. Any
>> objections or concerns?
>>
>> basically it means when you make an AlignIO you can specify the
>> alphabet i.e.
>>
>> my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna',
>> -file => 'genome.fasaln');
>>
>> I have some alignments with empty sequences and I think turning off
>> the warnings is appropriate where I force the alphabet choice.
It >> should also have a very modest speedup benefit too. >> >> -jason >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> http://fungalgenomes.org/ >> http://twitter.com/hyphaltip >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ http://twitter.com/hyphaltip From cjfields at illinois.edu Fri Jan 22 14:22:43 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 22 Jan 2010 13:22:43 -0600 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: <358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com> References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com> <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com> <358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com> Message-ID: <14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu> This could exist, but should go into a general Utilities module. Part of the Align refactoring was to pull a good number of the methods into a general utilities module, so this would fit into that category. chris On Jan 22, 2010, at 11:50 AM, Albert Vilella wrote: > Or to rephrase my answer, what is the closest way for the code below that > already exists? > > On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella wrote: > >> Is there/should be a 'have_pairwise_overlap' method similar to this? >> >> # $seq1 and $seq3 have matching ids >> my $seq1 = $aln->each_seq_by_id($seq1->display_id); >> my $seq3 = $aln->each_seq_by_id($seq3->display_id); >> >> my $ret = $aln->have_pairwise_overlap($seq1,$seq3); >> >> >> On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields wrote: >> >>> May be something for the cook/scrapbook? >>> >>> chris >>> >>> On Jan 22, 2010, at 7:31 AM, Mark A. 
Jensen wrote: >>> >>>> Here's one of my favorite tricks for this: XOR mask on gap symbol. >>>> MAJ >>>> >>>> use Bio::SeqIO; >>>> use Bio::Seq; >>>> use strict; >>>> my $seqio = Bio::SeqIO->new( -fh => \*DATA ); >>>> >>>> my $acc = $seqio->next_seq->seq ^ '-'; >>>> while ($_ = $seqio->next_seq ) { >>>> $acc ^= ($_->seq ^ '-'); >>>> } >>>> my $mrg = Bio::Seq->new( -id => 'merged', >>>> -seq => $acc ^ '-' ); >>>> 1; >>>> >>>> >>>> __END__ >>>>> seq2.234 >>>> QWERTYU------------------- >>>>> seq2.345 >>>> ----------ASDFGH---------- >>>>> seq2.456 >>>> -------------------ZXCVBNM >>>> >>>> ----- Original Message ----- From: "Albert Vilella" >>> >>>> To: >>>> Sent: Friday, January 22, 2010 8:07 AM >>>> Subject: [Bioperl-l] Merging fragments in a simplealign >>>> >>>> >>>>> Hi, >>>>> I would like to write a script that merges fragments in a >>> Bio::SimpleAlign >>>>> object on the basis of >>>>> some $seq->display_name rule. >>>>> I basically want to start with something like this: >>>>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>>>> seq2.234 QWERTYU------------------- >>>>> seq2.345 ----------ASDFGH---------- >>>>> seq2.456 -------------------ZXCVBNM >>>>> And end with something like this: >>>>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>>>> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM >>>>> Can people suggest any Bio::SimpleAlign methods that would help here? >>>>> Cheers, >>>>> Albert. 
>>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Fri Jan 22 14:29:07 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 22 Jan 2010 14:29:07 -0500 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: <14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu> References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com><058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu><358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com><358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com> <14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu> Message-ID: <0F7B7E5FE70D4C5CB34B27045561823C@NewLife> I'd recommend making an enhancement request via Bugzilla, so we don't forget- MAJ ----- Original Message ----- From: "Chris Fields" To: "Albert Vilella" Cc: "bioperl-l" Sent: Friday, January 22, 2010 2:22 PM Subject: Re: [Bioperl-l] Merging fragments in a simplealign > This could exist, but should go into a general Utilities module. Part of the > Align refactoring was to pull a good number of the methods into a general > utilities module, so this would fit into that category. > > chris > > On Jan 22, 2010, at 11:50 AM, Albert Vilella wrote: > >> Or to rephrase my answer, what is the closest way for the code below that >> already exists? >> >> On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella wrote: >> >>> Is there/should be a 'have_pairwise_overlap' method similar to this? 
>>> >>> # $seq1 and $seq3 have matching ids >>> my $seq1 = $aln->each_seq_by_id($seq1->display_id); >>> my $seq3 = $aln->each_seq_by_id($seq3->display_id); >>> >>> my $ret = $aln->have_pairwise_overlap($seq1,$seq3); >>> >>> >>> On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields wrote: >>> >>>> May be something for the cook/scrapbook? >>>> >>>> chris >>>> >>>> On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote: >>>> >>>>> Here's one of my favorite tricks for this: XOR mask on gap symbol. >>>>> MAJ >>>>> >>>>> use Bio::SeqIO; >>>>> use Bio::Seq; >>>>> use strict; >>>>> my $seqio = Bio::SeqIO->new( -fh => \*DATA ); >>>>> >>>>> my $acc = $seqio->next_seq->seq ^ '-'; >>>>> while ($_ = $seqio->next_seq ) { >>>>> $acc ^= ($_->seq ^ '-'); >>>>> } >>>>> my $mrg = Bio::Seq->new( -id => 'merged', >>>>> -seq => $acc ^ '-' ); >>>>> 1; >>>>> >>>>> >>>>> __END__ >>>>>> seq2.234 >>>>> QWERTYU------------------- >>>>>> seq2.345 >>>>> ----------ASDFGH---------- >>>>>> seq2.456 >>>>> -------------------ZXCVBNM >>>>> >>>>> ----- Original Message ----- From: "Albert Vilella" >>>> >>>>> To: >>>>> Sent: Friday, January 22, 2010 8:07 AM >>>>> Subject: [Bioperl-l] Merging fragments in a simplealign >>>>> >>>>> >>>>>> Hi, >>>>>> I would like to write a script that merges fragments in a >>>> Bio::SimpleAlign >>>>>> object on the basis of >>>>>> some $seq->display_name rule. >>>>>> I basically want to start with something like this: >>>>>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>>>>> seq2.234 QWERTYU------------------- >>>>>> seq2.345 ----------ASDFGH---------- >>>>>> seq2.456 -------------------ZXCVBNM >>>>>> And end with something like this: >>>>>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>>>>> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM >>>>>> Can people suggest any Bio::SimpleAlign methods that would help here? >>>>>> Cheers, >>>>>> Albert. 
>>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Fri Jan 22 14:33:41 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 22 Jan 2010 14:33:41 -0500 Subject: [Bioperl-l] BioPerl installation failed: please help In-Reply-To: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk> References: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk> Message-ID: <2ABEC492CD49450EAE4BFC7BA763E3DB@NewLife> Hina-- See the protocol at http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation for ActiveState installation. If it doesn't work, please let us know at which step the failure happened. cheers, MAJ ----- Original Message ----- From: "Hina Dalal" To: Sent: Friday, January 22, 2010 12:34 PM Subject: [Bioperl-l] BioPerl installation failed: please help Hi I have installed PERL from Activesate and now trying to install bioperl but can not do it . Neither from PPM (it is showing error "Ppm install failed: 404 not found") nor from CPAN manual installation. It is not allowing me to download nmake, showing that "the version of this file is not compatible with the version of windows you are running. Check your computer system information to see whether you need 32 bit or 64 bit of this program." Please help. 
Regards

Hina

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From maj at fortinbras.us Fri Jan 22 15:13:15 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 15:13:15 -0500
Subject: [Bioperl-l] BioPerl installation failed: please help
In-Reply-To: <20100122200118.053j5kc36skow0wg@www.sms.ed.ac.uk>
References: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk><2ABEC492CD49450EAE4BFC7BA763E3DB@NewLife> <20100122200118.053j5kc36skow0wg@www.sms.ed.ac.uk>
Message-ID: <9E5DE384E2C8416B8373E390ABDB7DFE@NewLife>

Ok Hina, I'm not seeing any issues with the presence or availability of http://bioperl.org/DIST from my machine. Can you access that URL in a browser? If not, the king of the King's Buildings may not be allowing access. Also, can you do the following:

C:> ppm-shell
ppm> repo list

Note the number of the repo that corresponds to bioperl (if any) and do

ppm> repo describe n

where 'n' is that number, and send the output along.

cheers,
MAJ

----- Original Message -----
From: "Hina Dalal"
To: "Mark A. Jensen"
Sent: Friday, January 22, 2010 3:01 PM
Subject: Re: [Bioperl-l] BioPerl installation failed: please help

Hi Mark

warm regards

I was following that protocol, but the problem is that when I tried to do it from PPM and reached the step "install BioPerl", it showed the error "Ppm install failed: 404 not found" at the end. And when I tried it by CPAN / manual installation, I couldn't download nmake; it reports that "the version of this file is not compatible with the version of windows you are running. Check your computer system information to see whether you need 32 bit or 64 bit of this program and then contact the software publisher."

What should I do? Please help.

Regards
Hina

Quoting "Mark A.
Jensen" : > Hina-- See the protocol at > http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation > for ActiveState installation. If it doesn't work, please let us know at > which step the failure happened. > cheers, MAJ > ----- Original Message ----- From: "Hina Dalal" > To: > Sent: Friday, January 22, 2010 12:34 PM > Subject: [Bioperl-l] BioPerl installation failed: please help > > > Hi > > I have installed PERL from Activesate and now trying to install > bioperl but can not do it . Neither from PPM (it is showing error "Ppm > install failed: 404 not found") nor from CPAN manual installation. It > is not allowing me to download nmake, showing that "the version of > this file is not compatible with the version of windows you are > running. Check your computer system information to see whether you > need 32 bit or 64 bit of this program." > > Please help. > > Regards > > Hina > > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From pengyu.ut at gmail.com Sun Jan 24 20:29:59 2010 From: pengyu.ut at gmail.com (Peng Yu) Date: Sun, 24 Jan 2010 19:29:59 -0600 Subject: [Bioperl-l] Transcribe in bioperl Message-ID: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> I found the function 'translate' in bioperl. But I don't find 'transcribe'. Is there such a function? 
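
For reference, the plain-Perl route to a 'transcribe' is a T-to-U substitution guarded by a check on the input alphabet. What follows is only a minimal sketch under stated assumptions: the name transcribe_dna is hypothetical (it is not a BioPerl method), and the accepted symbol set (IUPAC nucleotide codes plus '-' and '.' as gap characters) is an assumption made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT part of BioPerl: return the RNA string for a
# DNA string by replacing T with U (case preserved).
# Assumption: valid input is IUPAC DNA codes plus '-' and '.' gaps.
sub transcribe_dna {
    my ($dna) = @_;

    # Refuse anything that does not look like DNA (e.g. already contains U).
    die "transcribe_dna: input is not a DNA sequence\n"
        unless defined $dna
        && $dna =~ /\A[ACGTRYSWKMBDHVNacgtryswkmbdhvn.\-]*\z/;

    # Copy the string, then transliterate on the copy.
    (my $rna = $dna) =~ tr/Tt/Uu/;
    return $rna;
}

print transcribe_dna('ATGGCTTAA'), "\n";   # prints AUGGCUUAA
```

The guard is what separates this from a bare tr/T/U/: a string that already contains U, or protein letters, is rejected instead of being silently passed through.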
From jason at bioperl.org Sun Jan 24 21:06:48 2010 From: jason at bioperl.org (Jason Stajich) Date: Sun, 24 Jan 2010 18:06:48 -0800 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> Message-ID: What exactly do you want to do? spliced_seq for a feature would be the closest thing... -jason On Jan 24, 2010, at 5:29 PM, Peng Yu wrote: > I found the function 'translate' in bioperl. But I don't find > 'transcribe'. Is there such a function? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ http://twitter.com/hyphaltip From pengyu.ut at gmail.com Sun Jan 24 21:22:12 2010 From: pengyu.ut at gmail.com (Peng Yu) Date: Sun, 24 Jan 2010 20:22:12 -0600 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> Message-ID: <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> To convert from T to U. I could use perl's builtin function. But it is semantically far away from 'transcribe'. If there is a function with name 'transcribe', it will be better. On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich wrote: > What exactly do you want to do? > spliced_seq for a feature would be the closest thing... > > -jason > On Jan 24, 2010, at 5:29 PM, Peng Yu wrote: > >> I found the function 'translate' in bioperl. But I don't find >> 'transcribe'. Is there such a function? 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> http://twitter.com/hyphaltip
>
>

From maj at fortinbras.us Sun Jan 24 21:48:33 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 24 Jan 2010 21:48:33 -0500
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>
Message-ID:

Not a bad idea, a semantics-preserving/checking thing. transcribe() could return an object with alphabet == 'rna' and the T's flipped, or bork if called against an object with alphabet != 'dna'. I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to be stashed), if desired.

----- Original Message -----
From: "Peng Yu"
To: "Jason Stajich"
Cc:
Sent: Sunday, January 24, 2010 9:22 PM
Subject: Re: [Bioperl-l] Transcribe in bioperl

> To convert from T to U. I could use perl's builtin function. But it is
> semantically far away from 'transcribe'. If there is a function with
> name 'transcribe', it will be better.
>
> On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich wrote:
>> What exactly do you want to do?
>> spliced_seq for a feature would be the closest thing...
>>
>> -jason
>> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote:
>>
>>> I found the function 'translate' in bioperl. But I don't find
>>> 'transcribe'. Is there such a function?
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> http://fungalgenomes.org/ >> http://twitter.com/hyphaltip >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Sun Jan 24 23:39:43 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 24 Jan 2010 22:39:43 -0600 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> Message-ID: I think the main reason there hasn't been a transcribe() is that very few users ask for it. Most just use a quick '$seq =~ tr/T/U/', or use spliced_seq() and/or translate() (i.e. they don't care about the intermediate mRNA). I don't have a problem with adding a transcribe method to PrimarySeq, but (and Mark has already picked up on this) it should be constrained to DNA only and return RNA. And there might be a case for adding the analogous reverse_translate(). Also worth adding this to the proper interface class (PrimarySeqI, I think) so all Seq/PrimarySeq will have it (or have to implement their own). chris On Jan 24, 2010, at 8:48 PM, Mark A. Jensen wrote: > Not a bad idea, a semantics-preserving/checking thing. transcribe() could return an object with alphabet == 'rna' > and the T's flipped, or bork if called against an object with alphbet != 'dna'. > I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to be stashed), if desired. > > ----- Original Message ----- From: "Peng Yu" > To: "Jason Stajich" > Cc: > Sent: Sunday, January 24, 2010 9:22 PM > Subject: Re: [Bioperl-l] Transcribe in bioperl > > >> To convert from T to U. 
I could use perl's builtin function. But it is >> semantically far away from 'transcribe'. If there is a function with >> name 'transcribe', it will be better. >> On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich wrote: >>> What exactly do you want to do? >>> spliced_seq for a feature would be the closest thing... >>> >>> -jason >>> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote: >>> >>>> I found the function 'translate' in bioperl. But I don't find >>>> 'transcribe'. Is there such a function? >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> Jason Stajich >>> jason.stajich at gmail.com >>> jason at bioperl.org >>> http://fungalgenomes.org/ >>> http://twitter.com/hyphaltip >>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sun Jan 24 23:43:07 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 24 Jan 2010 22:43:07 -0600 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> Message-ID: <489E0B85-0BC3-45DB-8660-494CF69F35FF@illinois.edu> On Jan 24, 2010, at 10:39 PM, Chris Fields wrote: > ...And there might be a case for adding the analogous reverse_translate(). Bah. Meant reverse_transcribe(). Ah well. 
chris

From dan.kortschak at adelaide.edu.au Mon Jan 25 00:33:28 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Mon, 25 Jan 2010 16:03:28 +1030
Subject: [Bioperl-l] BEDTools module
Message-ID: <1264397608.4898.9.camel@epistle>

Hi All,

A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan and Ira Hall is now available in the bioperl-run subversion repository (bioperl-run/trunk r16754).

Using BEDTools you can, among other things:

 * Intersect two BED files in search of overlapping features.
 * Merge overlapping features.
 * Screen for paired-end (PE) overlaps between PE sequences and existing genomic features.
 * Calculate the depth and breadth of sequence coverage across defined "windows" in a genome.

(see for manuals and downloads).

BEDTools is a suite of 17 command-line executables. The module attempts to provide them and their options comprehensively and can return Bio::SeqIO or Bio::SeqFeature::Collection objects where appropriate (or Bio::Root::IO where specific handling has not been implemented - please give feedback on desired features for this).

cheers
Dan

From cjfields at illinois.edu Mon Jan 25 00:35:06 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 24 Jan 2010 23:35:06 -0600
Subject: [Bioperl-l] Distance between non-overlapping sequences in DNAStatistics
Message-ID: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu>

Just a quick question for those using DNAStatistics. I just fixed a bug in Bio::Align::DNAStatistics that failed with a div by zero error (bug 2901) on this data:

>seq1
GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>seq2
GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>seq3
GGTACCAGCAGGTGGTCCGCCTA------------------------------
>seq4
--------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC

Since seq3 and seq4 don't overlap, the distance can't be calculated. In our case, I replace the score with 'NA' as a placeholder, but I'm worried about downstream app breakage.
Anyone have an objection to using 'NA' here, or know of ways this may lead to problems elsewhere? chris From jason at bioperl.org Mon Jan 25 00:58:03 2010 From: jason at bioperl.org (Jason Stajich) Date: Sun, 24 Jan 2010 21:58:03 -0800 Subject: [Bioperl-l] Distance between non-overlapping sequences in DNAStatistics In-Reply-To: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu> References: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu> Message-ID: It could also return -1 which is used as place holder for NA in other programs that generate distance matrices. -jason On Jan 24, 2010, at 9:35 PM, Chris Fields wrote: > Just a quick question for those using DNAStatistics. I just fixed a > bug in Bio::Align::DNAStatistics that failed with a div by zero > error (bug 2901) on this data: > >> seq1 > GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC >> seq2 > GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC >> seq3 > GGTACCAGCAGGTGGTCCGCCTA------------------------------ >> seq4 > --------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC > > Since seq3 and seq4 don't overlap, the distance can't be > calculated. In our case, I replace the score with 'NA' as a > placeholder, but I'm worried about downstream app breakage. Anyone > have an objection to using 'NA' here, or know of ways this may lead > to problems elsewhere? > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ http://twitter.com/hyphaltip From maj at fortinbras.us Mon Jan 25 08:17:54 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Mon, 25 Jan 2010 08:17:54 -0500 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com><366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> Message-ID: transcribe() and rev_transcribe added to Bio::PrimarySeqI, plus tests in t/Seq.t, @ r16757 MAJ ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: ; "Peng Yu" Sent: Sunday, January 24, 2010 11:39 PM Subject: Re: [Bioperl-l] Transcribe in bioperl >I think the main reason there hasn't been a transcribe() is that very few users >ask for it. Most just use a quick '$seq =~ tr/T/U/', or use spliced_seq() >and/or translate() (i.e. they don't care about the intermediate mRNA). I don't >have a problem with adding a transcribe method to PrimarySeq, but (and Mark has >already picked up on this) it should be constrained to DNA only and return RNA. >And there might be a case for adding the analogous reverse_translate(). > > Also worth adding this to the proper interface class (PrimarySeqI, I think) so > all Seq/PrimarySeq will have it (or have to implement their own). > > chris > > On Jan 24, 2010, at 8:48 PM, Mark A. Jensen wrote: > >> Not a bad idea, a semantics-preserving/checking thing. transcribe() could >> return an object with alphabet == 'rna' >> and the T's flipped, or bork if called against an object with alphbet != >> 'dna'. >> I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to >> be stashed), if desired. >> >> ----- Original Message ----- From: "Peng Yu" >> To: "Jason Stajich" >> Cc: >> Sent: Sunday, January 24, 2010 9:22 PM >> Subject: Re: [Bioperl-l] Transcribe in bioperl >> >> >>> To convert from T to U. I could use perl's builtin function. But it is >>> semantically far away from 'transcribe'. If there is a function with >>> name 'transcribe', it will be better. >>> On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich wrote: >>>> What exactly do you want to do? 
>>>> spliced_seq for a feature would be the closest thing... >>>> >>>> -jason >>>> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote: >>>> >>>>> I found the function 'translate' in bioperl. But I don't find >>>>> 'transcribe'. Is there such a function? >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> -- >>>> Jason Stajich >>>> jason.stajich at gmail.com >>>> jason at bioperl.org >>>> http://fungalgenomes.org/ >>>> http://twitter.com/hyphaltip >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon Jan 25 08:23:12 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 25 Jan 2010 07:23:12 -0600 Subject: [Bioperl-l] BEDTools module In-Reply-To: <1264397608.4898.9.camel@epistle> References: <1264397608.4898.9.camel@epistle> Message-ID: <0F5CE93E-0E6C-4317-806B-A463A9B0917E@illinois.edu> Great work Dan! chris On Jan 24, 2010, at 11:33 PM, Dan Kortschak wrote: > Hi All, > > A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan > and Ira Hall is now available in the bioperl-run subversion repository > (bioperl-run/trunk r16754). > > Using BEDTools you can, among other things: > > * Intersecting two BED files in search of overlapping features. > * Merging overlapping features. > * Screening for paired-end (PE) overlaps between PE sequences and > existing genomic features. 
> * Calculating the depth and breadth of sequence coverage across > defined "windows" in a genome. > > (see for manuals and downloads). > > BEDTools is a suite of 17 commandline executable. The module attempts to > provide and options comprehensively and can return Bio::SeqIO or > Bio::SeqFeature::Collection object where appropriate (or Bio::Root::IO > where specific handling has not been implemented - please give feedback > on desired features for this). > > cheers > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Jan 25 08:27:26 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 25 Jan 2010 07:27:26 -0600 Subject: [Bioperl-l] Distance between non-overlapping sequences in DNAStatistics In-Reply-To: References: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu> Message-ID: That works for me, just want to ensure we're DTRT. I'll change it over. chris On Jan 24, 2010, at 11:58 PM, Jason Stajich wrote: > It could also return -1 which is used as place holder for NA in other programs that generate distance matrices. > -jason > On Jan 24, 2010, at 9:35 PM, Chris Fields wrote: > >> Just a quick question for those using DNAStatistics. I just fixed a bug in Bio::Align::DNAStatistics that failed with a div by zero error (bug 2901) on this data: >> >>> seq1 >> GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC >>> seq2 >> GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC >>> seq3 >> GGTACCAGCAGGTGGTCCGCCTA------------------------------ >>> seq4 >> --------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC >> >> Since seq3 and seq4 don't overlap, the distance can't be calculated. In our case, I replace the score with 'NA' as a placeholder, but I'm worried about downstream app breakage. Anyone have an objection to using 'NA' here, or know of ways this may lead to problems elsewhere? 
>> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > http://twitter.com/hyphaltip > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Mon Jan 25 08:41:38 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 25 Jan 2010 08:41:38 -0500 Subject: [Bioperl-l] BEDTools module In-Reply-To: <1264397608.4898.9.camel@epistle> References: <1264397608.4898.9.camel@epistle> Message-ID: <8D494783F87E4C32BD797008E260C3C2@NewLife> Rock 'n' roll, Dan! ----- Original Message ----- From: "Dan Kortschak" To: Sent: Monday, January 25, 2010 12:33 AM Subject: [Bioperl-l] BEDTools module > Hi All, > > A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan > and Ira Hall is now available in the bioperl-run subversion repository > (bioperl-run/trunk r16754). > > Using BEDTools you can, among other things: > > * Intersecting two BED files in search of overlapping features. > * Merging overlapping features. > * Screening for paired-end (PE) overlaps between PE sequences and > existing genomic features. > * Calculating the depth and breadth of sequence coverage across > defined "windows" in a genome. > > (see for manuals and downloads). > > BEDTools is a suite of 17 commandline executable. The module attempts to > provide and options comprehensively and can return Bio::SeqIO or > Bio::SeqFeature::Collection object where appropriate (or Bio::Root::IO > where specific handling has not been implemented - please give feedback > on desired features for this). 
> > cheers > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rtbio.2009 at gmail.com Mon Jan 25 08:43:19 2010 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Mon, 25 Jan 2010 14:43:19 +0100 Subject: [Bioperl-l] Regarding blast in Bioperl Message-ID: Hello Mark,Chris and all, This is Roopa again. I have a small problem again. I am working on Remote blast. The program works well. But the problem is this. The program accesses the server and gets the output correctly. I am trying to send the result sequences into an array and I found that always the first sequence among the Result sequences is missing. The code is my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]"); - Show quoted text - while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. open(OUTFILE,'>',$debugfile); print OUTFILE $input; close(OUTFILE); my $r = $factory->submit_blast($input); open(OUTFILE,'>',$debugfile); # print OUTFILE $r; close(OUTFILE); print STDERR "waiting...." if($v>0); while ( my @rids = $factory->each_rid ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "while entered"; close(OUTFILE); foreach my $rid ( @rids ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "foreach entered"; close(OUTFILE); my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } open(OUTFILE,'>',$debugfile); # print OUTFILE "if entered"; close(OUTFILE); print STDERR "." 
if ( $v > 0 ); sleep 5; } else { open(OUTFILE,'>',$debugfile); # print OUTFILE "else entered"; close(OUTFILE); my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $result->next_hit(); close(BLASTDEBUGFILE); my $filename = $serverpath."/blastdata_". time()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $organism; close(BLASTDEBUGFILE); # open(OUTFILE,'>',$outfile); # print OUTFILE "Test2 $result->database_name()"; # close(OUTFILE); #$hit = $result->next_hit; #open(new,'>',$debugfile); #print $hit; #close(new); $dummy=0; while ( my $hit = $result->next_hit ) { next unless ( $v >= 0); # open(OUTFILE,'>',$debugfile); # print OUTFILE "$hit in while hits"; # close(OUTFILE); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string $dummy++; open(OUTFILE,'>',$debugfile); # print OUTFILE $dummy; close(OUTFILE); push(@seqs,$dna); } } } } } $warum=@seqs; open(OUTFILE,'>',$debugfile); # print OUTFILE $warum; print OUTFILE @seqs; close(OUTFILE); return(@seqs); } open(OUTFILE, '>',$outfile) || die ; print OUTFILE "\n RNAi Result \n \n

Inputsequence:
"; Here in the above code, I was trying to debug the code and trying to get the count of the array and even the sequence. But when the output data was giving 1 sequence, the count of the array was 0 and when I tried to print the output sequence I could not get any. It was the same when the no of output sequences was 3, I tried to print the sequences but was getting the count of the array as 2 and was printing only two sequences. Please help me in sorting out this problem. Regards, Roopa. From cjfields at illinois.edu Mon Jan 25 09:05:44 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 25 Jan 2010 08:05:44 -0600 Subject: [Bioperl-l] remote blast bioperl In-Reply-To: References: Message-ID: <7E402CC5-9C66-4315-B437-7C4EC2317371@illinois.edu> Roopa, We have received all 4+ of your posts. There is absolutely no need for you to keep repeatedly posting the same thing to the list. Be patient, we'll try to get to you as soon as we can! chris On Jan 25, 2010, at 7:44 AM, Roopa Raghuveer wrote: > Hello all, > > I have a small problem again. I am working on Remote blast. The program works well. But the problem is this. The program accesses the server and gets the output correctly. I am trying to send the result sequences into an array and I found that always the first sequence among the Result sequences is missing. The code is > > my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]"); > - Show quoted text - > > > while (my $input = $str->next_seq()) > { > #Blast a sequence against a database: > #Alternatively, you could pass in a file with many > #sequences rather than loop through sequence one at a time > #Remove the loop starting 'while (my $input = $str->next_seq())' > #and swap the two lines below for an example of that. 
> > open(OUTFILE,'>',$debugfile); > print OUTFILE $input; > close(OUTFILE); > > > my $r = $factory->submit_blast($input); > > open(OUTFILE,'>',$debugfile); > # print OUTFILE $r; > close(OUTFILE); > > > print STDERR "waiting...." if($v>0); > > while ( my @rids = $factory->each_rid ) { > open(OUTFILE,'>',$debugfile); > # print OUTFILE "while entered"; > close(OUTFILE); > foreach my $rid ( @rids ) { > > open(OUTFILE,'>',$debugfile); > # print OUTFILE "foreach entered"; > close(OUTFILE); > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > open(OUTFILE,'>',$debugfile); > # print OUTFILE "if entered"; > close(OUTFILE); > print STDERR "." if ( $v > 0 ); > sleep 5; > } > else { > open(OUTFILE,'>',$debugfile); > # print OUTFILE "else entered"; > close(OUTFILE); > > my $result = $rc->next_result(); > #save the output > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $result->next_hit(); > close(BLASTDEBUGFILE); > > my $filename = $serverpath."/blastdata_". 
> time()."\.out"; > > > # open(DEBUGFILE,'>',$debugfile); > # open(new,'>',$filename); > # @arra=; > # print DEBUGFILE @arra; > # close(DEBUGFILE); > # close(new); > > $factory->save_output($filename); > > # open(BLASTDEBUGFILE,'>',$debugfile); > # print BLASTDEBUGFILE "Hello $rid"; > # close(BLASTDEBUGFILE); > > $factory->remove_rid($rid); > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $organism; > close(BLASTDEBUGFILE); > > # open(OUTFILE,'>',$outfile); > # print OUTFILE "Test2 $result->database_name()"; > # close(OUTFILE); > > #$hit = $result->next_hit; > #open(new,'>',$debugfile); > #print $hit; > #close(new); > $dummy=0; > > while ( my $hit = $result->next_hit ) { > > next unless ( $v >= 0); > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "$hit in while hits"; > # close(OUTFILE); > my $sequ = $gb->get_Seq_by_version($hit->name); > my $dna = $sequ->seq(); # get the sequence as a string > $dummy++; > open(OUTFILE,'>',$debugfile); > # print OUTFILE $dummy; > close(OUTFILE); > push(@seqs,$dna); > } > } > } > } > } > > $warum=@seqs; > open(OUTFILE,'>',$debugfile); > # print OUTFILE $warum; > print OUTFILE @seqs; > > close(OUTFILE); > return(@seqs); > } > > open(OUTFILE, '>',$outfile) || die ; > > print OUTFILE "\n > RNAi Result > \n > \n >

> Inputsequence:
";
>
>
> Here in the above code, I was trying to debug the code and trying to get the count of the array and even the sequence. But when the output data was giving 1 sequence, the count of the array was 0 and when I tried to print the output sequence I could not get any. It was the same when the no of output sequences was 3, I tried to print the sequences but was getting the count of the array as 2 and was printing only two sequences.
>
> Please help me in sorting out this problem.
>
> Regards,
> Roopa.

From jiann-jy at hotmail.com Sun Jan 24 21:03:55 2010
From: jiann-jy at hotmail.com (JY)
Date: Sun, 24 Jan 2010 18:03:55 -0800 (PST)
Subject: [Bioperl-l] how to retrieve accession number by taxon id??
Message-ID: <4cef88b5-fa53-4e63-9167-30075c10a058@k19g2000yqc.googlegroups.com>

I need to retrieve accession numbers and sequences to complete one part of my project, but how do I retrieve accession numbers by taxon id?

From lpaulet at ual.es Mon Jan 25 15:25:55 2010
From: lpaulet at ual.es (Lorenzo Carretero-Paulet)
Date: Mon, 25 Jan 2010 21:25:55 +0100
Subject: [Bioperl-l] HTMLResultWriter
Message-ID: <4B5DFE53.2000201@ual.es>

Hi all,

I'm trying to generate a subroutine that performs a BLAST search and returns the corresponding reports in txt, xml and html format. I'm experiencing problems with the latter, as the program returns the following error message:

"Can't call method "next_result" without a package or object reference at..."
sub blasting {
    my ($query, $E_value) = @_;
    my ($outputfilenameB, $outputfilenameX, $outputfilenameH);
    $outputfilenameB=$query.".BLAST.txt";
    $outputfilenameX=$query.".BLAST.xml";
    $outputfilenameH=$query.".BLAST.html";
    #legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin
    print qx(du -s /tmp);
    my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/;
    my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -m 7 -b 20000 -o $outputfilenameX/;
    my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter();
    my $outhtml = new Bio::SearchIO(-writer => $writerhtml, -file => ">$outputfilenameH");
    while( my $result = _$blast_report_->next_result ) {
        # get a result from Bio::SearchIO parsing or build it up in memory
        $outhtml->write_result($result);
    }
}

Can anyone see where the problem is?
Cheers!
Lorenzo

From lpaulet at ual.es Mon Jan 25 15:31:08 2010
From: lpaulet at ual.es (lpaulet at ual.es)
Date: Mon, 25 Jan 2010 21:31:08 +0100
Subject: [Bioperl-l] HTMLResultWriter
Message-ID: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es>

Hi all,

I'm trying to generate a subroutine that performs a BLAST search and returns the corresponding reports in txt, xml and html format. I'm experiencing problems with the latter, as the program returns the following error message:

"Can't call method "next_result" without a package or object reference at..."
sub blasting { my ($query, $E_value) = @_; my ($outputfilenameB, $outputfilenameX, $outputfilenameH); $outputfilenameB=$query.".BLAST.txt"; $outputfilenameX=$query.".BLAST.xml"; $outputfilenameH=$query.".BLAST.html"; #legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin print qx(du -s /tmp); my $blast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/; my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -m 7 -b 20000 -o $outputfilenameX/; my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter(); my $outhtml = new Bio::SearchIO(-writer => $writerhtml, -file => ">$outputfilenameH"); while( my $result = $blast_report->next_result ) { # get a result from Bio::SearchIO parsing or build it up in memory $outhtml->write_result($result); } } Can anyone see where the problem is? Cheers! Lorenzo From dan.kortschak at adelaide.edu.au Mon Jan 25 16:00:37 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Tue, 26 Jan 2010 07:30:37 +1030 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: References: Message-ID: <1264453237.4552.3.camel@epistle> A reverse_translate to IUPAC degenerate codes is not a bad idea, particularly for PCR primer design. Dan On Mon, 2010-01-25 at 09:05 -0500, bioperl-l-request at lists.open-bio.org wrote: > On Jan 24, 2010, at 10:39 PM, Chris Fields wrote: > > > ...And there might be a case for adding the analogous > reverse_translate(). > > Bah. Meant reverse_transcribe(). Ah well. > > chris From maj at fortinbras.us Mon Jan 25 16:07:49 2010 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Mon, 25 Jan 2010 16:07:49 -0500
Subject: [Bioperl-l] HTMLResultWriter
In-Reply-To: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es>
References: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es>
Message-ID:

Lorenzo--
your $blast_report is set to be (some of) the text returned by a system call of a blast program; this isn't going to be an object of any kind, and so no methods can be called on it (as at "$blast_report->next_result"). You need to parse the text generated by the blast call using Bio::SearchIO to get a Bio::Search::Result::BlastResult object. You could do

    @blast_lines = qx/ ...your blast call... /;
    open my $bf, ">my.blast";
    print $bf @blast_lines;   # no comma after the filehandle
    close $bf;
    $blast_result = Bio::SearchIO->new(-file=>'my.blast', -format => 'blast');

and carry on from there. But why not look at Bio::Tools::Run::StandAloneBlast or Bio::Tools::Run::StandAloneBlastPlus to run your blasts within perl? These wrap the blast programs and deliver BioPerl objects, rather than plain text output.
cheers MAJ

----- Original Message -----
From:
To:
Sent: Monday, January 25, 2010 3:31 PM
Subject: [Bioperl-l] HTMLResultWriter

Hi all,

I'm trying to generate a subroutine that performs a BLAST search and returns the corresponding reports in txt, xml and html format. I'm experiencing problems with the latter, as the program returns the following error message:

"Can't call method "next_result" without a package or object reference at..."
sub blasting { my ($query, $E_value) = @_; my ($outputfilenameB, $outputfilenameX, $outputfilenameH); $outputfilenameB=$query.".BLAST.txt"; $outputfilenameX=$query.".BLAST.xml"; $outputfilenameH=$query.".BLAST.html"; #legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin print qx(du -s /tmp); my $blast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/; my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -m 7 -b 20000 -o $outputfilenameX/; my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter(); my $outhtml = new Bio::SearchIO(-writer => $writerhtml, -file => ">$outputfilenameH"); while( my $result = $blast_report->next_result ) { # get a result from Bio::SearchIO parsing or build it up in memory $outhtml->write_result($result); } } Can anyone see where the problem is? Cheers! Lorenzo _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Mon Jan 25 16:09:24 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 25 Jan 2010 22:09:24 +0100 Subject: [Bioperl-l] HTMLResultWriter In-Reply-To: <4B5DFE53.2000201@ual.es> References: <4B5DFE53.2000201@ual.es> Message-ID: > my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/; > while( my $result = _$blast_report_->next_result ) { # get a result from Bio::SearchIO parsing or build it up in memory _$blast_report_ is not a valid variable name, as far as I know. Plus there's a space between report and the final '_' in the first of the above two lines. Does this code compile? 
Dave From Russell.Smithies at agresearch.co.nz Mon Jan 25 16:14:15 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 26 Jan 2010 10:14:15 +1300 Subject: [Bioperl-l] Regarding blast in Bioperl In-Reply-To: References: Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz> That's a fair mix of incomplete code you've supplied!! Did you read the documentation for RemoteBlast? The example there will do 99% of what you want. http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm I'm not entirely sure what you're trying to do (as you've left out a bit of your code) but I assume you're trying to retrieve and print the sequence for each hit. Here's something that works, not sure exactly what/why you want to print but it should get you a bit further. --Russell ================================ #!perl -w use Bio::Tools::Run::RemoteBlast; use Bio::DB::GenBank; use CGI ':standard'; use strict; my $q = new CGI; my @params = ( -prog => 'blastn', -data => 'nr', -expect => '1e-30', -entrez_query => 'Homo sapiens [ORGN]', -readmethod => 'SearchIO' ); my $gb = Bio::DB::GenBank->new; my $factory = Bio::Tools::Run::RemoteBlast->new(@params); #$v is just to turn on and off the messages my $v = 1; my $str = Bio::SeqIO->new( -file => 'test.faa', -format => "fasta" ); while ( my $input = $str->next_seq() ) { my $r = $factory->submit_blast($input); print STDERR "waiting..." if ( $v > 0 ); while ( my @rids = $factory->each_rid ) { foreach my $rid (@rids) { my @seqs = (); my $rc = $factory->retrieve_blast($rid); if ( !ref($rc) ) { if ( $rc < 0 ) { $factory->remove_rid($rid); } print STDERR "." if ( $v > 0 ); sleep 5; } else { my $result = $rc->next_result(); #save the blast output my $filename = $result->query_accession . 
'.out'; $factory->save_output($filename); $factory->remove_rid($rid); print "\nQuery Name: ", $result->query_name(), "\n"; while ( my $hit = $result->next_hit ) { # store the hit sequences push @seqs, $gb->get_Seq_by_version( $hit->name ); next unless ( $v > 0 ); print "\thit name is ", $hit->name, "\n"; while ( my $hsp = $hit->next_hsp ) { print "\t\tscore is ", $hsp->score, "\n"; } } ## print the seqs you've retrieved?? open( OUTFILE, '>', $result->query_accession . '.htm' ); print OUTFILE $q->start_html('RNAi Result'), $q->h1('RNAi Result'), $q->h2('Input'), $q->pre( toString($input) ), $q->h2('Output'); foreach (@seqs) { #there's probably a better way of printing the seq print OUTFILE $q->pre( toString($_) ); } print OUTFILE $q->end_html; close OUTFILE; } } } } sub toString { my $s = shift; return '>' . $s->display_id . " " . $s->desc . "\n" . $s->seq; } ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
=======================================================================

From biopython at maubp.freeserve.co.uk Mon Jan 25 16:24:33 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 25 Jan 2010 21:24:33 +0000
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <1264453237.4552.3.camel@epistle>
References: <1264453237.4552.3.camel@epistle>
Message-ID: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>

On Mon, Jan 25, 2010 at 9:00 PM, Dan Kortschak wrote:
> A reverse_translate to IUPAC degenerate codes is not a bad idea,
> particularly for PCR primer design.

I would say it could be a bad idea. For any protein string there are multiple possible back translations, and this cannot be captured fully as a nucleotide string even using the IUPAC ambiguity chars.

We debated this back and forth for Biopython, and decided to leave it out. It wasn't possible for a simple back translate to a simple string to handle the use cases we considered, and other options like returning a regular expression covering all possible back translations were too complex (for a core sequence method/function).

Peter

From jason at bioperl.org Mon Jan 25 16:26:55 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 25 Jan 2010 13:26:55 -0800
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
References: <1264453237.4552.3.camel@epistle> <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
Message-ID: <98995830-DC7F-4404-A216-874EF5799DB6@bioperl.org>

It was already implemented several years ago -- reverse_translate

Bio::Tools::CodonTable -> revtranslate

    my $myCodonTable = Bio::Tools::CodonTable->new();
    my $seqobj = Bio::PrimarySeq->new(-seq => 'FHGERHEL');
    my $iupac_str = $myCodonTable->reverse_translate_all($seqobj);

Chris had meant to say reverse_transcribe of RNA -> DNA FWIW.
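
A minimal sketch of the point Peter raises above, in plain Perl rather than BioPerl, so it says nothing about how Bio::Tools::CodonTable does it internally. The helper names (collapse_codons, expand_codon) are made up for illustration; the only data is the six leucine codons of the standard genetic code. Collapsing them position by position into IUPAC codes gives a degenerate codon that also matches two codons that do not encode leucine:

```perl
use strict;
use warnings;

# The six standard-code codons for leucine (hand-written for this sketch).
my @leu_codons = qw(TTA TTG CTT CTC CTA CTG);

# IUPAC nucleotide ambiguity codes, keyed by the sorted set of bases.
my %iupac = (
    A   => 'A', C   => 'C', G   => 'G', T   => 'T',
    AC  => 'M', AG  => 'R', AT  => 'W',
    CG  => 'S', CT  => 'Y', GT  => 'K',
    ACG => 'V', ACT => 'H', AGT => 'D', CGT => 'B',
    ACGT => 'N',
);
my %expand = reverse %iupac;    # ambiguity code -> set of bases

# Hypothetical helper: collapse codons into one degenerate codon,
# taking the union of bases seen at each of the three positions.
sub collapse_codons {
    my @codons = @_;
    my $out = '';
    for my $pos (0 .. 2) {
        my %seen = map { (substr($_, $pos, 1) => 1) } @codons;
        $out .= $iupac{ join '', sort keys %seen };
    }
    return $out;
}

# Hypothetical helper: list every concrete codon a degenerate codon matches.
sub expand_codon {
    my ($degenerate) = @_;
    my @results = ('');
    for my $code (split //, $degenerate) {
        my @bases = split //, $expand{$code};
        my @next;
        for my $prefix (@results) {
            push @next, map { $prefix . $_ } @bases;
        }
        @results = @next;
    }
    return sort @results;
}

my $degenerate = collapse_codons(@leu_codons);
my %is_leu     = map { ($_ => 1) } @leu_codons;
my @extra      = grep { !$is_leu{$_} } expand_codon($degenerate);

print "Degenerate codon: $degenerate\n";                 # YTN
print "Matches that are not leucine codons: @extra\n";   # TTC TTT
```

Whether the two extra matches (both phenylalanine codons) matter depends on the use: for degenerate primer design the over-generation may be acceptable, while for an exact description of all coding sequences it is not.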
-jason On Jan 25, 2010, at 1:24 PM, Peter wrote: > On Mon, Jan 25, 2010 at 9:00 PM, Dan Kortschak > wrote: >> A reverse_translate to IUPAC degenerate codes is not a bad idea, >> particularly for PCR primer design. > > I would say it could be a bad idea. For any protein string there are > multiple possible back translations, and this cannot be captured > fully as a nucleotide string even using the IUPAC ambiguity chars. > > We debated this back and forth for Biopython, and decided to leave it > out. It wasn't possible for a simple back translate to a simple > string to > handle the use cases we considered, and other options like returning > a regular expression covering all possible back translations were too > complex (for a core sequence method/function). > > Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ http://twitter.com/hyphaltip From maj at fortinbras.us Mon Jan 25 16:19:24 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 25 Jan 2010 16:19:24 -0500 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: <1264453237.4552.3.camel@epistle> References: <1264453237.4552.3.camel@epistle> Message-ID: <72B106F0D5FF4F1E858CC9BD1EF33142@NewLife> I think we have that functionality in Bio::Tools::SeqPattern, courtesy of Bruno V--- ----- Original Message ----- From: "Dan Kortschak" To: Sent: Monday, January 25, 2010 4:00 PM Subject: Re: [Bioperl-l] Transcribe in bioperl >A reverse_translate to IUPAC degenerate codes is not a bad idea, > particularly for PCR primer design. > > Dan > > On Mon, 2010-01-25 at 09:05 -0500, bioperl-l-request at lists.open-bio.org > wrote: >> On Jan 24, 2010, at 10:39 PM, Chris Fields wrote: >> >> > ...And there might be a case for adding the analogous >> reverse_translate(). >> >> Bah. Meant reverse_transcribe(). Ah well. 
>>
>> chris
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From dan.kortschak at adelaide.edu.au Mon Jan 25 16:38:44 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 26 Jan 2010 08:08:44 +1030
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
References: <1264453237.4552.3.camel@epistle> <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
Message-ID: <1264455524.4552.23.camel@epistle>

Good to see that these ideas have been considered. I'd be interested to see this discussion, or at least the point dealing with the problems that might arise.

I'm at a loss as to how ambiguity codes can't completely describe all possible coding sequences for any given codon table (via Bio::Tools::CodonTable - in fact this already has the revtranslate that could be fitted into a Bio::PrimarySeq method - to answer Mark and Jason's comments, I think that /if/ a reverse_translate method exists, it makes logical sense to have it tied to a sequence object, calling the B:T:CT method on the seq object itself rather than only in Bio::Tools, 2¢).

Pete, can you provide an example of the problems?

thanks
Dan

On Mon, 2010-01-25 at 21:24 +0000, Peter wrote:
> I would say it could be a bad idea. For any protein string there are
> multiple possible back translations, and this cannot be captured
> fully as a nucleotide string even using the IUPAC ambiguity chars.

From lpaulet at ual.es Mon Jan 25 16:53:07 2010
From: lpaulet at ual.es (lpaulet at ual.es)
Date: Mon, 25 Jan 2010 22:53:07 +0100
Subject: [Bioperl-l] HTMLResultWriter
In-Reply-To:
References: <4B5DFE53.2000201@ual.es>
Message-ID: <20100125225307.2zl2cn2hkcsgccso@webmail.ual.es>

Thanks Dave and Mark.
Quoting Dave Messina : >> my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e >> $E_value -b 20000 -o $outputfilenameB/; > >> while( my $result = _$blast_report_->next_result ) { # get a result >> from Bio::SearchIO parsing or build it up in memory > > > _$blast_report_ is not a valid variable name, as far as I know. Plus > there's a space between report and the final '_' in the first of > the above two lines. > > Does this code compile? > > Dave > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From rtbio.2009 at gmail.com Mon Jan 25 17:35:32 2010 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Mon, 25 Jan 2010 23:35:32 +0100 Subject: [Bioperl-l] Regarding blast in Bioperl In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz> Message-ID: Hello Russell, Thank you very much for your reply. My problem is that Remote blast is getting well executed with my code and I am getting the .out file with sequences producing significant alignments. But, when I am trying to retrieve the sequences into an array @seqs, I am able to retrieve all the sequences except for the first hit. If the number of hits that I get in the .out file to be 3, I am able to retrieve only 2 hits i.e., I am able to get only 2 sequences. If there is only one significant hit for my sequence, then the name and description of the sequence appears in the .out file, but I am unable to get it into the array,the array count shows 0 and there would not be any sequence in the array. I hope that you have got me now. 
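
One thing that may explain this symptom (a guess from reading the script, not a confirmed diagnosis): the script calls $result->next_hit() once while writing the blast debug file, before the while loop that fills @seqs. next_hit is an iterator, so each call moves on to the next hit, and that debug call would use up the first one. The sketch below reproduces the pattern in plain Perl; make_hit_iterator and the hit names are made-up stand-ins for a real BLAST result object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a BLAST result: returns one hit name per call,
# then undef once the list is exhausted.
sub make_hit_iterator {
    my @hits  = @_;
    my $index = 0;
    return sub { return $index < @hits ? $hits[$index++] : undef; };
}

# Collect every hit the iterator still has left.
sub collect_hits {
    my ($next_hit) = @_;
    my @collected;
    while ( defined( my $hit = $next_hit->() ) ) {
        push @collected, $hit;
    }
    return @collected;
}

# Case 1: fetch one hit for a debug print, then loop. The first hit is gone.
my $peeked_iter = make_hit_iterator(qw(hit_1 hit_2 hit_3));
my $debug_hit   = $peeked_iter->();             # consumes hit_1
my @after_peek  = collect_hits($peeked_iter);

# Case 2: loop straight away; any debug printing would go inside the loop.
my @all_hits = collect_hits( make_hit_iterator(qw(hit_1 hit_2 hit_3)) );

print "after peeking: @after_peek\n";    # hit_2 hit_3
print "without peek:  @all_hits\n";      # hit_1 hit_2 hit_3
```

This matches the counts described above: three hits give two sequences, and a single hit gives none. If this is the cause, dropping the debug call or printing $hit->name inside the while loop should restore the first hit.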
Here comes my code, use Bio::SearchIO; use Bio::Search::Result::BlastResult; use Bio::Perl; use Bio::Tools::Run::RemoteBlast; use Bio::Seq; use Bio::SeqIO; use Bio::DB::GenBank; $serverpath = "/srv/www/htdocs/rain/RNAi"; $serverurl = "http://141.84.66.66/rain/RNAi"; $outfile = $serverpath."/rnairesult_".time().".html"; $nuc = $serverpath."/nuc".time().".txt"; $debugfile = $serverpath."/debug_".time().".txt"; $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; my $outstring =""; &parse_form; print "Content-type: text/html\n\n"; print "\n"; print "RNAi Result"; print " \n"; print "\n"; print "\n"; print " Your results will appear here
"; print " Please be patient, runtime can be up to 5 minutes
"; print " This page will automatically reload in 30 seconds."; print "\n"; print "\n"; defined(my $pid = fork) or die "Can't fork: $!"; exit if $pid; open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; open(OUTFILE, '>',$outfile); print OUTFILE "\n RNAi Result \n \n \n Your results will appear here
Please be patient, runtime can be up to 5 minutes
This page will automatically reload in 30 seconds
\n \n"; close(OUTFILE); @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); $in{'Inputseq'} =~ s/>.*$//m; $in{'Inputseq'} =~ s/[^TAGC]//gim; $in{'Inputseq'} =~ tr/actg/ACTG/; @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, $in{'Threshold'}); sub blastcode { $inpu1= $_[0]; $organ= $_[1]; open(NUC,'>',$nuc); print NUC $inpu1,"\n"; close(NUC); my $prog = 'blastn'; my $db = 'refseq_rna'; my $e_val= '1e-10'; my $organism= $organ; $gb = new Bio::DB::GenBank; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO', '-Organism' => $organism ); # open(OUTFILE,'>',$debugfile); # print OUTFILE @params; # close(OUTFILE); my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => "$organ\[ORGN]"); #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); #change a paramter #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]'; #change a paramter # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; my $v = 1; #$v is just to turn on and off the messages my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]"); while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. open(OUTFILE,'>',$debugfile); print OUTFILE $input; close(OUTFILE); my $r = $factory->submit_blast($input); open(OUTFILE,'>',$debugfile); # print OUTFILE $r; close(OUTFILE); print STDERR "waiting...." if($v>0); while ( my @rids = $factory->each_rid ) { foreach my $rid ( @rids ) { my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } print STDERR "." 
if ( $v > 0 ); sleep 5; } else { my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $result->next_hit(); close(BLASTDEBUGFILE); my $filename = $serverpath."/blastdata_".time()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); while ( my $hit = $result->next_hit ) { next unless ( $v >= 0); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string $dummy++; open(OUTFILE,'>',$debugfile); open(OUTFILE,'>',$debugfile); # print OUTFILE $dna; close(OUTFILE); push(@seqs,$dna); } } } } } $warum=scalar(@seqs); open(OUTFILE,'>',$debugfile); print OUTFILE $warum; # print OUTFILE @seqs; close(OUTFILE); return(@seqs); } open(OUTFILE, '>',$outfile) || die ; print OUTFILE "\n RNAi Result \n \n

Inputsequence:
"; for ($i=0; $i\n"; } } print OUTFILE "

"; $z=@compseqs; for($k=0;$k<$z;$k++) { print OUTFILE "

Compare Sequence:
"; for ($i=0; $i\n"; } } print OUTFILE "

"; } print OUTFILE "

Window:
$in{'Windowsize'}

Threshold:
$in{'Threshold'}

"; my $j=0; for ($i=0; $i{similar}<=$in{'Threshold'}){ $j=$in{'Windowsize'}; } $height=$out[$i]->{similar}*5; } if ($j>0) { print OUTFILE ""; $outstring .= "".substr ($in{'Inputseq'}, $i, 1).""; $j--; } else { print OUTFILE ""; $outstring .= "".substr ($in{'Inputseq'}, $i, 1).""; } if ( ($i+1)%10==0){ $outstring .= " "; } if ( ($i+1)%60==0){ $outstring .= "
\n"; } if ( ($i+1)%800==0){ print OUTFILE "

\n"; } } print OUTFILE "

$outstring"; #foreach (@out) { #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} matchs

"; #if ($_->{similar}<=$in{'Threshold'}){ # } #} print OUTFILE "\n\n"; close OUTFILE; #nameprint(); sub parse_form { local ($buffer, @pairs, $pair, $name, $value); # Read in text $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/; if ($ENV{'REQUEST_METHOD'} eq "POST") { read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); } else { $buffer = $ENV{'QUERY_STRING'}; } @pairs = split(/&/, $buffer); foreach $pair (@pairs) { ($name, $value) = split(/=/, $pair); $value =~ tr/+/ /; $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; $in{$name} = $value; } } Regards, Roopa. On Mon, Jan 25, 2010 at 10:14 PM, Smithies, Russell < Russell.Smithies at agresearch.co.nz> wrote: > That's a fair mix of incomplete code you've supplied!! > Did you read the documentation for RemoteBlast? The example there will do > 99% of what you want. > http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm > > I'm not entirely sure what you're trying to do (as you've left out a bit of > your code) but I assume you're trying to retrieve and print the sequence for > each hit. > > Here's something that works, not sure exactly what/why you want to print > but it should get you a bit further. > > --Russell > > > ================================ > #!perl -w > > use Bio::Tools::Run::RemoteBlast; > use Bio::DB::GenBank; > > use CGI ':standard'; > > use strict; > > my $q = new CGI; > > my @params = ( > -prog => 'blastn', > -data => 'nr', > -expect => '1e-30', > -entrez_query => 'Homo sapiens [ORGN]', > -readmethod => 'SearchIO' > ); > > my $gb = Bio::DB::GenBank->new; > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > #$v is just to turn on and off the messages > my $v = 1; > > my $str = Bio::SeqIO->new( -file => 'test.faa', -format => "fasta" ); > > while ( my $input = $str->next_seq() ) { > > my $r = $factory->submit_blast($input); > > print STDERR "waiting..." 
if ( $v > 0 ); > while ( my @rids = $factory->each_rid ) { > foreach my $rid (@rids) { > my @seqs = (); > my $rc = $factory->retrieve_blast($rid); > if ( !ref($rc) ) { > if ( $rc < 0 ) { > $factory->remove_rid($rid); > } > print STDERR "." if ( $v > 0 ); > sleep 5; > } > else { > my $result = $rc->next_result(); > > #save the blast output > my $filename = $result->query_accession . '.out'; > $factory->save_output($filename); > $factory->remove_rid($rid); > print "\nQuery Name: ", $result->query_name(), "\n"; > while ( my $hit = $result->next_hit ) { > > # store the hit sequences > push @seqs, $gb->get_Seq_by_version( $hit->name ); > > next unless ( $v > 0 ); > print "\thit name is ", $hit->name, "\n"; > while ( my $hsp = $hit->next_hsp ) { > print "\t\tscore is ", $hsp->score, "\n"; > } > } > > ## print the seqs you've retrieved?? > open( OUTFILE, '>', $result->query_accession . '.htm' ); > print OUTFILE $q->start_html('RNAi Result'), > $q->h1('RNAi Result'), > $q->h2('Input'), > $q->pre( toString($input) ), > $q->h2('Output'); > > foreach (@seqs) { > > #there's probably a better way of printing the seq > print OUTFILE $q->pre( toString($_) ); > } > print OUTFILE $q->end_html; > close OUTFILE; > } > } > } > } > > sub toString { > my $s = shift; > return '>' . $s->display_id . " " . $s->desc . "\n" . $s->seq; > } > > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> ======================================================================= > From ajmackey at gmail.com Tue Jan 26 08:24:43 2010 From: ajmackey at gmail.com (Aaron Mackey) Date: Tue, 26 Jan 2010 08:24:43 -0500 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: <1264455524.4552.23.camel@epistle> References: <1264453237.4552.3.camel@epistle> <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com> <1264455524.4552.23.camel@epistle> Message-ID: <24c96eca1001260524s3d46e850hfdcc461e22210972@mail.gmail.com> There's also Bio::Tools::IUPAC; given a sequence with IUPAC ambiguity codes, it provides a SeqIO stream that enumerates all the possible unambiguous realizations. Not the right solution for every situation, but quite useful when you need it. -Aaron On Mon, Jan 25, 2010 at 4:38 PM, Dan Kortschak < dan.kortschak at adelaide.edu.au> wrote: > Good to see that these ideas have been considered. > > I'd be interested to see this discussion, or at least the point dealing > with the problems that might arise. I'm at a loss as to how ambiguity > codes can't completely describe all possible coding sequences for any > given codon table (via Bio::Tools::CodonTable - in fact this already has > the revtranslate that could be fitted into a Bio::PrimarySeq method - to > answer Mark and Jason's comments, I think that /if/ a reverse_translate > method exists, it makes logical sense to have it tied to a sequence > object, calling the B:T:CT method on the seq object itself rather than > only in Bio::Tools, 2c). Pete, can you provide an example of the > problems? > > thanks > Dan > > On Mon, 2010-01-25 at 21:24 +0000, Peter wrote: > > I would say it could be a bad idea. For any protein string there are > > multiple possible back translations, and this cannot be captured > > fully as a nucleotide string even using the IUPAC ambiguity chars.
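A small sketch may help with the question of why ambiguity codes cannot fully capture a back translation. It is plain Perl and deliberately does not call Bio::Tools::CodonTable or Bio::Tools::IUPAC; the standard genetic code is hard-coded, and the helper names (consensus_codon, expand_codon, back_translation_report) are made up for this illustration. For each amino acid it builds the tightest single IUPAC codon that covers every real codon, expands that codon again, and lists the expansions that encode something else.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# IUPAC nucleotide codes and the (alphabetically sorted) bases they stand for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT', M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);
# Reverse lookup: sorted set of bases -> IUPAC code.
my %CODE_FOR = map { $IUPAC{$_} => $_ } keys %IUPAC;

# Standard genetic code, filled in from the usual TCAG-ordered string.
my @BASES = qw(T C A G);
my $AAS   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %CODON2AA;
my $i = 0;
for my $b1 (@BASES) {
    for my $b2 (@BASES) {
        for my $b3 (@BASES) {
            $CODON2AA{"$b1$b2$b3"} = substr($AAS, $i++, 1);
        }
    }
}

# Tightest single IUPAC codon that matches every codon in the list.
sub consensus_codon {
    my @codons    = @_;
    my $consensus = '';
    for my $pos (0 .. 2) {
        my %seen = map { substr($_, $pos, 1) => 1 } @codons;
        $consensus .= $CODE_FOR{ join '', sort keys %seen };
    }
    return $consensus;
}

# All unambiguous codons matched by an IUPAC codon.
sub expand_codon {
    my ($ambiguous) = @_;
    my @expanded = ('');
    for my $code (split //, $ambiguous) {
        my @bases = split //, $IUPAC{$code};
        @expanded = map { my $prefix = $_; map { $prefix . $_ } @bases } @expanded;
    }
    return @expanded;
}

# For one amino acid: consensus codon, number of true codons, and the
# codons the consensus matches that encode something else.
sub back_translation_report {
    my ($aa) = @_;
    my @codons    = sort grep { $CODON2AA{$_} eq $aa } keys %CODON2AA;
    my $consensus = consensus_codon(@codons);
    my @wrong     = sort grep { $CODON2AA{$_} ne $aa } expand_codon($consensus);
    return ($consensus, scalar(@codons), \@wrong);
}

for my $aa (qw(S L R)) {
    my ($consensus, $n, $wrong) = back_translation_report($aa);
    printf "%s: %d codons, consensus %s, %d mismatching expansions: %s\n",
        $aa, $n, $consensus, scalar(@$wrong),
        join(' ', map { $_ . '=' . $CODON2AA{$_} } @$wrong);
}
```

For serine (TCN plus AGY) the consensus is WSN, which expands to 16 codons of which only 6 encode serine; leucine (YTN) and arginine (MGN) each pick up two foreign codons. So an ambiguity-coded back translation over-generates for these residues, and enumerating it yields sequences that do not translate back to the original protein.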
> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From nml5566 at gmail.com Tue Jan 26 16:10:54 2010 From: nml5566 at gmail.com (Nathan Liles) Date: Tue, 26 Jan 2010 15:10:54 -0600 Subject: [Bioperl-l] SVN access Message-ID: <4B5F5A5E.2070406@gmail.com> Does anyone know who I need to talk to for getting developer access for the Bioperl SVN? I want to submit a patch to the genbank2gff3 converter. Thanks, Nathan From Russell.Smithies at agresearch.co.nz Tue Jan 26 20:40:40 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 27 Jan 2010 14:40:40 +1300 Subject: [Bioperl-l] how to retrieve organism name from accession number? In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> Grrrrrr, I hate eutils!!!! 
------------- EXCEPTION: Bio::Root::Exception ------------- MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused) STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 STACK: get_desc.pl:32 ----------------------------------------------------------- Nice error message though :-) --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Smithies, Russell > Sent: Monday, 11 January 2010 10:05 a.m. > To: 'Chris Fields' > Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org' > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > number? > > I've started to go off eUtils recently (not BioPerl's fault) as I've often > been finding that with large queries, chunks of the resulting data is > missing. > For example, before Xmas I was creating species-specific databases by > using eUtils to get a list of GI numbers back for a taxid, then retrieving > the fasta sequences in chunks of 500. > Very regularly, in the middle of the fasta there would be a message about > resource unavailable eg. > >test_sequence_1 > TACGATCATCGCTResource UnavailableTACGACTCTGCT > >test_sequence_2 > TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT > > Often this wasn't detected until formatdb complained about invalid > characters. > Inquiries to NCBI as to why this was happening and what to do about it > returned stupid answers ("do each sequence manually thru the web > interface", or "use eUtils"). > As we have a nice fast network connection, I now prefer to download very > large gzip files (i.e. 
all of refseq) and extract what I need. > > I can't help but think that NCBI could solve a lot of problems if they > gzipped the output from eUtils queries - it's something I've requested > regularly for the last 5 years or so!! > > --Russell > > > > -----Original Message----- > > From: Chris Fields [mailto:cjfields at illinois.edu] > > Sent: Monday, 11 January 2010 9:50 a.m. > > To: Smithies, Russell > > Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org' > > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > > number? > > > > One could also use Bio::DB::Taxonomy, which indexes the same files or > > (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the > > details). > > > > chris > > > > On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote: > > > > > An alternate non-BioPerly way (that may be faster given NCBI's > flakiness > > lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip > > files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash > and > > do lookups. > > > In that same dir, taxdump.tar.gz contains a file called names.dmp > which > > lists taxids and descriptions (and synonyms) > > > > > > If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I > > could do this: > > > > > > my $taxid = $gi_taxid_nucl{$accession}; > > > my $org_name = $names{$taxid}; > > > > > > --Russell > > > > > > > > >> -----Original Message----- > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > > >> Sent: Saturday, 26 December 2009 4:52 p.m. > > >> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org > > >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession > > >> number? 
> > >> > > >> Bhakti, > > >> The following example (using EUtilities) may serve your purpose: > > >> > > >> use Bio::DB::EUtilities; > > >> > > >> my (%taxa, @taxa); > > >> my (%names, %idmap); > > >> > > >> # these are protein ids; nuc ids will work by changing -dbfrom => > > >> 'nucleotide', > > >> # (probably) > > >> > > >> my @ids = qw(1621261 89318838 68536103 20807972 730439); > > >> > > >> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink', > > >> -db => 'taxonomy', > > >> -dbfrom => 'protein', > > >> -correspondence => 1, > > >> -id => \@ids); > > >> > > >> # iterate through the LinkSet objects > > >> while (my $ds = $factory->next_LinkSet) { > > >> $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0] > > >> } > > >> > > >> @taxa = @taxa{@ids}; > > >> > > >> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', > > >> -db => 'taxonomy', > > >> -id => \@taxa ); > > >> > > >> while (local $_ = $factory->next_DocSum) { > > >> $names{($_->get_contents_by_name('TaxId'))[0]} = > > >> ($_->get_contents_by_name('ScientificName'))[0]; > > >> } > > >> > > >> foreach (@ids) { > > >> $idmap{$_} = $names{$taxa{$_}}; > > >> } > > >> > > >> # %idmap is > > >> # 1621261 => 'Mycobacterium tuberculosis H37Rv' > > >> # 20807972 => 'Thermoanaerobacter tengcongensis MB4' > > >> # 68536103 => 'Corynebacterium jeikeium K411' > > >> # 730439 => 'Bacillus caldolyticus' > > >> # 89318838 => undef (this record has been removed from the db) > > >> > > >> 1; > > >> > > >> You probably will need to break up your 30000 into chunks > > >> (say, 1000-3000 each), and do the above on each chunk with a > > >> > > >> sleep 3; > > >> > > >> or so separating the queries. > > >> MAJ > > >> ----- Original Message ----- > > >> From: "Bhakti Dwivedi" > > >> To: > > >> Sent: Friday, December 25, 2009 9:46 PM > > >> Subject: [Bioperl-l] how to retrieve organism name from accession > > number? 
> > >> > > >> > > >>> Hi, > > >>> > > >>> Does anyone know how to retrieve the "Source" or the "Species name" > > >> given > > >>> the accession number using Bioperl. I have these 30,000 accession > > >> numbers > > >>> for which I need to get the source organisms. Any kind of help will > > be > > >>> appreciated. > > >>> > > >>> Thanks > > >>> > > >>> BD > > >>> _______________________________________________ > > >>> Bioperl-l mailing list > > >>> Bioperl-l at lists.open-bio.org > > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >>> > > >>> > > >> > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > ======================================================================= > > > Attention: The information contained in this message and/or > attachments > > > from AgResearch Limited is intended only for the persons or entities > > > to which it is addressed and may contain confidential and/or > privileged > > > material. Any review, retransmission, dissemination or other use of, > or > > > taking of any action in reliance upon, this information by persons or > > > entities other than the intended recipients is prohibited by > AgResearch > > > Limited. If you have received this message in error, please notify the > > > sender immediately. 
> > > > ======================================================================= > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Jan 26 20:46:26 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 26 Jan 2010 19:46:26 -0600 Subject: [Bioperl-l] how to retrieve organism name from accession number? In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> Message-ID: <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu> It's unfortunate but I have heard this problem popping up quite a bit more frequently lately. Not to push too many buttons but NCBI isn't very forthcoming with help these days; they have become quite insular. Not sure if they're short-staffed due to budget or if there are other issues. chris On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote: > Grrrrrr, I hate eutils!!!! 
> > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused) > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 > STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 > STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 > STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 > STACK: get_desc.pl:32 > ----------------------------------------------------------- > > > Nice error message though :-) > > > --Russell From Russell.Smithies at agresearch.co.nz Tue Jan 26 20:59:15 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 27 Jan 2010 14:59:15 +1300 Subject: [Bioperl-l] how to retrieve organism name from accession number? 
In-Reply-To: <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz> I've had a wide selection of errors lately: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable) STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 STACK: get_desc.pl:32 ----------------------------------------------------------- And I never get a good explanation from NCBI or suggestions on how to avoid it. --Russell > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Wednesday, 27 January 2010 2:46 p.m. > To: Smithies, Russell > Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org' > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > number? > > It's unfortunate but I have heard this problem popping up quite a bit more > frequently lately. Not to push too many buttons but NCBI isn't very > forthcoming with help these days; they have become quite insular. Not > sure if they're short-staffed due to budget or if there are other issues. > > chris > > On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote: > > > Grrrrrr, I hate eutils!!!! 
> > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: NCBI esearch fatal error: Search Backend failed: Error 111 > (Connection refused) > > STACK: Error::throw > > STACK: Bio::Root::Root::throw > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 > > STACK: Bio::Tools::EUtilities::parse_data > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 > > STACK: Bio::Tools::EUtilities::get_ids > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 > > STACK: Bio::DB::EUtilities::get_ids > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 > > STACK: get_desc.pl:32 > > ----------------------------------------------------------- > > > > > > Nice error message though :-) > > > > > > --Russell From cjfields at illinois.edu Tue Jan 26 21:42:22 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 26 Jan 2010 20:42:22 -0600 Subject: [Bioperl-l] how to retrieve organism name from accession number? In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz> Message-ID: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu> Makes me wonder if they're pushing more users towards the SOAP-based services and away from eutils. 
chris

From Russell.Smithies at agresearch.co.nz Tue Jan 26 21:45:58 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 27 Jan 2010 15:45:58 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
 <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
 <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
 <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
 <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
 <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
 <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>

Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi still works if you don't mind a bit of manual button clicking.
It's handling chunks of 100,000 records OK (today).

--Russell

From maj at fortinbras.us Wed Jan 27 10:14:22 2010
From: maj at fortinbras.us (Mark A.
Jensen) Date: Wed, 27 Jan 2010 10:14:22 -0500 Subject: [Bioperl-l] how to retrieve organism name from accession number? In-Reply-To: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife><18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz><18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz><18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz><4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu><18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz> <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu> Message-ID: Precisely the MO behind SoapEU...get the jump on 'em. ----- Original Message ----- From: "Chris Fields" To: "Smithies, Russell" Cc: ; "'Mark A. Jensen'" Sent: Tuesday, January 26, 2010 9:42 PM Subject: Re: [Bioperl-l] how to retrieve organism name from accession number? > Makes me wonder if they're pushing more users towards the SOAP-based services > and away from eutils. > > chris > > On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote: > >> I've had a wide selection of errors lately: >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource >> temporarily unavailable) >> STACK: Error::throw >> STACK: Bio::Root::Root::throw >> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 >> STACK: Bio::Tools::EUtilities::parse_data >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 >> STACK: Bio::Tools::EUtilities::get_ids >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 >> STACK: Bio::DB::EUtilities::get_ids >> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 >> STACK: get_desc.pl:32 >> ----------------------------------------------------------- >> >> And I never get a good explanation from NCBI or suggestions on how to avoid >> it. 
>> >> >> --Russell >> >> >>> -----Original Message----- >>> From: Chris Fields [mailto:cjfields at illinois.edu] >>> Sent: Wednesday, 27 January 2010 2:46 p.m. >>> To: Smithies, Russell >>> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org' >>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession >>> number? >>> >>> It's unfortunate but I have heard this problem popping up quite a bit more >>> frequently lately. Not to push too many buttons but NCBI isn't very >>> forthcoming with help these days; they have become quite insular. Not >>> sure if they're short-staffed due to budget or if there are other issues. >>> >>> chris >>> >>> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote: >>> >>>> Grrrrrr, I hate eutils!!!! >>>> >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 >>> (Connection refused) >>>> STACK: Error::throw >>>> STACK: Bio::Root::Root::throw >>> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 >>>> STACK: Bio::Tools::EUtilities::parse_data >>> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 >>>> STACK: Bio::Tools::EUtilities::get_ids >>> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 >>>> STACK: Bio::DB::EUtilities::get_ids >>> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 >>>> STACK: get_desc.pl:32 >>>> ----------------------------------------------------------- >>>> >>>> >>>> Nice error message though :-) >>>> >>>> >>>> --Russell >>>> >>>>> -----Original Message----- >>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell >>>>> Sent: Monday, 11 January 2010 10:05 a.m. >>>>> To: 'Chris Fields' >>>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org' >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession >>>>> number? 
>>>>> >>>>> I've started to go off eUtils recently (not BioPerl's fault) as I've >>> often >>>>> been finding that with large queries, chunks of the resulting data is >>>>> missing. >>>>> For example, before Xmas I was creating species-specific databases by >>>>> using eUtils to get a list of GI numbers back for a taxid, then >>> retrieving >>>>> the fasta sequences in chunks of 500. >>>>> Very regularly, in the middle of the fasta there would be a message >>> about >>>>> resource unavailable eg. >>>>>> test_sequence_1 >>>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT >>>>>> test_sequence_2 >>>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT >>>>> >>>>> Often this wasn't detected until formatdb complained about invalid >>>>> characters. >>>>> Inquiries to NCBI as to why this was happening and what to do about it >>>>> returned stupid answers ("do each sequence manually thru the web >>>>> interface", or "use eUtils"). >>>>> As we have a nice fast network connection, I now prefer to download >>> very >>>>> large gzip files (i.e. all of refseq) and extract what I need. >>>>> >>>>> I can't help but think that NCBI could solve a lot of problems if they >>>>> gzipped the output from eUtils queries - it's something I've requested >>>>> regularly for the last 5 years or so!! >>>>> >>>>> --Russell >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>>>> Sent: Monday, 11 January 2010 9:50 a.m. >>>>>> To: Smithies, Russell >>>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org' >>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession >>>>>> number? >>>>>> >>>>>> One could also use Bio::DB::Taxonomy, which indexes the same files or >>>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for >>> the >>>>>> details). 
>>>>>> >>>>>> chris >>>>>> >>>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote: >>>>>> >>>>>>> An alternate non-BioPerly way (that may be faster given NCBI's >>>>> flakiness >>>>>> lately) would be to download the gi_taxid_nucl.zip or >>> gi_taxid_prot.zip >>>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash >>>>> and >>>>>> do lookups. >>>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp >>>>> which >>>>>> lists taxids and descriptions (and synonyms) >>>>>>> >>>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I >>>>>> could do this: >>>>>>> >>>>>>> my $taxid = $gi_taxid_nucl{$accession}; >>>>>>> my $org_name = $names{$taxid}; >>>>>>> >>>>>>> --Russell >>>>>>> >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen >>>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m. >>>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org >>>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from >>> accession >>>>>>>> number? 
>>>>>>>> >>>>>>>> Bhakti, >>>>>>>> The following example (using EUtilities) may serve your purpose: >>>>>>>> >>>>>>>> use Bio::DB::EUtilities; >>>>>>>> >>>>>>>> my (%taxa, @taxa); >>>>>>>> my (%names, %idmap); >>>>>>>> >>>>>>>> # these are protein ids; nuc ids will work by changing -dbfrom => >>>>>>>> 'nucleotide', >>>>>>>> # (probably) >>>>>>>> >>>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439); >>>>>>>> >>>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink', >>>>>>>> -db => 'taxonomy', >>>>>>>> -dbfrom => 'protein', >>>>>>>> -correspondence => 1, >>>>>>>> -id => \@ids); >>>>>>>> >>>>>>>> # iterate through the LinkSet objects >>>>>>>> while (my $ds = $factory->next_LinkSet) { >>>>>>>> $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0] >>>>>>>> } >>>>>>>> >>>>>>>> @taxa = @taxa{@ids}; >>>>>>>> >>>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', >>>>>>>> -db => 'taxonomy', >>>>>>>> -id => \@taxa ); >>>>>>>> >>>>>>>> while (local $_ = $factory->next_DocSum) { >>>>>>>> $names{($_->get_contents_by_name('TaxId'))[0]} = >>>>>>>> ($_->get_contents_by_name('ScientificName'))[0]; >>>>>>>> } >>>>>>>> >>>>>>>> foreach (@ids) { >>>>>>>> $idmap{$_} = $names{$taxa{$_}}; >>>>>>>> } >>>>>>>> >>>>>>>> # %idmap is >>>>>>>> # 1621261 => 'Mycobacterium tuberculosis H37Rv' >>>>>>>> # 20807972 => 'Thermoanaerobacter tengcongensis MB4' >>>>>>>> # 68536103 => 'Corynebacterium jeikeium K411' >>>>>>>> # 730439 => 'Bacillus caldolyticus' >>>>>>>> # 89318838 => undef (this record has been removed from the db) >>>>>>>> >>>>>>>> 1; >>>>>>>> >>>>>>>> You probably will need to break up your 30000 into chunks >>>>>>>> (say, 1000-3000 each), and do the above on each chunk with a >>>>>>>> >>>>>>>> sleep 3; >>>>>>>> >>>>>>>> or so separating the queries. 
>>>>>>>> MAJ >>>>>>>> ----- Original Message ----- >>>>>>>> From: "Bhakti Dwivedi" >>>>>>>> To: >>>>>>>> Sent: Friday, December 25, 2009 9:46 PM >>>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession >>>>>> number? >>>>>>>> >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> Does anyone know how to retrieve the "Source" or the "Species name" >>>>>>>> given >>>>>>>>> the accession number using Bioperl. I have these 30,000 accession >>>>>>>> numbers >>>>>>>>> for which I need to get the source organisms. Any kind of help >>> will >>>>>> be >>>>>>>>> appreciated. >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> >>>>>>>>> BD >>>>>>>>> _______________________________________________ >>>>>>>>> Bioperl-l mailing list >>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>> ======================================================================= >>>>>>> Attention: The information contained in this message and/or >>>>> attachments >>>>>>> from AgResearch Limited is intended only for the persons or entities >>>>>>> to which it is addressed and may contain confidential and/or >>>>> privileged >>>>>>> material. Any review, retransmission, dissemination or other use of, >>>>> or >>>>>>> taking of any action in reliance upon, this information by persons or >>>>>>> entities other than the intended recipients is prohibited by >>>>> AgResearch >>>>>>> Limited. If you have received this message in error, please notify >>> the >>>>>>> sender immediately. 
>>>>>>>
>>>>> =======================================================================
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioperl-l mailing list
>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bhakti.dwivedi at gmail.com Wed Jan 27 14:42:06 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Wed, 27 Jan 2010 14:42:06 -0500
Subject: [Bioperl-l] Designing primers from multiple sequence alignment of amino acid sequences
Message-ID:

Hi,

I have to design primers from multiple sequence alignments of amino acid
sequences. The sequences I am working with are quite diverged, and the
available primer design programs (such as CODEHOP/iCODEHOP) often fail to find
any primer sets. But when I look at the alignment manually, I can see regions
that I could use to make primers.

So I designed the degenerate primers the old-fashioned way: selecting the
conserved regions (6-10 aa long) from the alignment, translating the selected
regions to DNA using the appropriate codon usage table, and finally checking
the primer sets (potential forward and reverse primers) using tools like
OLIGOANALYZER. In the end I did find a few good primer sets, but whether they
work in reality is something I will have to wait and see.
While doing this process manually, I really felt the need to automate it (it
was not just one alignment; I worked with several). I was wondering if there
is any way BioPerl can help me here, or whether writing my own Perl script is
the only way to go.

I would appreciate your suggestions/comments. Thanks! (Apologies for the long
email.)

Regards
Bhakti

From Kevin.M.Brown at asu.edu Wed Jan 27 15:23:57 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 27 Jan 2010 13:23:57 -0700
Subject: [Bioperl-l] Designing primers from multiple sequence alignment of amino acid sequences
In-Reply-To:
References:
Message-ID: <1A4207F8295607498283FE9E93B775B4068498DB@EX02.asurite.ad.asu.edu>

BioPerl is just a collection of tools, not a full-blown application. Most of
what you want can be done with the objects available within the toolkit, but
the application (a Perl script) would still need to be written to put those
objects to use.

You could use clustalw from within Perl to align the sequences
(Bio::Tools::Run::Alignment::Clustalw), find the conserved regions
(Bio::SimpleAlign), reverse translate them (Bio::Tools::CodonTable), then come
up with an algorithm for primer analysis and selection (or even use other
applications such as primer3, via Bio::Tools::Run::Primer3, from within Perl).

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Bhakti Dwivedi
> Sent: Wednesday, January 27, 2010 12:42 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Designing primers from multiple sequence
> alignment of amino acid sequences
>
> Hi,
>
> I have to design primers from the multiple sequence alignments of amino acid
> sequences. The sequences I am working with are quite diverged and often the
> available primer design programs (such as CODEHOP/iCODEHOP) fail to find any
> primer sets.
> But, when I look at the alignment manually, I could see the regions that I
> could use to make primers.
>
> So I designed the degenerate primers the old-fashioned way, starting from
> selecting the conserved regions (6-10aa long) from the alignment to
> translating the selected regions to DNA using the appropriate codon usage
> table, and then finally checking the primer sets (potential forward and
> reverse primers) using tools like OLIGOANALYZER. In the end, I did find few
> good primer sets, but getting them to work in reality is something I will
> have to wait and see.
>
> While doing this process manually, I really felt the need to automate it (it
> was not just one alignment I did, I worked with several of those). I was
> wondering if there is anyway bioperl can help me here, or making a perl
> script is the only way to go.
>
> I would appreciate your suggestions/comments. Thanks! (apologize for a
> long email..)
>
>
> Regards
> Bhakti
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From mike.stubbington at bbsrc.ac.uk Thu Jan 28 10:41:49 2010
From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI))
Date: Thu, 28 Jan 2010 15:41:49 +0000
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn
Message-ID: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>

Dear all,

I am attempting to blast some primers against the mouse genome. I have created
a local mouse genome blast database and I can search against it using 'blastn'
at the command line.

I have perl code that creates an array of bioperl sequence objects called
@primers.

I then create a StandAloneBlastPlus factory using the following code...

    my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
        -db_dir  => '/Users/stubbing/localBlast/',
        -db_name => 'MouseGenome'
    );

and then attempt to blast my primers using this...

    my @shortPrimers;
    my $count=1;
    foreach (@primers) {
        my $currentSeq = $_;
        print "Checking primer $count/$primerNumber ";
        if ($_->length < 40) {
            push(@shortPrimers,$_);
            print "Too short!\n";
        }
        else {
            print "BLASTing...";
            my $blastResult = $blastFactory->blastn(-query => $currentSeq);
        }
        $count++;
    }

This fails with the following error...

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, line 532.

STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

Line 63 in my code is (as you might expect) the one that calls blastn on my
factory object.

I'd appreciate any help you might be able to provide to shed light on this.

Thanks in advance,

Mike

From maj at fortinbras.us Thu Jan 28 10:56:14 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 10:56:14 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn
In-Reply-To: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
Message-ID: <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>

Mike - please try updating your bioperl-live (the core) to the latest code
(revision 16761 or so).
CommandExts is a work in progress; from the stack errors it looks like you've
got an older version.
Try it then ping us back, if you would-- Thanks Mark ----- Original Message ----- From: "mike stubbington (BI)" To: Sent: Thursday, January 28, 2010 10:41 AM Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn Dear all, I am attempting to blast some primers against the mouse genome. I have created a local mouse genome blast database and I can search against it using 'blastn' at the command line. I have perl code that creates an array of bioperl sequence objects called @primers I then create a StandAloneBlastPlus factory using the following code? my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new( -db_dir => '/Users/stubbing/localBlast/', -db_name => 'MouseGenome' ); and then attempt to blast my primers using this? my @shortPrimers; my $count=1; foreach (@primers) { my $currentSeq = $_; print "Checking primer $count/$primerNumber "; if ($_->length < 40) { push(@shortPrimers,$_); print "Too short!\n"; } else { print "BLASTing..."; my $blastResult = $blastFactory->blastn(-query => $currentSeq); } $count++; } This fails with the following error? ------------- EXCEPTION ------------- MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, line 532. STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 STACK toplevel ./5CTest.pl:63 ------------------------------------- Line 63 in my code is (as you might expect) the one that calls blastn on my factory object. 
I'd appreciate any help you might be able to provide to shed light on this. Thanks in advance, Mike _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From mike.stubbington at bbsrc.ac.uk Thu Jan 28 11:18:12 2010 From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI)) Date: Thu, 28 Jan 2010 16:18:12 +0000 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn In-Reply-To: <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife> References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk> <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife> Message-ID: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk> Hi, Thanks for the suggestion. Unfortunately it still fails - error as follows: ------------- EXCEPTION ------------- MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, line 532. STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 STACK toplevel ./5CTest.pl:63 ------------------------------------- M On 28 Jan 2010, at 15:56, Mark A. Jensen wrote: > Mike - please try updating your bioperl-live (the core) to the latest code > (revision 16761 or so). > CommandExts is a work in progress; from the stack errors it looks like you've > got an older version. 
> Try it then ping us back, if you would-- > Thanks > Mark > ----- Original Message ----- > From: "mike stubbington (BI)" > To: > Sent: Thursday, January 28, 2010 10:41 AM > Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error > running blastn > > > Dear all, > > I am attempting to blast some primers against the mouse genome. I have created a > local mouse genome blast database and I can search against it using 'blastn' at > the command line. > > I have perl code that creates an array of bioperl sequence objects called > @primers > > I then create a StandAloneBlastPlus factory using the following code? > > my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new( > -db_dir => '/Users/stubbing/localBlast/', > -db_name => 'MouseGenome' > ); > > and then attempt to blast my primers using this? > > my @shortPrimers; > my $count=1; > foreach (@primers) { > my $currentSeq = $_; > print "Checking primer $count/$primerNumber "; > if ($_->length < 40) { > push(@shortPrimers,$_); > print "Too short!\n"; > } > else { > print "BLASTing..."; > my $blastResult = $blastFactory->blastn(-query => $currentSeq); > } > $count++; > } > > This fails with the following error? > > ------------- EXCEPTION ------------- > MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running > /usr/local/ncbi/blast/bin/blastn : Illegal seek at > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, > line 532. 
> > STACK Bio::Tools::Run::WrapperBase::_run > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 > STACK Bio::Tools::Run::StandAloneBlastPlus::run > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 > STACK toplevel ./5CTest.pl:63 > ------------------------------------- > > Line 63 in my code is (as you might expect) the one that calls blastn on my > factory object. > > I'd appreciate any help you might be able to provide to shed light on this. > > Thanks in advance, > > Mike > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Thu Jan 28 11:28:52 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 28 Jan 2010 11:28:52 -0500 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn In-Reply-To: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk> References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk> <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife> <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk> Message-ID: Thanks Mike-- will have a look asap- cheers MAJ ----- Original Message ----- From: "mike stubbington (BI)" To: "Mark A. Jensen" Cc: Sent: Thursday, January 28, 2010 11:18 AM Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn Hi, Thanks for the suggestion. Unfortunately it still fails - error as follows: ------------- EXCEPTION ------------- MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, line 532. 
STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 STACK toplevel ./5CTest.pl:63 ------------------------------------- M On 28 Jan 2010, at 15:56, Mark A. Jensen wrote: > Mike - please try updating your bioperl-live (the core) to the latest code > (revision 16761 or so). > CommandExts is a work in progress; from the stack errors it looks like you've > got an older version. > Try it then ping us back, if you would-- > Thanks > Mark > ----- Original Message ----- > From: "mike stubbington (BI)" > To: > Sent: Thursday, January 28, 2010 10:41 AM > Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error > running blastn > > > Dear all, > > I am attempting to blast some primers against the mouse genome. I have created > a > local mouse genome blast database and I can search against it using 'blastn' > at > the command line. > > I have perl code that creates an array of bioperl sequence objects called > @primers > > I then create a StandAloneBlastPlus factory using the following code? > > my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new( > -db_dir => '/Users/stubbing/localBlast/', > -db_name => 'MouseGenome' > ); > > and then attempt to blast my primers using this? > > my @shortPrimers; > my $count=1; > foreach (@primers) { > my $currentSeq = $_; > print "Checking primer $count/$primerNumber "; > if ($_->length < 40) { > push(@shortPrimers,$_); > print "Too short!\n"; > } > else { > print "BLASTing..."; > my $blastResult = $blastFactory->blastn(-query => $currentSeq); > } > $count++; > } > > This fails with the following error? 
> > ------------- EXCEPTION ------------- > MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem > running > /usr/local/ncbi/blast/bin/blastn : Illegal seek at > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, > line 532. > > STACK Bio::Tools::Run::WrapperBase::_run > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 > STACK Bio::Tools::Run::StandAloneBlastPlus::run > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 > STACK toplevel ./5CTest.pl:63 > ------------------------------------- > > Line 63 in my code is (as you might expect) the one that calls blastn on my > factory object. > > I'd appreciate any help you might be able to provide to shed light on this. > > Thanks in advance, > > Mike > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Thu Jan 28 13:26:27 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 28 Jan 2010 12:26:27 -0600 Subject: [Bioperl-l] how to retrieve organism name from accession number? 
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz> <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz> Message-ID: <1264703187.5473.10.camel@cjfields.igb.uiuc.edu> Russell, Just curious, but have you tried setting the return email parameter (-email)? NCBI recently stated that all queries would eventually require a return email of some sort (not sure if it's validated or not). I think that was set for around late spring. I'm changing the code in svn to require it for that very purpose. chris Wed, 2010-01-27 at 15:45 +1300, Smithies, Russell wrote: > Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi still works if you don't mind a bit of manual button clicking. It's handling chunks of 100,000 records OK (today). > > --Russell > > > -----Original Message----- > > From: Chris Fields [mailto:cjfields at illinois.edu] > > Sent: Wednesday, 27 January 2010 3:42 p.m. > > To: Smithies, Russell > > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen' > > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > > number? > > > > Makes me wonder if they're pushing more users towards the SOAP-based > > services and away from eutils. 
> > > > chris > > > > On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote: > > > > > I've had a wide selection of errors lately: > > > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource > > temporarily unavailable) > > > STACK: Error::throw > > > STACK: Bio::Root::Root::throw > > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 > > > STACK: Bio::Tools::EUtilities::parse_data > > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 > > > STACK: Bio::Tools::EUtilities::get_ids > > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 > > > STACK: Bio::DB::EUtilities::get_ids > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 > > > STACK: get_desc.pl:32 > > > ----------------------------------------------------------- > > > > > > And I never get a good explanation from NCBI or suggestions on how to > > avoid it. > > > > > > > > > --Russell > > > > > > > > >> -----Original Message----- > > >> From: Chris Fields [mailto:cjfields at illinois.edu] > > >> Sent: Wednesday, 27 January 2010 2:46 p.m. > > >> To: Smithies, Russell > > >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org' > > >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession > > >> number? > > >> > > >> It's unfortunate but I have heard this problem popping up quite a bit > > more > > >> frequently lately. Not to push too many buttons but NCBI isn't very > > >> forthcoming with help these days; they have become quite insular. Not > > >> sure if they're short-staffed due to budget or if there are other > > issues. > > >> > > >> chris > > >> > > >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote: > > >> > > >>> Grrrrrr, I hate eutils!!!! 
> > >>> > > >>> ------------- EXCEPTION: Bio::Root::Exception ------------- > > >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 > > >> (Connection refused) > > >>> STACK: Error::throw > > >>> STACK: Bio::Root::Root::throw > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 > > >>> STACK: Bio::Tools::EUtilities::parse_data > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 > > >>> STACK: Bio::Tools::EUtilities::get_ids > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 > > >>> STACK: Bio::DB::EUtilities::get_ids > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 > > >>> STACK: get_desc.pl:32 > > >>> ----------------------------------------------------------- > > >>> > > >>> > > >>> Nice error message though :-) > > >>> > > >>> > > >>> --Russell > > >>> > > >>>> -----Original Message----- > > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell > > >>>> Sent: Monday, 11 January 2010 10:05 a.m. > > >>>> To: 'Chris Fields' > > >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open- > > bio.org' > > >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession > > >>>> number? > > >>>> > > >>>> I've started to go off eUtils recently (not BioPerl's fault) as I've > > >> often > > >>>> been finding that with large queries, chunks of the resulting data is > > >>>> missing. > > >>>> For example, before Xmas I was creating species-specific databases by > > >>>> using eUtils to get a list of GI numbers back for a taxid, then > > >> retrieving > > >>>> the fasta sequences in chunks of 500. > > >>>> Very regularly, in the middle of the fasta there would be a message > > >> about > > >>>> resource unavailable eg. 
> > >>>>> test_sequence_1 > > >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT > > >>>>> test_sequence_2 > > >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT > > >>>> > > >>>> Often this wasn't detected until formatdb complained about invalid > > >>>> characters. > > >>>> Inquiries to NCBI as to why this was happening and what to do about > > it > > >>>> returned stupid answers ("do each sequence manually thru the web > > >>>> interface", or "use eUtils"). > > >>>> As we have a nice fast network connection, I now prefer to download > > >> very > > >>>> large gzip files (i.e. all of refseq) and extract what I need. > > >>>> > > >>>> I can't help but think that NCBI could solve a lot of problems if > > they > > >>>> gzipped the output from eUtils queries - it's something I've > > requested > > >>>> regularly for the last 5 years or so!! > > >>>> > > >>>> --Russell > > >>>> > > >>>> > > >>>>> -----Original Message----- > > >>>>> From: Chris Fields [mailto:cjfields at illinois.edu] > > >>>>> Sent: Monday, 11 January 2010 9:50 a.m. > > >>>>> To: Smithies, Russell > > >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open- > > bio.org' > > >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from > > accession > > >>>>> number? > > >>>>> > > >>>>> One could also use Bio::DB::Taxonomy, which indexes the same files > > or > > >>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for > > >> the > > >>>>> details). > > >>>>> > > >>>>> chris > > >>>>> > > >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote: > > >>>>> > > >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's > > >>>> flakiness > > >>>>> lately) would be to download the gi_taxid_nucl.zip or > > >> gi_taxid_prot.zip > > >>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a > > hash > > >>>> and > > >>>>> do lookups. 
> > >>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp > > >>>> which > > >>>>> lists taxids and descriptions (and synonyms) > > >>>>>> > > >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so > > I > > >>>>> could do this: > > >>>>>> > > >>>>>> my $taxid = $gi_taxid_nucl{$accession}; > > >>>>>> my $org_name = $names{$taxid}; > > >>>>>> > > >>>>>> --Russell > > >>>>>> > > >>>>>> > > >>>>>>> -----Original Message----- > > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > > >>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m. > > >>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org > > >>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from > > >> accession > > >>>>>>> number? > > >>>>>>> > > >>>>>>> Bhakti, > > >>>>>>> The following example (using EUtilities) may serve your purpose: > > >>>>>>> > > >>>>>>> use Bio::DB::EUtilities; > > >>>>>>> > > >>>>>>> my (%taxa, @taxa); > > >>>>>>> my (%names, %idmap); > > >>>>>>> > > >>>>>>> # these are protein ids; nuc ids will work by changing -dbfrom => > > >>>>>>> 'nucleotide', > > >>>>>>> # (probably) > > >>>>>>> > > >>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439); > > >>>>>>> > > >>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink', > > >>>>>>> -db => 'taxonomy', > > >>>>>>> -dbfrom => 'protein', > > >>>>>>> -correspondence => 1, > > >>>>>>> -id => \@ids); > > >>>>>>> > > >>>>>>> # iterate through the LinkSet objects > > >>>>>>> while (my $ds = $factory->next_LinkSet) { > > >>>>>>> $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0] > > >>>>>>> } > > >>>>>>> > > >>>>>>> @taxa = @taxa{@ids}; > > >>>>>>> > > >>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', > > >>>>>>> -db => 'taxonomy', > > >>>>>>> -id => \@taxa ); > > >>>>>>> > > >>>>>>> while (local $_ = $factory->next_DocSum) { > > >>>>>>> 
$names{($_->get_contents_by_name('TaxId'))[0]} = > > >>>>>>> ($_->get_contents_by_name('ScientificName'))[0]; > > >>>>>>> } > > >>>>>>> > > >>>>>>> foreach (@ids) { > > >>>>>>> $idmap{$_} = $names{$taxa{$_}}; > > >>>>>>> } > > >>>>>>> > > >>>>>>> # %idmap is > > >>>>>>> # 1621261 => 'Mycobacterium tuberculosis H37Rv' > > >>>>>>> # 20807972 => 'Thermoanaerobacter tengcongensis MB4' > > >>>>>>> # 68536103 => 'Corynebacterium jeikeium K411' > > >>>>>>> # 730439 => 'Bacillus caldolyticus' > > >>>>>>> # 89318838 => undef (this record has been removed from the > > db) > > >>>>>>> > > >>>>>>> 1; > > >>>>>>> > > >>>>>>> You probably will need to break up your 30000 into chunks > > >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a > > >>>>>>> > > >>>>>>> sleep 3; > > >>>>>>> > > >>>>>>> or so separating the queries. > > >>>>>>> MAJ > > >>>>>>> ----- Original Message ----- > > >>>>>>> From: "Bhakti Dwivedi" > > >>>>>>> To: > > >>>>>>> Sent: Friday, December 25, 2009 9:46 PM > > >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession > > >>>>> number? > > >>>>>>> > > >>>>>>> > > >>>>>>>> Hi, > > >>>>>>>> > > >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species > > name" > > >>>>>>> given > > >>>>>>>> the accession number using Bioperl. I have these 30,000 > > accession > > >>>>>>> numbers > > >>>>>>>> for which I need to get the source organisms. Any kind of help > > >> will > > >>>>> be > > >>>>>>>> appreciated. 
> > >>>>>>>> > > >>>>>>>> Thanks > > >>>>>>>> > > >>>>>>>> BD > > >>>>>>>> _______________________________________________ > > >>>>>>>> Bioperl-l mailing list > > >>>>>>>> Bioperl-l at lists.open-bio.org > > >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >>>>>>>> > > >>>>>>>> > > >>>>>>> > > >>>>>>> _______________________________________________ > > >>>>>>> Bioperl-l mailing list > > >>>>>>> Bioperl-l at lists.open-bio.org > > >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >>>>>> > > >>>> > > ======================================================================= > > >>>>>> Attention: The information contained in this message and/or > > >>>> attachments > > >>>>>> from AgResearch Limited is intended only for the persons or > > entities > > >>>>>> to which it is addressed and may contain confidential and/or > > >>>> privileged > > >>>>>> material. Any review, retransmission, dissemination or other use > > of, > > >>>> or > > >>>>>> taking of any action in reliance upon, this information by persons > > or > > >>>>>> entities other than the intended recipients is prohibited by > > >>>> AgResearch > > >>>>>> Limited. If you have received this message in error, please notify > > >> the > > >>>>>> sender immediately. 
> > >>>>>>
> > >>>>
> > =======================================================================
> > >>>>>>
> > >>>>>> _______________________________________________
> > >>>>>> Bioperl-l mailing list
> > >>>>>> Bioperl-l at lists.open-bio.org
> > >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>>>
> > >>>>
> > >>>> _______________________________________________
> > >>>> Bioperl-l mailing list
> > >>>> Bioperl-l at lists.open-bio.org
> > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From maj at fortinbras.us Thu Jan 28 13:47:04 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 13:47:04 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn
In-Reply-To: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
 <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
Message-ID:

Hi Mike,
Believe I found the real bug causing the problem (was not accounting for
the db_dir parameter). Crashes should now also throw much more helpful
errors. Please try the code at r16774, and shout back.
thanks -- MAJ
----- Original Message -----
From: "mike stubbington (BI)"
To: "Mark A. Jensen"
Cc:
Sent: Thursday, January 28, 2010 11:18 AM
Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn


Hi,

Thanks for the suggestion.
Unfortunately it still fails - error as follows:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, line 532.

STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

M

On 28 Jan 2010, at 15:56, Mark A. Jensen wrote:

> Mike - please try updating your bioperl-live (the core) to the latest code
> (revision 16761 or so).
> CommandExts is a work in progress; from the stack errors it looks like you've
> got an older version.
> Try it then ping us back, if you would--
> Thanks
> Mark
> ----- Original Message -----
> From: "mike stubbington (BI)"
> To:
> Sent: Thursday, January 28, 2010 10:41 AM
> Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error
> running blastn
>
>
> Dear all,
>
> I am attempting to blast some primers against the mouse genome. I have
> created a local mouse genome blast database and I can search against it
> using 'blastn' at the command line.
>
> I have perl code that creates an array of bioperl sequence objects called
> @primers
>
> I then create a StandAloneBlastPlus factory using the following code...
>
> my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
>     -db_dir => '/Users/stubbing/localBlast/',
>     -db_name => 'MouseGenome'
> );
>
> and then attempt to blast my primers using this...
>
> my @shortPrimers;
> my $count=1;
> foreach (@primers) {
>     my $currentSeq = $_;
>     print "Checking primer $count/$primerNumber ";
>     if ($_->length < 40) {
>         push(@shortPrimers,$_);
>         print "Too short!\n";
>     }
>     else {
>         print "BLASTing...";
>         my $blastResult = $blastFactory->blastn(-query => $currentSeq);
>     }
>     $count++;
> }
>
> This fails with the following error...
>
> ------------- EXCEPTION -------------
> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem
> running /usr/local/ncbi/blast/bin/blastn : Illegal seek at
> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989,
> line 532.
>
> STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
> STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
> STACK toplevel ./5CTest.pl:63
> -------------------------------------
>
> Line 63 in my code is (as you might expect) the one that calls blastn on my
> factory object.
>
> I'd appreciate any help you might be able to provide to shed light on this.
>
> Thanks in advance,
>
> Mike
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Thu Jan 28 14:00:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 13:00:26 -0600
Subject: [Bioperl-l] EUtilities policy change
Message-ID: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>

All,

Per NCBI's recent change in eutils user policy (effective June 1):

http://bioperl.org/pipermail/bioperl-l/2009-December/031698.html

Both the tool and email parameters ('-tool', '-email') are now required
when making requests. Note this will significantly break all modules
requiring remote access to eutils (Bio::DB::GenBank/GenPept, some Biblio
and Taxonomy stuff as well, IIRC). This also applies to web services
(SOAP-based access). Mark, not sure how this affects your SOAP-based
modules.

I have reconfigured Bio::DB::EUtilities to follow this policy; the
default tool setting has been 'bioperl' and will remain that way.
However, there has been no default email, therefore setting this is now
required for future requests unless we (the bioperl devs) decide there
is a safe default email to utilize. My gut tells me, however, that
falling back to a default email opens up a can of worms for the devs and
is very likely a 'BAD IDEA'(TM).

Regardless, be aware that, after June 1, NCBI will very likely exclude
requests with no email and will notify users who are considered to be
violating their policies.

I will likely make further changes to Bio::DB::EUtilities in the
meantime to ensure that using the tools by default will not violate
NCBI's policy (e.g. override this at your own risk).
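
For anyone updating scripts, here is a minimal sketch of what passing both
parameters might look like. It assumes only what is described above: that
Bio::DB::EUtilities takes '-email' and '-tool' alongside its usual
constructor arguments. The eutil_args helper, the tool name
'my_est_fetcher' and the example query are made up for illustration and
are not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the constructor arguments for an esearch
# request and refuse to continue when no email address is supplied,
# rather than silently falling back to some default address.
sub eutil_args {
    my (%opt) = @_;
    die "an email address is required for NCBI eutils requests\n"
        unless defined $opt{email} && $opt{email} =~ /\S\@\S/;
    return (
        -eutil  => 'esearch',
        -db     => $opt{db}   || 'nucest',          # dbEST, as an example
        -term   => $opt{term},
        -tool   => $opt{tool} || 'my_est_fetcher',  # made-up tool name
        -email  => $opt{email},
        -retmax => 20,
    );
}

# NCBI is only contacted when an address is given on the command line.
if (@ARGV) {
    require Bio::DB::EUtilities;
    my $factory = Bio::DB::EUtilities->new(
        eutil_args(email => $ARGV[0], term => 'Clupeocephala[Organism]'),
    );
    print "$_\n" for $factory->get_ids;
}
```

Saved under some name such as est_ids.pl (again, just an example), this
would be run as 'perl est_ids.pl you@your.institution'. With no argument
it defines the helper and makes no request at all, so the email always
comes from the person running the script.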

chris

From maj at fortinbras.us Thu Jan 28 14:05:43 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 14:05:43 -0500
Subject: [Bioperl-l] EUtilities policy change
In-Reply-To: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>
References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>
Message-ID: <8F49B5ED151143FA86E977B4D4F44265@NewLife>

Thanks Chris--
The soap modules currently set tool to "SoapEUtilities(BioPerl)".
I agree that a default email is a bad idea (tm) (unless maybe it's
hilmar's...?). I'd say a warning on unset email parameters is a responsible
"there be dragons" sort of treatment.
MAJ
----- Original Message -----
From: "Chris Fields"
To: "BioPerl-l"
Cc: "Mark A. Jensen"
Sent: Thursday, January 28, 2010 2:00 PM
Subject: EUtilities policy change


> All,
>
> Per NCBI's recent change in eutils user policy (effective June 1):
>
> http://bioperl.org/pipermail/bioperl-l/2009-December/031698.html
>
> Both the tool and email parameters ('-tool', '-email') are now required
> when making requests. Note this will significantly break all modules
> requiring remote access to eutils (Bio::DB::GenBank/GenPept, some Biblio
> and Taxonomy stuff as well, IIRC). This also applies to web services
> (SOAP-based access). Mark, not sure how this affects your SOAP-based
> modules.
>
> I have reconfigured Bio::DB::EUtilities to follow this policy; the
> default tool setting has been 'bioperl' and will remain that way.
> However, there has been no default email, therefore setting this is now
> required for future requests unless we (the bioperl devs) decide there
> is a safe default email to utilize. My gut tells me, however, that
> falling back to a default email opens up a can of worms for the devs and
> is very likely a 'BAD IDEA'(TM).
>
> Regardless, be aware that, after June 1, NCBI will very likely exclude
> requests with no email and will notify users who are considered to be
> violating their policies.
>
> I will likely make further changes to Bio::DB::EUtilities in the
> meantime to ensure that using the tools by default will not violate
> NCBI's policy (e.g. override this at your own risk).
>
> chris
>
>
>

From cjfields at illinois.edu Thu Jan 28 14:18:22 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 13:18:22 -0600
Subject: [Bioperl-l] EUtilities policy change
In-Reply-To: <8F49B5ED151143FA86E977B4D4F44265@NewLife>
References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>
 <8F49B5ED151143FA86E977B4D4F44265@NewLife>
Message-ID: <1264706302.5473.48.camel@cjfields.igb.uiuc.edu>

I think warning is fine for now. I've reimplemented that so it occurs
lazily (warns only when a request is actually made). Will also change
the tool to 'BioPerl' (currently 'bioperl', all lc).

We'll obviously have to address this in the test suite as well in some
way, maybe ask for an email if network tests are requested.

chris

On Thu, 2010-01-28 at 14:05 -0500, Mark A. Jensen wrote:
> Thanks Chris--
> The soap modules currently set tool to "SoapEUtilities(BioPerl)".
> I agree that a default email is a bad idea (tm) (unless maybe it's
> hilmar's...?). I'd say a warning on unset email parameters is a responsible
> "there be dragons" sort of treatment.
> MAJ
> ----- Original Message -----
> From: "Chris Fields"
> To: "BioPerl-l"
> Cc: "Mark A. Jensen"
> Sent: Thursday, January 28, 2010 2:00 PM
> Subject: EUtilities policy change
>
>
> > All,
> >
> > Per NCBI's recent change in eutils user policy (effective June 1):
> >
> > http://bioperl.org/pipermail/bioperl-l/2009-December/031698.html
> >
> > Both the tool and email parameters ('-tool', '-email') are now required
> > when making requests. Note this will significantly break all modules
> > requiring remote access to eutils (Bio::DB::GenBank/GenPept, some Biblio
> > and Taxonomy stuff as well, IIRC). This also applies to web services
> > (SOAP-based access). Mark, not sure how this affects your SOAP-based
> > modules.
> >
> > I have reconfigured Bio::DB::EUtilities to follow this policy; the
> > default tool setting has been 'bioperl' and will remain that way.
> > However, there has been no default email, therefore setting this is now
> > required for future requests unless we (the bioperl devs) decide there
> > is a safe default email to utilize. My gut tells me, however, that
> > falling back to a default email opens up a can of worms for the devs and
> > is very likely a 'BAD IDEA'(TM).
> >
> > Regardless, be aware that, after June 1, NCBI will very likely exclude
> > requests with no email and will notify users who are considered to be
> > violating their policies.
> >
> > I will likely make further changes to Bio::DB::EUtilities in the
> > meantime to ensure that using the tools by default will not violate
> > NCBI's policy (e.g. override this at your own risk).
> >
> > chris
> >
> >
> >

From Russell.Smithies at agresearch.co.nz Thu Jan 28 14:25:38 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 29 Jan 2010 08:25:38 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <1264703187.5473.10.camel@cjfields.igb.uiuc.edu>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
 <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
 <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
 <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
 <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
 <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
 <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
 <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
 <1264703187.5473.10.camel@cjfields.igb.uiuc.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz>

Yes, I usually set the 'tool' and 'email' parameters.
I went to NCBI back in 2006 and did their "PowerScripting" course where they pointed out a lot of the requirements for using eUtils. I think I requested results returned gzipped back then as well...

--Russell

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Friday, 29 January 2010 7:26 a.m.
> To: Smithies, Russell
> Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
>
> Russell,
>
> Just curious, but have you tried setting the return email parameter
> (-email)? NCBI recently stated that all queries would eventually
> require a return email of some sort (not sure if it's validated or not).
> I think that was set for around late spring. I'm changing the code in
> svn to require it for that very purpose.
>
> chris
>
>
> Wed, 2010-01-27 at 15:45 +1300, Smithies, Russell wrote:
> > Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi
> still works if you don't mind a bit of manual button clicking. It's
> handling chunks of 100,000 records OK (today).
> >
> > --Russell
> >
> > > -----Original Message-----
> > > From: Chris Fields [mailto:cjfields at illinois.edu]
> > > Sent: Wednesday, 27 January 2010 3:42 p.m.
> > > To: Smithies, Russell
> > > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> > > Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > > number?
> > >
> > > Makes me wonder if they're pushing more users towards the SOAP-based
> > > services and away from eutils.

From cjfields at illinois.edu Thu Jan 28 14:30:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 13:30:12 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
 <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
 <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
 <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
 <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
 <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
 <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
 <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
 <1264703187.5473.10.camel@cjfields.igb.uiuc.edu>
 <18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz>
Message-ID: <1264707012.5473.51.camel@cjfields.igb.uiuc.edu>

Russell,

Okay, just wanted to make sure. The email/tool requirements weren't
actually enforced up until now, which is forcing us to do a bit of
re-work on the various tools that don't have it set by default (at least
warn users unaware of it).

And I agree, gzipped archives would be nice!

chris

On Fri, 2010-01-29 at 08:25 +1300, Smithies, Russell wrote:
> Yes, I usually set the 'tool' and 'email' parameters.
> I went to NCBI back in 2006 and did their "PowerScripting" course where they pointed out a lot of the requirements for using eUtils. I think I requested results returned gzipped back then as well...
>
> --Russell
>
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at illinois.edu]
> > Sent: Friday, 29 January 2010 7:26 a.m.
> > To: Smithies, Russell
> > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> > Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > number?
> >
> > Russell,
> >
> > Just curious, but have you tried setting the return email parameter
> > (-email)? NCBI recently stated that all queries would eventually
> > require a return email of some sort (not sure if it's validated or not).
> > I think that was set for around late spring. I'm changing the code in
> > svn to require it for that very purpose.
> >
> > chris
> >
> >
> > Wed, 2010-01-27 at 15:45 +1300, Smithies, Russell wrote:
> > > Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi
> > > still works if you don't mind a bit of manual button clicking. It's
> > > handling chunks of 100,000 records OK (today).
> > >
> > > --Russell
> > >
> > > > -----Original Message-----
> > > > From: Chris Fields [mailto:cjfields at illinois.edu]
> > > > Sent: Wednesday, 27 January 2010 3:42 p.m.
> > > > To: Smithies, Russell
> > > > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> > > > Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > > > number?
> > > >
> > > > Makes me wonder if they're pushing more users towards the SOAP-based
> > > > services and away from eutils.
> > > >
> > > > chris
> > > >
> > > > On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
> > > >
> > > > > I've had a wide selection of errors lately:
> > > > >
> > > > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > > > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
> > > > > STACK: Error::throw
> > > > > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > > > STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > > > > STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > > > > STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > > > > STACK: get_desc.pl:32
> > > > > -----------------------------------------------------------
> > > > >
> > > > > And I never get a good explanation from NCBI or suggestions on how to
> > > > > avoid it.
> > > > > > > > > > > > > > > --Russell > > > > > > > > > > > > > > >> -----Original Message----- > > > > >> From: Chris Fields [mailto:cjfields at illinois.edu] > > > > >> Sent: Wednesday, 27 January 2010 2:46 p.m. > > > > >> To: Smithies, Russell > > > > >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org' > > > > >> Subject: Re: [Bioperl-l] how to retrieve organism name from > > accession > > > > >> number? > > > > >> > > > > >> It's unfortunate but I have heard this problem popping up quite a > > bit > > > > more > > > > >> frequently lately. Not to push too many buttons but NCBI isn't > > very > > > > >> forthcoming with help these days; they have become quite insular. > > Not > > > > >> sure if they're short-staffed due to budget or if there are other > > > > issues. > > > > >> > > > > >> chris > > > > >> > > > > >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote: > > > > >> > > > > >>> Grrrrrr, I hate eutils!!!! > > > > >>> > > > > >>> ------------- EXCEPTION: Bio::Root::Exception ------------- > > > > >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 > > > > >> (Connection refused) > > > > >>> STACK: Error::throw > > > > >>> STACK: Bio::Root::Root::throw > > > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 > > > > >>> STACK: Bio::Tools::EUtilities::parse_data > > > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 > > > > >>> STACK: Bio::Tools::EUtilities::get_ids > > > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 > > > > >>> STACK: Bio::DB::EUtilities::get_ids > > > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 > > > > >>> STACK: get_desc.pl:32 > > > > >>> ----------------------------------------------------------- > > > > >>> > > > > >>> > > > > >>> Nice error message though :-) > > > > >>> > > > > >>> > > > > >>> --Russell > > > > >>> > > > > >>>> -----Original Message----- > > > > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > > >>>> 
bounces at lists.open-bio.org] On Behalf Of Smithies, Russell > > > > >>>> Sent: Monday, 11 January 2010 10:05 a.m. > > > > >>>> To: 'Chris Fields' > > > > >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open- > > > > bio.org' > > > > >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from > > accession > > > > >>>> number? > > > > >>>> > > > > >>>> I've started to go off eUtils recently (not BioPerl's fault) as > > I've > > > > >> often > > > > >>>> been finding that with large queries, chunks of the resulting > > data is > > > > >>>> missing. > > > > >>>> For example, before Xmas I was creating species-specific > > databases by > > > > >>>> using eUtils to get a list of GI numbers back for a taxid, then > > > > >> retrieving > > > > >>>> the fasta sequences in chunks of 500. > > > > >>>> Very regularly, in the middle of the fasta there would be a > > message > > > > >> about > > > > >>>> resource unavailable eg. > > > > >>>>> test_sequence_1 > > > > >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT > > > > >>>>> test_sequence_2 > > > > >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT > > > > >>>> > > > > >>>> Often this wasn't detected until formatdb complained about > > invalid > > > > >>>> characters. > > > > >>>> Inquiries to NCBI as to why this was happening and what to do > > about > > > > it > > > > >>>> returned stupid answers ("do each sequence manually thru the web > > > > >>>> interface", or "use eUtils"). > > > > >>>> As we have a nice fast network connection, I now prefer to > > download > > > > >> very > > > > >>>> large gzip files (i.e. all of refseq) and extract what I need. > > > > >>>> > > > > >>>> I can't help but think that NCBI could solve a lot of problems if > > > > they > > > > >>>> gzipped the output from eUtils queries - it's something I've > > > > requested > > > > >>>> regularly for the last 5 years or so!! 
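The corruption described above (an error string landing in the middle of a sequence and only being noticed when formatdb complains) can be caught earlier with a simple character check. This is a minimal sketch, not part of BioPerl: it assumes nucleotide FASTA, treats anything outside the IUPAC nucleotide codes, `-` and `*` as suspicious, and the helper name `find_bad_fasta_lines` is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a list of "record_id: line N" strings for sequence lines that
# contain characters outside the IUPAC nucleotide alphabet.
# Assumption: the input is nucleotide FASTA held in memory as one string.
sub find_bad_fasta_lines {
    my ($fasta_text) = @_;
    my @problems;
    my $current_id = '(no header)';
    my $line_no    = 0;
    for my $line (split /\n/, $fasta_text) {
        $line_no++;
        $line =~ s/\r$//;                 # tolerate DOS line endings
        next if $line =~ /^\s*$/;         # skip blank lines
        if ($line =~ /^>(\S*)/) {         # header line: remember the id
            $current_id = $1;
            next;
        }
        # Spaces, lowercase prose and other stray text all fail this test
        if ($line =~ /[^ACGTUNRYKMSWBDHVacgtunrykmswbdhv*-]/) {
            push @problems, "$current_id: line $line_no";
        }
    }
    return @problems;
}

# Sample data modelled on the example quoted in the thread
my $sample = join "\n",
    '>test_sequence_1',
    'TACGATCATCGCTResource UnavailableTACGACTCTGCT',
    '>test_sequence_2',
    'TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT',
    '';

print "$_\n" for find_bad_fasta_lines($sample);
# prints: test_sequence_1: line 2
```

Running a check like this on each downloaded chunk, before the chunks are concatenated, makes it possible to re-request only the chunk that came back damaged.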
> > > > >>>> > > > > >>>> --Russell > > > > >>>> > > > > >>>> > > > > >>>>> -----Original Message----- > > > > >>>>> From: Chris Fields [mailto:cjfields at illinois.edu] > > > > >>>>> Sent: Monday, 11 January 2010 9:50 a.m. > > > > >>>>> To: Smithies, Russell > > > > >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open- > > > > bio.org' > > > > >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from > > > > accession > > > > >>>>> number? > > > > >>>>> > > > > >>>>> One could also use Bio::DB::Taxonomy, which indexes the same > > files > > > > or > > > > >>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD > > for > > > > >> the > > > > >>>>> details). > > > > >>>>> > > > > >>>>> chris > > > > >>>>> > > > > >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote: > > > > >>>>> > > > > >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's > > > > >>>> flakiness > > > > >>>>> lately) would be to download the gi_taxid_nucl.zip or > > > > >> gi_taxid_prot.zip > > > > >>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into > > a > > > > hash > > > > >>>> and > > > > >>>>> do lookups. > > > > >>>>>> In that same dir, taxdump.tar.gz contains a file called > > names.dmp > > > > >>>> which > > > > >>>>> lists taxids and descriptions (and synonyms) > > > > >>>>>> > > > > >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes > > so > > > > I > > > > >>>>> could do this: > > > > >>>>>> > > > > >>>>>> my $taxid = $gi_taxid_nucl{$accession}; > > > > >>>>>> my $org_name = $names{$taxid}; > > > > >>>>>> > > > > >>>>>> --Russell > > > > >>>>>> > > > > >>>>>> > > > > >>>>>>> -----Original Message----- > > > > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > > >>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > > > > >>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m. 
> > > > >>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org > > > > >>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from > > > > >> accession > > > > >>>>>>> number? > > > > >>>>>>> > > > > >>>>>>> Bhakti, > > > > >>>>>>> The following example (using EUtilities) may serve your > > purpose: > > > > >>>>>>> > > > > >>>>>>> use Bio::DB::EUtilities; > > > > >>>>>>> > > > > >>>>>>> my (%taxa, @taxa); > > > > >>>>>>> my (%names, %idmap); > > > > >>>>>>> > > > > >>>>>>> # these are protein ids; nuc ids will work by changing -dbfrom > > => > > > > >>>>>>> 'nucleotide', > > > > >>>>>>> # (probably) > > > > >>>>>>> > > > > >>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439); > > > > >>>>>>> > > > > >>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink', > > > > >>>>>>> -db => 'taxonomy', > > > > >>>>>>> -dbfrom => 'protein', > > > > >>>>>>> -correspondence => 1, > > > > >>>>>>> -id => \@ids); > > > > >>>>>>> > > > > >>>>>>> # iterate through the LinkSet objects > > > > >>>>>>> while (my $ds = $factory->next_LinkSet) { > > > > >>>>>>> $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0] > > > > >>>>>>> } > > > > >>>>>>> > > > > >>>>>>> @taxa = @taxa{@ids}; > > > > >>>>>>> > > > > >>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', > > > > >>>>>>> -db => 'taxonomy', > > > > >>>>>>> -id => \@taxa ); > > > > >>>>>>> > > > > >>>>>>> while (local $_ = $factory->next_DocSum) { > > > > >>>>>>> $names{($_->get_contents_by_name('TaxId'))[0]} = > > > > >>>>>>> ($_->get_contents_by_name('ScientificName'))[0]; > > > > >>>>>>> } > > > > >>>>>>> > > > > >>>>>>> foreach (@ids) { > > > > >>>>>>> $idmap{$_} = $names{$taxa{$_}}; > > > > >>>>>>> } > > > > >>>>>>> > > > > >>>>>>> # %idmap is > > > > >>>>>>> # 1621261 => 'Mycobacterium tuberculosis H37Rv' > > > > >>>>>>> # 20807972 => 'Thermoanaerobacter tengcongensis MB4' > > > > >>>>>>> # 68536103 => 'Corynebacterium jeikeium K411' > > > > >>>>>>> # 730439 => 'Bacillus caldolyticus' > 
> > > >>>>>>> # 89318838 => undef (this record has been removed from > > the > > > > db) > > > > >>>>>>> > > > > >>>>>>> 1; > > > > >>>>>>> > > > > >>>>>>> You probably will need to break up your 30000 into chunks > > > > >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a > > > > >>>>>>> > > > > >>>>>>> sleep 3; > > > > >>>>>>> > > > > >>>>>>> or so separating the queries. > > > > >>>>>>> MAJ > > > > >>>>>>> ----- Original Message ----- > > > > >>>>>>> From: "Bhakti Dwivedi" > > > > >>>>>>> To: > > > > >>>>>>> Sent: Friday, December 25, 2009 9:46 PM > > > > >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from > > accession > > > > >>>>> number? > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>>> Hi, > > > > >>>>>>>> > > > > >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species > > > > name" > > > > >>>>>>> given > > > > >>>>>>>> the accession number using Bioperl. I have these 30,000 > > > > accession > > > > >>>>>>> numbers > > > > >>>>>>>> for which I need to get the source organisms. Any kind of > > help > > > > >> will > > > > >>>>> be > > > > >>>>>>>> appreciated. 
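The batching advice quoted above (split the 30,000 IDs into chunks of 1000-3000 and pause between queries) can be sketched in plain Perl. The names `chunk_ids` and `lookup_names` are hypothetical; `lookup_names` is only a stand-in that prints what it would query, where the real version would run the elink/esummary steps from the quoted example on one chunk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a list of IDs into array refs holding at most $size IDs each.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "chunk size must be positive\n" unless $size && $size > 0;
    my @chunks;
    push @chunks, [ splice(@ids, 0, $size) ] while @ids;
    return @chunks;
}

# Hypothetical stand-in for the per-chunk elink/esummary lookup.
sub lookup_names {
    my ($ids) = @_;
    printf "would query %d ids (%s .. %s)\n",
        scalar @$ids, $ids->[0], $ids->[-1];
}

my @accessions = (1 .. 7000);   # placeholder IDs, not real accessions
my $pause      = 0;             # set to 3 for real queries, as suggested above

for my $chunk (chunk_ids(3000, @accessions)) {
    lookup_names($chunk);
    sleep $pause if $pause;     # spacing between requests
}
```

With 7000 placeholder IDs and a chunk size of 3000 this makes three calls, the last one with 1000 IDs.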
> > > > >>>>>>>> > > > > >>>>>>>> Thanks > > > > >>>>>>>> > > > > >>>>>>>> BD From maj at fortinbras.us Thu Jan 28 14:55:31 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 28 Jan 2010 14:55:31 -0500 Subject: [Bioperl-l] EUtilities policy change In-Reply-To: <1264706302.5473.48.camel@cjfields.igb.uiuc.edu> References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu><8F49B5ED151143FA86E977B4D4F44265@NewLife> <1264706302.5473.48.camel@cjfields.igb.uiuc.edu> Message-ID: Ok, SoapEU now warns on no email; passes email onto the fetch stage during autofetch -- cheers MAJ ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "BioPerl-l" Sent: Thursday, January 28, 2010 2:18 PM Subject: Re: [Bioperl-l] EUtilities policy change >I think warning is fine for now. I've reimplemented that so it occurs > lazily (warns only when a request is actually made). > > Will also change the tool to 'BioPerl' (currently 'bioperl', all lc).
> We'll obviously have to address this in the test suite as well in some > way, maybe ask for an email if network tests are requested. > > chris > > On Thu, 2010-01-28 at 14:05 -0500, Mark A. Jensen wrote: >> Thanks Chris-- >> The soap modules currently set tool to "SoapEUtilities(BioPerl)". >> I agree that a default email is a bad idea (tm) (unless maybe it's >> hilmar's...?). I'd say a warning on unset email parameters is a responsible >> "there be dragons" sort of treatment. >> MAJ >> ----- Original Message ----- >> From: "Chris Fields" >> To: "BioPerl-l" >> Cc: "Mark A. Jensen" >> Sent: Thursday, January 28, 2010 2:00 PM >> Subject: EUtilities policy change >> >> >> > All, >> > >> > Per NCBI's recent change in eutils user policy (effective June 1): >> > >> > http://bioperl.org/pipermail/bioperl-l/2009-December/031698.html >> > >> > Both the tool and email parameters ('-tool', '-email') are now required >> > when making requests. Note this will significantly break all modules >> > requiring remote access to eutils (Bio::DB::GenBank/GenPept, some Biblio >> > and Taxonomy stuff as well, IIRC). This also applies to web services >> > (SOAP-based access). Mark, not sure how this affects your SOAP-based >> > modules. >> > >> > I have reconfigured Bio::DB::EUtilities to follow this policy; the >> > default tool setting has been 'bioperl' and will remain that way. >> > However, there has been no default email, therefore setting this is now >> > required for future requests unless we (the bioperl devs) decide there >> > is a safe default email to utilize. My gut tells me, however, that >> > falling back to a default email opens up a can of worms for the devs and >> > is very likely a 'BAD IDEA'(TM). >> > >> > Regardless, be aware that, after June 1, NCBI will very likely exclude >> > requests with no email and will notify users who are considered to be >> > violating their policies. 
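To make the policy above concrete, here is a minimal sketch of a request builder that refuses to produce an esearch URL unless both tool and email are supplied. It works at the raw URL level and is not how Bio::DB::EUtilities is implemented; the helper names, the tool name and the email address are placeholders, and dying (where the thread settles on warning) is an assumption made only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 unreserved characters.
sub uri_escape_simple {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Build an esearch URL; db, term, tool and email are all mandatory here.
sub esearch_url {
    my (%arg) = @_;
    for my $required (qw(db term tool email)) {
        die "missing required parameter '$required'\n"
            unless defined $arg{$required} && length $arg{$required};
    }
    my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';
    my @pairs;
    for my $key (qw(db term tool email)) {
        push @pairs, $key . '=' . uri_escape_simple($arg{$key});
    }
    return $base . '?' . join('&', @pairs);
}

print esearch_url(
    db    => 'nucest',
    term  => 'clupeocephala[Organism]',
    tool  => 'my_est_fetcher',       # hypothetical tool name
    email => 'user@example.org',     # placeholder address
), "\n";

# Leaving out the email is rejected before any request could be sent
my $ok = eval {
    esearch_url(db => 'nucest', term => 'x', tool => 'my_est_fetcher');
    1;
};
print "refused: $@" unless $ok;
```

When going through BioPerl, the equivalent is to pass the '-tool' and '-email' parameters named in the message above when constructing the EUtilities object.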
>> > >> > I will likely make further changes to Bio::DB::EUtilities in the >> > meantime to ensure that using the tools by default will not violate >> > NCBI's policy (e.g. override this at your own risk). >> > >> > chris >> > >> > >> > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From chapmanb at 50mail.com Thu Jan 28 15:35:05 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Thu, 28 Jan 2010 15:35:05 -0500 Subject: [Bioperl-l] OpenBio solution challenge: Project updates at BOSC 2010 Message-ID: <20100128203505.GG40046@sobchak.mgh.harvard.edu> Hello all; The BOSC 2010 organizing committee is hard at work getting prepared for this July's meeting in Boston: http://www.open-bio.org/wiki/BOSC_2010 One of the items we've traditionally had at the conference is a project update from each of the OpenBio affiliated groups. This year, we're thinking about organizing these talks around a central theme: the OpenBio solution challenge. We start with a biological question of general interest, and each of the project talks would focus around how you would solve that problem using your toolkit and programming language. This is meant to provide a challenge for OpenBio contributors, a nice tutorial style overview of various projects and approaches for other programmers, and a fun opportunity to compete and learn from other projects. Conference attendees will vote on their favorite solution, with the winner receiving fame and fortune (warning: fortune not guaranteed). For this to be successful, it of course requires interest and enthusiasm from y'all fine folks involved with the projects. Specifically: - Is there interest from your group in participating in the challenge? You'll want at least a few people to work on it, and someone to give a presentation at BOSC. - Do you have suggestions on a good theme or specific biological problem to tackle? 
We'll hope to pick something in a sweet spot that is challenging enough to be of interest, yet reasonable for presentation and preparation. Let's discuss ideas and get this together. Since the schedule for BOSC is developing rapidly, please give us an idea if you're interested by February 12th, and copy responses to the BOSC mailing list as a central place for discussion. bosc at open-bio.org Thanks, Brad, Michael, and the BOSC organizing committee From markw at illuminae.com Thu Jan 28 16:17:44 2010 From: markw at illuminae.com (Mark Wilkinson) Date: Thu, 28 Jan 2010 13:17:44 -0800 Subject: [Bioperl-l] [MOBY-dev] OpenBio solution challenge: Project updates at BOSC 2010 In-Reply-To: <20100128203505.GG40046@sobchak.mgh.harvard.edu> References: <20100128203505.GG40046@sobchak.mgh.harvard.edu> Message-ID: Brad, this sounds exciting! One thing strikes me, though - by asking for the sub-projects to propose the "grand challenge" themselves the one thing you can guarantee is that the "grand challenge" is solvable (or more likely, already solved!) Other "grand challenge" kinds of meetings have an independent third party pose the problem that has to be solved, and then all groups work toward a solution and compare their results. This would, IMO, be more revealing of the "state of the art" in each Open-Bio project, and point out where the weaknesses are that we should be focusing on... Someone (for example, you!) could act as the moderator to ensure that the "grand challenge" was at least a reasonable one, within the scope of what an Open-Bio project *should* be able to solve... Just my CAD $0.02 Mark On Thu, 28 Jan 2010 12:35:05 -0800, Brad Chapman wrote: > Hello all; > The BOSC 2010 organizing committee is hard at work getting prepared for > this > July's meeting in Boston: > > http://www.open-bio.org/wiki/BOSC_2010 > > One of the items we've traditionally had at the conference is a project > update from each of the OpenBio affiliated groups. 
This year, we're > thinking > about organizing these talks around a central theme: the OpenBio solution > challenge. We start with a biological question of general interest, and > each > of the project talks would focus around how you would solve that problem > using your toolkit and programming language. > > This is meant to provide a challenge for OpenBio contributors, a nice > tutorial > style overview of various projects and approaches for other programmers, > and a > fun opportunity to compete and learn from other projects. Conference > attendees > will vote on their favorite solution, with the winner receiving fame and > fortune (warning: fortune not guaranteed). > > For this to be successful, it of course requires interest and enthusiasm > from > y'all fine folks involved with the projects. Specifically: > > - Is there interest from your group in participating in the challenge? > You'll > want at least a few people to work on it, and someone to give a > presentation > at BOSC. > > - Do you have suggestions on a good theme or specific biological problem > to > tackle? We'll hope to pick something in a sweet spot that is > challenging > enough to be of interest, yet reasonable for presentation and > preparation. > > Let's discuss ideas and get this together. Since the schedule for BOSC is > developing rapidly, please give us an idea if you're interested by > February 12th, and copy responses to the BOSC mailing list as a central > place for discussion. > > bosc at open-bio.org > > Thanks, > Brad, Michael, and the BOSC organizing committee > _______________________________________________ > MOBY-dev mailing list > MOBY-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/moby-dev -- Mark D Wilkinson, PI Bioinformatics Assistant Professor, Medical Genetics The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research Providence Heart + Lung Institute University of British Columbia - St. 
Paul's Hospital Vancouver, BC, Canada From HWillis at scripps.edu Thu Jan 28 20:03:10 2010 From: HWillis at scripps.edu (Scooter Willis) Date: Thu, 28 Jan 2010 20:03:10 -0500 Subject: [Bioperl-l] [Biojava-dev] [MOBY-dev] OpenBio solution challenge: Project updates at BOSC 2010 In-Reply-To: References: <20100128203505.GG40046@sobchak.mgh.harvard.edu> Message-ID: <716E205A-5196-409F-A7BC-EF0F52AA997A@scripps.edu> Brad I agree with Mark that a particular problem may be biased towards a toolkit/language. Another approach would be to list a collection of problems and each group would then pick a problem to present. Could be a little more interesting to the audience as you are exposed to different problems and the various strengths of each toolkit. This could also help guide future development in the other toolkits as you would benefit from learning about the api and/or programming language. Each group would register a problem that they are going to present. From the group of problems not picked that becomes the surprise challenge where each group has 24 hours to either put together a presentation or an actual solution. Scooter On Jan 28, 2010, at 4:17 PM, Mark Wilkinson wrote: > > Brad, this sounds exciting! > > One thing strikes me, though - by asking for the sub-projects to propose > the "grand challenge" themselves the one thing you can guarantee is that > the "grand challenge" is solvable (or more likely, already solved!) > > Other "grand challenge" kinds of meetings have an independent third party > pose the problem that has to be solved, and then all groups work toward a > solution and compare their results. This would, IMO, be more revealing of > the "state of the art" in each Open-Bio project, and point out where the > weaknesses are that we should be focusing on... Someone (for example, > you!) 
could act as the moderator to ensure that the "grand challenge" was > at least a reasonable one, within the scope of what an Open-Bio project > *should* be able to solve... > > Just my CAD $0.02 > > Mark > > > > On Thu, 28 Jan 2010 12:35:05 -0800, Brad Chapman > wrote: > >> Hello all; >> The BOSC 2010 organizing committee is hard at work getting prepared for >> this >> July's meeting in Boston: >> >> http://www.open-bio.org/wiki/BOSC_2010 >> >> One of the items we've traditionally had at the conference is a project >> update from each of the OpenBio affiliated groups. This year, we're >> thinking >> about organizing these talks around a central theme: the OpenBio solution >> challenge. We start with a biological question of general interest, and >> each >> of the project talks would focus around how you would solve that problem >> using your toolkit and programming language. >> >> This is meant to provide a challenge for OpenBio contributors, a nice >> tutorial >> style overview of various projects and approaches for other programmers, >> and a >> fun opportunity to compete and learn from other projects. Conference >> attendees >> will vote on their favorite solution, with the winner receiving fame and >> fortune (warning: fortune not guaranteed). >> >> For this to be successful, it of course requires interest and enthusiasm >> from >> y'all fine folks involved with the projects. Specifically: >> >> - Is there interest from your group in participating in the challenge? >> You'll >> want at least a few people to work on it, and someone to give a >> presentation >> at BOSC. >> >> - Do you have suggestions on a good theme or specific biological problem >> to >> tackle? We'll hope to pick something in a sweet spot that is >> challenging >> enough to be of interest, yet reasonable for presentation and >> preparation. >> >> Let's discuss ideas and get this together. 
Since the schedule for BOSC is >> developing rapidly, please give us an idea if you're interested by >> February 12th, and copy responses to the BOSC mailing list as a central >> place for discussion. >> >> bosc at open-bio.org >> >> Thanks, >> Brad, Michael, and the BOSC organizing committee >> _______________________________________________ >> MOBY-dev mailing list >> MOBY-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/moby-dev > > > -- > Mark D Wilkinson, PI Bioinformatics > Assistant Professor, Medical Genetics > The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research > Providence Heart + Lung Institute > University of British Columbia - St. Paul's Hospital > Vancouver, BC, Canada > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From biopython at maubp.freeserve.co.uk Fri Jan 29 05:36:40 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 29 Jan 2010 10:36:40 +0000 Subject: [Bioperl-l] [MOBY-dev] OpenBio solution challenge: Project updates at BOSC 2010 In-Reply-To: References: <20100128203505.GG40046@sobchak.mgh.harvard.edu> Message-ID: <320fb6e01001290236l1ad02515w403a19f94dbb6d15@mail.gmail.com> Hi all, This is a great topic but should we continue it on just the one mailing list? Is there a suitable BOSC list, or how about the general Open Bio list? On Thu, Jan 28, 2010 at 9:17 PM, Mark Wilkinson wrote: > > Brad, this sounds exciting! > > One thing strikes me, though - by asking for the sub-projects to propose > the "grand challenge" themselves the one thing you can guarantee is that > the "grand challenge" is solvable (or more likely, already solved!) > > Other "grand challenge" kinds of meetings have an independent third party > pose the problem that has to be solved, and then all groups work toward a > solution and compare their results.
This would, IMO, be more revealing of > the "state of the art" in each Open-Bio project, and point out where the > weaknesses are that we should be focusing on... Someone (for example, > you!) could act as the moderator to ensure that the "grand challenge" was > at least a reasonable one, within the scope of what an Open-Bio project > *should* be able to solve... > > Just my CAD $0.02 > > Mark One possible problem with having Brad act as moderator is his ties to Biopython (plus it would be a shame if we'd be one man down for trying to solve the challenges - grin). Having a project representative "sign off" on the challenge might work - or simply the whole of the BOSC committee which is quite balanced. Alternatively some kind of panel of challenges does seem a good way to reduce individual project bias (as suggested by Scooter), but there will still need to be a judging committee. I'm curious what kind of challenges the BOSC committee had in mind - would something like taking a newly sequenced bacterium and producing an automated annotation as a GenBank, EMBL, or GFF file be too ambitious for example? There are already several major projects to do this e.g. RAST http://rast.nmpdr.org/ Peter (@Biopython) From mike.stubbington at bbsrc.ac.uk Fri Jan 29 08:25:25 2010 From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI)) Date: Fri, 29 Jan 2010 13:25:25 +0000 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn In-Reply-To: References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife> <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk> Message-ID: Hi Mark, Thanks for your continued help.
It now fails with this: ------------- EXCEPTION ------------- MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : BLAST Database error: No alias or index file found for nucleotide database [MouseGenome] in search path [/Volumes/stubbing/PerlScripts/5CTest/trunk::] STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1004 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 STACK toplevel ./5CTest.pl:63 ------------------------------------- If I change the factory creation to: my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => '/Users/stubbing/localBlast/MouseGenome' ); it fails with ------------- EXCEPTION ------------- MSG: DB name not valid STACK Bio::Tools::Run::StandAloneBlastPlus::new /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:516 STACK toplevel ./5CTest.pl:45 ------------------------------------- However I can run the following successfully from the command line: blastn -db /Users/stubbing/localBlast/MouseGenome -query querySequence.fasta Is there something wrong with how I'm referring to the blast database when I construct my factory? Thanks again, M On 28 Jan 2010, at 18:47, Mark A. Jensen wrote: > Hi Mike, > Believe I found the real bug causing the problem (was not accounting for > the db_dir parameter). Crashes should now also throw much more helpful > errors. Please try the code at r16774, and shout back. > thanks -- > MAJ > ----- Original Message ----- > From: "mike stubbington (BI)" > To: "Mark A. 
Jensen" > Cc: > Sent: Thursday, January 28, 2010 11:18 AM > Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek > error running blastn > > > Hi, > > Thanks for the suggestion. Unfortunately it still fails - error as follows: > > ------------- EXCEPTION ------------- > MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running > /usr/local/ncbi/blast/bin/blastn : Illegal seek at > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, > line 532. > > STACK Bio::Tools::Run::WrapperBase::_run > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 > STACK Bio::Tools::Run::StandAloneBlastPlus::run > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 > STACK toplevel ./5CTest.pl:63 > ------------------------------------- > > M > > On 28 Jan 2010, at 15:56, Mark A. Jensen wrote: > >> Mike - please try updating your bioperl-live (the core) to the latest code >> (revision 16761 or so). >> CommandExts is a work in progress; from the stack errors it looks like you've >> got an older version. >> Try it then ping us back, if you would-- >> Thanks >> Mark >> ----- Original Message ----- >> From: "mike stubbington (BI)" >> To: >> Sent: Thursday, January 28, 2010 10:41 AM >> Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error >> running blastn >> >> >> Dear all, >> >> I am attempting to blast some primers against the mouse genome. I have created >> a >> local mouse genome blast database and I can search against it using 'blastn' >> at >> the command line. 
>> >> I have perl code that creates an array of bioperl sequence objects called >> @primers >> >> I then create a StandAloneBlastPlus factory using the following code... >> >> my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new( >> -db_dir => '/Users/stubbing/localBlast/', >> -db_name => 'MouseGenome' >> ); >> >> and then attempt to blast my primers using this... >> >> my @shortPrimers; >> my $count=1; >> foreach (@primers) { >> my $currentSeq = $_; >> print "Checking primer $count/$primerNumber "; >> if ($_->length < 40) { >> push(@shortPrimers,$_); >> print "Too short!\n"; >> } >> else { >> print "BLASTing..."; >> my $blastResult = $blastFactory->blastn(-query => $currentSeq); >> } >> $count++; >> } >> >> This fails with the following error... >> >> ------------- EXCEPTION ------------- >> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem >> running >> /usr/local/ncbi/blast/bin/blastn : Illegal seek at >> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, >> line 532. >> >> STACK Bio::Tools::Run::WrapperBase::_run >> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994 >> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD >> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 >> STACK Bio::Tools::Run::StandAloneBlastPlus::run >> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 >> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD >> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 >> STACK toplevel ./5CTest.pl:63 >> ------------------------------------- >> >> Line 63 in my code is (as you might expect) the one that calls blastn on my >> factory object. >> >> I'd appreciate any help you might be able to provide to shed light on this.
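Since the factory above takes the database location as a separate -db_dir and -db_name, a small pre-flight check can confirm that the pair points at a real nucleotide database before any BLAST call is made. This is a sketch under assumptions, not part of StandAloneBlastPlus: the helper names are made up, and the suffixes tested (.nin for a single-volume index, .00.nin for the first volume of a multi-volume database, .nal for an alias file) reflect the usual BLAST nucleotide database layout.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Split a full database path into (directory, name), i.e. the values
# that would be passed as -db_dir and -db_name.
sub split_db_path {
    my ($path) = @_;
    my ($name, $dir) = fileparse($path);
    return ($dir, $name);
}

# True if an index or alias file for nucleotide database $name exists in $dir.
sub nucleotide_db_exists {
    my ($dir, $name) = @_;
    for my $suffix (qw(.nin .00.nin .nal)) {
        return 1 if -e File::Spec->catfile($dir, $name . $suffix);
    }
    return 0;
}

my ($dir, $name) = split_db_path('/Users/stubbing/localBlast/MouseGenome');
print "db_dir  = $dir\n";
print "db_name = $name\n";
print nucleotide_db_exists($dir, $name)
    ? "index or alias file found\n"
    : "no index or alias file for '$name' under $dir\n";
```

If the check passes but the wrapper still cannot find the database, the problem is more likely in how the wrapper passes the path to blastn than in the database itself.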
>> >> Thanks in advance, >> >> Mike >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Fri Jan 29 08:36:54 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 29 Jan 2010 08:36:54 -0500 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn In-Reply-To: References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife> <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk> Message-ID: Hi Mike- Well, at least we're getting more informative errors. I think it's still my bad; will look again. Both of your calls should work. (thanks for the positive control too) Thanks for your patience and the help-- MAJ ----- Original Message ----- From: "mike stubbington (BI)" To: "Mark A. Jensen" Cc: ; "Brian Osborne" Sent: Friday, January 29, 2010 8:25 AM Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn Hi Mark, Thanks for your continued help. 
It now fails with this: ------------- EXCEPTION ------------- MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : BLAST Database error: No alias or index file found for nucleotide database [MouseGenome] in search path [/Volumes/stubbing/PerlScripts/5CTest/trunk::] STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1004 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 STACK toplevel ./5CTest.pl:63 ------------------------------------- If I change the factory creation to: my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => '/Users/stubbing/localBlast/MouseGenome' ); it fails with ------------- EXCEPTION ------------- MSG: DB name not valid STACK Bio::Tools::Run::StandAloneBlastPlus::new /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:516 STACK toplevel ./5CTest.pl:45 ------------------------------------- However I can run the following successfully from the command line: blastn -db /Users/stubbing/localBlast/MouseGenome -query querySequence.fasta Is there something wrong with how I'm referring to the blast database when I construct my factory? Thanks again, M On 28 Jan 2010, at 18:47, Mark A. Jensen wrote: > Hi Mike, > Believe I found the real bug causing the problem (was not accounting for > the db_dir parameter). Crashes should now also throw much more helpful > errors. Please try the code at r16774, and shout back. > thanks -- > MAJ > ----- Original Message ----- > From: "mike stubbington (BI)" > To: "Mark A. 
Jensen" > Cc: > Sent: Thursday, January 28, 2010 11:18 AM > Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek > error running blastn > > > Hi, > > Thanks for the suggestion. Unfortunately it still fails - error as follows: > > ------------- EXCEPTION ------------- > MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem > running > /usr/local/ncbi/blast/bin/blastn : Illegal seek at > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, > > line 532. > > STACK Bio::Tools::Run::WrapperBase::_run > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 > STACK Bio::Tools::Run::StandAloneBlastPlus::run > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 > STACK toplevel ./5CTest.pl:63 > ------------------------------------- > > M > > On 28 Jan 2010, at 15:56, Mark A. Jensen wrote: > >> Mike - please try updating your bioperl-live (the core) to the latest code >> (revision 16761 or so). >> CommandExts is a work in progress; from the stack errors it looks like you've >> got an older version. >> Try it then ping us back, if you would-- >> Thanks >> Mark >> ----- Original Message ----- >> From: "mike stubbington (BI)" >> To: >> Sent: Thursday, January 28, 2010 10:41 AM >> Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek >> error >> running blastn >> >> >> Dear all, >> >> I am attempting to blast some primers against the mouse genome. I have >> created >> a >> local mouse genome blast database and I can search against it using 'blastn' >> at >> the command line. 
>> >> I have perl code that creates an array of bioperl sequence objects called >> @primers >> >> I then create a StandAloneBlastPlus factory using the following code? >> >> my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new( >> -db_dir => '/Users/stubbing/localBlast/', >> -db_name => 'MouseGenome' >> ); >> >> and then attempt to blast my primers using this? >> >> my @shortPrimers; >> my $count=1; >> foreach (@primers) { >> my $currentSeq = $_; >> print "Checking primer $count/$primerNumber "; >> if ($_->length < 40) { >> push(@shortPrimers,$_); >> print "Too short!\n"; >> } >> else { >> print "BLASTing..."; >> my $blastResult = $blastFactory->blastn(-query => $currentSeq); >> } >> $count++; >> } >> >> This fails with the following error? >> >> ------------- EXCEPTION ------------- >> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem >> running >> /usr/local/ncbi/blast/bin/blastn : Illegal seek at >> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, >> >> line 532. >> >> STACK Bio::Tools::Run::WrapperBase::_run >> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994 >> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD >> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 >> STACK Bio::Tools::Run::StandAloneBlastPlus::run >> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 >> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD >> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 >> STACK toplevel ./5CTest.pl:63 >> ------------------------------------- >> >> Line 63 in my code is (as you might expect) the one that calls blastn on my >> factory object. >> >> I'd appreciate any help you might be able to provide to shed light on this. 
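A minimal sketch of the database-location check behind the "No alias or index file found" error quoted above. It assumes a nucleotide database whose index (.nin) or alias (.nal) file sits in the directory given on the command line. The helper names split_db_path and nucl_db_present are made up for illustration and are not part of BioPerl. Only core modules are used, and the factory call is left as a comment because it needs BioPerl-Run and a local BLAST+ install. This does not work around the wrapper problem discussed in the thread; it only confirms that the files are where -db_dir says they are.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Split a full database path (as passed to `blastn -db`) into the
# directory and the bare database name.
sub split_db_path {
    my ($path) = @_;
    my ($name, $dir) = fileparse($path);
    $dir =~ s{/+$}{} unless $dir eq '/';    # drop the trailing slash
    return ($dir, $name);
}

# True if a nucleotide index (.nin) or alias (.nal) file exists for the
# database name in the given directory.
sub nucl_db_present {
    my ($dir, $name) = @_;
    for my $ext (qw(nin nal)) {
        return 1 if -e File::Spec->catfile($dir, "$name.$ext");
    }
    return 0;
}

my ($dir, $name) = split_db_path('/Users/stubbing/localBlast/MouseGenome');
print "db_dir=$dir db_name=$name\n";
# prints: db_dir=/Users/stubbing/localBlast db_name=MouseGenome

print nucl_db_present($dir, $name)
    ? "index or alias file found\n"
    : "no $name.nin or $name.nal under $dir\n";

# With BioPerl-Run installed, the two pieces would then be handed to the
# factory in the same way as in the original post:
#   Bio::Tools::Run::StandAloneBlastPlus->new(-db_dir  => $dir,
#                                             -db_name => $name);
```

Running the check before constructing the factory separates "the database is missing" from "the wrapper is not passing the directory through", which is the distinction the thread is trying to make.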
>> >> Thanks in advance, >> >> Mike >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Fri Jan 29 08:47:48 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 29 Jan 2010 08:47:48 -0500 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn In-Reply-To: References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife><05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk> Message-ID: <2B7BF6CD46AE441AB24203E169D9C503@NewLife> Mike et al-- I've entered this as Bug #3003 on http://bugzilla.bioperl.org; we'll do further ping-pongs on this issue via the comment facility there-- cheers MAJ ----- Original Message ----- From: "mike stubbington (BI)" To: "Mark A. Jensen" Cc: ; ; "Osborne" Sent: Friday, January 29, 2010 8:25 AM Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn Hi Mark, Thanks for your continued help. 
From help at gmod.org Fri Jan 29 17:03:48 2010
From: help at gmod.org (Dave Clements, GMOD Help Desk)
Date: Fri, 29 Jan 2010 14:03:48 -0800
Subject: [Bioperl-l] 2010 GMOD Summer School - Americas
In-Reply-To: <71ee57c71001291402h2ec67300r4fc7a3b2375f4080@mail.gmail.com>
References: <71ee57c71001291351q47994b82w10dffb390dbf2837@mail.gmail.com> <71ee57c71001291354m68548823s3e3fbd2e49e9b332@mail.gmail.com> <71ee57c71001291356p5e7f1aadi2bf437c93014a393@mail.gmail.com> <71ee57c71001291357h67112e2fkcf835687e59f66ae@mail.gmail.com> <71ee57c71001291358k74781b08n232534d8895c5ec1@mail.gmail.com> <71ee57c71001291400y28e40eb6i112ea91df977dc67@mail.gmail.com> <71ee57c71001291400n6133982eh3a02293ff741900b@mail.gmail.com> <71ee57c71001291401y505b56baic61c11754d88a444@mail.gmail.com> <71ee57c71001291402s23e3f2e9w2562d6acf85bd4ae@mail.gmail.com> <71ee57c71001291402h2ec67300r4fc7a3b2375f4080@mail.gmail.com>
Message-ID: <71ee57c71001291403s19be18f3s3a1d5a314c74def@mail.gmail.com>

Hello all,

I am pleased to announce that we are now accepting applications for:

   2010 GMOD Summer School - Americas
   6-9 May 2010
   NESCent, Durham, NC, USA
   http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas

This will be a hands-on multi-day course aimed at teaching new GMOD
users/administrators how to get GMOD Components up and running.
The course will introduce participants to the GMOD project and then focus
on installation, configuration and integration of popular GMOD Components.
The course will be held May 6-9, at NESCent in Durham, NC.

These components will be covered:

   * Apollo - genome annotation editor
   * Chado - a modular and extensible database schema
   * Galaxy - workflow system
   * GBrowse - the Generic Genome Browser
   * GBrowse_syn - a generic synteny browser
   * JBrowse - genome browser
   * MAKER - genome annotation pipeline
   * Tripal - web front end for Chado

The deadline for applying is the end of Friday, February 22. Admission is
competitive and is based on the strength of the application (especially
the statement of interest). In 2009 there were over 50 applications for
the 25 slots. Any applications received after the deadline will be placed
on the waiting list.

See the course page for details and an application link:
http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas

Thanks,

Dave Clements
GMOD Help Desk

PS: We are also investigating holding a GMOD course in the Asia/Pacific
region, sometime this fall. Watch the GMOD mailing lists and the GMOD News
page/RSS feed for updates.

--
Please keep responses on the list!
http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas
http://gmod.org/wiki/GMOD_News
Was this helpful? http://gmod.org/wiki/Help_Desk_Feedback

From bhakti.dwivedi at gmail.com Sat Jan 30 17:38:40 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Sat, 30 Jan 2010 17:38:40 -0500
Subject: [Bioperl-l] how to map blast results on to the genome?
Message-ID:

Does anyone know how I can graphically map BLAST results (-m 8 format)
to the genome using BioPerl?

Thanks

Bhakti

From jason at bioperl.org Sat Jan 30 18:56:14 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 30 Jan 2010 15:56:14 -0800
Subject: [Bioperl-l] how to map blast results on to the genome?
In-Reply-To:
References:
Message-ID: <68937A7D-291F-419A-9ED7-7A87D9B4C78A@bioperl.org>

Did you try Bio::Graphics, and have you read the HOWTO on it?
http://bioperl.org/wiki/HOWTO:Graphics

On Jan 30, 2010, at 2:38 PM, Bhakti Dwivedi wrote:

> Does anyone know how I can graphically map BLAST results (-m 8
> format)
> to the genome using BioPerl?
>
> Thanks
>
> Bhakti
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip

From David.Messina at sbc.su.se Sun Jan 31 12:43:52 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 31 Jan 2010 18:43:52 +0100
Subject: [Bioperl-l] question about a PAML module
In-Reply-To: <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>
References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu> <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu> <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>
Message-ID:

Hey Rui,

My apologies for keeping you waiting on this. I started looking at it on
Friday, and while I believe it'll be a relatively easy fix, I haven't got
to the bottom of it yet. I'll look at it some more tomorrow and hopefully
get it sorted in the next day or two.

Dave

From bluecurio at gmail.com Sun Jan 31 22:22:37 2010
From: bluecurio at gmail.com (Daniel Renfro)
Date: Sun, 31 Jan 2010 21:22:37 -0600
Subject: [Bioperl-l] New package to compare two SeqI-implementing objects
Message-ID: <25c0f63d1001311922u134e9321s10f490a335f4a6e@mail.gmail.com>

Hello all,

A colleague and I have been working on a (Bio)Perl package to compare two
Seq objects. This is in response to a need we found in our lab -- we wanted
to see the changes to GenBank files over time, but wanted an automated
way to do this. This led to what I'm calling the SeqDiff.pm package.
I thought it would be a good idea to inform the community and get some
feedback.

The package takes two Seq objects as arguments, arbitrarily called "old" and
"new." It then matches the features from the old object with those of the
new object. This is done based on some criteria -- in our case we decided
the features must be of the same type (have the same primary_tag) and have
at least one database cross-reference (db_xref) in common. The left-over
features (ones that did not have a match) are dropped into arrays called
"lost" and "gained." The matching is done in about N log N time, as each
matching pair is removed from subsequent searches.

The matched features are then iterated through and the differences are
calculated. Each feature is examined recursively and any differences are
reported. Optionally you can give the new() method a flag so that everything
is returned (differences and similarities). You can set callbacks for
different types of objects (like anything that isa('Bio::LocationI')) if you
want a custom comparison for specific BioPerl objects. This comparison step
is the computationally slow part, and currently everything is held in
memory. I think it'd be better to do this piecemeal, using the BioPerl-ish
next() and last() methods.

Maybe this was a little verbose, but that is the SeqDiff package in a
nutshell. I hope to release v1.0 soon. If you have any questions or comments
I'd love to hear them.

-Daniel Renfro

Hu Lab Research Associate
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4055

From maj at fortinbras.us Sun Jan 31 22:47:05 2010
From: maj at fortinbras.us (Mark A.
Jensen) Date: Sun, 31 Jan 2010 22:47:05 -0500 Subject: [Bioperl-l] New package to compare two SeqI-implementing objects In-Reply-To: <25c0f63d1001311922u134e9321s10f490a335f4a6e@mail.gmail.com> References: <25c0f63d1001311922u134e9321s10f490a335f4a6e@mail.gmail.com> Message-ID: <5DC96D65B6A447C3802AF5D745FF4AA4@NewLife> Daniel-- this sounds interesting and useful, I +1 it. Your intuition about in-memory vs streaming sounds correct to me; features can be many, and diffing many (MANY) sequences may bork. Maybe our feature-rich users can chime in. (...however, I did just hear about a magic spell called 'File::Map', might check that out on CPAN.) cheers- MAJ ----- Original Message ----- From: "Daniel Renfro" To: Sent: Sunday, January 31, 2010 10:22 PM Subject: [Bioperl-l] New package to compare two SeqI-implementing objects > Hello all, > > A colleague and I have been working on a (Bio)Perl package to compare two > Seq objects. This is in response to a need we found in our lab -- we wanted > to see the changes to GenBank files through time, but wanted an automated > way to do this. This led to what I'm calling the SeqDiff.pm package. I > thought it would be a good idea to inform the community and get some > feedback. > > The package takes two Seq objects as arguments, arbitrarily called "old" and > "new." It then matches the features from the old object with the new object. > This is done based on some criteria -- in our case we decided the features > must be of the same type (have the same primary_tag) and have at least one > matching database cross-reference (db_xref) in common. The left-over > features (ones that did not have a match) are dropped into arrays called > "lost" and "gained." The matching is done in about NlogN time, as each > matching pair are removed from subsequent searches. > > The matched features and iterated through and the differences are > calculated. Each feature is examined recursively and any differences are > reported. 
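A minimal sketch of the matching rule described above (same primary_tag and at least one shared db_xref), written against plain hashes rather than Bio::SeqFeatureI objects so that it runs without BioPerl. SeqDiff's actual code is not shown in the thread, so the function name match_features, the record layout and the hash index used here are assumptions made for illustration, not the package's implementation.

```perl
use strict;
use warnings;

# Pair up features that share a primary_tag and at least one db_xref.
# Each feature is a plain hash: { primary_tag => 'gene', db_xref => [...] }.
# Returns three array refs: matched pairs, lost (old only), gained (new only).
sub match_features {
    my ($old, $new) = @_;

    # Index the new features by "tag\0xref" so lookups avoid a full scan.
    my %index;
    for my $i (0 .. $#$new) {
        my $f = $new->[$i];
        push @{ $index{"$f->{primary_tag}\0$_"} }, $i for @{ $f->{db_xref} };
    }

    my (%taken, @matched, @lost);
    OLD: for my $f (@$old) {
        for my $xref (@{ $f->{db_xref} }) {
            for my $i (@{ $index{"$f->{primary_tag}\0$xref"} || [] }) {
                next if $taken{$i};
                $taken{$i} = 1;    # a matched feature drops out of later searches
                push @matched, [ $f, $new->[$i] ];
                next OLD;
            }
        }
        push @lost, $f;            # nothing in "new" matched this feature
    }

    my @gained = map { $new->[$_] } grep { !$taken{$_} } 0 .. $#$new;
    return (\@matched, \@lost, \@gained);
}

my @old = (
    { primary_tag => 'gene', db_xref => ['GeneID:1'] },
    { primary_tag => 'CDS',  db_xref => ['GI:10'] },
);
my @new = (
    { primary_tag => 'gene', db_xref => ['GeneID:1', 'EcoGene:EG1'] },
    { primary_tag => 'tRNA', db_xref => ['GeneID:3'] },
);

my ($matched, $lost, $gained) = match_features(\@old, \@new);
printf "matched=%d lost=%d gained=%d\n",
    scalar @$matched, scalar @$lost, scalar @$gained;
# prints: matched=1 lost=1 gained=1
```

With real Seq objects the same idea would read the tag and cross-references from each feature instead of from hash keys. The index keeps the pairing step cheap, so the recursive comparison of matched pairs stays the expensive part, as the post says.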
Optionally you can give the new() method a flag so that everything > is returned (differences and similarities.) You can set callbacks for > different types of objects (like anything that isa('Bio::LocationI')) if you > want a custom comparison for specific BioPerl objects. This comparison step > is the computationally slow part, and currently everything is held in > memory. I think it'd be better to do this piece-meal, using the BioPerl-ish > next() and last() methods. > > Maybe this was a little verbose, but that is the SeqDiff package in a > nutshell. I hope to soon release v1.0. If you have any questions or comments > I'd love to hear them. > > -Daniel Renfro > > Hu Lab Research Associate > Dept. of Biochemistry and Biophysics > 2128 TAMU > Texas A&M Univ. > College Station, TX 77843-2128 > 979-862-4055 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rui.faria at upf.edu Sun Jan 31 12:17:09 2010 From: rui.faria at upf.edu (Rui Faria) Date: Sun, 31 Jan 2010 18:17:09 +0100 (CET) Subject: [Bioperl-l] question about a PAML module In-Reply-To: References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu> <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu> Message-ID: <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu> Hi Dave, we reported the bug on codeml about errors when the user gives its own tree file, some time ago. Did you have any chances to look at it? We basically wanted to know your opinion on where the problem may be, since we are not the most experienced "perlers" on the planet :) I'm asking this because we have to deal with that right now. If someone could check where is the problem, to understand if it has an easy solution, that would be of great help. 
Best,
Rui

-----Original Message-----
From Dave Messina
Sent Thu 31/12/2009 11:55 AM
To Rui Faria
Cc Jason Stajich ; sandraneto_ at hotmail.com; bioperl-l List
Subject Re: question about a PAML module

Hi Rui and Sandra,

Could you file this as a bug report at
http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl ?

Once you've created the bug report with a brief description of the problem
and submitted it, please attach the following to the bug report:

- sample input files (a sequence file and a tree file, probably)
- a script which reproduces the problem
- the output (error messages) like you show below

When I updated the code to work with the current version, I didn't
exhaustively test all of the different modes of running codeml, so I
appreciate you reporting this. There was another, similar issue reported a
few days ago. I will try to take a look at both of these bug reports soon.

Dave

From rui.faria at upf.edu Sun Jan 31 13:56:56 2010
From: rui.faria at upf.edu (Rui Faria)
Date: Sun, 31 Jan 2010 19:56:56 +0100 (CET)
Subject: [Bioperl-l] question about a PAML module
In-Reply-To:
References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu> <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu> <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>
Message-ID: <11398434.1264964216856.JavaMail.oracle@rif1.s.upf.edu>

Many thanks! We hope that one day, when we become experts, we can return
the favor!

Rui

-----Original Message-----
From Dave Messina
Sent Sun 31/01/2010 06:43 PM
To Rui Faria
Cc Jason Stajich ; sandraneto_ at hotmail.com; bioperl-l List
Subject Re: question about a PAML module

Hey Rui,

My apologies for keeping you waiting on this. I started looking at it on
Friday, and while I believe it'll be a relatively easy fix, I haven't got
to the bottom of it yet. I'll look at it some more tomorrow and hopefully
get it sorted in the next day or two.
Dave

From avilella at gmail.com Sat Jan 2 08:57:28 2010
From: avilella at gmail.com (Albert Vilella)
Date: Sat, 2 Jan 2010 08:57:28 +0000
Subject: [Bioperl-l] Downloading from dbEST by taxon range
Message-ID: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>

Hi all and happy 2010 for those that follow the Gregorian calendar,

A question that is a bit in between bioperl and NCBI. I would like to use
bioperl to download sequences from dbEST. For that, my idea is to use
Bio::DB::GenBank and get the sequences by gi id.

Now, I want my script to download sequences for a given NCBI taxonomy clade.

For example, if I want to download all fish (clupeocephala) sequences in dbEST,
I can browse them on the dbEST webpage using "clupeocephala[taxonomy]",
so I am thinking there should be a way to do it programmatically.

How can I query NCBI dbEST through bioperl to give me the list of GI ids I am
looking for given a taxon id?

Thanks in advance,

Albert.

From jason at bioperl.org Sat Jan 2 16:35:22 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 2 Jan 2010 08:35:22 -0800
Subject: [Bioperl-l] Downloading from dbEST by taxon range
In-Reply-To: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>
References: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>
Message-ID:

Did you try Bio::DB::Query::GenBank?
You'd want to use -db => 'nucest' and then you just put in an Entrez query
as per the example. You can include dates in the query, so you can update
your locally retrieved data from a script that runs periodically.

-jason
On Jan 2, 2010, at 12:57 AM, Albert Vilella wrote:

> Hi all and happy 2010 for those that follow the Gregorian calendar,
>
> A question that is a bit in between bioperl and NCBI. I would like
> to use
> bioperl to download sequences from dbEST. For that, my idea is to use
> Bio::DB::GenBank and get the sequences by gi id.
>
> Now, I want my script to download sequences for a given NCBI
> taxonomy clade.
>
> For example, if I want to download all fish (clupeocephala)
> sequences in dbEST,
> I can browse them on the dbEST webpage using
> "clupeocephala[taxonomy]",
> so I am thinking there should be a way to do it programmatically.
>
> How can I query NCBI dbEST through bioperl to give me the list of GI
> ids I am
> looking for given a taxon id?
>
> Thanks in advance,
>
> Albert.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/

From avilella at gmail.com Sun Jan 3 09:08:33 2010
From: avilella at gmail.com (Albert Vilella)
Date: Sun, 3 Jan 2010 09:08:33 +0000
Subject: [Bioperl-l] Downloading from dbEST by taxon range
In-Reply-To:
References: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>
Message-ID: <358f4d651001030108p6a92fb27k5fa39be6bebb3a9c@mail.gmail.com>

Thanks Jason!

For the sake of completeness, here is the script I needed:

---------------------
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::DB::Query::GenBank;
use Bio::DB::GenBank;
use Getopt::Long;

my $keyword_type = 'EST';
my $outdir       = '.';
my $taxon_name   = undef;
my $db_type      = 'nucest';

GetOptions('keyword_type:s' => \$keyword_type,
           't|taxon_name:s' => \$taxon_name,
           'db_type:s'      => \$db_type,
           'outdir:s'       => \$outdir);

# The taxon name is needed to build the Entrez query
die "Usage: $0 -t <taxon_name> [--keyword_type EST] [--db_type nucest] [--outdir .]\n"
    unless defined $taxon_name && length $taxon_name;

my $query_string = $taxon_name . "[Organism] AND " . $keyword_type . "[Keyword]";
my $db = Bio::DB::Query::GenBank->new(-db      => $db_type,
                                      -query   => $query_string,
                                      -mindate => '2007',
                                      -maxdate => '2010');

# Build the output file name from the taxon name, spaces replaced by "_"
my $taxon_name_string = $taxon_name;
$taxon_name_string =~ s/ /_/g;
my $outfile = $outdir . "/" . $taxon_name_string . "." . $db_type . ".fasta";
my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'fasta');

# Number of records matching the query
print $db->count, "\n";

my $gb     = Bio::DB::GenBank->new();
my $stream = $gb->get_Stream_by_query($db);
while (my $seq = $stream->next_seq) {
    # Skip reads of 800 bases or fewer
    next unless ($seq->length > 800);
    $out->write_seq($seq);
}
$out->close;
---------------------

On Sat, Jan 2, 2010 at 4:35 PM, Jason Stajich wrote:
> Did you try Bio::DB::Query::GenBank?
> You'd want to use -db => 'nucest' and then you just put in an Entrez query
> as per the example. You can include dates in the query, so you can update
> your locally retrieved data from a script that runs periodically.
>
> -jason
> On Jan 2, 2010, at 12:57 AM, Albert Vilella wrote:
>
>> Hi all and happy 2010 for those that follow the Gregorian calendar,
>>
>> A question that is a bit in between bioperl and NCBI. I would like to use
>> bioperl to download sequences from dbEST. For that, my idea is to use
>> Bio::DB::GenBank and get the sequences by gi id.
>>
>> Now, I want my script to download sequences for a given NCBI taxonomy
>> clade.
>>
>> For example, if I want to download all fish (clupeocephala) sequences in
>> dbEST,
>> I can browse them on the dbEST webpage using
>> "clupeocephala[taxonomy]",
>> so I am thinking there should be a way to do it programmatically.
>>
>> How can I query NCBI dbEST through bioperl to give me the list of GI ids I
>> am
>> looking for given a taxon id?
>>
>> Thanks in advance,
>>
>> Albert.
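A minimal sketch of the periodic-update idea mentioned in the reply above: compute the date window since the last run and build the arguments for the query object. It assumes the YYYY/MM/DD date form that Entrez accepts. The function name update_query_args is hypothetical, and only core Perl is used, so nothing here contacts NCBI.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Build the argument list for an incremental query covering the period
# between two epoch times (for example "last run" and "now").
sub update_query_args {
    my ($taxon, $keyword, $since, $until) = @_;
    return (
        -db      => 'nucest',
        -query   => $taxon . '[Organism] AND ' . $keyword . '[Keyword]',
        -mindate => strftime('%Y/%m/%d', gmtime($since)),
        -maxdate => strftime('%Y/%m/%d', gmtime($until)),
    );
}

# Example: everything added in the last seven days.
my %args = update_query_args('clupeocephala', 'EST', time - 7 * 86400, time);
print "$_ => $args{$_}\n" for sort keys %args;

# With BioPerl installed, the list would be passed on unchanged:
#   my $query = Bio::DB::Query::GenBank->new(%args);
```

Storing the time of the last successful run (in a small file, say) and passing it as the third argument is one way to make a cron job fetch only new records.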
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > > From Jean-Marc.Frigerio at pierroton.inra.fr Mon Jan 4 14:12:18 2010 From: Jean-Marc.Frigerio at pierroton.inra.fr (Jean-Marc Frigerio INRA) Date: Mon, 04 Jan 2010 15:12:18 +0100 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? In-Reply-To: References: Message-ID: <4B41F742.2030209@pierroton.inra.fr> > Message: 1 > Date: Thu, 31 Dec 2009 11:26:45 +1800 > From: Peng Yu > Subject: [Bioperl-l] How to read in the whole fasta file in the > memory? > To: bioperl-l at lists.open-bio.org > Message-ID: > <366c6f340912300926k5af5cc88nc3c3babda541fd1 at mail.gmail.com> > Content-Type: text/plain; charset=ISO-8859-1 > > With Bio::SeqIO, I can only read in the records in a fasta file one by > one. This is preferable if there are many records in a file. > > But I also want to read all the records in. I could use a while loop > to read all records in. But could somebody let me know if there is a > function in bioperl that can read in all the record at once and return > me an object? > > http://www.bioperl.org/wiki/HOWTO:SeqIO > > > ------------------------------ > > Message: 2 > Date: Wed, 30 Dec 2009 13:04:53 -0500 > From: Sean Davis > Subject: Re: [Bioperl-l] How to read in the whole fasta file in the > memory? > To: Peng Yu > Cc: "bioperl-l at lists.open-bio.org" > Message-ID: > <264855a00912301004t396e0d4fwf9d291c5d82c3fb9 at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu wrote: >> With Bio::SeqIO, I can only read in the records in a fasta file one by >> one. This is preferable if there are many records in a file. >> >> But I also want to read all the records in. I could use a while loop >> to read all records in. 
But could somebody let me know if there is a >> function in bioperl that can read in all the record at once and return >> me an object? > > In perl, you can use an array to store the records. You could also > use a hash if you have reasonable keys for the entries. > > Sean > > >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > ------------------------------ > > Message: 3 > Date: Wed, 30 Dec 2009 11:58:54 -0800 > From: Jason Stajich > Subject: Re: [Bioperl-l] How to read in the whole fasta file in the > memory? > To: Peng Yu > Cc: BioPerl List > Message-ID: <3550F192-111F-48A7-B1B7-113FFFAC105B at bioperl.org> > Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes > > or use a database object so you can retrieve sequences that have a > particular id. See Bio::DB::Fasta > On Dec 30, 2009, at 10:04 AM, Sean Davis wrote: > >> On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu wrote: >>> With Bio::SeqIO, I can only read in the records in a fasta file one >>> by >>> one. This is preferable if there are many records in a file. >>> >>> But I also want to read all the records in. I could use a while loop >>> to read all records in. But could somebody let me know if there is a >>> function in bioperl that can read in all the record at once and >>> return >>> me an object? >> In perl, you can use an array to store the records. You could also >> use a hash if you have reasonable keys for the entries. 
>> >> Sean >> >> >>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > > > > ------------------------------ > > Message: 4 > Date: Wed, 30 Dec 2009 16:20:31 -0500 > From: "Mark A. Jensen" > Subject: Re: [Bioperl-l] How to read in the whole fasta file in the > memory? > To: "Peng Yu" , > Message-ID: <2646F627E6D14AADB412A6E6B51E24DA at NewLife> > Content-Type: text/plain; format=flowed; charset="iso-8859-1"; > reply-type=original > > I think you might want Bio::AlignIO: > > $alnio = Bio::AlignIO->new(-file=> 'my.fas' ); > $aln = $alnio->next_aln; > @seqs = $aln->each_seq; > > MAJ > ----- Original Message ----- > From: "Peng Yu" > To: > Sent: Wednesday, December 30, 2009 12:26 PM > Subject: [Bioperl-l] How to read in the whole fasta file in the memory? > > >> With Bio::SeqIO, I can only read in the records in a fasta file one by >> one. This is preferable if there are many records in a file. >> >> But I also want to read all the records in. I could use a while loop >> to read all records in. But could somebody let me know if there is a >> function in bioperl that can read in all the record at once and return >> me an object? 
>> >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l Hi, I wrote and currently use a module I named Bio::SeqIO::multifasta, which is basically a copy of Bio::SeqIO::fasta plus a few methods: get_by_id(), get_by_order(), first_seq() and previous_seq() It would need review, validation etc. Do I submit it to Bugzilla ? -- jmf From jason at bioperl.org Mon Jan 4 16:03:45 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 4 Jan 2010 08:03:45 -0800 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? In-Reply-To: <4B41F742.2030209@pierroton.inra.fr> References: <4B41F742.2030209@pierroton.inra.fr> Message-ID: <16D7C8C1-E4BE-406F-9D60-379876178CAB@bioperl.org> We typically think of SeqIO as parsing a stream of data, not being reliant on it being a file which is what these methods would be implying I think. Sounds a lot like a database - does Bio::DB::Fasta not provide some of the functionality you need by these methods? I realize there isn't a by_order() but the get_by_id() is implemented to allow random access. -jason > > Hi, > > I wrote and currently use a module I named Bio::SeqIO::multifasta, > which is basically a copy of Bio::SeqIO::fasta plus a few methods: > get_by_id(), get_by_order(), first_seq() and previous_seq() > > It would need review, validation etc. Do I submit it to Bugzilla ? 
> > -- jmf > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From avilella at gmail.com Mon Jan 4 20:00:24 2010 From: avilella at gmail.com (Albert Vilella) Date: Mon, 4 Jan 2010 20:00:24 +0000 Subject: [Bioperl-l] indexed fastq files Message-ID: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com> Hi all, What is the best way to index fastq files, so that once clustered, I can provide a list of seq_ids and get them back in fastq format from the indexed db? Cheers, Albert. From cjfields at illinois.edu Mon Jan 4 21:59:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 4 Jan 2010 15:59:50 -0600 Subject: [Bioperl-l] indexed fastq files In-Reply-To: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com> References: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com> Message-ID: <07EBA105-6A34-490C-B0B9-4772DF386CBA@illinois.edu> Bio::Index::Fastq, maybe? To tell the truth, I haven't tried it since we refactored FASTQ parsing, so let us know if it doesn't work. chris On Jan 4, 2010, at 2:00 PM, Albert Vilella wrote: > Hi all, > > What is the best way to index fastq files, so that once clustered, I > can provide a list of seq_ids and get > them back in fastq format from the indexed db? > > Cheers, > > Albert. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Jan 5 03:54:03 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 4 Jan 2010 21:54:03 -0600 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? 
In-Reply-To: <4B41F742.2030209@pierroton.inra.fr> References: <4B41F742.2030209@pierroton.inra.fr> Message-ID: <1BAE5508-0DB7-41B4-92E3-49256582131F@illinois.edu> Jean-Marc, You can do that, yes. Just curious, but have you looked at the various flat file indexing modules for FASTA? Bio::DB::Fasta and Bio::Index::Fasta are commonly used and allow lookups by primary ID (and I think in some cases secondary IDs). chris On Jan 4, 2010, at 8:12 AM, Jean-Marc Frigerio INRA wrote: > ... > > Hi, > > I wrote and currently use a module I named Bio::SeqIO::multifasta, which is basically a copy of Bio::SeqIO::fasta plus a few methods: > get_by_id(), get_by_order(), first_seq() and previous_seq() > > It would need review, validation etc. Do I submit it to Bugzilla ? > > -- jmf > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From fs5 at sanger.ac.uk Wed Jan 6 22:16:13 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Wed, 06 Jan 2010 22:16:13 +0000 Subject: [Bioperl-l] Bio::DB::Sam strange behaviour for read pairs Message-ID: <4B450BAD.3050807@sanger.ac.uk> I'm trying to extract paired reads from a BAM file that span a given region. I would then like to get the two read ends of the sequenced clone that spans the region. I use Bio::DB::Sam->get_features_by_location for this and it does give me the correct read pairs as a region match but it doesn't give me both read pairs in all cases. 
Here is the script:

#!/usr/bin/perl
use Bio::DB::Sam;

my $usage = "usage: $0 BAMFILE CHROMOSOME STARTPOS ENDPOS\n" ;
my ($bam_file,$chrom,$start,$end) = @ARGV ;
die $usage unless $bam_file && $chrom && $start && $end;

my $bam = Bio::DB::Sam->new(-bam => $bam_file);
my @pairs = $bam->get_features_by_location(
    -type => 'read_pair',
    -seq_id => $chrom,
    -start => $start,
    -end => $end);
print "region: $chrom:$start..$end\n" ;
foreach my $pair (@pairs) {
    print " pair: id: ".$pair->id.", start".$pair->start.', end:'.$pair->end."\n";
    my ($first_mate,$second_mate) = $pair->get_SeqFeatures;
    print " first_mate: start:".$first_mate->start.', end:'.$first_mate->end."\n";
    if ($second_mate){
        print " second_mate: start:".$second_mate->start.', end:'.$second_mate->end."\n";
    }
    else {
        print " no second mate\n";
    }
}

And here are the matching pairs that it produces with one of my files for the region tal12:22479..29232:

region: tal12:22479..29232
 pair: id: tal-2446c08, start17496, end:29423
 first_mate: start:28540, end:29423
 no second mate
 pair: id: tal-2463d10, start23534, end:31363
 first_mate: start:23534, end:24448
 no second mate
 pair: id: tal-2371c09, start20860, end:28230
 first_mate: start:27604, end:28230
 no second mate
 pair: id: tal-2440b06, start19232, end:27099
 first_mate: start:26025, end:27099
 no second mate
 pair: id: tal-2327g09, start18909, end:26129
 first_mate: start:25354, end:26129
 no second mate
 pair: id: tal-2381b05, start25658, end:35054
 first_mate: start:25658, end:26295
 no second mate
 pair: id: tal-2377c11, start20898, end:28230
 first_mate: start:27473, end:28230
 no second mate
 pair: id: tal-2426e12, start21975, end:27562
 first_mate: start:21975, end:23008
 second_mate: start:26396, end:27562
 pair: id: tal-2365h10, start22843, end:31944
 first_mate: start:22843, end:23184
 no second mate
 pair: id: tal-2388h09, start19016, end:28238
 first_mate: start:27475, end:28238
 no second mate

So it finds a lot of pairs that span the region and the start/end from the pair is also
correct but it only gives me both individual mates in one case: pair: id: tal-2426e12, start21975, end:27562 first_mate: start:21975, end:23008 second_mate: start:26396, end:27562 In this case, both pairs are actually inside the query region (at least partially) whereas in the other cases, one of the mates is not inside, e.g. this one: pair: id: tal-2388h09, start19016, end:28238 first_mate: start:27475, end:28238 no second mate > get this read pair from the BAM file: $ samtools view clones.bam | grep tal-2388h09 tal-2388h09 99 tal12 19016 205 36H9M1D14M1D664M1D16M1D21M1D28M1D15M1D10M1D12M1D7M1D8M1D5M = 27475 9223 CTTTGGATGAAATAGTTTTTAAATAATACTTATTAAATATTAAATATATAACACATAAATAAGTATTGATGCAAATTTTAAAGTATTATAGAAAACTAGGTTTGATTATATTGTTATACTGTACTTTAAGAGGAGAGAGATAAGATATCTTTGCTCTTTTAATATATAAATTTAGATAAATATTCGTTAAATTTTCTACATAGTTATTTTTTATCTTATATATTATACTGCTATAGTTATCAATGTATATACATTCAAATAATTTATTAAAAATTCTATATTATATTAATTCTATGATAAAATAATCCTGTTTGTGATTTAAAAAATGATGATTCAATAAAAACTAATAATATAATACGAGTTAATATGGAATAATAAAATGGCATTTAACATGAATTTAGTCTTTAACCTTTTCTTTGTTTGTCAAGTTTTTTAAAACATAAAACCACACATTTCAAAATGGATTTTTAGCAAATATATAAAAATTATACATTTATAATGTATTGTTATGCGTCTTTTCGATAGAATCAATATTTAATTATATGAAGTTTCCACAATAAAATAATATTTAATATTATTTATTAGTAGAGTATTTGATTATATATATAGGCATATAATAATAACTCTAGTTCTATCTACCATATTATTTATAATTATTATAACAAAATGTGATATGAAATTTTATTATATACTTATATTATTTTTTTAACTATTTTAAAATATATTTATTTATACCTCAAAACTATAAAATTGAAATTATTAATAATAATCTAATATATACCTTTATAAAAATAAACGTATAAACTAAT ><:4/+1+*)+4>BEH=9-,,66IIIIIIIIEDA>>>>A at 
DDFFIHHHHHITIIIIIHIIHHHHHHIYYYYYTTTYDDDHDDDDDDDIIINNTNHHHHHIYYYYIIIIIINNNNTTIIIIIIIIIIITTNTTTTTYYYTTTTTTYYNNNNNNLLLLLLLLLLLNNNNNNTTTTTTTTTTTTTNNTNNNTTTYYTLLLLLLTTTTTTTTYTTTNNNNNNTTTTTTTNNTTNNTTTTTTTTTTYYTTTTTTTNNNNNNTTTTTTTYYTTTTTTYYYTTTTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTNNINTTYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYTTTTTOOOIFFFIFIIOICC>>II@>>>>>>C>>>>>>CIBECCCHIIOOOOOOOOTTTIIFDDEIQQA:55839AA>99>@IIIIII>>::;;I;>>CC>>>>>@III<::=>AAA<>>>>I>:>>99:>842225006824855;5>68844//.//00:>::338:99<:/-+*-./0)((((+00+..,++(((+-()(*((((()*)***))3)''')*..+*++((*1++--+*''''((+/)*42.((***)+,+('*'''*((''''((,'%%''''''''( AS:i:614 MS:i:50 tal-2388h09 147 tal12 27475 205 1H764M40H = 19016 -9223 ATTAAATCGGTATCGCCAACACAATGAGTATAATCATTGTCAAATATGCGTTTGTAAGTATATTCATTGTCACATTCACGTTTGTAAGTATATTCATTGTCACATTCACGTTTGTAAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTAATTTTTGTGAGTATAATCATTGGAACG 
(((0))*,-1-../2((())03---03266300271+*.-0-*''''+*,+/+))*-05330+)..4>7=77273911**((+20+03688633:93036<8;::5:<99379>>::>>>:57:<:7--)))1435::333228>::>II>::>A>>3/.958677AA=AA:>:==IIII8338<>A>>>>IIIIIIIIYYYYYKKYYYMIFFFFEIIIMI::4..8AIIC>9>=EIQQQMCAAAAAACIIIIAICIIIOOYTIIIMOQQMIIIIC>>AAABCCCCCEAI>C>>IQQIIIIIIIIIIKKYYYYYYYYYYYYYYYYYYYYYTIIIIIIYYYYTNINNNTYYYYYYYYYYYYYTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYYYSSYYYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTTTTTTTTTTTTTYYYYYTTTTTTYYYTTNNNNTTYYYYYYYYYTTTTLLTTNNTTTTYTTTTTTYYYYYYYYYYYTTOOKKKLKOOTYYYYYYYYYYYYYYYYTNNNNNNNNNTTTNYNNNNTNNNNTTYYYYYYYYTTNNNNTTYNNNNNITTTTTYYYYYYYYYYTTNNIIIIIDIIIIHTNNNNTTYYYYTNNNIIIIIITTTINIIIINNNNTTTYYYYIHHHDDHHDDIHDDGDFFFTIIINTTYYYYTTTTHHHHCCIIIHIHHHHCAI9:++**1168>ACCIIDDDDDDI>>>>>?NNN AS:i:688 MS:i:50 So the read in the first line starts before the start of the query region and is not accessible via $pair->get_SeqFeatures although this is a valid pair. Am I doing something wrong, is this the desired behaviour or is it a bug? Thanks for your help! -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
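One reading of the output above is that only mates which themselves overlap the query window come back as subfeatures of the pair. The plain-Perl sketch below makes no Bio::DB::Sam calls; it only checks that reading against the numbers in the post, deriving the reference span of the upstream tal-2388h09 read from its CIGAR string and testing both mates against the region. The helper names ref_span and overlaps are made up for illustration, and the overlap rule is an assumption about the library's behaviour that is not confirmed anywhere in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reference bases consumed by a CIGAR string: M, D, N, = and X advance
# along the reference; I, S, H and P do not.
sub ref_span {
    my ($cigar) = @_;
    my $span = 0;
    while ($cigar =~ /(\d+)([MIDNSHP=X])/g) {
        # Copy the captures before running another match on them.
        my ($len, $op) = ($1, $2);
        $span += $len if $op =~ /^[MDN=X]$/;
    }
    return $span;
}

# True if the closed intervals [s1,e1] and [s2,e2] share at least one base.
sub overlaps {
    my ($s1, $e1, $s2, $e2) = @_;
    return $s1 <= $e2 && $s2 <= $e1;
}

my ($q_start, $q_end) = (22479, 29232);    # query region from the post

# Upstream mate of tal-2388h09: POS and CIGAR copied from the samtools record.
my $pos   = 19016;
my $cigar = '36H9M1D14M1D664M1D16M1D21M1D28M1D15M1D10M1D12M1D7M1D8M1D5M';
my $end   = $pos + ref_span($cigar) - 1;

printf "upstream mate:   %d..%d overlaps region: %s\n",
    $pos, $end, overlaps($pos, $end, $q_start, $q_end) ? 'yes' : 'no';

# Downstream mate, coordinates as reported by the script in the post.
printf "downstream mate: %d..%d overlaps region: %s\n",
    27475, 28238, overlaps(27475, 28238, $q_start, $q_end) ? 'yes' : 'no';
```

With these values the upstream mate spans 820 reference bases and ends at 19835, short of the region start at 22479, while the downstream mate lies inside the region. That matches the "no second mate" lines, but it does not settle whether the behaviour is intended or a bug.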
From hlapp at drycafe.net Thu Jan 7 16:55:00 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 7 Jan 2010 11:55:00 -0500 Subject: [Bioperl-l] Data missing into Annotation object using Bio::SeqIO (Genbank) In-Reply-To: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu> References: <4B28EB44.3080006@pasteur.fr> <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu> Message-ID: <240F198A-83FA-4304-ACA8-80A702A68D8C@drycafe.net> I don't know to what extent this was followed up on further and I guess it's too long ago to be of much help, but if it hasn't been mentioned before I wanted to point out Bio::SeqFeature::AnnotationAdaptor which integrates tag/value annotation and Bio::Annotation annotation into one AnnotationCollection, so it doesn't matter whether something is attached as a tag or as an annotation object. -hilmar On Dec 16, 2009, at 10:09 AM, Chris Fields wrote: > Emmanuel, > > The previous behavior in the 1.5.x series was to store feature tags > as Bio::Annotation. The problem had been the way this was > implemented was considered unsatisfactory for various reasons, so we > reverted back to using simple tag-value pairs as the default. You > can get at the data this way (from the Feature/Annotation HOWTO): > > for my $feat_object ($seq_object->get_SeqFeatures) { > print "primary tag: ", $feat_object->primary_tag, "\n"; > for my $tag ($feat_object->get_all_tags) { > print " tag: ", $tag, "\n"; > for my $value ($feat_object->get_tag_values($tag)) { > print " value: ", $value, "\n"; > } > } > } > > You can also convert all the tag-value data into a > Bio::Annotation::Collection using the > Bio::SeqFeature::AnnotationAdaptor, but this is completely optional. > > chris > > On Dec 16, 2009, at 8:14 AM, Emmanuel Quevillon wrote: > >> Hi, >> >> I've wrote a small Genbank parser few months ago before BioPerl >> release 1.6.0. >> I tried to use my code once again but now the output of my parser >> is empty. 
>> It looks like Annotation from seqfeatures is not filled anymore. >> >> Here is the code I used previously: >> >> while(my $seq = $streamer->next_seq()){ >> >> #We only want to retrieve CDS features... >> foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq- >> >get_SeqFeatures()){ >> print $ofh join("#", >> $feat->annotation()- >> >get_Annotations('locus_tag'), # Acc num >> $feat->annotation()->get_Annotations('gene') >> ? $feat->annotation()- >> >get_Annotations('gene') # Gene name >> : $feat->annotation()- >> >get_Annotations('locus_tag'), >> $feat->annotation()- >> >get_Annotations('product'), # Description >> ),"\n"; >> } >> } >> >> $feat is a Bio::SeqFeature::Generic object >> >> If I print Dumper($feat->annotation()) here is the output : >> >> $VAR1 = bless( { >> '_typemap' => bless( { >> '_type' => { >> 'comment' => >> 'Bio::Annotation::Comment', >> 'reference' => >> 'Bio::Annotation::Reference', >> 'dblink' => >> 'Bio::Annotation::DBLink' >> } >> }, >> 'Bio::Annotation::TypeManager' ), >> '_annotation' => {} >> }, 'Bio::Annotation::Collection' ); >> >> Have some changes been made into the way annotation object is >> populated? 
>> >> Thanks for any clue and sorry if my question look stupid >> >> Regards >> >> Emmanuel >> >> -- >> ------------------------- >> Emmanuel Quevillon >> Biological Software and Databases Group >> Institut Pasteur >> +33 1 44 38 95 98 >> tuco at_ pasteur dot fr >> ------------------------- >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From rtbio.2009 at gmail.com Fri Jan 8 15:00:21 2010 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Fri, 8 Jan 2010 16:00:21 +0100 Subject: [Bioperl-l] Regarding blast in Bioperl Message-ID: Hello all, I was trying Remote blast using Bioperl. My input data is a Trypanosoma brucei sequence in Fasta format. When I was trying to submit to BLAST using the step $r=$factory->submit_blast($input) It was not returning anything which I checked by debugging the code. It is not blasting my input sequence even though I mentioned all the parameters.I would paste the code below. Please help me in solving put this problem. It is very urgent. Regards Roopa. 
#!/usr/bin/perl

#path for extra camel module
use lib "/srv/www/htdocs/rain/RNAi/";
use Roopablast;

use Bio::SearchIO;
use Bio::Search::Result::BlastResult;
use Bio::Perl;
use Bio::Tools::Run::RemoteBlast;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

$serverpath = "/srv/www/htdocs/rain/RNAi";
$serverurl = "http://141.84.66.66/rain/RNAi";
$outfile = $serverpath."/rnairesult_".time().".html";
$nuc = $serverpath."/nuc".time().".txt";
$debugfile = $serverpath."/debug_".time().".txt";
$blastdebugfile = $serverpath."/blastdebug_".time().".txt";

my $outstring ="";

&parse_form;

print "Content-type: text/html\n\n";
print "\n";
print "RNAi Result";
print " \n";
print "\n";
print "\n";
print " Your results will appear here";
print " Please be patient, runtime can be up to 5 minutes";
print " This page will automatically reload in 30 seconds. Roopa";
print "\n";
print "\n";

defined(my $pid = fork) or die "Can't fork: $!";
exit if $pid;
open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";

open(OUTFILE, '>',$outfile);

print OUTFILE "\n
RNAi Result
\n
\n
\n
Your results will appear here
Please be patient, runtime can be up to 5 minutes wait wait wait......
This page will automatically reload in 30 seconds Roopa
\n
\n";

close(OUTFILE);

@compseqs = blastcode($in{'Inputseq'});

$in{'Inputseq'} =~ s/>.*$//m;
$in{'Inputseq'} =~ s/[^TAGC]//gim;
$in{'Inputseq'} =~ tr/actg/ACTG/;

@out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, $in{'Threshold'});

sub blastcode
{
    $inpu1= $_[0];

    #$organ= $_[1];

    open(NUC,'>',$nuc);
    print NUC $inpu1;
    close(NUC);

    my $prog = 'blastn';
    my $db = 'refseq_rna';
    my $e_val= '1e-10';
    my $organism= 'Trypanosoma Brucei';

    $gb = new Bio::DB::GenBank;

    my @params = ( '-prog' => $prog,
                   '-data' => $db,
                   '-expect' => $e_val,
                   '-readmethod' => 'SearchIO',
                   '-Organism' => $organism );

    # open(OUTFILE,'>',$debugfile);
    # print OUTFILE @params;
    # close(OUTFILE);

    my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

    #change a parameter
    $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';

    #change a parameter
    # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';

    my $v = 1;
    #$v is just to turn on and off the messages

    my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => 'Trypanosoma Brucei' );

    while (my $input = $str->next_seq())
    {
        #Blast a sequence against a database:
        #Alternatively, you could pass in a file with many
        #sequences rather than loop through sequence one at a time
        #Remove the loop starting 'while (my $input = $str->next_seq())'
        #and swap the two lines below for an example of that.

        open(OUTFILE,'>',$debugfile);
        print OUTFILE $input;
        close(OUTFILE);

        my $r = $factory->submit_blast($input); #The program stops here it does not return any value and it does not enter the While loop,Please help me in this regard.#
        open(OUTFILE,'>',$debugfile);
        print OUTFILE $r;
        close(OUTFILE);

        print STDERR "waiting...." if($v>0);

        while ( my @rids = $factory->each_rid ) {
            open(OUTFILE,'>',$debugfile);
            print OUTFILE "while entered";
            close(OUTFILE);
            foreach my $rid ( @rids ) {

                open(OUTFILE,'>',$debugfile);
                print OUTFILE "foreach entered";
                close(OUTFILE);

                my $rc = $factory->retrieve_blast($rid);

                if( !ref($rc) )
                {
                    if( $rc < 0 )
                    {
                        $factory->remove_rid($rid);
                    }
                    open(OUTFILE,'>',$debugfile);
                    print OUTFILE "if entered";
                    close(OUTFILE);
                    print STDERR "." if ( $v > 0 );
                    sleep 5;
                }
                else {
                    open(OUTFILE,'>',$debugfile);
                    print OUTFILE "else entered";
                    close(OUTFILE);

                    my $result = $rc->next_result();
                    #save the output
                    $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

                    open(BLASTDEBUGFILE,'>',$blastdebugfile);
                    print BLASTDEBUGFILE $result->next_hit();
                    close(BLASTDEBUGFILE);

                    my $filename = $serverpath."/blastdata_".time().$result->query_name()."\.out";

                    # open(DEBUGFILE,'>',$debugfile);
                    # open(new,'>',$filename);
                    # @arra=;
                    # print DEBUGFILE @arra;
                    # close(DEBUGFILE);
                    # close(new);

                    $factory->save_output($filename);

                    # open(BLASTDEBUGFILE,'>',$debugfile);
                    # print BLASTDEBUGFILE "Hello $rid";
                    # close(BLASTDEBUGFILE);

                    $factory->remove_rid($rid);

                    open(BLASTDEBUGFILE,'>',$blastdebugfile);
                    print BLASTDEBUGFILE $organism;
                    close(BLASTDEBUGFILE);

                    # open(OUTFILE,'>',$outfile);
                    # print OUTFILE "Test2 $result->database_name()";
                    # close(OUTFILE);

                    #$hit = $result->next_hit;
                    #open(new,'>',$debugfile);
                    #print $hit;
                    #close(new);

                    while ( my $hit = $result->next_hit ) {

                        next unless ( $v > 0);

                        # open(OUTFILE,'>',$debugfile);
                        # print OUTFILE "$hit in while hits";
                        # close(OUTFILE);

                        my $sequ = $gb->get_Seq_by_version($hit->name);
                        my $dna = $sequ->seq(); # get the sequence as a string
                        push(@seqs,$dna);
                    }
                }
            }
        }
    }

    #open(OUTFILE,'>',$debugfile);
    #print OUTFILE $seqs[0];
    #close(OUTFILE);

    return(@seqs);

}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "\n
RNAi Result
\n
\n
Inputsequence:
";

for ($i=0; $i < length($in{'Inputseq'}); $i++) {
    print OUTFILE substr ($in{'Inputseq'}, $i, 1);

    if ( ($i+1)%10==0){
        print OUTFILE " ";
    }
    if ( ($i+1)%60==0){
        print OUTFILE "\n";
    }
}

print OUTFILE "";

$z=@compseqs;

for($k=1;$k<$z;$k++) {
    print OUTFILE "Compare Sequence:";

    for ($i=0; $i < length($compseqs[$k]); $i++) {
        print OUTFILE substr ($compseqs[$k], $i, 1);

        if ( ($i+1)%10==0){
            print OUTFILE " ";
        }
        if ( ($i+1)%60==0){
            print OUTFILE "\n";
        }
    }
    print OUTFILE "";
}

print OUTFILE "
Window:
$in{'Windowsize'}
Threshold:
$in{'Threshold'}
";
my $j=0;

for ($i=0; $i < length($in{'Inputseq'}); $i++) {
    if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
        if ($out[$i]->{similar}<=$in{'Threshold'}){
            $j=$in{'Windowsize'};
        }
        $height=$out[$i]->{similar}*5;
    }

    if ($j>0) {
        print OUTFILE "";
        $outstring .= "".substr ($in{'Inputseq'}, $i, 1)."";
        $j--;
    }
    else {
        print OUTFILE "";
        $outstring .= "".substr ($in{'Inputseq'}, $i, 1)."";
    }

    if ( ($i+1)%10==0){
        $outstring .= " ";
    }
    if ( ($i+1)%60==0){
        $outstring .= "\n";
    }
    if ( ($i+1)%800==0){
        print OUTFILE "\n";
    }
}

print OUTFILE "$outstring";

#foreach (@out) {
#print OUTFILE "Sequence: $_->{sequence}: $_->{similar} matchs";
#if ($_->{similar}<=$in{'Threshold'}){

# }
#}

print OUTFILE "\n\n";

close OUTFILE;

#nameprint();

sub parse_form {
    local ($buffer, @pairs, $pair, $name, $value);
    # Read in text
    $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
    if ($ENV{'REQUEST_METHOD'} eq "POST")
    {
        read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    }
    else
    {
        $buffer = $ENV{'QUERY_STRING'};
    }
    @pairs = split(/&/, $buffer);
    foreach $pair (@pairs)
    {
        ($name, $value) = split(/=/, $pair);
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $in{$name} = $value;
    }
}

From maj at fortinbras.us Fri Jan 8 15:36:41 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 8 Jan 2010 10:36:41 -0500 Subject: [Bioperl-l] Regarding blast in Bioperl In-Reply-To: References: Message-ID:

Hi Roopa--
I got your code to work with the following changes:

+# the input should be a valid FASTA file...
...
open(NUC,'>',$nuc);
+print NUC ">seq (need a name line for valid fasta)\n";
print NUC $inpu1, "\n";
close(NUC);
...
+# you can set these header parms in the call itself...
- my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
+ my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => 'Trypanosoma Brucei[ORGN]');
#change a parameter
+# commented this out...
+# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';

MAJ
----- Original Message ----- From: "Roopa Raghuveer" To: Sent: Friday, January 08, 2010 10:00 AM Subject: [Bioperl-l] Regarding blast in Bioperl > Hello all, > > I was trying Remote blast using Bioperl. My input data is a Trypanosoma > brucei sequence in Fasta format. When I was trying to submit to BLAST using > the step > $r=$factory->submit_blast($input) > It was not returning anything which I checked by debugging the code. It is > not blasting my input sequence even though I mentioned all the parameters.I > would paste the code below. > > Please help me in solving put this problem. It is very urgent. > > Regards > Roopa. 
> > #!/usr/bin/perl > > #path for extra camel module > use lib "/srv/www/htdocs/rain/RNAi/"; > use Roopablast; > > > use Bio::SearchIO; > use Bio::Search::Result::BlastResult; > use Bio::Perl; > use Bio::Tools::Run::RemoteBlast; > use Bio::Seq; > use Bio::SeqIO; > use Bio::DB::GenBank; > > $serverpath = "/srv/www/htdocs/rain/RNAi"; > $serverurl = "http://141.84.66.66/rain/RNAi"; > $outfile = $serverpath."/rnairesult_".time().".html"; > $nuc = $serverpath."/nuc".time().".txt"; > $debugfile = $serverpath."/debug_".time().".txt"; > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > my $outstring =""; > > &parse_form; > > print "Content-type: text/html\n\n"; > print "\n"; > print "RNAi Result"; > print " URL=$serverurl/rnairesult_".time().".html\"> \n"; > print "\n"; > print "\n"; > print " Your results will appear href=$serverurl/rnairesult_".time().".html>here
"; > print " Please be patient, runtime can be up to 5 minutes
"; > print " This page will automatically reload in 30 seconds. Roopa"; > print "\n"; > print "\n"; > > defined(my $pid = fork) or die "Can't fork: $!"; > exit if $pid; > open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; > open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; > open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; > > > > open(OUTFILE, '>',$outfile); > > print OUTFILE "\n > RNAi Result > URL=$serverurl//rnairesult_".time().".html\"> \n > > \n > \n > Your results will appear href=$serverurl/rnairesult_".time().".html>here
> Please be patient, runtime can be up to 5 minutes wait wait wait......
> This page will automatically reload in 30 seconds Roopa
> \n > \n"; > > close(OUTFILE); > > > @compseqs = blastcode($in{'Inputseq'}); > > $in{'Inputseq'} =~ s/>.*$//m; > $in{'Inputseq'} =~ s/[^TAGC]//gim; > $in{'Inputseq'} =~ tr/actg/ACTG/; > > @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, > $in{'Threshold'}); > > > sub blastcode > { > > $inpu1= $_[0]; > > #$organ= $_[1]; > > open(NUC,'>',$nuc); > print NUC $inpu1; > close(NUC); > > my $prog = 'blastn'; > my $db = 'refseq_rna'; > my $e_val= '1e-10'; > my $organism= 'Trypanosoma Brucei'; > > $gb = new Bio::DB::GenBank; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO', > '-Organism' => $organism ); > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE @params; > # close(OUTFILE); > > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > #change a paramter > > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma > Brucei[ORGN]'; > > #change a paramter > # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; > > my $v = 1; > #$v is just to turn on and off the messages > > my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , > '-organism' => 'Trypanosoma Brucei' ); > > > while (my $input = $str->next_seq()) > { > #Blast a sequence against a database: > #Alternatively, you could pass in a file with many > #sequences rather than loop through sequence one at a time > #Remove the loop starting 'while (my $input = $str->next_seq())' > #and swap the two lines below for an example of that. > > open(OUTFILE,'>',$debugfile); > print OUTFILE $input; > close(OUTFILE); > > > my $r = $factory->submit_blast($input); #The program stops here it > does not return any value and it does not enter the While loop,Please help > me in this regard.# > open(OUTFILE,'>',$debugfile); > print OUTFILE $r; > close(OUTFILE); > > > print STDERR "waiting...." 
if($v>0); > > while ( my @rids = $factory->each_rid ) { > open(OUTFILE,'>',$debugfile); > print OUTFILE "while entered"; > close(OUTFILE); > foreach my $rid ( @rids ) { > > open(OUTFILE,'>',$debugfile); > print OUTFILE "foreach entered"; > close(OUTFILE); > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > open(OUTFILE,'>',$debugfile); > print OUTFILE "if entered"; > close(OUTFILE); > print STDERR "." if ( $v > 0 ); > sleep 5; > } > else { > open(OUTFILE,'>',$debugfile); > print OUTFILE "else entered"; > close(OUTFILE); > > my $result = $rc->next_result(); > #save the output > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $result->next_hit(); > close(BLASTDEBUGFILE); > > my $filename = > $serverpath."/blastdata_".time().$result->query_name()."\.out"; > > # open(DEBUGFILE,'>',$debugfile); > # open(new,'>',$filename); > # @arra=; > # print DEBUGFILE @arra; > # close(DEBUGFILE); > # close(new); > > $factory->save_output($filename); > > # open(BLASTDEBUGFILE,'>',$debugfile); > # print BLASTDEBUGFILE "Hello $rid"; > # close(BLASTDEBUGFILE); > > $factory->remove_rid($rid); > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $organism; > close(BLASTDEBUGFILE); > > # open(OUTFILE,'>',$outfile); > # print OUTFILE "Test2 $result->database_name()"; > # close(OUTFILE); > > #$hit = $result->next_hit; > #open(new,'>',$debugfile); > #print $hit; > #close(new); > > while ( my $hit = $result->next_hit ) { > > next unless ( $v > 0); > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "$hit in while hits"; > # close(OUTFILE); > > my $sequ = $gb->get_Seq_by_version($hit->name); > my $dna = $sequ->seq(); # get the sequence as a string > push(@seqs,$dna); > } > } > } > } > } > > #open(OUTFILE,'>',$debugfile); > #print OUTFILE $seqs[0]; > #close(OUTFILE); > > return(@seqs); > > } > > open(OUTFILE, '>',$outfile) || 
die ; > > print OUTFILE "\n > RNAi Result > \n > \n >

> Inputsequence:
"; > > for ($i=0; $i > print OUTFILE substr ($in{'Inputseq'}, $i, 1); > > if ( ($i+1)%10==0){ > print OUTFILE " "; > } > if ( ($i+1)%60==0){ > print OUTFILE "
\n"; > } > } > > > > print OUTFILE "

"; > > $z=@compseqs; > > for($k=1;$k<$z;$k++) { > print OUTFILE "

Compare > Sequence:
"; > > for ($i=0; $i > print OUTFILE substr ($compseqs[$k], $i, 1); > > if ( ($i+1)%10==0){ > print OUTFILE " "; > } > if ( ($i+1)%60==0){ > print OUTFILE "
\n"; > } > } > print OUTFILE "

"; > } > > print OUTFILE "

> Window:
$in{'Windowsize'} >

>

> Threshold:
$in{'Threshold'} >

"; > my $j=0; > > for ($i=0; $i > if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){ > if ($out[$i]->{similar}<=$in{'Threshold'}){ > $j=$in{'Windowsize'}; > } > $height=$out[$i]->{similar}*5; > } > > if ($j>0) { > print OUTFILE " height=\"5\">"; > $outstring .= "".substr ($in{'Inputseq'}, $i, > 1).""; > $j--; > } > else { > print OUTFILE " height=\"5\">"; > $outstring .= "".substr ($in{'Inputseq'}, $i, > 1).""; > } > > if ( ($i+1)%10==0){ > $outstring .= " "; > } > if ( ($i+1)%60==0){ > $outstring .= "
\n"; > > } > if ( ($i+1)%800==0){ > print OUTFILE "

\n"; > > } > } > > print OUTFILE "

set\">$outstring"; > > #foreach (@out) { > #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} matchs

"; > #if ($_->{similar}<=$in{'Threshold'}){ > > # } > #} > > print OUTFILE "\n\n"; > > close OUTFILE; > > #nameprint(); > > sub parse_form { > local ($buffer, @pairs, $pair, $name, $value); > # Read in text > $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/; > if ($ENV{'REQUEST_METHOD'} eq "POST") > { > read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); > } > else > { > $buffer = $ENV{'QUERY_STRING'}; > } > @pairs = split(/&/, $buffer); > foreach $pair (@pairs) > { > ($name, $value) = split(/=/, $pair); > $value =~ tr/+/ /; > $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; > $in{$name} = $value; > } > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From julian.onions at gmail.com Fri Jan 8 16:53:50 2010 From: julian.onions at gmail.com (Julian Onions) Date: Fri, 8 Jan 2010 16:53:50 +0000 Subject: [Bioperl-l] Cladogram construction Message-ID: Does anyone have any sample code for building cladograms based on Pars (one of Phylip tools) type format (or any other format actually) I've got something sort of working but I get no weights on the tree - everything appears as nan. I'd also like to set one of the species to be an outgroup. This is the closest sample I've found so far. 
#!/usr/bin/perl -w
use strict;
use Bio::AlignIO;
use Bio::Tree::DistanceFactory;
use Bio::Align::ProteinStatistics;
use Bio::TreeIO;
use Bio::Tree::Draw::Cladogram;

my $alnfile = shift @ARGV || die "need a file to run";
my $input = Bio::AlignIO->new(-format => 'fasta', -file => $alnfile);
if( my $aln = $input->next_aln ) {
    my $dfactory = Bio::Tree::DistanceFactory->new(-method => 'NJ');
    my $stats = Bio::Align::ProteinStatistics->new;
    my $distmat = $stats->distance(-align => $aln, -method => 'Kimura');
    my $treeout = Bio::TreeIO->new(-format => 'newick');
    my $tree = $dfactory->make_tree($distmat);
    $treeout->write_tree($tree);
    my $obj1 = Bio::Tree::Draw::Cladogram->new(-tree => $tree, -compact => 0);
    $obj1->print(-file => "tree.eps");
} else {
    die "could not find any alignments in the file $alnfile";
}

Pars input looks like

3 4
Robin 101
Blackbird 100
Sparrow 100

Thanks,
Julian.

From rtbio.2009 at gmail.com Sat Jan 9 16:57:09 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Sat, 9 Jan 2010 17:57:09 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To:
References:
Message-ID:

Hello all,

Thanks a lot for your reply, Mark. It was working with Trypanosoma brucei as the organism parameter, but when I tried to use the Organism parameter supplied by the user, it did not work, i.e., I was unable to get the target sequences. Please help me in this regard.
My code is #!/usr/bin/perl #path for extra camel module use lib "/srv/www/htdocs/rain/RNAi/"; use Roopablast; use Bio::SearchIO; use Bio::Search::Result::BlastResult; use Bio::Perl; use Bio::Tools::Run::RemoteBlast; use Bio::Seq; use Bio::SeqIO; use Bio::DB::GenBank; $serverpath = "/srv/www/htdocs/rain/RNAi"; $serverurl = "http://141.84.66.66/rain/RNAi"; $outfile = $serverpath."/rnairesult_".time().".html"; $nuc = $serverpath."/nuc".time().".txt"; $debugfile = $serverpath."/debug_".time().".txt"; $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; my $outstring =""; &parse_form; print "Content-type: text/html\n\n"; print "\n"; print "RNAi Result"; print " \n"; print "\n"; print "\n"; print " Your results will appear here
"; print " Please be patient, runtime can be up to 5 minutes
"; print " This page will automatically reload in 30 seconds. Roopa"; print "\n"; print "\n"; defined(my $pid = fork) or die "Can't fork: $!"; exit if $pid; open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; open(OUTFILE, '>',$outfile); print OUTFILE "\n RNAi Result \n \n \n Your results will appear here
Please be patient, runtime can be up to 5 minutes wait wait wait......
This page will automatically reload in 30 seconds Roopa
\n \n"; close(OUTFILE); @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); $in{'Inputseq'} =~ s/>.*$//m; $in{'Inputseq'} =~ s/[^TAGC]//gim; $in{'Inputseq'} =~ tr/actg/ACTG/; @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, $in{'Threshold'}); sub blastcode { $inpu1= $_[0]; $organ= $_[1]; open(NUC,'>',$nuc); print NUC $inpu1,"\n"; close(NUC); my $prog = 'blastn'; my $db = 'refseq_rna'; my $e_val= '1e-10'; my $organism= $organ; $gb = new Bio::DB::GenBank; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO', '-Organism' => $organism ); open(OUTFILE,'>',$debugfile); print OUTFILE $inpu1; close(OUTFILE); my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => '$organ[ORGN]'); #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); #change a paramter #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]'; #change a paramter # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; my $v = 1; #$v is just to turn on and off the messages my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => $organ ); while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. #open(OUTFILE,'>',$debugfile); # print OUTFILE $input; #close(OUTFILE); my $r = $factory->submit_blast($input); open(OUTFILE,'>',$debugfile); # print OUTFILE $r; close(OUTFILE); print STDERR "waiting...." 
if($v>0); while ( my @rids = $factory->each_rid ) { # open(OUTFILE,'>',$debugfile); # print OUTFILE "while entered"; # close(OUTFILE); foreach my $rid ( @rids ) { # open(OUTFILE,'>',$debugfile); # print OUTFILE "foreach entered"; # close(OUTFILE); my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } open(OUTFILE,'>',$debugfile); # print OUTFILE "if entered"; close(OUTFILE); print STDERR "." if ( $v > 0 ); sleep 5; } else { # open(OUTFILE,'>',$debugfile); # print OUTFILE "else entered"; # close(OUTFILE); my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $result->next_hit(); close(BLASTDEBUGFILE); my $filename = $serverpath."/blastdata_".time().$result->query_name()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $organism; close(BLASTDEBUGFILE); # open(OUTFILE,'>',$outfile); # print OUTFILE "Test2 $result->database_name()"; # close(OUTFILE); #$hit = $result->next_hit; #open(new,'>',$debugfile); #print $hit; #close(new); while ( my $hit = $result->next_hit ) { next unless ( $v > 0); # open(OUTFILE,'>',$debugfile); # print OUTFILE "$hit in while hits"; # close(OUTFILE); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string push(@seqs,$dna); } } } } } #open(OUTFILE,'>',$debugfile); #print OUTFILE $seqs[0]; #close(OUTFILE); return(@seqs); } Regards, Roopa. On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen wrote: > Hi Roopa-- > > I got your code to work with the following changes: > > +# the input should be a valid FASTA file... > ... 
> open(NUC,'>',$nuc); > +print NUC ">seq (need a name line for valid fasta)\n"; > print NUC $inpu1, "\n"; > close(NUC); > ... > > +# you can set these header parms in the call itself... > - my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => > ''Trypanosoma Brucei[ORGN]'); > > #change a paramter > +# commented this out... > +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma > Brucei[ORGN]'; > > MAJ > ----- Original Message ----- From: "Roopa Raghuveer" > > To: > Sent: Friday, January 08, 2010 10:00 AM > Subject: [Bioperl-l] Regarding blast in Bioperl > > > Hello all, >> >> I was trying Remote blast using Bioperl. My input data is a Trypanosoma >> brucei sequence in Fasta format. When I was trying to submit to BLAST >> using >> the step >> $r=$factory->submit_blast($input) >> It was not returning anything which I checked by debugging the code. It is >> not blasting my input sequence even though I mentioned all the >> parameters.I >> would paste the code below. >> >> Please help me in solving put this problem. It is very urgent. >> >> Regards >> Roopa. 
From maj at fortinbras.us Sat Jan 9 18:05:41 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 9 Jan 2010 13:05:41 -0500
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To:
References:
Message-ID: <4C2E8133F916495B876628EF3E8FCBB2@NewLife>

I see it immediately (from making same bug many times) :

my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
- '$organ[ORGN]');
+"$organ[ORGN]");

MAJ

----- Original Message -----
From: "Roopa Raghuveer"
To: "Mark A. Jensen"
Cc:
Sent: Saturday, January 09, 2010 11:57 AM
Subject: Re: [Bioperl-l] Regarding blast in Bioperl

> Hello all,
>
> Thanks alot for your reply Mark. It was working for Trypanosoma brucei as
> the organism parameter,but when I tried to use the Organism parameter from
> the user,it was not working i.e., I was unable to get the target sequences.
> Please help me in this regard.
From robert.bradbury at gmail.com Sat Jan 9 19:52:53 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Sat, 9 Jan 2010 14:52:53 -0500
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To:
References:
Message-ID:

Roopa,

Mark is correct: you have to be very careful about single vs. double quotes in Perl. My current understanding is that double-quoted strings are "interpreted" (interpolated), while single-quoted strings are taken literally.

I tried to run your script (with fixes), but without the supporting files it appears to be impossible. What I am curious about is what it is trying to do. I was particularly intrigued by some apparent efforts to parse BLAST results into color-enhanced HTML, and without thinking about the code in detail, it seems easier to simply ask: what are you trying to do?
I find "classical" blast results particularly tedious, and I long for blast results that display concise information the way the NCBI HomoloGene cross-species comparisons do. Unfortunately, NCBI has deemed their methods (I have asked them) "too complex to disclose". (For a person comfortable dealing with assembly language, or even gate-level electronics, "too complex" is a very relative concept.)

One has the option of using NCBI, with a limited number of species but good display methodologies, or Ensembl, with many more species but less desirable display methodologies (a phylogenetic tree derived from cross-species comparisons).

For the WRN protein, which may play a key role in aging (through the activity of its exonuclease domain mutating DNA sequences and inducing microdeletions and microinsertions), this gets important: it appears that the *C. elegans* genome is missing the exonuclease domain (so it may be useless from the perspective of studying aging), and the other 4 nematode species which have been sequenced aren't in either the NCBI or the Ensembl databases. Needless to say, if we manage in the near future, given the drop in sequencing costs, to sequence the nematodes which are freeze/thaw tolerant (freezing and thawing induces DSBs that have to be repaired), those genomes are unlikely to be in the NCBI/Ensembl databases either.

So the user needs to develop the ability to mix and match public and obscure databases in creative ways to provide easy-to-interpret information.

Robert Bradbury

From robert.bradbury at gmail.com Sat Jan 9 20:27:54 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Sat, 9 Jan 2010 15:27:54 -0500
Subject: [Bioperl-l] Ensembl problems
Message-ID:

I am trying to get the examples provided by EMBL/Ensembl to work and am encountering problems.

For example, about 1/3 of the way through the Compara API tutorial [1] there is what is supposed to be a completely functional script. It does not work.
This is in contrast to some of the earlier simple scripts (listing the species in Ensembl, etc.), which do work on my machine, so I have all the libraries and so on installed correctly.

It is very poor form to document scripts which do not function on a properly set up system.

I have modified my invocation of the script slightly:
Align.pl --set_of_species \
"Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis familiaris:Sus scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae"

which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" on an undefined value at ./Align.pl line 132." (Align.pl is my slightly modified version of the Compara Tutorial code.)
As these are slightly modified Perl scripts from the documentation, the line numbers may differ.

I can print out the genome_dbs, and that gives me a list of genome names (hash tables), yet the call appears to be problematic in the Align.pl script, in spite of the fact that just before that call I dumped "genome_dbs" and got back some 25 hash tables (as expected). I believe this occurs whether one is comparing "human:mouse" or the more complex species set I have outlined above.

Has anyone else attempted to run the code documented in the Ensembl API Tutorial?
Any suggestions as to what direction to go in would be appreciated (when one is trying to copy code out of a tutorial and it fails, it's kind of hard to know where to go).

There do appear to be some problems in the specification of a Compara version/database, and there don't appear to be many resources telling one what is currently available.

Robert

1.
http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html

From ak at ebi.ac.uk Sat Jan 9 22:01:21 2010
From: ak at ebi.ac.uk (Andreas Kähäri)
Date: Sat, 9 Jan 2010 22:01:21 +0000
Subject: [Bioperl-l] Ensembl problems
In-Reply-To:
References:
Message-ID: <20100109220121.GA9521@quux.windows.ebi.ac.uk>

On Sat, Jan 09, 2010 at 03:27:54PM -0500, Robert Bradbury wrote:
> I am trying to get the examples provided by EMBL/Ensembl to work and am
> encountering problems.

Hi Robert,

The ensembl-dev list is the appropriate forum for this type of question, as it has nothing to do with BioPerl.

There is also the Ensembl helpdesk. If you send your problem to the helpdesk, I'm sure that it will be picked up by the appropriate people (I myself do not know enough about the Compara API to be able to diagnose this problem straight away, I'm afraid).

Be sure to submit a minimal script that still exhibits the problem, and information about which version of the APIs you're using (we will assume that you're not mixing a newer version of the API with older databases or vice versa).

We are generally very happy to have bugs in documentation or code pointed out to us, and will correct errors as we are made aware of them.

Kind regards,
Andreas

> For example, about 1/3 of the way through the Compara API tutorial [1] there
> is what is supposed to be a completely functional script. It does not
> work. This is in contrast to some of the earlier simple scripts (listing
> the species in Ensmbl etc.) which do work on my machine, so I have all the
> libraries do dah installed correctly).
>
> Very poor form to document scripts which do not function on a properly setup
> system.
> > I have modified my invocation of the script slightly: > Align.pl --set_of_species \ > "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur > garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta > africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis familiaris:Sus > scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus > tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia > belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae" > > which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" on > an undefined value at ./Align.pl line 132." (Align.pl is my slightly > modified example of the Compara Tutoraial code.) > As these are slightly modified perl scripts from the documantation, the line > numbers may be variable. > > I can print out the genome_dbs, and it gives me a list of genome names (hash > tables) though it appears that is problematic in the Align.pl script. > in spite of the fact that just previously to that call I dumped "genome_dbs" > and got back some 25 hash tables (expected). I believe this occurs whether > one is comparing "human:mouse" or the more complex species set I have > outlined above. > > > > Has anyone else attempted to run the code documented in the Ensembl API > Tutorial? > Any suggestions as to what direction to go in would be appreciated -- when > one is trying to copy code out of a tutorial and it fails its kind of hard > to know where to go.) > > There do appear to be some problems in the specifications of a Compara > version/database and there don't appear to be a lot of resources informing > one of what resources are currently available. > > Robert > > > 1. 
http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Andreas Kähäri, Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, United Kingdom

From cjfields at illinois.edu Sat Jan 9 22:01:19 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 9 Jan 2010 16:01:19 -0600
Subject: [Bioperl-l] Ensembl problems
In-Reply-To:
References:
Message-ID: <743C998D-BBB5-4832-BA25-24D7D7288F78@illinois.edu>

Robert,

Ensembl errors probably should be redirected to the ensembl mail list. I can't speak to the problems with it (they appear specific to the Ensembl tool set).

chris

On Jan 9, 2010, at 2:27 PM, Robert Bradbury wrote:

> I am trying to get the examples provided by EMBL/Ensembl to work and am
> encountering problems.
>
> For example, about 1/3 of the way through the Compara API tutorial [1] there
> is what is supposed to be a completely functional script. It does not
> work. This is in contrast to some of the earlier simple scripts (listing
> the species in Ensmbl etc.) which do work on my machine, so I have all the
> libraries do dah installed correctly).
>
> Very poor form to document scripts which do not function on a properly setup
> system.
> > I have modified my invocation of the script slightly: > Align.pl --set_of_species \ > "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur > garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta > africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis familiaris:Sus > scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus > tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia > belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae" > > which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" on > an undefined value at ./Align.pl line 132." (Align.pl is my slightly > modified example of the Compara Tutoraial code.) > As these are slightly modified perl scripts from the documantation, the line > numbers may be variable. > > I can print out the genome_dbs, and it gives me a list of genome names (hash > tables) though it appears that is problematic in the Align.pl script. > in spite of the fact that just previously to that call I dumped "genome_dbs" > and got back some 25 hash tables (expected). I believe this occurs whether > one is comparing "human:mouse" or the more complex species set I have > outlined above. > > > > Has anyone else attempted to run the code documented in the Ensembl API > Tutorial? > Any suggestions as to what direction to go in would be appreciated -- when > one is trying to copy code out of a tutorial and it fails its kind of hard > to know where to go.) > > There do appear to be some problems in the specifications of a Compara > version/database and there don't appear to be a lot of resources informing > one of what resources are currently available. > > Robert > > > 1. 
http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From robert.bradbury at gmail.com Sun Jan 10 19:47:00 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Sun, 10 Jan 2010 14:47:00 -0500
Subject: [Bioperl-l] Ensembl problems
In-Reply-To: <20100109220121.GA9521@quux.windows.ebi.ac.uk>
References: <20100109220121.GA9521@quux.windows.ebi.ac.uk>
Message-ID:

As it turns out, the example from the file I cited (the Compara API tutorial) does work. The code that I started with may have been from an "MS-WORD" document distributed with the documentation (which could quite well be out of date). But even the corrected code does not work for various uncommon comparisons between species (which they may not have archived in Ensembl). I also don't yet understand enough about the functions to tell whether they are comparing the same regions from the same chromosomes that just happen to be identical, or whether they are comparing a region with a homologous region on a different chromosome (i.e. conserved genes). I'm going to have to dig into this some more to figure out what is going on.

Thanks for the pointers; I'll refer future questions to the Ensembl list/helpdesk.

However, if anyone knows Ensembl very well: the database already has some of these interspecies comparisons in it. They are accessed when one builds a phylogeny tree for specific genes (and for highly conserved genes you will generally get a tree that includes nearly all 50 species in the database). As I don't think they are computed on the fly, the information must be precomputed and stored someplace in the database. I would very much like to know how to access this information.
Thanks,
Robert

On 1/9/10, Andreas Kähäri wrote:
> On Sat, Jan 09, 2010 at 03:27:54PM -0500, Robert Bradbury wrote:
>> I am trying to get the examples provided by EMBL/Ensembl to work and am
>> encountering problems.
>
> Hi Robert,
>
> The ensembl-dev list is the appropriate forum for this type of questions
> as it has nothing to do with bioperl.
>
> There is also the Ensembl helpdesk. If you send your problem to
> I'm sure that it will be picked up by the
> appropriate people (I do myself not know enough about the Compara API to
> be able to diagnose this problem straight away I'm afraid).
>
> Be sure to submit a minimal script that still exhibit the problem, and
> information about what version of the APIs you're using (we will assume
> that you're not mixing newer version of the API with older databases or
> vice versa).
>
> We are generally very happy to have bugs in documentation or code
> pointed out to us, and will correct errors as we are made aware of them.
>
>
> Kind regards,
> Andreas
>
>> For example, about 1/3 of the way through the Compara API tutorial [1]
>> there
>> is what is supposed to be a completely functional script. It does not
>> work. This is in contrast to some of the earlier simple scripts (listing
>> the species in Ensmbl etc.) which do work on my machine, so I have all
>> the
>> libraries do dah installed correctly).
>>
>> Very poor form to document scripts which do not function on a properly
>> setup
>> system.
>> >> I have modified my invocation of the script slightly: >> Align.pl --set_of_species \ >> "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur >> garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta >> africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis >> familiaris:Sus >> scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus >> tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia >> belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae" >> >> which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" >> on >> an undefined value at ./Align.pl line 132." (Align.pl is my slightly >> modified example of the Compara Tutoraial code.) >> As these are slightly modified perl scripts from the documantation, the >> line >> numbers may be variable. >> >> I can print out the genome_dbs, and it gives me a list of genome names >> (hash >> tables) though it appears that is problematic in the Align.pl script. >> in spite of the fact that just previously to that call I dumped >> "genome_dbs" >> and got back some 25 hash tables (expected). I believe this occurs >> whether >> one is comparing "human:mouse" or the more complex species set I have >> outlined above. >> >> >> >> Has anyone else attempted to run the code documented in the Ensembl API >> Tutorial? >> Any suggestions as to what direction to go in would be appreciated -- when >> one is trying to copy code out of a tutorial and it fails its kind of hard >> to know where to go.) >> >> There do appear to be some problems in the specifications of a Compara >> version/database and there don't appear to be a lot of resources informing >> one of what resources are currently available. >> >> Robert >> >> >> 1. 
http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> --
> Andreas Kähäri, Ensembl Software Developer
> European Bioinformatics Institute (EMBL-EBI)
> Wellcome Trust Genome Campus, Hinxton
> Cambridge CB10 1SD, United Kingdom
>

From Russell.Smithies at agresearch.co.nz Sun Jan 10 20:34:39 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 11 Jan 2010 09:34:39 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>

An alternative non-BioPerly way (that may be faster, given NCBI's flakiness lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and do lookups.
In that same dir, taxdump.tar.gz contains a file called names.dmp, which lists taxids and descriptions (and synonyms).

If it were me, I'd split gi_taxid_nucl and names.dmp into hashes so I could do this:

# gi_taxid_nucl maps GI number -> taxid, so the lookup key must be a GI
my $taxid = $gi_taxid_nucl{$accession};
# names.dmp maps taxid -> name
my $org_name = $names{$taxid};

--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> Sent: Saturday, 26 December 2009 4:52 p.m.
> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> > Bhakti, > The following example (using EUtilities) may serve your purpose: > > use Bio::DB::EUtilities; > > my (%taxa, @taxa); > my (%names, %idmap); > > # these are protein ids; nuc ids will work by changing -dbfrom => > 'nucleotide', > # (probably) > > my @ids = qw(1621261 89318838 68536103 20807972 730439); > > my $factory = Bio::DB::EUtilities->new(-eutil => 'elink', > -db => 'taxonomy', > -dbfrom => 'protein', > -correspondence => 1, > -id => \@ids); > > # iterate through the LinkSet objects > while (my $ds = $factory->next_LinkSet) { > $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0] > } > > @taxa = @taxa{@ids}; > > $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', > -db => 'taxonomy', > -id => \@taxa ); > > while (local $_ = $factory->next_DocSum) { > $names{($_->get_contents_by_name('TaxId'))[0]} = > ($_->get_contents_by_name('ScientificName'))[0]; > } > > foreach (@ids) { > $idmap{$_} = $names{$taxa{$_}}; > } > > # %idmap is > # 1621261 => 'Mycobacterium tuberculosis H37Rv' > # 20807972 => 'Thermoanaerobacter tengcongensis MB4' > # 68536103 => 'Corynebacterium jeikeium K411' > # 730439 => 'Bacillus caldolyticus' > # 89318838 => undef (this record has been removed from the db) > > 1; > > You probably will need to break up your 30000 into chunks > (say, 1000-3000 each), and do the above on each chunk with a > > sleep 3; > > or so separating the queries. > MAJ > ----- Original Message ----- > From: "Bhakti Dwivedi" > To: > Sent: Friday, December 25, 2009 9:46 PM > Subject: [Bioperl-l] how to retrieve organism name from accession number? > > > > Hi, > > > > Does anyone know how to retrieve the "Source" or the "Species name" > given > > the accession number using Bioperl. I have these 30,000 accession > numbers > > for which I need to get the source organisms. Any kind of help will be > > appreciated. 
> > > > Thanks > > > > BD > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at illinois.edu Sun Jan 10 20:49:40 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 10 Jan 2010 14:49:40 -0600 Subject: [Bioperl-l] how to retrieve organism name from accession number? In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> Message-ID: One could also use Bio::DB::Taxonomy, which indexes the same files or (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the details). chris On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote: > An alternate non-BioPerly way (that may be faster given NCBI's flakiness lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and do lookups. 
> In that same dir, taxdump.tar.gz contains a file called names.dmp which lists taxids and descriptions (and synonyms) > > If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I could do this: > > my $taxid = $gi_taxid_nucl{$accession}; > my $org_name = $names{$taxid}; > > --Russell > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen >> Sent: Saturday, 26 December 2009 4:52 p.m. >> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession >> number? >> >> Bhakti, >> The following example (using EUtilities) may serve your purpose: >> >> use Bio::DB::EUtilities; >> >> my (%taxa, @taxa); >> my (%names, %idmap); >> >> # these are protein ids; nuc ids will work by changing -dbfrom => >> 'nucleotide', >> # (probably) >> >> my @ids = qw(1621261 89318838 68536103 20807972 730439); >> >> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink', >> -db => 'taxonomy', >> -dbfrom => 'protein', >> -correspondence => 1, >> -id => \@ids); >> >> # iterate through the LinkSet objects >> while (my $ds = $factory->next_LinkSet) { >> $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0] >> } >> >> @taxa = @taxa{@ids}; >> >> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', >> -db => 'taxonomy', >> -id => \@taxa ); >> >> while (local $_ = $factory->next_DocSum) { >> $names{($_->get_contents_by_name('TaxId'))[0]} = >> ($_->get_contents_by_name('ScientificName'))[0]; >> } >> >> foreach (@ids) { >> $idmap{$_} = $names{$taxa{$_}}; >> } >> >> # %idmap is >> # 1621261 => 'Mycobacterium tuberculosis H37Rv' >> # 20807972 => 'Thermoanaerobacter tengcongensis MB4' >> # 68536103 => 'Corynebacterium jeikeium K411' >> # 730439 => 'Bacillus caldolyticus' >> # 89318838 => undef (this record has been removed from the db) >> >> 1; >> >> You probably will need to break up your 30000 into chunks >> (say, 
1000-3000 each), and do the above on each chunk with a >> >> sleep 3; >> >> or so separating the queries. >> MAJ >> ----- Original Message ----- >> From: "Bhakti Dwivedi" >> To: >> Sent: Friday, December 25, 2009 9:46 PM >> Subject: [Bioperl-l] how to retrieve organism name from accession number? >> >> >>> Hi, >>> >>> Does anyone know how to retrieve the "Source" or the "Species name" >> given >>> the accession number using Bioperl. I have these 30,000 accession >> numbers >>> for which I need to get the source organisms. Any kind of help will be >>> appreciated. >>> >>> Thanks >>> >>> BD >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> =======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From Russell.Smithies at agresearch.co.nz Sun Jan 10 21:05:06 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 11 Jan 2010 10:05:06 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To:
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>

I've started to go off eUtils recently (not BioPerl's fault), as I've often found that with large queries, chunks of the resulting data are missing.
For example, before Xmas I was creating species-specific databases by using eUtils to get a list of GI numbers back for a taxid, then retrieving the fasta sequences in chunks of 500. Very regularly, in the middle of the fasta there would be a message about a resource being unavailable, e.g.

>test_sequence_1
TACGATCATCGCTResource UnavailableTACGACTCTGCT
>test_sequence_2
TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT

Often this wasn't detected until formatdb complained about invalid characters.
Inquiries to NCBI as to why this was happening and what to do about it returned stupid answers ("do each sequence manually thru the web interface", or "use eUtils").
As we have a nice fast network connection, I now prefer to download very large gzip files (e.g. all of RefSeq) and extract what I need.

I can't help but think that NCBI could solve a lot of problems if they gzipped the output from eUtils queries - it's something I've requested regularly for the last 5 years or so!!

--Russell

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Monday, 11 January 2010 9:50 a.m.
> To: Smithies, Russell > Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org' > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > number? > > One could also use Bio::DB::Taxonomy, which indexes the same files or > (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the > details). > > chris > > On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote: > > > An alternate non-BioPerly way (that may be faster given NCBI's flakiness > lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip > files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and > do lookups. > > In that same dir, taxdump.tar.gz contains a file called names.dmp which > lists taxids and descriptions (and synonyms) > > > > If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I > could do this: > > > > my $taxid = $gi_taxid_nucl{$accession}; > > my $org_name = $names{$taxid}; > > > > --Russell > > > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > >> Sent: Saturday, 26 December 2009 4:52 p.m. > >> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org > >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession > >> number? 
> >> > >> Bhakti, > >> The following example (using EUtilities) may serve your purpose: > >> > >> use Bio::DB::EUtilities; > >> > >> my (%taxa, @taxa); > >> my (%names, %idmap); > >> > >> # these are protein ids; nuc ids will work by changing -dbfrom => > >> 'nucleotide', > >> # (probably) > >> > >> my @ids = qw(1621261 89318838 68536103 20807972 730439); > >> > >> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink', > >> -db => 'taxonomy', > >> -dbfrom => 'protein', > >> -correspondence => 1, > >> -id => \@ids); > >> > >> # iterate through the LinkSet objects > >> while (my $ds = $factory->next_LinkSet) { > >> $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0] > >> } > >> > >> @taxa = @taxa{@ids}; > >> > >> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', > >> -db => 'taxonomy', > >> -id => \@taxa ); > >> > >> while (local $_ = $factory->next_DocSum) { > >> $names{($_->get_contents_by_name('TaxId'))[0]} = > >> ($_->get_contents_by_name('ScientificName'))[0]; > >> } > >> > >> foreach (@ids) { > >> $idmap{$_} = $names{$taxa{$_}}; > >> } > >> > >> # %idmap is > >> # 1621261 => 'Mycobacterium tuberculosis H37Rv' > >> # 20807972 => 'Thermoanaerobacter tengcongensis MB4' > >> # 68536103 => 'Corynebacterium jeikeium K411' > >> # 730439 => 'Bacillus caldolyticus' > >> # 89318838 => undef (this record has been removed from the db) > >> > >> 1; > >> > >> You probably will need to break up your 30000 into chunks > >> (say, 1000-3000 each), and do the above on each chunk with a > >> > >> sleep 3; > >> > >> or so separating the queries. > >> MAJ > >> ----- Original Message ----- > >> From: "Bhakti Dwivedi" > >> To: > >> Sent: Friday, December 25, 2009 9:46 PM > >> Subject: [Bioperl-l] how to retrieve organism name from accession > number? > >> > >> > >>> Hi, > >>> > >>> Does anyone know how to retrieve the "Source" or the "Species name" > >> given > >>> the accession number using Bioperl. 
I have these 30,000 accession > >> numbers > >>> for which I need to get the source organisms. Any kind of help will > be > >>> appreciated. > >>> > >>> Thanks > >>> > >>> BD > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > ======================================================================= > > Attention: The information contained in this message and/or attachments > > from AgResearch Limited is intended only for the persons or entities > > to which it is addressed and may contain confidential and/or privileged > > material. Any review, retransmission, dissemination or other use of, or > > taking of any action in reliance upon, this information by persons or > > entities other than the intended recipients is prohibited by AgResearch > > Limited. If you have received this message in error, please notify the > > sender immediately. > > ======================================================================= > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l From avilella at gmail.com Sun Jan 10 21:05:13 2010 From: avilella at gmail.com (Albert Vilella) Date: Sun, 10 Jan 2010 21:05:13 +0000 Subject: [Bioperl-l] Ensembl problems In-Reply-To: References: <20100109220121.GA9521@quux.windows.ebi.ac.uk> Message-ID: <358f4d651001101305q1b75cfe3q558a245ab1ab1238@mail.gmail.com> > However, if anyone knows Ensembl very well, the database has in it > some of these interspecies comparisons already. 
They are accessed > when one does a phylogeny tree for specific genes (and generally for > highly conserved genes you will get a tree that includes nearly all 50 > species in the database). As I don't think they are computed > on-the-fly, the information must be precomputed and stored someplace > in the database. I would very much like to know how to access this > information. Yes, they are. You can access the data programmatically by installing the ensembl and ensembl-compara Perl APIs. There are a few example scripts for the GeneTrees: ensembl-compara/scripts/examples/homology*.pl Cheers, Albert. > Thanks, > Robert > > > > > On 1/9/10, Andreas Kähäri wrote: >> On Sat, Jan 09, 2010 at 03:27:54PM -0500, Robert Bradbury wrote: >>> I am trying to get the examples provided by EMBL/Ensembl to work and am >>> encountering problems. >> >> Hi Robert, >> >> The ensembl-dev list is the appropriate forum for this type of questions >> as it has nothing to do with bioperl. >> >> There is also the Ensembl helpdesk. If you send your problem to >> I'm sure that it will be picked up by the >> appropriate people (I do myself not know enough about the Compara API to >> be able to diagnose this problem straight away I'm afraid). >> >> Be sure to submit a minimal script that still exhibits the problem, and >> information about what version of the APIs you're using (we will assume >> that you're not mixing newer version of the API with older databases or >> vice versa). >> >> We are generally very happy to have bugs in documentation or code >> pointed out to us, and will correct errors as we are made aware of them. >> >> >> Kind regards, >> Andreas >> >>> For example, about 1/3 of the way through the Compara API tutorial [1] >>> there >>> is what is supposed to be a completely functional script. It does not >>> work. This is in contrast to some of the earlier simple scripts (listing >>> the species in Ensembl etc.)
which do work on my machine, so I have all >>> the >>> libraries do dah installed correctly). >>> >>> Very poor form to document scripts which do not function on a properly >>> setup >>> system. >>> >>> I have modified my invocation of the script slightly: >>> Align.pl --set_of_species \ >>> "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur >>> garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta >>> africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis >>> familiaris:Sus >>> scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus >>> tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia >>> belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae" >>> >>> which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" >>> on >>> an undefined value at ./Align.pl line 132." (Align.pl is my slightly >>> modified example of the Compara Tutorial code.) >>> As these are slightly modified perl scripts from the documentation, the >>> line >>> numbers may be variable. >>> >>> I can print out the genome_dbs, and it gives me a list of genome names >>> (hash >>> tables) though it appears that is problematic in the Align.pl script. >>> in spite of the fact that just previously to that call I dumped >>> "genome_dbs" >>> and got back some 25 hash tables (expected). I believe this occurs >>> whether >>> one is comparing "human:mouse" or the more complex species set I have >>> outlined above. >>> >>> >>> >>> Has anyone else attempted to run the code documented in the Ensembl API >>> Tutorial? >>> Any suggestions as to what direction to go in would be appreciated -- when >>> one is trying to copy code out of a tutorial and it fails it's kind of hard >>> to know where to go.) >>> >>> There do appear to be some problems in the specifications of a Compara >>> version/database and there don't appear to be a lot of resources informing >>> one of what resources are currently available. >>> >>> Robert >>> >>> >>> 1.
http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> -- >> Andreas Kähäri, Ensembl Software Developer >> European Bioinformatics Institute (EMBL-EBI) >> Wellcome Trust Genome Campus, Hinxton >> Cambridge CB10 1SD, United Kingdom >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From alessandra.bilardi at gmail.com Sun Jan 10 23:21:12 2010 From: alessandra.bilardi at gmail.com (Alessandra) Date: Mon, 11 Jan 2010 00:21:12 +0100 Subject: [Bioperl-l] GBrowse.org project In-Reply-To: References: Message-ID: Hi all, I'm Alessandra and I run GBrowse.org. GBrowse.org is a resource for using and setting up GBrowse genome browsers. The site provides one location where biologists and bioinformaticians can find: 1. Genome browser web sites for any organism that has them. If a species has a genome browser anywhere on the web, then we aim to link to it. 2. Links to sequence and annotation files that are available online. 3. Links to genome browser configuration files, when available. 4. An FTP site containing genome annotation and configuration files for each annotated genome that does not have its own web site. GBrowse.org emphasizes the GBrowse genome browser in its organization, but also links to sites that use other browser packages such as UCSC, Ensembl, and JBrowse. Also, we are currently conducting a survey seeking input on future project direction. Please take a few minutes now to provide your feedback. Survey link: http://gbrowse.org/survey/index.php?sid=64264&lang=en GBrowse.org introduction link: http://gmod.org/wiki/August_2009_GMOD_Meeting#GBrowse.org Thank you for your help, Alessandra Bilardi.
http://gbrowse.org/ CRIBI Genomics, University of Padua http://genomics.cribi.unipd.it/ From cjfields at illinois.edu Mon Jan 11 03:04:13 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 10 Jan 2010 21:04:13 -0600 Subject: [Bioperl-l] GMOD BioPerl Meeting Message-ID: <7D72ECC2-E856-4C09-B67A-62AFFB59B377@illinois.edu> Just a quick reminder that we're having a BioPerl satellite meeting after the PAG Conference (just prior to the GMOD Meeting). The meeting is this Wednesday, Jan. 13, starting at 11:30am, at the Best Western Seven Seas in San Diego. I will update the relevant BioPerl and GMOD pages with more details as they become available. At the moment, we will be meeting in the hotel lobby prior to starting the meeting and possible hackathon. http://www.bioperl.org/wiki/GMOD_2010_Meeting http://gmod.org/wiki/January_2010_GMOD_Meeting#Satellite_Meetings Thanks! chris From bernd.jagla at pasteur.fr Mon Jan 11 10:11:16 2010 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Mon, 11 Jan 2010 11:11:16 +0100 Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java Message-ID: <6D85585C10F94E25898249D2D7CAC0D7@zillumina> Hi, First off, I am not sure if this is supposed to be addressed to the Bioperl or Gbrowse mailing list, so apologies if this is the wrong list and please let me know. I am writing a program in Java that needs to access genome annotation data. Since I am using Gbrowse already I was thinking that I could combine both approaches making life eventually easier for me. I am mainly interested in getting a gene/feature name for a given position. The position is stored in the feature table and through linking typelist, locationlist, (maybe sequence), and feature I can get all the information I need. Unfortunately it seems that the feature name is stored in the object blob of the feature table.
That is a bit suspicious to me because I don't understand why searching for a name can be so fast if it is not indexed through mysql when searching using GBrowse. So my question is how do I parse the Bio::DB::SeqFeature object in JAVA correctly to get the name of the feature and possibly also any further information. Any suggestions are greatly appreciated. Maybe there is a better solution than parsing Perl code with Java? Thanks a lot, Bernd From biopython at maubp.freeserve.co.uk Mon Jan 11 10:48:52 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 11 Jan 2010 10:48:52 +0000 Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java In-Reply-To: <6D85585C10F94E25898249D2D7CAC0D7@zillumina> References: <6D85585C10F94E25898249D2D7CAC0D7@zillumina> Message-ID: <320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com> On Mon, Jan 11, 2010 at 10:11 AM, Bernd Jagla wrote: > Hi, > > First off, I am not sure if this is supposed to be addressed to the Bioperl > or Gbrowse mailing list, so apologies if this is the wrong list and please > let me know. > > I am writing a program in Java that needs to access genome annotation data. > Since I am using Gbrowse already I was thinking that I could combine both > approaches making life eventually easier for me. I am mainly interested in > getting a gene/feature name for a given position. The position is stored in > the feature table and through linking typelist, locationlist, (maybe > sequence), and feature I can get all the information I need. Unfortunately > it seems that the feature name is stored in the object blob of the feature > table. How are you storing the data in Gbrowse? There are several back ends, and this will make a big difference for accessing the raw data. One option would be to use Gbrowse with BioSQL as the backend. You can then use BioJava (or BioPerl, or BioPython, etc) to access the database.
The only downside is Gbrowse isn't working 100% on top of BioSQL right now (I'd like to see this fixed, but I don't know Perl). There is an open bug on this [ gmod-Bugs-2168597 ]. Peter From bernd.jagla at pasteur.fr Mon Jan 11 10:53:20 2010 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Mon, 11 Jan 2010 11:53:20 +0100 Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java In-Reply-To: <320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com> References: <6D85585C10F94E25898249D2D7CAC0D7@zillumina> <320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com> Message-ID: <9056164A8A744A77B6CD1E8E4E20B104@zillumina> I am using bp_seqfeature_load.pl to load my features. That is using Bio::DB::SeqFeature::Store and MySql as a backend... That's all I understood... B > -----Original Message----- > From: p.j.a.cock at googlemail.com [mailto:p.j.a.cock at googlemail.com] On > Behalf Of Peter > Sent: Monday, January 11, 2010 11:49 AM > To: Bernd Jagla > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java > > On Mon, Jan 11, 2010 at 10:11 AM, Bernd Jagla > wrote: > > Hi, > > > > First off, I am not sure if this is supposed to be addressed to the > Bioperl > > or Gbrowse mailing list, so apologies if this is the wrong list and > please > > let me know. > > > > I am writing a program in Java that needs to access genome annotation > data. > > Since I am using Gbrowse already I was thinking that I could combine > both > > approaches making life eventually easier for me. I am mainly interested > in > > getting a gene/feature name for a given position. The position is stored > in > > the feature table and through linking typelist, locationlist, (maybe > > sequence), and feature I can get all the information I need. > Unfortunately > > it seems that the feature name is stored in the object blob of the > feature > > table. > > How are you storing the data in Gbrowse?
There are several back ends, > and this will make a big difference for accessing the raw data. > > One option would be to use Gbrowse with BioSQL as the backend. > You can then use BioJava (or BioPerl, or BioPython, etc) to access the > database. The only downside is Gbrowse isn't working 100% on top > of BioSQL right now (I'd like to see this fixed, but I don't know Perl). > There is an open bug on this [ gmod-Bugs-2168597 ]. > > Peter From awitney at sgul.ac.uk Mon Jan 11 12:21:07 2010 From: awitney at sgul.ac.uk (Adam Witney) Date: Mon, 11 Jan 2010 12:21:07 +0000 Subject: [Bioperl-l] create Bio::AlignIO object from hash Message-ID: Hi, I am writing a script to automate the running of Phylip Pars. In the process i have to create a Bio::AlignIO object from a set of data that i have in a hash. I could write the hash data into a phylip file and then load the Bio::AlignIO from that file, but i wondered if i could skip the writing and then reading of a temporary file ? thanks for any help adam From roy.chaudhuri at gmail.com Mon Jan 11 13:54:25 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 11 Jan 2010 13:54:25 +0000 Subject: [Bioperl-l] create Bio::AlignIO object from hash In-Reply-To: <4B4B2A51.9040602@gmail.com> References: <4B4B2A51.9040602@gmail.com> Message-ID: <4B4B2D91.70906@gmail.com> Actually, I guess some sample code would be more helpful: use Bio::LocatableSeq; use Bio::SimpleAlign; use Bio::AlignIO; my $seq1=Bio::LocatableSeq->new(-id=>'one', -seq=>'AT-CG', -start=>1, -end=>4); my $seq2=Bio::LocatableSeq->new(-id=>'two', -seq=>'A--CG', -start=>1, -end=>3); my $seq3=Bio::LocatableSeq->new(-id=>'three', -seq=>'ATTCG', -start=>1, -end=>5); my $aln=Bio::SimpleAlign->new(-seqs=>[$seq1,$seq2,$seq3]); Bio::AlignIO->new(-format=>'phylip')->write_aln($aln); Cheers, Roy. 
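The sample above can be adapted to start from a hash. What follows is only a minimal sketch under stated assumptions, not something posted in the thread: it assumes a hash (here called %gapped, a made-up name) that maps sequence ids to already-aligned, equal-length strings with '-' as the gap character, and the ids and sequences are illustrative. It uses the same Bio::LocatableSeq constructor arguments as above, plus the add_seq method of Bio::SimpleAlign.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::AlignIO;

# Hypothetical input: sequence id => gapped (already aligned) sequence.
my %gapped = (
    one   => 'AT-CG',
    two   => 'A--CG',
    three => 'ATTCG',
);

my $aln = Bio::SimpleAlign->new();

# Sort the ids so the order of sequences in the output is reproducible.
for my $id (sort keys %gapped) {
    my $seq = $gapped{$id};

    # -end is the number of residues, i.e. the length without gap characters.
    (my $residues = $seq) =~ tr/-//d;

    $aln->add_seq(
        Bio::LocatableSeq->new(
            -id    => $id,
            -seq   => $seq,
            -start => 1,
            -end   => length $residues,
        )
    );
}

# Write the alignment in PHYLIP format to STDOUT; no temporary file is needed.
Bio::AlignIO->new(-format => 'phylip', -fh => \*STDOUT)->write_aln($aln);
```

If the alignment is only needed in memory, the last line can be dropped and $aln used directly wherever a Bio::SimpleAlign object is accepted.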
On 11/01/2010 13:40, Roy Chaudhuri wrote: > Hi Adam, > > I'm guessing you actually want to create a Bio::SimpleAlign object > (representing an alignment), rather than a Bio::AlignIO object (which is > just for reading/writing alignment files). Bio::SimpleAlign has a > documented new method that allows you to construct an alignment from > Bio::LocatableSeq objects, which are similar to Bio::Seq objects but > include gaps and start/end coordinates to describe their relationship to > other sequences in the alignment. > > Roy. > > On 11/01/2010 12:21, Adam Witney wrote: >> Hi, >> >> I am writing a script to automate the running of Phylip Pars. In the >> process i have to create a Bio::AlignIO object from a set of data >> that i have in a hash. >> >> I could write the hash data into a phylip file and then load the >> Bio::AlignIO from that file, but i wondered if i could skip the >> writing and then reading of a temporary file ? >> >> thanks for any help >> >> adam _______________________________________________ Bioperl-l >> mailing list Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From roy.chaudhuri at gmail.com Mon Jan 11 13:40:33 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 11 Jan 2010 13:40:33 +0000 Subject: [Bioperl-l] create Bio::AlignIO object from hash In-Reply-To: References: Message-ID: <4B4B2A51.9040602@gmail.com> Hi Adam, I'm guessing you actually want to create a Bio::SimpleAlign object (representing an alignment), rather than a Bio::AlignIO object (which is just for reading/writing alignment files). Bio::SimpleAlign has a documented new method that allows you to construct an alignment from Bio::LocatableSeq objects, which are similar to Bio::Seq objects but include gaps and start/end coordinates to describe their relationship to other sequences in the alignment. Roy. On 11/01/2010 12:21, Adam Witney wrote: > Hi, > > I am writing a script to automate the running of Phylip Pars. 
In the > process i have to create a Bio::AlignIO object from a set of data > that i have in a hash. > > I could write the hash data into a phylip file and then load the > Bio::AlignIO from that file, but i wondered if i could skip the > writing and then reading of a temporary file ? > > thanks for any help > > adam _______________________________________________ Bioperl-l > mailing list Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Mon Jan 11 14:16:45 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 11 Jan 2010 14:16:45 +0000 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records Message-ID: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> Hi, I'm running bioperl-live from SVN, just updated to revision 16648. $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' 1.0069 I am trying to get Bio::SeqIO to convert a multiple record EMBL file into GenBank format, piping the data via stdin/stdout using the following trivial Perl script: #!/usr/bin/env perl use Bio::SeqIO; my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl'); my $out = Bio::SeqIO->new(-format => 'genbank'); while (my $seq = $in->next_seq) { $out->write_seq($seq) }; This only seems to find the first EMBL record in my example files. For example, this simple file has just two contig records: http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl This is just the first two records taken from a much larger EMBL file rel_con_hum_01_r102.dat downloaded and uncompressed from: ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz Trying both these examples as input, BioPerl just gives a single GenBank record as output (the first EMBL entry in the input). Is this a BioPerl bug, or am I missing something? Peter From maj at fortinbras.us Mon Jan 11 15:04:00 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Mon, 11 Jan 2010 10:04:00 -0500 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> Message-ID: Hi Peter, I found the issue-- there are no SQ lines in the data, and having them is a key stop condition in the parser (line 438 embl.pm). We evidently need to be more liberal in what we accept, even as we are strict in what we emit. Could you make a bug report? thanks for the heads-up-- MAJ ----- Original Message ----- From: "Peter" To: "bioperl-l list" Sent: Monday, January 11, 2010 9:16 AM Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records > Hi, > > I'm running bioperl-live from SVN, just updated to revision 16648. > > $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' > 1.0069 > > I am trying to get Bio::SeqIO to convert a multiple record EMBL > file into GenBank format, piping the data via stdin/stdout using > the following trivial Perl script: > > #!/usr/bin/env perl > use Bio::SeqIO; > my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl'); > my $out = Bio::SeqIO->new(-format => 'genbank'); > while (my $seq = $in->next_seq) { $out->write_seq($seq) }; > > This only seems to find the first EMBL record in my example > files. For example, this simple file has just two contig records: > http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl > > This is just the first two records taken from a much larger EMBL file > rel_con_hum_01_r102.dat downloaded and uncompressed from: > ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz > > Trying both these examples as input, BioPerl just gives a single > GenBank record as output (the first EMBL entry in the input). > > Is this a BioPerl bug, or am I missing something? 
> > Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From biopython at maubp.freeserve.co.uk Mon Jan 11 15:17:37 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 11 Jan 2010 15:17:37 +0000 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> Message-ID: <320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com> On Mon, Jan 11, 2010 at 3:04 PM, Mark A. Jensen wrote: > > Hi Peter, I found the issue-- there are no SQ lines in the data, and having > them is a key stop condition in the parser (line 438 embl.pm). > We evidently need to be more liberal in what we accept, even as we are > strict in what we emit. Could you make a bug report? > thanks for the heads-up-- > MAJ Done: http://bugzilla.open-bio.org/show_bug.cgi?id=2982 These are EMBL contig records, so they don't have SQ lines, but instead CO lines. Peter From cjfields at illinois.edu Mon Jan 11 15:24:24 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 11 Jan 2010 09:24:24 -0600 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: <320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com> References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> <320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com> Message-ID: On Jan 11, 2010, at 9:17 AM, Peter wrote: > On Mon, Jan 11, 2010 at 3:04 PM, Mark A. Jensen wrote: >> >> Hi Peter, I found the issue-- there are no SQ lines in the data, and having >> them is a key stop condition in the parser (line 438 embl.pm). >> We evidently need to be more liberal in what we accept, even as we are >> strict in what we emit. Could you make a bug report? 
>> thanks for the heads-up-- >> MAJ > > Done: http://bugzilla.open-bio.org/show_bug.cgi?id=2982 > > These are EMBL contig records, so they don't have SQ lines, > but instead CO lines. > > Peter Peter, Just curious, but have you tried the experimental EMBL parser 'embldriver'? I don't think it's bound to the same strictures, but I may be mistaken. chris From cjfields at illinois.edu Mon Jan 11 15:23:00 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 11 Jan 2010 09:23:00 -0600 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> Message-ID: <0D0D9DB5-56FA-414E-8D1D-3FE18198F7EC@illinois.edu> Just saw that mark responded, so if possible submit a bug. We may be doing a mini-hackathon this Wednesday, so we can probably tackle it in the process (possibly along with a few other pressing issues). chris On Jan 11, 2010, at 8:16 AM, Peter wrote: > Hi, > > I'm running bioperl-live from SVN, just updated to revision 16648. > > $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' > 1.0069 > > I am trying to get Bio::SeqIO to convert a multiple record EMBL > file into GenBank format, piping the data via stdin/stdout using > the following trivial Perl script: > > #!/usr/bin/env perl > use Bio::SeqIO; > my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl'); > my $out = Bio::SeqIO->new(-format => 'genbank'); > while (my $seq = $in->next_seq) { $out->write_seq($seq) }; > > This only seems to find the first EMBL record in my example > files. 
For example, this simple file has just two contig records: > http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl > > This is just the first two records taken from a much larger EMBL file > rel_con_hum_01_r102.dat downloaded and uncompressed from: > ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz > > Trying both these examples as input, BioPerl just gives a single > GenBank record as output (the first EMBL entry in the input). > > Is this a BioPerl bug, or am I missing something? > > Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Mon Jan 11 15:55:26 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 11 Jan 2010 15:55:26 +0000 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> Message-ID: <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com> On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf wrote: > > These entries form the CON data class, see: > http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14 > and they don't contain any sequence information. I know - GenBank files have a similar system with CONTIG lines instead of sequences. I was expecting BioPerl to be able to convert these EMBL files with CO lines into GenBank files with CONTIG lines. > If you take the 'expanded' entries from > ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz > your script will work. That's a useful tip - thanks. 
Peter From hrh at fmi.ch Mon Jan 11 15:42:22 2010 From: hrh at fmi.ch (Hotz, Hans-Rudolf) Date: Mon, 11 Jan 2010 16:42:22 +0100 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> Message-ID: On 1/11/10 3:16 PM, "Peter" wrote: > Hi, > > I'm running bioperl-live from SVN, just updated to revision 16648. > > $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' > 1.0069 > > I am trying to get Bio::SeqIO to convert a multiple record EMBL > file into GenBank format, piping the data via stdin/stdout using > the following trivial Perl script: > > #!/usr/bin/env perl > use Bio::SeqIO; > my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl'); > my $out = Bio::SeqIO->new(-format => 'genbank'); > while (my $seq = $in->next_seq) { $out->write_seq($seq) }; > > This only seems to find the first EMBL record in my example > files. For example, this simple file has just two contig records: > http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl > > This is just the first two records taken from a much larger EMBL file > rel_con_hum_01_r102.dat downloaded and uncompressed from: > ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz These entries form the CON data class, see: http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14 and they don't contain any sequence information. If you take the 'expanded' entries from ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r 102.dat.gz your script will work. Hans > Trying both these examples as input, BioPerl just gives a single > GenBank record as output (the first EMBL entry in the input). > > Is this a BioPerl bug, or am I missing something? 
> > Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From awitney at sgul.ac.uk Mon Jan 11 16:27:15 2010 From: awitney at sgul.ac.uk (Adam Witney) Date: Mon, 11 Jan 2010 16:27:15 +0000 Subject: [Bioperl-l] create Bio::AlignIO object from hash In-Reply-To: <4B4B2D91.70906@gmail.com> References: <4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com> Message-ID: Ah excellent, thanks Roy. I was indeed thinking about it the wrong way. In the process of writing this i have created a Bio::Tools::Run::Phylo::Phylip::Pars class which is essentially just a modified copy of ProtPars. I have also fixed a few typos and possible bugs in Bio/Tools/Run/Phylo/Phylip/Base.pm Bio/Tools/Run/Phylo/Phylip/PhylipConf.pm Bio/AlignIO/phylip.pm Bio/Tools/Run/Alignment/Clustalw.pm I am of course happy to send these back in to the project... how would i best do this? Cheers adam On 11 Jan 2010, at 13:54, Roy Chaudhuri wrote: > Actually, I guess some sample code would be more helpful: > > use Bio::LocatableSeq; > use Bio::SimpleAlign; > use Bio::AlignIO; > my $seq1=Bio::LocatableSeq->new(-id=>'one', -seq=>'AT-CG', -start=>1, -end=>4); > my $seq2=Bio::LocatableSeq->new(-id=>'two', -seq=>'A--CG', -start=>1, -end=>3); > my $seq3=Bio::LocatableSeq->new(-id=>'three', -seq=>'ATTCG', -start=>1, -end=>5); > my $aln=Bio::SimpleAlign->new(-seqs=>[$seq1,$seq2,$seq3]); > Bio::AlignIO->new(-format=>'phylip')->write_aln($aln); > > Cheers, > Roy. > > > On 11/01/2010 13:40, Roy Chaudhuri wrote: >> Hi Adam, >> >> I'm guessing you actually want to create a Bio::SimpleAlign object >> (representing an alignment), rather than a Bio::AlignIO object (which is >> just for reading/writing alignment files). 
Bio::SimpleAlign has a >> documented new method that allows you to construct an alignment from >> Bio::LocatableSeq objects, which are similar to Bio::Seq objects but >> include gaps and start/end coordinates to describe their relationship to >> other sequences in the alignment. >> >> Roy. >> >> On 11/01/2010 12:21, Adam Witney wrote: >>> Hi, >>> >>> I am writing a script to automate the running of Phylip Pars. In the >>> process i have to create a Bio::AlignIO object from a set of data >>> that i have in a hash. >>> >>> I could write the hash data into a phylip file and then load the >>> Bio::AlignIO from that file, but i wondered if i could skip the >>> writing and then reading of a temporary file ? >>> >>> thanks for any help >>> >>> adam _______________________________________________ Bioperl-l >>> mailing list Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From Russell.Smithies at agresearch.co.nz Tue Jan 12 03:41:02 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 12 Jan 2010 16:41:02 +1300 Subject: [Bioperl-l] BioPerl version? In-Reply-To: References: <4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz> Probably a dumb question but how do I find the version of an existing BioPerl installation without resorting to reading thru Bio/Root/Version.pm ? --Russell ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. 
If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at illinois.edu Tue Jan 12 03:59:44 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 11 Jan 2010 21:59:44 -0600 Subject: [Bioperl-l] BioPerl version? In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz> References: <4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz> Message-ID: <795BD926-4AE9-4478-AAD5-E36558350745@illinois.edu> Not dumb, but a frequently asked one: that's a FAQ question ;> http://www.bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' chris On Jan 11, 2010, at 9:41 PM, Smithies, Russell wrote: > Probably a dumb question but how do I find the version of an existing BioPerl installation without resorting to reading thru Bio/Root/Version.pm ? > > --Russell > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Jan 12 16:02:02 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 Jan 2010 10:02:02 -0600 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com> References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com> Message-ID: On Jan 11, 2010, at 9:55 AM, Peter wrote: > On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf wrote: >> >> These entries form the CON data class, see: >> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14 >> and they don't contain any sequence information. > > I know - GenBank files have a similar system with CONTIG > lines instead of sequences. I was expecting BioPerl to be > able to convert these EMBL files with CO lines into GenBank > files with CONTIG lines. IIRC the contig information for GenBank is stored in annotation. We can try to ensure the data is carried over to EMBL properly. >> If you take the 'expanded' entries from >> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz >> your script will work. > > That's a useful tip - thanks. > > Peter NCBI's eutil option 'gbwithparts' is similar (always retrieves the sequence). 
chris From biopython at maubp.freeserve.co.uk Tue Jan 12 16:19:32 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 12 Jan 2010 16:19:32 +0000 Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records In-Reply-To: References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com> <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com> Message-ID: <320fb6e01001120819u50e73fa8k9bde8aa1abdf942d@mail.gmail.com> On Tue, Jan 12, 2010 at 4:02 PM, Chris Fields wrote: > On Jan 11, 2010, at 9:55 AM, Peter wrote: > >> On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf wrote: >>> >>> These entries form the CON data class, see: >>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14 >>> and they don't contain any sequence information. >> >> I know - GenBank files have a similar system with CONTIG >> lines instead of sequences. I was expecting BioPerl to be >> able to convert these EMBL files with CO lines into GenBank >> files with CONTIG lines. > > IIRC the contig information for GenBank is stored in annotation. > We can try to ensure the data is carried over to EMBL properly. For contig records (where there is no sequence) I think we just need to map the GenBank CONTIG lines to the EMBL CO lines, and vice versa. At least, that's what Biopython now does (trunk code, not yet released). >>> If you take the 'expanded' entries from >>> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz >>> your script will work. >> >> That's a useful tip - thanks. >> >> Peter > > NCBI's eutil option 'gbwithparts' is similar (always retrieves the sequence). Indeed. This is a useful work around for when a parser couldn't cope with the contig version of a GenBank file for some reason, e.g. http://bugzilla.open-bio.org/show_bug.cgi?id=2745 Peter From maj at fortinbras.us Tue Jan 12 17:33:30 2010 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Tue, 12 Jan 2010 12:33:30 -0500
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service
Message-ID: <231A8D9473704E7697F7A486A0CDA86A@NewLife>

Hi All--

The beta of Bio::DB::SoapEUtilities is now available in the
bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web
service. The system is fully WSDL based, and all eutils are
available. The best thing (IMHO) are the result adaptors, which
provide conversion and iteration of SOAP results into BioPerl
objects. Schau, mal:

  use Bio::DB::EUtilities;
  my $fac = Bio::DB::EUtilities->new(); # step 1
  my $seqio = $fac->esearch(
      -db   => 'nucleotide',
      -term => 'HIV1 and CCR5 and Brazil'
  )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
  # yes, it's already done the efetch under the hood...
  while ( my $seq = $seqio->next_seq ) { # step 4
      # do something with $seq, a Bio::Seq object...
  }

or this:

  my $links = $fac->elink( -db     => 'protein',
                           -dbfrom => 'nucleotide',
                           -id     => \@nucids )->run( -auto_adapt => 1 );

  # maybe more than one associated id...
  my @prot_0 = $links->id_map( $nucids[0] );

  while ( my $ls = $links->next_linkset ) {
      @ids = $ls->ids;
      @submitted_ids = $ls->submitted_ids;
      # etc.
  }

and much, much more. See

http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service

and of course, the POD, for all the details, including
download/installation. Tests in bioperl-run/t.

cheers,
MAJ

-- No new dependencies were added or animals mistreated
-- during the making of these modules.

From sheldon.mckay at gmail.com Tue Jan 12 18:02:53 2010
From: sheldon.mckay at gmail.com (Sheldon McKay)
Date: Tue, 12 Jan 2010 10:02:53 -0800
Subject: [Bioperl-l] code.open-bio.org timing out?
Message-ID:

Hi all,

I keep timing out trying to do an svn checkout of bioperl-live from
code.open-bio.org. Any suggestions?
Thanks, Sheldon ---- Sheldon McKay, PhD Lead, iPlant Tree of Life Engagement Team; Research Investigator Cold Spring Harbor Laboratory http://mckay.cshl.edu Google Voice: (203) 701-9204 On Tue, Nov 3, 2009 at 9:09 AM, Aaron Mackey wrote: > [ajm6q at lc4 bioperl-live]$ svn update > svn: Decompression of svndiff data failed > > > I'll admit to not having svn updated in awhile; A clean, anonymous svn co > failed with the same message: > > [...] > A ? ?bioperl-live/Bio/Structure/StructureI.pm > A ? ?bioperl-live/Bio/Structure/IO > svn: Decompression of svndiff data failed > > -Aaron > > P.S. I used this command: svn co svn:// > code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From biopython at maubp.freeserve.co.uk Tue Jan 12 18:12:46 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 12 Jan 2010 18:12:46 +0000 Subject: [Bioperl-l] code.open-bio.org timing out? In-Reply-To: References: Message-ID: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com> On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay wrote: > Hi all, > > I keep timing out trying to do an svn checkout of bioperl-live from > code.open-bio.org. ?Any suggestions? > > Thanks, > Sheldon The OBF team know about this (its being discussed on root-l), hopefully they'll have it fixed before too long. Peter From cjfields at illinois.edu Tue Jan 12 18:18:45 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 Jan 2010 12:18:45 -0600 Subject: [Bioperl-l] code.open-bio.org timing out? 
In-Reply-To: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com> References: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com> Message-ID: <8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu> On Jan 12, 2010, at 12:12 PM, Peter wrote: > On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay wrote: >> Hi all, >> >> I keep timing out trying to do an svn checkout of bioperl-live from >> code.open-bio.org. Any suggestions? >> >> Thanks, >> Sheldon > > The OBF team know about this (its being discussed on root-l), > hopefully they'll have it fixed before too long. > > Peter We probably need to set up some automatic syncing of our read-only code.google.com repo as a backup. Jason had originally set that up, hopefully he'll respond. chris From jason at bioperl.org Tue Jan 12 18:27:55 2010 From: jason at bioperl.org (Jason Stajich) Date: Tue, 12 Jan 2010 10:27:55 -0800 Subject: [Bioperl-l] code.open-bio.org timing out? In-Reply-To: <8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu> References: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com> <8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu> Message-ID: Hi - I had setup the google code sync, but then the unfortunately realization that the revision numbers are shared among the wiki and the code SVN (all 1 repo) so when I added a wiki page on the site I screwed up the numbering and it wasn't possible to sync anymore (that I could figure out) without resetting it and I haven't gone back to that. Sorry - I wasn't sure if we had figured out what we wanted to for repositories so I sort of stopped worrying about it. -jason On Jan 12, 2010, at 10:18 AM, Chris Fields wrote: > On Jan 12, 2010, at 12:12 PM, Peter wrote: > >> On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay > > wrote: >>> Hi all, >>> >>> I keep timing out trying to do an svn checkout of bioperl-live from >>> code.open-bio.org. Any suggestions? 
>>> >>> Thanks, >>> Sheldon >> >> The OBF team know about this (its being discussed on root-l), >> hopefully they'll have it fixed before too long. >> >> Peter > > We probably need to set up some automatic syncing of our read-only > code.google.com repo as a backup. Jason had originally set that up, > hopefully he'll respond. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From virajj at gmail.com Wed Jan 6 18:20:39 2010 From: virajj at gmail.com (Vijayaraj Nagarajan) Date: Wed, 6 Jan 2010 13:20:39 -0500 Subject: [Bioperl-l] targetp request Message-ID: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com> Hi, I am trying to use targetP in bioperl. the documentation at the bioperl site is a bit confusing to me... I would appreciate if you could give a very small example, as to how to use "Bio::Tools::TargetP" to predict the localization of a protein sequence that i have stored as a string. Thanks, Vijay From cjfields at illinois.edu Tue Jan 12 23:36:53 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 Jan 2010 17:36:53 -0600 Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service In-Reply-To: <231A8D9473704E7697F7A486A0CDA86A@NewLife> References: <231A8D9473704E7697F7A486A0CDA86A@NewLife> Message-ID: Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's Bio::DB::SoapEUtilities)? Otherwise this would be a serious namespace and API conflict with the current EUtilities tools. chris On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote: > Hi All-- > > The beta of Bio::DB::SoapEUtilities is now available in the > bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web > service. The system is fully WSDL based, and all eutils are > available. 
The best thing (IMHO) are the result adaptors, which > provide conversion and iteration of SOAP results into BioPerl > objects. Schau, mal: > > use Bio::DB::EUtilities; > my $fac = Bio::DB::EUtilities->new(); # step 1 > my $seqio = $fac->esearch( > -db => 'nucleotide', > -term => 'HIV1 and CCR5 and Brazil' > )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3 > # yes, it's already done the efetch under the hood... > while ( my $seq = $seqio->next_seq ) { # step 4 > # do something with $seq, a Bio::Seq object... > } > > or this: > > my $links = $fac->elink( -db => 'protein', > -dbfrom => 'nucleotide', > -id => \@nucids )->run( -auto_adapt => 1 ); > > # maybe more than one associated id... > my @prot_0 = $links->id_map( $nucids[0] ); > > while ( my $ls = $links->next_linkset ) { > @ids = $ls->ids; > @submitted_ids = $ls->submitted_ids; > # etc. > } > > and much, much more. See > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service > > and of course, the POD, for all the details, including > download/installation. Tests in bioperl-run/t. > > cheers, > MAJ > > -- No new dependencies were added or animals mistreated > -- during the making of these modules. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Jan 13 00:22:10 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 Jan 2010 18:22:10 -0600 Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service In-Reply-To: <5AD210CB0C444A57881BBDD34DE99149@NewLife> References: <231A8D9473704E7697F7A486A0CDA86A@NewLife> <5AD210CB0C444A57881BBDD34DE99149@NewLife> Message-ID: Okay, just making sure (I was getting a bit paranoid). Great work on the SOAP interface, BTW! chris On Jan 12, 2010, at 6:08 PM, Mark A. Jensen wrote: > Um, yeah. > ----- Original Message ----- From: "Chris Fields" > To: "Mark A. 
Jensen" > Cc: "BioPerl List" > Sent: Tuesday, January 12, 2010 6:36 PM > Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service > > > Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's Bio::DB::SoapEUtilities)? Otherwise this would be a serious namespace and API conflict with the current EUtilities tools. > > chris > > On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote: > >> Hi All-- >> >> The beta of Bio::DB::SoapEUtilities is now available in the >> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web >> service. The system is fully WSDL based, and all eutils are >> available. The best thing (IMHO) are the result adaptors, which >> provide conversion and iteration of SOAP results into BioPerl >> objects. Schau, mal: >> >> use Bio::DB::EUtilities; >> my $fac = Bio::DB::EUtilities->new(); # step 1 >> my $seqio = $fac->esearch( >> -db => 'nucleotide', >> -term => 'HIV1 and CCR5 and Brazil' >> )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3 >> # yes, it's already done the efetch under the hood... >> while ( my $seq = $seqio->next_seq ) { # step 4 >> # do something with $seq, a Bio::Seq object... >> } >> >> or this: >> >> my $links = $fac->elink( -db => 'protein', >> -dbfrom => 'nucleotide', >> -id => \@nucids )->run( -auto_adapt => 1 ); >> >> # maybe more than one associated id... >> my @prot_0 = $links->id_map( $nucids[0] ); >> >> while ( my $ls = $links->next_linkset ) { >> @ids = $ls->ids; >> @submitted_ids = $ls->submitted_ids; >> # etc. >> } >> >> and much, much more. See >> >> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service >> >> and of course, the POD, for all the details, including >> download/installation. Tests in bioperl-run/t. >> >> cheers, >> MAJ >> >> -- No new dependencies were added or animals mistreated >> -- during the making of these modules. 
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Wed Jan 13 00:08:12 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 12 Jan 2010 19:08:12 -0500 Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service In-Reply-To: References: <231A8D9473704E7697F7A486A0CDA86A@NewLife> Message-ID: <5AD210CB0C444A57881BBDD34DE99149@NewLife> Um, yeah. ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "BioPerl List" Sent: Tuesday, January 12, 2010 6:36 PM Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's Bio::DB::SoapEUtilities)? Otherwise this would be a serious namespace and API conflict with the current EUtilities tools. chris On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote: > Hi All-- > > The beta of Bio::DB::SoapEUtilities is now available in the > bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web > service. The system is fully WSDL based, and all eutils are > available. The best thing (IMHO) are the result adaptors, which > provide conversion and iteration of SOAP results into BioPerl > objects. Schau, mal: > > use Bio::DB::EUtilities; > my $fac = Bio::DB::EUtilities->new(); # step 1 > my $seqio = $fac->esearch( > -db => 'nucleotide', > -term => 'HIV1 and CCR5 and Brazil' > )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3 > # yes, it's already done the efetch under the hood... > while ( my $seq = $seqio->next_seq ) { # step 4 > # do something with $seq, a Bio::Seq object... > } > > or this: > > my $links = $fac->elink( -db => 'protein', > -dbfrom => 'nucleotide', > -id => \@nucids )->run( -auto_adapt => 1 ); > > # maybe more than one associated id... 
> my @prot_0 = $links->id_map( $nucids[0] ); > > while ( my $ls = $links->next_linkset ) { > @ids = $ls->ids; > @submitted_ids = $ls->submitted_ids; > # etc. > } > > and much, much more. See > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service > > and of course, the POD, for all the details, including > download/installation. Tests in bioperl-run/t. > > cheers, > MAJ > > -- No new dependencies were added or animals mistreated > -- during the making of these modules. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Wed Jan 13 01:09:28 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 12 Jan 2010 20:09:28 -0500 Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP webservice In-Reply-To: <5AD210CB0C444A57881BBDD34DE99149@NewLife> References: <231A8D9473704E7697F7A486A0CDA86A@NewLife> <5AD210CB0C444A57881BBDD34DE99149@NewLife> Message-ID: corrected: use Bio::DB::SoapEUtilities; my $fac = Bio::DB::SoapEUtilities->new(); # step 1 my $seqio = $fac->esearch( -db => 'nucleotide', -term => 'HIV1 and CCR5 and Brazil' )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3 # yes, it's already done the efetch under the hood... while ( my $seq = $seqio->next_seq ) { # step 4 # do something with $seq, a Bio::Seq object... } ----- Original Message ----- From: "Mark A. Jensen" To: "Chris Fields" Cc: "BioPerl List" Sent: Tuesday, January 12, 2010 7:08 PM Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP webservice > Um, yeah. > ----- Original Message ----- > From: "Chris Fields" > To: "Mark A. Jensen" > Cc: "BioPerl List" > Sent: Tuesday, January 12, 2010 6:36 PM > Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web > service > > > Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's > Bio::DB::SoapEUtilities)? 
Otherwise this would be a serious namespace and API > conflict with the current EUtilities tools. > > chris > > On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote: > >> Hi All-- >> >> The beta of Bio::DB::SoapEUtilities is now available in the >> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web >> service. The system is fully WSDL based, and all eutils are >> available. The best thing (IMHO) are the result adaptors, which >> provide conversion and iteration of SOAP results into BioPerl >> objects. Schau, mal: >> >> use Bio::DB::EUtilities; >> my $fac = Bio::DB::EUtilities->new(); # step 1 >> my $seqio = $fac->esearch( >> -db => 'nucleotide', >> -term => 'HIV1 and CCR5 and Brazil' >> )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3 >> # yes, it's already done the efetch under the hood... >> while ( my $seq = $seqio->next_seq ) { # step 4 >> # do something with $seq, a Bio::Seq object... >> } >> >> or this: >> >> my $links = $fac->elink( -db => 'protein', >> -dbfrom => 'nucleotide', >> -id => \@nucids )->run( -auto_adapt => 1 ); >> >> # maybe more than one associated id... >> my @prot_0 = $links->id_map( $nucids[0] ); >> >> while ( my $ls = $links->next_linkset ) { >> @ids = $ls->ids; >> @submitted_ids = $ls->submitted_ids; >> # etc. >> } >> >> and much, much more. See >> >> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service >> >> and of course, the POD, for all the details, including >> download/installation. Tests in bioperl-run/t. >> >> cheers, >> MAJ >> >> -- No new dependencies were added or animals mistreated >> -- during the making of these modules. 
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From tuco at pasteur.fr Wed Jan 13 10:24:34 2010
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Wed, 13 Jan 2010 11:24:34 +0100
Subject: [Bioperl-l] targetp request
In-Reply-To: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>
References: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>
Message-ID: <4B4D9F62.5010306@pasteur.fr>

On 1/6/10 7:20 PM, Vijayaraj Nagarajan wrote:
> Hi,
>
> I am trying to use targetP in bioperl.
> the documentation at the bioperl site is a bit confusing to me...
>
> I would appreciate if you could give a very small example, as to how to use
> "Bio::Tools::TargetP" to predict the localization of a protein sequence that
> i have stored as a string.
>
> Thanks,
> Vijay
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Dear Vijay,

Bio::Tools::TargetP is not intended to run targetp on a sequence; it
reads and parses the results of a targetp run.

From the Pod doc:

DESCRIPTION
    TargetP modules will provides parsed informations about protein
    localization. It reads in a targetp output file. It parses the
    results, and returns a Bio::SeqFeature::Generic object for each
    sequences found to have a subcellular localization

So to analyze your sequence, you first need to run targetp on your
sequence file to create a targetp result file. Then use the
Bio::Tools::TargetP module to parse this result file and extract only
the information you need, as shown in the SYNOPSIS of the module's Pod
documentation.
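
The parsing half of that workflow can be sketched roughly as follows. This assumes targetp has already been run and its output saved to a file. The file name 'targetp.out' and the helper summarize_targetp are made up for illustration, and next_prediction and the 'location' tag are taken from my reading of the module's SYNOPSIS, so please check the POD of your installed version before relying on them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): parse an existing targetp
# report and return one tab-separated summary line per prediction.
# BioPerl is loaded at run time, so the script still compiles on a
# machine where it is not installed.
sub summarize_targetp {
    my ($report) = @_;
    require Bio::Tools::TargetP;

    my $parser = Bio::Tools::TargetP->new( -file => $report );
    my @lines;
    # Assumption: next_prediction returns Bio::SeqFeature::Generic objects.
    while ( my $feat = $parser->next_prediction ) {
        my $id  = defined $feat->seq_id ? $feat->seq_id : 'unnamed';
        my $loc = $feat->has_tag('location')
            ? join( ',', $feat->get_tag_values('location') )
            : 'unknown';
        push @lines, join( "\t", $id, $loc );
    }
    return @lines;
}

# 'targetp.out' is a placeholder for your own targetp result file.
my $report = @ARGV ? $ARGV[0] : 'targetp.out';
if ( -r $report ) {
    print "$_\n" for summarize_targetp($report);
}
else {
    warn "No readable targetp report at '$report'; run targetp first.\n";
}
```

Since the sequence in the question is held as a string, it would first have to be written to a FASTA file for targetp to read; that step and the targetp command line itself are left out here because they depend on the local installation.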
HTH Regards Emmanuel From roy.chaudhuri at gmail.com Wed Jan 13 12:52:58 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Wed, 13 Jan 2010 12:52:58 +0000 Subject: [Bioperl-l] create Bio::AlignIO object from hash In-Reply-To: References: <4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com> Message-ID: <4B4DC22A.8080701@gmail.com> Upload them to Bugzilla as patches, and one of the devs will review your changes and incorporate them into bioperl-live: http://www.bioperl.org/wiki/HOWTO:SubmitPatch Roy. On 11/01/2010 16:27, Adam Witney wrote: > > Ah excellent, thanks Roy. I was indeed thinking about it the wrong > way. > > In the process of writing this i have created a > > Bio::Tools::Run::Phylo::Phylip::Pars class > > which is essentially just a modified copy of ProtPars. I have also > fixed a few typos and possible bugs in > > Bio/Tools/Run/Phylo/Phylip/Base.pm > Bio/Tools/Run/Phylo/Phylip/PhylipConf.pm Bio/AlignIO/phylip.pm > Bio/Tools/Run/Alignment/Clustalw.pm > > I am of course happy to send these back in to the project... how > would i best do this? > > Cheers > > adam > > > On 11 Jan 2010, at 13:54, Roy Chaudhuri wrote: > >> Actually, I guess some sample code would be more helpful: >> >> use Bio::LocatableSeq; use Bio::SimpleAlign; use Bio::AlignIO; my >> $seq1=Bio::LocatableSeq->new(-id=>'one', -seq=>'AT-CG', -start=>1, >> -end=>4); my $seq2=Bio::LocatableSeq->new(-id=>'two', >> -seq=>'A--CG', -start=>1, -end=>3); my >> $seq3=Bio::LocatableSeq->new(-id=>'three', -seq=>'ATTCG', >> -start=>1, -end=>5); my >> $aln=Bio::SimpleAlign->new(-seqs=>[$seq1,$seq2,$seq3]); >> Bio::AlignIO->new(-format=>'phylip')->write_aln($aln); >> >> Cheers, Roy. >> >> >> On 11/01/2010 13:40, Roy Chaudhuri wrote: >>> Hi Adam, >>> >>> I'm guessing you actually want to create a Bio::SimpleAlign >>> object (representing an alignment), rather than a Bio::AlignIO >>> object (which is just for reading/writing alignment files). 
>>> Bio::SimpleAlign has a documented new method that allows you to >>> construct an alignment from Bio::LocatableSeq objects, which are >>> similar to Bio::Seq objects but include gaps and start/end >>> coordinates to describe their relationship to other sequences in >>> the alignment. >>> >>> Roy. >>> >>> On 11/01/2010 12:21, Adam Witney wrote: >>>> Hi, >>>> >>>> I am writing a script to automate the running of Phylip Pars. >>>> In the process i have to create a Bio::AlignIO object from a >>>> set of data that i have in a hash. >>>> >>>> I could write the hash data into a phylip file and then load >>>> the Bio::AlignIO from that file, but i wondered if i could skip >>>> the writing and then reading of a temporary file ? >>>> >>>> thanks for any help >>>> >>>> adam _______________________________________________ Bioperl-l >>>> mailing list Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> > From marcelo011982 at gmail.com Wed Jan 13 18:12:04 2010 From: marcelo011982 at gmail.com (Marcelo Iwata) Date: Wed, 13 Jan 2010 16:12:04 -0200 Subject: [Bioperl-l] Blast to Clustalw Format Message-ID: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com> Hi.. I have an simple Blast result, such as blastn. Is there an scrip to transform such result to Clustalw format in Bioperl ?(.aln) Thanx for any help. From Kevin.M.Brown at asu.edu Wed Jan 13 18:01:42 2010 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 13 Jan 2010 11:01:42 -0700 Subject: [Bioperl-l] targetp request In-Reply-To: <4B4D9F62.5010306@pasteur.fr> References: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com> <4B4D9F62.5010306@pasteur.fr> Message-ID: <1A4207F8295607498283FE9E93B775B4067C133E@EX02.asurite.ad.asu.edu> Sounds like this module might be in the wrong place then. Sounds more like a SeqIO or AlignIO module, heheh. Also looks like the docs might need to be cleaned up a bit for english readability (at least that initial sentence). 
Kevin Brown Center for Innovations in Medicine Biodesign Institute Arizona State University > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Emmanuel Quevillon > Sent: Wednesday, January 13, 2010 3:25 AM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] targetp request > > On 1/6/10 7:20 PM, Vijayaraj Nagarajan wrote: > > Hi, > > > > I am trying to use targetP in bioperl. > > the documentation at the bioperl site is a bit confusing to me... > > > > I would appreciate if you could give a very small example, > as to how to use > > "Bio::Tools::TargetP" to predict the localization of a > protein sequence that > > i have stored as a string. > > > > Thanks, > > Vijay > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Dear Vivay, > > Bio::Tools::TargetP is not intended to run targetp on a > sequence but to > read and parse results from targetp run. > > From the Pod doc : > > DESCRIPTION > TargetP modules will provides parsed informations > about protein > localization. It > reads in a targetp output file. It parses the results, and > returns a > Bio::SeqFeature::Generic object for each sequences > found to have > a subcellular > localization > > > So to analyze your sequence, you'll first need to run targetp on your > sequence file to create a targetp result output file. Then use > Bio::Tools::TargetP module to parse this result file and get only > informations you want/need from the result to be display as > shown in the > SYNOPSIS of the Pod documentation of the module. 
>
> HTH
>
> Regards
>
> Emmanuel
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From maj at fortinbras.us Wed Jan 13 18:44:36 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 13 Jan 2010 13:44:36 -0500
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
Message-ID:

Marcelo-
Yes-- look at the code snip at
http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
combined with the snip at
http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
(using -format => 'clustalw')
cheers
MAJ

----- Original Message -----
From: "Marcelo Iwata"
To:
Sent: Wednesday, January 13, 2010 1:12 PM
Subject: [Bioperl-l] Blast to Clustalw Format

> Hi..
> I have an simple Blast result, such as blastn.
> Is there an scrip to transform such result to Clustalw format in Bioperl
> ?(.aln)
>
> Thanx for any help.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From dan.kortschak at adelaide.edu.au Thu Jan 14 04:26:46 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 14:56:46 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
Message-ID: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>

Hi All,

I'm having a stupid problem that for some reason I just can't figure
out. I'm putting together a B:A:IO:bowtie module to wrap around the
B:A:IO:sam module so bowtie output can be used as an assembly start
point.

For some reason that is escaping me I can't create tempfiles!
What should be the relevant code in the module: package Bio::Assembly::IO::bowtie; use strict; use warnings; # Object preamble - inherits from Bio::Root::Root use Bio::SeqIO; use Bio::Tools::Run::Samtools; use Bio::Assembly::IO; use Carp; use Bio::Root::Root; use Bio::Root::IO; use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO ); and the line (there are a couple of others that are like to fail in the same way, but I've not got that far) my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' ); Which dies with: Can't locate object method "io" via package "Bio::Assembly::IO::bowtie" at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175. Relevant environment vars: DB<10> x @ISA 0 'Bio::Root::Root' 1 'Bio::Root::IO' 2 'Bio::Assembly::IO' DB<11> x $self 0 Bio::Assembly::IO::bowtie=HASH(0x2d226d8) '_no_head' => undef '_no_sq' => undef '_root_verbose' => 0 Can someone suggest what I'm missing? cheers Dan From maj at fortinbras.us Thu Jan 14 05:11:01 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 14 Jan 2010 00:11:01 -0500 Subject: [Bioperl-l] not able to use Bio::Root::IO method In-Reply-To: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <84196F01FF584C64A79B89FECE2DD86F@NewLife> Hey Dan-- what does your constructor look like? I wonder if something's getting lost in new() and _initialize() chaining spaghetti- MAJ ----- Original Message ----- From: "Dan Kortschak" To: Sent: Wednesday, January 13, 2010 11:26 PM Subject: [Bioperl-l] not able to use Bio::Root::IO method > Hi All, > > I'm having a stupid problem that for some reason I just can't figure > out. I'm putting together a B:A:IO:bowtie module to wrap around the > B:A:IO:sam module so bowtie output can be used as an assembly start > point. > > For some reason that is escaping me I can't create tempfiles! 
> > What should be the relevant code in the module: > > package Bio::Assembly::IO::bowtie; > use strict; > use warnings; > > # Object preamble - inherits from Bio::Root::Root > > use Bio::SeqIO; > use Bio::Tools::Run::Samtools; > use Bio::Assembly::IO; > use Carp; > use Bio::Root::Root; > use Bio::Root::IO; > use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO ); > > > and the line (there are a couple of others that are like to fail in the > same way, but I've not got that far) > > my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => > $self->tempdir(), -suffix => '.sam' ); > > Which dies with: > Can't locate object method "io" via package "Bio::Assembly::IO::bowtie" > at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175. > > Relevant environment vars: > DB<10> x @ISA > 0 'Bio::Root::Root' > 1 'Bio::Root::IO' > 2 'Bio::Assembly::IO' > > DB<11> x $self > 0 Bio::Assembly::IO::bowtie=HASH(0x2d226d8) > '_no_head' => undef > '_no_sq' => undef > '_root_verbose' => 0 > > > > Can someone suggest what I'm missing? > > cheers > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From dan.kortschak at adelaide.edu.au Thu Jan 14 05:35:35 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Thu, 14 Jan 2010 16:05:35 +1030 Subject: [Bioperl-l] not able to use Bio::Root::IO method In-Reply-To: <84196F01FF584C64A79B89FECE2DD86F@NewLife> References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> <84196F01FF584C64A79B89FECE2DD86F@NewLife> Message-ID: <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au> Thanks Mark, I'm not sure about that since @ISA still includes Bio::Root:IO when it's at the call, but it might be. 
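
One thing that may be worth checking independently of the constructor chain is whether any class in @ISA defines io() at all. As far as I know, io() is an accessor supplied by Bio::Tools::Run::WrapperBase, while Bio::Root::IO provides tempfile() and tempdir() directly, so a class that only inherits from Bio::Root::IO would call $self->tempfile(...). The sketch below uses two stand-in packages (My::IOBase and My::Bowtie, both invented for illustration) and core File::Temp rather than BioPerl itself, just to show how can() reports what method resolution will find.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp ();

# Stand-in for a base class that provides tempfile() itself.
# Hypothetical: it only mimics the shape of Bio::Root::IO.
package My::IOBase;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

sub tempfile {
    my ( $self, %args ) = @_;
    # Translate -dir / -suffix style names to File::Temp's DIR / SUFFIX.
    my %ft;
    for my $key ( keys %args ) {
        ( my $name = $key ) =~ s/^-//;
        $ft{ uc $name } = $args{$key};
    }
    return File::Temp::tempfile( %ft, UNLINK => 1 );
}

# Stand-in for the subclass from the question.
package My::Bowtie;
our @ISA = ('My::IOBase');

package main;

my $obj = My::Bowtie->new;

# can() walks @ISA, so it shows which methods will actually resolve.
my $has_io       = $obj->can('io')       ? 1 : 0;
my $has_tempfile = $obj->can('tempfile') ? 1 : 0;
print "io: $has_io, tempfile: $has_tempfile\n";

# Call the inherited method directly on the object rather than via ->io.
my ( $fh, $fname ) = $obj->tempfile( -suffix => '.sam' );
print {$fh} "\@HD\tVN:1.0\tSO:unsorted\n";
close $fh;
print "wrote header to $fname\n";
```

In the debugger, the equivalent check on the real object would be something like x $self->can('io') next to x $self->can('tempfile').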
cheers
Dan

Here is the entirety of the code (it's reasonably short):

package Bio::Assembly::IO::bowtie;
use strict;
use warnings;

# Object preamble - inherits from Bio::Root::Root

use Bio::SeqIO;
use Bio::Tools::Run::Samtools;
use Bio::Tools::GuessSeqFormat;
use Bio::Assembly::IO;
use Carp;
use Bio::Root::Root;
use Bio::Root::IO;
use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );

our $HD = "\@HD\tVN:1.0\tSO:unsorted\n";
our $PG = "\@PG\tID=Bowtie\n";

our $HAVE_IO_UNCOMPRESS;
BEGIN {
    # check requirements
    unless ( eval "require Bio::Tools::Run::Bowtie;") {
        Bio::Root::Root->throw("Bio::Tools::Run::Bowtie is not available - cannot extract refdb from index.");
    }
    unless ( eval "require IO::Uncompress::Gunzip; \$HAVE_IO_UNCOMPRESS = 1") {
        Bio::Root::Root->warn("IO::Uncompress::Gunzip is not available; you'll have to do your decompression by hand.");
    }
}

sub new {
    my $class = shift;
    my @args = @_;
    my $self = $class->SUPER::new(@args);
    my ($file, $index, $no_head, $no_sq) = $self->_rearrange([qw(FILE INDEX NO_HEAD NO_SQ)], @args);
    $file =~ s/^<//;
    $self->{'_no_head'} = $no_head;
    $self->{'_no_sq'} = $no_sq;
    # get the sequence so samtools can work with it
    my $inspector = Bio::Tools::Run::Bowtie->new( -command => 'inspect' );
    my $refdb = $inspector->run($index);
    my $bam_file = $self->_make_bam($self->_bowtie_to_sam($file, $refdb));
    my $sam = Bio::Assembly::IO->new( -file => "<$bam_file", -refdb => $refdb , -format => 'sam' );
    return $sam;
}

sub _bowtie_to_sam {
    my ($self, $file, $refdb) = @_;

    $self->throw("'$file' does not exist or is not readable.")
        unless ( -e $file && -r $file );
    my $guesser = Bio::Tools::GuessSeqFormat->new(-file=>$file);
    $self->throw("'$file' is not a bowtie formatted file.") unless $guesser->guess =~ m/^bowtie$/;

    my %SQ;
    my $mapq = 255;
    my $in_pair;
    my @mate_line;
    my $mlen;

    if ($file =~ m/\.gz[^.]*$/) {
        unless ($HAVE_IO_UNCOMPRESS) {
            croak( "IO::Uncompress::Gunzip not available, can't expand '$file'" );
        }
        my ($tfh, $tf) = $self->io->tempfile;
        my $z = IO::Uncompress::Gunzip->new($file);
        while (<$z>) { print $tfh $_ }
        close $tfh;
        $file = $tf;
    }

    open(my $fh, $file) or
        $self->throw("Can not open '$file' for reading: $!");

    # create temp file for working
    my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );

    while (<$fh>) {
        chomp;
        my ($qname,$strand,$rname,$pos,$seq,$qual,$m,$details)=split("\t",$_);
        $SQ{$rname} = 1;

        my $paired_f = ($qname =~ m#/[12]#) ? 0x03 : 0;
        my $strand_f = ($strand eq '-') ? 0x10 : 0;
        my $op_strand_f = ($strand eq '+' && $paired_f) ? 0x20 : 0;
        my $first_f = ($qname =~ m#/1#) ? 0x40 : 0;
        my $second_f = ($qname =~ m#/2#) ? 0x80 : 0;
        my $flag = $paired_f | $strand_f | $op_strand_f | $first_f | $second_f;

        $pos++;
        my $len = length $seq;
        die unless $len == length $qual;
        my $cigar = $len.'M';
        my @detail = split(',',$details);
        my $dist = 'NM:i:'.scalar @detail;

        my @mismatch;
        my $last_pos = 0;
        for (@detail) {
            m/(\d+):(\w)>\w/;
            my $err = ($1-$last_pos);
            $last_pos = $1+1;
            push @mismatch,($err,$2);
        }
        push @mismatch, $len-$last_pos;
        @mismatch = reverse @mismatch if $strand eq '-';
        my $mismatch = join('',('MD:Z:',@mismatch));

        if ($paired_f) {
            my $mrnm = '=';
            if ($in_pair) {
                my $mpos = $mate_line[3];
                $mate_line[7] = $pos;
                my $isize = $mpos-$pos-$len;
                $mate_line[8] = -$isize;
                print $sam_tmp_h join("\t",@mate_line),"\n";
                print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n";
                $in_pair = 0;
            } else {
                $mlen = $len;
                @mate_line = ($qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, undef, undef, $seq, $qual, $mismatch, $dist);
                $in_pair = 1;
            }
        } else {
            my $mrnm = '*';
            my $mpos = 0;
            my $isize = 0;
            print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n";
        }
    }

    close($fh);
    $sam_tmp_h->close;

    return $sam_tmp_f if $self->{'_no_head'};

    my ($samh, $samf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );

    # print header
    print $samh $HD;

    # print sequence dictionary
    unless ($self->{'_no_sq'}) {
        my $db = Bio::SeqIO->new( -file => $refdb, -format => 'fasta' );
        while ( my $seq = $db->next_seq() ) {
            $SQ{$seq->id} = $seq->length if $SQ{$seq->id};
        }

        map { print $samh join("\t", ('@SQ', "SN:$_", "LN:$SQ{$_}")), "\n" } keys %SQ;
    }

    # print program
    print $samh $PG;

    open($sam_tmp_h, $sam_tmp_f) or
        $self->throw("Can not open '$sam_tmp_f' for reading: $!");

    print $samh $_ while (<$sam_tmp_h>);

    close($sam_tmp_h);
    $samh->close;

    return $samf;
}

sub _make_bam {
    my ($self, $file) = @_;

    $self->throw("'$file' does not exist or is not readable")
        unless ( -e $file && -r $file );

    # make a sorted bam file from a sam file input
    my ($bamh, $bamf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.bam' );
    my ($srth, $srtf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.srt' );
    $_->close for ($bamh, $srth);

    my $samt = Bio::Tools::Run::Samtools->new( -command => 'view',
                                               -sam_input => 1,
                                               -bam_output => 1 );

    $samt->run( -bam => $file, -out => $bamf );

    $samt = Bio::Tools::Run::Samtools->new( -command => 'sort' );

    $samt->run( -bam => $bamf, -pfx => $srtf);

    return $srtf.'.bam'
}

1;

From dan.kortschak at adelaide.edu.au Thu Jan 14 05:35:48 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 16:05:48 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To:
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> <84196F01FF584C64A79B89FECE2DD86F@NewLife> <1263446261.8016.50.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au>

I've had a bit of a play with that, but no luck.

Dan

On Thu, 2010-01-14 at 00:26 -0500, Mark A.
Jensen wrote:
> I've found that rearranging the items in the 'use base' array can
> sometimes recover lost methods. I don't know enough of the arcana to
> know why it works. (Sometimes, java starts looking pretty good from
> here...)

From maj at fortinbras.us Thu Jan 14 05:38:00 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 00:38:00 -0500
Subject: [Bioperl-l] Fw: not able to use Bio::Root::IO method
Message-ID: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife>

up to list

----- Original Message -----
From: "Mark A. Jensen"
To: "Dan Kortschak"
Sent: Thursday, January 14, 2010 12:36 AM
Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method

> Aha-- check out the pod for Bio::Root::IO:
>
> "This module provides methods that will usually be needed for any sort
> of file- or stream-related input/output, e.g., keeping track of a file
> handle, transient printing and reading from the file handle, a close
> method, automatically closing the handle on garbage collection, etc.
>
> To use this for your own code you will either want to inherit from
> this module, or instantiate an object for every file or stream you are
> dealing with. In the first case this module will most likely not be
> the first class off which your class inherits; therefore you need to
> call _initialize_io() with the named parameters in order to set file
> handle, open file, etc automatically."
>
> I think you're wanting a call to $self->_initialize_io(). (There is no io()
> method explicitly defined in any of the base classes.)
> MAJ

From maj at fortinbras.us Thu Jan 14 05:50:11 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 00:50:11 -0500
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> <84196F01FF584C64A79B89FECE2DD86F@NewLife> <1263446261.8016.50.camel@zoidberg.mbs.adelaide.edu.au> <1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <82BFF47099684EF496DB3875D39DCA14@NewLife>

For the benefit of the list, I categorically deny ever making the
statement about java below....
MAJ

----- Original Message -----
From: "Dan Kortschak"
To: "Mark A. Jensen"
Cc:
Sent: Thursday, January 14, 2010 12:35 AM
Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method

> I've had a bit of a play with that, but no luck.
>
> Dan
>
> On Thu, 2010-01-14 at 00:26 -0500, Mark A. Jensen wrote:
>> I've found that rearranging the items in the 'use base' array can
>> sometimes recover lost methods. I don't know enough of the arcana to
>> know why it works. (Sometimes, java starts looking pretty good from
>> here...)

From cjfields at illinois.edu Thu Jan 14 07:23:41 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 14 Jan 2010 01:23:41 -0600
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> <84196F01FF584C64A79B89FECE2DD86F@NewLife> <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID:

You can remove separate 'use' directives if they are declared with 'use
base' (they will be imported then). Also, Bio::Root::IO inherits
Bio::Root::Root, and Bio::Assembly::IO should inherit from
Bio::Root::IO, so the only base module you should need is
Bio::Assembly::IO. It's possible having all three is confusing the
interpreter.

chris

From cjfields at illinois.edu Thu Jan 14 07:25:05 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 14 Jan 2010 01:25:05 -0600
Subject: [Bioperl-l] Fw: not able to use Bio::Root::IO method
In-Reply-To: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife>
References: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife>
Message-ID: <1DB926E1-9C6F-4B96-8D7E-28317DD7DE42@illinois.edu>

Yes, that's true. The call to an io() is a Bio::Tools::Run::WrapperBase
thing (the io() is a Bio::Root::IO instance).

chris

From dan.kortschak at adelaide.edu.au Thu Jan 14 07:59:20 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 18:29:20 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To:
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au> <84196F01FF584C64A79B89FECE2DD86F@NewLife> <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <1263455960.4630.3.camel@epistle>

Thanks Chris,

I've done that, and since the inheritance is direct (rather than being a
constructed attribute in the object hash) the calls are $obj->temp*
rather than the $obj->io->temp* that I was using.

It works now and is much clearer having gotten rid of many of the
declarations.

cheers
Dan

From marcelo011982 at gmail.com Thu Jan 14 13:44:25 2010
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Thu, 14 Jan 2010 11:44:25 -0200
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To:
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
Message-ID: <1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>

Thanks Mark.
I think that most of you already know it.
But I'll put it here for new users:

#!/usr/bin/perl -w

use strict;
use Bio::SearchIO;
use Bio::AlignIO;

my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => '../../fontes/exemplos/blat/teste2/output.blast');
my $aln;
my $alnIO;
$alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln");
while ( my $result = $in->next_result ) {
    ## $result is a Bio::Search::Result::ResultI compliant object
    while ( my $hit = $result->next_hit ) {
        ## $hit is a Bio::Search::Hit::HitI compliant object
        while ( my $hsp = $hit->next_hsp ) {
            ## $hsp is a Bio::Search::HSP::HSPI compliant object
            $aln = $hsp->get_aln;
            $alnIO->write_aln($aln);
        }
    }
}

On Wed, Jan 13, 2010 at 4:44 PM, Mark A. Jensen wrote:

> Marcelo-
> Yes-- look at the code snip at
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
> combined with the snip at
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> (using -format => 'clustalw')
> cheers MAJ
> ----- Original Message ----- From: "Marcelo Iwata" <marcelo011982 at gmail.com>
> Sent: Wednesday, January 13, 2010 1:12 PM
> Subject: [Bioperl-l] Blast to Clustalw Format
>
>> Hi..
>> I have a simple Blast result, such as blastn.
>> Is there a script to transform such a result to Clustalw format (.aln)
>> in Bioperl?
>>
>> Thanx for any help.

From marcelo011982 at gmail.com Thu Jan 14 13:46:21 2010
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Thu, 14 Jan 2010 11:46:21 -0200
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To: <1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com> <1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>
Message-ID: <1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com>

Sorry, the correct code is:

#!/usr/bin/perl -w

use strict;
use Bio::SearchIO;
use Bio::AlignIO;

my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => '../../fontes/exemplos/blat/teste2/output.blast');
my $aln;
my $alnIO;
$alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln");
while ( my $result = $in->next_result ) {
    ## $result is a Bio::Search::Result::ResultI compliant object
    while ( my $hit = $result->next_hit ) {
        ## $hit is a Bio::Search::Hit::HitI compliant object
        while ( my $hsp = $hit->next_hsp ) {
            ## $hsp is a Bio::Search::HSP::HSPI compliant object
            $aln = $hsp->get_aln;
            $alnIO->write_aln($aln);
        }
    }
}

From maj at fortinbras.us Thu Jan 14 13:54:31 2010
From: maj at fortinbras.us (Mark A.
Jensen)
Date: Thu, 14 Jan 2010 08:54:31 -0500
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To: <1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com>
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com> <1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com> <1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com>
Message-ID: <1B8891488AA746F49BCAAB531FBE4D0B@NewLife>

Thanks Marcelo-- code snips always appreciated!
MAJ

From sidd.basu at gmail.com Thu Jan 14 19:15:04 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Thu, 14 Jan 2010 13:15:04 -0600
Subject: [Bioperl-l] reading blast report
Message-ID: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>

Hi,
I have a script that reads a tblastn report (13000 records) and loads it
into a Chado database (using the Bio::Chado::Schema module); however, the
machine runs out of memory. Apart from the database loading itself, I am
trying to figure out whether reading the report with the SearchIO module
could consume a lot of memory. So, when I am reading a blast file and
getting the result object ....

while (my $result = $searchio->next_result)

* Does the SearchIO object load a huge chunk of the file into memory, or
does it read only part of the report on each iteration?

* Would indexing the blast report and then reading from the index be much
faster, and why? And is there any way I could iterate through each
record in the index; would that be helpful?

-siddhartha

From jason at bioperl.org Thu Jan 14 19:53:29 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Jan 2010 11:53:29 -0800
Subject: [Bioperl-l] reading blast report
In-Reply-To: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
Message-ID: <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>

What aspects of the report are you loading? You might consider producing
the blast report as tab-delimited (-m 8 format) if you are only
interested in start/end positions and scores of alignments; that is a
simpler, reduced dataset with a lower memory footprint in the parser.
Searchio (default) -format => blast - you can try the BLAST -format => blast_pull instead which lazy parses to create objects and will reduce memory consumption. -jason On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote: > Hi, > I have a script that reads a tblastn report(13000 records) and loads > in > a chado database(Bio::Chado::Schema module), however the machine > runs of memory. I am trying to figure > out other than loading the database stuff > if it the reading of SearchIO module could consume a lot of memory. > So, > when i am reading a blast file and getting the result object .... > > while (my $result = $searchio->next_result) > > * Does the searchio object loads a huge chunk of file in the memory or > for each iteration it only reads a part of the result. > > * Does doing an index on blast report and then reading from it be much > faster and why. And is there any way i could iterate through each > record in the index, will that be helpful. > > -siddhartha > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From sidd.basu at gmail.com Thu Jan 14 20:15:45 2010 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Thu, 14 Jan 2010 14:15:45 -0600 Subject: [Bioperl-l] Re: reading blast report In-Reply-To: <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org> References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com> <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org> Message-ID: <4b4f7b74.5744f10a.7087.4813@mx.google.com> On Thu, 14 Jan 2010, Jason Stajich wrote: > What aspects of the report are you loading? You might consider the blast > report as tab-delimited (-m 8 format) if you only are interested in > start/end positions and scores of ailgnments which is a simpler and reduced > dataset that has lower memory footprint by the parser. 
I think this would be a better approach, as I am mostly interested in start/end/score data only. > > Searchio (default) -format => blast - you can try the BLAST -format => > blast_pull instead which lazy parses to create objects and will reduce > memory consumption. That's another good option. But just out of curiosity: does the regular blast parser load the entire file into memory, considering that the output consists of multiple Results concatenated together into a single file? Could anybody clarify? thanks, -siddhartha > > -jason > On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote: > > > Hi, > > I have a script that reads a tblastn report(13000 records) and loads in > > a chado database(Bio::Chado::Schema module), however the machine runs of > > memory. I am trying to figure > > out other than loading the database stuff > > if it the reading of SearchIO module could consume a lot of memory. So, > > when i am reading a blast file and getting the result object .... > > > > while (my $result = $searchio->next_result) > > > > * Does the searchio object loads a huge chunk of file in the memory or > > for each iteration it only reads a part of the result. > > > > * Does doing an index on blast report and then reading from it be much > > faster and why. And is there any way i could iterate through each > > record in the index, will that be helpful.
> > > > -siddhartha > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > From jason at bioperl.org Thu Jan 14 21:28:29 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 14 Jan 2010 13:28:29 -0800 Subject: [Bioperl-l] reading blast report In-Reply-To: <4b4f7b74.5744f10a.7087.4813@mx.google.com> References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com> <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org> <4b4f7b74.5744f10a.7087.4813@mx.google.com> Message-ID: On Jan 14, 2010, at 12:15 PM, Siddhartha Basu wrote: > On Thu, 14 Jan 2010, Jason Stajich wrote: > >> What aspects of the report are you loading? You might consider the >> blast >> report as tab-delimited (-m 8 format) if you only are interested in >> start/end positions and scores of ailgnments which is a simpler and >> reduced >> dataset that has lower memory footprint by the parser. > > I think this would be a better approach i am mostly interested in > start/end/score data only. > >> >> Searchio (default) -format => blast - you can try the BLAST -format >> => >> blast_pull instead which lazy parses to create objects and will >> reduce >> memory consumption. > > It's another good option though. But just out of curosity, so the > regular blast parser do load the entire file in the memory consider > the > output consist of multiple Results concatenated together into a > single file. Could anybody clarify. > > thanks, > -siddhartha Each result is parsed (1 result per query) and all the hits and HSPs are parsed and brought into memory with the standard (non-pull) approach. The SearchIO iterates at the level of result - that is why you call next_result which parses each one at a time. 
> > >> >> -jason >> On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote: >> >>> Hi, >>> I have a script that reads a tblastn report(13000 records) and >>> loads in >>> a chado database(Bio::Chado::Schema module), however the machine >>> runs of >>> memory. I am trying to figure >>> out other than loading the database stuff >>> if it the reading of SearchIO module could consume a lot of >>> memory. So, >>> when i am reading a blast file and getting the result object .... >>> >>> while (my $result = $searchio->next_result) >>> >>> * Does the searchio object loads a huge chunk of file in the >>> memory or >>> for each iteration it only reads a part of the result. >>> >>> * Does doing an index on blast report and then reading from it be >>> much >>> faster and why. And is there any way i could iterate through each >>> record in the index, will that be helpful. >>> >>> -siddhartha >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> http://fungalgenomes.org/ >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From sidd.basu at gmail.com Thu Jan 14 21:40:42 2010 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Thu, 14 Jan 2010 15:40:42 -0600 Subject: [Bioperl-l] Re: reading blast report In-Reply-To: References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com> <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org> <4b4f7b74.5744f10a.7087.4813@mx.google.com> Message-ID: <4b4f8f5d.5644f10a.2be2.47dc@mx.google.com> Thanks jason for clarification. 
On Thu, 14 Jan 2010, Jason Stajich wrote: > > On Jan 14, 2010, at 12:15 PM, Siddhartha Basu wrote: > > > On Thu, 14 Jan 2010, Jason Stajich wrote: > > > >> What aspects of the report are you loading? You might consider the blast > >> report as tab-delimited (-m 8 format) if you only are interested in > >> start/end positions and scores of ailgnments which is a simpler and > >> reduced > >> dataset that has lower memory footprint by the parser. > > > > I think this would be a better approach i am mostly interested in > > start/end/score data only. > > > >> > >> Searchio (default) -format => blast - you can try the BLAST -format => > >> blast_pull instead which lazy parses to create objects and will reduce > >> memory consumption. > > > > It's another good option though. But just out of curosity, so the > > regular blast parser do load the entire file in the memory consider the > > output consist of multiple Results concatenated together into a > > single file. Could anybody clarify. > > > > thanks, > > -siddhartha > > Each result is parsed (1 result per query) and all the hits and HSPs are > parsed and brought into memory with the standard (non-pull) approach. > The SearchIO iterates at the level of result - that is why you call > next_result which parses each one at a time. > > > > > > >> > >> -jason > >> On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote: > >> > >>> Hi, > >>> I have a script that reads a tblastn report(13000 records) and loads in > >>> a chado database(Bio::Chado::Schema module), however the machine runs > >>> of > >>> memory. I am trying to figure > >>> out other than loading the database stuff > >>> if it the reading of SearchIO module could consume a lot of memory. So, > >>> when i am reading a blast file and getting the result object .... > >>> > >>> while (my $result = $searchio->next_result) > >>> > >>> * Does the searchio object loads a huge chunk of file in the memory or > >>> for each iteration it only reads a part of the result. 
> >>> > >>> * Does doing an index on blast report and then reading from it be much > >>> faster and why. And is there any way i could iterate through each > >>> record in the index, will that be helpful. > >>> > >>> -siddhartha > >>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> -- > >> Jason Stajich > >> jason.stajich at gmail.com > >> jason at bioperl.org > >> http://fungalgenomes.org/ > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > From SMarkel at accelrys.com Thu Jan 14 22:58:06 2010 From: SMarkel at accelrys.com (Scott Markel) Date: Thu, 14 Jan 2010 14:58:06 -0800 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback from our customers. Due to network irregularities (not sure what else to call it) users see the getting of remote BLAST results as somewhat random. When results come back the hits are fine, but sometimes no information comes back at all. Retrying helps. In looking at RemoteBlast.pm there are four "return -1" cases. * $status eq 'ERROR' (return on line 614) * $line =~ /ERROR/I (return on line 628) * !$got_content (return on line 648) * !$response->is_success (return on line 655) In the case of no content we'd like to retry remote BLAST. We're happy to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl module, but we only want to retry in that case, not the other three. What would happen if that third "return -1" changed to a different return value? Scott Scott Markel, Ph.D. 
Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics From nickjd at gmail.com Wed Jan 13 13:18:12 2010 From: nickjd at gmail.com (NickJD) Date: Wed, 13 Jan 2010 05:18:12 -0800 (PST) Subject: [Bioperl-l] Parsing PSI-BLAST results with SearchIO Message-ID: <65554589-081b-4297-ab68-9ddfbd3d9944@c34g2000yqn.googlegroups.com> I am trying to parse PSI-BLAST results using SearchIO and some very basic code, just to read the number of hits, number of HSPs, etc. I have done 10 rounds on 1 input sequence and parsed the output, but SearchIO seems to treat each round as a separate result, so round/iteration is always 1 and new_hits is always the total list, not just the hits that are new to that round. Does anyone have any experience of this? Thanks, Nick From dsidote at waksman.rutgers.edu Wed Jan 13 15:08:48 2010 From: dsidote at waksman.rutgers.edu (David J Sidote) Date: Wed, 13 Jan 2010 10:08:48 -0500 Subject: [Bioperl-l] Bioinformatician position - Waksman Institute Message-ID: <4b42af671001130708i703ecce0u47348484321714f@mail.gmail.com> Bioinformatician - Research Assistant Professor The Waksman Institute of Microbiology, located on the New Brunswick campus of Rutgers University, is seeking a highly motivated and talented bioinformatics scientist for a Research Assistant Professor appointment. The successful candidate will analyze genome, transcriptome, and epigenome data generated on the Life Sciences 454, Illumina, and AB SOLiD high-throughput sequencing platforms.
Excellent communication and teamwork skills are essential as the successful candidate will work closely with individual research groups to develop software to facilitate the visualization, quantification, and interpretation of the data. The successful candidate will be expected to contribute to the publication of scientific literature and to present at seminars and conferences. Qualifications: - PhD in molecular biology, genetics, bioinformatics, systems biology or other related fields; candidates with a PhD in physics, mathematics, or computer science with some working knowledge of biology and experience are encouraged to apply. - Demonstrated scientific track record - Highly proficient in perl, python, or ruby programming, linux/unix scripting, and SQL. - Experience with R is desirable but not required - Experience with high-throughput sequencing, microarrays, or other high-throughput biological platforms - Excellent communication and organizational skills How to Apply: Please send a cover letter stating your current research interests, why you are interested in this position, and how your skill set complements this position along with a curriculum vitae, and the names and contact information of three references to hr at waksman.rutgers.edu. Please include "Bioinformatics Assistant Research Professor" in the subject line. Rutgers is an equal opportunity employer. For more information about this position please contact: Dr. 
David Sidote (dsidote at waksman.rutgers.edu) From albezg at gmail.com Thu Jan 14 01:57:27 2010 From: albezg at gmail.com (albezg) Date: Wed, 13 Jan 2010 20:57:27 -0500 Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with negative PDB ranges In-Reply-To: <49C405F0.5050100@gmail.com> References: <49C2B97B.7070304@gmail.com> <49C405F0.5050100@gmail.com> Message-ID: <4B4E7A07.7070805@gmail.com> Hi all, I have a problem using AlignIO to read Pfam database: ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz The database is in STOCKHOLM 1.0 format. AlignIO can read the alignment OK until the alignment PF00331.13. There it crashes with the following message: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: '1-344' is not an integer. STACK: Error::throw STACK: Bio::Root::Root::throw /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368 STACK: Bio::Range::end /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228 STACK: Bio::Annotation::Target::new /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82 STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293 STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73 STACK: Bio::AlignIO::stockholm::next_aln /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471 STACK: /home/albezg/scripts/pfam2fasta.pl:22 ----------------------------------------------------------- It appears this is caused by this entry: #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344; I don't care about residues in PDB, so I have just removed minus signs from the ranges. This seems to have fixed the crashing. Is it a known problem? Is there a solution for it? 
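The workaround described above (removing the minus signs from the PDB ranges before parsing) can be sketched as a small line filter. This is only a sketch under stated assumptions: the sub name strip_negative_pdb_range is hypothetical, it only touches "#=GS ... DR PDB;" lines, and it deliberately throws away the sign, so the PDB coordinates are no longer meaningful afterwards. That is acceptable only if, as here, the PDB residue ranges are not needed.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the minus sign(s) from the residue range of a
# Stockholm "#=GS <seq> DR PDB; <id> <chain>; <start>-<end>;" line so that
# the range looks like two unsigned integers. All other lines are returned
# unchanged.
sub strip_negative_pdb_range {
    my ($line) = @_;
    if ($line =~ /^#=GS\s+\S+\s+DR\s+PDB;/) {
        # "; -1-344;" becomes "; 1-344;"
        $line =~ s/;\s*-?(\d+)-+(\d+);/; $1-$2;/;
    }
    return $line;
}

# The entry quoted in the message above.
my $entry = '#=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344;';
print strip_negative_pdb_range($entry), "\n";
```

Applied to every line of the seed file before it is handed to AlignIO, this reproduces the manual edit without changing any other annotation.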
Thanks, Alexandr On 03/20/2009 05:09 PM, albezg wrote: > > I'm trying to change FASTA header(display_id) for a sequence in an > alignment(SimpleAlign). > > There are no issues when I print it, however when I use AlignIO to write > the alignment to a FASTA file, it does not work. Is this behavior intended? > > Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug > > The error: > ------------- EXCEPTION ------------- > MSG: No sequence with name [1/1-11] > STACK Bio::SimpleAlign::displayname > /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659 > STACK Bio::AlignIO::fasta::write_aln > /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200 > STACK toplevel ./demo.pl:14 > ------------------------------------- > > Alexandr From mitch_skinner at berkeley.edu Thu Jan 14 22:10:53 2010 From: mitch_skinner at berkeley.edu (Mitch Skinner) Date: Thu, 14 Jan 2010 14:10:53 -0800 Subject: [Bioperl-l] filter_by_location in Bio::DB::SeqFeature::Store::memory Message-ID: <4B4F966D.3030300@berkeley.edu> Hi, Some people haven't been getting all of the features in their GFF3 into JBrowse, and a nice test case that James Casbon posted to the list helped me track it down. Here's an example of the behavior I was seeing with BioPerl 1.6.1 (using Devel::REPL): ============== $ use Bio::DB::SeqFeature::Store $ my $db = Bio::DB::SeqFeature::Store->new(-adaptor=>"memory", -dsn=>"casbon.gff3") $Bio_DB_SeqFeature_Store_memory1 = Bio::DB::SeqFeature::Store::memory=HASH(0xa27ceec); $ $db->features(-seq_id=>"CYP2C8") $ARRAY1 = [ Feature:src(41), region(CYP2C8), Feature:src(37), Feature:src(39), Feature:src(42), Feature:src(40), Feature:src(38) ]; ============== I expected to also see the features with IDs 43 and 44 (the gff3 file is attached). I think there's a problem in the filter_by_location method. If start and end parameters aren't passed to the method, it sets default start and end values that lead it to examine all of the bins in its index. 
But the end value that it creates is at the beginning of the last bin, and I think it should be at the end of the last bin instead. The attached patch changes it to be at the end of the last bin. Regards, Mitch -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: casbon.gff3 URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: bdsfsm-filter_by_location.patch URL: From jason at bioperl.org Fri Jan 15 00:20:43 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 14 Jan 2010 16:20:43 -0800 Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with negative PDB ranges In-Reply-To: <4B4E7A07.7070805@gmail.com> References: <49C2B97B.7070304@gmail.com> <49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com> Message-ID: <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org> Seems like improper data really -- "-1" is an improper coordinate as far as the parser is concerned. You may want to tell Pfam that there is possible error in the dumper since that was the only record that had this problem? -jason On Jan 13, 2010, at 5:57 PM, albezg wrote: > Hi all, > > I have a problem using AlignIO to read Pfam database: > ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz > The database is in STOCKHOLM 1.0 format. AlignIO can read the > alignment OK until the alignment PF00331.13. There it crashes with > the following message: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: '1-344' is not an integer. 
> > STACK: Error::throw > STACK: Bio::Root::Root::throw /home/albezg/lib/perl5/site_perl/ > 5.10.0/Bio/Root/Root.pm:368 > STACK: Bio::Range::end /home/albezg/lib/perl5/site_perl/5.10.0/Bio/ > Range.pm:228 > STACK: Bio::Annotation::Target::new /home/albezg/lib/perl5/site_perl/ > 5.10.0/Bio/Annotation/Target.pm:82 > STACK: > Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target /home/ > albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ > GenericAlignHandler.pm:293 > STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler / > home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ > GenericAlignHandler.pm:73 > STACK: Bio::AlignIO::stockholm::next_aln /home/albezg/lib/perl5/ > site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471 > STACK: /home/albezg/scripts/pfam2fasta.pl:22 > ----------------------------------------------------------- > > It appears this is caused by this entry: > #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344; > > I don't care about residues in PDB, so I have just removed minus > signs from the ranges. This seems to have fixed the crashing. > > Is it a known problem? Is there a solution for it? > > Thanks, > Alexandr > > > On 03/20/2009 05:09 PM, albezg wrote: >> >> I'm trying to change FASTA header(display_id) for a sequence in an >> alignment(SimpleAlign). >> >> There are no issues when I print it, however when I use AlignIO to >> write >> the alignment to a FASTA file, it does not work. Is this behavior >> intended? 
>> >> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug >> >> The error: >> ------------- EXCEPTION ------------- >> MSG: No sequence with name [1/1-11] >> STACK Bio::SimpleAlign::displayname >> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659 >> STACK Bio::AlignIO::fasta::write_aln >> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200 >> STACK toplevel ./demo.pl:14 >> ------------------------------------- >> >> Alexandr > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From maj at fortinbras.us Fri Jan 15 02:00:31 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 14 Jan 2010 21:00:31 -0500 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> Message-ID: How about returning 1, 2, 4 for the non-zero cases, with some error constants set for convenience? MAJ ----- Original Message ----- From: "Scott Markel" To: Sent: Thursday, January 14, 2010 5:58 PM Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes > We've been looking at Bio::Tools::Run::RemoteBlast after some feedback > from our customers. Due to network irregularities (not sure what else > to call it) users see the getting of remote BLAST results as somewhat > random. When results come back the hits are fine, but sometimes no > information comes back at all. Retrying helps. > > In looking at RemoteBlast.pm there are four "return -1" cases. > > * $status eq 'ERROR' (return on line 614) > * $line =~ /ERROR/I (return on line 628) > * !$got_content (return on line 648) > * !$response->is_success (return on line 655) > > In the case of no content we'd like to retry remote BLAST. 
We're happy > to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl > module, but we only want to retry in that case, not the other three. > > What would happen if that third "return -1" changed to a different > return value? > > Scott > > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel at accelrys.com > Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 799 5222 > USA web: http://www.accelrys.com > > http://www.linkedin.com/in/smarkel > Vice President, Board of Directors: > International Society for Computational Biology > Chair: ISCB Publications Committee > Associate Editor: PLoS Computational Biology > Editorial Board: Briefings in Bioinformatics > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Fri Jan 15 00:42:31 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 14 Jan 2010 18:42:31 -0600 Subject: [Bioperl-l] reading blast report In-Reply-To: <4b4f7b74.5744f10a.7087.4813@mx.google.com> References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com> <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org> <4b4f7b74.5744f10a.7087.4813@mx.google.com> Message-ID: <0B76CCA7-C37C-4E24-BBDF-C8FD805DBBF2@illinois.edu> On Jan 14, 2010, at 2:15 PM, Siddhartha Basu wrote: > On Thu, 14 Jan 2010, Jason Stajich wrote: > >> What aspects of the report are you loading? You might consider the blast >> report as tab-delimited (-m 8 format) if you only are interested in >> start/end positions and scores of ailgnments which is a simpler and reduced >> dataset that has lower memory footprint by the parser. > > I think this would be a better approach i am mostly interested in > start/end/score data only. 
> >> Searchio (default) -format => blast - you can try the BLAST -format => >> blast_pull instead which lazy parses to create objects and will reduce >> memory consumption. > > It's another good option though. But just out of curosity, so the > regular blast parser do load the entire file in the memory consider the > output consist of multiple Results concatenated together into a > single file. Could anybody clarify. Yes, the original SearchIO parsers all load the data into objects. This was based on the presumption that one wouldn't want very large BLAST reports, but this assumption probably isn't amenable today. The pull parser is one aswer to that, in it pulls the data only upon request (creates them on the fly), so it should be more amenable to parsing very large BLAST reports. > thanks, > -siddhartha > >> -jason chris From cjfields at illinois.edu Fri Jan 15 06:33:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 15 Jan 2010 00:33:50 -0600 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> Message-ID: Scott, I think this is fine (to change the third condition and retry with a specific code). The other possibility is to simply throw different exceptions under each of these circumstances, which can be caught via eval to allow a retry under only certain conditions (no content, for instance). One interesting bit: I think (though I'm not sure) the new BLAST+ allows remote BLAST queries from command line, similar to the legacy blastcl3. Mark just wrote up a BLAST+ wrapper, so it might be worth testing that theory out. chris PS - BTW, nice to finally meet you at GMOD! On Jan 14, 2010, at 4:58 PM, Scott Markel wrote: > We've been looking at Bio::Tools::Run::RemoteBlast after some feedback > from our customers. 
Due to network irregularities (not sure what else > to call it) users see the getting of remote BLAST results as somewhat > random. When results come back the hits are fine, but sometimes no > information comes back at all. Retrying helps. > > In looking at RemoteBlast.pm there are four "return -1" cases. > > * $status eq 'ERROR' (return on line 614) > * $line =~ /ERROR/I (return on line 628) > * !$got_content (return on line 648) > * !$response->is_success (return on line 655) > > In the case of no content we'd like to retry remote BLAST. We're happy > to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl > module, but we only want to retry in that case, not the other three. > > What would happen if that third "return -1" changed to a different > return value? > > Scott > > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel at accelrys.com > Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 799 5222 > USA web: http://www.accelrys.com > > http://www.linkedin.com/in/smarkel > Vice President, Board of Directors: > International Society for Computational Biology > Chair: ISCB Publications Committee > Associate Editor: PLoS Computational Biology > Editorial Board: Briefings in Bioinformatics > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields1 at gmail.com Fri Jan 15 06:35:35 2010 From: cjfields1 at gmail.com (Christopher Fields) Date: Fri, 15 Jan 2010 00:35:35 -0600 Subject: [Bioperl-l] filter_by_location in Bio::DB::SeqFeature::Store::memory In-Reply-To: <4B4F966D.3030300@berkeley.edu> References: <4B4F966D.3030300@berkeley.edu> Message-ID: <992796AC-B85B-4555-88A1-36000C0A2002@gmail.com> An HTML attachment was scrubbed... 
URL: From David.Messina at sbc.su.se Fri Jan 15 15:17:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 15 Jan 2010 16:17:14 +0100 Subject: [Bioperl-l] getting/setting species names with Bio::Species Message-ID: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> Hi everybody, I'm having a little trouble with names in Bio::Species objects. According to the Bio::Species documentation, if I have a species name as a string, like "Homo sapiens", I can get and set that using the species method: my $my_species_obj = Bio::Species->new(); $my_species_obj->species('Homo sapiens'); print $my_species_obj->species; # 'Homo sapiens' That works fine if I create the Bio::Species object myself. But if I try to get that string back out from a Bio::Species object created by SeqIO from a GenBank file, I get just 'sapiens' back: my $io = Bio::SeqIO->new('-format' => 'genbank', '-file' => 'hoxa2.gb'); my $seq_obj = $io->next_seq; my $io_species_obj = $seq_obj->species; print $io_species_obj->species; # 'sapiens' I think that happens because GenBank records have more taxonomic info about the species name, like the genus (and in fact the whole taxonomic categorization: kingdom, phylum, order, etc.). So the genus is stored separately. Poking around a bit more in Bio::Species, I turned up the method 'binomial', which appears to do the right thing, returning genus and species in both cases. Except, as you can see, the space is stripped out for my species-name-is-just-a-string object: print $my_species_obj->binomial; # 'Homosapiens' print $io_species_obj->binomial; # 'Homo sapiens' I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I using it correctly above, or is there a better way? If not, this kinda looks like a bug to me. I've got a patch which works and passes the BioPerl test suite. Thanks, Dave From maj at fortinbras.us Fri Jan 15 15:31:16 2010 From: maj at fortinbras.us (Mark A.
Jensen) Date: Fri, 15 Jan 2010 10:31:16 -0500 Subject: [Bioperl-l] getting/setting species names with Bio::Species In-Reply-To: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> Message-ID: I'm not that familiar with Bio::Species either, but this looks like conflicting semantics between Bio::Species and Bio::SeqIO. Bio::SeqIO sets the species accessor to the 'species' element of the lineage array, I believe. FWIW, I'd prefer "binomial" = "genus" . "species" MAJ ----- Original Message ----- From: "Dave Messina" To: "BioPerl List" Sent: Friday, January 15, 2010 10:17 AM Subject: [Bioperl-l] getting/setting species names with Bio::Species > Hi everybody, > > I'm having a little trouble with names in Bio::Species objects. > > According to the Bio::Species documentation, if I have a species name as a > string, like "Homo sapiens", I can get and set that using the species method: > > my $my_species_obj = Bio::Species->new(); > $my_species_obj->species('Homo sapiens'); > > print $my_species_obj->species; # 'Homo sapiens' > > > That works fine if I create the Bio::Species object myself. > > But if I try to get that string back out from a BIo::Species object created by > SeqIO from a genbank file, I get just 'sapiens' back: > > my $io = Bio::SeqIO->new('-format' => 'genbank', > '-file' => 'hoxa2.gb'); > my $seq_obj = $io->next_seq; > my $io_species_obj = $seq_obj->species; > > print $io_species_obj->species; # 'sapiens' > > > I think that happens because genbank records have more taxonomic info about > the species name, like the genus (and in fact the whole taxonomic > categorization: kingdom phylum order, etc). So the genus is stored separately. > > Poking around a bit more in Bio::Species, I turned up the method 'binomial', > which appears to do the right thing, returning genus and species in both > cases.
Except, as you can see, the space is stripped out for my > species-name-is-just-a-string object: > > print $my_species_obj->binomial; # 'Homosapiens' > print $io_species_obj->binomial; # 'Homo sapiens' > > > I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I > using it correctly above, or is there a better way? > > If not, this kinda looks like a bug to me. I've got a patch which works and > passes the BioPerl test suite. > > > Thanks, > Dave > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Fri Jan 15 15:24:06 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 15 Jan 2010 10:24:06 -0500 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes In-Reply-To: References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> Message-ID: True-- blast+ allows remote dbs. I just commited a patch that makes this easy in StandAloneBlastPlus: specify '-remote => 1' in the factory, and downstream command calls will take care of it- MAJ # ex... use Bio::Tools::Run::StandAloneBlastPlus; use Bio::Seq; $ENV{BLASTPLUSDIR} = $where_it_is; my $fac = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => 'wgs', -remote => 1 ); my $result = $fac->blastn( -query => Bio::Seq->new(-seq=>'ggcaacaaacctggtaaagaagacggcaacaagcctggtaaagaagatggcaacaagcct', -id=>"proteinA") ); 1; ----- Original Message ----- From: "Chris Fields" To: "Scott Markel" Cc: Sent: Friday, January 15, 2010 1:33 AM Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes > Scott, > > I think this is fine (to change the third condition and retry with a specific > code). The other possibility is to simply throw different exceptions under > each of these circumstances, which can be caught via eval to allow a retry > under only certain conditions (no content, for instance). 
>
> One interesting bit: I think (though I'm not sure) the new BLAST+ allows
> remote BLAST queries from command line, similar to the legacy blastcl3. Mark
> just wrote up a BLAST+ wrapper, so it might be worth testing that theory out.
>
> chris
>
> PS - BTW, nice to finally meet you at GMOD!
>
> On Jan 14, 2010, at 4:58 PM, Scott Markel wrote:
>
>> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback
>> from our customers. Due to network irregularities (not sure what else
>> to call it) users see the getting of remote BLAST results as somewhat
>> random. When results come back the hits are fine, but sometimes no
>> information comes back at all. Retrying helps.
>>
>> In looking at RemoteBlast.pm there are four "return -1" cases.
>>
>> * $status eq 'ERROR' (return on line 614)
>> * $line =~ /ERROR/i (return on line 628)
>> * !$got_content (return on line 648)
>> * !$response->is_success (return on line 655)
>>
>> In the case of no content we'd like to retry remote BLAST. We're happy
>> to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl
>> module, but we only want to retry in that case, not the other three.
>>
>> What would happen if that third "return -1" changed to a different
>> return value?
>>
>> Scott
>>
>> Scott Markel, Ph.D.
>> Principal Bioinformatics Architect      email:  smarkel at accelrys.com
>> Accelrys (Pipeline Pilot R&D)           mobile: +1 858 205 3653
>> 10188 Telesis Court, Suite 100          voice:  +1 858 799 5603
>> San Diego, CA 92121                     fax:    +1 858 799 5222
>> USA                                     web:    http://www.accelrys.com
>>
>> http://www.linkedin.com/in/smarkel
>> Vice President, Board of Directors:
>>     International Society for Computational Biology
>> Chair: ISCB Publications Committee
>> Associate Editor: PLoS Computational Biology
>> Editorial Board: Briefings in Bioinformatics

From SMarkel at accelrys.com Fri Jan 15 15:40:31 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Fri, 15 Jan 2010 07:40:31 -0800
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To:
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>

Chris,

It was nice meeting you and Scott C., too. And seeing Jason again.

If you and Mark

> How about returning 1, 2, 4 for the non-zero cases, with some
> error constants set for convenience? MAJ

are okay with adding more return values, that works best for us in
Pipeline Pilot.

I'll add a Bugzilla entry.

Scott

From cjfields at illinois.edu Fri Jan 15 16:00:21 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 15 Jan 2010 10:00:21 -0600
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To:
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
Message-ID: <16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu>

> FWIW, I'd prefer "binomial" = "genus" . "species"

That's the way Bio::Species is supposed to work, at least when it was refactored by Sendu. But just a note: Bio::Species was considered deprecated (scheduled for the 1.7 release IIRC) for many very good reasons in favor of Bio::Taxon. First and foremost among these is the fact we cannot consistently parse out the genus/species/strain/variant/etc for every organism in GenBank w/o knowing its full lineage, which means including some taxonomic information. And even then it's highly problematic.

We've had several heated discussions on list about how to handle this in a somewhat backwards-compatible way, and the main solution was to forgo compatibility issues altogether and eventually deprecate Bio::Species in favor of Bio::Taxon, a class that doesn't make the same assumptions. Bio::Species, in the interim, is-a Bio::Taxon.
You'll note that a minimal Bio::DB::Taxonomy instance is constructed from the classification scheme in some instances, but if one had a proper DB link one could link to Entrez Taxonomy or a local flat-file index DB and grab the info. Bio::Taxon (correct me if I'm wrong on this Sendu, if you're out there) eschews various methods (species, etc) for simpler consistent ones based on Taxonomy, and doesn't force us to handle every exception to getting the genus/species out of a name. That is left up to the user, at their peril.

For either one, if you are reproducing the fully qualified name, you probably should use something like node_name() for consistency. Bio::Species also has scientific_name(). With a true Bio::Taxon one would need to check that this is performed on the species node.

chris

From SMarkel at accelrys.com Fri Jan 15 16:10:34 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Fri, 15 Jan 2010 08:10:34 -0800
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To:
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B30A7@EXCH1-COLO.accelrys.net>

Mark,

Thank you.

Scott

From maj at fortinbras.us Fri Jan 15 16:09:38 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 15 Jan 2010 11:09:38 -0500
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net> <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>
Message-ID:

can do Scott--
cheers MAJ

From maj at fortinbras.us Fri Jan 15 16:10:02 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 15 Jan 2010 11:10:02 -0500
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> <16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu>
Message-ID:

excellent summary--thanks!!

From hlapp at drycafe.net Fri Jan 15 17:04:43 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Fri, 15 Jan 2010 12:04:43 -0500
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
Message-ID: <2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net>

On Jan 15, 2010, at 10:17 AM, Dave Messina wrote:

> According to the Bio::Species documentation, if I have a species
> name as a string, like "Homo sapiens", I can get and set that using
> the species method:
>
> my $my_species_obj = Bio::Species->new();
> $my_species_obj->species('Homo sapiens');

If that's really what the documentation says, it's wrong. It is the binomial() method that does this (as getter and setter).

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From David.Messina at sbc.su.se Fri Jan 15 18:37:17 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 15 Jan 2010 19:37:17 +0100
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se> <2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net>
Message-ID: <24798E45-CF24-47D9-AB39-E66C35A5FA8B@sbc.su.se>

Thanks guys.

Well, looks like I ignored the deprecation warnings at my own peril. :)

I'll reimplement my code using Bio::Taxon directly instead. I made a little test using the node_name() method as Chris suggested, and it seems to do the trick nicely.

> If that's really what the documentation says, it's wrong.

I'm afraid so. In the POD

> Title : species
> Usage : $self->species( $species );
>         $species = $self->species();
> Function: Get or set the scientific species name.
> Example : $self->species('Homo sapiens');
> Returns : Scientific species name as string
> Args : Scientific species name as string

and the HOWTO http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#The_Species_Object

> # legible and long
> my $species_object = $seq_object->species;
> my $species_string = $species_object->species;
>
> # Perlish
> my $species_string = $seq_object->species->species;
> # either way, $species_string is "Homo sapiens"

Unless there's objection, I'll fix both of those.

> It is the binomial() method that does this (as getter and setter).

Great, thanks for the clarification, Hilmar.

From bhakti.dwivedi at gmail.com Sun Jan 17 16:02:47 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Sun, 17 Jan 2010 11:02:47 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
Message-ID:

Hi

Is there a BioPerl module to parse the reciprocal best hits (query1 -> hit1 && hit1 -> query1) from a blast table report?

Thanks

BD

From cjfields at illinois.edu Sun Jan 17 17:45:08 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 17 Jan 2010 11:45:08 -0600
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To:
References:
Message-ID: <4FC546A8-079F-4A17-AB96-D4A0060904D6@illinois.edu>

It's probably not best to use BioPerl directly for this. Have you tried OrthoMCL, or InParanoid?

chris

From maj at fortinbras.us Sun Jan 17 21:03:24 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 17 Jan 2010 16:03:24 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To:
References:
Message-ID:

re Chris's answer, check out this archived post:
http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html
cheers MAJ

From bhakti.dwivedi at gmail.com Sun Jan 17 21:10:03 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Sun, 17 Jan 2010 16:10:03 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To:
References:
Message-ID:

Thank you!

From cjfields at illinois.edu Sun Jan 17 22:00:02 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 17 Jan 2010 16:00:02 -0600
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To:
References:
Message-ID: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu>

OrthoMCL has updated to v2 and no longer uses BioPerl, just plain perl. Database is available here:

http://orthomcl.org/cgi-bin/OrthoMclWeb.cgi

Package (you'll need a few other things to get it working):

http://orthomcl.org/common/downloads/software/

chris

From tristan.lefebure at gmail.com Sun Jan 17 23:12:56 2010
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Sun, 17 Jan 2010 18:12:56 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu>
References: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu>
Message-ID: <201001171812.56238.tristan.lefebure@gmail.com>

The transition to orthoMCL v2 being a bit painful (you need a MySQL database), I recently switched directly to MCL and the accompanying mclblastline and co. programs. Modular, simple and very fast. Following some simulations, it gives better results with incomplete genomes than orthoMCL v1.x ...

http://micans.org/mcl/

--Tristan

From jason at bioperl.org Sun Jan 17 23:59:05 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 17 Jan 2010 15:59:05 -0800
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <201001171812.56238.tristan.lefebure@gmail.com>
References: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu> <201001171812.56238.tristan.lefebure@gmail.com>
Message-ID: <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>

yes - but mcl alone is something slightly different in that it doesn't correct for inparalogs, but for incomplete genomes this is probably okay.

orthomcl2 does fix the major memory-hog problem and the parsing inefficiencies of the previous version by relying on the db for the indexing and lookup of the reciprocal hits.

-jason

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/

From tristan.lefebure at gmail.com Mon Jan 18 01:36:38 2010
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Sun, 17 Jan 2010 20:36:38 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
References: <201001171812.56238.tristan.lefebure@gmail.com> <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
Message-ID: <201001172036.39032.tristan.lefebure@gmail.com>

On Sunday 17 January 2010 18:59:05 Jason Stajich wrote:
> yes - but mcl alone is something slightly different in
> that it doesn't correct for inparalogs, but for
> incomplete genomes this is probably okay.

Interestingly, my experience with not too divergent bacterial genomes (same genus) does not support the normalization used in orthoMCL (which, as far as I understand, is a standardization of the -Log10(evalue) per taxon combination, including a taxon with itself). MCL, which does not do any normalization (just -Log10(evalue)), gives about the same number of false negatives (i.e. missed orthologs), but far fewer false positives (false orthologs). In other words, you get many fake singletons.
I don't know exactly if the problem lies in the normalization process or the fact that orthoMCLv1.x is using a very old version of MCL. What I do know is that many false positives are made of short or incomplete proteins that are very common in draft genomes and automatic annotations... Things might be completely different with more divergent and globally longer proteins. Testing orthoMCLv2 on the same data set would probably give the answer. --Tristan From robert.bradbury at gmail.com Mon Jan 18 10:20:33 2010 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Mon, 18 Jan 2010 05:20:33 -0500 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: <201001172036.39032.tristan.lefebure@gmail.com> References: <201001171812.56238.tristan.lefebure@gmail.com> <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org> <201001172036.39032.tristan.lefebure@gmail.com> Message-ID: My comment might be that the problem with OrthoMCL is that it covers primarily lower organisms. The problem with Ensembl (and some other databases) is that it covers primarily higher organisms (though they do include Drosophila, C. elegans and Yeast). The problem arises when one wants to cross those boundaries. For example the 5-10 antioxidant proteins, the ~150 DNA repair proteins, many of the mitochondrial (ETC) proteins, the ribosomal rRNAs & tRNAs, and the fundamental biochemistry (EC) proteins are homologous all the way from the most ancient bacteria through H. sapiens. The only way to play in the mixed arena of prokaryotes and eukaryotes involving fundamental vectors in evolution is to either construct one's own databases (which presumably means getting involved with MySQL, and probably spending some $$$ on hardware) or to develop some BioPerl modules that can do the SpeciesX vs. SpeciesY comparisons on demand using some part of the cloud.
This problem isn't going to get smaller; it's only going to get larger, now that the cost of sequencing (pseudo-resequencing) a vertebrate genome is starting to come in under $10,000 and people are starting to seriously talk about 10,000 vertebrate genomes. 10,000 x 10,000 x 20,000 (genes) isn't something people are going to undertake very soon. Robert On 1/17/10, Tristan Lefebure wrote: > On Sunday 17 January 2010 18:59:05 Jason Stajich wrote: >> yes - but mcl alone is something slightly different in >> that it doesn't correct for inparalogs, but for >> incomplete genomes this is probably okay. > > interestingly, my experience with not too divergent > bacterial genomes (same genera) does not support the > normalization used in the orthoMCL (which, as far as I > understand, is a standardization of the -Log10(evalue) per > taxa combination, including a taxa with itself). MCL, which > does not do any normalization (just -Log10(evalue)) gives > about the same number of false negative (i.e. missed > orthologs), but a lot less false positive (false orthologs). > In other words, you get many fake singletons. I don't known > exactly if the problem lies in the normalization process or > the fact that orthoMCLv1.x is using a very old version of > MCL. What I do known is that many false positive are made of > short or incomplete proteins that are very common in draft > genomes and automatic annotations... Things might be > completely different with more divergent and globally longer > proteins. Testing orthoMCLv2 on the same data set would > probably give the answer.
> > --Tristan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ghhu at sibs.ac.cn Mon Jan 18 02:34:23 2010 From: ghhu at sibs.ac.cn (Guohong Hu) Date: Mon, 18 Jan 2010 10:34:23 +0800 Subject: [Bioperl-l] Bioperl 1.6 Message-ID: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> Hi there, I was trying to install BioPerl on Windows using ppm, by following the instructions in "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up the repositories, and did the search of Bioperl packages. The latest version available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to install it, a number of prerequisite modules were being installed too, which included Bioperl 1.4. Then an error message showed up during installation: "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package BioPerl has already installed a file that package bioperl wants to install." It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 wanted to install again. I don't know why bioperl 1.4 was one of the prerequisites for 1.6.1. If I just install 1.4, it will be installed without errors. But I need a newer version, because some modules (like Bio::Tools::HMM) are not included in 1.4. I saw on the internet that somebody had the same problem when he was trying to install BioPerl 1.5, but I didn't find the solution. Does anybody have a clue about that? Thank you for your time. GH From cjfields at illinois.edu Mon Jan 18 15:30:20 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 09:30:20 -0600 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> Message-ID: Guohong, 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed first.
Make sure the repos are set according to the Windows installation instructions on the BioPerl wiki: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows IIRC the actual order of the PPM repository can be critical (PPM pulls based on highest version, first repo, but sometimes it gets confused). Just curious but where is the v 1.4 PPM located? If it is local to our PPM repo I can physically remove it to prevent this from happening. chris On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: > Hi there, > > > > I was trying to install BioPerl in windows using ppm, by following the > instruction in > "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up > the repositories, and did the search of Bioperl packages. The latest version > available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to > install it, a number of prerequisite modules were being installed too, which > include Bioperl 1.4. Then an error message showed up during installation: > > > > "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package > BioPerl has already installed a file that package bioperl wants to install." > > > > It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 > wanted to install again. I don't know why bioperl 1.4 was one of the > prerequisites for 1.6.1. If I just install 1.4, it will be installed without > errors. But I need a newer version, because some modules (like > > Bio::Tools::HMM) is not included in 1.4. > > > > I saw on internet that somebody had the same problem when he was trying to > install BioPerl 1.5, but I didn't find the solution. > > > > Anybody has a clue on that? Thank you for your time. 
> > > > GH > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Jan 18 16:12:08 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 10:12:08 -0600 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: References: <201001171812.56238.tristan.lefebure@gmail.com> <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org> <201001172036.39032.tristan.lefebure@gmail.com> Message-ID: (my small rant on this) On Jan 18, 2010, at 4:20 AM, Robert Bradbury wrote: > My comment might be that the problem with OrthoMCL is that it is > primarily lower organisms. The problem with Ensembl (and some other > databases) is that it is primarliy higher organisms (though they do > include Drosophila, C. elegans and Yeast). OrthoMCL v2 handles both lower and higher organisms; I've used it for both, with decent success. Most other ortholog tools do as well (if I'm not mistaken, ensembl also uses MCL under the hood, unless that's changed). I don't believe one should be completely bound to one toolset, particularly in this case (there are lots of nice ortholog clustering tools using various means of comparison out there), but I do think OrthoMCL is very good as an initial pass. If anything, I would like a set of (possibly bioperl-based, definitely DB-based) modules that can deal with this information. The more imperative issue in my opinion is that one is prisoner to the gene models for those specific organisms of interest, and this may vary widely depending on the source of those gene models (Ensembl, UCSC, NCBI, EBI, centralized MODs like FlyBase, etc). For instance, if gene models are poorly curated or rarely updated, the comparisons may be significantly flawed.
Some of these issues may also be (somewhat) alleviated once more transcriptome data is available that helps clear up gene model ambiguities, but that won't be true for all organisms, at least initially. Note this isn't meant as a slam on any specific DBs or MODs in general, the problem is one born of the fact that there isn't a single, centralized, trusted, consistently updated source for this data, specifically something that will handle moderated third-party annotation. That's a very difficult problem to solve effectively. Some of these very issues crept up at the GMOD conference, and there appears to be consensus that a real attempt is needed to address this. I don't know, maybe it's just unicorns and rainbows. Personally I do think the situation will improve, as there seems to be great demand for it, but it requires time, resources, manpower, money, cat herding, etc. > The problem arises when one wants to cross those boundaries. For > example the 5-10 antioxidant proteins, the ~150 DNA repair proteins, > many of the mitochondrial (ETC) proteins, the ribosomal rRNA's & > tRNAs, and the fundamental biochemistry (EC) proteins are homologous > all the way from the most ancient bacteria through H. sapiens. The > only way to play in the mixed arena of prokaryotes and eukaryotes > involving fundamental vectors in evolution is to either construct ones > own databases (which presumably means getting involved with MySQL, and > probably spending some $$$ on hardware) or to develop some BioPerl > modules that can do the SpeciesX vs. SpeciesY comparisons on demand > using some part of the cloud. This problem isn't going to get smaller > its only going to get larger, now that the cost of sequencing > (pseudo-resequencing) a vertebrate genome is starting to come in under > $10,000 and people are starting to seriously talk about 10,000 > vertebrate genomes. 10,000 x 10,000 x 20,000 (genes) isn't something > people are going to undertake very soon. 
> > Robert They're already undertaking it now using a broad range of organisms, in and out of the cloud. In most cases one can amend a prior recip. comparative analysis with new data fairly easily, if one takes care to do so early on (i.e. set up the BLAST databases with a specified defined size for comparative stats between separate analyses). OrthoMCL v2 describes a procedure to do this, and I believe others have similar methodology. I could also see possible ways one can further optimize this, for instance in cases where two very closely-related organisms are compared, where translated seqs are 100% identical, etc. IIRC, the OrthoMCL DB site already has a way to upload custom sets of protein data for mapping to (already pre-run) clusters. Just the fact that the tools are available as OS, they're semi-automated, and can be generically applied to data of personal interest is a great boon. Not sure I see the downside of that, and I'm pretty confident the scalability issues will be addressed in some way. chris From maj at fortinbras.us Mon Jan 18 16:33:12 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 18 Jan 2010 11:33:12 -0500 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> Message-ID: <6093E45F17B543438AC02E6C626439E1@NewLife> this issue's come up before, see this thread http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031400.html MAJ ----- Original Message ----- From: "Chris Fields" To: "Guohong Hu" Cc: Sent: Monday, January 18, 2010 10:30 AM Subject: Re: [Bioperl-l] Bioperl 1.6 > Guohong, > > 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed > first. Make sure the repos are set according to the Windows installation > instructions on the BioPerl wiki: > > http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows > > IIRC the actual order of the PPM repository can be critical (PPM pulls based > on highest version, first repo, but sometimes it gets confused). 
Just curious > but where is the v 1.4 PPM located? If it is local to our PPM repo I can > physically remove it to prevent this from happening. > > chris > > On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: > >> Hi there, >> >> >> >> I was trying to install BioPerl in windows using ppm, by following the >> instruction in >> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up >> the repositories, and did the search of Bioperl packages. The latest version >> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to >> install it, a number of prerequisite modules were being installed too, which >> include Bioperl 1.4. Then an error message showed up during installation: >> >> >> >> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package >> BioPerl has already installed a file that package bioperl wants to install." >> >> >> >> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 >> wanted to install again. I don't know why bioperl 1.4 was one of the >> prerequisites for 1.6.1. If I just install 1.4, it will be installed without >> errors. But I need a newer version, because some modules (like >> >> Bio::Tools::HMM) is not included in 1.4. >> >> >> >> I saw on internet that somebody had the same problem when he was trying to >> install BioPerl 1.5, but I didn't find the solution. >> >> >> >> Anybody has a clue on that? Thank you for your time. 
>> >> >> >> GH >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Jan 18 17:18:34 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 11:18:34 -0600 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: <6093E45F17B543438AC02E6C626439E1@NewLife> References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> <6093E45F17B543438AC02E6C626439E1@NewLife> Message-ID: Mark, Odd issue, maybe it's a dependency like Bio::ASN1::EntrezGene that's causing this? Regardless, it's problematic for me to test this out directly, at least for the next few days. Maybe someone could try it? Also, there is the Strawberry Perl alternative, which uses CPAN (I think ActiveState also supports this). chris On Jan 18, 2010, at 10:33 AM, Mark A. Jensen wrote: > this issue's come up before, see this thread > http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031400.html > MAJ > ----- Original Message ----- From: "Chris Fields" > To: "Guohong Hu" > Cc: > Sent: Monday, January 18, 2010 10:30 AM > Subject: Re: [Bioperl-l] Bioperl 1.6 > > >> Guohong, >> >> 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed first. Make sure the repos are set according to the Windows installation instructions on the BioPerl wiki: >> >> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows >> >> IIRC the actual order of the PPM repository can be critical (PPM pulls based on highest version, first repo, but sometimes it gets confused). Just curious but where is the v 1.4 PPM located? If it is local to our PPM repo I can physically remove it to prevent this from happening. 
>> >> chris >> >> On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: >> >>> Hi there, >>> >>> >>> >>> I was trying to install BioPerl in windows using ppm, by following the >>> instruction in >>> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up >>> the repositories, and did the search of Bioperl packages. The latest version >>> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to >>> install it, a number of prerequisite modules were being installed too, which >>> include Bioperl 1.4. Then an error message showed up during installation: >>> >>> >>> >>> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package >>> BioPerl has already installed a file that package bioperl wants to install." >>> >>> >>> >>> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 >>> wanted to install again. I don't know why bioperl 1.4 was one of the >>> prerequisites for 1.6.1. If I just install 1.4, it will be installed without >>> errors. But I need a newer version, because some modules (like >>> >>> Bio::Tools::HMM) is not included in 1.4. >>> >>> >>> >>> I saw on internet that somebody had the same problem when he was trying to >>> install BioPerl 1.5, but I didn't find the solution. >>> >>> >>> >>> Anybody has a clue on that? Thank you for your time. >>> >>> >>> >>> GH >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From clarsen at vecna.com Mon Jan 18 17:42:13 2010 From: clarsen at vecna.com (Chris Larsen) Date: Mon, 18 Jan 2010 12:42:13 -0500 Subject: [Bioperl-l] Reciprocal best blast hits using BioPerl? 
In-Reply-To: References: Message-ID: Bhakti, (and Chris, Mark)-- Yes there is some perl available to parse reciprocal best blast hits. Mark's referenced / archived post was mine; we were looking to do what you wanted. Here we proceed with the thread. We ended up implementing OrthoMCL 1.4 as Chris F pointed to, and then made a simple perl parser that would take the raw OrthoMCL output, do splits, and spit out a delimited table of all the orthologs in a group, for say Mycobacterium Genus, so you could stuff it into DBLoader. The link to the script, SOP, and method is at: http://www.biohealthbase.org/brcDocs/documents/BHB_ORTHOLOG_SOP.pdf Giving e.g.: Francisella 1 110321310 Francisella 1 110321361 Francisella 1 56707275 Francisella 1 56707366 Francisella 1 56707462 Five members of Ortholog Group 1, with just their gi number. And you can see the results of that parsing, supported by a database, being used to load BioHealthbase with all the reciprocal best blast hits plus other OrthoMCL parsing, for mycobacterial PolA at: http://www.biohealthbase.org/brc/details.do?locus=MAV_3155&decorator=mycobacterium See? Pretty? We were just interested in making ortholog groups on the basis of paralog-conscious reciprocal blast stuff. Like you. This package and doc I've made does what you want I think, as long as you stay in prokaryotes. But--careful...garbage in, garbage out. We started with clean Genuses. (. o O Genii?). You'll get more junky HUGE and TINY ortholog groups if you put in different Orders of microbes. It's taxa sensitive. OrthoMCL author David Roos is great at it though and designed it with higher unicellular euks in mind too...comb the docs for that; sorry I was doing bacterial work at the time and can't guide you if that's what you want. If you end up installing OrthoMCL 1.4, you can pipe the output to this method and get out useable stuff. Hope it works for you. Cheers, Chris L -- Christopher Larsen, Ph.D. Sr.
Scientist / Grants Manager Vecna Technologies 6404 Ivy Lane #500 Greenbelt, MD 20770 Phone: (240) 965-4525 Fax: (240) 547-6133 240-737-4525 From maj at fortinbras.us Mon Jan 18 19:37:43 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 18 Jan 2010 14:37:43 -0500 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> <6093E45F17B543438AC02E6C626439E1@NewLife> Message-ID: <61F331117B7C4E2282684FA240B9710F@NewLife> I will play around with it-- in the meantime, Guohong, please look at the following http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation where there is a workaround for this issue, using the ppm-shell-- cheers, Mark ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "Guohong Hu" ; Sent: Monday, January 18, 2010 12:18 PM Subject: Re: [Bioperl-l] Bioperl 1.6 Mark, Odd issue, maybe it's a dependency like Bio::ASN1::EntrezGene that's causing this? Regardless, it's problematic for me to test this out directly, at least for the next few days. Maybe someone could try it? Also, there is the Strawberry Perl alternative, which uses CPAN (I think ActiveState also supports this). chris On Jan 18, 2010, at 10:33 AM, Mark A. Jensen wrote: > this issue's come up before, see this thread > http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031400.html > MAJ > ----- Original Message ----- From: "Chris Fields" > To: "Guohong Hu" > Cc: > Sent: Monday, January 18, 2010 10:30 AM > Subject: Re: [Bioperl-l] Bioperl 1.6 > > >> Guohong, >> >> 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed >> first. Make sure the repos are set according to the Windows installation >> instructions on the BioPerl wiki: >> >> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows >> >> IIRC the actual order of the PPM repository can be critical (PPM pulls based >> on highest version, first repo, but sometimes it gets confused). 
Just >> curious but where is the v 1.4 PPM located? If it is local to our PPM repo I >> can physically remove it to prevent this from happening. >> >> chris >> >> On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: >> >>> Hi there, >>> >>> >>> >>> I was trying to install BioPerl in windows using ppm, by following the >>> instruction in >>> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up >>> the repositories, and did the search of Bioperl packages. The latest version >>> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to >>> install it, a number of prerequisite modules were being installed too, which >>> include Bioperl 1.4. Then an error message showed up during installation: >>> >>> >>> >>> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package >>> BioPerl has already installed a file that package bioperl wants to install." >>> >>> >>> >>> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 >>> wanted to install again. I don't know why bioperl 1.4 was one of the >>> prerequisites for 1.6.1. If I just install 1.4, it will be installed without >>> errors. But I need a newer version, because some modules (like >>> >>> Bio::Tools::HMM) is not included in 1.4. >>> >>> >>> >>> I saw on internet that somebody had the same problem when he was trying to >>> install BioPerl 1.5, but I didn't find the solution. >>> >>> >>> >>> Anybody has a clue on that? Thank you for your time. 
>>> >>> >>> >>> GH >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Mon Jan 18 20:24:33 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 18 Jan 2010 12:24:33 -0800 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: References: <201001171812.56238.tristan.lefebure@gmail.com> <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org> <201001172036.39032.tristan.lefebure@gmail.com> Message-ID: <68DF70A5-63A6-428D-A7F1-7B3D01528375@bioperl.org> On Jan 18, 2010, at 8:12 AM, Chris Fields wrote: > (my small rant on this) > > On Jan 18, 2010, at 4:20 AM, Robert Bradbury wrote: > >> My comment might be that the problem with OrthoMCL is that it is >> primarily lower organisms. The problem with Ensembl (and some other >> databases) is that it is primarliy higher organisms (though they do >> include Drosophila, C. elegans and Yeast). > > OrthoMCL v2 handles both lower and higher organism; I've used it for > both, with decent success. Most other ortholog tools do as well (if > I'm not mistaken, ensembl also uses MCL under the hood, unless > that's changed). I don't believe one should be completely bound to > one toolset, particularly in this case (there are lots of nice > ortholog clustering tools using various moeans of comparison out > there), but I do think OrthoMCL is very good as an initial pass. If > anything, I would like a set of (possibly bioperl-based, definitely > DB-based) modules that can deal with this information. 
> > The more imperative issue in my opinion is that one is prisoner to > the gene models for those specific organisms of interest, and this > may vary widely depending on the source of those gene models > (Ensembl, UCSC, NCBI, EBI, centralized MODs like FlyBase, etc). For > instance, if gene models are poorly curated or rarely updated, the > comparisons may be significantly flawed. Some of these issues may > also be (somewhat) alleviated once more transcriptome data is > available that helps clear up gene model ambiguities, but that won't > be true for all organisms, at least initially. > > Note this isn't meant as a slam on any specific DBs or MODs in > general, the problem is one born of the fact that there isn't a > single, centralized, trusted, consistently updated source for this > data, specifically something that will handle moderated third-party > annotation. That's a very difficult problem to solve effectively. > Some of these very issues crept up at the GMOD conference, and there > appears to be consensus that a real attempt is needed to address this. > > I don't know, maybe it's just unicorns and rainbows. Personally I > do think the situation will improve, as there seems to be great > demand for it, but it requires time, resources, manpower, money, cat > herding, etc. > >> The problem arises when one wants to cross those boundaries. For >> example the 5-10 antioxidant proteins, the ~150 DNA repair proteins, >> many of the mitochondrial (ETC) proteins, the ribosomal rRNA's & >> tRNAs, and the fundamental biochemistry (EC) proteins are homologous >> all the way from the most ancient bacteria through H. sapiens. The >> only way to play in the mixed arena of prokaryotes and eukaryotes >> involving fundamental vectors in evolution is to either construct >> ones >> own databases (which presumably means getting involved with MySQL, >> and >> probably spending some $$$ on hardware) or to develop some BioPerl >> modules that can do the SpeciesX vs. 
SpeciesY comparisons on demand >> using some part of the cloud. This problem isn't going to get >> smaller >> its only going to get larger, now that the cost of sequencing >> (pseudo-resequencing) a vertebrate genome is starting to come in >> under >> $10,000 and people are starting to seriously talk about 10,000 >> vertebrate genomes. 10,000 x 10,000 x 20,000 (genes) isn't something >> people are going to undertake very soon. >> >> Robert > > They're already undertaking it now using a broad range of organisms, > in and out of the cloud. In most cases one can amend a prior recip. > comparative analysis with new data fairly easily, if one takes care > to do so early on (i.e. set up the BLAST databases with a specified > defined size for comparative stats between separate analyses). > OrthoMCL v2 describes a procedure to do this, and I believe others > have similar methodology. > > I could also see possible ways one can further optimize this, for > instance in cases where two very closely-related organisms are > compared, where translated seqs are 100% identical, etc. IIRC, the > OrthoMCL DB site already has a way to upload custom sets of protein > data for mapping to (already pre-run) clusters. Just the fact that > the tools are available as OS, they're semi-automated, and can be > generically applied to data of personal interest is a great boon. > Not sure I see the downside of that, and I'm pretty confident the > scalability issues will be addressed in some way. I think that the approach that Paul Thomas's group at SRI http://www.ai.sri.com/esb/ is doing is really what you'd want to focus on if you are only interested in a particular set of gene families rather than de novo clustering. That or the PhyloFacts approach http://phylogenomics.berkeley.edu/phylofacts/ . That is where HMMs are more appropriate, focusing on your initial seed set of families of proteins. HMMs for your families with some automated clustering initially to get better resolution. 
Once you start throwing several million (multiple 10^6) proteins at it, the unsupervised clustering approach may not give results that are as accurate or as timely, but it can be a good initial filtering step depending on how much initial knowledge you are starting with. Using HMMs won't be as computationally expensive either if you are compute limited. TreeFam also provides curated phylogenies of gene families http://www.treefam.org/ that span the opisthokonts, in that a few fungi are sprinkled in. Also, things like http://boinc.bio.wzw.tum.de/boincsimap/ provide ways to use distributed computing to calculate the matrix of similarities among proteins if you are interested in the exhaustive approach. -jason > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From jay at jays.net Mon Jan 18 23:36:20 2010 From: jay at jays.net (Jay Hannah) Date: Mon, 18 Jan 2010 17:36:20 -0600 Subject: [Bioperl-l] Reciprocal best hits using Bioperl? In-Reply-To: References: Message-ID: <9AA13F94-3336-4CC1-89C4-249D0EB7C857@jays.net> On Jan 17, 2010, at 10:02 AM, Bhakti Dwivedi wrote: > Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1 > && hit1 -> query1) from a blast table report? If all the advice and resources in this thread have not dissuaded you from writing your own, you could glance at cross_blast() here as a reference: https://clabsvn.ist.unomaha.edu/anonsvn/user/jhannah/UNO/seqlab/seqlab/tutorial.pod About the (abandoned) project: http://clab.ist.unomaha.edu/CLAB/index.php/SeqLab_%28Perl%29 I wrote that in 2006 for clustering a few hundred proteins based on custom criteria.
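For anyone who does want to roll their own, here is a minimal sketch of the query1 -> hit1 && hit1 -> query1 test asked about above. It is not cross_blast() and not part of BioPerl. It assumes two tabular BLAST reports (the 12-column table format, bit score in the last column), one for A vs. B and one for B vs. A, and the sub names best_hits and reciprocal_best_hits are made up for illustration. Ties and e-value cutoffs are deliberately ignored.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one tabular BLAST report from a filehandle and return a hash ref
# mapping each query to its single best hit (highest bit score, first seen
# wins on ties). Assumes 12 tab-separated columns with the bit score last.
sub best_hits {
    my ($fh) = @_;
    my (%best, %score);
    while (my $line = <$fh>) {
        next if $line =~ /^#/ or $line !~ /\S/;    # skip comments and blanks
        chomp $line;
        my @f = split /\t/, $line;
        my ($query, $hit, $bits) = @f[0, 1, 11];
        next if $query eq $hit;    # ignore self hits (matters for all-vs-all runs)
        if (!exists $score{$query} or $bits > $score{$query}) {
            $score{$query} = $bits;
            $best{$query}  = $hit;
        }
    }
    return \%best;
}

# Given best-hit tables for A vs. B and B vs. A, return the pairs
# [a, b] where a's best hit is b and b's best hit is a.
sub reciprocal_best_hits {
    my ($a_vs_b, $b_vs_a) = @_;
    my @pairs;
    for my $q (sort keys %$a_vs_b) {
        my $h = $a_vs_b->{$q};
        push @pairs, [$q, $h]
            if defined $b_vs_a->{$h} and $b_vs_a->{$h} eq $q;
    }
    return @pairs;
}

# Usage: rbh.pl a_vs_b.tab b_vs_a.tab
if (@ARGV == 2) {
    my @tables;
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        push @tables, best_hits($fh);
        close $fh;
    }
    print join("\t", @$_), "\n" for reciprocal_best_hits(@tables);
}
```

Keeping only one best hit per query is the simplest possible choice; the tools mentioned earlier in the thread handle co-orthologs and score normalization, which this sketch does not attempt.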
Cheers, Jay Hannah http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From jay at jays.net Tue Jan 19 00:22:48 2010 From: jay at jays.net (Jay Hannah) Date: Mon, 18 Jan 2010 18:22:48 -0600 Subject: [Bioperl-l] Bio::BroodComb - RFC Message-ID: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net> I formalized a little framework so I could stop re-writing little programs that do some things people frequently ask me to do. http://github.com/jhannah/bio-broodcomb It stores everything in SQLite so users can write custom reports if they want to. It uses BioPerl and some shiny CPAN beads (DBIx::Class, Moose::Role). Tests included. The first two functions I stuck in the framework: Find subsequences (Bio::BroodComb::SubSeq): use Bio::BroodComb; my $bc = Bio::BroodComb->new(); $bc->load_large_seq(file => "large_seq.fasta"); $bc->load_small_seq(file => "small_seq.fasta"); $bc->find_subseqs(); print $bc->subseq_report1; In-silico PCR (Bio::BroodComb::PCR): use Bio::BroodComb; my $bc = Bio::BroodComb->new(); $bc->load_large_seq(file => "large_seq.fasta"); $bc->add_primerset( description => "U5/R", # however you want it reported forward_primer => 'GCGGGCAGCAATACTGCTTTGTAA', reverse_primer => 'ACCAGCGTTCAGCATATGGAGGAT', ); $bc->find_pcr_hits(); $bc->find_pcr_products(); print $bc->pcr_report1; I find this rather handy, so will probably be adding all my applicable future work to it instead of writing stand-alone programs. Not sure if it should be renamed for eventual CPAN / wherever. Suggestions, contributions welcome. 
:) http://github.com/jhannah/bio-broodcomb Jay Hannah http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From ocornejo at gmail.com Tue Jan 19 00:46:10 2010 From: ocornejo at gmail.com (Omar Cornejo) Date: Mon, 18 Jan 2010 16:46:10 -0800 (PST) Subject: [Bioperl-l] installing bioperl for mac Message-ID: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> Dear People, I have tried to install Bioperl in my new Mac Book, which carries the latest perl distribution (5.10.0) and for some reason I can't (using fink) make it recognize this version or perl. I have tried: fink install bioperl-pm510 fink install bioperl-pm5100 but neither one works. Is it fine installing bioperl for perl v 5.9? thank you, Omar Cornejo From jason at bioperl.org Tue Jan 19 01:04:31 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 18 Jan 2010 17:04:31 -0800 Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with negative PDB ranges In-Reply-To: <4B5502D9.2010706@gmail.com> References: <49C2B97B.7070304@gmail.com> <49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com> <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org> <4B5502D9.2010706@gmail.com> Message-ID: Alexandr - Thanks for getting back to us - I am guessing the parser needs to recognize negative coordinates around about line 370 in Bio/AlignIO/ Handler/GenericAlignHandler.pm which assumes a split on '-' will be sufficient. Can you post it as a bug to bugzilla along with attaching a record and script that replicates the problem so a test can be written for this. http://bugzilla.open-bio.org/ -jason On Jan 18, 2010, at 4:54 PM, Alexandr Bezginov wrote: > I have contacted Pfam, and I have been told that The PDB file actually > does include a reference to residue "-1": > > DBREF 1E5N A -1 347 UNP P14768 XYNA_PSEFL 264 611 > > DBREF 1E5N B -1 347 UNP P14768 XYNA_PSEFL 264 611 > > > Since negative numbers are allowed in PDB, the data should probably be > considered valid. 
> > There are quite a few records like this, so this is not an isolated > issue. > > Alexandr > > On 1/14/2010 7:20 PM, Jason Stajich wrote: >> Seems like improper data really -- "-1" is an improper coordinate >> as far >> as the parser is concerned. You may want to tell Pfam that there is >> possible error in the dumper since that was the only record that had >> this problem? >> >> -jason >> On Jan 13, 2010, at 5:57 PM, albezg wrote: >> >>> Hi all, >>> >>> I have a problem using AlignIO to read Pfam database: >>> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz >>> The database is in STOCKHOLM 1.0 format. AlignIO can read the >>> alignment OK until the alignment PF00331.13. There it crashes with >>> the >>> following message: >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: '1-344' is not an integer. >>> >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368 >>> STACK: Bio::Range::end >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228 >>> STACK: Bio::Annotation::Target::new >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82 >>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ >>> GenericAlignHandler.pm:293 >>> >>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ >>> GenericAlignHandler.pm:73 >>> >>> STACK: Bio::AlignIO::stockholm::next_aln >>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471 >>> STACK: /home/albezg/scripts/pfam2fasta.pl:22 >>> ----------------------------------------------------------- >>> >>> It appears this is caused by this entry: >>> #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344; >>> >>> I don't care about residues in PDB, so I have just removed minus >>> signs >>> from the ranges. 
This seems to have fixed the crashing. >>> >>> Is it a known problem? Is there a solution for it? >>> >>> Thanks, >>> Alexandr >>> >>> >>> On 03/20/2009 05:09 PM, albezg wrote: >>>> >>>> I'm trying to change FASTA header(display_id) for a sequence in an >>>> alignment(SimpleAlign). >>>> >>>> There are no issues when I print it, however when I use AlignIO >>>> to write >>>> the alignment to a FASTA file, it does not work. Is this behavior >>>> intended? >>>> >>>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug >>>> >>>> The error: >>>> ------------- EXCEPTION ------------- >>>> MSG: No sequence with name [1/1-11] >>>> STACK Bio::SimpleAlign::displayname >>>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659 >>>> STACK Bio::AlignIO::fasta::write_aln >>>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200 >>>> STACK toplevel ./demo.pl:14 >>>> ------------------------------------- >>>> >>>> Alexandr >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> http://fungalgenomes.org/ >> > -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From cjfields at illinois.edu Tue Jan 19 02:19:30 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 20:19:30 -0600 Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with negative PDB ranges In-Reply-To: References: <49C2B97B.7070304@gmail.com> <49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com> <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org> <4B5502D9.2010706@gmail.com> Message-ID: <46FD172A-69C0-436C-A005-AC38668C3347@illinois.edu> Alexandr, Posting the bug report would be great, should be an easy enough fix. 
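To illustrate the point about the split on '-': a plain split turns "-1-344" into an empty start and "1-344" as the end, which matches the "'1-344' is not an integer" message in the trace. Below is a minimal sketch of a range parser that tolerates a leading minus sign on either coordinate. This is not the actual code in GenericAlignHandler.pm and not necessarily how the fix should look there; parse_pdb_range is a hypothetical helper name used only for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse a "start-end" range where either coordinate may be negative,
# e.g. "-1-344" or "263-608". Returns (start, end), or an empty list
# if the string does not look like a range.
sub parse_pdb_range {
    my ($range) = @_;
    return unless defined $range;
    # An optional sign belongs to each number; the separator is the
    # single '-' between the two numbers.
    if ($range =~ /^\s*(-?\d+)\s*-\s*(-?\d+)\s*$/) {
        return ($1, $2);
    }
    return;
}

# The value of the DR annotation from the entry that triggered the crash.
my $dr = 'PDB; 1e5n B; -1-344;';
my ($db, $acc, $range) = split /\s*;\s*/, $dr;
my ($start, $end) = parse_pdb_range($range);
print "$db $acc: $start to $end\n" if defined $start;
```

Anchoring the pattern at both ends means malformed ranges are rejected rather than half-parsed, so the caller can decide whether to warn or throw.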
chris On Jan 18, 2010, at 7:04 PM, Jason Stajich wrote: > Alexandr - > > Thanks for getting back to us - I am guessing the parser needs to recognize negative coordinates around about line 370 in Bio/AlignIO/Handler/GenericAlignHandler.pm which assumes a split on '-' will be sufficient. > > Can you post it as a bug to bugzilla along with attaching a record and script that replicates the problem so a test can be written for this. http://bugzilla.open-bio.org/ > > -jason > On Jan 18, 2010, at 4:54 PM, Alexandr Bezginov wrote: > >> I have contacted Pfam, and I have been told that The PDB file actually >> does include a reference to residue "-1": >> >> DBREF 1E5N A -1 347 UNP P14768 XYNA_PSEFL 264 611 >> >> DBREF 1E5N B -1 347 UNP P14768 XYNA_PSEFL 264 611 >> >> >> Since negative numbers are allowed in PDB, the data should probably be >> considered valid. >> >> There are quite a few records like this, so this is not an isolated issue. >> >> Alexandr >> >> On 1/14/2010 7:20 PM, Jason Stajich wrote: >>> Seems like improper data really -- "-1" is an improper coordinate as far >>> as the parser is concerned. You may want to tell Pfam that there is >>> possible error in the dumper since that was the only record that had >>> this problem? >>> >>> -jason >>> On Jan 13, 2010, at 5:57 PM, albezg wrote: >>> >>>> Hi all, >>>> >>>> I have a problem using AlignIO to read Pfam database: >>>> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz >>>> The database is in STOCKHOLM 1.0 format. AlignIO can read the >>>> alignment OK until the alignment PF00331.13. There it crashes with the >>>> following message: >>>> >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>> MSG: '1-344' is not an integer. 
>>>> >>>> STACK: Error::throw >>>> STACK: Bio::Root::Root::throw >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368 >>>> STACK: Bio::Range::end >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228 >>>> STACK: Bio::Annotation::Target::new >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82 >>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293 >>>> >>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73 >>>> >>>> STACK: Bio::AlignIO::stockholm::next_aln >>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471 >>>> STACK: /home/albezg/scripts/pfam2fasta.pl:22 >>>> ----------------------------------------------------------- >>>> >>>> It appears this is caused by this entry: >>>> #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344; >>>> >>>> I don't care about residues in PDB, so I have just removed minus signs >>>> from the ranges. This seems to have fixed the crashing. >>>> >>>> Is it a known problem? Is there a solution for it? >>>> >>>> Thanks, >>>> Alexandr >>>> >>>> >>>> On 03/20/2009 05:09 PM, albezg wrote: >>>>> >>>>> I'm trying to change FASTA header(display_id) for a sequence in an >>>>> alignment(SimpleAlign). >>>>> >>>>> There are no issues when I print it, however when I use AlignIO to write >>>>> the alignment to a FASTA file, it does not work. Is this behavior >>>>> intended? 
>>>>> >>>>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug >>>>> >>>>> The error: >>>>> ------------- EXCEPTION ------------- >>>>> MSG: No sequence with name [1/1-11] >>>>> STACK Bio::SimpleAlign::displayname >>>>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659 >>>>> STACK Bio::AlignIO::fasta::write_aln >>>>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200 >>>>> STACK toplevel ./demo.pl:14 >>>>> ------------------------------------- >>>>> >>>>> Alexandr >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> Jason Stajich >>> jason.stajich at gmail.com >>> jason at bioperl.org >>> http://fungalgenomes.org/ >>> >> > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Jan 19 02:20:31 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 20:20:31 -0600 Subject: [Bioperl-l] installing bioperl for mac In-Reply-To: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> Message-ID: <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote: > Dear People, > I have tried to install Bioperl in my new Mac Book, which carries > the latest perl distribution (5.10.0) and for some reason I can't > (using fink) make it recognize this version or perl. > I have tried: > fink install bioperl-pm510 > fink install bioperl-pm5100 > > but neither one works. Is it fine installing bioperl for perl v 5.9? > > thank you, > Omar Cornejo fink doesn't have a package for perl 5.10. 
You can install it using CPAN, however (it's pure perl), or use other UNIX-y options. See the UNIX installation instructions on the wiki: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix chris From dan.kortschak at adelaide.edu.au Tue Jan 19 02:47:47 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Tue, 19 Jan 2010 13:17:47 +1030 Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA Message-ID: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> Hi All, A wrapper and output parser for bowtie 'ultrafast, memory-efficient short read aligner' are now available in the bioperl-live and bioperl-run subversion repositories (bioperl-live/trunk at 16727 and bioperl-run/trunk at 16726). Bowtie details are available here: http://bowtie-bio.sourceforge.net/index.shtml The modules can return a Bio::Assembly::Scaffold object (operating via the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam uses large amounts of memory - the test suite works for me with >=2GB but not with 1GB due to this. (Is there a disk file system based tool for this for large projects?) Bowtie (>0.12.0) can align in colour space, but this is not currently supported by the wrapper though it should not be difficult to add. If someone can point me to a small set of colour space reads and a reference sequence I will be able to use these for testing. Thanks to the core devs for helping me with many of my problems in putting this together. Dan From maj at fortinbras.us Tue Jan 19 03:31:36 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 18 Jan 2010 22:31:36 -0500 Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: Excellent Dan! 
Thanks for all this work-- MAJ ----- Original Message ----- From: "Dan Kortschak" To: Sent: Monday, January 18, 2010 9:47 PM Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA > Hi All, > > A wrapper and output parser for bowtie 'ultrafast, memory-efficient > short read aligner' are now available in the bioperl-live and > bioperl-run subversion repositories (bioperl-live/trunk at 16727 and > bioperl-run/trunk at 16726). Bowtie details are available here: > > http://bowtie-bio.sourceforge.net/index.shtml > > The modules can return a Bio::Assembly::Scaffold object (operating via > the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk > which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam > uses large amounts of memory - the test suite works for me with >=2GB > but not with 1GB due to this. (Is there a disk file system based tool > for this for large projects?) > > Bowtie (>0.12.0) can align in colour space, but this is not currently > supported by the wrapper though it should not be difficult to add. If > someone can point me to a small set of colour space reads and a > reference sequence I will be able to use these for testing. > > Thanks to the core devs for helping me with many of my problems in > putting this together. 
> > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Tue Jan 19 03:36:12 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 21:36:12 -0600 Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: On Jan 18, 2010, at 8:47 PM, Dan Kortschak wrote: > Hi All, > > A wrapper and output parser for bowtie 'ultrafast, memory-efficient > short read aligner' are now available in the bioperl-live and > bioperl-run subversion repositories (bioperl-live/trunk at 16727 and > bioperl-run/trunk at 16726). Bowtie details are available here: > > http://bowtie-bio.sourceforge.net/index.shtml > > The modules can return a Bio::Assembly::Scaffold object (operating via > the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk > which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam > uses large amounts of memory - the test suite works for me with >=2GB > but not with 1GB due to this. (Is there a disk file system based tool > for this for large projects?) > > Bowtie (>0.12.0) can align in colour space, but this is not currently > supported by the wrapper though it should not be difficult to add. If > someone can point me to a small set of colour space reads and a > reference sequence I will be able to use these for testing. > > Thanks to the core devs for helping me with many of my problems in > putting this together. > > Dan And (on behalf of the core devs) thank you for putting this together! 
chris From scott at scottcain.net Tue Jan 19 03:41:43 2010 From: scott at scottcain.net (Scott Cain) Date: Mon, 18 Jan 2010 22:41:43 -0500 Subject: [Bioperl-l] installing bioperl for mac In-Reply-To: <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu> References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu> Message-ID: <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com> But make sure you have the developers tools installed before the first time you run the cpan shell; it will make your life easier. Scott On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields wrote: > On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote: > >> Dear People, >> I have tried to install Bioperl in my new Mac Book, which carries >> the latest perl distribution (5.10.0) and for some reason I can't >> (using fink) make it recognize this version or perl. >> I have tried: >> fink install bioperl-pm510 >> fink install bioperl-pm5100 >> >> but neither one works. Is it fine installing bioperl for perl v 5.9? >> >> thank you, >> Omar Cornejo > > fink doesn't have a package for perl 5.10. You can install it using CPAN, however (it's pure perl), or use other UNIX-y options. See the UNIX installation instructions on the wiki: > > http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Tue Jan 19 04:04:57 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 22:04:57 -0600 Subject: [Bioperl-l] Bioperl 1.6 In-Reply-To: <009801c8b957$2af4f8d0$80deea70$@ac.cn> References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn> <009801c8b957$2af4f8d0$80deea70$@ac.cn> Message-ID: <79D53148-1FDA-4025-99A6-77A7F124E6BD@illinois.edu> Hmm, the Trouchelle repo is the only one that had a working DB_File for perl 5.10 (not sure, but I think 5.8.9 was fine). Probably worth contacting them about this to see if they can drop the (way out-of-date) 1.4 distribution. chris On May 18, 2008, at 9:22 PM, Guohong Hu wrote: > Thanks to you all. The problem is solved. The bioperl 1.4 version is from > the Trouchelle repo, but 1.6 is in the Bioperl Regular Releases repo. When I > added all the repos according to the bioperl wiki instructions, somehow 1.4 > became a prerequisite for 1.6. But Chris's question reminded me, so I > removed the Trouchelle repo, and the installation proceeded without errors. I > suggest we put a note on the wiki page, since it looks like an odd issue > that does not affect only me. > > Best, > Guohong > > > > _________________________________________ > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: January 18, 2010 23:30 > To: Guohong Hu > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bioperl 1.6 > > Guohong, > > 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed > first. Make sure the repos are set according to the Windows installation > instructions on the BioPerl wiki: > > http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows > > IIRC the actual order of the PPM repository can be critical (PPM pulls based > on highest version, first repo, but sometimes it gets confused). Just > curious but where is the v 1.4 PPM located?
If it is local to our PPM repo > I can physically remove it to prevent this from happening. > > chris > > On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: > >> Hi there, >> >> >> >> I was trying to install BioPerl in windows using ppm, by following the >> instruction in >> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up >> the repositories, and did the search of Bioperl packages. The latest > version >> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to >> install it, a number of prerequisite modules were being installed too, > which >> include Bioperl 1.4. Then an error message showed up during installation: >> >> >> >> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package >> BioPerl has already installed a file that package bioperl wants to > install." >> >> >> >> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 >> wanted to install again. I don't know why bioperl 1.4 was one of the >> prerequisites for 1.6.1. If I just install 1.4, it will be installed > without >> errors. But I need a newer version, because some modules (like >> >> Bio::Tools::HMM) is not included in 1.4. >> >> >> >> I saw on internet that somebody had the same problem when he was trying to >> install BioPerl 1.5, but I didn't find the solution. >> >> >> >> Anybody has a clue on that? Thank you for your time. 
>> >> >> >> GH >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From ocornejo at gmail.com Tue Jan 19 04:18:00 2010 From: ocornejo at gmail.com (Omar Eduardo Cornejo Ordaz) Date: Mon, 18 Jan 2010 23:18:00 -0500 Subject: [Bioperl-l] installing bioperl for mac In-Reply-To: <5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu> References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu> <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com> <5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu> Message-ID: I see. thank you Scott and Chris. I had already installed the latest version of the Xcode Developer Tools. I will go the cpan way then. have a nice one, Omar On Mon, Jan 18, 2010 at 10:58 PM, Chris Fields wrote: > Yes, definitely! > > -c > > On Jan 18, 2010, at 9:41 PM, Scott Cain wrote: > > > But make sure you have the developers tools installed before the first > > time you run the cpan shell; it will make your life easier. > > > > Scott > > > > > > On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields > wrote: > >> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote: > >> > >>> Dear People, > >>> I have tried to install Bioperl in my new Mac Book, which carries > >>> the latest perl distribution (5.10.0) and for some reason I can't > >>> (using fink) make it recognize this version or perl. > >>> I have tried: > >>> fink install bioperl-pm510 > >>> fink install bioperl-pm5100 > >>> > >>> but neither one works. Is it fine installing bioperl for perl v 5.9? > >>> > >>> thank you, > >>> Omar Cornejo > >> > >> fink doesn't have a package for perl 5.10. You can install it using > CPAN, however (it's pure perl), or use other UNIX-y options. 
See the UNIX > installation instructions on the wiki: > >> > >> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix > >> > >> chris > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. scott at scottcain > dot net > > GMOD Coordinator (http://gmod.org/) 216-392-3087 > > Ontario Institute for Cancer Research > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Tue Jan 19 03:58:36 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 18 Jan 2010 21:58:36 -0600 Subject: [Bioperl-l] installing bioperl for mac In-Reply-To: <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com> References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com> <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu> <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com> Message-ID: <5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu> Yes, definitely! -c On Jan 18, 2010, at 9:41 PM, Scott Cain wrote: > But make sure you have the developers tools installed before the first > time you run the cpan shell; it will make your life easier. > > Scott > > > On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields wrote: >> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote: >> >>> Dear People, >>> I have tried to install Bioperl in my new Mac Book, which carries >>> the latest perl distribution (5.10.0) and for some reason I can't >>> (using fink) make it recognize this version or perl. >>> I have tried: >>> fink install bioperl-pm510 >>> fink install bioperl-pm5100 >>> >>> but neither one works. Is it fine installing bioperl for perl v 5.9? 
>>> >>> thank you, >>> Omar Cornejo >> >> fink doesn't have a package for perl 5.10. You can install it using CPAN, however (it's pure perl), or use other UNIX-y options. See the UNIX installation instructions on the wiki: >> >> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From albezg at gmail.com Tue Jan 19 00:54:49 2010 From: albezg at gmail.com (Alexandr Bezginov) Date: Mon, 18 Jan 2010 19:54:49 -0500 Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with negative PDB ranges In-Reply-To: <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org> References: <49C2B97B.7070304@gmail.com> <49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com> <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org> Message-ID: <4B5502D9.2010706@gmail.com> I have contacted Pfam, and I have been told that The PDB file actually does include a reference to residue "-1": DBREF 1E5N A -1 347 UNP P14768 XYNA_PSEFL 264 611 DBREF 1E5N B -1 347 UNP P14768 XYNA_PSEFL 264 611 Since negative numbers are allowed in PDB, the data should probably be considered valid. There are quite a few records like this, so this is not an isolated issue. Alexandr On 1/14/2010 7:20 PM, Jason Stajich wrote: > Seems like improper data really -- "-1" is an improper coordinate as far > as the parser is concerned. 
You may want to tell Pfam that there is > possible error in the dumper since that was the only record that had > this problem? > > -jason > On Jan 13, 2010, at 5:57 PM, albezg wrote: > >> Hi all, >> >> I have a problem using AlignIO to read Pfam database: >> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz >> The database is in STOCKHOLM 1.0 format. AlignIO can read the >> alignment OK until the alignment PF00331.13. There it crashes with the >> following message: >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: '1-344' is not an integer. >> >> STACK: Error::throw >> STACK: Bio::Root::Root::throw >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368 >> STACK: Bio::Range::end >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228 >> STACK: Bio::Annotation::Target::new >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82 >> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293 >> >> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73 >> >> STACK: Bio::AlignIO::stockholm::next_aln >> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471 >> STACK: /home/albezg/scripts/pfam2fasta.pl:22 >> ----------------------------------------------------------- >> >> It appears this is caused by this entry: >> #=GS XYNA_PSEFL/263-608 DR PDB; 1e5n B; -1-344; >> >> I don't care about residues in PDB, so I have just removed minus signs >> from the ranges. This seems to have fixed the crashing. >> >> Is it a known problem? Is there a solution for it? >> >> Thanks, >> Alexandr >> >> >> On 03/20/2009 05:09 PM, albezg wrote: >>> >>> I'm trying to change FASTA header(display_id) for a sequence in an >>> alignment(SimpleAlign). 
>>>
>>> There are no issues when I print it, however when I use AlignIO to write
>>> the alignment to a FASTA file, it does not work. Is this behavior
>>> intended?
>>>
>>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug
>>>
>>> The error:
>>> ------------- EXCEPTION -------------
>>> MSG: No sequence with name [1/1-11]
>>> STACK Bio::SimpleAlign::displayname
>>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659
>>> STACK Bio::AlignIO::fasta::write_aln
>>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200
>>> STACK toplevel ./demo.pl:14
>>> -------------------------------------
>>>
>>> Alexandr
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
>

From ghhu at sibs.ac.cn Tue Jan 19 02:22:19 2010
From: ghhu at sibs.ac.cn (Guohong Hu)
Date: Tue, 19 Jan 2010 02:22:19 -0000
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To:
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
Message-ID: <009801c8b957$2af4f8d0$80deea70$@ac.cn>

Thanks to you all. The problem is solved.

The BioPerl 1.4 version is from the Trouchelle repo, but 1.6 is in the
BioPerl Regular Releases repo. When I added all the repos according to
the BioPerl wiki instructions, somehow 1.4 became a prerequisite for 1.6.
Chris's question reminded me of this, so I removed the Trouchelle repo,
and the installation proceeded without errors.

I suggest we put a note on the wiki page, since it looks like an odd
issue that may not affect only me.

Best,
Guohong

_________________________________________
From: Chris Fields [mailto:cjfields at illinois.edu]
Sent: 18 January 2010 23:30
To: Guohong Hu
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bioperl 1.6

Guohong,

1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed first.
Make sure the repos are set according to the Windows installation instructions on the BioPerl wiki: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows IIRC the actual order of the PPM repository can be critical (PPM pulls based on highest version, first repo, but sometimes it gets confused). Just curious but where is the v 1.4 PPM located? If it is local to our PPM repo I can physically remove it to prevent this from happening. chris On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote: > Hi there, > > > > I was trying to install BioPerl in windows using ppm, by following the > instruction in > "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up > the repositories, and did the search of Bioperl packages. The latest version > available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to > install it, a number of prerequisite modules were being installed too, which > include Bioperl 1.4. Then an error message showed up during installation: > > > > "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package > BioPerl has already installed a file that package bioperl wants to install." > > > > It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4 > wanted to install again. I don't know why bioperl 1.4 was one of the > prerequisites for 1.6.1. If I just install 1.4, it will be installed without > errors. But I need a newer version, because some modules (like > > Bio::Tools::HMM) is not included in 1.4. > > > > I saw on internet that somebody had the same problem when he was trying to > install BioPerl 1.5, but I didn't find the solution. > > > > Anybody has a clue on that? Thank you for your time. 
>
>
>
> GH
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jw12 at sanger.ac.uk Tue Jan 19 10:41:12 2010
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Tue, 19 Jan 2010 10:41:12 +0000
Subject: [Bioperl-l] DAS Workshop Registrations now Open (workshop date 7-9 April 2010)
Message-ID: <9EDF4E46-15F8-434E-B557-2DE5906C4182@sanger.ac.uk>

If you don't know about DAS and wish to know how to distribute your
latest biological annotation to the world, then the upcoming DAS
workshop may be for you.

If you know about DAS and are perhaps a DAS client developer, then the
upcoming DAS workshop is for you (as you will need to know about the
upcoming DAS 1.6 Specification and how it may affect your software).

For information on the workshop and registration please go to:
http://www.ebi.ac.uk/training/handson/DAS_070410.html

Jonathan Warren
Senior Developer and DAS coordinator
jw12 at sanger.ac.uk

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From SMarkel at accelrys.com Tue Jan 19 18:00:22 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Tue, 19 Jan 2010 10:00:22 -0800
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA
In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net>

Dan,

Life Tech has sample data for E. coli at

http://solidsoftwaretools.com/gf/project/ecoli2x50/

and

http://solidsoftwaretools.com/gf/project/dh10bfrag/.

Reference sequences are included.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dan Kortschak Sent: Monday, 18 January 2010 6:48 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA Hi All, A wrapper and output parser for bowtie 'ultrafast, memory-efficient short read aligner' are now available in the bioperl-live and bioperl-run subversion repositories (bioperl-live/trunk at 16727 and bioperl-run/trunk at 16726). Bowtie details are available here: http://bowtie-bio.sourceforge.net/index.shtml The modules can return a Bio::Assembly::Scaffold object (operating via the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam uses large amounts of memory - the test suite works for me with >=2GB but not with 1GB due to this. (Is there a disk file system based tool for this for large projects?) Bowtie (>0.12.0) can align in colour space, but this is not currently supported by the wrapper though it should not be difficult to add. If someone can point me to a small set of colour space reads and a reference sequence I will be able to use these for testing. Thanks to the core devs for helping me with many of my problems in putting this together. 
Dan _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From dan.kortschak at adelaide.edu.au Tue Jan 19 21:18:20 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Wed, 20 Jan 2010 07:48:20 +1030 Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net> References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au> <5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net> Message-ID: <1263935900.4813.0.camel@epistle> Great. Thanks, Scott. Dan On Tue, 2010-01-19 at 10:00 -0800, Scott Markel wrote: > Dan, > > Life Tech has sample data for E. coli at > > http://solidsoftwaretools.com/gf/project/ecoli2x50/ > > and > > http://solidsoftwaretools.com/gf/project/dh10bfrag/. > > Reference sequences are included. > > Scott > > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel at accelrys.com > Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 799 5222 > USA web: http://www.accelrys.com > > http://www.linkedin.com/in/smarkel > Vice President, Board of Directors: > International Society for Computational Biology > Chair: ISCB Publications Committee > Associate Editor: PLoS Computational Biology > Editorial Board: Briefings in Bioinformatics > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dan Kortschak > Sent: Monday, 18 January 2010 6:48 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA > > Hi All, > > A wrapper and output parser for bowtie 'ultrafast, memory-efficient > short read aligner' are now available in the bioperl-live and > 
bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
> bioperl-run/trunk at 16726). Bowtie details are available here:
>
> http://bowtie-bio.sourceforge.net/index.shtml
>
> The modules can return a Bio::Assembly::Scaffold object (operating via
> the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk
> which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
> uses large amounts of memory - the test suite works for me with >=2GB
> but not with 1GB due to this. (Is there a disk file system based tool
> for this for large projects?)
>
> Bowtie (>0.12.0) can align in colour space, but this is not currently
> supported by the wrapper though it should not be difficult to add. If
> someone can point me to a small set of colour space reads and a
> reference sequence I will be able to use these for testing.
>
> Thanks to the core devs for helping me with many of my problems in
> putting this together.
>
> Dan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From dan.kortschak at adelaide.edu.au Wed Jan 20 05:32:05 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Wed, 20 Jan 2010 16:02:05 +1030
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
Message-ID: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>

Hi Chris (or others),

I've been looking at ways to do large assemblies (really rnaseq/readseq
comparisons for coverage) with maq/bowtie output, and it's clear that,
for the size of project I'm working on, the space complexity is too
nasty with Bio::DB::Sam. So I thought Bio::DB::GFF might be the way to
go.

I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> B:DB:GFF

This depends on the behaviour of B:DB:GFF->features(-merge=>1).
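To make the question concrete, the merge semantics being asked about can be sketched in plain Perl. This is only an illustration of the desired result, not a description of what Bio::DB::GFF does internally: `merge_overlapping` is a made-up name, and the intervals are assumed to be inclusive [start, end] pairs on a single reference sequence.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): collapse overlapping
# [start, end] intervals into single regions; intervals that do not
# overlap anything are returned unchanged.
sub merge_overlapping {
    # Sort by start, then by end, so overlaps are always adjacent.
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $iv (@sorted) {
        if (@merged && $iv->[0] <= $merged[-1][1]) {
            # Overlaps the region being built: extend its end if needed.
            $merged[-1][1] = $iv->[1] if $iv->[1] > $merged[-1][1];
        }
        else {
            # No overlap: start a new region (copy so input is untouched).
            push @merged, [ @$iv ];
        }
    }
    return @merged;
}

my @regions = merge_overlapping([100, 150], [120, 180], [400, 450]);
print join(', ', map { "$_->[0]-$_->[1]" } @regions), "\n";
# prints: 100-180, 400-450
```

In this sketch, intervals that merely abut (for example 1-3 and 4-6) are kept separate; whether they should be joined is a design choice that depends on how coverage is being counted.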
I've read through the docs, and it's not entirely clear (I'm hoping I've interpreted it the right way), but does this result in the return of features such that overlapping features are returned as a single feature while non-overlapping features come back separately. If this is the case, it would satisfy my requirements perfectly. thanks for your time Dan From jason at bioperl.org Wed Jan 20 06:35:24 2010 From: jason at bioperl.org (Jason Stajich) Date: Tue, 19 Jan 2010 22:35:24 -0800 Subject: [Bioperl-l] using Bio::DB::GFF for aggregation In-Reply-To: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: Are you looking at the bowtie features file or the SAM? -jason On Jan 19, 2010, at 9:32 PM, Dan Kortschak wrote: > Hi Chris (or others), > > I've been looking at ways to do large assemblies (really rnaseq/ > readseq > comparisons for coverage) with maq/bowtie output and it's clear that > for > the size of project that I'm working on the space complexity is too > nasty with Bio::DB::Sam. So I thought Bio::DB:GFF might be the way to > go. > > I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> > B:DB:GFF > > This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've > read through the docs, and it's not entirely clear (I'm hoping I've > interpreted it the right way), but does this result in the return of > features such that overlapping features are returned as a single > feature > while non-overlapping features come back separately. If this is the > case, it would satisfy my requirements perfectly. 
> > thanks for your time > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From dan.kortschak at adelaide.edu.au Wed Jan 20 07:19:05 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Wed, 20 Jan 2010 17:49:05 +1030 Subject: [Bioperl-l] using Bio::DB::GFF for aggregation In-Reply-To: References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <1263971945.4582.2.camel@epistle> It doesn't really matter, they are largely inter-convertible. The problem is not really the upstream processing, but the aggregation of reads into read-assigned regions (unless I've misunderstood your question). Dan On Tue, 2010-01-19 at 22:35 -0800, Jason Stajich wrote: > Are you looking at the bowtie features file or the SAM? > -jason > On Jan 19, 2010, at 9:32 PM, Dan Kortschak wrote: > > > Hi Chris (or others), > > > > I've been looking at ways to do large assemblies (really rnaseq/ > > readseq > > comparisons for coverage) with maq/bowtie output and it's clear that > > for > > the size of project that I'm working on the space complexity is too > > nasty with Bio::DB::Sam. So I thought Bio::DB:GFF might be the way to > > go. > > > > I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> > > B:DB:GFF > > > > This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've > > read through the docs, and it's not entirely clear (I'm hoping I've > > interpreted it the right way), but does this result in the return of > > features such that overlapping features are returned as a single > > feature > > while non-overlapping features come back separately. If this is the > > case, it would satisfy my requirements perfectly. 
> > > > thanks for your time > > Dan > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ -- Dan Kortschak From ajmackey at gmail.com Wed Jan 20 12:59:38 2010 From: ajmackey at gmail.com (Aaron Mackey) Date: Wed, 20 Jan 2010 07:59:38 -0500 Subject: [Bioperl-l] using Bio::DB::GFF for aggregation In-Reply-To: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com> I would advise using BEDtools or the R IRanges package for this kind of aggregation/merging work, rather than trying to reinvent this particular wheel. -Aaron On Wed, Jan 20, 2010 at 12:32 AM, Dan Kortschak < dan.kortschak at adelaide.edu.au> wrote: > Hi Chris (or others), > > I've been looking at ways to do large assemblies (really rnaseq/readseq > comparisons for coverage) with maq/bowtie output and it's clear that for > the size of project that I'm working on the space complexity is too > nasty with Bio::DB::Sam. So I thought Bio::DB:GFF might be the way to > go. > > I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> B:DB:GFF > > This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've > read through the docs, and it's not entirely clear (I'm hoping I've > interpreted it the right way), but does this result in the return of > features such that overlapping features are returned as a single feature > while non-overlapping features come back separately. If this is the > case, it would satisfy my requirements perfectly. 
> > thanks for your time > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From dan.kortschak at adelaide.edu.au Wed Jan 20 21:16:39 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Thu, 21 Jan 2010 07:46:39 +1030 Subject: [Bioperl-l] using Bio::DB::GFF for aggregation In-Reply-To: <24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com> References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au> <24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com> Message-ID: <1264022199.4688.29.camel@epistle> Thanks for that, I'll look into those. BEDtools looks like what I want. cheers Dan On Wed, 2010-01-20 at 07:59 -0500, Aaron Mackey wrote: > I would advise using BEDtools or the R IRanges package for this kind > of aggregation/merging work, rather than trying to reinvent this > particular wheel. > > -Aaron From biopython at maubp.freeserve.co.uk Thu Jan 21 12:33:53 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 21 Jan 2010 12:33:53 +0000 Subject: [Bioperl-l] SwissProt DE lines and UniProt XML / TagTree as XML in BioSQL Message-ID: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com> Hi all, This is cross posted to try and ensure relevant people see it. I suggest we continue the discussion on the BioSQL list (for how to serialise structured annotation to BioSQL), and/or the OpenBio list (for things like file format naming conventions). I am hoping we (Bio*) can be consistent in how we parse and load into BioSQL the SwissProt DE lines (known as "swiss" format in both BioPerl and Biopython's SeqIO, and by EMBOSS) or the equivalent UniProt XML tags (which we are tentatively going to call the "uniprot" format in Biopython's SeqIO - comments?). Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss") files and load them into BioSQL. 
Biopython currently treats the DE comment lines as a long string,
as BioPerl used to:

http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html
http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html

I understand that BioPerl now turns the SwissProt DE lines into a
TagTree, and for storing this in BioSQL this gets serialised as XML.
I would like Biopython to handle this the same way (although rather
than a Perl TagTree, we'd use a Python structure of course), and
would appreciate clarification of what exactly was implemented
(e.g. which bit of the BioPerl source code should we look at,
and could you show a worked example?).

Andrea Pierlenoin (CC'd - not sure if he is on the BioSQL or
Open-Bio lists yet) has started work on parsing UniProt XML
files for Biopython. Here the DE comment lines are already
provided broken up with XML markup. Hopefully their nested
structure matches what BioPerl was doing with the SwissProt
DE lines.

Regards,

Peter

From cjfields at illinois.edu Thu Jan 21 13:34:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Jan 2010 07:34:12 -0600
Subject: [Bioperl-l] [Open-bio-l] SwissProt DE lines and UniProt XML / TagTree as XML in BioSQL
In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
Message-ID:

Peter,

The relevant code is in Bio::Annotation::TagTree in bioperl-live, which is a decorator for Data::Stag:

http://search.cpan.org/~cmungall/Data-Stag-0.11/Data/Stag.pm

This is where the text output is derived from. It's a bit of a heavyweight solution to the problem, but it's capable of round-tripping the DE data and parses out the data in a way that's approachable. We could probably abstract out the serialization backend there and allow a pure bioperl solution (or the current solution) as a fallback.
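As a rough illustration of what "parsing DE lines into a hierarchy" means, here is a minimal pure-Perl sketch. It is not the Bio::Annotation::TagTree or Data::Stag implementation: `parse_de` is a hypothetical name, the sample DE lines are invented, and the sketch assumes only the simple "Category: Key=Value;" layout (it ignores Includes:/Contains: sections and evidence tags).

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl API): turn SwissProt DE lines into
# a list of { category => ..., fields => [ [key, value], ... ] } records.
sub parse_de {
    my @lines = @_;
    my @entries;
    for my $line (@lines) {
        # Strip the "DE" line code and the trailing semicolon.
        next unless $line =~ /^DE\s+(.*?);?\s*$/;
        my $rest = $1;

        # "RecName: Full=..." opens a new category; continuation lines
        # such as "Short=..." belong to the most recent category.
        if ($rest =~ /^(\w+):\s*(.*)$/) {
            push @entries, { category => $1, fields => [] };
            $rest = $2;
        }
        next unless @entries;

        if ($rest =~ /^(\w+)=(.*)$/) {
            push @{ $entries[-1]{fields} }, [ $1, $2 ];
        }
        elsif (length $rest) {
            # Bare values, e.g. "Flags: Precursor".
            push @{ $entries[-1]{fields} }, [ value => $rest ];
        }
    }
    return \@entries;
}

# Invented example input, for illustration only.
my $de = parse_de(
    'DE   RecName: Full=Example protein;',
    'DE            Short=ExP;',
    'DE   AltName: Full=Another name;',
    'DE   Flags: Precursor;',
);

for my $entry (@$de) {
    print "$entry->{category}\n";
    print "  $_->[0] = $_->[1]\n" for @{ $entry->{fields} };
}
```

A nested structure like this maps naturally onto XML or JSON for storage, which is the serialisation question under discussion; the real DE grammar has more cases than this sketch covers.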
If the plain-text DE info is represented in a hierarchy already in UniProt XML, we should probably conform as closely as possible to that (using a standard format like XML, JSON, etc.). chris On Jan 21, 2010, at 6:33 AM, Peter wrote: > Hi all, > > This is cross posted to try and ensure relevant people see it. > I suggest we continue the discussion on the BioSQL list > (for how to serialise structured annotation to BioSQL), and/or > the OpenBio list (for things like file format naming conventions). > > I am hoping we (Bio*) can be consistent in how we parse and load > into BioSQL the SwissProt DE lines (known as "swiss" format in > both BioPerl and Biopython's SeqIO, and by EMBOSS) or the > equivalent UniProt XML tags (which we are tentatively going to > call the "uniprot" format in Biopython's SeqIO - comments?). > > Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss") > files and load them into BioSQL. Biopython currently treats the DE > comment lines as a long string, as BioPerl used to: > > http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html > http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html > > I understand that BioPerl now turns the SwissProt DE lines into a > TagTree, and for storing this in BioSQL this gets serialised as XML. > I would like Biopython to handle this the same way (although rather > than a Perl TagTree, we'd use a Python structure of course), and > would appreciate clarification of what exactly was implemented > (e.g. which bit of the BioPerl source code should be look at, > and could you show a worked example?). > > Andrea Pierlenoin (CC'd - not sure if he is on the BioSQL or > Open-Bio lists yet) has started work on parsing UniProt XML > files for Biopython. Here the DE comment lines are already > provided broken up with XML markup. Hopefully their nested > structure matches what BioPerl was doing with the SwissProt > DE lines. 
>
> Regards,
>
> Peter

From sharmashalu.bio at gmail.com Thu Jan 21 14:25:44 2010
From: sharmashalu.bio at gmail.com (shalu sharma)
Date: Thu, 21 Jan 2010 09:25:44 -0500
Subject: [Bioperl-l] sequence orientation
Message-ID: <465b5a661001210625j3d84a165u69d8c8d21d2fe7ac@mail.gmail.com>

Hi All,

This is not a Perl/BioPerl query, but I thought this was the best place
to ask. I have some pyro reads (from CAMERA) and I want to find out
their 5' and 3' ends. Is there any way I can do this?

I would really appreciate it if anyone could help me out.

Thanks
Shalu

From rtbio.2009 at gmail.com Thu Jan 21 18:28:43 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Thu, 21 Jan 2010 19:28:43 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <196889DF87964224ACDB948681BA7F86@NewLife>
References: <4C2E8133F916495B876628EF3E8FCBB2@NewLife> <9D8A1428463C4D5E9C416521C35E254C@NewLife> <196889DF87964224ACDB948681BA7F86@NewLife>
Message-ID:

Hello Mark,

This is Roopa again. I have another small problem. I am working on
remote BLAST. The program works well: it accesses the server and gets
the output correctly. However, when I push the result sequences into an
array, the first sequence among the results is always missing. The code
is:

my $str = Bio::SeqIO->new('-file' => $nuc ,
                          '-format' => 'fasta' ,
                          '-organism' => "$organ\[ORGN]");

while (my $input = $str->next_seq()) {
    #Blast a sequence against a database:
    #Alternatively, you could pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.
    open(OUTFILE,'>',$debugfile);
    print OUTFILE $input;
    close(OUTFILE);
    my $r = $factory->submit_blast($input);
    open(OUTFILE,'>',$debugfile);
    # print OUTFILE $r;
    close(OUTFILE);
    print STDERR "waiting...." if($v>0);
    while ( my @rids = $factory->each_rid ) {
        open(OUTFILE,'>',$debugfile);
        # print OUTFILE "while entered";
        close(OUTFILE);
        foreach my $rid ( @rids ) {
            open(OUTFILE,'>',$debugfile);
            # print OUTFILE "foreach entered";
            close(OUTFILE);
            my $rc = $factory->retrieve_blast($rid);
            if( !ref($rc) ) {
                if( $rc < 0 ) {
                    $factory->remove_rid($rid);
                }
                open(OUTFILE,'>',$debugfile);
                # print OUTFILE "if entered";
                close(OUTFILE);
                print STDERR "." if ( $v > 0 );
                sleep 5;
            } else {
                open(OUTFILE,'>',$debugfile);
                # print OUTFILE "else entered";
                close(OUTFILE);
                my $result = $rc->next_result();
                #save the output
                $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
                open(BLASTDEBUGFILE,'>',$blastdebugfile);
                # NOTE: next_hit() advances the result's hit iterator, so this
                # debug print consumes the first hit before the
                # 'while ( my $hit = $result->next_hit )' loop below runs.
                print BLASTDEBUGFILE $result->next_hit();
                close(BLASTDEBUGFILE);
                my $filename = $serverpath."/blastdata_".time()."\.out";
                # open(DEBUGFILE,'>',$debugfile);
                # open(new,'>',$filename);
                # @arra=;
                # print DEBUGFILE @arra;
                # close(DEBUGFILE);
                # close(new);
                $factory->save_output($filename);
                # open(BLASTDEBUGFILE,'>',$debugfile);
                # print BLASTDEBUGFILE "Hello $rid";
                # close(BLASTDEBUGFILE);
                $factory->remove_rid($rid);
                open(BLASTDEBUGFILE,'>',$blastdebugfile);
                print BLASTDEBUGFILE $organism;
                close(BLASTDEBUGFILE);
                # open(OUTFILE,'>',$outfile);
                # print OUTFILE "Test2 $result->database_name()";
                # close(OUTFILE);
                #$hit = $result->next_hit;
                #open(new,'>',$debugfile);
                #print $hit;
                #close(new);
                $dummy=0;
                while ( my $hit = $result->next_hit ) {
                    next unless ( $v >= 0);
                    # open(OUTFILE,'>',$debugfile);
                    # print OUTFILE "$hit in while hits";
                    # close(OUTFILE);
                    my $sequ = $gb->get_Seq_by_version($hit->name);
                    my $dna = $sequ->seq(); # get the sequence as a string
                    $dummy++;
                    open(OUTFILE,'>',$debugfile);
                    # print OUTFILE $dummy;
                    close(OUTFILE);
                    push(@seqs,$dna);
                }
            }
        }
    }
}
$warum=@seqs;
open(OUTFILE,'>',$debugfile);
# print OUTFILE $warum;
print OUTFILE @seqs;
close(OUTFILE);
return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;
print OUTFILE "\n RNAi Result \n \n

Inputsequence:
";

In the code above, I was trying to debug by printing both the count of
the array and the sequences themselves. When the BLAST output contained
1 sequence, the array count was 0 and nothing was printed. Likewise,
when the output contained 3 sequences, the array count was 2 and only
two sequences were printed.

Please help me sort out this problem.

Regards,
Roopa.

On Sun, Jan 10, 2010 at 1:03 AM, Mark A. Jensen wrote:

> Excellent Roopa- it's my pleasure-- MAJ
>
> ----- Original Message -----
> *From:* Roopa Raghuveer
> *To:* Mark A. Jensen
> *Sent:* Saturday, January 09, 2010 6:41 PM
> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>
> Hi Mark,
>
> Thank you very very much. The code is working now. Thanks for the support
> and time you have spent on me.
>
> Thanks in advance
> Roopa.
>
> On Sat, Jan 9, 2010 at 10:56 PM, Mark A. Jensen wrote:
>
>> There is still a bug with the double quotes. Use "$organ\[ORGN]", which
>> prevents perl from looking for a member of an array called @organ.
>> This would have shown up if 'use strict;' had been in place. Still
>> don't know whether this would work precisely; can you send me the
>> query sequence so I can reproduce your output?
>> thanks MAJ
>>
>> ----- Original Message -----
>> *From:* Roopa Raghuveer
>> *To:* Mark A. Jensen
>> *Sent:* Saturday, January 09, 2010 2:02 PM
>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>
>> Hi Mark,
>>
>> I tried it with double quotes but still i got the same o/p with sequences
>> from different species.
>>
>> ref|XM_822292.1| Trypanosoma brucei TREU927 protein kinase A ... 1813
>> 0.0
>> ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A ... 1622
>> 0.0
>> ref|XM_816530.1| Trypanosoma cruzi strain CL Brener protein k... 773
>> 0.0
>> ref|XM_816527.1| Trypanosoma cruzi strain CL Brener protein k...
749 >> 0.0 >> ref|XM_838414.1| Leishmania major strain Friedlin protein kin... 551 >> 3e-154 >> ref|XM_838409.1| Leishmania major strain Friedlin protein kin... 551 >> 3e-154 >> ref|XM_001568451.1| Leishmania braziliensis MHOM/BR/75/M2904 ... 542 >> 2e-151 >> ref|XM_001469171.1| Leishmania infantum protein kinase A cata... 538 >> 2e-150 >> ref|XM_001469166.1| Leishmania infantum protein kinase A cata... 538 >> 2e-150 >> ref|XM_001682462.1| Leishmania major protein kinase A catalyt... 196 >> 3e-47 >> ref|XM_804361.1| Trypanosoma cruzi strain CL Brener protein k... 190 >> 1e-45 >> ref|XM_002065851.1| Drosophila willistoni GK20594 (Dwil\GK205... 181 >> 7e-43 >> ref|XM_822694.1| Trypanosoma brucei TREU927 protein kinase A ... 179 >> 2e-42 >> ref|XM_001563990.1| Leishmania braziliensis MHOM/BR/75/M2904 ... 178 >> 8e-42 >> ref|XM_814844.1| Trypanosoma cruzi strain CL Brener protein k... 170 >> 1e-39 >> ref|XM_001763039.1| Physcomitrella patens subsp. patens predi... 169 >> 4e-39 >> ref|XM_001464886.1| Leishmania infantum JPCM5 protein kinase ... 167 >> 1e-38 >> ref|XM_001377302.1| PREDICTED: Monodelphis domestica similar ... 150 >> 1e-33 >> ref|XM_001603485.1| PREDICTED: Nasonia vitripennis similar to... 145 >> 5e-32 >> ref|XM_416852.2| PREDICTED: Gallus gallus protein kinase, X-l... 143 >> 2e-31 >> ref|NM_001016403.2| Xenopus (Silurana) tropicalis protein kin... 143 >> 2e-31 >> ref|XM_002009291.1| Drosophila mojavensis GI11297 (Dmoj\GI112... 138 >> 7e-30 >> ref|NM_016979.1| Mus musculus protein kinase, X-linked (Prkx)... 138 >> 7e-30 >> ref|XM_001495664.2| PREDICTED: Equus caballus similar to Seri... 136 >> 2e-29 >> ref|XM_001111571.1| PREDICTED: Macaca mulatta cAMP-dependent ... 136 >> 2e-29 >> ref|XM_001611655.1| Babesia bovis protein kinase domain conta... 134 >> 9e-29 >> ref|NR_028062.1| Homo sapiens protein kinase, Y-linked (PRKY)... 132 >> 3e-28 >> ref|XM_001517795.1| PREDICTED: Ornithorhynchus anatinus simil... 
132 >> 3e-28 >> ref|XM_685338.2| PREDICTED: Danio rerio similar to Serine/thr... 132 >> 3e-28 >> ref|XM_002189865.1| PREDICTED: Taeniopygia guttata protein ki... 131 >> 1e-27 >> ref|XM_001362299.1| PREDICTED: Monodelphis domestica similar ... 131 >> 1e-27 >> ref|NM_001093198.1| Xenopus laevis protein kinase, cAMP-depen... 129 >> 4e-27 >> ref|XM_001461322.1| Paramecium tetraurelia hypothetical prote... 129 >> 4e-27 >> ref|NM_001099869.1| Xenopus laevis cAMP-dependent protein kin... 129 >> 4e-27 >> ref|NM_001003470.1| Danio rerio protein kinase, cAMP-dependen... 129 >> 4e-27 >> ref|XM_001141503.1| PREDICTED: Pan troglodytes verus protein ... 127 >> 1e-26 >> ref|XM_001145269.1| PREDICTED: Pan troglodytes protein kinase... 127 >> 1e-26 >> ref|XM_512434.2| PREDICTED: Pan troglodytes cAMP-dependent pr... 127 >> 1e-26 >> ref|XM_001171457.1| PREDICTED: Pan troglodytes cAMP-dependent... 127 >> 1e-26 >> ref|XM_001171437.1| PREDICTED: Pan troglodytes cAMP-dependent... 127 >> 1e-26 >> ref|XM_847420.1| PREDICTED: Canis familiaris similar to Serin... 127 >> 1e-26 >> ref|NM_207518.1| Homo sapiens protein kinase, cAMP-dependent,... 127 >> 1e-26 >> ref|NM_002730.3| Homo sapiens protein kinase, cAMP-dependent,... 127 >> 1e-26 >> >> >> Thanks in advance. >> >> Roopa. >> >> On Sat, Jan 9, 2010 at 7:46 PM, Mark A. Jensen wrote: >> >>> I understand you. Put in the double quotes and see what happens. >>> >>> ----- Original Message ----- >>> *From:* Roopa Raghuveer >>> *To:* Mark A. Jensen >>> *Sent:* Saturday, January 09, 2010 1:40 PM >>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >>> >>> Hi Mark, >>> >>> Thanks for your reply. It was working when I specifically use the name of >>> the organism as Trypanosoma brucei in the code,but my idea is to introduce a >>> $organ which takes the organism given by the user i.e., let it be anything >>> >>> Pseudomonas, Drosophila, Trypanosoma, Leishmania etc., I should get the >>> sequences related to only those organisms. 
>>> >>> i.e., If the user enters Pseudomonas,the $organ parameter of the code >>> takes Pseudomonas ,does BLAST and returns only those sequences that produce >>> significant alignment with Pseudomonas(only).But this is not happening like >>> that . >>> >>> Please help me in this regard. >>> >>> Thanks in advance >>> Roopa >>> >>> On Sat, Jan 9, 2010 at 7:29 PM, Mark A. Jensen wrote: >>> >>>> Hi Roopa-- You may get what you want if you make the change. >>>> With single quotes, ENTREZ_QUERY is set to the literal string >>>> >>>> $organ[ORGN] >>>> >>>> while, with double quotes, the variable value will be substituted, >>>> and the parameter should be set to >>>> >>>> Trypanosoma brucei[ORGN] >>>> >>>> I'm guess that it worked because the database ignored the strange >>>> parameter, >>>> and returned all the matches. Try this and if it doesn't work I look >>>> harder. >>>> cheers, >>>> Mark >>>> >>>> ----- Original Message ----- >>>> *From:* Roopa Raghuveer >>>> *To:* Mark A. Jensen >>>> *Sent:* Saturday, January 09, 2010 1:24 PM >>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >>>> >>>> hello Mark, >>>> >>>> Thanks for your reply.It was working without enclosing $organ[ORGN] in >>>> double quotations,but. I would like to have only those specific sequences >>>> which are specific for my Organism i.e., I need sequences only from the >>>> organism that I entered. >>>> >>>> When the organism is Trypanosoma brucei,I could get even Leishmania and >>>> other species as the similar sequences. But I want to get only trypanosoma >>>> brucei sequences. >>>> >>>> Could you please help me out in this regard? >>>> >>>> Roopa. >>>> >>>> My output >>>> >>>> I/P organism: Trypanosoma brucei >>>> >>>> O/P:- >>>> ref|XM_822292.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 1813 0.0 >>>> ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 1622 0.0 >>>> ref|XM_816530.1| Trypanosoma cruzi strain CL Brener protein k... 
>>>> 773 0.0 >>>> ref|XM_816527.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 749 0.0 >>>> ref|XM_838414.1| Leishmania major strain Friedlin protein kin... >>>> 551 3e-154 >>>> ref|XM_838409.1| Leishmania major strain Friedlin protein kin... >>>> 551 3e-154 >>>> ref|XM_001568451.1| Leishmania braziliensis MHOM/BR/75/M2904 ... >>>> 542 2e-151 >>>> ref|XM_001469171.1| Leishmania infantum protein kinase A cata... >>>> 538 2e-150 >>>> ref|XM_001469166.1| Leishmania infantum protein kinase A cata... >>>> 538 2e-150 >>>> ref|XM_001682462.1| Leishmania major protein kinase A catalyt... >>>> 196 3e-47 >>>> ref|XM_804361.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 190 1e-45 >>>> ref|XM_002065851.1| Drosophila willistoni GK20594 (Dwil\GK205... >>>> 181 7e-43 >>>> ref|XM_822694.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 179 2e-42 >>>> ref|XM_001563990.1| Leishmania braziliensis MHOM/BR/75/M2904 ... >>>> 178 8e-42 >>>> ref|XM_814844.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 170 1e-39 >>>> ref|XM_001763039.1| Physcomitrella patens subsp. patens predi... >>>> 168 4e-39 >>>> ref|XM_001464886.1| Leishmania infantum JPCM5 protein kinase ... >>>> 167 1e-38 >>>> ref|XM_001377302.1| PREDICTED: Monodelphis domestica similar ... >>>> 150 1e-33 >>>> ref|XM_001603485.1| PREDICTED: Nasonia vitripennis similar to... >>>> 145 5e-32 >>>> ref|XM_416852.2| PREDICTED: Gallus gallus protein kinase, X-l... >>>> 143 2e-31 >>>> ref|NM_001016403.2| Xenopus (Silurana) tropicalis protein kin... >>>> 143 2e-31 >>>> ref|XM_002009291.1| Drosophila mojavensis GI11297 (Dmoj\GI112... >>>> 138 7e-30 >>>> ref|NM_016979.1| Mus musculus protein kinase, X-linked (Prkx)... >>>> 138 7e-30 >>>> ref|XM_001495664.2| PREDICTED: Equus caballus similar to Seri... >>>> 136 2e-29 >>>> ref|XM_001111571.1| PREDICTED: Macaca mulatta cAMP-dependent ... >>>> 136 2e-29 >>>> ref|XM_001611655.1| Babesia bovis protein kinase domain conta... 
>>>> 134 9e-29 >>>> ref|NR_028062.1| Homo sapiens protein kinase, Y-linked (PRKY)... >>>> 132 3e-28 >>>> ref|XM_001517795.1| PREDICTED: Ornithorhynchus anatinus simil... >>>> 132 3e-28 >>>> ref|XM_685338.2| PREDICTED: Danio rerio similar to Serine/thr... >>>> 132 3e-28 >>>> ref|XM_002189865.1| PREDICTED: Taeniopygia guttata protein ki... >>>> 131 1e-27 >>>> ref|XM_001362299.1| PREDICTED: Monodelphis domestica similar ... >>>> 131 1e-27 >>>> ref|NM_001093198.1| Xenopus laevis protein kinase, cAMP-depen... >>>> 129 4e-27 >>>> ref|XM_001461322.1| Paramecium tetraurelia hypothetical prote... >>>> 129 4e-27 >>>> ref|NM_001099869.1| Xenopus laevis cAMP-dependent protein kin... >>>> 129 4e-27 >>>> >>>> Roopa. >>>> >>>> On Sat, Jan 9, 2010 at 7:05 PM, Mark A. Jensen wrote: >>>> >>>>> I see it immediately (from making same bug many times) : >>>>> >>>>> >>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY >>>>> => >>>>> - '$organ[ORGN]'); >>>>> +"$organ[ORGN]"); >>>>> >>>>> >>>>> MAJ >>>>> >>>>> ----- Original Message ----- From: "Roopa Raghuveer" < >>>>> rtbio.2009 at gmail.com> >>>>> To: "Mark A. Jensen" >>>>> Cc: >>>>> Sent: Saturday, January 09, 2010 11:57 AM >>>>> Subject: Re: [Bioperl-l] Regarding blast in Bioperl >>>>> >>>>> >>>>> >>>>> Hello all, >>>>>> >>>>>> Thanks alot for your reply Mark. It was working for Trypanosoma brucei >>>>>> as >>>>>> the organism parameter,but when I tried to use the Organism parameter >>>>>> from >>>>>> the user,it was not working i.e., I was unable to get the target >>>>>> sequences. >>>>>> Please help me in this regard. 
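A note on the quoting issue in the diff above: in Perl, '$organ[ORGN]' in single quotes is passed through literally, while "$organ[ORGN]" in double quotes is parsed as an element of an array @organ rather than as $organ followed by the text [ORGN]. Below is a minimal sketch of one way to build the organism filter by escaping the bracket. The helper name entrez_organism_query is hypothetical and is not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): build an Entrez
# organism filter such as "Trypanosoma brucei[ORGN]".
sub entrez_organism_query {
    my ($organism) = @_;
    # The backslash stops Perl from reading "[ORGN]" as an array
    # subscript, so only the scalar $organism is interpolated.
    return "$organism\[ORGN]";
}

my $organ = 'Trypanosoma brucei';

# Single quotes: no interpolation, the server receives the raw text.
my $literal = '$organ[ORGN]';

# Escaped double quotes: the organism name is substituted.
my $query = entrez_organism_query($organ);

print "single-quoted: $literal\n";   # single-quoted: $organ[ORGN]
print "escaped:       $query\n";     # escaped:       Trypanosoma brucei[ORGN]

# Note: with 'use strict' in place, the unescaped form "$organ[ORGN]"
# does not compile, because @organ is undeclared and ORGN is a bareword.
```

The resulting string can then be passed as the -ENTREZ_QUERY value to Bio::Tools::Run::RemoteBlast->new, as in the diff above. Plain concatenation, $organ . '[ORGN]', should give the same result.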
My code is >>>>>> >>>>>> #!/usr/bin/perl >>>>>> >>>>>> #path for extra camel module >>>>>> use lib "/srv/www/htdocs/rain/RNAi/"; >>>>>> use Roopablast; >>>>>> >>>>>> >>>>>> use Bio::SearchIO; >>>>>> use Bio::Search::Result::BlastResult; >>>>>> use Bio::Perl; >>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>> use Bio::Seq; >>>>>> use Bio::SeqIO; >>>>>> use Bio::DB::GenBank; >>>>>> >>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi"; >>>>>> $serverurl = "http://141.84.66.66/rain/RNAi"; >>>>>> $outfile = $serverpath."/rnairesult_".time().".html"; >>>>>> $nuc = $serverpath."/nuc".time().".txt"; >>>>>> $debugfile = $serverpath."/debug_".time().".txt"; >>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>> >>>>>> my $outstring =""; >>>>>> >>>>>> &parse_form; >>>>>> >>>>>> print "Content-type: text/html\n\n"; >>>>>> print "\n"; >>>>>> print "RNAi Result"; >>>>>> print ">>>>> URL=$serverurl/rnairesult_".time().".html\"> \n"; >>>>>> print "\n"; >>>>>> print "\n"; >>>>>> print " Your results will appear >>>>> href=$serverurl/rnairesult_".time().".html>here
"; >>>>>> print " Please be patient, runtime can be up to 5 minutes
"; >>>>>> print " This page will automatically reload in 30 seconds. Roopa"; >>>>>> print "\n"; >>>>>> print "\n"; >>>>>> >>>>>> defined(my $pid = fork) or die "Can't fork: $!"; >>>>>> exit if $pid; >>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >>>>>> >>>>>> open(OUTFILE, '>',$outfile); >>>>>> >>>>>> print OUTFILE "\n >>>>>> RNAi Result >>>>>> >>>>> URL=$serverurl//rnairesult_".time().".html\"> \n >>>>>> >>>>>> \n >>>>>> \n >>>>>> Your results will appear >>>>> href=$serverurl/rnairesult_".time().".html>here
>>>>>> Please be patient, runtime can be up to 5 minutes wait wait >>>>>> wait......
>>>>>> This page will automatically reload in 30 seconds Roopa
>>>>>> \n >>>>>> \n"; >>>>>> >>>>>> close(OUTFILE); >>>>>> >>>>>> >>>>>> @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); >>>>>> >>>>>> $in{'Inputseq'} =~ s/>.*$//m; >>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim; >>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/; >>>>>> >>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >>>>>> $in{'Threshold'}); >>>>>> >>>>>> >>>>>> sub blastcode >>>>>> { >>>>>> >>>>>> $inpu1= $_[0]; >>>>>> >>>>>> $organ= $_[1]; >>>>>> >>>>>> open(NUC,'>',$nuc); >>>>>> print NUC $inpu1,"\n"; >>>>>> close(NUC); >>>>>> >>>>>> my $prog = 'blastn'; >>>>>> my $db = 'refseq_rna'; >>>>>> my $e_val= '1e-10'; >>>>>> my $organism= $organ; >>>>>> >>>>>> $gb = new Bio::DB::GenBank; >>>>>> >>>>>> my @params = ( '-prog' => $prog, >>>>>> '-data' => $db, >>>>>> '-expect' => $e_val, >>>>>> '-readmethod' => 'SearchIO', >>>>>> '-Organism' => $organism ); >>>>>> >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> print OUTFILE $inpu1; >>>>>> close(OUTFILE); >>>>>> >>>>>> >>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY >>>>>> => >>>>>> '$organ[ORGN]'); >>>>>> >>>>>> #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>> >>>>>> #change a paramter >>>>>> >>>>>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>>>>> Brucei[ORGN]'; >>>>>> >>>>>> #change a paramter >>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>> '$input2[ORGN]'; >>>>>> >>>>>> my $v = 1; >>>>>> #$v is just to turn on and off the messages >>>>>> >>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >>>>>> '-organism' => $organ ); >>>>>> >>>>>> >>>>>> while (my $input = $str->next_seq()) >>>>>> { >>>>>> #Blast a sequence against a database: >>>>>> #Alternatively, you could pass in a file with many >>>>>> #sequences rather than loop through sequence one at a time >>>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>>>> #and swap the two lines below for an example of that. 
>>>>>> >>>>>> #open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE $input; >>>>>> #close(OUTFILE); >>>>>> >>>>>> >>>>>> my $r = $factory->submit_blast($input); >>>>>> >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE $r; >>>>>> close(OUTFILE); >>>>>> >>>>>> print STDERR "waiting...." if($v>0); >>>>>> >>>>>> while ( my @rids = $factory->each_rid ) { >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "while entered"; >>>>>> # close(OUTFILE); >>>>>> foreach my $rid ( @rids ) { >>>>>> >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "foreach entered"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>> >>>>>> if( !ref($rc) ) >>>>>> { >>>>>> if( $rc < 0 ) >>>>>> { >>>>>> $factory->remove_rid($rid); >>>>>> } >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "if entered"; >>>>>> close(OUTFILE); >>>>>> print STDERR "." if ( $v > 0 ); >>>>>> sleep 5; >>>>>> } >>>>>> else { >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "else entered"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $result = $rc->next_result(); >>>>>> #save the output >>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>> >>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>> print BLASTDEBUGFILE $result->next_hit(); >>>>>> close(BLASTDEBUGFILE); >>>>>> >>>>>> my $filename = >>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out"; >>>>>> >>>>>> # open(DEBUGFILE,'>',$debugfile); >>>>>> # open(new,'>',$filename); >>>>>> # @arra=; >>>>>> # print DEBUGFILE @arra; >>>>>> # close(DEBUGFILE); >>>>>> # close(new); >>>>>> >>>>>> $factory->save_output($filename); >>>>>> # open(BLASTDEBUGFILE,'>',$debugfile); >>>>>> # print BLASTDEBUGFILE "Hello $rid"; >>>>>> # close(BLASTDEBUGFILE); >>>>>> >>>>>> $factory->remove_rid($rid); >>>>>> >>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>> print BLASTDEBUGFILE $organism; >>>>>> close(BLASTDEBUGFILE); >>>>>> >>>>>> # open(OUTFILE,'>',$outfile); >>>>>> 
# print OUTFILE "Test2 $result->database_name()"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> #$hit = $result->next_hit; >>>>>> #open(new,'>',$debugfile); >>>>>> #print $hit; >>>>>> #close(new); >>>>>> >>>>>> while ( my $hit = $result->next_hit ) { >>>>>> >>>>>> next unless ( $v > 0); >>>>>> >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "$hit in while hits"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $sequ = $gb->get_Seq_by_version($hit->name); >>>>>> my $dna = $sequ->seq(); # get the sequence as a string >>>>>> push(@seqs,$dna); >>>>>> } >>>>>> } >>>>>> } >>>>>> } >>>>>> } >>>>>> >>>>>> #open(OUTFILE,'>',$debugfile); >>>>>> #print OUTFILE $seqs[0]; >>>>>> #close(OUTFILE); >>>>>> >>>>>> return(@seqs); >>>>>> >>>>>> } >>>>>> >>>>>> Regards, >>>>>> Roopa. >>>>>> >>>>>> >>>>>> On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen >>>>>> wrote: >>>>>> >>>>>> Hi Roopa-- >>>>>>> >>>>>>> I got your code to work with the following changes: >>>>>>> >>>>>>> +# the input should be a valid FASTA file... >>>>>>> ... >>>>>>> open(NUC,'>',$nuc); >>>>>>> +print NUC ">seq (need a name line for valid fasta)\n"; >>>>>>> print NUC $inpu1, "\n"; >>>>>>> close(NUC); >>>>>>> ... >>>>>>> >>>>>>> +# you can set these header parms in the call itself... >>>>>>> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, >>>>>>> -ENTREZ_QUERY => >>>>>>> ''Trypanosoma Brucei[ORGN]'); >>>>>>> >>>>>>> #change a paramter >>>>>>> +# commented this out... >>>>>>> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>>> 'Trypanosoma >>>>>>> Brucei[ORGN]'; >>>>>>> >>>>>>> MAJ >>>>>>> ----- Original Message ----- From: "Roopa Raghuveer" < >>>>>>> rtbio.2009 at gmail.com >>>>>>> > >>>>>>> To: >>>>>>> Sent: Friday, January 08, 2010 10:00 AM >>>>>>> Subject: [Bioperl-l] Regarding blast in Bioperl >>>>>>> >>>>>>> >>>>>>> Hello all, >>>>>>> >>>>>>>> >>>>>>>> I was trying Remote blast using Bioperl. 
My input data is a >>>>>>>> Trypanosoma >>>>>>>> brucei sequence in Fasta format. When I was trying to submit to >>>>>>>> BLAST >>>>>>>> using >>>>>>>> the step >>>>>>>> $r=$factory->submit_blast($input) >>>>>>>> It was not returning anything which I checked by debugging the code. >>>>>>>> It is >>>>>>>> not blasting my input sequence even though I mentioned all the >>>>>>>> parameters.I >>>>>>>> would paste the code below. >>>>>>>> >>>>>>>> Please help me in solving put this problem. It is very urgent. >>>>>>>> >>>>>>>> Regards >>>>>>>> Roopa. >>>>>>>> >>>>>>>> #!/usr/bin/perl >>>>>>>> >>>>>>>> #path for extra camel module >>>>>>>> use lib "/srv/www/htdocs/rain/RNAi/"; >>>>>>>> use Roopablast; >>>>>>>> >>>>>>>> >>>>>>>> use Bio::SearchIO; >>>>>>>> use Bio::Search::Result::BlastResult; >>>>>>>> use Bio::Perl; >>>>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>>>> use Bio::Seq; >>>>>>>> use Bio::SeqIO; >>>>>>>> use Bio::DB::GenBank; >>>>>>>> >>>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi"; >>>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi"; >>>>>>>> $outfile = $serverpath."/rnairesult_".time().".html"; >>>>>>>> $nuc = $serverpath."/nuc".time().".txt"; >>>>>>>> $debugfile = $serverpath."/debug_".time().".txt"; >>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>>>> >>>>>>>> my $outstring =""; >>>>>>>> >>>>>>>> &parse_form; >>>>>>>> >>>>>>>> print "Content-type: text/html\n\n"; >>>>>>>> print "\n"; >>>>>>>> print "RNAi Result"; >>>>>>>> print ">>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n"; >>>>>>>> print "\n"; >>>>>>>> print "\n"; >>>>>>>> print " Your results will appear >>>>>>> href=$serverurl/rnairesult_".time().".html>here
"; >>>>>>>> print " Please be patient, runtime can be up to 5 minutes
"; >>>>>>>> print " This page will automatically reload in 30 seconds. Roopa"; >>>>>>>> print "\n"; >>>>>>>> print "\n"; >>>>>>>> >>>>>>>> defined(my $pid = fork) or die "Can't fork: $!"; >>>>>>>> exit if $pid; >>>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >>>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >>>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> open(OUTFILE, '>',$outfile); >>>>>>>> >>>>>>>> print OUTFILE "\n >>>>>>>> RNAi Result >>>>>>>> >>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n >>>>>>>> >>>>>>>> \n >>>>>>>> \n >>>>>>>> Your results will appear >>>>>>> href=$serverurl/rnairesult_".time().".html>here
>>>>>>>> Please be patient, runtime can be up to 5 minutes wait wait >>>>>>>> wait......
>>>>>>>> This page will automatically reload in 30 seconds Roopa
>>>>>>>> \n >>>>>>>> \n"; >>>>>>>> >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> @compseqs = blastcode($in{'Inputseq'}); >>>>>>>> >>>>>>>> $in{'Inputseq'} =~ s/>.*$//m; >>>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim; >>>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/; >>>>>>>> >>>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >>>>>>>> $in{'Threshold'}); >>>>>>>> >>>>>>>> >>>>>>>> sub blastcode >>>>>>>> { >>>>>>>> >>>>>>>> $inpu1= $_[0]; >>>>>>>> >>>>>>>> #$organ= $_[1]; >>>>>>>> >>>>>>>> open(NUC,'>',$nuc); >>>>>>>> print NUC $inpu1; >>>>>>>> close(NUC); >>>>>>>> >>>>>>>> my $prog = 'blastn'; >>>>>>>> my $db = 'refseq_rna'; >>>>>>>> my $e_val= '1e-10'; >>>>>>>> my $organism= 'Trypanosoma Brucei'; >>>>>>>> >>>>>>>> $gb = new Bio::DB::GenBank; >>>>>>>> >>>>>>>> my @params = ( '-prog' => $prog, >>>>>>>> '-data' => $db, >>>>>>>> '-expect' => $e_val, >>>>>>>> '-readmethod' => 'SearchIO', >>>>>>>> '-Organism' => $organism ); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$debugfile); >>>>>>>> # print OUTFILE @params; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>>> >>>>>>>> #change a paramter >>>>>>>> >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>>>>>>> Brucei[ORGN]'; >>>>>>>> >>>>>>>> #change a paramter >>>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>>>> '$input2[ORGN]'; >>>>>>>> >>>>>>>> my $v = 1; >>>>>>>> #$v is just to turn on and off the messages >>>>>>>> >>>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >>>>>>>> '-organism' => 'Trypanosoma Brucei' ); >>>>>>>> >>>>>>>> >>>>>>>> while (my $input = $str->next_seq()) >>>>>>>> { >>>>>>>> #Blast a sequence against a database: >>>>>>>> #Alternatively, you could pass in a file with many >>>>>>>> #sequences rather than loop through sequence one at a time >>>>>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>>>>>> #and swap the two lines below 
for an example of that. >>>>>>>> >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE $input; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> my $r = $factory->submit_blast($input); #The program stops here >>>>>>>> it >>>>>>>> does not return any value and it does not enter the While >>>>>>>> loop,Please help >>>>>>>> me in this regard.# >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE $r; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> print STDERR "waiting...." if($v>0); >>>>>>>> >>>>>>>> while ( my @rids = $factory->each_rid ) { >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "while entered"; >>>>>>>> close(OUTFILE); >>>>>>>> foreach my $rid ( @rids ) { >>>>>>>> >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "foreach entered"; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>>>> >>>>>>>> if( !ref($rc) ) >>>>>>>> { >>>>>>>> if( $rc < 0 ) >>>>>>>> { >>>>>>>> $factory->remove_rid($rid); >>>>>>>> } >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "if entered"; >>>>>>>> close(OUTFILE); >>>>>>>> print STDERR "." 
if ( $v > 0 ); >>>>>>>> sleep 5; >>>>>>>> } >>>>>>>> else { >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "else entered"; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> my $result = $rc->next_result(); >>>>>>>> #save the output >>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>>>> >>>>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>>>> print BLASTDEBUGFILE $result->next_hit(); >>>>>>>> close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> my $filename = >>>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out"; >>>>>>>> >>>>>>>> # open(DEBUGFILE,'>',$debugfile); >>>>>>>> # open(new,'>',$filename); >>>>>>>> # @arra=; >>>>>>>> # print DEBUGFILE @arra; >>>>>>>> # close(DEBUGFILE); >>>>>>>> # close(new); >>>>>>>> >>>>>>>> $factory->save_output($filename); >>>>>>>> >>>>>>>> # open(BLASTDEBUGFILE,'>',$debugfile); >>>>>>>> # print BLASTDEBUGFILE "Hello $rid"; >>>>>>>> # close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> $factory->remove_rid($rid); >>>>>>>> >>>>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>>>> print BLASTDEBUGFILE $organism; >>>>>>>> close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$outfile); >>>>>>>> # print OUTFILE "Test2 $result->database_name()"; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> #$hit = $result->next_hit; >>>>>>>> #open(new,'>',$debugfile); >>>>>>>> #print $hit; >>>>>>>> #close(new); >>>>>>>> >>>>>>>> while ( my $hit = $result->next_hit ) { >>>>>>>> >>>>>>>> next unless ( $v > 0); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$debugfile); >>>>>>>> # print OUTFILE "$hit in while hits"; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> my $sequ = $gb->get_Seq_by_version($hit->name); >>>>>>>> my $dna = $sequ->seq(); # get the sequence as a >>>>>>>> string >>>>>>>> push(@seqs,$dna); >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> #open(OUTFILE,'>',$debugfile); >>>>>>>> #print OUTFILE $seqs[0]; >>>>>>>> #close(OUTFILE); >>>>>>>> >>>>>>>> return(@seqs); >>>>>>>> >>>>>>>> } >>>>>>>> 
>>>>>>>> open(OUTFILE, '>',$outfile) || die ; >>>>>>>> >>>>>>>> print OUTFILE "\n >>>>>>>> RNAi Result >>>>>>>> \n >>>>>>>> \n >>>>>>>>

>>>>>>>> Inputsequence:
"; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> print OUTFILE substr ($in{'Inputseq'}, $i, 1); >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> print OUTFILE " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> print OUTFILE "
\n"; >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> print OUTFILE "

"; >>>>>>>> >>>>>>>> $z=@compseqs; >>>>>>>> >>>>>>>> for($k=1;$k<$z;$k++) { >>>>>>>> print OUTFILE ">>>>>>> set\">

Compare >>>>>>>> Sequence:
"; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> print OUTFILE substr ($compseqs[$k], $i, 1); >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> print OUTFILE " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> print OUTFILE "
\n"; >>>>>>>> } >>>>>>>> } >>>>>>>> print OUTFILE "

"; >>>>>>>> } >>>>>>>> >>>>>>>> print OUTFILE "

>>>>>>>> Window:
$in{'Windowsize'} >>>>>>>>

>>>>>>>>

>>>>>>>> Threshold:
$in{'Threshold'} >>>>>>>>

"; >>>>>>>> my $j=0; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){ >>>>>>>> if ($out[$i]->{similar}<=$in{'Threshold'}){ >>>>>>>> $j=$in{'Windowsize'}; >>>>>>>> } >>>>>>>> $height=$out[$i]->{similar}*5; >>>>>>>> } >>>>>>>> >>>>>>>> if ($j>0) { >>>>>>>> print OUTFILE ">>>>>>> height=\"5\">"; >>>>>>>> $outstring .= "".substr ($in{'Inputseq'}, >>>>>>>> $i, >>>>>>>> 1).""; >>>>>>>> $j--; >>>>>>>> } >>>>>>>> else { >>>>>>>> print OUTFILE ">>>>>>> height=\"5\">"; >>>>>>>> $outstring .= "".substr ($in{'Inputseq'}, >>>>>>>> $i, >>>>>>>> 1).""; >>>>>>>> } >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> $outstring .= " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> $outstring .= "
\n"; >>>>>>>> >>>>>>>> } >>>>>>>> if ( ($i+1)%800==0){ >>>>>>>> print OUTFILE "

\n"; >>>>>>>> >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> print OUTFILE "

>>>>>>> set\">$outstring"; >>>>>>>> >>>>>>>> #foreach (@out) { >>>>>>>> #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} >>>>>>>> matchs

"; >>>>>>>> #if ($_->{similar}<=$in{'Threshold'}){ >>>>>>>> >>>>>>>> # } >>>>>>>> #} >>>>>>>> >>>>>>>> print OUTFILE "\n\n"; >>>>>>>> >>>>>>>> close OUTFILE; >>>>>>>> >>>>>>>> #nameprint(); >>>>>>>> >>>>>>>> sub parse_form { >>>>>>>> local ($buffer, @pairs, $pair, $name, $value); >>>>>>>> # Read in text >>>>>>>> $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/; >>>>>>>> if ($ENV{'REQUEST_METHOD'} eq "POST") >>>>>>>> { >>>>>>>> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); >>>>>>>> } >>>>>>>> else >>>>>>>> { >>>>>>>> $buffer = $ENV{'QUERY_STRING'}; >>>>>>>> } >>>>>>>> @pairs = split(/&/, $buffer); >>>>>>>> foreach $pair (@pairs) >>>>>>>> { >>>>>>>> ($name, $value) = split(/=/, $pair); >>>>>>>> $value =~ tr/+/ /; >>>>>>>> $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; >>>>>>>> $in{$name} = $value; >>>>>>>> } >>>>>>>> } >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>> >>> >> > From bernd.web at gmail.com Thu Jan 21 18:37:18 2010 From: bernd.web at gmail.com (Bernd Web) Date: Thu, 21 Jan 2010 19:37:18 +0100 Subject: [Bioperl-l] Regarding blast in Bioperl In-Reply-To: References: <9D8A1428463C4D5E9C416521C35E254C@NewLife> <196889DF87964224ACDB948681BA7F86@NewLife> Message-ID: <716af09c1001211037p59b19a29l1967f1e514469e79@mail.gmail.com> Hi, Regarding RemoteBlast, my I add a query? It seems that Bio::Tools::Run::RemoteBlast is sending each sequence seperately to the NCBI (at least in BP 1.5.2). This means that for each Sequence a RID is to be checked. Is this indeed the case? The BLAST URL-API or batch interface supports sending multiple sequences at once. 
Regards,
Bernd

On Thu, Jan 21, 2010 at 7:28 PM, Roopa Raghuveer wrote:
> Hello Mark,
>
> This is Roopa again. I have a small problem again. I am working on Remote
> blast. The program works well. But the problem is this. The program
> accesses the server and gets the output correctly. I am trying to send the
> result sequences into an array and I found that always the first sequence
> among the Result sequences is missing. The code is
>
> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
> '-organism' => "$organ\[ORGN]");

From cjfields at illinois.edu Fri Jan 22 04:31:25 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Jan 2010 22:31:25 -0600
Subject: [Bioperl-l] Bio::BroodComb - RFC
In-Reply-To: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>
References: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>
Message-ID:

Jay,

Did you want to release it to CPAN? I'll take a closer look at the docs to get an idea of what you are doing with it, but from my perspective I can see this becoming a nice general use tool akin to Bio::Perl, maybe a bit more lightweight.

chris

On Jan 18, 2010, at 6:22 PM, Jay Hannah wrote:

> I formalized a little framework so I could stop re-writing little programs that do some things people frequently ask me to do.
>
> http://github.com/jhannah/bio-broodcomb
>
> It stores everything in SQLite so users can write custom reports if they want to. It uses BioPerl and some shiny CPAN beads (DBIx::Class, Moose::Role). Tests included.
>
> The first two functions I stuck in the framework:
>
> Find subsequences (Bio::BroodComb::SubSeq):
>
>    use Bio::BroodComb;
>    my $bc = Bio::BroodComb->new();
>    $bc->load_large_seq(file => "large_seq.fasta");
>    $bc->load_small_seq(file => "small_seq.fasta");
>    $bc->find_subseqs();
>    print $bc->subseq_report1;
>
> In-silico PCR (Bio::BroodComb::PCR):
>
>    use Bio::BroodComb;
>    my $bc = Bio::BroodComb->new();
>    $bc->load_large_seq(file => "large_seq.fasta");
>    $bc->add_primerset(
>       description    => "U5/R",   # however you want it reported
>       forward_primer => 'GCGGGCAGCAATACTGCTTTGTAA',
>       reverse_primer => 'ACCAGCGTTCAGCATATGGAGGAT',
>    );
>    $bc->find_pcr_hits();
>    $bc->find_pcr_products();
>    print $bc->pcr_report1;
>
> I find this rather handy, so will probably be adding all my applicable future work to it instead of writing stand-alone programs. Not sure if it should be renamed for eventual CPAN / wherever.
>
> Suggestions, contributions welcome. :)
>
> http://github.com/jhannah/bio-broodcomb
>
> Jay Hannah
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jason at bioperl.org Fri Jan 22 06:17:14 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 21 Jan 2010 22:17:14 -0800
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
Message-ID:

I'm considering adding an initialization parameter (and get/set accessor) to Bio::AlignIO that would allow setting the alphabet. This is then passed to Bio::LocatableSeq creation so that _guess_alphabet isn't called. This will allow removal of warnings about empty sequences, because _guess_alphabet won't be called on a sequence if we have explicitly set the alphabet.

This worked great on my local install and tests pass. Any objections or concerns?

Basically, it means that when you make an AlignIO you can specify the alphabet, i.e.
my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna', - file => 'genome.fasaln'); I have some alignments with empty sequences and I think turning off the warnings is appropriate where I force the alphabet choice. It should also have a very modest speedup benefit too. -jason -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ http://twitter.com/hyphaltip From rtbio.2009 at gmail.com Fri Jan 22 09:54:32 2010 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Fri, 22 Jan 2010 10:54:32 +0100 Subject: [Bioperl-l] Fwd: Regarding blast in Bioperl In-Reply-To: References: <9D8A1428463C4D5E9C416521C35E254C@NewLife> <196889DF87964224ACDB948681BA7F86@NewLife> Message-ID: ---------- Forwarded message ---------- From: Roopa Raghuveer Date: Thu, Jan 21, 2010 at 7:28 PM Subject: Re: [Bioperl-l] Regarding blast in Bioperl To: "Mark A. Jensen" Cc: bioperl-l at lists.open-bio.org Hello Mark, This is Roopa again. I have a small problem again. I am working on Remote blast. The program works well. But the problem is this. The program accesses the server and gets the output correctly. I am trying to send the result sequences into an array and I found that always the first sequence among the Result sequences is missing. The code is my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]"); while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. open(OUTFILE,'>',$debugfile); print OUTFILE $input; close(OUTFILE); my $r = $factory->submit_blast($input); open(OUTFILE,'>',$debugfile); # print OUTFILE $r; close(OUTFILE); print STDERR "waiting...." 
if($v>0); while ( my @rids = $factory->each_rid ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "while entered"; close(OUTFILE); foreach my $rid ( @rids ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "foreach entered"; close(OUTFILE); my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } open(OUTFILE,'>',$debugfile); # print OUTFILE "if entered"; close(OUTFILE); print STDERR "." if ( $v > 0 ); sleep 5; } else { open(OUTFILE,'>',$debugfile); # print OUTFILE "else entered"; close(OUTFILE); my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $result->next_hit(); close(BLASTDEBUGFILE); my $filename = $serverpath."/blastdata_".time()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $organism; close(BLASTDEBUGFILE); # open(OUTFILE,'>',$outfile); # print OUTFILE "Test2 $result->database_name()"; # close(OUTFILE); #$hit = $result->next_hit; #open(new,'>',$debugfile); #print $hit; #close(new); $dummy=0; while ( my $hit = $result->next_hit ) { next unless ( $v >= 0); # open(OUTFILE,'>',$debugfile); # print OUTFILE "$hit in while hits"; # close(OUTFILE); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string $dummy++; open(OUTFILE,'>',$debugfile); # print OUTFILE $dummy; close(OUTFILE); push(@seqs,$dna); } } } } } $warum=@seqs; open(OUTFILE,'>',$debugfile); # print OUTFILE $warum; print OUTFILE @seqs; close(OUTFILE); return(@seqs); } open(OUTFILE, '>',$outfile) || die ; print OUTFILE "\n RNAi Result \n \n

Inputsequence:
"; Here in the above code, I was trying to debug the code and trying to get the count of the array and even the sequence. But when the output data was giving 1 sequence, the count of the array was 0 and when I tried to print the output sequence I could not get any. It was the same when the no of output sequences was 3, I tried to print the sequences but was getting the count of the array as 2 and was printing only two sequences. Please help me in sorting out this problem. Regards, Roopa. On Sun, Jan 10, 2010 at 1:03 AM, Mark A. Jensen wrote: > Excellent Roopa- it's my pleasure-- MAJ > > ----- Original Message ----- > *From:* Roopa Raghuveer > *To:* Mark A. Jensen > *Sent:* Saturday, January 09, 2010 6:41 PM > *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl > > Hi Mark, > > Thank you very very much. The code is working now. Thanks for the support > and time you have spent on me. > > Thanks in advance > Roopa. > > On Sat, Jan 9, 2010 at 10:56 PM, Mark A. Jensen wrote: > >> There is still a bug with the double quotes. Use "$organ\[ORGN]", which >> prevents perl from >> looking for a member of an array called @organ. This would have shown up >> if 'use strict;' had >> been in place. Still don't know whether this would work precisely; can you >> send me the query >> sequence so I can reproduce your ouput? >> thanks MAJ >> >> ----- Original Message ----- >> *From:* Roopa Raghuveer >> *To:* Mark A. Jensen >> *Sent:* Saturday, January 09, 2010 2:02 PM >> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >> >> Hi Mark, >> >> I tried it with double quotes but still i got the same o/p with sequences >> from different species. >> >> ref|XM_822292.1| Trypanosoma brucei TREU927 protein kinase A ... 1813 >> 0.0 >> ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A ... 1622 >> 0.0 >> ref|XM_816530.1| Trypanosoma cruzi strain CL Brener protein k... 773 >> 0.0 >> ref|XM_816527.1| Trypanosoma cruzi strain CL Brener protein k... 
749 >> 0.0 >> ref|XM_838414.1| Leishmania major strain Friedlin protein kin... 551 >> 3e-154 >> ref|XM_838409.1| Leishmania major strain Friedlin protein kin... 551 >> 3e-154 >> ref|XM_001568451.1| Leishmania braziliensis MHOM/BR/75/M2904 ... 542 >> 2e-151 >> ref|XM_001469171.1| Leishmania infantum protein kinase A cata... 538 >> 2e-150 >> ref|XM_001469166.1| Leishmania infantum protein kinase A cata... 538 >> 2e-150 >> ref|XM_001682462.1| Leishmania major protein kinase A catalyt... 196 >> 3e-47 >> ref|XM_804361.1| Trypanosoma cruzi strain CL Brener protein k... 190 >> 1e-45 >> ref|XM_002065851.1| Drosophila willistoni GK20594 (Dwil\GK205... 181 >> 7e-43 >> ref|XM_822694.1| Trypanosoma brucei TREU927 protein kinase A ... 179 >> 2e-42 >> ref|XM_001563990.1| Leishmania braziliensis MHOM/BR/75/M2904 ... 178 >> 8e-42 >> ref|XM_814844.1| Trypanosoma cruzi strain CL Brener protein k... 170 >> 1e-39 >> ref|XM_001763039.1| Physcomitrella patens subsp. patens predi... 169 >> 4e-39 >> ref|XM_001464886.1| Leishmania infantum JPCM5 protein kinase ... 167 >> 1e-38 >> ref|XM_001377302.1| PREDICTED: Monodelphis domestica similar ... 150 >> 1e-33 >> ref|XM_001603485.1| PREDICTED: Nasonia vitripennis similar to... 145 >> 5e-32 >> ref|XM_416852.2| PREDICTED: Gallus gallus protein kinase, X-l... 143 >> 2e-31 >> ref|NM_001016403.2| Xenopus (Silurana) tropicalis protein kin... 143 >> 2e-31 >> ref|XM_002009291.1| Drosophila mojavensis GI11297 (Dmoj\GI112... 138 >> 7e-30 >> ref|NM_016979.1| Mus musculus protein kinase, X-linked (Prkx)... 138 >> 7e-30 >> ref|XM_001495664.2| PREDICTED: Equus caballus similar to Seri... 136 >> 2e-29 >> ref|XM_001111571.1| PREDICTED: Macaca mulatta cAMP-dependent ... 136 >> 2e-29 >> ref|XM_001611655.1| Babesia bovis protein kinase domain conta... 134 >> 9e-29 >> ref|NR_028062.1| Homo sapiens protein kinase, Y-linked (PRKY)... 132 >> 3e-28 >> ref|XM_001517795.1| PREDICTED: Ornithorhynchus anatinus simil... 
132 >> 3e-28 >> ref|XM_685338.2| PREDICTED: Danio rerio similar to Serine/thr... 132 >> 3e-28 >> ref|XM_002189865.1| PREDICTED: Taeniopygia guttata protein ki... 131 >> 1e-27 >> ref|XM_001362299.1| PREDICTED: Monodelphis domestica similar ... 131 >> 1e-27 >> ref|NM_001093198.1| Xenopus laevis protein kinase, cAMP-depen... 129 >> 4e-27 >> ref|XM_001461322.1| Paramecium tetraurelia hypothetical prote... 129 >> 4e-27 >> ref|NM_001099869.1| Xenopus laevis cAMP-dependent protein kin... 129 >> 4e-27 >> ref|NM_001003470.1| Danio rerio protein kinase, cAMP-dependen... 129 >> 4e-27 >> ref|XM_001141503.1| PREDICTED: Pan troglodytes verus protein ... 127 >> 1e-26 >> ref|XM_001145269.1| PREDICTED: Pan troglodytes protein kinase... 127 >> 1e-26 >> ref|XM_512434.2| PREDICTED: Pan troglodytes cAMP-dependent pr... 127 >> 1e-26 >> ref|XM_001171457.1| PREDICTED: Pan troglodytes cAMP-dependent... 127 >> 1e-26 >> ref|XM_001171437.1| PREDICTED: Pan troglodytes cAMP-dependent... 127 >> 1e-26 >> ref|XM_847420.1| PREDICTED: Canis familiaris similar to Serin... 127 >> 1e-26 >> ref|NM_207518.1| Homo sapiens protein kinase, cAMP-dependent,... 127 >> 1e-26 >> ref|NM_002730.3| Homo sapiens protein kinase, cAMP-dependent,... 127 >> 1e-26 >> >> >> Thanks in advance. >> >> Roopa. >> >> On Sat, Jan 9, 2010 at 7:46 PM, Mark A. Jensen wrote: >> >>> I understand you. Put in the double quotes and see what happens. >>> >>> ----- Original Message ----- >>> *From:* Roopa Raghuveer >>> *To:* Mark A. Jensen >>> *Sent:* Saturday, January 09, 2010 1:40 PM >>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >>> >>> Hi Mark, >>> >>> Thanks for your reply. It was working when I specifically use the name of >>> the organism as Trypanosoma brucei in the code,but my idea is to introduce a >>> $organ which takes the organism given by the user i.e., let it be anything >>> >>> Pseudomonas, Drosophila, Trypanosoma, Leishmania etc., I should get the >>> sequences related to only those organisms. 
>>> >>> i.e., If the user enters Pseudomonas,the $organ parameter of the code >>> takes Pseudomonas ,does BLAST and returns only those sequences that produce >>> significant alignment with Pseudomonas(only).But this is not happening like >>> that . >>> >>> Please help me in this regard. >>> >>> Thanks in advance >>> Roopa >>> >>> On Sat, Jan 9, 2010 at 7:29 PM, Mark A. Jensen wrote: >>> >>>> Hi Roopa-- You may get what you want if you make the change. >>>> With single quotes, ENTREZ_QUERY is set to the literal string >>>> >>>> $organ[ORGN] >>>> >>>> while, with double quotes, the variable value will be substituted, >>>> and the parameter should be set to >>>> >>>> Trypanosoma brucei[ORGN] >>>> >>>> I'm guess that it worked because the database ignored the strange >>>> parameter, >>>> and returned all the matches. Try this and if it doesn't work I look >>>> harder. >>>> cheers, >>>> Mark >>>> >>>> ----- Original Message ----- >>>> *From:* Roopa Raghuveer >>>> *To:* Mark A. Jensen >>>> *Sent:* Saturday, January 09, 2010 1:24 PM >>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl >>>> >>>> hello Mark, >>>> >>>> Thanks for your reply.It was working without enclosing $organ[ORGN] in >>>> double quotations,but. I would like to have only those specific sequences >>>> which are specific for my Organism i.e., I need sequences only from the >>>> organism that I entered. >>>> >>>> When the organism is Trypanosoma brucei,I could get even Leishmania and >>>> other species as the similar sequences. But I want to get only trypanosoma >>>> brucei sequences. >>>> >>>> Could you please help me out in this regard? >>>> >>>> Roopa. >>>> >>>> My output >>>> >>>> I/P organism: Trypanosoma brucei >>>> >>>> O/P:- >>>> ref|XM_822292.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 1813 0.0 >>>> ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 1622 0.0 >>>> ref|XM_816530.1| Trypanosoma cruzi strain CL Brener protein k... 
>>>> 773 0.0 >>>> ref|XM_816527.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 749 0.0 >>>> ref|XM_838414.1| Leishmania major strain Friedlin protein kin... >>>> 551 3e-154 >>>> ref|XM_838409.1| Leishmania major strain Friedlin protein kin... >>>> 551 3e-154 >>>> ref|XM_001568451.1| Leishmania braziliensis MHOM/BR/75/M2904 ... >>>> 542 2e-151 >>>> ref|XM_001469171.1| Leishmania infantum protein kinase A cata... >>>> 538 2e-150 >>>> ref|XM_001469166.1| Leishmania infantum protein kinase A cata... >>>> 538 2e-150 >>>> ref|XM_001682462.1| Leishmania major protein kinase A catalyt... >>>> 196 3e-47 >>>> ref|XM_804361.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 190 1e-45 >>>> ref|XM_002065851.1| Drosophila willistoni GK20594 (Dwil\GK205... >>>> 181 7e-43 >>>> ref|XM_822694.1| Trypanosoma brucei TREU927 protein kinase A ... >>>> 179 2e-42 >>>> ref|XM_001563990.1| Leishmania braziliensis MHOM/BR/75/M2904 ... >>>> 178 8e-42 >>>> ref|XM_814844.1| Trypanosoma cruzi strain CL Brener protein k... >>>> 170 1e-39 >>>> ref|XM_001763039.1| Physcomitrella patens subsp. patens predi... >>>> 168 4e-39 >>>> ref|XM_001464886.1| Leishmania infantum JPCM5 protein kinase ... >>>> 167 1e-38 >>>> ref|XM_001377302.1| PREDICTED: Monodelphis domestica similar ... >>>> 150 1e-33 >>>> ref|XM_001603485.1| PREDICTED: Nasonia vitripennis similar to... >>>> 145 5e-32 >>>> ref|XM_416852.2| PREDICTED: Gallus gallus protein kinase, X-l... >>>> 143 2e-31 >>>> ref|NM_001016403.2| Xenopus (Silurana) tropicalis protein kin... >>>> 143 2e-31 >>>> ref|XM_002009291.1| Drosophila mojavensis GI11297 (Dmoj\GI112... >>>> 138 7e-30 >>>> ref|NM_016979.1| Mus musculus protein kinase, X-linked (Prkx)... >>>> 138 7e-30 >>>> ref|XM_001495664.2| PREDICTED: Equus caballus similar to Seri... >>>> 136 2e-29 >>>> ref|XM_001111571.1| PREDICTED: Macaca mulatta cAMP-dependent ... >>>> 136 2e-29 >>>> ref|XM_001611655.1| Babesia bovis protein kinase domain conta... 
>>>> 134 9e-29 >>>> ref|NR_028062.1| Homo sapiens protein kinase, Y-linked (PRKY)... >>>> 132 3e-28 >>>> ref|XM_001517795.1| PREDICTED: Ornithorhynchus anatinus simil... >>>> 132 3e-28 >>>> ref|XM_685338.2| PREDICTED: Danio rerio similar to Serine/thr... >>>> 132 3e-28 >>>> ref|XM_002189865.1| PREDICTED: Taeniopygia guttata protein ki... >>>> 131 1e-27 >>>> ref|XM_001362299.1| PREDICTED: Monodelphis domestica similar ... >>>> 131 1e-27 >>>> ref|NM_001093198.1| Xenopus laevis protein kinase, cAMP-depen... >>>> 129 4e-27 >>>> ref|XM_001461322.1| Paramecium tetraurelia hypothetical prote... >>>> 129 4e-27 >>>> ref|NM_001099869.1| Xenopus laevis cAMP-dependent protein kin... >>>> 129 4e-27 >>>> >>>> Roopa. >>>> >>>> On Sat, Jan 9, 2010 at 7:05 PM, Mark A. Jensen wrote: >>>> >>>>> I see it immediately (from making same bug many times) : >>>>> >>>>> >>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY >>>>> => >>>>> - '$organ[ORGN]'); >>>>> +"$organ[ORGN]"); >>>>> >>>>> >>>>> MAJ >>>>> >>>>> ----- Original Message ----- From: "Roopa Raghuveer" < >>>>> rtbio.2009 at gmail.com> >>>>> To: "Mark A. Jensen" >>>>> Cc: >>>>> Sent: Saturday, January 09, 2010 11:57 AM >>>>> Subject: Re: [Bioperl-l] Regarding blast in Bioperl >>>>> >>>>> >>>>> >>>>> Hello all, >>>>>> >>>>>> Thanks alot for your reply Mark. It was working for Trypanosoma brucei >>>>>> as >>>>>> the organism parameter,but when I tried to use the Organism parameter >>>>>> from >>>>>> the user,it was not working i.e., I was unable to get the target >>>>>> sequences. >>>>>> Please help me in this regard. 
My code is >>>>>> >>>>>> #!/usr/bin/perl >>>>>> >>>>>> #path for extra camel module >>>>>> use lib "/srv/www/htdocs/rain/RNAi/"; >>>>>> use Roopablast; >>>>>> >>>>>> >>>>>> use Bio::SearchIO; >>>>>> use Bio::Search::Result::BlastResult; >>>>>> use Bio::Perl; >>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>> use Bio::Seq; >>>>>> use Bio::SeqIO; >>>>>> use Bio::DB::GenBank; >>>>>> >>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi"; >>>>>> $serverurl = "http://141.84.66.66/rain/RNAi"; >>>>>> $outfile = $serverpath."/rnairesult_".time().".html"; >>>>>> $nuc = $serverpath."/nuc".time().".txt"; >>>>>> $debugfile = $serverpath."/debug_".time().".txt"; >>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>> >>>>>> my $outstring =""; >>>>>> >>>>>> &parse_form; >>>>>> >>>>>> print "Content-type: text/html\n\n"; >>>>>> print "\n"; >>>>>> print "RNAi Result"; >>>>>> print ">>>>> URL=$serverurl/rnairesult_".time().".html\"> \n"; >>>>>> print "\n"; >>>>>> print "\n"; >>>>>> print " Your results will appear >>>>> href=$serverurl/rnairesult_".time().".html>here
"; >>>>>> print " Please be patient, runtime can be up to 5 minutes
"; >>>>>> print " This page will automatically reload in 30 seconds. Roopa"; >>>>>> print "\n"; >>>>>> print "\n"; >>>>>> >>>>>> defined(my $pid = fork) or die "Can't fork: $!"; >>>>>> exit if $pid; >>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >>>>>> >>>>>> open(OUTFILE, '>',$outfile); >>>>>> >>>>>> print OUTFILE "\n >>>>>> RNAi Result >>>>>> >>>>> URL=$serverurl//rnairesult_".time().".html\"> \n >>>>>> >>>>>> \n >>>>>> \n >>>>>> Your results will appear >>>>> href=$serverurl/rnairesult_".time().".html>here
>>>>>> Please be patient, runtime can be up to 5 minutes wait wait >>>>>> wait......
>>>>>> This page will automatically reload in 30 seconds Roopa
>>>>>> \n >>>>>> \n"; >>>>>> >>>>>> close(OUTFILE); >>>>>> >>>>>> >>>>>> @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); >>>>>> >>>>>> $in{'Inputseq'} =~ s/>.*$//m; >>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim; >>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/; >>>>>> >>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >>>>>> $in{'Threshold'}); >>>>>> >>>>>> >>>>>> sub blastcode >>>>>> { >>>>>> >>>>>> $inpu1= $_[0]; >>>>>> >>>>>> $organ= $_[1]; >>>>>> >>>>>> open(NUC,'>',$nuc); >>>>>> print NUC $inpu1,"\n"; >>>>>> close(NUC); >>>>>> >>>>>> my $prog = 'blastn'; >>>>>> my $db = 'refseq_rna'; >>>>>> my $e_val= '1e-10'; >>>>>> my $organism= $organ; >>>>>> >>>>>> $gb = new Bio::DB::GenBank; >>>>>> >>>>>> my @params = ( '-prog' => $prog, >>>>>> '-data' => $db, >>>>>> '-expect' => $e_val, >>>>>> '-readmethod' => 'SearchIO', >>>>>> '-Organism' => $organism ); >>>>>> >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> print OUTFILE $inpu1; >>>>>> close(OUTFILE); >>>>>> >>>>>> >>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY >>>>>> => >>>>>> '$organ[ORGN]'); >>>>>> >>>>>> #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>> >>>>>> #change a paramter >>>>>> >>>>>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>>>>> Brucei[ORGN]'; >>>>>> >>>>>> #change a paramter >>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>> '$input2[ORGN]'; >>>>>> >>>>>> my $v = 1; >>>>>> #$v is just to turn on and off the messages >>>>>> >>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >>>>>> '-organism' => $organ ); >>>>>> >>>>>> >>>>>> while (my $input = $str->next_seq()) >>>>>> { >>>>>> #Blast a sequence against a database: >>>>>> #Alternatively, you could pass in a file with many >>>>>> #sequences rather than loop through sequence one at a time >>>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>>>> #and swap the two lines below for an example of that. 
>>>>>> >>>>>> #open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE $input; >>>>>> #close(OUTFILE); >>>>>> >>>>>> >>>>>> my $r = $factory->submit_blast($input); >>>>>> >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE $r; >>>>>> close(OUTFILE); >>>>>> >>>>>> print STDERR "waiting...." if($v>0); >>>>>> >>>>>> while ( my @rids = $factory->each_rid ) { >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "while entered"; >>>>>> # close(OUTFILE); >>>>>> foreach my $rid ( @rids ) { >>>>>> >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "foreach entered"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>> >>>>>> if( !ref($rc) ) >>>>>> { >>>>>> if( $rc < 0 ) >>>>>> { >>>>>> $factory->remove_rid($rid); >>>>>> } >>>>>> open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "if entered"; >>>>>> close(OUTFILE); >>>>>> print STDERR "." if ( $v > 0 ); >>>>>> sleep 5; >>>>>> } >>>>>> else { >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "else entered"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $result = $rc->next_result(); >>>>>> #save the output >>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>> >>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>> print BLASTDEBUGFILE $result->next_hit(); >>>>>> close(BLASTDEBUGFILE); >>>>>> >>>>>> my $filename = >>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out"; >>>>>> >>>>>> # open(DEBUGFILE,'>',$debugfile); >>>>>> # open(new,'>',$filename); >>>>>> # @arra=; >>>>>> # print DEBUGFILE @arra; >>>>>> # close(DEBUGFILE); >>>>>> # close(new); >>>>>> >>>>>> $factory->save_output($filename); >>>>>> # open(BLASTDEBUGFILE,'>',$debugfile); >>>>>> # print BLASTDEBUGFILE "Hello $rid"; >>>>>> # close(BLASTDEBUGFILE); >>>>>> >>>>>> $factory->remove_rid($rid); >>>>>> >>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>> print BLASTDEBUGFILE $organism; >>>>>> close(BLASTDEBUGFILE); >>>>>> >>>>>> # open(OUTFILE,'>',$outfile); >>>>>> 
# print OUTFILE "Test2 $result->database_name()"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> #$hit = $result->next_hit; >>>>>> #open(new,'>',$debugfile); >>>>>> #print $hit; >>>>>> #close(new); >>>>>> >>>>>> while ( my $hit = $result->next_hit ) { >>>>>> >>>>>> next unless ( $v > 0); >>>>>> >>>>>> # open(OUTFILE,'>',$debugfile); >>>>>> # print OUTFILE "$hit in while hits"; >>>>>> # close(OUTFILE); >>>>>> >>>>>> my $sequ = $gb->get_Seq_by_version($hit->name); >>>>>> my $dna = $sequ->seq(); # get the sequence as a string >>>>>> push(@seqs,$dna); >>>>>> } >>>>>> } >>>>>> } >>>>>> } >>>>>> } >>>>>> >>>>>> #open(OUTFILE,'>',$debugfile); >>>>>> #print OUTFILE $seqs[0]; >>>>>> #close(OUTFILE); >>>>>> >>>>>> return(@seqs); >>>>>> >>>>>> } >>>>>> >>>>>> Regards, >>>>>> Roopa. >>>>>> >>>>>> >>>>>> On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen >>>>>> wrote: >>>>>> >>>>>> Hi Roopa-- >>>>>>> >>>>>>> I got your code to work with the following changes: >>>>>>> >>>>>>> +# the input should be a valid FASTA file... >>>>>>> ... >>>>>>> open(NUC,'>',$nuc); >>>>>>> +print NUC ">seq (need a name line for valid fasta)\n"; >>>>>>> print NUC $inpu1, "\n"; >>>>>>> close(NUC); >>>>>>> ... >>>>>>> >>>>>>> +# you can set these header parms in the call itself... >>>>>>> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, >>>>>>> -ENTREZ_QUERY => >>>>>>> ''Trypanosoma Brucei[ORGN]'); >>>>>>> >>>>>>> #change a paramter >>>>>>> +# commented this out... >>>>>>> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>>> 'Trypanosoma >>>>>>> Brucei[ORGN]'; >>>>>>> >>>>>>> MAJ >>>>>>> ----- Original Message ----- From: "Roopa Raghuveer" < >>>>>>> rtbio.2009 at gmail.com >>>>>>> > >>>>>>> To: >>>>>>> Sent: Friday, January 08, 2010 10:00 AM >>>>>>> Subject: [Bioperl-l] Regarding blast in Bioperl >>>>>>> >>>>>>> >>>>>>> Hello all, >>>>>>> >>>>>>>> >>>>>>>> I was trying Remote blast using Bioperl. 
My input data is a >>>>>>>> Trypanosoma >>>>>>>> brucei sequence in Fasta format. When I was trying to submit to >>>>>>>> BLAST >>>>>>>> using >>>>>>>> the step >>>>>>>> $r=$factory->submit_blast($input) >>>>>>>> It was not returning anything which I checked by debugging the code. >>>>>>>> It is >>>>>>>> not blasting my input sequence even though I mentioned all the >>>>>>>> parameters.I >>>>>>>> would paste the code below. >>>>>>>> >>>>>>>> Please help me in solving put this problem. It is very urgent. >>>>>>>> >>>>>>>> Regards >>>>>>>> Roopa. >>>>>>>> >>>>>>>> #!/usr/bin/perl >>>>>>>> >>>>>>>> #path for extra camel module >>>>>>>> use lib "/srv/www/htdocs/rain/RNAi/"; >>>>>>>> use Roopablast; >>>>>>>> >>>>>>>> >>>>>>>> use Bio::SearchIO; >>>>>>>> use Bio::Search::Result::BlastResult; >>>>>>>> use Bio::Perl; >>>>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>>>> use Bio::Seq; >>>>>>>> use Bio::SeqIO; >>>>>>>> use Bio::DB::GenBank; >>>>>>>> >>>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi"; >>>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi"; >>>>>>>> $outfile = $serverpath."/rnairesult_".time().".html"; >>>>>>>> $nuc = $serverpath."/nuc".time().".txt"; >>>>>>>> $debugfile = $serverpath."/debug_".time().".txt"; >>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>>>> >>>>>>>> my $outstring =""; >>>>>>>> >>>>>>>> &parse_form; >>>>>>>> >>>>>>>> print "Content-type: text/html\n\n"; >>>>>>>> print "\n"; >>>>>>>> print "RNAi Result"; >>>>>>>> print ">>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n"; >>>>>>>> print "\n"; >>>>>>>> print "\n"; >>>>>>>> print " Your results will appear >>>>>>> href=$serverurl/rnairesult_".time().".html>here
"; >>>>>>>> print " Please be patient, runtime can be up to 5 minutes
"; >>>>>>>> print " This page will automatically reload in 30 seconds. Roopa"; >>>>>>>> print "\n"; >>>>>>>> print "\n"; >>>>>>>> >>>>>>>> defined(my $pid = fork) or die "Can't fork: $!"; >>>>>>>> exit if $pid; >>>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >>>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >>>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> open(OUTFILE, '>',$outfile); >>>>>>>> >>>>>>>> print OUTFILE "\n >>>>>>>> RNAi Result >>>>>>>> >>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n >>>>>>>> >>>>>>>> \n >>>>>>>> \n >>>>>>>> Your results will appear >>>>>>> href=$serverurl/rnairesult_".time().".html>here
>>>>>>>> Please be patient, runtime can be up to 5 minutes wait wait >>>>>>>> wait......
>>>>>>>> This page will automatically reload in 30 seconds Roopa
>>>>>>>> \n >>>>>>>> \n"; >>>>>>>> >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> @compseqs = blastcode($in{'Inputseq'}); >>>>>>>> >>>>>>>> $in{'Inputseq'} =~ s/>.*$//m; >>>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim; >>>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/; >>>>>>>> >>>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >>>>>>>> $in{'Threshold'}); >>>>>>>> >>>>>>>> >>>>>>>> sub blastcode >>>>>>>> { >>>>>>>> >>>>>>>> $inpu1= $_[0]; >>>>>>>> >>>>>>>> #$organ= $_[1]; >>>>>>>> >>>>>>>> open(NUC,'>',$nuc); >>>>>>>> print NUC $inpu1; >>>>>>>> close(NUC); >>>>>>>> >>>>>>>> my $prog = 'blastn'; >>>>>>>> my $db = 'refseq_rna'; >>>>>>>> my $e_val= '1e-10'; >>>>>>>> my $organism= 'Trypanosoma Brucei'; >>>>>>>> >>>>>>>> $gb = new Bio::DB::GenBank; >>>>>>>> >>>>>>>> my @params = ( '-prog' => $prog, >>>>>>>> '-data' => $db, >>>>>>>> '-expect' => $e_val, >>>>>>>> '-readmethod' => 'SearchIO', >>>>>>>> '-Organism' => $organism ); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$debugfile); >>>>>>>> # print OUTFILE @params; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>>> >>>>>>>> #change a paramter >>>>>>>> >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>>>>>>> Brucei[ORGN]'; >>>>>>>> >>>>>>>> #change a paramter >>>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>>>>>>> '$input2[ORGN]'; >>>>>>>> >>>>>>>> my $v = 1; >>>>>>>> #$v is just to turn on and off the messages >>>>>>>> >>>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >>>>>>>> '-organism' => 'Trypanosoma Brucei' ); >>>>>>>> >>>>>>>> >>>>>>>> while (my $input = $str->next_seq()) >>>>>>>> { >>>>>>>> #Blast a sequence against a database: >>>>>>>> #Alternatively, you could pass in a file with many >>>>>>>> #sequences rather than loop through sequence one at a time >>>>>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>>>>>> #and swap the two lines below 
for an example of that. >>>>>>>> >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE $input; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> my $r = $factory->submit_blast($input); #The program stops here >>>>>>>> it >>>>>>>> does not return any value and it does not enter the While >>>>>>>> loop,Please help >>>>>>>> me in this regard.# >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE $r; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> >>>>>>>> print STDERR "waiting...." if($v>0); >>>>>>>> >>>>>>>> while ( my @rids = $factory->each_rid ) { >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "while entered"; >>>>>>>> close(OUTFILE); >>>>>>>> foreach my $rid ( @rids ) { >>>>>>>> >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "foreach entered"; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>>>> >>>>>>>> if( !ref($rc) ) >>>>>>>> { >>>>>>>> if( $rc < 0 ) >>>>>>>> { >>>>>>>> $factory->remove_rid($rid); >>>>>>>> } >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "if entered"; >>>>>>>> close(OUTFILE); >>>>>>>> print STDERR "." 
if ( $v > 0 ); >>>>>>>> sleep 5; >>>>>>>> } >>>>>>>> else { >>>>>>>> open(OUTFILE,'>',$debugfile); >>>>>>>> print OUTFILE "else entered"; >>>>>>>> close(OUTFILE); >>>>>>>> >>>>>>>> my $result = $rc->next_result(); >>>>>>>> #save the output >>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>>>>>> >>>>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>>>> print BLASTDEBUGFILE $result->next_hit(); >>>>>>>> close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> my $filename = >>>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out"; >>>>>>>> >>>>>>>> # open(DEBUGFILE,'>',$debugfile); >>>>>>>> # open(new,'>',$filename); >>>>>>>> # @arra=; >>>>>>>> # print DEBUGFILE @arra; >>>>>>>> # close(DEBUGFILE); >>>>>>>> # close(new); >>>>>>>> >>>>>>>> $factory->save_output($filename); >>>>>>>> >>>>>>>> # open(BLASTDEBUGFILE,'>',$debugfile); >>>>>>>> # print BLASTDEBUGFILE "Hello $rid"; >>>>>>>> # close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> $factory->remove_rid($rid); >>>>>>>> >>>>>>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>>>>>> print BLASTDEBUGFILE $organism; >>>>>>>> close(BLASTDEBUGFILE); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$outfile); >>>>>>>> # print OUTFILE "Test2 $result->database_name()"; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> #$hit = $result->next_hit; >>>>>>>> #open(new,'>',$debugfile); >>>>>>>> #print $hit; >>>>>>>> #close(new); >>>>>>>> >>>>>>>> while ( my $hit = $result->next_hit ) { >>>>>>>> >>>>>>>> next unless ( $v > 0); >>>>>>>> >>>>>>>> # open(OUTFILE,'>',$debugfile); >>>>>>>> # print OUTFILE "$hit in while hits"; >>>>>>>> # close(OUTFILE); >>>>>>>> >>>>>>>> my $sequ = $gb->get_Seq_by_version($hit->name); >>>>>>>> my $dna = $sequ->seq(); # get the sequence as a >>>>>>>> string >>>>>>>> push(@seqs,$dna); >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> #open(OUTFILE,'>',$debugfile); >>>>>>>> #print OUTFILE $seqs[0]; >>>>>>>> #close(OUTFILE); >>>>>>>> >>>>>>>> return(@seqs); >>>>>>>> >>>>>>>> } >>>>>>>> 
>>>>>>>> open(OUTFILE, '>',$outfile) || die ; >>>>>>>> >>>>>>>> print OUTFILE "\n >>>>>>>> RNAi Result >>>>>>>> \n >>>>>>>> \n >>>>>>>>

>>>>>>>> Inputsequence:
"; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> print OUTFILE substr ($in{'Inputseq'}, $i, 1); >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> print OUTFILE " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> print OUTFILE "
\n"; >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> print OUTFILE "

"; >>>>>>>> >>>>>>>> $z=@compseqs; >>>>>>>> >>>>>>>> for($k=1;$k<$z;$k++) { >>>>>>>> print OUTFILE ">>>>>>> set\">

Compare >>>>>>>> Sequence:
"; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> print OUTFILE substr ($compseqs[$k], $i, 1); >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> print OUTFILE " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> print OUTFILE "
\n"; >>>>>>>> } >>>>>>>> } >>>>>>>> print OUTFILE "

"; >>>>>>>> } >>>>>>>> >>>>>>>> print OUTFILE "

>>>>>>>> Window:
$in{'Windowsize'} >>>>>>>>

>>>>>>>>

>>>>>>>> Threshold:
$in{'Threshold'} >>>>>>>>

"; >>>>>>>> my $j=0; >>>>>>>> >>>>>>>> for ($i=0; $i>>>>>>> >>>>>>>> if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){ >>>>>>>> if ($out[$i]->{similar}<=$in{'Threshold'}){ >>>>>>>> $j=$in{'Windowsize'}; >>>>>>>> } >>>>>>>> $height=$out[$i]->{similar}*5; >>>>>>>> } >>>>>>>> >>>>>>>> if ($j>0) { >>>>>>>> print OUTFILE ">>>>>>> height=\"5\">"; >>>>>>>> $outstring .= "".substr ($in{'Inputseq'}, >>>>>>>> $i, >>>>>>>> 1).""; >>>>>>>> $j--; >>>>>>>> } >>>>>>>> else { >>>>>>>> print OUTFILE ">>>>>>> height=\"5\">"; >>>>>>>> $outstring .= "".substr ($in{'Inputseq'}, >>>>>>>> $i, >>>>>>>> 1).""; >>>>>>>> } >>>>>>>> >>>>>>>> if ( ($i+1)%10==0){ >>>>>>>> $outstring .= " "; >>>>>>>> } >>>>>>>> if ( ($i+1)%60==0){ >>>>>>>> $outstring .= "
\n"; >>>>>>>> >>>>>>>> } >>>>>>>> if ( ($i+1)%800==0){ >>>>>>>> print OUTFILE "

\n"; >>>>>>>> >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> print OUTFILE "

>>>>>>> set\">$outstring"; >>>>>>>> >>>>>>>> #foreach (@out) { >>>>>>>> #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} >>>>>>>> matchs

"; >>>>>>>> #if ($_->{similar}<=$in{'Threshold'}){ >>>>>>>> >>>>>>>> # } >>>>>>>> #} >>>>>>>> >>>>>>>> print OUTFILE "\n\n"; >>>>>>>> >>>>>>>> close OUTFILE; >>>>>>>> >>>>>>>> #nameprint(); >>>>>>>> >>>>>>>> sub parse_form { >>>>>>>> local ($buffer, @pairs, $pair, $name, $value); >>>>>>>> # Read in text >>>>>>>> $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/; >>>>>>>> if ($ENV{'REQUEST_METHOD'} eq "POST") >>>>>>>> { >>>>>>>> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); >>>>>>>> } >>>>>>>> else >>>>>>>> { >>>>>>>> $buffer = $ENV{'QUERY_STRING'}; >>>>>>>> } >>>>>>>> @pairs = split(/&/, $buffer); >>>>>>>> foreach $pair (@pairs) >>>>>>>> { >>>>>>>> ($name, $value) = split(/=/, $pair); >>>>>>>> $value =~ tr/+/ /; >>>>>>>> $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; >>>>>>>> $in{$name} = $value; >>>>>>>> } >>>>>>>> } >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>> >>> >> > From maj at fortinbras.us Fri Jan 22 12:34:59 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 22 Jan 2010 07:34:59 -0500 Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO In-Reply-To: References: Message-ID: I'm down with that. ----- Original Message ----- From: "Jason Stajich" To: "BioPerl List" Sent: Friday, January 22, 2010 1:17 AM Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO > I'm considering putting in allowable initialization parameter (and get/ > set) for Bio::AlignIO that would allow setting of the alphabet. This > is then passed to Bio::LocatableSeq creation so that _guess_alphabet > isn't called. 
This will allow removal of warnings about empty
> sequences because _guess_alphabet won't be called on a sequence if we
> have explicitly set the alphabet.
>
> This worked great on my local install and tests pass. Any objections
> or concerns?
>
> basically it means when you make an AlignIO you can specify the
> alphabet i.e.
>
> my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna',
> -file => 'genome.fasaln');
>
> I have some alignments with empty sequences and I think turning off
> the warnings is appropriate where I force the alphabet choice. It
> should also have a very modest speedup benefit too.
>
> -jason
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> http://twitter.com/hyphaltip
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From avilella at gmail.com  Fri Jan 22 13:07:26 2010
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 22 Jan 2010 13:07:26 +0000
Subject: [Bioperl-l] Merging fragments in a simplealign
Message-ID: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>

Hi,

I would like to write a script that merges fragments in a
Bio::SimpleAlign object on the basis of some $seq->display_name rule.

I basically want to start with something like this:

seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM
seq2.234 QWERTYU-------------------
seq2.345 ----------ASDFGH----------
seq2.456 -------------------ZXCVBNM

And end with something like this:

seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM
seq2.mrg QWERTYU---ASDFGH---ZXCVBNM

Can people suggest any Bio::SimpleAlign methods that would help here?

Cheers,

Albert.

From maj at fortinbras.us  Fri Jan 22 13:31:54 2010
From: maj at fortinbras.us (Mark A.
Jensen) Date: Fri, 22 Jan 2010 08:31:54 -0500 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com> References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com> Message-ID: Here's one of my favorite tricks for this: XOR mask on gap symbol. MAJ use Bio::SeqIO; use Bio::Seq; use strict; my $seqio = Bio::SeqIO->new( -fh => \*DATA ); my $acc = $seqio->next_seq->seq ^ '-'; while ($_ = $seqio->next_seq ) { $acc ^= ($_->seq ^ '-'); } my $mrg = Bio::Seq->new( -id => 'merged', -seq => $acc ^ '-' ); 1; __END__ >seq2.234 QWERTYU------------------- >seq2.345 ----------ASDFGH---------- >seq2.456 -------------------ZXCVBNM ----- Original Message ----- From: "Albert Vilella" To: Sent: Friday, January 22, 2010 8:07 AM Subject: [Bioperl-l] Merging fragments in a simplealign > Hi, > > I would like to write a script that merges fragments in a Bio::SimpleAlign > object on the basis of > some $seq->display_name rule. > > I basically want to start with something like this: > > seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM > seq2.234 QWERTYU------------------- > seq2.345 ----------ASDFGH---------- > seq2.456 -------------------ZXCVBNM > > And end with something like this: > > seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM > seq2.mrg QWERTYU---ASDFGH---ZXCVBNM > > Can people suggest any Bio::SimpleAlign methods that would help here? > > Cheers, > > Albert. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Fri Jan 22 13:34:07 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 22 Jan 2010 07:34:07 -0600 Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO In-Reply-To: References: Message-ID: <55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu> Sounds good to me. The warnings are a bit too tight on this module anyway. 
I still think we have plans towards refactoring some of this, not sure how far along they are: http://www.bioperl.org/wiki/Align_Refactor chris On Jan 22, 2010, at 12:17 AM, Jason Stajich wrote: > I'm considering putting in allowable initialization parameter (and get/set) for Bio::AlignIO that would allow setting of the alphabet. This is then passed to Bio::LocatableSeq creation so that _guess_alphabet isn't called. This will allow removal of warnings about empty sequences because _guess_alphabet won't be called on a sequence if we have explictly set the alphabet. > > This worked great on my local install and tests pass. Any objections or concerns? > > basically it means when you make an AlignIO you can specify the alphabet i.e. > > my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna', -file => 'genome.fasaln'); > > I have some alignments with empty sequences and I think turning off the warnings is appropriate where I force the alphabet choice. It should also have a very modest speedup benefit too. > > -jason > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > http://twitter.com/hyphaltip > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Jan 22 13:40:57 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 22 Jan 2010 07:40:57 -0600 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com> Message-ID: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> May be something for the cook/scrapbook? chris On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote: > Here's one of my favorite tricks for this: XOR mask on gap symbol. 
> MAJ > > use Bio::SeqIO; > use Bio::Seq; > use strict; > my $seqio = Bio::SeqIO->new( -fh => \*DATA ); > > my $acc = $seqio->next_seq->seq ^ '-'; > while ($_ = $seqio->next_seq ) { > $acc ^= ($_->seq ^ '-'); > } > my $mrg = Bio::Seq->new( -id => 'merged', > -seq => $acc ^ '-' ); > 1; > > > __END__ >> seq2.234 > QWERTYU------------------- >> seq2.345 > ----------ASDFGH---------- >> seq2.456 > -------------------ZXCVBNM > > ----- Original Message ----- From: "Albert Vilella" > To: > Sent: Friday, January 22, 2010 8:07 AM > Subject: [Bioperl-l] Merging fragments in a simplealign > > >> Hi, >> I would like to write a script that merges fragments in a Bio::SimpleAlign >> object on the basis of >> some $seq->display_name rule. >> I basically want to start with something like this: >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >> seq2.234 QWERTYU------------------- >> seq2.345 ----------ASDFGH---------- >> seq2.456 -------------------ZXCVBNM >> And end with something like this: >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM >> Can people suggest any Bio::SimpleAlign methods that would help here? >> Cheers, >> Albert. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From holland at eaglegenomics.com Fri Jan 22 10:51:52 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 22 Jan 2010 10:51:52 +0000 Subject: [Bioperl-l] [BioSQL-l] SwissProt DE lines and UniProt XML / TagTree as XML in BioSQL In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com> References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com> Message-ID: <8FECCBDE-2DE1-40EE-B5A4-73BDAC893E2D@eaglegenomics.com> Nice idea. 
Currently, BioJava just stores the complete section as a string without parsing it, but it provides a parser module for converting it into useful tag/value format within a user's program (but not to be stored in BioSQL). On 21 Jan 2010, at 12:33, Peter wrote: > Hi all, > > This is cross posted to try and ensure relevant people see it. > I suggest we continue the discussion on the BioSQL list > (for how to serialise structured annotation to BioSQL), and/or > the OpenBio list (for things like file format naming conventions). > > I am hoping we (Bio*) can be consistent in how we parse and load > into BioSQL the SwissProt DE lines (known as "swiss" format in > both BioPerl and Biopython's SeqIO, and by EMBOSS) or the > equivalent UniProt XML tags (which we are tentatively going to > call the "uniprot" format in Biopython's SeqIO - comments?). > > Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss") > files and load them into BioSQL. Biopython currently treats the DE > comment lines as a long string, as BioPerl used to: > > http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html > http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html > > I understand that BioPerl now turns the SwissProt DE lines into a > TagTree, and for storing this in BioSQL this gets serialised as XML. > I would like Biopython to handle this the same way (although rather > than a Perl TagTree, we'd use a Python structure of course), and > would appreciate clarification of what exactly was implemented > (e.g. which bit of the BioPerl source code should be look at, > and could you show a worked example?). > > Andrea Pierlenoin (CC'd - not sure if he is on the BioSQL or > Open-Bio lists yet) has started work on parsing UniProt XML > files for Biopython. Here the DE comment lines are already > provided broken up with XML markup. Hopefully their nested > structure matches what BioPerl was doing with the SwissProt > DE lines. 
> > Regards, > > Peter > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From andrea at biocomp.unibo.it Fri Jan 22 12:18:32 2010 From: andrea at biocomp.unibo.it (Andrea Pierleoni) Date: Fri, 22 Jan 2010 13:18:32 +0100 (CET) Subject: [Bioperl-l] SwissProt DE lines and UniProt XML / TagTree as XML in BioSQL In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com> References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com> Message-ID: <2b6e30c4628585042366646a7b46386e.squirrel@lipid.biocomp.unibo.it> I think that the point here can be a little broader, since not only the swissprot DE lines carry complex and structured data. To define a common, language-independent way to store structured data into the comment and *_qualifier_value tables of the actual BioSQL schema could be very useful. XML looks like a good candidate to me, and the UniprotXML format can be used as reference or as a template to start from. Each Bio* project will then parse and report this structured data in its own programming language data structure. Andrea > Hi all, > > This is cross posted to try and ensure relevant people see it. > I suggest we continue the discussion on the BioSQL list > (for how to serialise structured annotation to BioSQL), and/or > the OpenBio list (for things like file format naming conventions). > > I am hoping we (Bio*) can be consistent in how we parse and load > into BioSQL the SwissProt DE lines (known as "swiss" format in > both BioPerl and Biopython's SeqIO, and by EMBOSS) or the > equivalent UniProt XML tags (which we are tentatively going to > call the "uniprot" format in Biopython's SeqIO - comments?). 
> > Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss") > files and load them into BioSQL. Biopython currently treats the DE > comment lines as a long string, as BioPerl used to: > > http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html > http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html > > I understand that BioPerl now turns the SwissProt DE lines into a > TagTree, and for storing this in BioSQL this gets serialised as XML. > I would like Biopython to handle this the same way (although rather > than a Perl TagTree, we'd use a Python structure of course), and > would appreciate clarification of what exactly was implemented > (e.g. which bit of the BioPerl source code should be look at, > and could you show a worked example?). > > Andrea Pierlenoin (CC'd - not sure if he is on the BioSQL or > Open-Bio lists yet) has started work on parsing UniProt XML > files for Biopython. Here the DE comment lines are already > provided broken up with XML markup. Hopefully their nested > structure matches what BioPerl was doing with the SwissProt > DE lines. > > Regards, > > Peter > From avilella at gmail.com Fri Jan 22 16:04:13 2010 From: avilella at gmail.com (Albert Vilella) Date: Fri, 22 Jan 2010 16:04:13 +0000 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com> <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> Message-ID: <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com> Is there/should be a 'have_pairwise_overlap' method similar to this? # $seq1 and $seq3 have matching ids my $seq1 = $aln->each_seq_by_id($seq1->display_id); my $seq3 = $aln->each_seq_by_id($seq3->display_id); my $ret = $aln->have_pairwise_overlap($seq1,$seq3); On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields wrote: > May be something for the cook/scrapbook? > > chris > > On Jan 22, 2010, at 7:31 AM, Mark A. 
Jensen wrote: > > > Here's one of my favorite tricks for this: XOR mask on gap symbol. > > MAJ > > > > use Bio::SeqIO; > > use Bio::Seq; > > use strict; > > my $seqio = Bio::SeqIO->new( -fh => \*DATA ); > > > > my $acc = $seqio->next_seq->seq ^ '-'; > > while ($_ = $seqio->next_seq ) { > > $acc ^= ($_->seq ^ '-'); > > } > > my $mrg = Bio::Seq->new( -id => 'merged', > > -seq => $acc ^ '-' ); > > 1; > > > > > > __END__ > >> seq2.234 > > QWERTYU------------------- > >> seq2.345 > > ----------ASDFGH---------- > >> seq2.456 > > -------------------ZXCVBNM > > > > ----- Original Message ----- From: "Albert Vilella" > > To: > > Sent: Friday, January 22, 2010 8:07 AM > > Subject: [Bioperl-l] Merging fragments in a simplealign > > > > > >> Hi, > >> I would like to write a script that merges fragments in a > Bio::SimpleAlign > >> object on the basis of > >> some $seq->display_name rule. > >> I basically want to start with something like this: > >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM > >> seq2.234 QWERTYU------------------- > >> seq2.345 ----------ASDFGH---------- > >> seq2.456 -------------------ZXCVBNM > >> And end with something like this: > >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM > >> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM > >> Can people suggest any Bio::SimpleAlign methods that would help here? > >> Cheers, > >> Albert. > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Fri Jan 22 16:02:55 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Fri, 22 Jan 2010 11:02:55 -0500 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com> <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> Message-ID: http://www.bioperl.org/wiki/Merge_gapped_sequences_across_a_common_region ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "Albert Vilella" ; Sent: Friday, January 22, 2010 8:40 AM Subject: Re: [Bioperl-l] Merging fragments in a simplealign > May be something for the cook/scrapbook? > > chris > > On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote: > >> Here's one of my favorite tricks for this: XOR mask on gap symbol. >> MAJ >> >> use Bio::SeqIO; >> use Bio::Seq; >> use strict; >> my $seqio = Bio::SeqIO->new( -fh => \*DATA ); >> >> my $acc = $seqio->next_seq->seq ^ '-'; >> while ($_ = $seqio->next_seq ) { >> $acc ^= ($_->seq ^ '-'); >> } >> my $mrg = Bio::Seq->new( -id => 'merged', >> -seq => $acc ^ '-' ); >> 1; >> >> >> __END__ >>> seq2.234 >> QWERTYU------------------- >>> seq2.345 >> ----------ASDFGH---------- >>> seq2.456 >> -------------------ZXCVBNM >> >> ----- Original Message ----- From: "Albert Vilella" >> To: >> Sent: Friday, January 22, 2010 8:07 AM >> Subject: [Bioperl-l] Merging fragments in a simplealign >> >> >>> Hi, >>> I would like to write a script that merges fragments in a Bio::SimpleAlign >>> object on the basis of >>> some $seq->display_name rule. >>> I basically want to start with something like this: >>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>> seq2.234 QWERTYU------------------- >>> seq2.345 ----------ASDFGH---------- >>> seq2.456 -------------------ZXCVBNM >>> And end with something like this: >>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM >>> Can people suggest any Bio::SimpleAlign methods that would help here? >>> Cheers, >>> Albert. 
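A minimal sketch of the XOR-mask merge quoted in this thread, written against plain strings so that it runs without BioPerl (with real objects the strings would come from $seq->seq). Two things here are assumptions rather than anything posted to the list. First, the mask is built as the gap character repeated to the full sequence length: Perl's string XOR pads the shorter operand with zero bits, so XORing with a single '-' only touches the first column, and the quoted version appears to give the right answer only because it merges an odd number of fragments. Second, the helper names have_pairwise_overlap and merge_fragments are hypothetical; they are not Bio::SimpleAlign methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): true if two equal-length
# gapped strings both carry a residue in at least one column.
sub have_pairwise_overlap {
    my ($s1, $s2, $gap) = @_;
    $gap = '-' unless defined $gap;
    die "sequences differ in length\n" unless length($s1) == length($s2);
    for my $i (0 .. length($s1) - 1) {
        return 1
            if substr($s1, $i, 1) ne $gap && substr($s2, $i, 1) ne $gap;
    }
    return 0;
}

# Hypothetical helper: merge non-overlapping aligned fragments by XOR.
# Masking with the gap symbol turns every gap into "\0", so XORing the
# masked fragments keeps the single residue present in each column.
sub merge_fragments {
    my ($gap, @frags) = @_;
    die "no fragments given\n" unless @frags;
    my $len  = length $frags[0];
    my $mask = $gap x $len;      # full-length mask, not a single character
    my $acc  = "\0" x $len;      # XOR identity
    for my $i (0 .. $#frags) {
        die "fragments differ in length\n" unless length($frags[$i]) == $len;
        # XOR would silently corrupt columns covered by two fragments,
        # so refuse to merge overlapping input.
        for my $j ($i + 1 .. $#frags) {
            die "fragments $i and $j overlap\n"
                if have_pairwise_overlap($frags[$i], $frags[$j], $gap);
        }
        $acc ^= ($frags[$i] ^ $mask);
    }
    return $acc ^ $mask;         # unmask: "\0" becomes the gap symbol again
}

my @frags = (
    'QWERTYU' . ('-' x 19),
    ('-' x 10) . 'ASDFGH' . ('-' x 10),
    ('-' x 19) . 'ZXCVBNM',
);
print merge_fragments('-', @frags), "\n";   # QWERTYU---ASDFGH---ZXCVBNM
```

The overlap check is quadratic in the number of fragments, which should be fine for a handful of fragments per display name; for large sets a single pass that counts residues per column would likely be a better choice.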
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From avilella at gmail.com Fri Jan 22 17:50:57 2010 From: avilella at gmail.com (Albert Vilella) Date: Fri, 22 Jan 2010 17:50:57 +0000 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com> References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com> <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com> Message-ID: <358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com> Or to rephrase my answer, what is the closest way for the code below that already exists? On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella wrote: > Is there/should be a 'have_pairwise_overlap' method similar to this? > > # $seq1 and $seq3 have matching ids > my $seq1 = $aln->each_seq_by_id($seq1->display_id); > my $seq3 = $aln->each_seq_by_id($seq3->display_id); > > my $ret = $aln->have_pairwise_overlap($seq1,$seq3); > > > On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields wrote: > >> May be something for the cook/scrapbook? >> >> chris >> >> On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote: >> >> > Here's one of my favorite tricks for this: XOR mask on gap symbol. 
>> > MAJ >> > >> > use Bio::SeqIO; >> > use Bio::Seq; >> > use strict; >> > my $seqio = Bio::SeqIO->new( -fh => \*DATA ); >> > >> > my $acc = $seqio->next_seq->seq ^ '-'; >> > while ($_ = $seqio->next_seq ) { >> > $acc ^= ($_->seq ^ '-'); >> > } >> > my $mrg = Bio::Seq->new( -id => 'merged', >> > -seq => $acc ^ '-' ); >> > 1; >> > >> > >> > __END__ >> >> seq2.234 >> > QWERTYU------------------- >> >> seq2.345 >> > ----------ASDFGH---------- >> >> seq2.456 >> > -------------------ZXCVBNM >> > >> > ----- Original Message ----- From: "Albert Vilella" > > >> > To: >> > Sent: Friday, January 22, 2010 8:07 AM >> > Subject: [Bioperl-l] Merging fragments in a simplealign >> > >> > >> >> Hi, >> >> I would like to write a script that merges fragments in a >> Bio::SimpleAlign >> >> object on the basis of >> >> some $seq->display_name rule. >> >> I basically want to start with something like this: >> >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >> >> seq2.234 QWERTYU------------------- >> >> seq2.345 ----------ASDFGH---------- >> >> seq2.456 -------------------ZXCVBNM >> >> And end with something like this: >> >> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >> >> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM >> >> Can people suggest any Bio::SimpleAlign methods that would help here? >> >> Cheers, >> >> Albert. >> >> _______________________________________________ >> >> Bioperl-l mailing list >> >> Bioperl-l at lists.open-bio.org >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > From jay at jays.net Fri Jan 22 18:30:57 2010 From: jay at jays.net (Jay Hannah) Date: Fri, 22 Jan 2010 12:30:57 -0600 Subject: [Bioperl-l] Bio::BroodComb - RFC In-Reply-To: References: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net> Message-ID: On Jan 21, 2010, at 10:31 PM, Chris Fields wrote: > Did you want to release it to CPAN? 
I'll take a closer look at the docs to get an idea of what you are doing with it, but from my perspective I can see this becoming a nice general use tool akin to Bio::Perl, maybe a bit more lightweight. Yes, I was thinking I would. No one has (yet) told me it's the worst idea ever, so I'm feeling encouraged. :) Given smallish inputs / databases (up to a few million rows) where some lightweight schema + SQLite + BioPerl can get the job done, it's nice to have a little easy-to-run toolbox. New tables and Roles bolt on easily, so I'll be adding them as they surface at $work[1]. Thanks for your interest. :) Jay Hannah http://github.com/jhannah/bio-broodcomb http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From dalalhina at gmail.com Fri Jan 22 17:31:09 2010 From: dalalhina at gmail.com (hina dalal) Date: Fri, 22 Jan 2010 17:31:09 +0000 Subject: [Bioperl-l] Bioperl installation failed Message-ID: <425f75df1001220931t49f5c768j97d91d2dd1757f19@mail.gmail.com> Hi I have installed PERL from Activesate and now trying to install bioperl but can not do it . Neither from PPM (it is showing error "Ppm install failed: 404 not found") nor from CPAN / manual installation. It is not allowing me to download nmake, showing that "the version of this file is not compatible with the version of windows you are running. Check your computer system information to see whether you need 32 bit or 64 bit of this program." I am using windows VISTA. Please help. Regards Hina From H.Dalal at sms.ed.ac.uk Fri Jan 22 17:34:55 2010 From: H.Dalal at sms.ed.ac.uk (Hina Dalal) Date: Fri, 22 Jan 2010 17:34:55 +0000 Subject: [Bioperl-l] BioPerl installation failed: please help Message-ID: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk> Hi I have installed PERL from Activesate and now trying to install bioperl but can not do it . Neither from PPM (it is showing error "Ppm install failed: 404 not found") nor from CPAN manual installation. 
It is not allowing me to download nmake, showing that "the version of this file is not compatible with the version of windows you are running. Check your computer system information to see whether you need 32 bit or 64 bit of this program." Please help. Regards Hina -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From jason at bioperl.org Fri Jan 22 19:18:30 2010 From: jason at bioperl.org (Jason Stajich) Date: Fri, 22 Jan 2010 11:18:30 -0800 Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO In-Reply-To: <55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu> References: <55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu> Message-ID: <59EC9331-FB2F-4338-AD58-2D501A528A18@bioperl.org> Done, as of r16739. Look forward to the refactor work too. -jason On Jan 22, 2010, at 5:34 AM, Chris Fields wrote: > Sounds good to me. The warnings are a bit too tight on this module > anyway. > > I still think we have plans towards refactoring some of this, not > sure how far along they are: > > http://www.bioperl.org/wiki/Align_Refactor > > chris > > On Jan 22, 2010, at 12:17 AM, Jason Stajich wrote: > >> I'm considering putting in allowable initialization parameter (and >> get/set) for Bio::AlignIO that would allow setting of the >> alphabet. This is then passed to Bio::LocatableSeq creation so >> that _guess_alphabet isn't called. This will allow removal of >> warnings about empty sequences because _guess_alphabet won't be >> called on a sequence if we have explictly set the alphabet. >> >> This worked great on my local install and tests pass. Any >> objections or concerns? >> >> basically it means when you make an AlignIO you can specify the >> alphabet i.e. >> >> my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna', >> -file => 'genome.fasaln'); >> >> I have some alignments with empty sequences and I think turning off >> the warnings is appropriate where I force the alphabet choice. 
It >> should also have a very modest speedup benefit too. >> >> -jason >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> http://fungalgenomes.org/ >> http://twitter.com/hyphaltip >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ http://twitter.com/hyphaltip From cjfields at illinois.edu Fri Jan 22 19:22:43 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 22 Jan 2010 13:22:43 -0600 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: <358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com> References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com> <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu> <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com> <358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com> Message-ID: <14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu> This could exist, but should go into a general Utilities module. Part of the Align refactoring was to pull a good number of the methods into a general utilities module, so this would fit into that category. chris On Jan 22, 2010, at 11:50 AM, Albert Vilella wrote: > Or to rephrase my answer, what is the closest way for the code below that > already exists? > > On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella wrote: > >> Is there/should be a 'have_pairwise_overlap' method similar to this? >> >> # $seq1 and $seq3 have matching ids >> my $seq1 = $aln->each_seq_by_id($seq1->display_id); >> my $seq3 = $aln->each_seq_by_id($seq3->display_id); >> >> my $ret = $aln->have_pairwise_overlap($seq1,$seq3); >> >> >> On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields wrote: >> >>> May be something for the cook/scrapbook? >>> >>> chris >>> >>> On Jan 22, 2010, at 7:31 AM, Mark A. 
Jensen wrote: >>> >>>> Here's one of my favorite tricks for this: XOR mask on gap symbol. >>>> MAJ >>>> >>>> use Bio::SeqIO; >>>> use Bio::Seq; >>>> use strict; >>>> my $seqio = Bio::SeqIO->new( -fh => \*DATA ); >>>> >>>> my $acc = $seqio->next_seq->seq ^ '-'; >>>> while ($_ = $seqio->next_seq ) { >>>> $acc ^= ($_->seq ^ '-'); >>>> } >>>> my $mrg = Bio::Seq->new( -id => 'merged', >>>> -seq => $acc ^ '-' ); >>>> 1; >>>> >>>> >>>> __END__ >>>>> seq2.234 >>>> QWERTYU------------------- >>>>> seq2.345 >>>> ----------ASDFGH---------- >>>>> seq2.456 >>>> -------------------ZXCVBNM >>>> >>>> ----- Original Message ----- From: "Albert Vilella" >>> >>>> To: >>>> Sent: Friday, January 22, 2010 8:07 AM >>>> Subject: [Bioperl-l] Merging fragments in a simplealign >>>> >>>> >>>>> Hi, >>>>> I would like to write a script that merges fragments in a >>> Bio::SimpleAlign >>>>> object on the basis of >>>>> some $seq->display_name rule. >>>>> I basically want to start with something like this: >>>>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>>>> seq2.234 QWERTYU------------------- >>>>> seq2.345 ----------ASDFGH---------- >>>>> seq2.456 -------------------ZXCVBNM >>>>> And end with something like this: >>>>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>>>> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM >>>>> Can people suggest any Bio::SimpleAlign methods that would help here? >>>>> Cheers, >>>>> Albert. 
>>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Fri Jan 22 19:29:07 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 22 Jan 2010 14:29:07 -0500 Subject: [Bioperl-l] Merging fragments in a simplealign In-Reply-To: <14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu> References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com><058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu><358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com><358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com> <14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu> Message-ID: <0F7B7E5FE70D4C5CB34B27045561823C@NewLife> I'd recommend making an enhancement request via Bugzilla, so we don't forget- MAJ ----- Original Message ----- From: "Chris Fields" To: "Albert Vilella" Cc: "bioperl-l" Sent: Friday, January 22, 2010 2:22 PM Subject: Re: [Bioperl-l] Merging fragments in a simplealign > This could exist, but should go into a general Utilities module. Part of the > Align refactoring was to pull a good number of the methods into a general > utilities module, so this would fit into that category. > > chris > > On Jan 22, 2010, at 11:50 AM, Albert Vilella wrote: > >> Or to rephrase my answer, what is the closest way for the code below that >> already exists? >> >> On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella wrote: >> >>> Is there/should be a 'have_pairwise_overlap' method similar to this? 
>>> >>> # $seq1 and $seq3 have matching ids >>> my $seq1 = $aln->each_seq_by_id($seq1->display_id); >>> my $seq3 = $aln->each_seq_by_id($seq3->display_id); >>> >>> my $ret = $aln->have_pairwise_overlap($seq1,$seq3); >>> >>> >>> On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields wrote: >>> >>>> May be something for the cook/scrapbook? >>>> >>>> chris >>>> >>>> On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote: >>>> >>>>> Here's one of my favorite tricks for this: XOR mask on gap symbol. >>>>> MAJ >>>>> >>>>> use Bio::SeqIO; >>>>> use Bio::Seq; >>>>> use strict; >>>>> my $seqio = Bio::SeqIO->new( -fh => \*DATA ); >>>>> >>>>> my $acc = $seqio->next_seq->seq ^ '-'; >>>>> while ($_ = $seqio->next_seq ) { >>>>> $acc ^= ($_->seq ^ '-'); >>>>> } >>>>> my $mrg = Bio::Seq->new( -id => 'merged', >>>>> -seq => $acc ^ '-' ); >>>>> 1; >>>>> >>>>> >>>>> __END__ >>>>>> seq2.234 >>>>> QWERTYU------------------- >>>>>> seq2.345 >>>>> ----------ASDFGH---------- >>>>>> seq2.456 >>>>> -------------------ZXCVBNM >>>>> >>>>> ----- Original Message ----- From: "Albert Vilella" >>>> >>>>> To: >>>>> Sent: Friday, January 22, 2010 8:07 AM >>>>> Subject: [Bioperl-l] Merging fragments in a simplealign >>>>> >>>>> >>>>>> Hi, >>>>>> I would like to write a script that merges fragments in a >>>> Bio::SimpleAlign >>>>>> object on the basis of >>>>>> some $seq->display_name rule. >>>>>> I basically want to start with something like this: >>>>>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>>>>> seq2.234 QWERTYU------------------- >>>>>> seq2.345 ----------ASDFGH---------- >>>>>> seq2.456 -------------------ZXCVBNM >>>>>> And end with something like this: >>>>>> seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM >>>>>> seq2.mrg QWERTYU---ASDFGH---ZXCVBNM >>>>>> Can people suggest any Bio::SimpleAlign methods that would help here? >>>>>> Cheers, >>>>>> Albert. 
>>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Fri Jan 22 19:33:41 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 22 Jan 2010 14:33:41 -0500 Subject: [Bioperl-l] BioPerl installation failed: please help In-Reply-To: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk> References: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk> Message-ID: <2ABEC492CD49450EAE4BFC7BA763E3DB@NewLife> Hina-- See the protocol at http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation for ActiveState installation. If it doesn't work, please let us know at which step the failure happened. cheers, MAJ ----- Original Message ----- From: "Hina Dalal" To: Sent: Friday, January 22, 2010 12:34 PM Subject: [Bioperl-l] BioPerl installation failed: please help Hi I have installed PERL from Activesate and now trying to install bioperl but can not do it . Neither from PPM (it is showing error "Ppm install failed: 404 not found") nor from CPAN manual installation. It is not allowing me to download nmake, showing that "the version of this file is not compatible with the version of windows you are running. Check your computer system information to see whether you need 32 bit or 64 bit of this program." Please help. 
Regards Hina -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Fri Jan 22 20:13:15 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 22 Jan 2010 15:13:15 -0500 Subject: [Bioperl-l] BioPerl installation failed: please help In-Reply-To: <20100122200118.053j5kc36skow0wg@www.sms.ed.ac.uk> References: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk><2ABEC492CD49450EAE4BFC7BA763E3DB@NewLife> <20100122200118.053j5kc36skow0wg@www.sms.ed.ac.uk> Message-ID: <9E5DE384E2C8416B8373E390ABDB7DFE@NewLife> Ok Hina, I'm not seeing any issues with the presence or availability of http://bioperl.org/DIST from my machine. Can you access that url in a browser? If not, the king of the King's Buildings may not be allowing access. Also, can you do the following: C:> ppm-shell ppm> repo list Note the number of the repo that corresponds to bioperl (if any) and do ppm> repo describe n where 'n' is that number, and send the output along. cheers, MAJ ----- Original Message ----- From: "Hina Dalal" To: "Mark A. Jensen" Sent: Friday, January 22, 2010 3:01 PM Subject: Re: [Bioperl-l] BioPerl installation failed: please help Hi Mark warm regards I was following that protocol only , but the problem is when I tried to do it from PPM, and when I reach at the stem install BioPerl, it is showing error "Ppm install failed: 404 not found" in the end. and when I tried it by CPAN /manual installation, I couldn't download nmake,its showing that "the version of this file is not compatible with the version of windows you are running. Check your computer system information to see whether you need 32 bit or 64 bit of this program and than contact the software publisher." What should I do? Please help. Regards Hina Quoting "Mark A. 
Jensen" : > Hina-- See the protocol at > http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation > for ActiveState installation. If it doesn't work, please let us know at > which step the failure happened. > cheers, MAJ > ----- Original Message ----- From: "Hina Dalal" > To: > Sent: Friday, January 22, 2010 12:34 PM > Subject: [Bioperl-l] BioPerl installation failed: please help > > > Hi > > I have installed PERL from Activesate and now trying to install > bioperl but can not do it . Neither from PPM (it is showing error "Ppm > install failed: 404 not found") nor from CPAN manual installation. It > is not allowing me to download nmake, showing that "the version of > this file is not compatible with the version of windows you are > running. Check your computer system information to see whether you > need 32 bit or 64 bit of this program." > > Please help. > > Regards > > Hina > > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From pengyu.ut at gmail.com Mon Jan 25 01:29:59 2010 From: pengyu.ut at gmail.com (Peng Yu) Date: Sun, 24 Jan 2010 19:29:59 -0600 Subject: [Bioperl-l] Transcribe in bioperl Message-ID: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> I found the function 'translate' in bioperl. But I don't find 'transcribe'. Is there such a function? 
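A minimal sketch of the T-to-U conversion the question above is after, written in plain Perl with no BioPerl calls. The sub name transcribe_dna is hypothetical and is not a BioPerl method, and the strict "DNA letters only" check is an assumption about what a 'transcribe' ought to refuse.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of BioPerl): turn a DNA string into RNA
# by replacing T with U, and refuse anything that is not plain DNA.
sub transcribe_dna {
    my ($dna) = @_;
    # Assumption: only A, C, G, T and N (either case) count as DNA here.
    die "not a DNA sequence: $dna\n" unless $dna =~ /\A[ACGTNacgtn]+\z/;
    # Copy first so the caller's string is left untouched.
    (my $rna = $dna) =~ tr/Tt/Uu/;
    return $rna;
}

print transcribe_dna('ATGGCTTAA'), "\n";    # prints AUGGCUUAA
```

Wrapping tr/T/U/ in a named sub keeps the operation readable as "transcribe" at the call site and gives one place to reject input that is already RNA or protein.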
From jason at bioperl.org Mon Jan 25 02:06:48 2010 From: jason at bioperl.org (Jason Stajich) Date: Sun, 24 Jan 2010 18:06:48 -0800 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> Message-ID: What exactly do you want to do? spliced_seq for a feature would be the closest thing... -jason On Jan 24, 2010, at 5:29 PM, Peng Yu wrote: > I found the function 'translate' in bioperl. But I don't find > 'transcribe'. Is there such a function? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ http://twitter.com/hyphaltip From pengyu.ut at gmail.com Mon Jan 25 02:22:12 2010 From: pengyu.ut at gmail.com (Peng Yu) Date: Sun, 24 Jan 2010 20:22:12 -0600 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> Message-ID: <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> To convert from T to U. I could use perl's builtin function. But it is semantically far away from 'transcribe'. If there is a function with name 'transcribe', it will be better. On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich wrote: > What exactly do you want to do? > spliced_seq for a feature would be the closest thing... > > -jason > On Jan 24, 2010, at 5:29 PM, Peng Yu wrote: > >> I found the function 'translate' in bioperl. But I don't find >> 'transcribe'. Is there such a function? 
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > http://twitter.com/hyphaltip > > From maj at fortinbras.us Mon Jan 25 02:48:33 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 24 Jan 2010 21:48:33 -0500 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> Message-ID: Not a bad idea, a semantics-preserving/checking thing. transcribe() could return an object with alphabet == 'rna' and the T's flipped, or bork if called against an object with alphbet != 'dna'. I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to be stashed), if desired. ----- Original Message ----- From: "Peng Yu" To: "Jason Stajich" Cc: Sent: Sunday, January 24, 2010 9:22 PM Subject: Re: [Bioperl-l] Transcribe in bioperl > To convert from T to U. I could use perl's builtin function. But it is > semantically far away from 'transcribe'. If there is a function with > name 'transcribe', it will be better. > > On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich wrote: >> What exactly do you want to do? >> spliced_seq for a feature would be the closest thing... >> >> -jason >> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote: >> >>> I found the function 'translate' in bioperl. But I don't find >>> 'transcribe'. Is there such a function? 
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> http://fungalgenomes.org/ >> http://twitter.com/hyphaltip >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon Jan 25 04:39:43 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 24 Jan 2010 22:39:43 -0600 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> Message-ID: I think the main reason there hasn't been a transcribe() is that very few users ask for it. Most just use a quick '$seq =~ tr/T/U/', or use spliced_seq() and/or translate() (i.e. they don't care about the intermediate mRNA). I don't have a problem with adding a transcribe method to PrimarySeq, but (and Mark has already picked up on this) it should be constrained to DNA only and return RNA. And there might be a case for adding the analogous reverse_translate(). Also worth adding this to the proper interface class (PrimarySeqI, I think) so all Seq/PrimarySeq will have it (or have to implement their own). chris On Jan 24, 2010, at 8:48 PM, Mark A. Jensen wrote: > Not a bad idea, a semantics-preserving/checking thing. transcribe() could return an object with alphabet == 'rna' > and the T's flipped, or bork if called against an object with alphbet != 'dna'. > I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to be stashed), if desired. > > ----- Original Message ----- From: "Peng Yu" > To: "Jason Stajich" > Cc: > Sent: Sunday, January 24, 2010 9:22 PM > Subject: Re: [Bioperl-l] Transcribe in bioperl > > >> To convert from T to U. 
I could use perl's builtin function. But it is >> semantically far away from 'transcribe'. If there is a function with >> name 'transcribe', it will be better. >> On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich wrote: >>> What exactly do you want to do? >>> spliced_seq for a feature would be the closest thing... >>> >>> -jason >>> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote: >>> >>>> I found the function 'translate' in bioperl. But I don't find >>>> 'transcribe'. Is there such a function? >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> Jason Stajich >>> jason.stajich at gmail.com >>> jason at bioperl.org >>> http://fungalgenomes.org/ >>> http://twitter.com/hyphaltip >>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Jan 25 04:43:07 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 24 Jan 2010 22:43:07 -0600 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com> <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> Message-ID: <489E0B85-0BC3-45DB-8660-494CF69F35FF@illinois.edu> On Jan 24, 2010, at 10:39 PM, Chris Fields wrote: > ...And there might be a case for adding the analogous reverse_translate(). Bah. Meant reverse_transcribe(). Ah well. 
chris From dan.kortschak at adelaide.edu.au Mon Jan 25 05:33:28 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Mon, 25 Jan 2010 16:03:28 +1030 Subject: [Bioperl-l] BEDTools module Message-ID: <1264397608.4898.9.camel@epistle> Hi All, A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan and Ira Hall is now available in the bioperl-run subversion repository (bioperl-run/trunk r16754). Using BEDTools you can, among other things: * Intersect two BED files in search of overlapping features. * Merge overlapping features. * Screen for paired-end (PE) overlaps between PE sequences and existing genomic features. * Calculate the depth and breadth of sequence coverage across defined "windows" in a genome. (see for manuals and downloads). BEDTools is a suite of 17 command-line executables. The module attempts to provide commands and options comprehensively and can return Bio::SeqIO or Bio::SeqFeature::Collection objects where appropriate (or Bio::Root::IO where specific handling has not been implemented - please give feedback on desired features for this). cheers Dan From cjfields at illinois.edu Mon Jan 25 05:35:06 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 24 Jan 2010 23:35:06 -0600 Subject: [Bioperl-l] Distance between non-overlapping sequences in DNAStatistics Message-ID: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu> Just a quick question for those using DNAStatistics. I just fixed a bug in Bio::Align::DNAStatistics that failed with a div by zero error (bug 2901) on this data: >seq1 GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC >seq2 GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC >seq3 GGTACCAGCAGGTGGTCCGCCTA------------------------------ >seq4 --------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC Since seq3 and seq4 don't overlap, the distance can't be calculated. In our case, I replace the score with 'NA' as a placeholder, but I'm worried about downstream app breakage.
Anyone have an objection to using 'NA' here, or know of ways this may lead to problems elsewhere? chris From jason at bioperl.org Mon Jan 25 05:58:03 2010 From: jason at bioperl.org (Jason Stajich) Date: Sun, 24 Jan 2010 21:58:03 -0800 Subject: [Bioperl-l] Distance between non-overlapping sequences in DNAStatistics In-Reply-To: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu> References: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu> Message-ID: It could also return -1 which is used as place holder for NA in other programs that generate distance matrices. -jason On Jan 24, 2010, at 9:35 PM, Chris Fields wrote: > Just a quick question for those using DNAStatistics. I just fixed a > bug in Bio::Align::DNAStatistics that failed with a div by zero > error (bug 2901) on this data: > >> seq1 > GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC >> seq2 > GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC >> seq3 > GGTACCAGCAGGTGGTCCGCCTA------------------------------ >> seq4 > --------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC > > Since seq3 and seq4 don't overlap, the distance can't be > calculated. In our case, I replace the score with 'NA' as a > placeholder, but I'm worried about downstream app breakage. Anyone > have an objection to using 'NA' here, or know of ways this may lead > to problems elsewhere? > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ http://twitter.com/hyphaltip From maj at fortinbras.us Mon Jan 25 13:17:54 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Mon, 25 Jan 2010 08:17:54 -0500 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com><366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com> Message-ID: transcribe() and rev_transcribe added to Bio::PrimarySeqI, plus tests in t/Seq.t, @ r16757 MAJ ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: ; "Peng Yu" Sent: Sunday, January 24, 2010 11:39 PM Subject: Re: [Bioperl-l] Transcribe in bioperl >I think the main reason there hasn't been a transcribe() is that very few users >ask for it. Most just use a quick '$seq =~ tr/T/U/', or use spliced_seq() >and/or translate() (i.e. they don't care about the intermediate mRNA). I don't >have a problem with adding a transcribe method to PrimarySeq, but (and Mark has >already picked up on this) it should be constrained to DNA only and return RNA. >And there might be a case for adding the analogous reverse_translate(). > > Also worth adding this to the proper interface class (PrimarySeqI, I think) so > all Seq/PrimarySeq will have it (or have to implement their own). > > chris > > On Jan 24, 2010, at 8:48 PM, Mark A. Jensen wrote: > >> Not a bad idea, a semantics-preserving/checking thing. transcribe() could >> return an object with alphabet == 'rna' >> and the T's flipped, or bork if called against an object with alphbet != >> 'dna'. >> I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to >> be stashed), if desired. >> >> ----- Original Message ----- From: "Peng Yu" >> To: "Jason Stajich" >> Cc: >> Sent: Sunday, January 24, 2010 9:22 PM >> Subject: Re: [Bioperl-l] Transcribe in bioperl >> >> >>> To convert from T to U. I could use perl's builtin function. But it is >>> semantically far away from 'transcribe'. If there is a function with >>> name 'transcribe', it will be better. >>> On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich wrote: >>>> What exactly do you want to do? 
>>>> spliced_seq for a feature would be the closest thing... >>>> >>>> -jason >>>> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote: >>>> >>>>> I found the function 'translate' in bioperl. But I don't find >>>>> 'transcribe'. Is there such a function? >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> -- >>>> Jason Stajich >>>> jason.stajich at gmail.com >>>> jason at bioperl.org >>>> http://fungalgenomes.org/ >>>> http://twitter.com/hyphaltip >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon Jan 25 13:23:12 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 25 Jan 2010 07:23:12 -0600 Subject: [Bioperl-l] BEDTools module In-Reply-To: <1264397608.4898.9.camel@epistle> References: <1264397608.4898.9.camel@epistle> Message-ID: <0F5CE93E-0E6C-4317-806B-A463A9B0917E@illinois.edu> Great work Dan! chris On Jan 24, 2010, at 11:33 PM, Dan Kortschak wrote: > Hi All, > > A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan > and Ira Hall is now available in the bioperl-run subversion repository > (bioperl-run/trunk r16754). > > Using BEDTools you can, among other things: > > * Intersecting two BED files in search of overlapping features. > * Merging overlapping features. > * Screening for paired-end (PE) overlaps between PE sequences and > existing genomic features. 
> * Calculating the depth and breadth of sequence coverage across > defined "windows" in a genome. > > (see for manuals and downloads). > > BEDTools is a suite of 17 commandline executable. The module attempts to > provide and options comprehensively and can return Bio::SeqIO or > Bio::SeqFeature::Collection object where appropriate (or Bio::Root::IO > where specific handling has not been implemented - please give feedback > on desired features for this). > > cheers > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Jan 25 13:27:26 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 25 Jan 2010 07:27:26 -0600 Subject: [Bioperl-l] Distance between non-overlapping sequences in DNAStatistics In-Reply-To: References: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu> Message-ID: That works for me, just want to ensure we're DTRT. I'll change it over. chris On Jan 24, 2010, at 11:58 PM, Jason Stajich wrote: > It could also return -1 which is used as place holder for NA in other programs that generate distance matrices. > -jason > On Jan 24, 2010, at 9:35 PM, Chris Fields wrote: > >> Just a quick question for those using DNAStatistics. I just fixed a bug in Bio::Align::DNAStatistics that failed with a div by zero error (bug 2901) on this data: >> >>> seq1 >> GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC >>> seq2 >> GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC >>> seq3 >> GGTACCAGCAGGTGGTCCGCCTA------------------------------ >>> seq4 >> --------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC >> >> Since seq3 and seq4 don't overlap, the distance can't be calculated. In our case, I replace the score with 'NA' as a placeholder, but I'm worried about downstream app breakage. Anyone have an objection to using 'NA' here, or know of ways this may lead to problems elsewhere? 
>> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > http://twitter.com/hyphaltip > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Mon Jan 25 13:41:38 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 25 Jan 2010 08:41:38 -0500 Subject: [Bioperl-l] BEDTools module In-Reply-To: <1264397608.4898.9.camel@epistle> References: <1264397608.4898.9.camel@epistle> Message-ID: <8D494783F87E4C32BD797008E260C3C2@NewLife> Rock 'n' roll, Dan! ----- Original Message ----- From: "Dan Kortschak" To: Sent: Monday, January 25, 2010 12:33 AM Subject: [Bioperl-l] BEDTools module > Hi All, > > A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan > and Ira Hall is now available in the bioperl-run subversion repository > (bioperl-run/trunk r16754). > > Using BEDTools you can, among other things: > > * Intersecting two BED files in search of overlapping features. > * Merging overlapping features. > * Screening for paired-end (PE) overlaps between PE sequences and > existing genomic features. > * Calculating the depth and breadth of sequence coverage across > defined "windows" in a genome. > > (see for manuals and downloads). > > BEDTools is a suite of 17 commandline executable. The module attempts to > provide and options comprehensively and can return Bio::SeqIO or > Bio::SeqFeature::Collection object where appropriate (or Bio::Root::IO > where specific handling has not been implemented - please give feedback > on desired features for this). 
> > cheers > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rtbio.2009 at gmail.com Mon Jan 25 13:43:19 2010 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Mon, 25 Jan 2010 14:43:19 +0100 Subject: [Bioperl-l] Regarding blast in Bioperl Message-ID: Hello Mark,Chris and all, This is Roopa again. I have a small problem again. I am working on Remote blast. The program works well. But the problem is this. The program accesses the server and gets the output correctly. I am trying to send the result sequences into an array and I found that always the first sequence among the Result sequences is missing. The code is my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]"); while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. open(OUTFILE,'>',$debugfile); print OUTFILE $input; close(OUTFILE); my $r = $factory->submit_blast($input); open(OUTFILE,'>',$debugfile); # print OUTFILE $r; close(OUTFILE); print STDERR "waiting...." if($v>0); while ( my @rids = $factory->each_rid ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "while entered"; close(OUTFILE); foreach my $rid ( @rids ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "foreach entered"; close(OUTFILE); my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } open(OUTFILE,'>',$debugfile); # print OUTFILE "if entered"; close(OUTFILE); print STDERR "."
if ( $v > 0 ); sleep 5; } else { open(OUTFILE,'>',$debugfile); # print OUTFILE "else entered"; close(OUTFILE); my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $result->next_hit(); close(BLASTDEBUGFILE); my $filename = $serverpath."/blastdata_". time()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $organism; close(BLASTDEBUGFILE); # open(OUTFILE,'>',$outfile); # print OUTFILE "Test2 $result->database_name()"; # close(OUTFILE); #$hit = $result->next_hit; #open(new,'>',$debugfile); #print $hit; #close(new); $dummy=0; while ( my $hit = $result->next_hit ) { next unless ( $v >= 0); # open(OUTFILE,'>',$debugfile); # print OUTFILE "$hit in while hits"; # close(OUTFILE); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string $dummy++; open(OUTFILE,'>',$debugfile); # print OUTFILE $dummy; close(OUTFILE); push(@seqs,$dna); } } } } } $warum=@seqs; open(OUTFILE,'>',$debugfile); # print OUTFILE $warum; print OUTFILE @seqs; close(OUTFILE); return(@seqs); } open(OUTFILE, '>',$outfile) || die ; print OUTFILE "\n RNAi Result \n \n

Inputsequence:
"; Here in the above code, I was trying to debug the code and trying to get the count of the array and even the sequence. But when the output data was giving 1 sequence, the count of the array was 0 and when I tried to print the output sequence I could not get any. It was the same when the no of output sequences was 3, I tried to print the sequences but was getting the count of the array as 2 and was printing only two sequences. Please help me in sorting out this problem. Regards, Roopa. From rtbio.2009 at gmail.com Mon Jan 25 13:44:57 2010 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Mon, 25 Jan 2010 14:44:57 +0100 Subject: [Bioperl-l] remote blast bioperl Message-ID: Hello all, I have a small problem again. I am working on Remote blast. The program works well. But the problem is this. The program accesses the server and gets the output correctly. I am trying to send the result sequences into an array and I found that always the first sequence among the Result sequences is missing. The code is my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]"); while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. open(OUTFILE,'>',$debugfile); print OUTFILE $input; close(OUTFILE); my $r = $factory->submit_blast($input); open(OUTFILE,'>',$debugfile); # print OUTFILE $r; close(OUTFILE); print STDERR "waiting...."
if($v>0); while ( my @rids = $factory->each_rid ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "while entered"; close(OUTFILE); foreach my $rid ( @rids ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "foreach entered"; close(OUTFILE); my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } open(OUTFILE,'>',$debugfile); # print OUTFILE "if entered"; close(OUTFILE); print STDERR "." if ( $v > 0 ); sleep 5; } else { open(OUTFILE,'>',$debugfile); # print OUTFILE "else entered"; close(OUTFILE); my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $result->next_hit(); close(BLASTDEBUGFILE); my $filename = $serverpath."/blastdata_". time()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $organism; close(BLASTDEBUGFILE); # open(OUTFILE,'>',$outfile); # print OUTFILE "Test2 $result->database_name()"; # close(OUTFILE); #$hit = $result->next_hit; #open(new,'>',$debugfile); #print $hit; #close(new); $dummy=0; while ( my $hit = $result->next_hit ) { next unless ( $v >= 0); # open(OUTFILE,'>',$debugfile); # print OUTFILE "$hit in while hits"; # close(OUTFILE); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string $dummy++; open(OUTFILE,'>',$debugfile); # print OUTFILE $dummy; close(OUTFILE); push(@seqs,$dna); } } } } } $warum=@seqs; open(OUTFILE,'>',$debugfile); # print OUTFILE $warum; print OUTFILE @seqs; close(OUTFILE); return(@seqs); } open(OUTFILE, '>',$outfile) || die ; print OUTFILE "\n RNAi Result \n \n

Inputsequence:
"; Here in the above code, I was trying to debug the code and trying to get the count of the array and even the sequence. But when the output data was giving 1 sequence, the count of the array was 0 and when I tried to print the output sequence I could not get any. It was the same when the no of output sequences was 3, I tried to print the sequences but was getting the count of the array as 2 and was printing only two sequences. Please help me in sorting out this problem. Regards, Roopa. From cjfields at illinois.edu Mon Jan 25 14:05:44 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 25 Jan 2010 08:05:44 -0600 Subject: [Bioperl-l] remote blast bioperl In-Reply-To: References: Message-ID: <7E402CC5-9C66-4315-B437-7C4EC2317371@illinois.edu> Roopa, We have received all 4+ of your posts. There is absolutely no need for you to keep repeatedly posting the same thing to the list. Be patient, we'll try to get to you as soon as we can! chris On Jan 25, 2010, at 7:44 AM, Roopa Raghuveer wrote: > Hello all, > > I have a small problem again. I am working on Remote blast. The program works well. But the problem is this. The program accesses the server and gets the output correctly. I am trying to send the result sequences into an array and I found that always the first sequence among the Result sequences is missing. The code is > > my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]"); > > > while (my $input = $str->next_seq()) > { > #Blast a sequence against a database: > #Alternatively, you could pass in a file with many > #sequences rather than loop through sequence one at a time > #Remove the loop starting 'while (my $input = $str->next_seq())' > #and swap the two lines below for an example of that.
> > open(OUTFILE,'>',$debugfile); > print OUTFILE $input; > close(OUTFILE); > > > my $r = $factory->submit_blast($input); > > open(OUTFILE,'>',$debugfile); > # print OUTFILE $r; > close(OUTFILE); > > > print STDERR "waiting...." if($v>0); > > while ( my @rids = $factory->each_rid ) { > open(OUTFILE,'>',$debugfile); > # print OUTFILE "while entered"; > close(OUTFILE); > foreach my $rid ( @rids ) { > > open(OUTFILE,'>',$debugfile); > # print OUTFILE "foreach entered"; > close(OUTFILE); > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > open(OUTFILE,'>',$debugfile); > # print OUTFILE "if entered"; > close(OUTFILE); > print STDERR "." if ( $v > 0 ); > sleep 5; > } > else { > open(OUTFILE,'>',$debugfile); > # print OUTFILE "else entered"; > close(OUTFILE); > > my $result = $rc->next_result(); > #save the output > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $result->next_hit(); > close(BLASTDEBUGFILE); > > my $filename = $serverpath."/blastdata_". 
> time()."\.out"; > > > # open(DEBUGFILE,'>',$debugfile); > # open(new,'>',$filename); > # @arra=; > # print DEBUGFILE @arra; > # close(DEBUGFILE); > # close(new); > > $factory->save_output($filename); > > # open(BLASTDEBUGFILE,'>',$debugfile); > # print BLASTDEBUGFILE "Hello $rid"; > # close(BLASTDEBUGFILE); > > $factory->remove_rid($rid); > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $organism; > close(BLASTDEBUGFILE); > > # open(OUTFILE,'>',$outfile); > # print OUTFILE "Test2 $result->database_name()"; > # close(OUTFILE); > > #$hit = $result->next_hit; > #open(new,'>',$debugfile); > #print $hit; > #close(new); > $dummy=0; > > while ( my $hit = $result->next_hit ) { > > next unless ( $v >= 0); > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "$hit in while hits"; > # close(OUTFILE); > my $sequ = $gb->get_Seq_by_version($hit->name); > my $dna = $sequ->seq(); # get the sequence as a string > $dummy++; > open(OUTFILE,'>',$debugfile); > # print OUTFILE $dummy; > close(OUTFILE); > push(@seqs,$dna); > } > } > } > } > } > > $warum=@seqs; > open(OUTFILE,'>',$debugfile); > # print OUTFILE $warum; > print OUTFILE @seqs; > > close(OUTFILE); > return(@seqs); > } > > open(OUTFILE, '>',$outfile) || die ; > > print OUTFILE "\n > RNAi Result > \n > \n >

> Inputsequence:
"; > > > Here in the above code, I was trying to debug the code and trying to get the count of the array and even the sequence. But when the output data was giving 1 sequence, the count of the array was 0 and when I tried to print the output sequence I could not get any. It was the same when the no of output sequences was 3, I tried to print the sequences but was getting the count of the array as 2 and was printing only two sequences. > > Please help me in sorting out this problem. > > Regards, > Roopa. From jiann-jy at hotmail.com Mon Jan 25 02:03:55 2010 From: jiann-jy at hotmail.com (JY) Date: Sun, 24 Jan 2010 18:03:55 -0800 (PST) Subject: [Bioperl-l] how to retrieve accession number by taxon id?? Message-ID: <4cef88b5-fa53-4e63-9167-30075c10a058@k19g2000yqc.googlegroups.com> I need to retrieve accession numbers and sequences to complete one part of my project. How can I retrieve accession numbers by taxon ID? From lpaulet at ual.es Mon Jan 25 20:25:55 2010 From: lpaulet at ual.es (Lorenzo Carretero-Paulet) Date: Mon, 25 Jan 2010 21:25:55 +0100 Subject: [Bioperl-l] HTMLResultWriter Message-ID: <4B5DFE53.2000201@ual.es> Hi all, I'm trying to generate a subroutine that performs a BLAST search and returns the corresponding reports in txt, xml and html format. I'm experiencing problems with the last of these, as the program returns the following error message: "Can't call method "next_result" without a package or object reference at..."
sub blasting {
    my ($query, $E_value) = @_;
    my ($outputfilenameB, $outputfilenameX, $outputfilenameH);
    $outputfilenameB=$query.".BLAST.txt";
    $outputfilenameX=$query.".BLAST.xml";
    $outputfilenameH=$query.".BLAST.html";
    #legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin
    print qx(du -s /tmp);
    my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/;
    my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -m 7 -b 20000 -o $outputfilenameX/;
    my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter();
    my $outhtml = new Bio::SearchIO(-writer => $writerhtml, -file => ">$outputfilenameH");
    while( my $result = _$blast_report_->next_result ) {
        # get a result from Bio::SearchIO parsing or build it up in memory
        $outhtml->write_result($result);
    }
}

Can anyone see where the problem is?
Cheers!
Lorenzo

From lpaulet at ual.es Mon Jan 25 20:31:08 2010
From: lpaulet at ual.es (lpaulet at ual.es)
Date: Mon, 25 Jan 2010 21:31:08 +0100
Subject: [Bioperl-l] HTMLResultWriter
Message-ID: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es>

Hi all,

I'm trying to generate a subroutine that performs a BLAST search and returns the corresponding reports in txt, xml and html format. I'm experiencing problems with the latter, as the program returns the following error message:

"Can't call method "next_result" without a package or object reference at..."

sub blasting { my ($query, $E_value) = @_; my ($outputfilenameB, $outputfilenameX, $outputfilenameH); $outputfilenameB=$query.".BLAST.txt"; $outputfilenameX=$query.".BLAST.xml"; $outputfilenameH=$query.".BLAST.html"; #legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin print qx(du -s /tmp); my $blast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/; my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -m 7 -b 20000 -o $outputfilenameX/; my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter(); my $outhtml = new Bio::SearchIO(-writer => $writerhtml, -file => ">$outputfilenameH"); while( my $result = $blast_report->next_result ) { # get a result from Bio::SearchIO parsing or build it up in memory $outhtml->write_result($result); } } Can anyone see where the problem is? Cheers! Lorenzo From dan.kortschak at adelaide.edu.au Mon Jan 25 21:00:37 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Tue, 26 Jan 2010 07:30:37 +1030 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: References: Message-ID: <1264453237.4552.3.camel@epistle> A reverse_translate to IUPAC degenerate codes is not a bad idea, particularly for PCR primer design. Dan On Mon, 2010-01-25 at 09:05 -0500, bioperl-l-request at lists.open-bio.org wrote: > On Jan 24, 2010, at 10:39 PM, Chris Fields wrote: > > > ...And there might be a case for adding the analogous > reverse_translate(). > > Bah. Meant reverse_transcribe(). Ah well. > > chris From maj at fortinbras.us Mon Jan 25 21:07:49 2010 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Mon, 25 Jan 2010 16:07:49 -0500
Subject: [Bioperl-l] HTMLResultWriter
In-Reply-To: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es>
References: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es>
Message-ID:

Lorenzo--

your $blast_report is set to be (some of) the text returned by a system call of a blast program; this isn't going to be an object of any kind, and so no methods can be called on it (as at "$blast_report->next_result"). You need to parse the text generated by the blast call using Bio::SearchIO to get a Bio::Search::Result::BlastResult object. You could do

    @blast_lines = qx/ ...your blast call... /;
    open my $bf, '>', 'my.blast' or die "Cannot write my.blast: $!";
    print $bf @blast_lines;    # no comma after the filehandle
    close $bf;
    $blast_result = Bio::SearchIO->new(-file=>'my.blast', -format => 'blast');

and carry on from there. But why not look at Bio::Tools::Run::StandAloneBlast or Bio::Tools::Run::StandAloneBlastPlus to run your blasts within perl? These wrap the blast programs and deliver BioPerl objects, rather than plain text output.

cheers
MAJ

----- Original Message -----
From:
To:
Sent: Monday, January 25, 2010 3:31 PM
Subject: [Bioperl-l] HTMLResultWriter

Hi all,

I'm trying to generate a subroutine that performs a BLAST search and returns the corresponding reports in txt, xml and html format. I'm experiencing problems with the latter, as the program returns the following error message:

"Can't call method "next_result" without a package or object reference at..."

sub blasting { my ($query, $E_value) = @_; my ($outputfilenameB, $outputfilenameX, $outputfilenameH); $outputfilenameB=$query.".BLAST.txt"; $outputfilenameX=$query.".BLAST.xml"; $outputfilenameH=$query.".BLAST.html"; #legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin print qx(du -s /tmp); my $blast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/; my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -m 7 -b 20000 -o $outputfilenameX/; my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter(); my $outhtml = new Bio::SearchIO(-writer => $writerhtml, -file => ">$outputfilenameH"); while( my $result = $blast_report->next_result ) { # get a result from Bio::SearchIO parsing or build it up in memory $outhtml->write_result($result); } } Can anyone see where the problem is? Cheers! Lorenzo _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Mon Jan 25 21:09:24 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 25 Jan 2010 22:09:24 +0100 Subject: [Bioperl-l] HTMLResultWriter In-Reply-To: <4B5DFE53.2000201@ual.es> References: <4B5DFE53.2000201@ual.es> Message-ID: > my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/; > while( my $result = _$blast_report_->next_result ) { # get a result from Bio::SearchIO parsing or build it up in memory _$blast_report_ is not a valid variable name, as far as I know. Plus there's a space between report and the final '_' in the first of the above two lines. Does this code compile? 
Dave From Russell.Smithies at agresearch.co.nz Mon Jan 25 21:14:15 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 26 Jan 2010 10:14:15 +1300 Subject: [Bioperl-l] Regarding blast in Bioperl In-Reply-To: References: Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz> That's a fair mix of incomplete code you've supplied!! Did you read the documentation for RemoteBlast? The example there will do 99% of what you want. http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm I'm not entirely sure what you're trying to do (as you've left out a bit of your code) but I assume you're trying to retrieve and print the sequence for each hit. Here's something that works, not sure exactly what/why you want to print but it should get you a bit further. --Russell ================================ #!perl -w use Bio::Tools::Run::RemoteBlast; use Bio::DB::GenBank; use CGI ':standard'; use strict; my $q = new CGI; my @params = ( -prog => 'blastn', -data => 'nr', -expect => '1e-30', -entrez_query => 'Homo sapiens [ORGN]', -readmethod => 'SearchIO' ); my $gb = Bio::DB::GenBank->new; my $factory = Bio::Tools::Run::RemoteBlast->new(@params); #$v is just to turn on and off the messages my $v = 1; my $str = Bio::SeqIO->new( -file => 'test.faa', -format => "fasta" ); while ( my $input = $str->next_seq() ) { my $r = $factory->submit_blast($input); print STDERR "waiting..." if ( $v > 0 ); while ( my @rids = $factory->each_rid ) { foreach my $rid (@rids) { my @seqs = (); my $rc = $factory->retrieve_blast($rid); if ( !ref($rc) ) { if ( $rc < 0 ) { $factory->remove_rid($rid); } print STDERR "." if ( $v > 0 ); sleep 5; } else { my $result = $rc->next_result(); #save the blast output my $filename = $result->query_accession . 
'.out'; $factory->save_output($filename); $factory->remove_rid($rid); print "\nQuery Name: ", $result->query_name(), "\n"; while ( my $hit = $result->next_hit ) { # store the hit sequences push @seqs, $gb->get_Seq_by_version( $hit->name ); next unless ( $v > 0 ); print "\thit name is ", $hit->name, "\n"; while ( my $hsp = $hit->next_hsp ) { print "\t\tscore is ", $hsp->score, "\n"; } } ## print the seqs you've retrieved?? open( OUTFILE, '>', $result->query_accession . '.htm' ); print OUTFILE $q->start_html('RNAi Result'), $q->h1('RNAi Result'), $q->h2('Input'), $q->pre( toString($input) ), $q->h2('Output'); foreach (@seqs) { #there's probably a better way of printing the seq print OUTFILE $q->pre( toString($_) ); } print OUTFILE $q->end_html; close OUTFILE; } } } } sub toString { my $s = shift; return '>' . $s->display_id . " " . $s->desc . "\n" . $s->seq; } ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
=======================================================================

From biopython at maubp.freeserve.co.uk Mon Jan 25 21:24:33 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 25 Jan 2010 21:24:33 +0000
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <1264453237.4552.3.camel@epistle>
References: <1264453237.4552.3.camel@epistle>
Message-ID: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>

On Mon, Jan 25, 2010 at 9:00 PM, Dan Kortschak wrote:
> A reverse_translate to IUPAC degenerate codes is not a bad idea,
> particularly for PCR primer design.

I would say it could be a bad idea. For any protein string there are multiple possible back translations, and this cannot be captured fully as a nucleotide string even using the IUPAC ambiguity chars.

We debated this back and forth for Biopython, and decided to leave it out. It wasn't possible for a simple back translate to a simple string to handle the use cases we considered, and other options like returning a regular expression covering all possible back translations were too complex (for a core sequence method/function).

Peter

From jason at bioperl.org Mon Jan 25 21:26:55 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 25 Jan 2010 13:26:55 -0800
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
References: <1264453237.4552.3.camel@epistle> <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
Message-ID: <98995830-DC7F-4404-A216-874EF5799DB6@bioperl.org>

It was already implemented several years ago -- reverse_translate

Bio::Tools::CodonTable -> revtranslate

    use Bio::PrimarySeq;
    use Bio::Tools::CodonTable;

    my $myCodonTable = Bio::Tools::CodonTable->new();   # default (standard) table
    my $seqobj = Bio::PrimarySeq->new(-seq => 'FHGERHEL');
    my $iupac_str = $myCodonTable->reverse_translate_all($seqobj);

Chris had meant to say reverse_transcribe of RNA -> DNA FWIW.
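
To make the disagreement above concrete, here is a minimal pure-Perl sketch of why collapsing synonymous codons into a single IUPAC string can over-generalise. It is not BioPerl's implementation of revtranslate or reverse_translate_all; the helper names collapse_codons and expand_codon are hypothetical, and the six serine codons are those of the standard genetic code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# IUPAC nucleotide ambiguity codes, keyed by the sorted set of bases they cover.
my %IUPAC = (
    A   => 'A', C   => 'C', G   => 'G', T   => 'T',
    AG  => 'R', CT  => 'Y', CG  => 'S', AT  => 'W', GT => 'K', AC => 'M',
    CGT => 'B', AGT => 'D', ACT => 'H', ACG => 'V', ACGT => 'N',
);

# Reverse lookup: ambiguity code -> list of plain bases.
my %EXPAND;
for my $bases (keys %IUPAC) {
    $EXPAND{ $IUPAC{$bases} } = [ split //, $bases ];
}

# Collapse a list of codons into one IUPAC triplet, column by column.
# (hypothetical helper, for illustration only)
sub collapse_codons {
    my @codons = @_;
    my $out = '';
    for my $pos (0 .. 2) {
        my %seen;
        $seen{ substr($_, $pos, 1) } = 1 for @codons;
        $out .= $IUPAC{ join '', sort keys %seen };
    }
    return $out;
}

# Enumerate every unambiguous codon that an IUPAC triplet stands for.
# (hypothetical helper, for illustration only)
sub expand_codon {
    my ($iupac) = @_;
    my @results = ('');
    for my $code (split //, $iupac) {
        my @next;
        for my $prefix (@results) {
            push @next, $prefix . $_ for @{ $EXPAND{$code} };
        }
        @results = @next;
    }
    my @sorted = sort @results;
    return @sorted;
}

my @serine    = qw(TCT TCC TCA TCG AGT AGC);   # standard genetic code
my $collapsed = collapse_codons(@serine);
my @expanded  = expand_codon($collapsed);
my %is_serine = map { $_ => 1 } @serine;
my @extra     = grep { !$is_serine{$_} } @expanded;

print "Serine collapses to $collapsed\n";
print scalar(@expanded), " codons match it; ",
      scalar(@extra), " are not serine: @extra\n";
```

In this sketch the column-wise collapse of serine gives WSN, which matches 16 codons, 10 of which code for something else (threonine, cysteine, tryptophan, arginine, or a stop). A single degenerate string therefore describes a superset of the true back translations whenever an amino acid's codons differ at more than one position.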
-jason On Jan 25, 2010, at 1:24 PM, Peter wrote: > On Mon, Jan 25, 2010 at 9:00 PM, Dan Kortschak > wrote: >> A reverse_translate to IUPAC degenerate codes is not a bad idea, >> particularly for PCR primer design. > > I would say it could be a bad idea. For any protein string there are > multiple possible back translations, and this cannot be captured > fully as a nucleotide string even using the IUPAC ambiguity chars. > > We debated this back and forth for Biopython, and decided to leave it > out. It wasn't possible for a simple back translate to a simple > string to > handle the use cases we considered, and other options like returning > a regular expression covering all possible back translations were too > complex (for a core sequence method/function). > > Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ http://twitter.com/hyphaltip From maj at fortinbras.us Mon Jan 25 21:19:24 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 25 Jan 2010 16:19:24 -0500 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: <1264453237.4552.3.camel@epistle> References: <1264453237.4552.3.camel@epistle> Message-ID: <72B106F0D5FF4F1E858CC9BD1EF33142@NewLife> I think we have that functionality in Bio::Tools::SeqPattern, courtesy of Bruno V--- ----- Original Message ----- From: "Dan Kortschak" To: Sent: Monday, January 25, 2010 4:00 PM Subject: Re: [Bioperl-l] Transcribe in bioperl >A reverse_translate to IUPAC degenerate codes is not a bad idea, > particularly for PCR primer design. > > Dan > > On Mon, 2010-01-25 at 09:05 -0500, bioperl-l-request at lists.open-bio.org > wrote: >> On Jan 24, 2010, at 10:39 PM, Chris Fields wrote: >> >> > ...And there might be a case for adding the analogous >> reverse_translate(). >> >> Bah. Meant reverse_transcribe(). Ah well. 
>>
>> chris
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From dan.kortschak at adelaide.edu.au Mon Jan 25 21:38:44 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 26 Jan 2010 08:08:44 +1030
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
References: <1264453237.4552.3.camel@epistle> <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
Message-ID: <1264455524.4552.23.camel@epistle>

Good to see that these ideas have been considered.

I'd be interested to see this discussion, or at least the point dealing with the problems that might arise. I'm at a loss as to how ambiguity codes can't completely describe all possible coding sequences for any given codon table (via Bio::Tools::CodonTable - in fact this already has the revtranslate that could be fitted into a Bio::PrimarySeq method - to answer Mark and Jason's comments, I think that /if/ a reverse_translate method exists, it makes logical sense to have it tied to a sequence object, calling the B:T:CT method on the seq object itself rather than only in Bio::Tools, no?). Pete, can you provide an example of the problems?

thanks
Dan

On Mon, 2010-01-25 at 21:24 +0000, Peter wrote:
> I would say it could be a bad idea. For any protein string there are
> multiple possible back translations, and this cannot be captured
> fully as a nucleotide string even using the IUPAC ambiguity chars.

From lpaulet at ual.es Mon Jan 25 21:53:07 2010
From: lpaulet at ual.es (lpaulet at ual.es)
Date: Mon, 25 Jan 2010 22:53:07 +0100
Subject: [Bioperl-l] HTMLResultWriter
In-Reply-To:
References: <4B5DFE53.2000201@ual.es>
Message-ID: <20100125225307.2zl2cn2hkcsgccso@webmail.ual.es>

Thanks Dave and Mark.
Quoting Dave Messina : >> my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e >> $E_value -b 20000 -o $outputfilenameB/; > >> while( my $result = _$blast_report_->next_result ) { # get a result >> from Bio::SearchIO parsing or build it up in memory > > > _$blast_report_ is not a valid variable name, as far as I know. Plus > there's a space between report and the final '_' in the first of > the above two lines. > > Does this code compile? > > Dave > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From rtbio.2009 at gmail.com Mon Jan 25 22:35:32 2010 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Mon, 25 Jan 2010 23:35:32 +0100 Subject: [Bioperl-l] Regarding blast in Bioperl In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz> Message-ID: Hello Russell, Thank you very much for your reply. My problem is that Remote blast is getting well executed with my code and I am getting the .out file with sequences producing significant alignments. But, when I am trying to retrieve the sequences into an array @seqs, I am able to retrieve all the sequences except for the first hit. If the number of hits that I get in the .out file to be 3, I am able to retrieve only 2 hits i.e., I am able to get only 2 sequences. If there is only one significant hit for my sequence, then the name and description of the sequence appears in the .out file, but I am unable to get it into the array,the array count shows 0 and there would not be any sequence in the array. I hope that you have got me now. 
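
The symptom described above (always one sequence fewer than the number of hits, and none when there is a single hit) is what one would expect if the hit iterator is advanced once before the collecting loop starts. The code posted below does call $result->next_hit() once, to write a debug file, before its while loop, which may be the cause. A minimal pure-Perl sketch of the effect, assuming a hypothetical closure-based iterator (make_hit_iterator, collect_hits) in place of the real Bio::Search::Result object:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a BLAST result object: each call returns the
# next hit name, then undef once the list is exhausted.
sub make_hit_iterator {
    my @hits  = @_;
    my $index = 0;
    return sub { return $index < @hits ? $hits[ $index++ ] : undef; };
}

# Collect all remaining hits; optionally "peek" at one first, the way a
# debug print of next_hit() before the loop would.
sub collect_hits {
    my ($next_hit, $peek_first) = @_;
    $next_hit->() if $peek_first;    # this call consumes the first hit
    my @seqs;
    while ( defined( my $hit = $next_hit->() ) ) {
        push @seqs, $hit;
    }
    return @seqs;
}

my @with_peek    = collect_hits( make_hit_iterator(qw(hitA hitB hitC)), 1 );
my @without_peek = collect_hits( make_hit_iterator(qw(hitA hitB hitC)), 0 );
my @single_peek  = collect_hits( make_hit_iterator(qw(hitA)),           1 );

print "3 hits, peeked first: ", scalar(@with_peek),    " collected\n";
print "3 hits, no peek:      ", scalar(@without_peek), " collected\n";
print "1 hit, peeked first:  ", scalar(@single_peek),  " collected\n";
```

If the debug output is wanted, printing the hit inside the loop should avoid losing it. BioPerl's in-memory result objects also provide a rewind method (Bio::Search::Result::GenericResult) that resets the hit iterator, so calling $result->rewind before the loop is another option, assuming the result object in use supports it.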
Here comes my code, use Bio::SearchIO; use Bio::Search::Result::BlastResult; use Bio::Perl; use Bio::Tools::Run::RemoteBlast; use Bio::Seq; use Bio::SeqIO; use Bio::DB::GenBank; $serverpath = "/srv/www/htdocs/rain/RNAi"; $serverurl = "http://141.84.66.66/rain/RNAi"; $outfile = $serverpath."/rnairesult_".time().".html"; $nuc = $serverpath."/nuc".time().".txt"; $debugfile = $serverpath."/debug_".time().".txt"; $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; my $outstring =""; &parse_form; print "Content-type: text/html\n\n"; print "\n"; print "RNAi Result"; print " \n"; print "\n"; print "\n"; print " Your results will appear here
"; print " Please be patient, runtime can be up to 5 minutes
"; print " This page will automatically reload in 30 seconds."; print "\n"; print "\n"; defined(my $pid = fork) or die "Can't fork: $!"; exit if $pid; open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; open(OUTFILE, '>',$outfile); print OUTFILE "\n RNAi Result \n \n \n Your results will appear here
Please be patient, runtime can be up to 5 minutes
This page will automatically reload in 30 seconds
\n \n"; close(OUTFILE); @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); $in{'Inputseq'} =~ s/>.*$//m; $in{'Inputseq'} =~ s/[^TAGC]//gim; $in{'Inputseq'} =~ tr/actg/ACTG/; @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, $in{'Threshold'}); sub blastcode { $inpu1= $_[0]; $organ= $_[1]; open(NUC,'>',$nuc); print NUC $inpu1,"\n"; close(NUC); my $prog = 'blastn'; my $db = 'refseq_rna'; my $e_val= '1e-10'; my $organism= $organ; $gb = new Bio::DB::GenBank; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO', '-Organism' => $organism ); # open(OUTFILE,'>',$debugfile); # print OUTFILE @params; # close(OUTFILE); my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => "$organ\[ORGN]"); #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); #change a paramter #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]'; #change a paramter # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; my $v = 1; #$v is just to turn on and off the messages my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]"); while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. open(OUTFILE,'>',$debugfile); print OUTFILE $input; close(OUTFILE); my $r = $factory->submit_blast($input); open(OUTFILE,'>',$debugfile); # print OUTFILE $r; close(OUTFILE); print STDERR "waiting...." if($v>0); while ( my @rids = $factory->each_rid ) { foreach my $rid ( @rids ) { my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } print STDERR "." 
if ( $v > 0 ); sleep 5; } else { my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $result->next_hit(); close(BLASTDEBUGFILE); my $filename = $serverpath."/blastdata_".time()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); while ( my $hit = $result->next_hit ) { next unless ( $v >= 0); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string $dummy++; open(OUTFILE,'>',$debugfile); open(OUTFILE,'>',$debugfile); # print OUTFILE $dna; close(OUTFILE); push(@seqs,$dna); } } } } } $warum=scalar(@seqs); open(OUTFILE,'>',$debugfile); print OUTFILE $warum; # print OUTFILE @seqs; close(OUTFILE); return(@seqs); } open(OUTFILE, '>',$outfile) || die ; print OUTFILE "\n RNAi Result \n \n

Inputsequence:
"; for ($i=0; $i\n"; } } print OUTFILE "

"; $z=@compseqs; for($k=0;$k<$z;$k++) { print OUTFILE "

Compare Sequence:
"; for ($i=0; $i\n"; } } print OUTFILE "

"; } print OUTFILE "

Window:
$in{'Windowsize'}

Threshold:
$in{'Threshold'}

"; my $j=0; for ($i=0; $i{similar}<=$in{'Threshold'}){ $j=$in{'Windowsize'}; } $height=$out[$i]->{similar}*5; } if ($j>0) { print OUTFILE ""; $outstring .= "".substr ($in{'Inputseq'}, $i, 1).""; $j--; } else { print OUTFILE ""; $outstring .= "".substr ($in{'Inputseq'}, $i, 1).""; } if ( ($i+1)%10==0){ $outstring .= " "; } if ( ($i+1)%60==0){ $outstring .= "
\n"; } if ( ($i+1)%800==0){ print OUTFILE "

\n"; } } print OUTFILE "

$outstring"; #foreach (@out) { #print OUTFILE "

Sequence: $_->{sequence}: $_->{similar} matchs

"; #if ($_->{similar}<=$in{'Threshold'}){ # } #} print OUTFILE "\n\n"; close OUTFILE; #nameprint(); sub parse_form { local ($buffer, @pairs, $pair, $name, $value); # Read in text $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/; if ($ENV{'REQUEST_METHOD'} eq "POST") { read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); } else { $buffer = $ENV{'QUERY_STRING'}; } @pairs = split(/&/, $buffer); foreach $pair (@pairs) { ($name, $value) = split(/=/, $pair); $value =~ tr/+/ /; $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; $in{$name} = $value; } } Regards, Roopa. On Mon, Jan 25, 2010 at 10:14 PM, Smithies, Russell < Russell.Smithies at agresearch.co.nz> wrote: > That's a fair mix of incomplete code you've supplied!! > Did you read the documentation for RemoteBlast? The example there will do > 99% of what you want. > http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm > > I'm not entirely sure what you're trying to do (as you've left out a bit of > your code) but I assume you're trying to retrieve and print the sequence for > each hit. > > Here's something that works, not sure exactly what/why you want to print > but it should get you a bit further. > > --Russell > > > ================================ > #!perl -w > > use Bio::Tools::Run::RemoteBlast; > use Bio::DB::GenBank; > > use CGI ':standard'; > > use strict; > > my $q = new CGI; > > my @params = ( > -prog => 'blastn', > -data => 'nr', > -expect => '1e-30', > -entrez_query => 'Homo sapiens [ORGN]', > -readmethod => 'SearchIO' > ); > > my $gb = Bio::DB::GenBank->new; > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > #$v is just to turn on and off the messages > my $v = 1; > > my $str = Bio::SeqIO->new( -file => 'test.faa', -format => "fasta" ); > > while ( my $input = $str->next_seq() ) { > > my $r = $factory->submit_blast($input); > > print STDERR "waiting..." 
if ( $v > 0 ); > while ( my @rids = $factory->each_rid ) { > foreach my $rid (@rids) { > my @seqs = (); > my $rc = $factory->retrieve_blast($rid); > if ( !ref($rc) ) { > if ( $rc < 0 ) { > $factory->remove_rid($rid); > } > print STDERR "." if ( $v > 0 ); > sleep 5; > } > else { > my $result = $rc->next_result(); > > #save the blast output > my $filename = $result->query_accession . '.out'; > $factory->save_output($filename); > $factory->remove_rid($rid); > print "\nQuery Name: ", $result->query_name(), "\n"; > while ( my $hit = $result->next_hit ) { > > # store the hit sequences > push @seqs, $gb->get_Seq_by_version( $hit->name ); > > next unless ( $v > 0 ); > print "\thit name is ", $hit->name, "\n"; > while ( my $hsp = $hit->next_hsp ) { > print "\t\tscore is ", $hsp->score, "\n"; > } > } > > ## print the seqs you've retrieved?? > open( OUTFILE, '>', $result->query_accession . '.htm' ); > print OUTFILE $q->start_html('RNAi Result'), > $q->h1('RNAi Result'), > $q->h2('Input'), > $q->pre( toString($input) ), > $q->h2('Output'); > > foreach (@seqs) { > > #there's probably a better way of printing the seq > print OUTFILE $q->pre( toString($_) ); > } > print OUTFILE $q->end_html; > close OUTFILE; > } > } > } > } > > sub toString { > my $s = shift; > return '>' . $s->display_id . " " . $s->desc . "\n" . $s->seq; > } > > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> ======================================================================= > From ajmackey at gmail.com Tue Jan 26 13:24:43 2010 From: ajmackey at gmail.com (Aaron Mackey) Date: Tue, 26 Jan 2010 08:24:43 -0500 Subject: [Bioperl-l] Transcribe in bioperl In-Reply-To: <1264455524.4552.23.camel@epistle> References: <1264453237.4552.3.camel@epistle> <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com> <1264455524.4552.23.camel@epistle> Message-ID: <24c96eca1001260524s3d46e850hfdcc461e22210972@mail.gmail.com> There's also Bio::Tools::IUPAC; given a sequence with IUPAC ambiguity codes, it provides a SeqIO stream that enumerates all the possible unambiguous realizations. Not the right solution for every situation, but quite useful when you need it. -Aaron On Mon, Jan 25, 2010 at 4:38 PM, Dan Kortschak < dan.kortschak at adelaide.edu.au> wrote: > Good to see that these ideas have been considered. > > I'd be interested to see this discussion, or at least the point dealing > with the problems that might arise. I'm at a loss as to how ambiguity > codes can't completely describe all possible coding sequences for any > given codon table (via Bio::Tools::CodonTable - in fact this already has > the revtranslate that could be fitted into a Bio::PrimarySeq method - to > answer Mark and Jason's comments, I think that /if/ a reverse_translate > method exists, it makes logical sense to have it tied to a sequence > object, calling the B:T:CT method on the seq object itself rather than > only in Bio::Tools, 2?). Pete, tcn you provide an example of the > problems? > > thanks > Dan > > On Mon, 2010-01-25 at 21:24 +0000, Peter wrote: > > I would say it could be a bad idea. For any protein string there are > > multiple possible back translations, and this cannot be captured > > fully as a nucleotide string even using the IUPAC ambiguity chars. 
> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From nml5566 at gmail.com Tue Jan 26 21:10:54 2010 From: nml5566 at gmail.com (Nathan Liles) Date: Tue, 26 Jan 2010 15:10:54 -0600 Subject: [Bioperl-l] SVN access Message-ID: <4B5F5A5E.2070406@gmail.com> Does anyone know who I need to talk to for getting developer access for the Bioperl SVN? I want to submit a patch to the genbank2gff3 converter. Thanks, Nathan From Russell.Smithies at agresearch.co.nz Wed Jan 27 01:40:40 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 27 Jan 2010 14:40:40 +1300 Subject: [Bioperl-l] how to retrieve organism name from accession number? In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> Grrrrrr, I hate eutils!!!! 
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
STACK: get_desc.pl:32
-----------------------------------------------------------

Nice error message though :-)

--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> Sent: Monday, 11 January 2010 10:05 a.m.
> To: 'Chris Fields'
> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
>
> I've started to go off eUtils recently (not BioPerl's fault) as I've often
> been finding that with large queries, chunks of the resulting data is
> missing.
> For example, before Xmas I was creating species-specific databases by
> using eUtils to get a list of GI numbers back for a taxid, then retrieving
> the fasta sequences in chunks of 500.
> Very regularly, in the middle of the fasta there would be a message about
> resource unavailable eg.
> >test_sequence_1
> TACGATCATCGCTResource UnavailableTACGACTCTGCT
> >test_sequence_2
> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
>
> Often this wasn't detected until formatdb complained about invalid
> characters.
> Inquiries to NCBI as to why this was happening and what to do about it
> returned stupid answers ("do each sequence manually thru the web
> interface", or "use eUtils").
> As we have a nice fast network connection, I now prefer to download very
> large gzip files (i.e.
all of refseq) and extract what I need. > > I can't help but think that NCBI could solve a lot of problems if they > gzipped the output from eUtils queries - it's something I've requested > regularly for the last 5 years or so!! > > --Russell > > > > -----Original Message----- > > From: Chris Fields [mailto:cjfields at illinois.edu] > > Sent: Monday, 11 January 2010 9:50 a.m. > > To: Smithies, Russell > > Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org' > > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > > number? > > > > One could also use Bio::DB::Taxonomy, which indexes the same files or > > (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the > > details). > > > > chris > > > > On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote: > > > > > An alternate non-BioPerly way (that may be faster given NCBI's > flakiness > > lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip > > files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash > and > > do lookups. > > > In that same dir, taxdump.tar.gz contains a file called names.dmp > which > > lists taxids and descriptions (and synonyms) > > > > > > If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I > > could do this: > > > > > > my $taxid = $gi_taxid_nucl{$accession}; > > > my $org_name = $names{$taxid}; > > > > > > --Russell > > > > > > > > >> -----Original Message----- > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > > >> Sent: Saturday, 26 December 2009 4:52 p.m. > > >> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org > > >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession > > >> number? 
> > >>
> > >> Bhakti,
> > >> The following example (using EUtilities) may serve your purpose:
> > >>
> > >> use Bio::DB::EUtilities;
> > >>
> > >> my (%taxa, @taxa);
> > >> my (%names, %idmap);
> > >>
> > >> # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
> > >> # (probably)
> > >>
> > >> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> > >>
> > >> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> > >>                                        -db => 'taxonomy',
> > >>                                        -dbfrom => 'protein',
> > >>                                        -correspondence => 1,
> > >>                                        -id => \@ids);
> > >>
> > >> # iterate through the LinkSet objects
> > >> while (my $ds = $factory->next_LinkSet) {
> > >>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> > >> }
> > >>
> > >> @taxa = @taxa{@ids};
> > >>
> > >> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> > >>                                     -db => 'taxonomy',
> > >>                                     -id => \@taxa );
> > >>
> > >> while (local $_ = $factory->next_DocSum) {
> > >>     $names{($_->get_contents_by_name('TaxId'))[0]} =
> > >>         ($_->get_contents_by_name('ScientificName'))[0];
> > >> }
> > >>
> > >> foreach (@ids) {
> > >>     $idmap{$_} = $names{$taxa{$_}};
> > >> }
> > >>
> > >> # %idmap is
> > >> # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> > >> # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> > >> # 68536103 => 'Corynebacterium jeikeium K411'
> > >> # 730439 => 'Bacillus caldolyticus'
> > >> # 89318838 => undef (this record has been removed from the db)
> > >>
> > >> 1;
> > >>
> > >> You probably will need to break up your 30000 into chunks
> > >> (say, 1000-3000 each), and do the above on each chunk with a
> > >>
> > >> sleep 3;
> > >>
> > >> or so separating the queries.
> > >> MAJ
> > >> ----- Original Message -----
> > >> From: "Bhakti Dwivedi"
> > >> To:
> > >> Sent: Friday, December 25, 2009 9:46 PM
> > >> Subject: [Bioperl-l] how to retrieve organism name from accession number?
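The chunk-and-sleep advice at the end of Mark's reply can be sketched like this. It is a minimal illustration, not his code: the helper name for_each_chunk and its arguments are hypothetical, and the callback here only records batch sizes. In real use the callback would presumably build the elink and esummary factories from the quoted example with -id set to the current chunk.

```perl
use strict;
use warnings;

# Call $callback on successive batches of at most $size ids, pausing
# $pause seconds between batches (no pause after the final batch).
# Returns the number of batches processed.
sub for_each_chunk {
    my ($ids, $size, $pause, $callback) = @_;
    die "chunk size must be positive\n" unless $size > 0;

    my @queue   = @$ids;    # work on a copy so the caller's list is untouched
    my $batches = 0;
    while (my @chunk = splice(@queue, 0, $size)) {
        $callback->(\@chunk);
        $batches++;
        sleep $pause if $pause && @queue;
    }
    return $batches;
}

# Demo with a tiny id list and no pause; a real run might use something
# like 1000-3000 ids per chunk and a pause of about 3 seconds.
my @ids = (1 .. 7);
my @sizes;
my $batches = for_each_chunk(\@ids, 3, 0, sub { push @sizes, scalar @{ $_[0] } });
print "$batches batches: @sizes\n";    # prints "3 batches: 3 3 1"
```

Keeping the network call inside the callback means a failed chunk can be retried or logged on its own without restarting the whole list.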
> > >>
> > >>
> > >>> Hi,
> > >>>
> > >>> Does anyone know how to retrieve the "Source" or the "Species name" given
> > >>> the accession number using Bioperl. I have these 30,000 accession numbers
> > >>> for which I need to get the source organisms. Any kind of help will be
> > >>> appreciated.
> > >>>
> > >>> Thanks
> > >>>
> > >>> BD
> > >>> _______________________________________________
> > >>> Bioperl-l mailing list
> > >>> Bioperl-l at lists.open-bio.org
> > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>>
> > >>>
> > >>
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > =======================================================================
> > > Attention: The information contained in this message and/or attachments
> > > from AgResearch Limited is intended only for the persons or entities
> > > to which it is addressed and may contain confidential and/or privileged
> > > material. Any review, retransmission, dissemination or other use of, or
> > > taking of any action in reliance upon, this information by persons or
> > > entities other than the intended recipients is prohibited by AgResearch
> > > Limited. If you have received this message in error, please notify the
> > > sender immediately.
> > > =======================================================================
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Wed Jan 27 01:46:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 26 Jan 2010 19:46:26 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
Message-ID: <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>

It's unfortunate but I have heard this problem popping up quite a bit more frequently lately. Not to push too many buttons but NCBI isn't very forthcoming with help these days; they have become quite insular. Not sure if they're short-staffed due to budget or if there are other issues.

chris

On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:

> Grrrrrr, I hate eutils!!!!
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> STACK: get_desc.pl:32
> -----------------------------------------------------------
>
> Nice error message though :-)
>
> --Russell

From Russell.Smithies at agresearch.co.nz Wed Jan 27 01:59:15 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 27 Jan 2010 14:59:15 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>

I've had a wide selection of errors lately:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
STACK: get_desc.pl:32
-----------------------------------------------------------

And I never get a good explanation from NCBI or suggestions on how to avoid it.

--Russell

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Wednesday, 27 January 2010 2:46 p.m.
> To: Smithies, Russell
> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
>
> It's unfortunate but I have heard this problem popping up quite a bit more
> frequently lately. Not to push too many buttons but NCBI isn't very
> forthcoming with help these days; they have become quite insular. Not
> sure if they're short-staffed due to budget or if there are other issues.
>
> chris

From cjfields at illinois.edu Wed Jan 27 02:42:22 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 26 Jan 2010 20:42:22 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
Message-ID: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>

Makes me wonder if they're pushing more users towards the SOAP-based services and away from eutils.
chris On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote: > I've had a wide selection of errors lately: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable) > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 > STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 > STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 > STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 > STACK: get_desc.pl:32 > ----------------------------------------------------------- > > And I never get a good explanation from NCBI or suggestions on how to avoid it. > > > --Russell > > >> -----Original Message----- >> From: Chris Fields [mailto:cjfields at illinois.edu] >> Sent: Wednesday, 27 January 2010 2:46 p.m. >> To: Smithies, Russell >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org' >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession >> number? >> >> It's unfortunate but I have heard this problem popping up quite a bit more >> frequently lately. Not to push too many buttons but NCBI isn't very >> forthcoming with help these days; they have become quite insular. Not >> sure if they're short-staffed due to budget or if there are other issues. >> >> chris >> >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote: >> >>> Grrrrrr, I hate eutils!!!! 
>>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 >> (Connection refused) >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw >> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 >>> STACK: Bio::Tools::EUtilities::parse_data >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 >>> STACK: Bio::Tools::EUtilities::get_ids >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 >>> STACK: Bio::DB::EUtilities::get_ids >> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 >>> STACK: get_desc.pl:32 >>> ----------------------------------------------------------- >>> >>> >>> Nice error message though :-) >>> >>> >>> --Russell >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell >>>> Sent: Monday, 11 January 2010 10:05 a.m. >>>> To: 'Chris Fields' >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org' >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession >>>> number? >>>> >>>> I've started to go off eUtils recently (not BioPerl's fault) as I've >> often >>>> been finding that with large queries, chunks of the resulting data is >>>> missing. >>>> For example, before Xmas I was creating species-specific databases by >>>> using eUtils to get a list of GI numbers back for a taxid, then >> retrieving >>>> the fasta sequences in chunks of 500. >>>> Very regularly, in the middle of the fasta there would be a message >> about >>>> resource unavailable eg. >>>>> test_sequence_1 >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT >>>>> test_sequence_2 >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT >>>> >>>> Often this wasn't detected until formatdb complained about invalid >>>> characters. 
>>>> Inquiries to NCBI as to why this was happening and what to do about it >>>> returned stupid answers ("do each sequence manually thru the web >>>> interface", or "use eUtils"). >>>> As we have a nice fast network connection, I now prefer to download >> very >>>> large gzip files (i.e. all of refseq) and extract what I need. >>>> >>>> I can't help but think that NCBI could solve a lot of problems if they >>>> gzipped the output from eUtils queries - it's something I've requested >>>> regularly for the last 5 years or so!! >>>> >>>> --Russell >>>> >>>> >>>>> -----Original Message----- >>>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>>> Sent: Monday, 11 January 2010 9:50 a.m. >>>>> To: Smithies, Russell >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org' >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession >>>>> number? >>>>> >>>>> One could also use Bio::DB::Taxonomy, which indexes the same files or >>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for >> the >>>>> details). >>>>> >>>>> chris >>>>> >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote: >>>>> >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's >>>> flakiness >>>>> lately) would be to download the gi_taxid_nucl.zip or >> gi_taxid_prot.zip >>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash >>>> and >>>>> do lookups. >>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp >>>> which >>>>> lists taxids and descriptions (and synonyms) >>>>>> >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I >>>>> could do this: >>>>>> >>>>>> my $taxid = $gi_taxid_nucl{$accession}; >>>>>> my $org_name = $names{$taxid}; >>>>>> >>>>>> --Russell >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. 
Jensen >>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m. >>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org >>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from >> accession >>>>>>> number? >>>>>>> >>>>>>> Bhakti, >>>>>>> The following example (using EUtilities) may serve your purpose: >>>>>>> >>>>>>> use Bio::DB::EUtilities; >>>>>>> >>>>>>> my (%taxa, @taxa); >>>>>>> my (%names, %idmap); >>>>>>> >>>>>>> # these are protein ids; nuc ids will work by changing -dbfrom => >>>>>>> 'nucleotide', >>>>>>> # (probably) >>>>>>> >>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439); >>>>>>> >>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink', >>>>>>> -db => 'taxonomy', >>>>>>> -dbfrom => 'protein', >>>>>>> -correspondence => 1, >>>>>>> -id => \@ids); >>>>>>> >>>>>>> # iterate through the LinkSet objects >>>>>>> while (my $ds = $factory->next_LinkSet) { >>>>>>> $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0] >>>>>>> } >>>>>>> >>>>>>> @taxa = @taxa{@ids}; >>>>>>> >>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', >>>>>>> -db => 'taxonomy', >>>>>>> -id => \@taxa ); >>>>>>> >>>>>>> while (local $_ = $factory->next_DocSum) { >>>>>>> $names{($_->get_contents_by_name('TaxId'))[0]} = >>>>>>> ($_->get_contents_by_name('ScientificName'))[0]; >>>>>>> } >>>>>>> >>>>>>> foreach (@ids) { >>>>>>> $idmap{$_} = $names{$taxa{$_}}; >>>>>>> } >>>>>>> >>>>>>> # %idmap is >>>>>>> # 1621261 => 'Mycobacterium tuberculosis H37Rv' >>>>>>> # 20807972 => 'Thermoanaerobacter tengcongensis MB4' >>>>>>> # 68536103 => 'Corynebacterium jeikeium K411' >>>>>>> # 730439 => 'Bacillus caldolyticus' >>>>>>> # 89318838 => undef (this record has been removed from the db) >>>>>>> >>>>>>> 1; >>>>>>> >>>>>>> You probably will need to break up your 30000 into chunks >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a >>>>>>> >>>>>>> sleep 3; >>>>>>> >>>>>>> or so separating the queries. 
>>>>>>> MAJ >>>>>>> ----- Original Message ----- >>>>>>> From: "Bhakti Dwivedi" >>>>>>> To: >>>>>>> Sent: Friday, December 25, 2009 9:46 PM >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession >>>>> number? >>>>>>> >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species name" >>>>>>> given >>>>>>>> the accession number using Bioperl. I have these 30,000 accession >>>>>>> numbers >>>>>>>> for which I need to get the source organisms. Any kind of help >> will >>>>> be >>>>>>>> appreciated. >>>>>>>> >>>>>>>> Thanks >>>>>>>> >>>>>>>> BD >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>> ======================================================================= >>>>>> Attention: The information contained in this message and/or >>>> attachments >>>>>> from AgResearch Limited is intended only for the persons or entities >>>>>> to which it is addressed and may contain confidential and/or >>>> privileged >>>>>> material. Any review, retransmission, dissemination or other use of, >>>> or >>>>>> taking of any action in reliance upon, this information by persons or >>>>>> entities other than the intended recipients is prohibited by >>>> AgResearch >>>>>> Limited. If you have received this message in error, please notify >> the >>>>>> sender immediately. 

From Russell.Smithies at agresearch.co.nz Wed Jan 27 02:45:58 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 27 Jan 2010 15:45:58 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz> <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>

Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi still works if you don't mind a bit of manual button clicking.
It's handling chunks of 100,000 records OK (today).

--Russell

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Wednesday, 27 January 2010 3:42 p.m.
> To: Smithies, Russell
> Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
>
> Makes me wonder if they're pushing more users towards the SOAP-based
> services and away from eutils.
>
> chris
>
> On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
>
> > I've had a wide selection of errors lately:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > STACK: get_desc.pl:32
> > -----------------------------------------------------------
> >
> > And I never get a good explanation from NCBI or suggestions on how to avoid it.
> >
> > --Russell
> >
> >> -----Original Message-----
> >> From: Chris Fields [mailto:cjfields at illinois.edu]
> >> Sent: Wednesday, 27 January 2010 2:46 p.m.
> >> To: Smithies, Russell
> >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> >>
> >> It's unfortunate but I have heard this problem popping up quite a bit more
> >> frequently lately. Not to push too many buttons but NCBI isn't very
> >> forthcoming with help these days; they have become quite insular. Not
> >> sure if they're short-staffed due to budget or if there are other issues.
> >>
> >> chris
> >>
> >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
> >>
> >>> Grrrrrr, I hate eutils!!!!
> >>>
> >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
> >>> STACK: Error::throw
> >>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> >>> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> >>> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> >>> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> >>> STACK: get_desc.pl:32
> >>> -----------------------------------------------------------
> >>>
> >>> Nice error message though :-)
> >>>
> >>> --Russell
> >>>
> >>>> -----Original Message-----
> >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> >>>> Sent: Monday, 11 January 2010 10:05 a.m.
> >>>> To: 'Chris Fields'
> >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> >>>>
> >>>> I've started to go off eUtils recently (not BioPerl's fault) as I've often
> >>>> been finding that with large queries, chunks of the resulting data is
> >>>> missing.
> >>>> For example, before Xmas I was creating species-specific databases by
> >>>> using eUtils to get a list of GI numbers back for a taxid, then retrieving
> >>>> the fasta sequences in chunks of 500.
> >>>> Very regularly, in the middle of the fasta there would be a message about
> >>>> resource unavailable eg.
> >>>> >test_sequence_1
> >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT
> >>>> >test_sequence_2
> >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
> >>>>
> >>>> Often this wasn't detected until formatdb complained about invalid
> >>>> characters.
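One way to catch this sort of corruption before formatdb does is to scan the downloaded FASTA for characters outside the expected alphabet. The following is a minimal sketch only, assuming nucleotide sequences and the IUPAC nucleotide codes; `find_bad_fasta_lines` is a hypothetical helper name, not a BioPerl function.

```perl
use strict;
use warnings;

# Scan a FASTA stream and return one [header, line_number, offending_text]
# entry for every sequence line that contains a character outside $allowed.
sub find_bad_fasta_lines {
    my ($fh, $allowed) = @_;
    # Assumed default: IUPAC nucleotide codes (both cases) plus '*' and '-'.
    $allowed = 'ACGTUNRYSWKMBDHVacgtunryswkmbdhv*-' unless defined $allowed;
    my (@problems, $header);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;                       # tolerate DOS line endings
        next if $line eq '';
        if ($line =~ /^>(.*)/) { $header = $1; next; }
        if ($line =~ /([^\Q$allowed\E]+)/) {    # first run of unexpected characters
            my $name = defined $header ? $header : '(no header)';
            push @problems, [ $name, $., $1 ];
        }
    }
    return @problems;
}
```

Running this over each retrieved chunk and re-fetching any chunk that reports a problem would flag the "Resource Unavailable" case quoted above. For protein FASTA a different `$allowed` string would be needed.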
> >>>> Inquiries to NCBI as to why this was happening and what to do about it
> >>>> returned stupid answers ("do each sequence manually thru the web
> >>>> interface", or "use eUtils").
> >>>> As we have a nice fast network connection, I now prefer to download very
> >>>> large gzip files (i.e. all of refseq) and extract what I need.
> >>>>
> >>>> I can't help but think that NCBI could solve a lot of problems if they
> >>>> gzipped the output from eUtils queries - it's something I've requested
> >>>> regularly for the last 5 years or so!!
> >>>>
> >>>> --Russell
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Chris Fields [mailto:cjfields at illinois.edu]
> >>>>> Sent: Monday, 11 January 2010 9:50 a.m.
> >>>>> To: Smithies, Russell
> >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
> >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> >>>>>
> >>>>> One could also use Bio::DB::Taxonomy, which indexes the same files or
> >>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the
> >>>>> details).
> >>>>>
> >>>>> chris
> >>>>>
> >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> >>>>>
> >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's flakiness
> >>>>>> lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip
> >>>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and
> >>>>>> do lookups.
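The download-and-look-up approach might be sketched like this. It is a sketch under assumptions: the GI-to-taxid dump is taken to be tab-separated "gi, taxid" lines, and names.dmp (from the taxonomy dump in the same FTP directory) is taken to use "tab, pipe, tab" field separators with a name-class column. The function names are hypothetical, and the tables are keyed on GI numbers rather than accessions.

```perl
use strict;
use warnings;

# Load "gi<TAB>taxid" lines into a hash. $wanted is an optional hash ref of
# GIs to keep, so the whole (very large) file need not be held in memory.
sub load_gi_taxid {
    my ($fh, $wanted) = @_;
    my %gi2taxid;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid;
        next if $wanted && !exists $wanted->{$gi};
        $gi2taxid{$gi} = $taxid;
    }
    return \%gi2taxid;
}

# Load taxid => scientific name from names.dmp-style lines:
#   tax_id \t|\t name \t|\t unique name \t|\t name class \t|
sub load_scientific_names {
    my ($fh) = @_;
    my %names;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;                          # strip the trailing terminator
        my ($taxid, $name, $unique, $class) = split /\t\|\t/, $line;
        next unless defined $class && $class eq 'scientific name';
        $names{$taxid} = $name;
    }
    return \%names;
}

# Two-step lookup: GI -> taxid -> organism name (undef if either step fails).
sub organism_for_gi {
    my ($gi, $gi2taxid, $names) = @_;
    my $taxid = $gi2taxid->{$gi};
    return unless defined $taxid;
    return $names->{$taxid};
}
```

Passing a `$wanted` hash of your own 30,000 GIs keeps the first table small; synonyms and common names in names.dmp are skipped so that each taxid maps to exactly one name.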
> >>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp which
> >>>>>> lists taxids and descriptions (and synonyms)
> >>>>>>
> >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
> >>>>>> could do this:
> >>>>>>
> >>>>>> my $taxid = $gi_taxid_nucl{$accession};
> >>>>>> my $org_name = $names{$taxid};
> >>>>>>
> >>>>>> --Russell

From maj at fortinbras.us Wed Jan 27 15:14:22 2010
From: maj at fortinbras.us (Mark A.
Jensen)
Date: Wed, 27 Jan 2010 10:14:22 -0500
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife><18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz><18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz><18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz><4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu><18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz> <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
Message-ID:

Precisely the MO behind SoapEU...get the jump on 'em.

----- Original Message -----
From: "Chris Fields"
To: "Smithies, Russell"
Cc: ; "'Mark A. Jensen'"
Sent: Tuesday, January 26, 2010 9:42 PM
Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?

> Makes me wonder if they're pushing more users towards the SOAP-based services
> and away from eutils.
>
> chris

From bhakti.dwivedi at gmail.com Wed Jan 27 19:42:06 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Wed, 27 Jan 2010 14:42:06 -0500
Subject: [Bioperl-l] Designing primers from multiple sequence alignment of amino acid sequences
Message-ID:

Hi,

I have to design primers from the multiple sequence alignments of amino acid sequences. The sequences I am working with are quite diverged and often the available primer design programs (such as CODEHOP/iCODEHOP) fail to find any primer sets. But, when I look at the alignment manually, I can see regions that I could use to make primers.

So I designed the degenerate primers the old-fashioned way, starting from selecting the conserved regions (6-10aa long) from the alignment to translating the selected regions to DNA using the appropriate codon usage table, and then finally checking the primer sets (potential forward and reverse primers) using tools like OLIGOANALYZER. In the end, I did find a few good primer sets, but getting them to work in reality is something I will have to wait and see.
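The back-translation step of that manual process could be automated along these lines. This is a rough sketch, not the poster's method: it assumes the standard genetic code, ignores codon usage bias, and simply collapses the possible bases at each codon position into one IUPAC ambiguity code. `degenerate_primer` is a hypothetical name.

```perl
use strict;
use warnings;

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
my $AAS   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my @BASES = qw(T C A G);

# IUPAC ambiguity codes, keyed by the sorted set of bases they represent.
my %IUPAC = (
    A   => 'A', C   => 'C', G   => 'G', T   => 'T',
    AG  => 'R', CT  => 'Y', CG  => 'S', AT  => 'W', GT => 'K', AC => 'M',
    CGT => 'B', AGT => 'D', ACT => 'H', ACG => 'V', ACGT => 'N',
);

# Build amino acid => list of codons.
my %codons_for;
my $i = 0;
for my $b1 (@BASES) {
    for my $b2 (@BASES) {
        for my $b3 (@BASES) {
            push @{ $codons_for{ substr($AAS, $i++, 1) } }, "$b1$b2$b3";
        }
    }
}

# Back-translate a peptide into a degenerate DNA sequence.
# Returns (primer, degeneracy), where degeneracy is the number of distinct
# plain sequences the degenerate primer stands for.
sub degenerate_primer {
    my ($peptide) = @_;
    my ($primer, $degeneracy) = ('', 1);
    for my $aa (split //, uc $peptide) {
        my $codons = $codons_for{$aa} or die "unknown amino acid '$aa'\n";
        for my $pos (0 .. 2) {
            my %seen;
            $seen{ substr($_, $pos, 1) } = 1 for @$codons;
            $primer     .= $IUPAC{ join '', sort keys %seen };
            $degeneracy *= scalar keys %seen;
        }
    }
    return ($primer, $degeneracy);
}
```

Because the collapse is done per position, the six-codon residues (L, S, R) produce codes that also match codons of other amino acids, so the reported degeneracy is an upper bound. Regions rich in M and W give the least degenerate primers, and a reverse primer would additionally need to be reverse-complemented.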

While doing this process manually, I really felt the need to automate it (it was not just one alignment I did, I worked with several of those). I was wondering if there is any way bioperl can help me here, or whether writing a perl script is the only way to go.

I would appreciate your suggestions/comments. Thanks! (apologies for the long email..)

Regards
Bhakti

From Kevin.M.Brown at asu.edu Wed Jan 27 20:23:57 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 27 Jan 2010 13:23:57 -0700
Subject: [Bioperl-l] Designing primers from multiple sequence alignment ofamino acid sequences
In-Reply-To:
References:
Message-ID: <1A4207F8295607498283FE9E93B775B4068498DB@EX02.asurite.ad.asu.edu>

Bioperl is just a collection of tools, not a full blown application. Most of what you want can be done with the objects available from within the toolkit, but the application (perl script) would still need to be written to put the objects to use.

You could use clustalw from within perl to align the sequences (Bio::Tools::Run::Alignment::Clustalw), find the conserved regions (Bio::SimpleAlign), reverse translate them (Bio::Tools::CodonTable), then come up with an algorithm for primer analysis and selection (or even use other apps like primer3 (Bio::Tools::Run::Primer3) from within perl).

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University

From mike.stubbington at bbsrc.ac.uk Thu Jan 28 15:41:49 2010
From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI))
Date: Thu, 28 Jan 2010 15:41:49 +0000
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn
Message-ID: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>

Dear all,

I am attempting to blast some primers against the mouse genome. I have created a local mouse genome blast database and I can search against it using 'blastn' at the command line.

I have perl code that creates an array of bioperl sequence objects called @primers

I then create a StandAloneBlastPlus factory using the following code:

my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
    -db_dir => '/Users/stubbing/localBlast/',
    -db_name => 'MouseGenome'
);

and then attempt to blast my primers using this:

my @shortPrimers;
my $count=1;
foreach (@primers) {
    my $currentSeq = $_;
    print "Checking primer $count/$primerNumber ";
    if ($_->length < 40) {
        push(@shortPrimers,$_);
        print "Too short!\n";
    }
    else {
        print "BLASTing...";
        my $blastResult = $blastFactory->blastn(-query => $currentSeq);
    }
    $count++;
}

This fails with the following error:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, line 532.
STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

Line 63 in my code is (as you might expect) the one that calls blastn on my factory object.

I'd appreciate any help you might be able to provide to shed light on this.

Thanks in advance,

Mike

From maj at fortinbras.us Thu Jan 28 15:56:14 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 10:56:14 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn
In-Reply-To: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
Message-ID: <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>

Mike - please try updating your bioperl-live (the core) to the latest code (revision 16761 or so).
CommandExts is a work in progress; from the stack errors it looks like you've got an older version.
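When more than one copy of a library may be installed, it can help to confirm which file Perl actually loads before and after updating. A minimal, generic sketch follows; `module_info` is a hypothetical helper, and not every module sets `$VERSION`, so the path is usually the more telling part.

```perl
use strict;
use warnings;

# Report where a module was loaded from and its $VERSION (if it sets one).
# Returns undef if the module cannot be loaded.
sub module_info {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;       # Foo::Bar -> Foo/Bar.pm
    eval { require $file; 1 } or return;          # not installed, or failed to compile
    my $version = eval { $module->VERSION };      # undef when no $VERSION is defined
    return { path => $INC{$file}, version => $version };
}
```

For the case above one might call `module_info('Bio::Tools::Run::StandAloneBlastPlus')` and compare the reported path against the `/Library/Perl/5.10.0/...` paths in the stack trace; if they still match after an update elsewhere, the old copy is the one being used.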
Try it then ping us back, if you would--
Thanks
Mark

From mike.stubbington at bbsrc.ac.uk Thu Jan 28 16:18:12 2010
From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI))
Date: Thu, 28 Jan 2010 16:18:12 +0000
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn
In-Reply-To: <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk> <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
Message-ID: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>

Hi,

Thanks for the suggestion. Unfortunately it still fails - error as follows:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, line 532.
STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

M

On 28 Jan 2010, at 15:56, Mark A. Jensen wrote:

> Mike - please try updating your bioperl-live (the core) to the latest code
> (revision 16761 or so).
> CommandExts is a work in progress; from the stack errors it looks like you've
> got an older version.
> Try it then ping us back, if you would-- > Thanks > Mark > ----- Original Message ----- > From: "mike stubbington (BI)" > To: > Sent: Thursday, January 28, 2010 10:41 AM > Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error > running blastn > > > Dear all, > > I am attempting to blast some primers against the mouse genome. I have created a > local mouse genome blast database and I can search against it using 'blastn' at > the command line. > > I have perl code that creates an array of bioperl sequence objects called > @primers > > I then create a StandAloneBlastPlus factory using the following code? > > my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new( > -db_dir => '/Users/stubbing/localBlast/', > -db_name => 'MouseGenome' > ); > > and then attempt to blast my primers using this? > > my @shortPrimers; > my $count=1; > foreach (@primers) { > my $currentSeq = $_; > print "Checking primer $count/$primerNumber "; > if ($_->length < 40) { > push(@shortPrimers,$_); > print "Too short!\n"; > } > else { > print "BLASTing..."; > my $blastResult = $blastFactory->blastn(-query => $currentSeq); > } > $count++; > } > > This fails with the following error? > > ------------- EXCEPTION ------------- > MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running > /usr/local/ncbi/blast/bin/blastn : Illegal seek at > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, > line 532. 
> > STACK Bio::Tools::Run::WrapperBase::_run > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 > STACK Bio::Tools::Run::StandAloneBlastPlus::run > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 > STACK toplevel ./5CTest.pl:63 > ------------------------------------- > > Line 63 in my code is (as you might expect) the one that calls blastn on my > factory object. > > I'd appreciate any help you might be able to provide to shed light on this. > > Thanks in advance, > > Mike > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Thu Jan 28 16:28:52 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 28 Jan 2010 11:28:52 -0500 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn In-Reply-To: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk> References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk> <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife> <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk> Message-ID: Thanks Mike-- will have a look asap- cheers MAJ ----- Original Message ----- From: "mike stubbington (BI)" To: "Mark A. Jensen" Cc: Sent: Thursday, January 28, 2010 11:18 AM Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn Hi, Thanks for the suggestion. Unfortunately it still fails - error as follows: ------------- EXCEPTION ------------- MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, line 532. 
STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 STACK toplevel ./5CTest.pl:63 ------------------------------------- M On 28 Jan 2010, at 15:56, Mark A. Jensen wrote: > Mike - please try updating your bioperl-live (the core) to the latest code > (revision 16761 or so). > CommandExts is a work in progress; from the stack errors it looks like you've > got an older version. > Try it then ping us back, if you would-- > Thanks > Mark > ----- Original Message ----- > From: "mike stubbington (BI)" > To: > Sent: Thursday, January 28, 2010 10:41 AM > Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error > running blastn > > > Dear all, > > I am attempting to blast some primers against the mouse genome. I have created > a > local mouse genome blast database and I can search against it using 'blastn' > at > the command line. > > I have perl code that creates an array of bioperl sequence objects called > @primers > > I then create a StandAloneBlastPlus factory using the following code? > > my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new( > -db_dir => '/Users/stubbing/localBlast/', > -db_name => 'MouseGenome' > ); > > and then attempt to blast my primers using this? > > my @shortPrimers; > my $count=1; > foreach (@primers) { > my $currentSeq = $_; > print "Checking primer $count/$primerNumber "; > if ($_->length < 40) { > push(@shortPrimers,$_); > print "Too short!\n"; > } > else { > print "BLASTing..."; > my $blastResult = $blastFactory->blastn(-query => $currentSeq); > } > $count++; > } > > This fails with the following error? 
> > ------------- EXCEPTION ------------- > MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem > running > /usr/local/ncbi/blast/bin/blastn : Illegal seek at > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, > line 532. > > STACK Bio::Tools::Run::WrapperBase::_run > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 > STACK Bio::Tools::Run::StandAloneBlastPlus::run > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 > STACK toplevel ./5CTest.pl:63 > ------------------------------------- > > Line 63 in my code is (as you might expect) the one that calls blastn on my > factory object. > > I'd appreciate any help you might be able to provide to shed light on this. > > Thanks in advance, > > Mike > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Thu Jan 28 18:26:27 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 28 Jan 2010 12:26:27 -0600 Subject: [Bioperl-l] how to retrieve organism name from accession number? 
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz> <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
Message-ID: <1264703187.5473.10.camel@cjfields.igb.uiuc.edu>

Russell,

Just curious, but have you tried setting the return email parameter (-email)? NCBI recently stated that all queries would eventually require a return email of some sort (not sure if it's validated or not). I think that was set for around late spring. I'm changing the code in svn to require it for that very purpose.

chris

Wed, 2010-01-27 at 15:45 +1300, Smithies, Russell wrote:
> Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi still works if you don't mind a bit of manual button clicking. It's handling chunks of 100,000 records OK (today).
>
> --Russell
>
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at illinois.edu]
> > Sent: Wednesday, 27 January 2010 3:42 p.m.
> > To: Smithies, Russell
> > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> > Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> >
> > Makes me wonder if they're pushing more users towards the SOAP-based services and away from eutils.
> >
> > chris
> >
> > On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
> >
> > > I've had a wide selection of errors lately:
> > >
> > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
> > > STACK: Error::throw
> > > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > > STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > > STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > > STACK: get_desc.pl:32
> > > -----------------------------------------------------------
> > >
> > > And I never get a good explanation from NCBI or suggestions on how to avoid it.
> > >
> > >
> > > --Russell
> > >
> > >
> > >> -----Original Message-----
> > >> From: Chris Fields [mailto:cjfields at illinois.edu]
> > >> Sent: Wednesday, 27 January 2010 2:46 p.m.
> > >> To: Smithies, Russell
> > >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> > >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> > >>
> > >> It's unfortunate but I have heard this problem popping up quite a bit more frequently lately. Not to push too many buttons but NCBI isn't very forthcoming with help these days; they have become quite insular. Not sure if they're short-staffed due to budget or if there are other issues.
> > >>
> > >> chris
> > >>
> > >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
> > >>
> > >>> Grrrrrr, I hate eutils!!!!
> > >>>
> > >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> > >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
> > >>> STACK: Error::throw
> > >>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > >>> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > >>> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > >>> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > >>> STACK: get_desc.pl:32
> > >>> -----------------------------------------------------------
> > >>>
> > >>>
> > >>> Nice error message though :-)
> > >>>
> > >>>
> > >>> --Russell
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> > >>>> Sent: Monday, 11 January 2010 10:05 a.m.
> > >>>> To: 'Chris Fields'
> > >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> > >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> > >>>>
> > >>>> I've started to go off eUtils recently (not BioPerl's fault) as I've often been finding that with large queries, chunks of the resulting data is missing.
> > >>>> For example, before Xmas I was creating species-specific databases by using eUtils to get a list of GI numbers back for a taxid, then retrieving the fasta sequences in chunks of 500.
> > >>>> Very regularly, in the middle of the fasta there would be a message about resource unavailable eg.
> > >>>> >test_sequence_1
> > >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT
> > >>>> >test_sequence_2
> > >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
> > >>>>
> > >>>> Often this wasn't detected until formatdb complained about invalid characters.
> > >>>> Inquiries to NCBI as to why this was happening and what to do about it returned stupid answers ("do each sequence manually thru the web interface", or "use eUtils").
> > >>>> As we have a nice fast network connection, I now prefer to download very large gzip files (i.e. all of refseq) and extract what I need.
> > >>>>
> > >>>> I can't help but think that NCBI could solve a lot of problems if they gzipped the output from eUtils queries - it's something I've requested regularly for the last 5 years or so!!
> > >>>>
> > >>>> --Russell
> > >>>>
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: Chris Fields [mailto:cjfields at illinois.edu]
> > >>>>> Sent: Monday, 11 January 2010 9:50 a.m.
> > >>>>> To: Smithies, Russell
> > >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
> > >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> > >>>>>
> > >>>>> One could also use Bio::DB::Taxonomy, which indexes the same files or (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the details).
> > >>>>>
> > >>>>> chris
> > >>>>>
> > >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> > >>>>>
> > >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's flakiness lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and do lookups.
> > >>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp which lists taxids and descriptions (and synonyms)
> > >>>>>>
> > >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I could do this:
> > >>>>>>
> > >>>>>> my $taxid = $gi_taxid_nucl{$accession};
> > >>>>>> my $org_name = $names{$taxid};
> > >>>>>>
> > >>>>>> --Russell
> > >>>>>>
> > >>>>>>
> > >>>>>>> -----Original Message-----
> > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> > >>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
> > >>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> > >>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> > >>>>>>>
> > >>>>>>> Bhakti,
> > >>>>>>> The following example (using EUtilities) may serve your purpose:
> > >>>>>>>
> > >>>>>>> use Bio::DB::EUtilities;
> > >>>>>>>
> > >>>>>>> my (%taxa, @taxa);
> > >>>>>>> my (%names, %idmap);
> > >>>>>>>
> > >>>>>>> # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
> > >>>>>>> # (probably)
> > >>>>>>>
> > >>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> > >>>>>>>
> > >>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> > >>>>>>>                                        -db => 'taxonomy',
> > >>>>>>>                                        -dbfrom => 'protein',
> > >>>>>>>                                        -correspondence => 1,
> > >>>>>>>                                        -id => \@ids);
> > >>>>>>>
> > >>>>>>> # iterate through the LinkSet objects
> > >>>>>>> while (my $ds = $factory->next_LinkSet) {
> > >>>>>>>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> @taxa = @taxa{@ids};
> > >>>>>>>
> > >>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> > >>>>>>>                                     -db => 'taxonomy',
> > >>>>>>>                                     -id => \@taxa );
> > >>>>>>>
> > >>>>>>> while (local $_ = $factory->next_DocSum) {
> > >>>>>>>     $names{($_->get_contents_by_name('TaxId'))[0]} =
> > >>>>>>>         ($_->get_contents_by_name('ScientificName'))[0];
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> foreach (@ids) {
> > >>>>>>>     $idmap{$_} = $names{$taxa{$_}};
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> # %idmap is
> > >>>>>>> # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> > >>>>>>> # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> > >>>>>>> # 68536103 => 'Corynebacterium jeikeium K411'
> > >>>>>>> # 730439 => 'Bacillus caldolyticus'
> > >>>>>>> # 89318838 => undef (this record has been removed from the db)
> > >>>>>>>
> > >>>>>>> 1;
> > >>>>>>>
> > >>>>>>> You probably will need to break up your 30000 into chunks
> > >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a
> > >>>>>>>
> > >>>>>>> sleep 3;
> > >>>>>>>
> > >>>>>>> or so separating the queries.
> > >>>>>>> MAJ
> > >>>>>>> ----- Original Message -----
> > >>>>>>> From: "Bhakti Dwivedi"
> > >>>>>>> To:
> > >>>>>>> Sent: Friday, December 25, 2009 9:46 PM
> > >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession number?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species name" given the accession number using Bioperl. I have these 30,000 accession numbers for which I need to get the source organisms. Any kind of help will be appreciated.
> > >>>>>>>>
> > >>>>>>>> Thanks
> > >>>>>>>>
> > >>>>>>>> BD
> > >>>>>>
> > >>>>>> =======================================================================
> > >>>>>> Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
> > >>>>>> =======================================================================

From maj at fortinbras.us Thu Jan 28 18:47:04 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 13:47:04 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn
In-Reply-To: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk> <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife> <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
Message-ID:

Hi Mike,

Believe I found the real bug causing the problem (was not accounting for the db_dir parameter). Crashes should now also throw much more helpful errors. Please try the code at r16774, and shout back.
thanks -- MAJ

From cjfields at illinois.edu Thu Jan 28 19:00:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 13:00:26 -0600
Subject: [Bioperl-l] EUtilities policy change
Message-ID: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>

All,

Per NCBI's recent change in eutils user policy (effective June 1):

http://bioperl.org/pipermail/bioperl-l/2009-December/031698.html

Both the tool and email parameters ('-tool', '-email') are now required when making requests. Note this will significantly break all modules requiring remote access to eutils (Bio::DB::GenBank/GenPept, some Biblio and Taxonomy stuff as well, IIRC). This also applies to web services (SOAP-based access). Mark, not sure how this affects your SOAP-based modules.

I have reconfigured Bio::DB::EUtilities to follow this policy; the default tool setting has been 'bioperl' and will remain that way. However, there has been no default email, therefore setting this is now required for future requests unless we (the bioperl devs) decide there is a safe default email to utilize. My gut tells me, however, that falling back to a default email opens up a can of worms for the devs and is very likely a 'BAD IDEA'(TM).

Regardless, be aware that, after June 1, NCBI will very likely exclude requests with no email and will notify users who are considered to be violating their policies.

I will likely make further changes to Bio::DB::EUtilities in the meantime to ensure that using the tools by default will not violate NCBI's policy (e.g. override this at your own risk).
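
A minimal sketch of what the policy above could look like in a script. The helper eutil_args is hypothetical (it is not part of BioPerl), the address is a placeholder, and the esearch parameters are only illustrative. The request is sent only when RUN_EUTILS is set in the environment and Bio::DB::EUtilities can be loaded; otherwise the script just checks the parameters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: refuse to build request
# parameters without an email, and fill in a tool name if none was given.
sub eutil_args {
    my (%opt) = @_;
    die "an -email address is required for NCBI eutils requests\n"
        unless defined $opt{-email} && $opt{-email} =~ /\S\@\S/;
    $opt{-tool} = 'BioPerl' unless defined $opt{-tool};
    return %opt;
}

my %args = eutil_args(
    -eutil  => 'esearch',
    -db     => 'nucest',
    -term   => 'Clupeocephala[Organism]',
    -email  => 'you@example.org',    # placeholder: substitute your own address
    -retmax => 20,
);

# Contact NCBI only when explicitly asked to (RUN_EUTILS=1) and only if
# BioPerl's Bio::DB::EUtilities can be loaded.
if ($ENV{RUN_EUTILS} && eval { require Bio::DB::EUtilities; 1 }) {
    my $factory = Bio::DB::EUtilities->new(%args);
    print "$_\n" for $factory->get_ids;
}
```

Dying early on a missing email is a stricter choice than the warning discussed in this thread; it is used here only so that a misconfigured script fails before any request leaves the machine.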

chris

From maj at fortinbras.us Thu Jan 28 19:05:43 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 14:05:43 -0500
Subject: [Bioperl-l] EUtilities policy change
In-Reply-To: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>
References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>
Message-ID: <8F49B5ED151143FA86E977B4D4F44265@NewLife>

Thanks Chris--
The soap modules currently set tool to "SoapEUtilities(BioPerl)".
I agree that a default email is a bad idea (tm) (unless maybe it's hilmar's...?). I'd say a warning on unset email parameters is a responsible "there be dragons" sort of treatment.
MAJ

From cjfields at illinois.edu Thu Jan 28 19:18:22 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 13:18:22 -0600
Subject: [Bioperl-l] EUtilities policy change
In-Reply-To: <8F49B5ED151143FA86E977B4D4F44265@NewLife>
References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu> <8F49B5ED151143FA86E977B4D4F44265@NewLife>
Message-ID: <1264706302.5473.48.camel@cjfields.igb.uiuc.edu>

I think warning is fine for now. I've reimplemented that so it occurs lazily (warns only when a request is actually made). Will also change the tool to 'BioPerl' (currently 'bioperl', all lc).

We'll obviously have to address this in the test suite as well in some way, maybe ask for an email if network tests are requested.

chris

From Russell.Smithies at agresearch.co.nz Thu Jan 28 19:25:38 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 29 Jan 2010 08:25:38 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <1264703187.5473.10.camel@cjfields.igb.uiuc.edu>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz> <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz> <1264703187.5473.10.camel@cjfields.igb.uiuc.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz>

Yes, I usually set the 'tool' and 'email' parameters.
I went to NCBI back in 2006 and did their "PowerScripting" course where they pointed out a lot of the requirements for using eUtils. I think I requested results returned gzipped back then as well... --Russell > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Friday, 29 January 2010 7:26 a.m. > To: Smithies, Russell > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen' > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > number? > > Russell, > > Just curious, but have you tried setting the return email parameter > (-email)? NCBI recently stated that all queries would eventually > require a return email of some sort (not sure if it's validated or not). > I think that was set for around late spring. I'm changing the code in > svn to require it for that very purpose. > > chris > > > Wed, 2010-01-27 at 15:45 +1300, Smithies, Russell wrote: > > Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi > still works if you don't mind a bit of manual button clicking. It's > handling chunks of 100,000 records OK (today). > > > > --Russell > > > > > -----Original Message----- > > > From: Chris Fields [mailto:cjfields at illinois.edu] > > > Sent: Wednesday, 27 January 2010 3:42 p.m. > > > To: Smithies, Russell > > > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen' > > > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > > > number? > > > > > > Makes me wonder if they're pushing more users towards the SOAP-based > > > services and away from eutils. 
> > > > > > chris > > > > > > On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote: > > > > > > > I've had a wide selection of errors lately: > > > > > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > > > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 > (Resource > > > temporarily unavailable) > > > > STACK: Error::throw > > > > STACK: Bio::Root::Root::throw > > > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 > > > > STACK: Bio::Tools::EUtilities::parse_data > > > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 > > > > STACK: Bio::Tools::EUtilities::get_ids > > > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 > > > > STACK: Bio::DB::EUtilities::get_ids > > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 > > > > STACK: get_desc.pl:32 > > > > ----------------------------------------------------------- > > > > > > > > And I never get a good explanation from NCBI or suggestions on how > to > > > avoid it. > > > > > > > > > > > > --Russell > > > > > > > > > > > >> -----Original Message----- > > > >> From: Chris Fields [mailto:cjfields at illinois.edu] > > > >> Sent: Wednesday, 27 January 2010 2:46 p.m. > > > >> To: Smithies, Russell > > > >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org' > > > >> Subject: Re: [Bioperl-l] how to retrieve organism name from > accession > > > >> number? > > > >> > > > >> It's unfortunate but I have heard this problem popping up quite a > bit > > > more > > > >> frequently lately. Not to push too many buttons but NCBI isn't > very > > > >> forthcoming with help these days; they have become quite insular. > Not > > > >> sure if they're short-staffed due to budget or if there are other > > > issues. > > > >> > > > >> chris > > > >> > > > >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote: > > > >> > > > >>> Grrrrrr, I hate eutils!!!! 
> > > >>> > > > >>> ------------- EXCEPTION: Bio::Root::Exception ------------- > > > >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 > > > >> (Connection refused) > > > >>> STACK: Error::throw > > > >>> STACK: Bio::Root::Root::throw > > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 > > > >>> STACK: Bio::Tools::EUtilities::parse_data > > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332 > > > >>> STACK: Bio::Tools::EUtilities::get_ids > > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441 > > > >>> STACK: Bio::DB::EUtilities::get_ids > > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363 > > > >>> STACK: get_desc.pl:32 > > > >>> ----------------------------------------------------------- > > > >>> > > > >>> > > > >>> Nice error message though :-) > > > >>> > > > >>> > > > >>> --Russell > > > >>> > > > >>>> -----Original Message----- > > > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > >>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell > > > >>>> Sent: Monday, 11 January 2010 10:05 a.m. > > > >>>> To: 'Chris Fields' > > > >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open- > > > bio.org' > > > >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from > accession > > > >>>> number? > > > >>>> > > > >>>> I've started to go off eUtils recently (not BioPerl's fault) as > I've > > > >> often > > > >>>> been finding that with large queries, chunks of the resulting > data is > > > >>>> missing. > > > >>>> For example, before Xmas I was creating species-specific > databases by > > > >>>> using eUtils to get a list of GI numbers back for a taxid, then > > > >> retrieving > > > >>>> the fasta sequences in chunks of 500. > > > >>>> Very regularly, in the middle of the fasta there would be a > message > > > >> about > > > >>>> resource unavailable eg. 
> > > >>>>> test_sequence_1 > > > >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT > > > >>>>> test_sequence_2 > > > >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT > > > >>>> > > > >>>> Often this wasn't detected until formatdb complained about > invalid > > > >>>> characters. > > > >>>> Inquiries to NCBI as to why this was happening and what to do > about > > > it > > > >>>> returned stupid answers ("do each sequence manually thru the web > > > >>>> interface", or "use eUtils"). > > > >>>> As we have a nice fast network connection, I now prefer to > download > > > >> very > > > >>>> large gzip files (i.e. all of refseq) and extract what I need. > > > >>>> > > > >>>> I can't help but think that NCBI could solve a lot of problems if > > > they > > > >>>> gzipped the output from eUtils queries - it's something I've > > > requested > > > >>>> regularly for the last 5 years or so!! > > > >>>> > > > >>>> --Russell > > > >>>> > > > >>>> > > > >>>>> -----Original Message----- > > > >>>>> From: Chris Fields [mailto:cjfields at illinois.edu] > > > >>>>> Sent: Monday, 11 January 2010 9:50 a.m. > > > >>>>> To: Smithies, Russell > > > >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open- > > > bio.org' > > > >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from > > > accession > > > >>>>> number? > > > >>>>> > > > >>>>> One could also use Bio::DB::Taxonomy, which indexes the same > files > > > or > > > >>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD > for > > > >> the > > > >>>>> details). 
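The corrupted download described in the quoted message above (an error string embedded in the middle of a sequence) could be caught before formatdb with a check along these lines. This is only a sketch: find_suspect_records is a hypothetical helper, and it assumes nucleotide FASTA in which anything outside the IUPAC codes, '-' and '*' is suspicious. An error string made up entirely of valid IUPAC letters would slip through.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the IDs of FASTA records whose sequence
# lines contain characters other than IUPAC nucleotide codes, '-' or '*'.
sub find_suspect_records {
    my ($fh) = @_;
    my ($id, %bad);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;              # tolerate DOS line endings
        next unless length $line;
        if ($line =~ /^>(\S*)/) {      # header line: remember the record ID
            $id = $1;
            next;
        }
        next unless defined $id;       # skip anything before the first header
        $bad{$id}++ if $line =~ /[^ACGTUNRYSWKMBDHVacgtunryswkmbdhv*-]/;
    }
    return sort keys %bad;
}

# The two records quoted in the message, as an in-memory file.
my $fasta = <<'END';
>test_sequence_1
TACGATCATCGCTResource UnavailableTACGACTCTGCT
>test_sequence_2
TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
END

open my $fh, '<', \$fasta or die "cannot open in-memory FASTA: $!";
print "suspect record: $_\n" for find_suspect_records($fh);
close $fh;
```

Run against each downloaded chunk, a check like this would let a script re-request only the chunks that came back damaged.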
> > > >>>>> > > > >>>>> chris > > > >>>>> > > > >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote: > > > >>>>> > > > >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's > > > >>>> flakiness > > > >>>>> lately) would be to download the gi_taxid_nucl.zip or > > > >> gi_taxid_prot.zip > > > >>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into > a > > > hash > > > >>>> and > > > >>>>> do lookups. > > > >>>>>> In that same dir, taxdump.tar.gz contains a file called > names.dmp > > > >>>> which > > > >>>>> lists taxids and descriptions (and synonyms) > > > >>>>>> > > > >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes > so > > > I > > > >>>>> could do this: > > > >>>>>> > > > >>>>>> my $taxid = $gi_taxid_nucl{$accession}; > > > >>>>>> my $org_name = $names{$taxid}; > > > >>>>>> > > > >>>>>> --Russell > > > >>>>>> > > > >>>>>> > > > >>>>>>> -----Original Message----- > > > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > >>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > > > >>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m. > > > >>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org > > > >>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from > > > >> accession > > > >>>>>>> number? 
> > > >>>>>>> > > > >>>>>>> Bhakti, > > > >>>>>>> The following example (using EUtilities) may serve your > purpose: > > > >>>>>>> > > > >>>>>>> use Bio::DB::EUtilities; > > > >>>>>>> > > > >>>>>>> my (%taxa, @taxa); > > > >>>>>>> my (%names, %idmap); > > > >>>>>>> > > > >>>>>>> # these are protein ids; nuc ids will work by changing -dbfrom > => > > > >>>>>>> 'nucleotide', > > > >>>>>>> # (probably) > > > >>>>>>> > > > >>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439); > > > >>>>>>> > > > >>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink', > > > >>>>>>> -db => 'taxonomy', > > > >>>>>>> -dbfrom => 'protein', > > > >>>>>>> -correspondence => 1, > > > >>>>>>> -id => \@ids); > > > >>>>>>> > > > >>>>>>> # iterate through the LinkSet objects > > > >>>>>>> while (my $ds = $factory->next_LinkSet) { > > > >>>>>>> $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0] > > > >>>>>>> } > > > >>>>>>> > > > >>>>>>> @taxa = @taxa{@ids}; > > > >>>>>>> > > > >>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary', > > > >>>>>>> -db => 'taxonomy', > > > >>>>>>> -id => \@taxa ); > > > >>>>>>> > > > >>>>>>> while (local $_ = $factory->next_DocSum) { > > > >>>>>>> $names{($_->get_contents_by_name('TaxId'))[0]} = > > > >>>>>>> ($_->get_contents_by_name('ScientificName'))[0]; > > > >>>>>>> } > > > >>>>>>> > > > >>>>>>> foreach (@ids) { > > > >>>>>>> $idmap{$_} = $names{$taxa{$_}}; > > > >>>>>>> } > > > >>>>>>> > > > >>>>>>> # %idmap is > > > >>>>>>> # 1621261 => 'Mycobacterium tuberculosis H37Rv' > > > >>>>>>> # 20807972 => 'Thermoanaerobacter tengcongensis MB4' > > > >>>>>>> # 68536103 => 'Corynebacterium jeikeium K411' > > > >>>>>>> # 730439 => 'Bacillus caldolyticus' > > > >>>>>>> # 89318838 => undef (this record has been removed from > the > > > db) > > > >>>>>>> > > > >>>>>>> 1; > > > >>>>>>> > > > >>>>>>> You probably will need to break up your 30000 into chunks > > > >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a > > > 
>>>>>>> > > > >>>>>>> sleep 3; > > > >>>>>>> > > > >>>>>>> or so separating the queries. > > > >>>>>>> MAJ > > > >>>>>>> ----- Original Message ----- > > > >>>>>>> From: "Bhakti Dwivedi" > > > >>>>>>> To: > > > >>>>>>> Sent: Friday, December 25, 2009 9:46 PM > > > >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from > accession > > > >>>>> number? > > > >>>>>>> > > > >>>>>>> > > > >>>>>>>> Hi, > > > >>>>>>>> > > > >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species > > > name" > > > >>>>>>> given > > > >>>>>>>> the accession number using Bioperl. I have these 30,000 > > > accession > > > >>>>>>> numbers > > > >>>>>>>> for which I need to get the source organisms. Any kind of > help > > > >> will > > > >>>>> be > > > >>>>>>>> appreciated. > > > >>>>>>>> > > > >>>>>>>> Thanks > > > >>>>>>>> > > > >>>>>>>> BD > > > >>>>>>>> _______________________________________________ > > > >>>>>>>> Bioperl-l mailing list > > > >>>>>>>> Bioperl-l at lists.open-bio.org > > > >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>> > > > >>>>>>> _______________________________________________ > > > >>>>>>> Bioperl-l mailing list > > > >>>>>>> Bioperl-l at lists.open-bio.org > > > >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > >>>>>> > > > >>>> > > > > ======================================================================= > > > >>>>>> Attention: The information contained in this message and/or > > > >>>> attachments > > > >>>>>> from AgResearch Limited is intended only for the persons or > > > entities > > > >>>>>> to which it is addressed and may contain confidential and/or > > > >>>> privileged > > > >>>>>> material. 
Any review, retransmission, dissemination or other > use > > > of, > > > >>>> or > > > >>>>>> taking of any action in reliance upon, this information by > persons > > > or > > > >>>>>> entities other than the intended recipients is prohibited by > > > >>>> AgResearch > > > >>>>>> Limited. If you have received this message in error, please > notify > > > >> the > > > >>>>>> sender immediately. > > > >>>>>> > > > >>>> > > > > ======================================================================= > > > >>>>>> > > > >>>>>> _______________________________________________ > > > >>>>>> Bioperl-l mailing list > > > >>>>>> Bioperl-l at lists.open-bio.org > > > >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > >>>> > > > >>>> > > > >>>> _______________________________________________ > > > >>>> Bioperl-l mailing list > > > >>>> Bioperl-l at lists.open-bio.org > > > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu Jan 28 19:30:12 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 28 Jan 2010 13:30:12 -0600 Subject: [Bioperl-l] how to retrieve organism name from accession number? 
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz> <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz> <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz> <1264703187.5473.10.camel@cjfields.igb.uiuc.edu> <18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz> Message-ID: <1264707012.5473.51.camel@cjfields.igb.uiuc.edu> Russell, Okay, just wanted to make sure. The email/tool requirements weren't actually enforced up until now, which is forcing us to do a bit of re-work on the various tools that don't have it set by default (at least warn users unaware of it). And I agree, gzipped archives would be nice! chris On Fri, 2010-01-29 at 08:25 +1300, Smithies, Russell wrote: > Yes, I usually set the 'tool' and 'email' parameters. > I went to NCBI back in 2006 and did their "PowerScripting" course where they pointed out a lot of the requirements for using eUtils. I think I requested results returned gzipped back then as well... > > --Russell > > > -----Original Message----- > > From: Chris Fields [mailto:cjfields at illinois.edu] > > Sent: Friday, 29 January 2010 7:26 a.m. > > To: Smithies, Russell > > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen' > > Subject: Re: [Bioperl-l] how to retrieve organism name from accession > > number? > > > > Russell, > > > > Just curious, but have you tried setting the return email parameter > > (-email)? NCBI recently stated that all queries would eventually > > require a return email of some sort (not sure if it's validated or not). > > I think that was set for around late spring. I'm changing the code in > > svn to require it for that very purpose. > > > > chris
From maj at fortinbras.us Thu Jan 28 19:55:31 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 28 Jan 2010 14:55:31 -0500 Subject: [Bioperl-l] EUtilities policy change In-Reply-To: <1264706302.5473.48.camel@cjfields.igb.uiuc.edu> References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu><8F49B5ED151143FA86E977B4D4F44265@NewLife> <1264706302.5473.48.camel@cjfields.igb.uiuc.edu> Message-ID: Ok, SoapEU now warns on no email; passes email onto the fetch stage during autofetch -- cheers MAJ ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "BioPerl-l" Sent: Thursday, January 28, 2010 2:18 PM Subject: Re: [Bioperl-l] EUtilities policy change >I think warning is fine for now. I've reimplemented that so it occurs > lazily (warns only when a request is actually made). > > Will also change the tool to 'BioPerl' (currently 'bioperl', all lc).
> We'll obviously have to address this in the test suite as well in some > way, maybe ask for an email if network tests are requested. > > chris
From chapmanb at 50mail.com Thu Jan 28 20:35:05 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Thu, 28 Jan 2010 15:35:05 -0500 Subject: [Bioperl-l] OpenBio solution challenge: Project updates at BOSC 2010 Message-ID: <20100128203505.GG40046@sobchak.mgh.harvard.edu> Hello all; The BOSC 2010 organizing committee is hard at work getting prepared for this July's meeting in Boston: http://www.open-bio.org/wiki/BOSC_2010 One of the items we've traditionally had at the conference is a project update from each of the OpenBio affiliated groups. This year, we're thinking about organizing these talks around a central theme: the OpenBio solution challenge. We start with a biological question of general interest, and each of the project talks would focus around how you would solve that problem using your toolkit and programming language. This is meant to provide a challenge for OpenBio contributors, a nice tutorial style overview of various projects and approaches for other programmers, and a fun opportunity to compete and learn from other projects. Conference attendees will vote on their favorite solution, with the winner receiving fame and fortune (warning: fortune not guaranteed). For this to be successful, it of course requires interest and enthusiasm from y'all fine folks involved with the projects. Specifically: - Is there interest from your group in participating in the challenge? You'll want at least a few people to work on it, and someone to give a presentation at BOSC. - Do you have suggestions on a good theme or specific biological problem to tackle?
We'll hope to pick something in a sweet spot that is challenging enough to be of interest, yet reasonable for presentation and preparation. Let's discuss ideas and get this together. Since the schedule for BOSC is developing rapidly, please give us an idea if you're interested by February 12th, and copy responses to the BOSC mailing list as a central place for discussion. bosc at open-bio.org Thanks, Brad, Michael, and the BOSC organizing committee From markw at illuminae.com Thu Jan 28 21:17:44 2010 From: markw at illuminae.com (Mark Wilkinson) Date: Thu, 28 Jan 2010 13:17:44 -0800 Subject: [Bioperl-l] [MOBY-dev] OpenBio solution challenge: Project updates at BOSC 2010 In-Reply-To: <20100128203505.GG40046@sobchak.mgh.harvard.edu> References: <20100128203505.GG40046@sobchak.mgh.harvard.edu> Message-ID: Brad, this sounds exciting! One thing strikes me, though - by asking for the sub-projects to propose the "grand challenge" themselves the one thing you can guarantee is that the "grand challenge" is solvable (or more likely, already solved!) Other "grand challenge" kinds of meetings have an independent third party pose the problem that has to be solved, and then all groups work toward a solution and compare their results. This would, IMO, be more revealing of the "state of the art" in each Open-Bio project, and point out where the weaknesses are that we should be focusing on... Someone (for example, you!) could act as the moderator to ensure that the "grand challenge" was at least a reasonable one, within the scope of what an Open-Bio project *should* be able to solve... Just my CAD $0.02 Mark On Thu, 28 Jan 2010 12:35:05 -0800, Brad Chapman wrote: > Hello all; > The BOSC 2010 organizing committee is hard at work getting prepared for > this > July's meeting in Boston: > > http://www.open-bio.org/wiki/BOSC_2010 > > One of the items we've traditionally had at the conference is a project > update from each of the OpenBio affiliated groups. 
This year, we're > thinking > about organizing these talks around a central theme: the OpenBio solution > challenge. We start with a biological question of general interest, and > each > of the project talks would focus around how you would solve that problem > using your toolkit and programming language. > > This is meant to provide a challenge for OpenBio contributors, a nice > tutorial > style overview of various projects and approaches for other programmers, > and a > fun opportunity to compete and learn from other projects. Conference > attendees > will vote on their favorite solution, with the winner receiving fame and > fortune (warning: fortune not guaranteed). > > For this to be successful, it of course requires interest and enthusiasm > from > y'all fine folks involved with the projects. Specifically: > > - Is there interest from your group in participating in the challenge? > You'll > want at least a few people to work on it, and someone to give a > presentation > at BOSC. > > - Do you have suggestions on a good theme or specific biological problem > to > tackle? We'll hope to pick something in a sweet spot that is > challenging > enough to be of interest, yet reasonable for presentation and > preparation. > > Let's discuss ideas and get this together. Since the schedule for BOSC is > developing rapidly, please give us an idea if you're interested by > February 12th, and copy responses to the BOSC mailing list as a central > place for discussion. > > bosc at open-bio.org > > Thanks, > Brad, Michael, and the BOSC organizing committee > _______________________________________________ > MOBY-dev mailing list > MOBY-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/moby-dev -- Mark D Wilkinson, PI Bioinformatics Assistant Professor, Medical Genetics The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research Providence Heart + Lung Institute University of British Columbia - St. 
Paul's Hospital Vancouver, BC, Canada From HWillis at scripps.edu Fri Jan 29 01:03:10 2010 From: HWillis at scripps.edu (Scooter Willis) Date: Thu, 28 Jan 2010 20:03:10 -0500 Subject: [Bioperl-l] [Biojava-dev] [MOBY-dev] OpenBio solution challenge: Project updates at BOSC 2010 In-Reply-To: References: <20100128203505.GG40046@sobchak.mgh.harvard.edu> Message-ID: <716E205A-5196-409F-A7BC-EF0F52AA997A@scripps.edu> Brad I agree with Mark that a particular problem may be biased towards a toolkit/language. Another approach would be to list a collection of problems and each group would then pick a problem to present. Could be a little more interesting to the audience as you are exposed to different problems and the various strengths of each toolkit. This could also help guide future development in the other toolkits as you would benefit from learning about the api and/or programming language. Each group would register a problem that they are going to present. From the group of problems not picked that becomes the surprise challenge where each group has 24 hours to either put together a presentation or an actual solution. Scooter On Jan 28, 2010, at 4:17 PM, Mark Wilkinson wrote: > > Brad, this sounds exciting! > > One thing strikes me, though - by asking for the sub-projects to propose > the "grand challenge" themselves the one thing you can guarantee is that > the "grand challenge" is solvable (or more likely, already solved!) > > Other "grand challenge" kinds of meetings have an independent third party > pose the problem that has to be solved, and then all groups work toward a > solution and compare their results. This would, IMO, be more revealing of > the "state of the art" in each Open-Bio project, and point out where the > weaknesses are that we should be focusing on... Someone (for example, > you!) 
could act as the moderator to ensure that the "grand challenge" was > at least a reasonable one, within the scope of what an Open-Bio project > *should* be able to solve... > > Just my CAD $0.02 > > Mark > > > > On Thu, 28 Jan 2010 12:35:05 -0800, Brad Chapman > wrote: > >> Hello all; >> The BOSC 2010 organizing committee is hard at work getting prepared for >> this >> July's meeting in Boston: >> >> http://www.open-bio.org/wiki/BOSC_2010 >> >> One of the items we've traditionally had at the conference is a project >> update from each of the OpenBio affiliated groups. This year, we're >> thinking >> about organizing these talks around a central theme: the OpenBio solution >> challenge. We start with a biological question of general interest, and >> each >> of the project talks would focus around how you would solve that problem >> using your toolkit and programming language. >> >> This is meant to provide a challenge for OpenBio contributors, a nice >> tutorial >> style overview of various projects and approaches for other programmers, >> and a >> fun opportunity to compete and learn from other projects. Conference >> attendees >> will vote on their favorite solution, with the winner receiving fame and >> fortune (warning: fortune not guaranteed). >> >> For this to be successful, it of course requires interest and enthusiasm >> from >> y'all fine folks involved with the projects. Specifically: >> >> - Is there interest from your group in participating in the challenge? >> You'll >> want at least a few people to work on it, and someone to give a >> presentation >> at BOSC. >> >> - Do you have suggestions on a good theme or specific biological problem >> to >> tackle? We'll hope to pick something in a sweet spot that is >> challenging >> enough to be of interest, yet reasonable for presentation and >> preparation. >> >> Let's discuss ideas and get this together. 
Since the schedule for BOSC is >> developing rapidly, please give us an idea if you're interested by >> February 12th, and copy responses to the BOSC mailing list as a central >> place for discussion. >> >> bosc at open-bio.org >> >> Thanks, >> Brad, Michael, and the BOSC organizing committee >> _______________________________________________ >> MOBY-dev mailing list >> MOBY-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/moby-dev > > > -- > Mark D Wilkinson, PI Bioinformatics > Assistant Professor, Medical Genetics > The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research > Providence Heart + Lung Institute > University of British Columbia - St. Paul's Hospital > Vancouver, BC, Canada > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From biopython at maubp.freeserve.co.uk Fri Jan 29 10:36:40 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 29 Jan 2010 10:36:40 +0000 Subject: [Bioperl-l] [MOBY-dev] OpenBio solution challenge: Project updates at BOSC 2010 In-Reply-To: References: <20100128203505.GG40046@sobchak.mgh.harvard.edu> Message-ID: <320fb6e01001290236l1ad02515w403a19f94dbb6d15@mail.gmail.com> Hi all, This is a great topic but should we continue it on just the one mailing list? Is there a suitable BOSC list, or how about the general Open Bio list? On Thu, Jan 28, 2010 at 9:17 PM, Mark Wilkinson wrote: > > Brad, this sounds exciting! > > One thing strikes me, though - by asking for the sub-projects to propose > the "grand challenge" themselves the one thing you can guarantee is that > the "grand challenge" is solvable (or more likely, already solved!) > > Other "grand challenge" kinds of meetings have an independent third party > pose the problem that has to be solved, and then all groups work toward a > solution and compare their results.
This would, IMO, be more revealing of > the "state of the art" in each Open-Bio project, and point out where the > weaknesses are that we should be focusing on... Someone (for example, > you!) could act as the moderator to ensure that the "grand challenge" was > at least a reasonable one, within the scope of what an Open-Bio project > *should* be able to solve... > > Just my CAD $0.02 > > Mark One possible problem with having Brad act as moderator is his ties to Biopython (plus it would be a shame if we'd be one man down for trying to solve the challenges - grin). Having a project representative "sign off" on the challenge might work - or simply the whole of the BOSC committee which is quite balanced. Alternatively some kind of panel of challenges does seem a good way to reduce individual project bias (as suggested by Scooter), but there will still need to be a judging committee. I'm curious what kind of challenges the BOSC committee had in mind - would something like taking a newly sequenced bacterium and producing an automated annotation as a GenBank, EMBL, or GFF file be too ambitious for example? There are already several major projects to do this e.g. RAST http://rast.nmpdr.org/ Peter (@Biopython) From mike.stubbington at bbsrc.ac.uk Fri Jan 29 13:25:25 2010 From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI)) Date: Fri, 29 Jan 2010 13:25:25 +0000 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn In-Reply-To: References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife> <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk> Message-ID: Hi Mark, Thanks for your continued help.
It now fails with this: ------------- EXCEPTION ------------- MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : BLAST Database error: No alias or index file found for nucleotide database [MouseGenome] in search path [/Volumes/stubbing/PerlScripts/5CTest/trunk::] STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1004 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 STACK toplevel ./5CTest.pl:63 ------------------------------------- If I change the factory creation to: my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => '/Users/stubbing/localBlast/MouseGenome' ); it fails with ------------- EXCEPTION ------------- MSG: DB name not valid STACK Bio::Tools::Run::StandAloneBlastPlus::new /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:516 STACK toplevel ./5CTest.pl:45 ------------------------------------- However I can run the following successfully from the command line: blastn -db /Users/stubbing/localBlast/MouseGenome -query querySequence.fasta Is there something wrong with how I'm referring to the blast database when I construct my factory? Thanks again, M On 28 Jan 2010, at 18:47, Mark A. Jensen wrote: > Hi Mike, > Believe I found the real bug causing the problem (was not accounting for > the db_dir parameter). Crashes should now also throw much more helpful > errors. Please try the code at r16774, and shout back. > thanks -- > MAJ > ----- Original Message ----- > From: "mike stubbington (BI)" > To: "Mark A. 
Jensen" > Cc: > Sent: Thursday, January 28, 2010 11:18 AM > Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek > error running blastn > > > Hi, > > Thanks for the suggestion. Unfortunately it still fails - error as follows: > > ------------- EXCEPTION ------------- > MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running > /usr/local/ncbi/blast/bin/blastn : Illegal seek at > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, > line 532. > > STACK Bio::Tools::Run::WrapperBase::_run > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 > STACK Bio::Tools::Run::StandAloneBlastPlus::run > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 > STACK toplevel ./5CTest.pl:63 > ------------------------------------- > > M > > On 28 Jan 2010, at 15:56, Mark A. Jensen wrote: > >> Mike - please try updating your bioperl-live (the core) to the latest code >> (revision 16761 or so). >> CommandExts is a work in progress; from the stack errors it looks like you've >> got an older version. >> Try it then ping us back, if you would-- >> Thanks >> Mark >> ----- Original Message ----- >> From: "mike stubbington (BI)" >> To: >> Sent: Thursday, January 28, 2010 10:41 AM >> Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error >> running blastn >> >> >> Dear all, >> >> I am attempting to blast some primers against the mouse genome. I have created >> a >> local mouse genome blast database and I can search against it using 'blastn' >> at >> the command line. 
>> >> I have perl code that creates an array of bioperl sequence objects called >> @primers >> >> I then create a StandAloneBlastPlus factory using the following code... >> >> my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new( >> -db_dir => '/Users/stubbing/localBlast/', >> -db_name => 'MouseGenome' >> ); >> >> and then attempt to blast my primers using this... >> >> my @shortPrimers; >> my $count=1; >> foreach (@primers) { >> my $currentSeq = $_; >> print "Checking primer $count/$primerNumber "; >> if ($_->length < 40) { >> push(@shortPrimers,$_); >> print "Too short!\n"; >> } >> else { >> print "BLASTing..."; >> my $blastResult = $blastFactory->blastn(-query => $currentSeq); >> } >> $count++; >> } >> >> This fails with the following error... >> >> ------------- EXCEPTION ------------- >> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem >> running >> /usr/local/ncbi/blast/bin/blastn : Illegal seek at >> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, >> line 532. >> >> STACK Bio::Tools::Run::WrapperBase::_run >> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994 >> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD >> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 >> STACK Bio::Tools::Run::StandAloneBlastPlus::run >> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 >> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD >> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 >> STACK toplevel ./5CTest.pl:63 >> ------------------------------------- >> >> Line 63 in my code is (as you might expect) the one that calls blastn on my >> factory object. >> >> I'd appreciate any help you might be able to provide to shed light on this.
>> >> Thanks in advance, >> >> Mike >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Fri Jan 29 13:36:54 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 29 Jan 2010 08:36:54 -0500 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn In-Reply-To: References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife> <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk> Message-ID: Hi Mike- Well, at least we're getting more informative errors. I think it's still my bad; will look again. Both of your calls should work. (thanks for the positive control too) Thanks for your patience and the help-- MAJ ----- Original Message ----- From: "mike stubbington (BI)" To: "Mark A. Jensen" Cc: ; "Brian Osborne" Sent: Friday, January 29, 2010 8:25 AM Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn Hi Mark, Thanks for your continued help. 
It now fails with this: ------------- EXCEPTION ------------- MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : BLAST Database error: No alias or index file found for nucleotide database [MouseGenome] in search path [/Volumes/stubbing/PerlScripts/5CTest/trunk::] STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1004 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 STACK toplevel ./5CTest.pl:63 ------------------------------------- If I change the factory creation to: my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => '/Users/stubbing/localBlast/MouseGenome' ); it fails with ------------- EXCEPTION ------------- MSG: DB name not valid STACK Bio::Tools::Run::StandAloneBlastPlus::new /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:516 STACK toplevel ./5CTest.pl:45 ------------------------------------- However I can run the following successfully from the command line: blastn -db /Users/stubbing/localBlast/MouseGenome -query querySequence.fasta Is there something wrong with how I'm referring to the blast database when I construct my factory? Thanks again, M On 28 Jan 2010, at 18:47, Mark A. Jensen wrote: > Hi Mike, > Believe I found the real bug causing the problem (was not accounting for > the db_dir parameter). Crashes should now also throw much more helpful > errors. Please try the code at r16774, and shout back. > thanks -- > MAJ > ----- Original Message ----- > From: "mike stubbington (BI)" > To: "Mark A. 
Jensen" > Cc: > Sent: Thursday, January 28, 2010 11:18 AM > Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek > error running blastn > > > Hi, > > Thanks for the suggestion. Unfortunately it still fails - error as follows: > > ------------- EXCEPTION ------------- > MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem > running > /usr/local/ncbi/blast/bin/blastn : Illegal seek at > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, > > line 532. > > STACK Bio::Tools::Run::WrapperBase::_run > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 > STACK Bio::Tools::Run::StandAloneBlastPlus::run > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 > STACK toplevel ./5CTest.pl:63 > ------------------------------------- > > M > > On 28 Jan 2010, at 15:56, Mark A. Jensen wrote: > >> Mike - please try updating your bioperl-live (the core) to the latest code >> (revision 16761 or so). >> CommandExts is a work in progress; from the stack errors it looks like you've >> got an older version. >> Try it then ping us back, if you would-- >> Thanks >> Mark >> ----- Original Message ----- >> From: "mike stubbington (BI)" >> To: >> Sent: Thursday, January 28, 2010 10:41 AM >> Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek >> error >> running blastn >> >> >> Dear all, >> >> I am attempting to blast some primers against the mouse genome. I have >> created >> a >> local mouse genome blast database and I can search against it using 'blastn' >> at >> the command line. 
>> >> I have perl code that creates an array of bioperl sequence objects called >> @primers >> >> I then create a StandAloneBlastPlus factory using the following code... >> >> my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new( >> -db_dir => '/Users/stubbing/localBlast/', >> -db_name => 'MouseGenome' >> ); >> >> and then attempt to blast my primers using this... >> >> my @shortPrimers; >> my $count=1; >> foreach (@primers) { >> my $currentSeq = $_; >> print "Checking primer $count/$primerNumber "; >> if ($_->length < 40) { >> push(@shortPrimers,$_); >> print "Too short!\n"; >> } >> else { >> print "BLASTing..."; >> my $blastResult = $blastFactory->blastn(-query => $currentSeq); >> } >> $count++; >> } >> >> This fails with the following error... >> >> ------------- EXCEPTION ------------- >> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem >> running >> /usr/local/ncbi/blast/bin/blastn : Illegal seek at >> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, >> >> line 532. >> >> STACK Bio::Tools::Run::WrapperBase::_run >> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994 >> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD >> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 >> STACK Bio::Tools::Run::StandAloneBlastPlus::run >> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 >> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD >> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 >> STACK toplevel ./5CTest.pl:63 >> ------------------------------------- >> >> Line 63 in my code is (as you might expect) the one that calls blastn on my >> factory object. >> >> I'd appreciate any help you might be able to provide to shed light on this.
>> >> Thanks in advance, >> >> Mike >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Fri Jan 29 13:47:48 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 29 Jan 2010 08:47:48 -0500 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn In-Reply-To: References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife><05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk> Message-ID: <2B7BF6CD46AE441AB24203E169D9C503@NewLife> Mike et al-- I've entered this as Bug #3003 on http://bugzilla.bioperl.org; we'll do further ping-pongs on this issue via the comment facility there-- cheers MAJ ----- Original Message ----- From: "mike stubbington (BI)" To: "Mark A. Jensen" Cc: ; ; "Osborne" Sent: Friday, January 29, 2010 8:25 AM Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn Hi Mark, Thanks for your continued help. 
It now fails with this: ------------- EXCEPTION ------------- MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : BLAST Database error: No alias or index file found for nucleotide database [MouseGenome] in search path [/Volumes/stubbing/PerlScripts/5CTest/trunk::] STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1004 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 STACK toplevel ./5CTest.pl:63 ------------------------------------- If I change the factory creation to: my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => '/Users/stubbing/localBlast/MouseGenome' ); it fails with ------------- EXCEPTION ------------- MSG: DB name not valid STACK Bio::Tools::Run::StandAloneBlastPlus::new /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:516 STACK toplevel ./5CTest.pl:45 ------------------------------------- However I can run the following successfully from the command line: blastn -db /Users/stubbing/localBlast/MouseGenome -query querySequence.fasta Is there something wrong with how I'm referring to the blast database when I construct my factory? Thanks again, M On 28 Jan 2010, at 18:47, Mark A. Jensen wrote: > Hi Mike, > Believe I found the real bug causing the problem (was not accounting for > the db_dir parameter). Crashes should now also throw much more helpful > errors. Please try the code at r16774, and shout back. > thanks -- > MAJ > ----- Original Message ----- > From: "mike stubbington (BI)" > To: "Mark A. 
Jensen" > Cc: > Sent: Thursday, January 28, 2010 11:18 AM > Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek > error running blastn > > > Hi, > > Thanks for the suggestion. Unfortunately it still fails - error as follows: > > ------------- EXCEPTION ------------- > MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem > running > /usr/local/ncbi/blast/bin/blastn : Illegal seek at > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, > > line 532. > > STACK Bio::Tools::Run::WrapperBase::_run > /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 > STACK Bio::Tools::Run::StandAloneBlastPlus::run > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 > STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD > /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 > STACK toplevel ./5CTest.pl:63 > ------------------------------------- > > M > > On 28 Jan 2010, at 15:56, Mark A. Jensen wrote: > >> Mike - please try updating your bioperl-live (the core) to the latest code >> (revision 16761 or so). >> CommandExts is a work in progress; from the stack errors it looks like you've >> got an older version. >> Try it then ping us back, if you would-- >> Thanks >> Mark >> ----- Original Message ----- >> From: "mike stubbington (BI)" >> To: >> Sent: Thursday, January 28, 2010 10:41 AM >> Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek >> error >> running blastn >> >> >> Dear all, >> >> I am attempting to blast some primers against the mouse genome. I have >> created >> a >> local mouse genome blast database and I can search against it using 'blastn' >> at >> the command line. 
>> >> I have perl code that creates an array of bioperl sequence objects called >> @primers >> >> I then create a StandAloneBlastPlus factory using the following code... >> >> my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new( >> -db_dir => '/Users/stubbing/localBlast/', >> -db_name => 'MouseGenome' >> ); >> >> and then attempt to blast my primers using this... >> >> my @shortPrimers; >> my $count=1; >> foreach (@primers) { >> my $currentSeq = $_; >> print "Checking primer $count/$primerNumber "; >> if ($_->length < 40) { >> push(@shortPrimers,$_); >> print "Too short!\n"; >> } >> else { >> print "BLASTing..."; >> my $blastResult = $blastFactory->blastn(-query => $currentSeq); >> } >> $count++; >> } >> >> This fails with the following error... >> >> ------------- EXCEPTION ------------- >> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem >> running >> /usr/local/ncbi/blast/bin/blastn : Illegal seek at >> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, >> >> line 532. >> >> STACK Bio::Tools::Run::WrapperBase::_run >> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994 >> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD >> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236 >> STACK Bio::Tools::Run::StandAloneBlastPlus::run >> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267 >> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD >> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233 >> STACK toplevel ./5CTest.pl:63 >> ------------------------------------- >> >> Line 63 in my code is (as you might expect) the one that calls blastn on my >> factory object. >> >> I'd appreciate any help you might be able to provide to shed light on this.
>> >> Thanks in advance, >> >> Mike >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From help at gmod.org Fri Jan 29 22:03:48 2010 From: help at gmod.org (Dave Clements, GMOD Help Desk) Date: Fri, 29 Jan 2010 14:03:48 -0800 Subject: [Bioperl-l] 2010 GMOD Summer School - Americas In-Reply-To: <71ee57c71001291402h2ec67300r4fc7a3b2375f4080@mail.gmail.com> References: <71ee57c71001291351q47994b82w10dffb390dbf2837@mail.gmail.com> <71ee57c71001291354m68548823s3e3fbd2e49e9b332@mail.gmail.com> <71ee57c71001291356p5e7f1aadi2bf437c93014a393@mail.gmail.com> <71ee57c71001291357h67112e2fkcf835687e59f66ae@mail.gmail.com> <71ee57c71001291358k74781b08n232534d8895c5ec1@mail.gmail.com> <71ee57c71001291400y28e40eb6i112ea91df977dc67@mail.gmail.com> <71ee57c71001291400n6133982eh3a02293ff741900b@mail.gmail.com> <71ee57c71001291401y505b56baic61c11754d88a444@mail.gmail.com> <71ee57c71001291402s23e3f2e9w2562d6acf85bd4ae@mail.gmail.com> <71ee57c71001291402h2ec67300r4fc7a3b2375f4080@mail.gmail.com> Message-ID: <71ee57c71001291403s19be18f3s3a1d5a314c74def@mail.gmail.com> Hello all, I am pleased to announce that we are now accepting applications for: 2010 GMOD Summer School - Americas 6-9 May 2010 NESCent, Durham, NC, USA http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas This will be a hands-on multi-day course aimed at teaching new GMOD users/administrators how to get GMOD Components up and running.
The course will introduce participants to the GMOD project and then focus on installation, configuration and integration of popular GMOD Components. The course will be held May 6-9, at NESCent in Durham, NC.

These components will be covered:

  * Apollo - genome annotation editor
  * Chado - a modular and extensible database schema
  * Galaxy - workflow system
  * GBrowse - the Generic Genome Browser
  * GBrowse_syn - A generic synteny browser
  * JBrowse - genome browser
  * MAKER - genome annotation pipeline
  * Tripal - web front end for Chado

The deadline for applying is the end of Friday, February 22. Admission is competitive and is based on the strength of the application (especially the statement of interest). In 2009 there were over 50 applications for the 25 slots. Any applications received after the deadline will be placed on the waiting list.

See the course page for details and an application link:
  http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas

Thanks,

Dave Clements
GMOD Help Desk

PS: We are also investigating holding a GMOD course in the Asia/Pacific region, sometime this fall. Watch the GMOD mailing lists and the GMOD News page/RSS feed for updates.

--
Please keep responses on the list!
http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas
http://gmod.org/wiki/GMOD_News
Was this helpful? http://gmod.org/wiki/Help_Desk_Feedback

From bhakti.dwivedi at gmail.com Sat Jan 30 22:38:40 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Sat, 30 Jan 2010 17:38:40 -0500
Subject: [Bioperl-l] how to map blast results on to the genome?
Message-ID:

Does anyone know how I can graphically map BLAST results (-m 8 format) to the genome using BioPerl?

Thanks

Bhakti

From jason at bioperl.org Sat Jan 30 23:56:14 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 30 Jan 2010 15:56:14 -0800
Subject: [Bioperl-l] how to map blast results on to the genome?
In-Reply-To:
References:
Message-ID: <68937A7D-291F-419A-9ED7-7A87D9B4C78A@bioperl.org>

Did you try Bio::Graphics and read the HOWTO on it -- http://bioperl.org/wiki/HOWTO:Graphics

On Jan 30, 2010, at 2:38 PM, Bhakti Dwivedi wrote:

> Does anyone know how I can graphically map the blast results (m -8
> format)
> to the genome using bio-perl?
>
> Thanks
>
> Bhakti

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip

From David.Messina at sbc.su.se Sun Jan 31 17:43:52 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 31 Jan 2010 18:43:52 +0100
Subject: [Bioperl-l] question about a PAML module
In-Reply-To: <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>
References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu> <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu> <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>
Message-ID:

Hey Rui,

My apologies for keeping you waiting on this. I started looking at it on Friday, and while I believe it'll be a relatively easy fix, I haven't got to the bottom of it yet. I'll look at it some more tomorrow and hopefully get it sorted in the next day or two.

Dave

From rui.faria at upf.edu Sun Jan 31 17:17:09 2010
From: rui.faria at upf.edu (Rui Faria)
Date: Sun, 31 Jan 2010 18:17:09 +0100 (CET)
Subject: [Bioperl-l] question about a PAML module
In-Reply-To:
References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu> <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu>
Message-ID: <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>

Hi Dave,

Some time ago we reported the codeml bug about errors that occur when the user supplies their own tree file. Have you had a chance to look at it?
We basically wanted to know your opinion on where the problem may be, since we are not the most experienced "perlers" on the planet :) I'm asking because we have to deal with it right now. If someone could check where the problem is, so we can understand whether it has an easy solution, that would be of great help.

Best,
Rui

-----Original Message-----
From: Dave Messina
Sent: Thu 31/12/2009 11:55 AM
To: Rui Faria
Cc: Jason Stajich; sandraneto_ at hotmail.com; bioperl-l List
Subject: Re: question about a PAML module

Hi Rui and Sandra,

Could you file this as a bug report at http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl ?

Once you've created the bug report with a brief description of the problem and submitted it, please attach the following to the bug report:

- sample input files (a sequence file and a tree file, probably)
- a script which reproduces the problem
- the output (error messages) like you show below

When I updated the code to work with the current version, I didn't exhaustively test all of the different modes of running codeml, so I appreciate you reporting this. There was another, similar issue reported a few days ago. I will try to take a look at both of these bug reports soon.

Dave

From rui.faria at upf.edu Sun Jan 31 18:56:56 2010
From: rui.faria at upf.edu (Rui Faria)
Date: Sun, 31 Jan 2010 19:56:56 +0100 (CET)
Subject: [Bioperl-l] question about a PAML module
In-Reply-To:
References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu> <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu> <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>
Message-ID: <11398434.1264964216856.JavaMail.oracle@rif1.s.upf.edu>

Many thanks! We hope that one day, when we become experts, we can return the favor!

Rui

-----Original Message-----
From: Dave Messina
Sent: Sun 31/01/2010 06:43 PM
To: Rui Faria
Cc: Jason Stajich; sandraneto_ at hotmail.com; bioperl-l List
Subject: Re: question about a PAML module

Hey Rui,

My apologies for keeping you waiting on this.
I started looking at it on Friday, and while I believe it'll be a relatively easy fix, I haven't got to the bottom of it yet. I'll look at it some more tomorrow and hopefully get it sorted in the next day or two.

Dave