From jimhu at tamu.edu Wed Jan 2 14:44:56 2013
From: jimhu at tamu.edu (Jim Hu)
Date: Wed, 2 Jan 2013 13:44:56 -0600
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
Message-ID: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu>

I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google.

I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and create gff with appropriate subfeatures.

Thanks,

Jim
=====================================
Jim Hu
Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054

From biojiangke at gmail.com Wed Jan 2 15:20:02 2013
From: biojiangke at gmail.com (Ke Jiang)
Date: Wed, 2 Jan 2013 15:20:02 -0500
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu>
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu>
Message-ID:

There is a blast2gff.pl floating around for Gbrowse to visualize blast results as tracks. Not sure about the blast+. One more reason to stick with the good old blast!

https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl

Ke

-------------------------------------------
Ke Jiang, Ph.D.
Post-doctoral Fellow
Delbruck Laboratory
Cold Spring Harbor Laboratory
Cold Spring Harbor, NY 11724

From cjfields at illinois.edu Wed Jan 2 15:47:52 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 2 Jan 2013 20:47:52 +0000
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To:
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF4CF5FD44@CHIMBX5.ad.uillinois.edu>

...except that NCBI no longer supports legacy BLAST. Caveat emptor.

chris

On Jan 2, 2013, at 2:20 PM, Ke Jiang wrote:

> There is a blast2gff.pl floating around for Gbrowse to visualize blast
> results as tracks. Not sure about the blast+. One more reason to stick with
> the good old blast!

From bosborne11 at verizon.net Wed Jan 2 15:26:22 2013
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 02 Jan 2013 15:26:22 -0500
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu>
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu>
Message-ID:

Here's one:

https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl

Another one:

~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl
#!perl

# Author: Jason Stajich
# Description: Turn SearchIO parseable report(s) into a GFF report
#
=head1 NAME

bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report

Brian O.

From scott at scottcain.net Wed Jan 2 16:32:16 2013
From: scott at scottcain.net (Scott Cain)
Date: Wed, 2 Jan 2013 16:32:16 -0500
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To:
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu>
Message-ID:

Hi Brian,

I was going to suggest the same thing--though that script is fairly old, it's not as old as the blast2gff script in the GBrowse distribution (which probably should be retired). I believe it supports GFF3, though I don't have any sample data with which to test it to be sure. I also don't know if it supports BLAST+ input--I haven't kept up with SearchIO (on which search2gff.pl depends); will it accept it?

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                      scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)     216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu Wed Jan 2 17:26:55 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 2 Jan 2013 22:26:55 +0000
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To:
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu>

It should (I recall using it at one point). If it doesn't we should fix it so it does.

How does MAKER deal with this? IIRC it uses (a modified) SearchIO-based method...

chris

From bosborne11 at verizon.net Wed Jan 2 17:53:50 2013
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 02 Jan 2013 17:53:50 -0500
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu>
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu>
Message-ID: <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net>

Scott and Chris,

I'll test it and see...

Brian O.

From jimhu at tamu.edu Fri Jan 4 14:49:51 2013
From: jimhu at tamu.edu (Jim Hu)
Date: Fri, 4 Jan 2013 13:49:51 -0600
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To: <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net>
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net>
Message-ID: <7F69D24B-8D4C-41F5-B8A1-CB850EBB1C67@tamu.edu>

Thanks for the replies, but...

I can't tell what input formats for the blast results file are supported. Format 11 and format 6 give no output and no feedback. Putting some diagnostic print statements in the code suggests that I'm not getting any result objects from Bio::SearchIO.

The script uses Bio::SearchIO, but does not seem to call the submodules for blast. Documentation links on the wiki seem to be broken, at least on this page:

http://www.bioperl.org/wiki/Module:Bio::SearchIO

Jim

From bosborne11 at verizon.net Fri Jan 4 14:26:52 2013
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 04 Jan 2013 14:26:52 -0500
Subject: [Bioperl-l] strange error in the beginners howto
In-Reply-To:
References:
Message-ID:

Matthew,

First, let's distinguish "warning" and "error". Errors stop execution of the app or script; what you're seeing is a warning, and your script actually completes. Does the warning vanish if you remove "-w"?

Second, seems like this is a Perl version thing:

http://www.sysarchitects.com/bioperl

BIO

On Jan 4, 2013, at 1:55 PM, "Thornton, Matthew" wrote:

> Hello,
>
> I am working through the first part of the BioPerl beginners howto and I am getting an error and I was wondering if you could tell me what is wrong.
>
> I made the seqio.pl script:
>
> #!/bin/perl -w
>
> use Bio::Seq;
>
> $seq_obj = Bio::Seq->new(-seq => "aaaatgggggggggggccccgtt",
>                          -display_id => "#12345",
>                          -desc => "example 1",
>                          -alphabet => "dna" );
>
> print $seq_obj->seq();
>
> I execute the script and I get this error:
>
> gserver at gserver:~/bio$ perl seqio.pl
> Replacement list is longer than search list at /usr/local/share/perl/5.14.2/Bio/Range.pm line 251.
> aaaatgggggggggggccccgttgserver at gserver:~/bio$
>
> Is there something wrong with my installation?
>
> Any help is greatly appreciated.
>
> Thanks!
>
> Matthew E. Thornton
>
> Laboratory Technician III
> Broad Center for Stem Cell and Regenerative Medicine
> University of Southern California
> 1501 San Pablo St. ZNI
> Los Angeles, CA 90089-2821
> ph.:323-442-1688
> fx.:323-442-4040
> matthew.thornton at med.usc.edu

Brian O.

--
Brian Osborne, PhD
BioTeam: http://bioteam.net
email: briano at bioteam.net
mobile: 978-317-3101

From MEC at stowers.org Fri Jan 4 15:20:17 2013
From: MEC at stowers.org (Cook, Malcolm)
Date: Fri, 4 Jan 2013 14:20:17 -0600
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To: <7F69D24B-8D4C-41F5-B8A1-CB850EBB1C67@tamu.edu>
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> <7F69D24B-8D4C-41F5-B8A1-CB850EBB1C67@tamu.edu>
Message-ID: <2C40E43D1F7A56408C4463FD245DDDF907958EE791@EXCHMB-02.stowers-institute.org>

Jim,

Getting to your original question:

> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and create gff with appropriate subfeatures.

Exactly what/how do you want/expect to encode the blast output as GFF{1,2,2.5,3}??

If GFF3 per http://www.sequenceontology.org/gff3.shtml then are you hoping to get GFF3 marked up as described in section 'THE GAP ATTRIBUTE' or as in 'ALIGNMENTS'?

I would guess not because neither of them have 'subfeatures'.

If you could explain more fully with examples (hand cobbled or borrowed from someone else) of what you expect then I might have a better idea of what options might suit your needs.

~Malcolm

From jimhu at tamu.edu Fri Jan 4 16:57:38 2013
From: jimhu at tamu.edu (Jim Hu)
Date: Fri, 4 Jan 2013 15:57:38 -0600
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To: <2C40E43D1F7A56408C4463FD245DDDF907958EE791@EXCHMB-02.stowers-institute.org>
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> <7F69D24B-8D4C-41F5-B8A1-CB850EBB1C67@tamu.edu> <2C40E43D1F7A56408C4463FD245DDDF907958EE791@EXCHMB-02.stowers-institute.org>
Message-ID: <2D58FAEB-ED4B-4965-A8CB-890D0012D098@tamu.edu>

Malcolm,

Thanks, I should have reread the GFF3 spec before posting!

In the section on the Gap attribute and below on alignments it discusses two ways to represent an alignment. I was originally thinking of something like the latter example shown for cDNA vs genome. But the gap attribute representation would be fine too. So, I can see how the final output could be done in different ways, but I'm still stuck on how to get there.

I don't have a specific application in mind; I'm mostly just trying to understand how to get from standalone blast+ output to things that look like the examples in the gff spec and the gbrowse documentation - really basic display of alignments that are gapped. For my teaching, we do EST vs genomic blast and want gapped cDNA alignments to show where the introns go. My other work is with bacteria where introns are rare, but there are times when I'd like to show an alignment that is interrupted by a transposable element, for example.

Excerpting from blastp -help

 *** Formatting options
 -outfmt
   alignment view options:
     0 = pairwise,
     1 = query-anchored showing identities,
     2 = query-anchored no identities,
     3 = flat query-anchored, show identities,
     4 = flat query-anchored, no identities,
     5 = XML Blast output,
     6 = tabular,
     7 = tabular with comment lines,
     8 = Text ASN.1,
     9 = Binary ASN.1,
    10 = Comma-separated values,
    11 = BLAST archive format (ASN.1)

Several of these are "lossy" in terms of where the actual gaps occur (e.g. 6). Others seem to me to be more human readable than suited for parsing.

So I was hoping to get pointed to an existing script that would generate either the single feature with gap attribute OR the multi-line match features OR a combination from one of these output formats. I'm probably missing something very, very obvious.

Best,

Jim
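
Editorial sketch, not part of the original thread: for the "single feature with gap attribute" option Jim mentions, the Gap value can be derived from the two gapped rows of one HSP. The Python below (used here instead of Perl purely for illustration) assumes those rows are available, for example from the qseq and sseq columns that blast+ can add to tabular output with -outfmt "6 std qseq sseq". It also assumes a nucleotide-versus-nucleotide hit on the plus strand, with the subject (genome) as the GFF reference and the query (EST) as the Target. The function name gap_attribute is hypothetical and belongs to neither BioPerl nor blast+.

```python
from itertools import groupby


def gap_attribute(reference_row, target_row):
    """Return a GFF3 Gap value (e.g. "M8 D3 M6") for two gapped alignment rows."""
    if len(reference_row) != len(target_row):
        raise ValueError("alignment rows differ in length")

    def column_op(ref_char, tgt_char):
        if ref_char == "-" and tgt_char == "-":
            raise ValueError("column with a gap in both rows")
        if tgt_char == "-":
            return "D"  # residue only in the reference: gap in the target
        if ref_char == "-":
            return "I"  # residue only in the target: gap in the reference
        return "M"      # aligned column, identical or mismatched

    ops = [column_op(r, t) for r, t in zip(reference_row, target_row)]
    # Collapse runs of the same operation into "<op><run length>" tokens.
    return " ".join(f"{op}{len(list(run))}" for op, run in groupby(ops))


if __name__ == "__main__":
    # Rows in the style of the worked example in the GFF3 specification.
    print(gap_attribute("CAAGACCTAAACTGGAT-TCCAAT",
                        "CAAGACCT---CTGGATATCCAAT"))
```

Minus-strand hits and protein-versus-nucleotide alignments (which need the frameshift operations of the Gap syntax) are deliberately left out of this sketch.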

From MEC at stowers.org Fri Jan 4 18:33:04 2013
From: MEC at stowers.org (Cook, Malcolm)
Date: Fri, 4 Jan 2013 17:33:04 -0600
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To: <2D58FAEB-ED4B-4965-A8CB-890D0012D098@tamu.edu>
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> <7F69D24B-8D4C-41F5-B8A1-CB850EBB1C67@tamu.edu> <2C40E43D1F7A56408C4463FD245DDDF907958EE791@EXCHMB-02.stowers-institute.org> <2D58FAEB-ED4B-4965-A8CB-890D0012D098@tamu.edu>
Message-ID: <2C40E43D1F7A56408C4463FD245DDDF907958EE7CB@EXCHMB-02.stowers-institute.org>

Jim,

To get a working example for further discussion....

Assuming you have bioperl and blast+ installed....

This command gets a known transcript from ncbi and blasts it back at ncbi into fly genome:

> bp_download_query_genbank.pl --query 'NM_001259364' | blastn -remote -db refseq_genomic -entrez_query 'melanogaster[Taxid]' -outfmt 6 > test.blast.tab

Take a look at the results (see below, or run it yourself).
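
Editorial sketch, not part of the original thread: a file such as test.blast.tab holds twelve tab-separated columns per HSP by default (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The Python below (used instead of Perl purely for illustration) shows one possible way to turn each row into a GFF3 match line: the smaller subject coordinate goes first, the orientation moves to the strand column, and HSPs of the same query/subject pair share an ID. The function name blast_tab_to_gff3, the "blastn" source label, the "match" type, and the choice of the E-value as score are assumptions, not something the thread or BioPerl prescribes.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def blast_tab_to_gff3(lines, source="blastn", feature_type="match"):
    """Yield one GFF3 line per HSP from an iterable of tabular BLAST+ lines."""
    match_ids = {}  # (query, subject) -> ID shared by all HSPs of that pair
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):  # blanks and -outfmt 7 comments
            continue
        hsp = dict(zip(COLUMNS, line.split("\t")))
        qstart, qend = int(hsp["qstart"]), int(hsp["qend"])
        sstart, send = int(hsp["sstart"]), int(hsp["send"])
        # GFF3 requires start <= end; orientation is carried by the strand.
        strand = "+" if (sstart <= send) == (qstart <= qend) else "-"
        start, end = sorted((sstart, send))
        tstart, tend = sorted((qstart, qend))
        key = (hsp["qseqid"], hsp["sseqid"])
        match_id = match_ids.setdefault(key, "Match%d" % (len(match_ids) + 1))
        attributes = "ID=%s;Name=%s;Target=%s %d %d" % (
            match_id, hsp["qseqid"], hsp["qseqid"], tstart, tend)
        yield "\t".join([hsp["sseqid"], source, feature_type, str(start),
                         str(end), hsp["evalue"], strand, ".", attributes])


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python blast_tab_to_gff3.py < test.blast.tab
    for gff_line in blast_tab_to_gff3(sys.stdin):
        print(gff_line)
```

Grouping by query/subject pair alone is a simplification: hits on both strands, or several copies of a gene on one subject, would end up under the same ID and would need a finer key. Positions of gaps inside an HSP are not recoverable from these twelve columns.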
First line output looks like this:

NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 2127 0 0 1869 3995 9330023 9327897 0.0 3928

Old GFF did not have parent-child. Depending on where you are going with this, you MIGHT want to use this strategy:

http://gmod.org/wiki/GFF#Alignments

Where the ID in column 9 serves to group the features. There is an 'implicit' parent. I think this would be respected by both GBrowse and the UCSC genome browser (but I'm not positive).

If so, converting blast output would yield one line of output for each line of input. You would need to assign a strand, swap sstart and send where sstart>send, and do some slight re-formatting.

The first line of GFF might look like this (sstart and send swapped, strand set to '-' because the subject coordinates run in reverse):

NT_033778.3 est EST_match 9327897 9330023 . - . ID=Match1;Name=NM_001259364;Target=NM_001259364 1869 3995

Is this along the lines of what you need?

~Malcolm

Complete example blast+ output looks like this:

NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 2127 0 0 1869 3995 9330023 9327897 0.0 3928
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 1019 0 0 3995 5013 9327834 9326816 0.0 1882
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 374 0 0 345 718 9332643 9332270 0.0 691
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 300 0 0 1 300 9337978 9337679 1e-154 555
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 270 0 0 717 986 9332201 9331932 5e-138 499
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 243 0 0 1434 1676 9331200 9330958 5e-123 449
NM_001259364 gi|116010442|ref|NT_033778.3| 99.55 223 0 1 1154 1376 9331611 9331390 1e-109 405
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 196 0 0 1675 1870 9330634 9330439 7e-97 363
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 169 0 0 986 1154 9331847 9331679 7e-82 313
NM_001259364 gi|116010442|ref|NT_033778.3| 97.26 73 2 0 1372 1444 9331322 9331250 3e-25 124
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 44 0 0 301 344 9333095 9333052 2e-12 82.4
NM_001259364 gi|195586644|ref|NT_167067.1| 96.52 2127 74 0 1869 3995 7777827 7775701 0.0 3518
NM_001259364 gi|195586644|ref|NT_167067.1| 97.03 404 7 3 4004 4402 7775629 7775226 0.0 675
NM_001259364 gi|195586644|ref|NT_167067.1| 93.58 374 24 0 345 718 7780392 7780019 8e-156 558
NM_001259364 gi|195586644|ref|NT_167067.1| 94.37 302 15 1 1 300 7786262 7785961 6e-127 462
NM_001259364 gi|195586644|ref|NT_167067.1| 96.25 267 10 0 720 986 7779949 7779683 1e-119 438
NM_001259364 gi|195586644|ref|NT_167067.1| 95.00 240 12 0 1434 1673 7778959 7778720 2e-101 377
NM_001259364 gi|195586644|ref|NT_167067.1| 96.86 223 6 1 1154 1376 7779370 7779149 1e-99 372
NM_001259364 gi|195586644|ref|NT_167067.1| 93.88 196 12 0 1675 1870 7778384 7778189 7e-77 296
NM_001259364 gi|195586644|ref|NT_167067.1| 95.19 187 5 4 4827 5013 7774635 7774453 9e-76 292
NM_001259364 gi|195586644|ref|NT_167067.1| 92.86 168 12 0 987 1154 7779598 7779431 3e-61 244
NM_001259364 gi|195586644|ref|NT_167067.1| 92.86 70 5 0 1374 1443 7779079 7779010 2e-18 102
NM_001259364 gi|195586644|ref|NT_167067.1| 100.00 44 0 0 301 344 7780833 7780790 2e-12 82.4
NM_001259364 gi|195489961|ref|NT_167063.1| 92.43 2127 161 0 1869 3995 7628910 7631036 0.0 3037
NM_001259364 gi|195489961|ref|NT_167063.1| 91.94 1030 38 20 3995 5013 7631102 7632097 0.0 1400
NM_001259364 gi|195489961|ref|NT_167063.1| 88.56 376 39 4 345 718 7626215 7626588 4e-124 453
NM_001259364 gi|195489961|ref|NT_167063.1| 93.67 300 18 1 1 300 7620777 7621075 2e-122 448
NM_001259364 gi|195489961|ref|NT_167063.1| 95.41 196 9 0 1675 1870 7628263 7628458 7e-82 313
NM_001259364 gi|195489961|ref|NT_167063.1| 91.93 223 17 1 1154 1376 7627224 7627445 2e-81 311
NM_001259364 gi|195489961|ref|NT_167063.1| 86.42 265 31 3 720 983 7626653 7626913 1e-73 285
NM_001259364 gi|195489961|ref|NT_167063.1| 86.18 246 25 4 1434 1673 7627635 7627877 3e-65 257
NM_001259364 gi|195489961|ref|NT_167063.1| 88.27 162 19 0 993 1154 7626999 7627160 3e-46 195
NM_001259364 gi|195489961|ref|NT_167063.1| 93.06 72 5 0 1372 1443 7627513 7627584 1e-19 106
NM_001259364 gi|195489961|ref|NT_167063.1| 100.00 44 0 0 301 344 7625770 7625813 2e-12 82.4

~Malcolm

.-----Original Message-----
.From: Jim Hu [mailto:jimhu at tamu.edu]
.Sent: Friday, January 04, 2013 3:58 PM
.To: Cook, Malcolm
.Cc: 'Brian Osborne'; 'Fields, Christopher J'; 'Scott Cain'; 'bioperl-l at bioperl.org'
.Subject: Re: [Bioperl-l] Converting blast+ output to gff (with gaps)
.
.Malcolm,
.
.Thanks, I should have reread the GFF3 spec before posting!
.
.In the section on the Gap attribute and below on alignments it discusses two ways to represent an alignment. I was originally thinking
.of something like the later example shown for cDNA vs genome. But the gap attribute representation would be fine too. So, I can see
.how the final output could be done in different ways, but I'm still stuck on how to get there.
.
.I don't have a specific application in mind; I'm mostly just trying to understand how to get from having standalone blast+ output to get
.to things that look like the examples in the gff spec and the gbrowse documentation - really basic display of alignments that are
.gapped. For my teaching, we do EST vs genomic blast and want gapped cDNA alignments to show where the introns go. My other
.work is with bacteria where introns are rare, but there are times when I'd like to show an alignment that is interrupted by a
.transposable element, for example.
.
.Excerpting from blastp -help
.
. *** Formatting options
. -outfmt
.   alignment view options:
.     0 = pairwise,
.     1 = query-anchored showing identities,
.     2 = query-anchored no identities,
.     3 = flat query-anchored, show identities,
.     4 = flat query-anchored, no identities,
.     5 = XML Blast output,
.     6 = tabular,
.     7 = tabular with comment lines,
.     8 = Text ASN.1,
.     9 = Binary ASN.1,
.    10 = Comma-separated values,
.    11 = BLAST archive format (ASN.1)
.
.Several of these are "lossy" in terms of where the actual gaps occur (e.g. 6). Others seem to me to be more human readable than
.suited for parsing.
So I was hoping to get pointed to an existing script that would generate either the single feature with gap attribute
.OR the multi-line match features OR a combination from one of these output formats.
.
.I'm probably missing something very, very obvious.
.
.Best,
.
.Jim
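
The recipe Malcolm describes above (one GFF3 line per tabular hit, a strand assigned from the subject coordinates, and sstart/send swapped where sstart > send) might look roughly like the following pure-Perl sketch. It is only an illustration under stated assumptions, not the bp_search2gff.pl implementation: the helper name blast_tab_to_gff3 is hypothetical, the input is assumed to use the default -outfmt 6 column order, the "est"/"EST_match" source and type are borrowed from Malcolm's example line, putting the e-value in the score column is a choice made here, and the trimming of gi|...|ref|...| ids down to the accession is a guess at what a browser would want.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Convert one BLAST+ tabular (-outfmt 6) line into one GFF3 match line.
# Assumed default column order:
#   qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
# blast_tab_to_gff3 is a hypothetical helper name used only for this sketch.
sub blast_tab_to_gff3 {
    my ($line, $n) = @_;
    chomp $line;
    my ($qseqid, $sseqid, $pident, $length, $mismatch, $gapopen,
        $qstart, $qend, $sstart, $send, $evalue, $bitscore) = split /\t/, $line;

    # GFF3 requires start <= end; a reversed subject range is a minus-strand hit.
    my $strand = '+';
    if ($sstart > $send) {
        ($sstart, $send) = ($send, $sstart);
        $strand = '-';
    }

    # Assumption: reduce ids like gi|116010442|ref|NT_033778.3| to the accession.
    my $seqid = $sseqid;
    $seqid = $1 if $sseqid =~ /\|(?:ref|gb|emb|dbj)\|([^|]+)/;

    # Features that share an ID are grouped; Target holds the query coordinates.
    my $attributes = "ID=Match$n;Name=$qseqid;Target=$qseqid $qstart $qend";
    return join "\t", $seqid, 'est', 'EST_match', $sstart, $send,
                      $evalue, $strand, '.', $attributes;
}

# Two hits taken from the example output in this thread (tab-joined here).
my @hits = (
    join("\t", qw(NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 2127 0 0
                  1869 3995 9330023 9327897 0.0 3928)),
    join("\t", qw(NM_001259364 gi|195489961|ref|NT_167063.1| 92.43 2127 161 0
                  1869 3995 7628910 7631036 0.0 3037)),
);

print "##gff-version 3\n";
my $n = 0;
print blast_tab_to_gff3($_, ++$n), "\n" for @hits;
```

Because -outfmt 6 only reports counts of mismatches and gap openings, this sketch cannot emit a Gap attribute; recovering gap positions would need one of the formats that carries the aligned strings, which is the limitation Jim raises in the quoted message.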

From bosborne11 at verizon.net Sat Jan 5 17:25:37 2013
From: bosborne11 at verizon.net (Brian Osborne)
Date: Sat, 05 Jan 2013 17:25:37 -0500
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To: <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net>
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net>
Message-ID: <12E97C07-D3E0-4E21-8A24-A4223468FF70@verizon.net>

All,

The scripts/utilities/bp_search2gff.pl script works fine with blastn (BLAST+), no surprise there, and since it uses Bio::Tools::GFF it can produce GFF3.

Brian O.

From jason.stajich at gmail.com Mon Jan 7 04:35:23 2013
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 7 Jan 2013 01:35:23 -0800
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To: <12E97C07-D3E0-4E21-8A24-A4223468FF70@verizon.net>
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> <12E97C07-D3E0-4E21-8A24-A4223468FF70@verizon.net>
Message-ID:

With bp_search2gff I included a command-line option so you can decide what is the reference (seqid column) and what is the target (9th column), since in some cases (BLASTX vs TBLASTN) you may want to display one or the other as the reference. This is the -t option.

There is also the -m option, which builds the grouping match term as a parent; the sub-children are then the HSPs that have that parent, per Malcolm's description of the parent-child info.

Have you tried this with your output? It should work fine with regular vanilla text blast+ output, with the -outfmt 6 option (tell bp_search2gff the format is blasttable), or with -outfmt 5 (tell it you are using blastxml format).

Jason

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From pengyu.ut at gmail.com Thu Jan 3 17:41:37 2013
From: pengyu.ut at gmail.com (Peng Yu)
Date: Thu, 3 Jan 2013 14:41:37 -0800 (PST)
Subject: [Bioperl-l] Error when using Bio::AlignIO on stockholm file from Pfam
Message-ID:

Hi,

I tried the following code on the Pfam stockholm file (downloaded at ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.full.gz), but it gives me the following error. Is it a bug in BioPerl, or is it an error in the Pfam-A.full.gz file?

~/linux/bin/xplat/src/stk/stk2acc/perl$ cat main.pl
#!/usr/bin/env perl

use strict;
use warnings;

#use FindBin;
#use lib "$FindBin::Bin/.";

use Bio::AlignIO;
use Data::Dumper;

my $in = Bio::AlignIO->new(
  -format => 'stockholm',
  -file => '/dev/stdin'
);

while(my $aln=$in->next_aln) {
  print $aln->accession, "\n";
}

~/linux/bin/xplat/src/stk/stk2acc/perl$ cat main_full.sh
#!/usr/bin/env bash

zcat ~/db/Pfam-A.full.gz | ./main.pl

~/linux/bin/xplat/src/stk/stk2acc/perl$ ./main_full.sh

...

------------- EXCEPTION -------------
MSG: '1985D' is not an integer.

STACK Bio::Range::end /Users/py/Library/ActivePerl-5.14/lib/Bio/Range.pm:227
STACK Bio::Annotation::Target::new /Users/py/Library/ActivePerl-5.14/lib/Bio/Annotation/Target.pm:81
STACK Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target /Users/py/Library/ActivePerl-5.14/lib/Bio/AlignIO/Handler/GenericAlignHandler.pm:293
STACK Bio::AlignIO::Handler::GenericAlignHandler::data_handler /Users/py/Library/ActivePerl-5.14/lib/Bio/AlignIO/Handler/GenericAlignHandler.pm:73
STACK Bio::AlignIO::stockholm::next_aln /Users/py/Library/ActivePerl-5.14/lib/Bio/AlignIO/stockholm.pm:470
STACK toplevel ./main.pl:17
-------------------------------------

From thomas.girke at gmail.com Wed Jan 9 15:59:08 2013
From: thomas.girke at gmail.com (Thomas Girke)
Date: Wed, 9 Jan 2013 12:59:08 -0800 (PST)
Subject: [Bioperl-l] Bioinformatics Position Opening
Message-ID: <9a99afb9-4a67-4d20-89ce-4d416fa7a46b@googlegroups.com>

Dear List,

Below is an announcement for a Ph.D. level bioinformatics position at UCR. It is a long-term position with a competitive salary in a vibrant research environment with cutting edge high-performance compute and genomics facilities. Application instructions are given in the announcement. Potential candidates are welcome to email me their questions about this position directly, e.g. prior to or after submitting a formal application.

Best,

Thomas

--
Thomas Girke
Associate Professor of Bioinformatics
Institute for Integrative Genome Biology (IIGB)
1207F Genomics Building
University of California
Riverside, CA 92521

E-mail: thomas.girke at ucr.edu
Ph: 951-905-5232
Fax: 951-827-5155

POSITION ANNOUNCEMENT

POSITION
The Institute for Integrative Genome Biology (IIGB) at the University of California, Riverside is seeking a Ph.D. level bioinformatician to manage its bioinformatics research activities and computing facility.

TITLE/RANK
Bioinformatics Facility Director. Salary will be competitive and commensurate with accomplishments.

LOCATION
University of California, Riverside.

BACKGROUND
Successful candidates will join an innovative and multidisciplinary Institute for Integrative Genome Biology (IIGB) that connects theoretical and experimental researchers from different departments in Life, Physical and Mathematical Sciences, Medicine, Engineering and various campus based Centers. The IIGB is organized around a 10,000 sq.ft. suite of Instrumentation Facilities that serve as a centralized, shared-use resource for faculty, staff and students, offering advanced tools in bioinformatics, microscopy, proteomics and genomics. Its bioinformatic component is equipped with an extensive high-performance compute (HPC) infrastructure.

QUALIFICATIONS
Applicants must have a Ph.D. from a recognized university in bioinformatics; or combined degrees in computer science and a biological science; or a degree in either computer science combined with relevant experience in biological science; or a degree in a biological science combined with relevant experience in computer science. The successful candidate will have at least 2 years of professional hands-on experience with next generation sequence data analysis, scientific data programming and high-performance computing. A strong publication record of bioinformatics research in collaboration with experimental biologists is expected.
Another requirement is several years of professional experience with common programming languages/environments. This includes at least one statistical programming environment (preferably R), one or more general-purpose scripting languages (e.g. Python, Perl or Ruby), experience with web development frameworks and relational database design. Several years of computational research experience using HPC systems will be beneficial. The incumbent should also have experience with the analysis of modern biological data sets, such as microarrays, next generation sequence data (e.g. genotyping, RNA profiling, de novo assemblies), phylogenetics and/or molecular dynamics simulations.

RESPONSIBILITIES
The Bioinformatics Facility Director manages IIGB's computational infrastructure jointly with its bioinformatics staff, including an HPC/Linux systems administrator, one or more programmers and student workers. The incumbent will be required to provide data analysis support to collaborative research activities, make findings available through presentations, contribute as a team member to scientific publications, and participate in the preparation of joint grant applications and reports. The teaching expectations include the development of a state-of-the-art workshop program on large-scale data analysis and programming. Participation in collaborative equipment grants will be another core responsibility to secure future growth of the facility's computing resources.

TO APPLY
Review of applications will continue until the position is filled. Interested individuals should: (1) submit a curriculum vitae, (2) provide a statement of research interests, and (3) arrange to have three letters of reference sent on their behalf.
All information should be emailed to: thomas.girke at ucr.edu

WEBSITE
http://facility.bioinformatics.ucr.edu/position-opening

From frej.tulin at gmail.com Thu Jan 10 21:53:16 2013
From: frej.tulin at gmail.com (Frej Tulin)
Date: Thu, 10 Jan 2013 21:53:16 -0500
Subject: [Bioperl-l] stand alone blast plus problem
Message-ID:

Hi

I have been trying for a long time to control my blast searches with the Bio::Tools::Run::StandAloneBlastPlus module. The module is installed and gets loaded OK. However, when calling:

my $blast_db = Bio::Tools::Run::StandAloneBlastPlus->new(
    -db_data => \@seq_arr,
    -create  => 1
);
$blast_db->make_db();

----------------------------------------------------------------------
I get this error

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: /home/frej/ncbi-blast-2.2.27+/bin/makeblastdb call crashed: There was a problem running /home/frej/ncbi-blast-2.2.27+/bin/makeblastdb : Error: NCBI C++ Exception:
"/am/ncbiapdata/release/blast/src/2.2.27/Linux64-Centos-icc/c++/ICC1010-ReleaseMT64--Linux64-Centos-icc/../src/objects/seq/../seqloc/Seq_id.cpp", line 1689: Error: ncbi::objects::CSeq_id::x_Init() - Unsupported ID type ATCG00500.1

STACK: Error::throw
STACK: Bio::Root::Root::throw /home/frej/perl5/lib/perl5/Bio/Root/Root.pm:472
STACK: Bio::Tools::Run::WrapperBase::_run /home/frej/perl5/lib/perl5/Bio/Tools/Run/WrapperBase/CommandExts.pm:1012
STACK: Bio::Tools::Run::StandAloneBlastPlus::make_db /home/frej/perl5/lib/perl5/Bio/Tools/Run/StandAloneBlastPlus.pm:776
STACK: comp2fasta.pl:57
----------------------------------------------------------------------

The general header format of the fasta files I've tried is

>ATCG00500.1|PACid:19637947

or

>g8540.t1|PACid:26887931

I always get the same message. I have tried modifying the header by putting 'lcl|' in front, but nothing seems to help.
When I build a database using makeblastdb directly from the command line it works OK; it just doesn't work from within BioPerl. How can this be? What am I doing wrong?

Grateful for any assistance

best
/frej

--
Frej Tulin
Fred Cross lab
Rockefeller University

From cjfields at illinois.edu Fri Jan 11 00:04:12 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 11 Jan 2013 05:04:12 +0000
Subject: [Bioperl-l] Error when using Bio::AlignIO on stockholm file from Pfam
In-Reply-To:
References:
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF4CF6A00B@CHIMBX5.ad.uillinois.edu>

I haven't used the '/dev/stdin' file like this, I typically do:

> my $in = Bio::AlignIO->new(
>   -format => 'stockholm',
>   -fh => \*STDIN
> );

but if it works...

That does look like a bug in the parser. Would you know the Pfam alignment accessions that triggered it, or maybe ones just prior to the fail?

chris

From Russell.Smithies at agresearch.co.nz Sun Jan 13 17:25:16 2013
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 14 Jan 2013 11:25:16 +1300
Subject: [Bioperl-l] stand alone blast plus problem
In-Reply-To:
References:
Message-ID: <18DF7D20DFEC044098A1062202F5FFF372E99EE4C1@exchsth.agresearch.co.nz>

Hi Frej,

Here are a few things to try:
Is it possible you need to escape the special chars in your IDs, i.e. the pipes and colons?
Have you tried it with simpler IDs?
Is your BioPerl sequence object collection created correctly?
Can you create the database by just passing it the fasta file?

$fac = Bio::Tools::Run::StandAloneBlastPlus->new(
    -db_name => 'mydb',
    -db_data => 'myseqs.fas',
    -create  => 1
);

--Russell

From awitney at sgul.ac.uk Thu Jan 17 06:45:40 2013
From: awitney at sgul.ac.uk (Adam Witney)
Date: Thu, 17 Jan 2013 11:45:40 +0000
Subject: [Bioperl-l] drawing genome comparison cross linking tracks?
Message-ID: <50F7E464.3090906@sgul.ac.uk>

Hi,

I would like to produce graphics as shown in this section (cross linking between genome tracks) [scroll down a little to see the pictures]:

http://www.biopython.org/DIST/docs/tutorial/Tutorial.html#htoc211

but using perl. Can Bio::Graphics do this? I can only find Gbrowse_syn, but this works within a web browser; I would like to generate these diagrams from a stand-alone script. If not, does anyone know any perl libraries that can be used to show genome comparisons?

Thanks

Adam

From cjfields at illinois.edu Thu Jan 17 08:37:04 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 17 Jan 2013 13:37:04 +0000
Subject: [Bioperl-l] drawing genome comparison cross linking tracks?
In-Reply-To: <50F7E464.3090906@sgul.ac.uk> References: <50F7E464.3090906@sgul.ac.uk> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF69BFAE17@CITESMBX5.ad.uillinois.edu> Gbrowse_syn, SynBrowse, and others can do this, but there isn't a pre-packaged way of drawing this that I know of analogous to Biopython's. It would be nice to have, though. chris On Jan 17, 2013, at 5:45 AM, Adam Witney wrote: > Hi, > > I would like to produces graphics as shown in this section (cross linking between genome tracks) [scroll down a little to see the pictures]: > > http://www.biopython.org/DIST/docs/tutorial/Tutorial.html#htoc211 > > but using perl. Can Bio::Graphics do this? I can only find Gbrowse_syn, but this works within a web browser, I would like to generate these diagrams from a stand alone script. If not does anyone know any perl libraries that can be used to show genome comparisons? > > Thanks > > Adam > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From awitney at sgul.ac.uk Thu Jan 17 12:33:42 2013 From: awitney at sgul.ac.uk (Adam Witney) Date: Thu, 17 Jan 2013 17:33:42 +0000 Subject: [Bioperl-l] drawing genome comparison cross linking tracks? In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF69BFAE17@CITESMBX5.ad.uillinois.edu> References: <50F7E464.3090906@sgul.ac.uk> <118F034CF4C3EF48A96F86CE585B94BF69BFAE17@CITESMBX5.ad.uillinois.edu> Message-ID: <50F835F6.5090603@sgul.ac.uk> Thanks Chris. I hadn't seen SynBrowse before, looks good. But I would like to do this without all the database/webserver overhead, I have a perl script that can interact with a gbrowse_syn database to generate images ie no webserver, I just need to separate out the database stuff now. 
Thanks again Adam On 17/01/2013 13:37, Fields, Christopher J wrote: > Gbrowse_syn, SynBrowse, and others can do this, but there isn't a pre-packaged way of drawing this that I know of analogous to Biopython's. It would be nice to have, though. > > chris > > On Jan 17, 2013, at 5:45 AM, Adam Witney wrote: > >> Hi, >> >> I would like to produces graphics as shown in this section (cross linking between genome tracks) [scroll down a little to see the pictures]: >> >> http://www.biopython.org/DIST/docs/tutorial/Tutorial.html#htoc211 >> >> but using perl. Can Bio::Graphics do this? I can only find Gbrowse_syn, but this works within a web browser, I would like to generate these diagrams from a stand alone script. If not does anyone know any perl libraries that can be used to show genome comparisons? >> >> Thanks >> >> Adam >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu Jan 17 12:41:47 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 17 Jan 2013 17:41:47 +0000 Subject: [Bioperl-l] drawing genome comparison cross linking tracks? In-Reply-To: <50F835F6.5090603@sgul.ac.uk> References: <50F7E464.3090906@sgul.ac.uk> <118F034CF4C3EF48A96F86CE585B94BF69BFAE17@CITESMBX5.ad.uillinois.edu> <50F835F6.5090603@sgul.ac.uk> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF69BFD4BA@CITESMBX5.ad.uillinois.edu> Let us know if this can be of use to the community, I'm sure others would find it useful as well. chris On Jan 17, 2013, at 11:33 AM, Adam Witney wrote: > > Thanks Chris. I hadn't seen SynBrowse before, looks good. 
> > But I would like to do this without all the database/webserver overhead, I have a perl script that can interact with a gbrowse_syn database to generate images ie no webserver, I just need to separate out the database stuff now. > > Thanks again > > Adam > > On 17/01/2013 13:37, Fields, Christopher J wrote: >> Gbrowse_syn, SynBrowse, and others can do this, but there isn't a pre-packaged way of drawing this that I know of analogous to Biopython's. It would be nice to have, though. >> >> chris >> >> On Jan 17, 2013, at 5:45 AM, Adam Witney wrote: >> >>> Hi, >>> >>> I would like to produces graphics as shown in this section (cross linking between genome tracks) [scroll down a little to see the pictures]: >>> >>> http://www.biopython.org/DIST/docs/tutorial/Tutorial.html#htoc211 >>> >>> but using perl. Can Bio::Graphics do this? I can only find Gbrowse_syn, but this works within a web browser, I would like to generate these diagrams from a stand alone script. If not does anyone know any perl libraries that can be used to show genome comparisons? >>> >>> Thanks >>> >>> Adam >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From buschj at hhu.de Sat Jan 19 11:12:54 2013 From: buschj at hhu.de (jobu) Date: Sat, 19 Jan 2013 17:12:54 +0100 Subject: [Bioperl-l] Parsing Blast-Report extracting "Features flanking .." 
Message-ID: <50FAC606.2060800@hhu.de> Dear List, I've got quite a lot of hits from blasting against nt-database that state: "Features flanking this part of subject sequence: " where the subject-accession only will give information about complete genomic sequences, the detailed information inside the "Features flanking" section of the HSP about annotations of upstream and downstream elements would be what I need. Is there any elegant way to make Bio::SearchIO do this job for me? greetings Jochen From saladi1 at illinois.edu Tue Jan 22 16:45:23 2013 From: saladi1 at illinois.edu (Shyam Saladi) Date: Tue, 22 Jan 2013 13:45:23 -0800 Subject: [Bioperl-l] Bio::DB::Taxonomy example Message-ID: Hi, There is an example on how to use Bio::DB::Taxonomy at http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy I tried the script but could not get it to work properly. After looking through the documentation it seems that my @extant_children = grep { $_->rank eq 'species' } $node->get_all_Descendents; should instead be my @extant_children = grep { $_->rank eq 'species' } $db->get_all_Descendents($node) I found this solves the problem for me. If my fix is in fact correct, an admin may want to change the example script online. Thanks, Shyam ** From cjfields at illinois.edu Tue Jan 22 18:40:05 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 22 Jan 2013 23:40:05 +0000 Subject: [Bioperl-l] Bio::DB::Taxonomy example In-Reply-To: References: Message-ID: <118F034CF4C3EF48A96F86CE585B94BF6CDFE139@CITESMBX5.ad.uillinois.edu> Done. Thanks for pointing that out! chris On Jan 23, 2013, at 5:45 AM, Shyam Saladi wrote: > Hi, > > There is an example on how to use Bio::DB::Taxonomy at > http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy > > I tried the script but could not get it to work properly. 
After looking > through the documentation it seems that > > my @extant_children = grep { $_->rank eq 'species' } > $node->get_all_Descendents; > > should instead be > > my @extant_children = grep { $_->rank eq 'species' } > $db->get_all_Descendents($node) > > I found this solves the problem for me. If my fix is in fact correct, an > admin may want to change the example script online. > > Thanks, > Shyam > ** > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From buschj at hhu.de Fri Jan 25 05:20:22 2013 From: buschj at hhu.de (jobu) Date: Fri, 25 Jan 2013 11:20:22 +0100 Subject: [Bioperl-l] Parsing Blast-Report extracting "Features flanking .." In-Reply-To: References: <50FAC606.2060800@hhu.de> Message-ID: <51025C66.6060008@hhu.de> Am 22.01.2013 19:03, schrieb Mgavi Brathwaite: > What upstream and downstream elements are you interested in? I've got a huge pile of short RNA reads. Part of the question now is whether those RNA fragments originate from siRNA events, or may represent miRNAs / parts of pre-miRNAs. So I did an online blast search against database nt. The resulting report quite often just gives subject information like this: ----- > gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence Length=23459830 ----- Now I would like to get the hit's neighbouring regions for further analysis. Preferably I would like to do that in an automized way, but the only possible action with this kind of subject gi | description would be to fetch the entire chromosomal sequence I guess ? 
However, right below the line above, the report states more precisely: ------ Features flanking this part of subject sequence: 8872 bp at 5' side: cytochrome P450 90B1 402 bp at 3' side: U1 small nuclear ribonucleoprotein-70K ------ Still I would like to have the possibility to automatically fetch the subject's sequence(s), as of now I think parsing the report with SearchIO won't let me aquire that information, because SearchIO does not recognize report sections like those. I hope I did not miss any of SearchIOs capabilities, but I could not find any method covering my wish?! Right now maybe the only way to get the information I want is to construct my own parser and write it out into a separate file, which in turn again I could read into a hash before processing the Blast-Report with SearchIO to combine both data for further automized work. I am aware though that even successfully getting the flanking features would leave me with the more or less wide intergenic gap my hsp is located in. However I'm in need of a way to get the flanking features including their annotation and the region spanning between them. But I hope I do not have to get complete sequences to accomplish that, as this would be kind of an overkill. with kind regards Jochen From afonsomduarte at gmail.com Mon Jan 14 05:09:48 2013 From: afonsomduarte at gmail.com (Afonso Duarte) Date: Mon, 14 Jan 2013 02:09:48 -0800 (PST) Subject: [Bioperl-l] Parsing Blast results for different genus Message-ID: <217cf8ac-5323-48f5-8a81-aba032cba68b@googlegroups.com> Dear All, I have obtained several .gb BLAST results that I want to split in different .gb or .fasta files accordingly to genus. I searched for different ways to do it in the web, but didn't find a way to do it when one has something like 50 genus in the blast. (to do it for a short number or genus I could use splitgb.pl from http://www.bioperl.org/wiki/HOWTO:SeqIO but it will not be feasible for a large number of genus). 
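
A minimal sketch of grouping a multi-record GenBank file by genus, assuming every record carries a standard ORGANISM line and ends with a '//' line. It uses plain Perl string handling rather than Bio::SeqIO, and the function name group_by_genus and the sample records are made up for illustration:

```perl
use strict;
use warnings;

# Group GenBank records by genus. Takes the full text of a multi-record
# GenBank file and returns a hash ref: genus => [record, record, ...].
sub group_by_genus {
    my ($text) = @_;
    my %by_genus;
    # Records are terminated by a line that contains only '//'.
    for my $record (split m{^//[ \t]*\n}m, $text) {
        next unless $record =~ /\S/;    # skip empty fragments
        # Assumption: the first word after ORGANISM is the genus.
        my ($genus) = $record =~ /^\s+ORGANISM\s+(\S+)/m;
        $genus = 'unknown' unless defined $genus;
        push @{ $by_genus{$genus} }, $record . "//\n";
    }
    return \%by_genus;
}

# Tiny made-up input, for illustration only.
my $demo = <<'END';
LOCUS       A1
  ORGANISM  Escherichia coli
//
LOCUS       B2
  ORGANISM  Bacillus subtilis
//
LOCUS       C3
  ORGANISM  Escherichia fergusonii
//
END

my $groups = group_by_genus($demo);
for my $genus (sort keys %$groups) {
    printf "%s: %d record(s)\n", $genus, scalar @{ $groups->{$genus} };
}
```

Because the records are keyed in a hash, the number of genera does not need to be known in advance; each group could then be written to its own file (for example one named after the genus), which is the part that becomes tedious when done by hand for 50 genera.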
Is there a script of way to do it in a efficient way ? Best PS sorry if this message appears twice ... as i was having some trouble posting the first time. From afonsomduarte at gmail.com Sun Jan 13 14:17:19 2013 From: afonsomduarte at gmail.com (Afonso Duarte) Date: Sun, 13 Jan 2013 11:17:19 -0800 (PST) Subject: [Bioperl-l] Split blast result by genus. How to parse it ? Message-ID: <83857a34-14a0-43e2-a2b3-5ca538d41bcd@googlegroups.com> Dear All, I have obtained several .gb BLAST results that I want to split in different .gb or .fasta files accordingly to genus. I searched for different ways to do it in the web, but didn't find a way to do it when one has something like 50 genus in the blast. (to do it for a short number or genus I could use splitgb.pl from http://www.bioperl.org/wiki/HOWTO:SeqIO but it will not be feasible for a large number of genus). Is there a script of way to do it in a efficient way ? Best From afonsomduarte at gmail.com Mon Jan 21 04:03:44 2013 From: afonsomduarte at gmail.com (Afonso Duarte) Date: Mon, 21 Jan 2013 01:03:44 -0800 (PST) Subject: [Bioperl-l] spliting genbank file into smaller ones via organism Message-ID: Dear List, I have obtained several .gb BLAST results that I want to split in different .gb or .fasta files accordingly to genus. I searched for different ways to do it in the web, but didn't find a way to do it when one has something like 50 genus in the blast. (to do it for a short number or genus I could use splitgb.pl from http://www.bioperl.org/wiki/HOWTO:SeqIO but it will not be feasible for a large number of genus). Is there a script of way to do it in a efficient way ? Best From jason.stajich at gmail.com Tue Jan 29 14:00:16 2013 From: jason.stajich at gmail.com (Jason Stajich) Date: Tue, 29 Jan 2013 11:00:16 -0800 Subject: [Bioperl-l] Parsing Blast-Report extracting "Features flanking .." 
In-Reply-To: <51025C66.6060008@hhu.de> References: <50FAC606.2060800@hhu.de> <51025C66.6060008@hhu.de> Message-ID: <6E83E3F3-C304-4DC4-9A11-FE1CA90F207D@gmail.com> We don't parse the NCBI feature info from the BLAST reports per your query. To look up a specific feature you can use Bio::DB::GenBank to query for sequence from a specific feature by accession number - see the HOWTOs for that. However, most people use tools that generate SAM/BAM files with short reads - then you can use a tool like bedtools to find overlaps of reads with the locations of features. basically: - download the genome and GFF for arabidopsis - align your sRNA to the genome with a short read aligner - bowtie, bwa, others - convert your sam to bam file with SAMtools or picard - compare the location of features with the reads to get expression summaries or individuals reads with BEDTools On Jan 25, 2013, at 2:20 AM, jobu wrote: > Am 22.01.2013 19:03, schrieb Mgavi Brathwaite: >> What upstream and downstream elements are you interested in? > > > I've got a huge pile of short RNA reads. > Part of the question now is whether those RNA fragments originate from > siRNA events, > or may represent miRNAs / parts of pre-miRNAs. > > So I did an online blast search against database nt. > The resulting report quite often just gives subject information like this: > > ----- >> gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence > Length=23459830 > ----- > > Now I would like to get the hit's neighbouring regions for further > analysis. > Preferably I would like to do that in an automized way, but the only > possible action with this kind of subject gi | description would be to > fetch the entire chromosomal sequence I guess ? 
> > However, > right below the line above, the report states more precisely: > > ------ > Features flanking this part of subject sequence: > 8872 bp at 5' side: cytochrome P450 90B1 > 402 bp at 3' side: U1 small nuclear ribonucleoprotein-70K > ------ > > Still I would like to have the possibility to automatically fetch the > subject's sequence(s), > as of now I think parsing the report with SearchIO won't let me aquire > that information, because SearchIO does not recognize report sections > like those. > > I hope I did not miss any of SearchIOs capabilities, but I could not > find any method covering my wish?! > > Right now maybe the only way to get the information I want is to > construct my own parser and write it out into a separate file, which in > turn again I could read into a hash before processing the Blast-Report > with SearchIO to combine both data for further automized work. > > I am aware though that even successfully getting the flanking features > would leave me with the more or less wide intergenic gap my hsp is > located in. > > However I'm in need of a way to get the flanking features including > their annotation and the region spanning between them. > But I hope I do not have to get complete sequences to accomplish that, > as this would be kind of an overkill. > > with kind regards > Jochen > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From dr_kilburn59 at yahoo.com Wed Jan 30 16:40:26 2013 From: dr_kilburn59 at yahoo.com (Dan kilburn) Date: Wed, 30 Jan 2013 16:40:26 -0500 Subject: [Bioperl-l] Bioperl-l Digest, Vol 117, Issue 13 In-Reply-To: References: Message-ID: <575B184F-C051-4FEF-9BEB-7AB98E3C52A6@yahoo.com> Hi Jason, Are there any plans to keep SearchIO up to date with ncbi blast? 
I know they change formats ridiculously often, but I had to write my own parser to get sequence identity, which I would rather not have done. I realize that this job would be a big load on anyone who takes it, but it's so fundamental. Maybe I can help.

--Dan
Sent from my iPhone

From cjfields at illinois.edu Wed Jan 30 18:13:49 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 30 Jan 2013 23:13:49 +0000
Subject: [Bioperl-l] Bioperl-l Digest, Vol 117, Issue 13
In-Reply-To: <575B184F-C051-4FEF-9BEB-7AB98E3C52A6@yahoo.com>
References: <575B184F-C051-4FEF-9BEB-7AB98E3C52A6@yahoo.com>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF6CE1175C@CHIMBX5.ad.uillinois.edu>

We certainly accept support for updates to the code. In fact, if you are familiar with git/github the process is fairly straightforward:

1) Fork the code to your github account
2) Make and commit changes
3) Submit a pull request
4) Post something to the list just in case.
We also accept code patches; the best way to submit these is as a bug report to the redmine repository (doesn't hurt to post here as well):

https://redmine.open-bio.org/

chris

On Jan 30, 2013, at 3:40 PM, Dan kilburn wrote:

> Hi Jason,
>
> Are there any plans to keep SearchIO up to date with ncbi blast? I know they change formats ridiculously often, but I had to write my own parser to get sequence identity, which I would rather not have done. I realize that this job would be a big load on anyone who takes it, but it's so fundamental. Maybe I can help.
>
> --Dan
> Sent from my iPhone

From thiago.venancio at gmail.com Thu Jan 31 12:10:39 2013
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Thu, 31 Jan 2013 09:10:39 -0800 (PST)
Subject: [Bioperl-l] bioperl-ext
Message-ID: <5523d5a2-b103-4cf3-8c84-bd33ef675a6d@googlegroups.com>

Hi all,

I am trying to install bioperl-ext and the compilation gives me the error:

/usr/bin/ld: libs/libsw.a(aln.o): relocation R_X86_64_32
against `.rodata' can not be used when making a shared object; recompile with -fPIC libs/libsw.a: could not read symbols: Bad value collect2: ld returned 1 exit status make[1]: *** [blib/arch/auto/Bio/Ext/Align/Align.so] Error 1 make[1]: Leaving directory `/home/thiago/temp/bioperl-ext-master/Bio/Ext/Align' make: *** [subdirs] Error 2 However, the Makefile says that -fPIC is already being used, which is puzzling me. Any clues ? Thanks. Thiago From thiago.venancio at gmail.com Thu Jan 31 12:25:12 2013 From: thiago.venancio at gmail.com (Thiago Venancio) Date: Thu, 31 Jan 2013 09:25:12 -0800 (PST) Subject: [Bioperl-l] bioperl-ext In-Reply-To: <5523d5a2-b103-4cf3-8c84-bd33ef675a6d@googlegroups.com> References: <5523d5a2-b103-4cf3-8c84-bd33ef675a6d@googlegroups.com> Message-ID: Solved using the solution posted by jmf here : https://groups.google.com/forum/?fromgroups=#!searchin/bioperl-l/bioperl-ext/bioperl-l/FgJo_g8UQE8/FB1DEjJkTooJ Which I found after posting. T On Thursday, January 31, 2013 3:10:39 PM UTC-2, Thiago Venancio wrote: > > Hi all, > > I am trying to install bioperl-ext and the compilation gives me the error: > > /usr/bin/ld: libs/libsw.a(aln.o): relocation R_X86_64_32 against `.rodata' > can not be used when making a shared object; recompile with -fPIC > libs/libsw.a: could not read symbols: Bad value > collect2: ld returned 1 exit status > make[1]: *** [blib/arch/auto/Bio/Ext/Align/Align.so] Error 1 > make[1]: Leaving directory > `/home/thiago/temp/bioperl-ext-master/Bio/Ext/Align' > make: *** [subdirs] Error 2 > > However, the Makefile says that -fPIC is already being used, which is > puzzling me. > > Any clues ? > > Thanks. 
> Thiago

From koriege at googlemail.com Thu Jan 31 21:49:20 2013
From: koriege at googlemail.com (koriege at googlemail.com)
Date: Thu, 31 Jan 2013 18:49:20 -0800 (PST)
Subject: [Bioperl-l] problem with Bio::*::Fasta id_parser
Message-ID:

Hi,

I tried two methods to create a BioPerl FASTA database, but both fail to extract the substring from my headers. Can someone explain to me why I get the standard header, or show me a workaround?

Thanks in advance.
pyr0

i)

    my $objDB = Bio::Index::Fasta->new(-filename => $PATHdbIdx, -write_flag => 1);
    $objDB->id_parser(\&get_id);
    $objDB->make_index(glob($objParameter->dbGenome()));

    sub get_id {
        my $header = shift;
        $header =~ /^>.*\bsp\|([A-Z]\d{5}\b)/;
        $1;
    }

output:

    Use of uninitialized value $id in concatenation (.) or string at /usr/share/perl5/Bio/Index/Abstract.pm line 753, <$FASTA> line 1.
    Use of uninitialized value $id in exists at /usr/share/perl5/Bio/Index/Abstract.pm line 754, <$FASTA> line 1.
    Use of uninitialized value $id in hash element at /usr/share/perl5/Bio/Index/Abstract.pm line 757, <$FASTA> line 1.
    gi|376282008|ref|NC_016798.1|

ii)

    my $PATHdbIdx = catfile($objParameter->DIR, 'data', 'db.idx');
    unlink($PATHdbIdx);
    my $objDB = Bio::DB::Fasta->new($objParameter->dbGenome(), -makeid => \&get_id);
    $objDBgenome->set(\$objDB);

output:

    Use of uninitialized value $key in pattern match (m//) at /usr/share/perl5/Bio/DB/Fasta.pm line 1178.
    Use of uninitialized value $id in exists at /usr/share/perl5/Bio/DB/Fasta.pm line 617.
    gi|376282008|ref|NC_016798.1|

From yuf228 at hotmail.com Thu Jan 31 23:15:15 2013
From: yuf228 at hotmail.com (Rob)
Date: Fri, 1 Feb 2013 04:15:15 +0000 (UTC)
Subject: [Bioperl-l] Where to get BLASTCLUST or equivalent?
References: <200305311150.h4VBopn2019091@localhost.localdomain>
Message-ID:

Cyril C.C.
Chua bmb.leeds.ac.uk> writes: > > Hi, > > I have some difficulty in sourcing for BLASTCLUST or related > programs/mods. Does any1 know exactly how to locate them? > > Regards > > Cyril Chua > Hi Cyril, I heard of the following programmes that might do similar things (I HAVEN'T used any of them yet): Afree - http://www.vicbioinformatics.com/software.afree.shtml Uclust - http://drive5.com/uclust/uclust_userguide_2_1.pdf Usearch - http://www.drive5.com/usearch/ DomClust - http://mbgd.genome.ad.jp/domclust/ or Check this: http://ppod.princeton.edu/help/help_tech.html God bless, Robert From jimhu at tamu.edu Wed Jan 2 14:44:56 2013 From: jimhu at tamu.edu (Jim Hu) Date: Wed, 2 Jan 2013 13:44:56 -0600 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) Message-ID: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google. I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and create gff with appropriate subfeatures. Thanks, Jim ===================================== Jim Hu Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From biojiangke at gmail.com Wed Jan 2 15:20:02 2013 From: biojiangke at gmail.com (Ke Jiang) Date: Wed, 2 Jan 2013 15:20:02 -0500 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) In-Reply-To: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> Message-ID: There is a blast2gff.pl floating around for Gbrowse to visualize blast results as tracks. Not sure about the blast+. One more reason to stick with the good old blast! https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl Ke ------------------------------------------- Ke Jiang, Ph.D. 
Post-doctoral Fellow
Delbruck Laboratory
Cold Spring Harbor Laboratory
Cold Spring Harbor, NY 11724

From cjfields at illinois.edu Wed Jan 2 15:47:52 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 2 Jan 2013 20:47:52 +0000
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To:
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF4CF5FD44@CHIMBX5.ad.uillinois.edu>

...except that NCBI no longer supports legacy BLAST. Caveat emptor.

chris

On Jan 2, 2013, at 2:20 PM, Ke Jiang wrote:

> There is a blast2gff.pl floating around for Gbrowse to visualize blast
> results as tracks. Not sure about the blast+. One more reason to stick with
> the good old blast!
>
> https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl
>
> Ke
>
> -------------------------------------------
> Ke Jiang, Ph.D.
> Post-doctoral Fellow
> Delbruck Laboratory
> Cold Spring Harbor Laboratory
> Cold Spring Harbor, NY 11724
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bosborne11 at verizon.net Wed Jan 2 15:26:22 2013
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 02 Jan 2013 15:26:22 -0500
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu>
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu>
Message-ID:

Here's one:

https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl

Another one:

~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl
#!perl

# Author: Jason Stajich
# Description: Turn SearchIO parseable report(s) into a GFF report
#
=head1 NAME

bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report

Brian O.

On Jan 2, 2013, at 2:44 PM, Jim Hu wrote:

> I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google.
> > I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and create gff with appropriate subfeatures. > > Thanks, > > Jim > ===================================== > Jim Hu > Professor > Dept. of Biochemistry and Biophysics > 2128 TAMU > Texas A&M Univ. > College Station, TX 77843-2128 > 979-862-4054 > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Wed Jan 2 16:32:16 2013 From: scott at scottcain.net (Scott Cain) Date: Wed, 2 Jan 2013 16:32:16 -0500 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) In-Reply-To: References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> Message-ID: Hi Brian, I was going to suggest the same thing--though that script is fairly old, it's not as old as the blast2gff script in the GBrowse distribution (which probably should be retired). I believe it supports GFF3, though I don't have any sample data with which to test it to be sure. I also don't know if it supports BLAST+ input--I haven't kept up with SearchIO (on which search2gff.pl depends); will it accept it? Scott On Wed, Jan 2, 2013 at 3:26 PM, Brian Osborne wrote: > Here's one: > > https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl > > Another one: > > ~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl > #!perl > > # Author: Jason Stajich > # Description: Turn SearchIO parseable report(s) into a GFF report > # > =head1 NAME > > bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report > > > > Brian O. > > On Jan 2, 2013, at 2:44 PM, Jim Hu wrote: > >> I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google. 
>> >> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and create gff with appropriate subfeatures. >> >> Thanks, >> >> Jim >> ===================================== >> Jim Hu >> Professor >> Dept. of Biochemistry and Biophysics >> 2128 TAMU >> Texas A&M Univ. >> College Station, TX 77843-2128 >> 979-862-4054 >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Wed Jan 2 17:26:55 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 2 Jan 2013 22:26:55 +0000 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) In-Reply-To: References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> It should (I recall using it at one point). If it doesn't we should fix it so it does. How does MAKER deal with this? IIRC it uses (a modified) SearchIO-based method... chris On Jan 2, 2013, at 3:32 PM, Scott Cain wrote: > Hi Brian, > > I was going to suggest the same thing--though that script is fairly > old, it's not as old as the blast2gff script in the GBrowse > distribution (which probably should be retired). I believe it > supports GFF3, though I don't have any sample data with which to test > it to be sure. I also don't know if it supports BLAST+ input--I > haven't kept up with SearchIO (on which search2gff.pl depends); will > it accept it? 
> > Scott > > > On Wed, Jan 2, 2013 at 3:26 PM, Brian Osborne wrote: >> Here's one: >> >> https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl >> >> Another one: >> >> ~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl >> #!perl >> >> # Author: Jason Stajich >> # Description: Turn SearchIO parseable report(s) into a GFF report >> # >> =head1 NAME >> >> bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report >> >> >> >> Brian O. >> >> On Jan 2, 2013, at 2:44 PM, Jim Hu wrote: >> >>> I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google. >>> >>> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and create gff with appropriate subfeatures. >>> >>> Thanks, >>> >>> Jim >>> ===================================== >>> Jim Hu >>> Professor >>> Dept. of Biochemistry and Biophysics >>> 2128 TAMU >>> Texas A&M Univ. >>> College Station, TX 77843-2128 >>> 979-862-4054 >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/) 216-392-3087
> Ontario Institute for Cancer Research
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bosborne11 at verizon.net Wed Jan 2 17:53:50 2013
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 02 Jan 2013 17:53:50 -0500
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu>
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu>
Message-ID: <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net>

Scott and Chris,

I'll test it and see...

Brian O.

On Jan 2, 2013, at 5:26 PM, "Fields, Christopher J" wrote:

> It should (I recall using it at one point). If it doesn't we should fix it so it does.
>
> How does MAKER deal with this? IIRC it uses (a modified) SearchIO-based method...
>
> chris
>
> On Jan 2, 2013, at 3:32 PM, Scott Cain wrote:
>
>> Hi Brian,
>>
>> I was going to suggest the same thing--though that script is fairly
>> old, it's not as old as the blast2gff script in the GBrowse
>> distribution (which probably should be retired). I believe it
>> supports GFF3, though I don't have any sample data with which to test
>> it to be sure. I also don't know if it supports BLAST+ input--I
>> haven't kept up with SearchIO (on which search2gff.pl depends); will
>> it accept it?
>> >> Scott >> >> >> On Wed, Jan 2, 2013 at 3:26 PM, Brian Osborne wrote: >>> Here's one: >>> >>> https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl >>> >>> Another one: >>> >>> ~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl >>> #!perl >>> >>> # Author: Jason Stajich >>> # Description: Turn SearchIO parseable report(s) into a GFF report >>> # >>> =head1 NAME >>> >>> bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report >>> >>> >>> >>> Brian O. >>> >>> On Jan 2, 2013, at 2:44 PM, Jim Hu wrote: >>> >>>> I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google. >>>> >>>> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and create gff with appropriate subfeatures. >>>> >>>> Thanks, >>>> >>>> Jim >>>> ===================================== >>>> Jim Hu >>>> Professor >>>> Dept. of Biochemistry and Biophysics >>>> 2128 TAMU >>>> Texas A&M Univ. >>>> College Station, TX 77843-2128 >>>> 979-862-4054 >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. 
scott at scottcain dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jimhu at tamu.edu Fri Jan 4 14:49:51 2013 From: jimhu at tamu.edu (Jim Hu) Date: Fri, 4 Jan 2013 13:49:51 -0600 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) In-Reply-To: <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> Message-ID: <7F69D24B-8D4C-41F5-B8A1-CB850EBB1C67@tamu.edu> Thanks for the replies, but... I can't tell what input formats for the blast results file are supported. Format 11 and format 6 give no output and no feedback. Putting some diagnostic print statements in the code suggests that I'm not getting any result objects from Bio::SearchIO. The script uses Bio::SearchIO, but does not seem to call the submodules for blast. Documentation links on the wiki seem to be broken, at least on this page: http://www.bioperl.org/wiki/Module:Bio::SearchIO Jim On Jan 2, 2013, at 4:53 PM, Brian Osborne wrote: > Scott and Chris, > > I'll test it and see? > > Brian O. > > > On Jan 2, 2013, at 5:26 PM, "Fields, Christopher J" wrote: > >> It should (I recall using it at one point). If it doesn't we should fix it so it does. >> >> How does MAKER deal with this? IIRC it uses (a modified) SearchIO-based method... >> >> chris >> >> On Jan 2, 2013, at 3:32 PM, Scott Cain wrote: >> >>> Hi Brian, >>> >>> I was going to suggest the same thing--though that script is fairly >>> old, it's not as old as the blast2gff script in the GBrowse >>> distribution (which probably should be retired). I believe it >>> supports GFF3, though I don't have any sample data with which to test >>> it to be sure. 
I also don't know if it supports BLAST+ input--I >>> haven't kept up with SearchIO (on which search2gff.pl depends); will >>> it accept it? >>> >>> Scott >>> >>> >>> On Wed, Jan 2, 2013 at 3:26 PM, Brian Osborne wrote: >>>> Here's one: >>>> >>>> https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl >>>> >>>> Another one: >>>> >>>> ~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl >>>> #!perl >>>> >>>> # Author: Jason Stajich >>>> # Description: Turn SearchIO parseable report(s) into a GFF report >>>> # >>>> =head1 NAME >>>> >>>> bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report >>>> >>>> >>>> >>>> Brian O. >>>> >>>> On Jan 2, 2013, at 2:44 PM, Jim Hu wrote: >>>> >>>>> I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google. >>>>> >>>>> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and create gff with appropriate subfeatures. >>>>> >>>>> Thanks, >>>>> >>>>> Jim >>>>> ===================================== >>>>> Jim Hu >>>>> Professor >>>>> Dept. of Biochemistry and Biophysics >>>>> 2128 TAMU >>>>> Texas A&M Univ. >>>>> College Station, TX 77843-2128 >>>>> 979-862-4054 >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> -- >>> ------------------------------------------------------------------------ >>> Scott Cain, Ph. D. 
scott at scottcain dot net >>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>> Ontario Institute for Cancer Research >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > ===================================== Jim Hu Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From bosborne11 at verizon.net Fri Jan 4 14:26:52 2013 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 04 Jan 2013 14:26:52 -0500 Subject: [Bioperl-l] strange error in the beginners howto In-Reply-To: References: Message-ID: Matthew, First, let's distinguish "warning" and "error". Errors stop execution of the app or script, what you're seeing is a warning, your script actually completes. Does the warning vanish if you remove "-w"? Second, seems like this is a Perl version thing: http://www.sysarchitects.com/bioperl BIO On Jan 4, 2013, at 1:55 PM, "Thornton, Matthew" wrote: > Hello, > > I am working through the first part of the BioPerl beginners howto and I am getting an error and I was wondering if you could tell me what is wrong. > > I made the seqio.pl script: > > #!/bin/perl -w > > use Bio::Seq; > > $seq_obj = Bio::Seq->new(-seq => "aaaatgggggggggggccccgtt", > -display_id => "#12345", > -desc => "example 1", > -alphabet => "dna" ); > > print $seq_obj->seq(); > > I execute the script and I get this error: > > gserver at gserver:~/bio$ perl seqio.pl > Replacement list is longer than search list at /usr/local/share/perl/5.14.2/Bio/Range.pm line 251. > aaaatgggggggggggccccgttgserver at gserver:~/bio$ > > Is there something wrong with my installation? > > Any help is greatly appreciated. > > Thanks! > > > Matthew E. Thornton > > > > Laboratory Technician III > Broad Center for Stem Cell and Regenerative Medicine > University of Southern California > 1501 San Pablo St. 
ZNI
> Los Angeles, CA 90089-2821
> ph.:323-442-1688
> fx.:323-442-4040
> matthew.thornton at med.usc.edu
>

Brian O.

--
Brian Osborne, PhD
BioTeam: http://bioteam.net
email: briano at bioteam.net
mobile: 978-317-3101

From MEC at stowers.org Fri Jan 4 15:20:17 2013
From: MEC at stowers.org (Cook, Malcolm)
Date: Fri, 4 Jan 2013 14:20:17 -0600
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To: <7F69D24B-8D4C-41F5-B8A1-CB850EBB1C67@tamu.edu>
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> <7F69D24B-8D4C-41F5-B8A1-CB850EBB1C67@tamu.edu>
Message-ID: <2C40E43D1F7A56408C4463FD245DDDF907958EE791@EXCHMB-02.stowers-institute.org>

Jim,

Getting to your original question:

> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and create gff with appropriate subfeatures.

Exactly what/how do you want/expect to encode the blast output as GFF{1,2,2.5,3}??

If GFF3 per http://www.sequenceontology.org/gff3.shtml, then are you hoping to get GFF3 marked up as described in section 'THE GAP ATTRIBUTE' or as in 'ALIGNMENTS'?

I would guess not, because neither of them has 'subfeatures'.

If you could explain more fully with examples (hand cobbled or borrowed from someone else) of what you expect then I might have a better idea of what options might suit your needs.

~Malcolm

.-----Original Message-----
.From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jim Hu
.Sent: Friday, January 04, 2013 1:50 PM
.To: Brian Osborne
.Cc: Fields, Christopher J; Scott Cain; bioperl-l at bioperl.org
.Subject: Re: [Bioperl-l] Converting blast+ output to gff (with gaps)
.
.Thanks for the replies, but...
.
.I can't tell what input formats for the blast results file are supported. Format 11 and format 6 give no output and no feedback.
Putting .some diagnostic print statements in the code suggests that I'm not getting any result objects from Bio::SearchIO. . .The script uses Bio::SearchIO, but does not seem to call the submodules for blast. Documentation links on the wiki seem to be .broken, at least on this page: . . http://www.bioperl.org/wiki/Module:Bio::SearchIO . .Jim . . .On Jan 2, 2013, at 4:53 PM, Brian Osborne wrote: . .> Scott and Chris, .> .> I'll test it and see... .> .> Brian O. .> .> .> On Jan 2, 2013, at 5:26 PM, "Fields, Christopher J" wrote: .> .>> It should (I recall using it at one point). If it doesn't we should fix it so it does. .>> .>> How does MAKER deal with this? IIRC it uses (a modified) SearchIO-based method... .>> .>> chris .>> .>> On Jan 2, 2013, at 3:32 PM, Scott Cain wrote: .>> .>>> Hi Brian, .>>> .>>> I was going to suggest the same thing--though that script is fairly .>>> old, it's not as old as the blast2gff script in the GBrowse .>>> distribution (which probably should be retired). I believe it .>>> supports GFF3, though I don't have any sample data with which to test .>>> it to be sure. I also don't know if it supports BLAST+ input--I .>>> haven't kept up with SearchIO (on which search2gff.pl depends); will .>>> it accept it? .>>> .>>> Scott .>>> .>>> .>>> On Wed, Jan 2, 2013 at 3:26 PM, Brian Osborne wrote: .>>>> Here's one: .>>>> .>>>> https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl .>>>> .>>>> Another one: .>>>> .>>>> ~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl .>>>> #!perl .>>>> .>>>> # Author: Jason Stajich .>>>> # Description: Turn SearchIO parseable report(s) into a GFF report .>>>> # .>>>> =head1 NAME .>>>> .>>>> bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report .>>>> .>>>> .>>>> .>>>> Brian O. .>>>> .>>>> On Jan 2, 2013, at 2:44 PM, Jim Hu wrote: .>>>> .>>>>> I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google. 
.>>>>> .>>>>> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and .create gff with appropriate subfeatures. .>>>>> .>>>>> Thanks, .>>>>> .>>>>> Jim .>>>>> ===================================== .>>>>> Jim Hu .>>>>> Professor .>>>>> Dept. of Biochemistry and Biophysics .>>>>> 2128 TAMU .>>>>> Texas A&M Univ. .>>>>> College Station, TX 77843-2128 .>>>>> 979-862-4054 .>>>>> .>>>>> .>>>>> .>>>>> _______________________________________________ .>>>>> Bioperl-l mailing list .>>>>> Bioperl-l at lists.open-bio.org .>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l .>>>> .>>>> .>>>> _______________________________________________ .>>>> Bioperl-l mailing list .>>>> Bioperl-l at lists.open-bio.org .>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l .>>> .>>> .>>> .>>> -- .>>> ------------------------------------------------------------------------ .>>> Scott Cain, Ph. D. scott at scottcain dot net .>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 .>>> Ontario Institute for Cancer Research .>>> _______________________________________________ .>>> Bioperl-l mailing list .>>> Bioperl-l at lists.open-bio.org .>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l .>> .> . .===================================== .Jim Hu .Professor .Dept. of Biochemistry and Biophysics .2128 TAMU .Texas A&M Univ. .College Station, TX 77843-2128 .979-862-4054 . . . 
._______________________________________________
.Bioperl-l mailing list
.Bioperl-l at lists.open-bio.org
.http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jimhu at tamu.edu Fri Jan 4 16:57:38 2013
From: jimhu at tamu.edu (Jim Hu)
Date: Fri, 4 Jan 2013 15:57:38 -0600
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To: <2C40E43D1F7A56408C4463FD245DDDF907958EE791@EXCHMB-02.stowers-institute.org>
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> <7F69D24B-8D4C-41F5-B8A1-CB850EBB1C67@tamu.edu> <2C40E43D1F7A56408C4463FD245DDDF907958EE791@EXCHMB-02.stowers-institute.org>
Message-ID: <2D58FAEB-ED4B-4965-A8CB-890D0012D098@tamu.edu>

Malcolm,

Thanks, I should have reread the GFF3 spec before posting!

In the section on the Gap attribute and below on alignments it discusses two ways to represent an alignment. I was originally thinking of something like the later example shown for cDNA vs genome. But the gap attribute representation would be fine too. So, I can see how the final output could be done in different ways, but I'm still stuck on how to get there.

I don't have a specific application in mind; I'm mostly just trying to understand how to get from standalone blast+ output to things that look like the examples in the gff spec and the gbrowse documentation - really basic display of alignments that are gapped. For my teaching, we do EST vs genomic blast and want gapped cDNA alignments to show where the introns go. My other work is with bacteria where introns are rare, but there are times when I'd like to show an alignment that is interrupted by a transposable element, for example.
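
One way to get at the Gap attribute discussed above is to ask blast+ for the aligned strings themselves (for example -outfmt "6 std qseq sseq") and collapse them column by column into M/I/D runs. What follows is only a minimal sketch under stated assumptions, not an existing BioPerl routine: the sub name gap_attribute is hypothetical, the alignment is nucleotide against nucleotide, and the genome hit (sseq) is treated as the reference with the EST or cDNA query (qseq) as the Target.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: build a GFF3 Gap attribute value from two
# equal-length aligned strings.
#   $ref_aln - aligned reference (the sequence named in GFF column 1)
#   $tgt_aln - aligned target (the sequence named in Target=)
# M = aligned column, I = gap in the reference, D = gap in the target.
sub gap_attribute {
    my ($ref_aln, $tgt_aln) = @_;
    die "aligned strings differ in length\n"
        unless length($ref_aln) == length($tgt_aln);

    my @ops;    # list of [operation code, run length]
    for my $i (0 .. length($ref_aln) - 1) {
        my $r = substr($ref_aln, $i, 1);
        my $t = substr($tgt_aln, $i, 1);
        next if $r eq '-' && $t eq '-';    # ignore columns that are gaps in both
        my $code = $r eq '-' ? 'I'
                 : $t eq '-' ? 'D'
                 :             'M';
        if (@ops && $ops[-1][0] eq $code) {
            $ops[-1][1]++;                 # extend the current run
        }
        else {
            push @ops, [$code, 1];         # start a new run
        }
    }
    return join ' ', map { $_->[0] . $_->[1] } @ops;
}

# Alignment used as the example in the Gap section of the GFF3 spec.
my $ref = 'CAAGACCTAAACTGGAT-TCCAAT';
my $tgt = 'CAAGACCT---CTGGATATCCAAT';
print 'Gap=', gap_attribute($ref, $tgt), "\n";    # prints Gap=M8 D3 M6 I1 M6
```

Protein-versus-nucleotide alignments would also need the frameshift operations from the spec, which this sketch leaves out.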
Excerpting from blastp -help

 *** Formatting options
 -outfmt
   alignment view options:
     0 = pairwise,
     1 = query-anchored showing identities,
     2 = query-anchored no identities,
     3 = flat query-anchored, show identities,
     4 = flat query-anchored, no identities,
     5 = XML Blast output,
     6 = tabular,
     7 = tabular with comment lines,
     8 = Text ASN.1,
     9 = Binary ASN.1,
    10 = Comma-separated values,
    11 = BLAST archive format (ASN.1)

Several of these are "lossy" in terms of where the actual gaps occur (e.g. 6). Others seem to me to be more human readable than suited for parsing. So I was hoping to get pointed to an existing script that would generate either the single feature with gap attribute OR the multi-line match features OR a combination from one of these output formats.

I'm probably missing something very, very obvious.

Best,

Jim

On Jan 4, 2013, at 2:20 PM, Cook, Malcolm wrote:

> Jim,
>
> Getting to your original question:
>
>> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and create gff with appropriate subfeatures.
>
> Exactly what/how do you want/expect to encode the blast output as GFF{1,2,2.5,3}??
>
> If GFF3 per http://www.sequenceontology.org/gff3.shtml then are you hoping to get GFF3 marked up as described in section 'THE GAP ATTRIBUTE' or as in 'ALIGNMENTS'
>
> I would guess not because neither of them have 'subfeatures'.
>
> If you could explain more fully with examples (hand cobbled or borrowed from someone else) of what you expect then I might have a better idea of what options might suit your needs.
>
>
> ~Malcolm
>
>
> .-----Original Message-----
> .From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jim Hu
> .Sent: Friday, January 04, 2013 1:50 PM
> .To: Brian Osborne
> .Cc: Fields, Christopher J; Scott Cain; bioperl-l at bioperl.org
> .Subject: Re: [Bioperl-l] Converting blast+ output to gff (with gaps)
> .
> .Thanks for the replies, but... > . > .I can't tell what input formats for the blast results file are supported. Format 11 and format 6 give no output and no feedback. Putting > .some diagnostic print statements in the code suggests that I'm not getting any result objects from Bio::SearchIO. > . > .The script uses Bio::SearchIO, but does not seem to call the submodules for blast. Documentation links on the wiki seem to be > .broken, at least on this page: > . > . http://www.bioperl.org/wiki/Module:Bio::SearchIO > . > .Jim > . > . > .On Jan 2, 2013, at 4:53 PM, Brian Osborne wrote: > . > .> Scott and Chris, > .> > .> I'll test it and see... > .> > .> Brian O. > .> > .> > .> On Jan 2, 2013, at 5:26 PM, "Fields, Christopher J" wrote: > .> > .>> It should (I recall using it at one point). If it doesn't we should fix it so it does. > .>> > .>> How does MAKER deal with this? IIRC it uses (a modified) SearchIO-based method... > .>> > .>> chris > .>> > .>> On Jan 2, 2013, at 3:32 PM, Scott Cain wrote: > .>> > .>>> Hi Brian, > .>>> > .>>> I was going to suggest the same thing--though that script is fairly > .>>> old, it's not as old as the blast2gff script in the GBrowse > .>>> distribution (which probably should be retired). I believe it > .>>> supports GFF3, though I don't have any sample data with which to test > .>>> it to be sure. I also don't know if it supports BLAST+ input--I > .>>> haven't kept up with SearchIO (on which search2gff.pl depends); will > .>>> it accept it? 
> .>>> > .>>> Scott > .>>> > .>>> > .>>> On Wed, Jan 2, 2013 at 3:26 PM, Brian Osborne wrote: > .>>>> Here's one: > .>>>> > .>>>> https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl > .>>>> > .>>>> Another one: > .>>>> > .>>>> ~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl > .>>>> #!perl > .>>>> > .>>>> # Author: Jason Stajich > .>>>> # Description: Turn SearchIO parseable report(s) into a GFF report > .>>>> # > .>>>> =head1 NAME > .>>>> > .>>>> bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report > .>>>> > .>>>> > .>>>> > .>>>> Brian O. > .>>>> > .>>>> On Jan 2, 2013, at 2:44 PM, Jim Hu wrote: > .>>>> > .>>>>> I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google. > .>>>>> > .>>>>> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and > .create gff with appropriate subfeatures. > .>>>>> > .>>>>> Thanks, > .>>>>> > .>>>>> Jim > .>>>>> ===================================== > .>>>>> Jim Hu > .>>>>> Professor > .>>>>> Dept. of Biochemistry and Biophysics > .>>>>> 2128 TAMU > .>>>>> Texas A&M Univ. > .>>>>> College Station, TX 77843-2128 > .>>>>> 979-862-4054 > .>>>>> > .>>>>> > .>>>>> > .>>>>> _______________________________________________ > .>>>>> Bioperl-l mailing list > .>>>>> Bioperl-l at lists.open-bio.org > .>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > .>>>> > .>>>> > .>>>> _______________________________________________ > .>>>> Bioperl-l mailing list > .>>>> Bioperl-l at lists.open-bio.org > .>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > .>>> > .>>> > .>>> > .>>> -- > .>>> ------------------------------------------------------------------------ > .>>> Scott Cain, Ph. D. 
scott at scottcain dot net
> .>>> GMOD Coordinator (http://gmod.org/) 216-392-3087
> .>>> Ontario Institute for Cancer Research
> .>>> _______________________________________________
> .>>> Bioperl-l mailing list
> .>>> Bioperl-l at lists.open-bio.org
> .>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> .>>
> .>
> .
> .=====================================
> .Jim Hu
> .Professor
> .Dept. of Biochemistry and Biophysics
> .2128 TAMU
> .Texas A&M Univ.
> .College Station, TX 77843-2128
> .979-862-4054
> .
> .
> .
> ._______________________________________________
> .Bioperl-l mailing list
> .Bioperl-l at lists.open-bio.org
> .http://lists.open-bio.org/mailman/listinfo/bioperl-l

=====================================
Jim Hu
Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054

From MEC at stowers.org Fri Jan 4 18:33:04 2013
From: MEC at stowers.org (Cook, Malcolm)
Date: Fri, 4 Jan 2013 17:33:04 -0600
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To: <2D58FAEB-ED4B-4965-A8CB-890D0012D098@tamu.edu>
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> <7F69D24B-8D4C-41F5-B8A1-CB850EBB1C67@tamu.edu> <2C40E43D1F7A56408C4463FD245DDDF907958EE791@EXCHMB-02.stowers-institute.org> <2D58FAEB-ED4B-4965-A8CB-890D0012D098@tamu.edu>
Message-ID: <2C40E43D1F7A56408C4463FD245DDDF907958EE7CB@EXCHMB-02.stowers-institute.org>

Jim,

To get a working example for further discussion....

Assuming you have bioperl and blast+ installed....

This command gets a known transcript from NCBI and blasts it, remotely at NCBI, against the fly genome:

> bp_download_query_genbank.pl --query 'NM_001259364' | blastn -remote -db refseq_genomic -entrez_query 'melanogaster[Taxid]' -outfmt 6 > test.blast.tab

Take a look at the results (see below, or run it yourself).
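
A row of that tabular output can be turned into a GFF3 match line with a few lines of Perl. The following is only a minimal sketch, not taken from any existing BioPerl script. It assumes the default 12-column -outfmt 6 layout (qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore) and a nucleotide search in which a reversed subject range means a minus-strand hit. The sub name tab_row_to_gff3, the source tag 'blastn', the type EST_match, and the use of the E-value as the score are all illustrative choices.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: convert one default 12-column "-outfmt 6" row
# into one GFF3 line. $match_id is whatever ID the caller wants to assign.
sub tab_row_to_gff3 {
    my ($row, $match_id) = @_;
    chomp $row;
    my ($qseqid, $sseqid, $pident, $length, $mismatch, $gapopen,
        $qstart, $qend, $sstart, $send, $evalue, $bitscore) = split /\t/, $row;

    # GFF3 requires start <= end; a reversed subject range is reported
    # here as a minus-strand feature (assumption: blastn-style coordinates).
    my $strand = '+';
    if ($sstart > $send) {
        ($sstart, $send) = ($send, $sstart);
        $strand = '-';
    }

    my $attributes = "ID=$match_id;Name=$qseqid;Target=$qseqid $qstart $qend";
    return join "\t", $sseqid, 'blastn', 'EST_match',
                      $sstart, $send, $evalue, $strand, '.', $attributes;
}

# First row of the example search, rebuilt with tab separators.
my $row = join "\t", qw(NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 2127
                        0 0 1869 3995 9330023 9327897 0.0 3928);
print tab_row_to_gff3($row, 'Match1'), "\n";
```

Rows belonging to the same query and subject could be given the same ID so that a browser groups them as one multi-segment alignment; that grouping logic is left out here.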
First line output looks like this:

NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 2127 0 0 1869 3995 9330023 9327897 0.0 3928

Old GFF did not have parent-child.

Depending on where you are going with this, you MIGHT want to use this strategy: http://gmod.org/wiki/GFF#Alignments

Where the ID in column 9 serves to group the features. There is an 'implicit' parent. I think this would be respected by both GBrowse and UCSC genome browser (but I'm not positive).

If so, converting blast output would yield one line of output per line of input. You would need to assign a strand, swap sstart and send where sstart>send, and do some slight re-formatting.

First line of GFF might look like (sstart and send swapped and the strand set to '-', since this hit runs 9330023 to 9327897 on the subject):

NT_033778.3 est EST_match 9327897 9330023 . - . ID=Match1;Name=NM_001259364;Target=NM_001259364 1869 3995

Is this along the lines of what you need?

~Malcolm

Complete example blast+ output looks like this:

NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 2127 0 0 1869 3995 9330023 9327897 0.0 3928
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 1019 0 0 3995 5013 9327834 9326816 0.0 1882
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 374 0 0 345 718 9332643 9332270 0.0 691
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 300 0 0 1 300 9337978 9337679 1e-154 555
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 270 0 0 717 986 9332201 9331932 5e-138 499
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 243 0 0 1434 1676 9331200 9330958 5e-123 449
NM_001259364 gi|116010442|ref|NT_033778.3| 99.55 223 0 1 1154 1376 9331611 9331390 1e-109 405
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 196 0 0 1675 1870 9330634 9330439 7e-97 363
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 169 0 0 986 1154 9331847 9331679 7e-82 313
NM_001259364 gi|116010442|ref|NT_033778.3| 97.26 73 2 0 1372 1444 9331322 9331250 3e-25 124
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 44 0 0 301 344 9333095 9333052 2e-12 82.4
NM_001259364 gi|195586644|ref|NT_167067.1| 96.52 2127 74 0 1869 3995 7777827 7775701 0.0 3518
NM_001259364 gi|195586644|ref|NT_167067.1| 97.03 404 7 3 4004 4402 7775629 7775226 0.0 675
NM_001259364 gi|195586644|ref|NT_167067.1| 93.58 374 24 0 345 718 7780392 7780019 8e-156 558
NM_001259364 gi|195586644|ref|NT_167067.1| 94.37 302 15 1 1 300 7786262 7785961 6e-127 462
NM_001259364 gi|195586644|ref|NT_167067.1| 96.25 267 10 0 720 986 7779949 7779683 1e-119 438
NM_001259364 gi|195586644|ref|NT_167067.1| 95.00 240 12 0 1434 1673 7778959 7778720 2e-101 377
NM_001259364 gi|195586644|ref|NT_167067.1| 96.86 223 6 1 1154 1376 7779370 7779149 1e-99 372
NM_001259364 gi|195586644|ref|NT_167067.1| 93.88 196 12 0 1675 1870 7778384 7778189 7e-77 296
NM_001259364 gi|195586644|ref|NT_167067.1| 95.19 187 5 4 4827 5013 7774635 7774453 9e-76 292
NM_001259364 gi|195586644|ref|NT_167067.1| 92.86 168 12 0 987 1154 7779598 7779431 3e-61 244
NM_001259364 gi|195586644|ref|NT_167067.1| 92.86 70 5 0 1374 1443 7779079 7779010 2e-18 102
NM_001259364 gi|195586644|ref|NT_167067.1| 100.00 44 0 0 301 344 7780833 7780790 2e-12 82.4
NM_001259364 gi|195489961|ref|NT_167063.1| 92.43 2127 161 0 1869 3995 7628910 7631036 0.0 3037
NM_001259364 gi|195489961|ref|NT_167063.1| 91.94 1030 38 20 3995 5013 7631102 7632097 0.0 1400
NM_001259364 gi|195489961|ref|NT_167063.1| 88.56 376 39 4 345 718 7626215 7626588 4e-124 453
NM_001259364 gi|195489961|ref|NT_167063.1| 93.67 300 18 1 1 300 7620777 7621075 2e-122 448
NM_001259364 gi|195489961|ref|NT_167063.1| 95.41 196 9 0 1675 1870 7628263 7628458 7e-82 313
NM_001259364 gi|195489961|ref|NT_167063.1| 91.93 223 17 1 1154 1376 7627224 7627445 2e-81 311
NM_001259364 gi|195489961|ref|NT_167063.1| 86.42 265 31 3 720 983 7626653 7626913 1e-73 285
NM_001259364 gi|195489961|ref|NT_167063.1| 86.18 246 25 4 1434 1673 7627635 7627877 3e-65 257
NM_001259364 gi|195489961|ref|NT_167063.1| 88.27 162 19 0 993 1154 7626999 7627160 3e-46 195
NM_001259364 gi|195489961|ref|NT_167063.1| 93.06 72 5 0 1372 1443 7627513 7627584 1e-19 106
NM_001259364 gi|195489961|ref|NT_167063.1| 100.00 44 0 0 301 344 7625770 7625813 2e-12 82.4

~Malcolm

.-----Original Message-----
.From: Jim Hu [mailto:jimhu at tamu.edu]
.Sent: Friday, January 04, 2013 3:58 PM
.To: Cook, Malcolm
.Cc: 'Brian Osborne'; 'Fields, Christopher J'; 'Scott Cain'; 'bioperl-l at bioperl.org'
.Subject: Re: [Bioperl-l] Converting blast+ output to gff (with gaps)
.
.Malcolm,
.
.Thanks, I should have reread the GFF3 spec before posting!
.
.In the section on the Gap attribute and below on alignments it discusses two ways to represent an alignment. I was originally thinking
.of something like the later example shown for cDNA vs genome. But the gap attribute representation would be fine too. So, I can see
.how the final output could be done in different ways, but I'm still stuck on how to get there.
.
.I don't have a specific application in mind; I'm mostly just trying to understand how to get from having standalone blast+ output to get
.to things that look like the examples in the gff spec and the gbrowse documentation - really basic display of alignments that are
.gapped. For my teaching, we do EST vs genomic blast and want gapped cDNA alignments to show where the introns go. My other
.work is with bacteria where introns are rare, but there are times when I'd like to show an alignment that is interrupted by a
.transposable element, for example.
.
.Excerpting from blastp -help
.
. *** Formatting options
. -outfmt
. alignment view options:
. 0 = pairwise,
. 1 = query-anchored showing identities,
. 2 = query-anchored no identities,
. 3 = flat query-anchored, show identities,
. 4 = flat query-anchored, no identities,
. 5 = XML Blast output,
. 6 = tabular,
. 7 = tabular with comment lines,
. 8 = Text ASN.1,
. 9 = Binary ASN.1,
. 10 = Comma-separated values,
. 11 = BLAST archive format (ASN.1)
.
.Several of these are "lossy" in terms of where the actual gaps occur (e.g. 6). Others seem to me to be more human readable than
.suited for parsing.
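A minimal sketch of the one-line-per-hit conversion Malcolm outlines in this message (swap sstart and send where sstart > send, assign a strand, reuse one ID to group the lines), written as a bash function around awk. The function name blast6_to_gff3, the grouping of all HSPs for one query/subject pair under a single ID, the trimming of gi|...|ref|ACC| ids down to the accession, and the use of the E-value as the GFF score are assumptions made for illustration; none of this is taken from bp_search2gff.pl or blast2gff.pl. The source and type values (est, EST_match) simply follow Malcolm's example line.

```shell
#!/usr/bin/env bash
# Sketch: convert default BLAST+ -outfmt 6 rows to GFF3 match lines,
# one output line per HSP. Assumes blastn-style hits, where a minus-strand
# match shows up as sstart > send on the subject.
blast6_to_gff3() {
  awk -F'\t' -v OFS='\t' '
    BEGIN { print "##gff-version 3" }
    NF >= 12 {
      # Reduce NCBI-style ids such as gi|116010442|ref|NT_033778.3| to the accession.
      seqid = $2
      n = split($2, part, "[|]")
      if (n >= 4 && part[1] == "gi") seqid = part[4]

      # GFF3 requires start <= end; record the orientation in the strand column.
      lo = $9; hi = $10; strand = "+"
      if ($9 + 0 > $10 + 0) { lo = $10; hi = $9; strand = "-" }

      # Hypothetical grouping rule: one shared ID per query/subject pair.
      key = $1 SUBSEP $2
      if (!(key in id)) id[key] = "Match" (++count)

      attrs = "ID=" id[key] ";Name=" $1 ";Target=" $1 " " $7 " " $8
      print seqid, "est", "EST_match", lo, hi, $11, strand, ".", attrs
    }
  ' "$@"
}
```

It could be run as `blast6_to_gff3 test.blast.tab > test.gff3`. Since default tabular output records only the number of gap openings and not their positions, a conversion like this cannot produce a Gap attribute; that would need one of the richer output formats. Grouping by query/subject pair is also crude, because hits to paralogous regions on the same subject would end up sharing one ID.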
From bosborne11 at verizon.net Sat Jan 5 17:25:37 2013
From: bosborne11 at verizon.net (Brian Osborne)
Date: Sat, 05 Jan 2013 17:25:37 -0500
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To: <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net>
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu>
 <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu>
 <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net>
Message-ID: <12E97C07-D3E0-4E21-8A24-A4223468FF70@verizon.net>

All,

The scripts/utilities/bp_search2gff.pl script works fine with blastn (BLAST+), no surprise there, and since it uses Bio::Tools::GFF it can produce GFF3.

Brian O.

From jason.stajich at gmail.com Mon Jan 7 04:35:23 2013
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 7 Jan 2013 01:35:23 -0800
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To: <12E97C07-D3E0-4E21-8A24-A4223468FF70@verizon.net>
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu>
 <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu>
 <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net>
 <12E97C07-D3E0-4E21-8A24-A4223468FF70@verizon.net>
Message-ID:

With bp_search2gff I included a command-line option so you can decide what is the reference (seqid column) and what is the target (9th column), since in some cases (BLASTX vs TBLASTN) you may want to display one or the other as the reference. This is the -t option. You can also pass the -m option, which will build the grouping match term for a parent; the sub-children will then be HSPs that have that parent, per Malcolm's description of the parent-child info.

Have you tried this with your output? It should work fine with regular vanilla text blast+ output, or with the -outfmt 6 option (tell bp_search2gff the format is blasttable), or you can use -outfmt 5 and tell it you are using blastxml format.

Jason

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From pengyu.ut at gmail.com Thu Jan 3 17:41:37 2013
From: pengyu.ut at gmail.com (Peng Yu)
Date: Thu, 3 Jan 2013 14:41:37 -0800 (PST)
Subject: [Bioperl-l] Error when using Bio::AlignIO on stockholm file from Pfam
Message-ID:

Hi,

I tried the following code on the Pfam stockholm file (downloaded at ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.full.gz). But it gives me the following error. Is it a bug in bioperl or is it an error in the Pfam-A.full.gz file?
~/linux/bin/xplat/src/stk/stk2acc/perl$ cat main.pl
#!/usr/bin/env perl

use strict;
use warnings;

#use FindBin;
#use lib "$FindBin::Bin/.";

use Bio::AlignIO;
use Data::Dumper;

my $in = Bio::AlignIO->new(
  -format => 'stockholm',
  -file => '/dev/stdin'
);

while(my $aln=$in->next_aln) {
  print $aln->accession, "\n";
}

~/linux/bin/xplat/src/stk/stk2acc/perl$ cat main_full.sh
#!/usr/bin/env bash

zcat ~/db/Pfam-A.full.gz | ./main.pl

~/linux/bin/xplat/src/stk/stk2acc/perl$ ./main_full.sh

...

------------- EXCEPTION -------------
MSG: '1985D' is not an integer.

STACK Bio::Range::end /Users/py/Library/ActivePerl-5.14/lib/Bio/Range.pm:227
STACK Bio::Annotation::Target::new /Users/py/Library/ActivePerl-5.14/lib/Bio/Annotation/Target.pm:81
STACK Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target /Users/py/Library/ActivePerl-5.14/lib/Bio/AlignIO/Handler/GenericAlignHandler.pm:293
STACK Bio::AlignIO::Handler::GenericAlignHandler::data_handler /Users/py/Library/ActivePerl-5.14/lib/Bio/AlignIO/Handler/GenericAlignHandler.pm:73
STACK Bio::AlignIO::stockholm::next_aln /Users/py/Library/ActivePerl-5.14/lib/Bio/AlignIO/stockholm.pm:470
STACK toplevel ./main.pl:17
-------------------------------------

From thomas.girke at gmail.com Wed Jan 9 15:59:08 2013
From: thomas.girke at gmail.com (Thomas Girke)
Date: Wed, 9 Jan 2013 12:59:08 -0800 (PST)
Subject: [Bioperl-l] Bioinformatics Position Opening
Message-ID: <9a99afb9-4a67-4d20-89ce-4d416fa7a46b@googlegroups.com>

Dear List,

Below is an announcement for a Ph.D. level bioinformatics position at UCR. It is a long-term position with a competitive salary in a vibrant research environment with cutting edge high-performance compute and genomics facilities. Application instructions are given in the announcement. Potential candidates are welcome to email me their questions about this position directly, e.g. before or after submitting a formal application.
Best,

Thomas

--
Thomas Girke
Associate Professor of Bioinformatics
Institute for Integrative Genome Biology (IIGB)
1207F Genomics Building
University of California
Riverside, CA 92521

E-mail: thomas.girke at ucr.edu
Ph: 951-905-5232
Fax: 951-827-5155

POSITION ANNOUNCEMENT

POSITION
The Institute for Integrative Genome Biology (IIGB) at the University of California, Riverside is seeking a Ph.D. level bioinformatician to manage its bioinformatics research activities and computing facility.

TITLE/RANK
Bioinformatics Facility Director. Salary will be competitive and commensurate with accomplishments.

LOCATION
University of California, Riverside.

BACKGROUND
Successful candidates will join an innovative and multidisciplinary Institute for Integrative Genome Biology (IIGB) that connects theoretical and experimental researchers from different departments in Life, Physical and Mathematical Sciences, Medicine, Engineering and various campus based Centers. The IIGB is organized around a 10,000 sq.ft. suite of Instrumentation Facilities that serve as a centralized, shared-use resource for faculty, staff and students, offering advanced tools in bioinformatics, microscopy, proteomics and genomics. Its bioinformatic component is equipped with an extensive high-performance compute (HPC) infrastructure.

QUALIFICATIONS
Applicants must have a Ph.D. from a recognized university in bioinformatics; or combined degrees in computer science and a biological science; or a degree in computer science combined with relevant experience in biological science; or a degree in a biological science combined with relevant experience in computer science. The successful candidate will have at least 2 years of professional hands-on experience with next generation sequence data analysis, scientific data programming and high-performance computing. A strong publication record of bioinformatics research in collaboration with experimental biologists is expected.
Another requirement is several years of professional experience with common programming languages/environments. This includes at least one statistical programming environment (preferably R), one or more general-purpose scripting languages (e.g. Python, Perl or Ruby), experience with web development frameworks and relational database design. Several years of computational research experience using HPC systems will be beneficial. The incumbent should also have experience with the analysis of modern biological data sets, such as microarrays, next generation sequence data (e.g. genotyping, RNA profiling, de novo assemblies), phylogenetics and/or molecular dynamics simulations.

RESPONSIBILITIES
The Bioinformatics Facility Director manages IIGB's computational infrastructure jointly with its bioinformatics staff, including an HPC/Linux systems administrator, one or more programmers and student workers. The incumbent will be required to provide data analysis support to collaborative research activities, make findings available through presentations, contribute as a team member to scientific publications, and participate in the preparation of joint grant applications and reports. The teaching expectations include the development of a state-of-the-art workshop program on large-scale data analysis and programming. Participation in collaborative equipment grants will be another core responsibility to secure future growth of the facility's computing resources.

TO APPLY
Review of applications will continue until the position is filled. Interested individuals should: (1) submit a curriculum vitae, (2) provide a statement of research interests, and (3) arrange to have three letters of reference sent on their behalf.
All information should be emailed to: thomas.girke at ucr.edu

WEBSITE
http://facility.bioinformatics.ucr.edu/position-opening

From frej.tulin at gmail.com Thu Jan 10 21:53:16 2013
From: frej.tulin at gmail.com (Frej Tulin)
Date: Thu, 10 Jan 2013 21:53:16 -0500
Subject: [Bioperl-l] stand alone blast plus problem
Message-ID:

Hi

I have been trying for a long time to control my blast searches with the Bio::Tools::Run::StandAloneBlastPlus module. The module is installed and gets loaded ok. However, when calling:

my $blast_db = Bio::Tools::Run::StandAloneBlastPlus -> new ( -db_data => \@seq_arr, -create => 1);
$blast_db -> make_db();

----------------------------------------------------------------------
I get this error

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: /home/frej/ncbi-blast-2.2.27+/bin/makeblastdb call crashed: There was a problem running /home/frej/ncbi-blast-2.2.27+/bin/makeblastdb : Error: NCBI C++ Exception:
"/am/ncbiapdata/release/blast/src/2.2.27/Linux64-Centos-icc/c++/ICC1010-ReleaseMT64--Linux64-Centos-icc/../src/objects/seq/../seqloc/Seq_id.cpp", line 1689: Error: ncbi::objects::CSeq_id::x_Init() - Unsupported ID type ATCG00500.1

STACK: Error::throw
STACK: Bio::Root::Root::throw /home/frej/perl5/lib/perl5/Bio/Root/Root.pm:472
STACK: Bio::Tools::Run::WrapperBase::_run /home/frej/perl5/lib/perl5/Bio/Tools/Run/WrapperBase/CommandExts.pm:1012
STACK: Bio::Tools::Run::StandAloneBlastPlus::make_db /home/frej/perl5/lib/perl5/Bio/Tools/Run/StandAloneBlastPlus.pm:776
STACK: comp2fasta.pl:57
-------------------------------------------------------------------------------------------------------------------------------

The general header formats of the fasta files I've tried are

>ATCG00500.1|PACid:19637947

or

>g8540.t1|PACid:26887931

I always get the same message. I have tried modifying the header by putting 'lcl|' in front but nothing seems to help.
When I build a database using makeblastdb directly from the command line it works ok, it just doesn't work from within bioperl. How can this be? What am I doing wrong?

Thankful for any assistance

best
/frej

--
Frej Tulin
Fred Cross lab
Rockefeller University

From cjfields at illinois.edu Fri Jan 11 00:04:12 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 11 Jan 2013 05:04:12 +0000
Subject: [Bioperl-l] Error when using Bio::AlignIO on stockholm file from Pfam
In-Reply-To:
References:
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF4CF6A00B@CHIMBX5.ad.uillinois.edu>

I haven't used the '/dev/stdin' file like this, I typically do:

> my $in = Bio::AlignIO->new(
>   -format => 'stockholm',
>   -fh => \*STDIN
> );

but if it works...

That does look like a bug in the parser. Would you know the Pfam alignment accessions that triggered it, or maybe ones just prior to the fail?

chris

From Russell.Smithies at agresearch.co.nz Sun Jan 13 17:25:16 2013
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 14 Jan 2013 11:25:16 +1300
Subject: [Bioperl-l] stand alone blast plus problem
In-Reply-To:
References:
Message-ID: <18DF7D20DFEC044098A1062202F5FFF372E99EE4C1@exchsth.agresearch.co.nz>

Hi Frej,

Here's a few things to try:

Is it possible you need to escape the special chars in your IDs? i.e. the pipes and colons?
Have you tried it with simpler IDs?
Is your BioPerl sequence object collection created correctly?
Can you create the database by just passing it the fasta file?

$fac = Bio::Tools::Run::StandAloneBlastPlus->new(
   -db_name => 'mydb',
   -db_data => 'myseqs.fas',
   -create => 1
);

--Russell

From awitney at sgul.ac.uk Thu Jan 17 06:45:40 2013
From: awitney at sgul.ac.uk (Adam Witney)
Date: Thu, 17 Jan 2013 11:45:40 +0000
Subject: [Bioperl-l] drawing genome comparison cross linking tracks?
Message-ID: <50F7E464.3090906@sgul.ac.uk>

Hi,

I would like to produce graphics as shown in this section (cross linking between genome tracks) [scroll down a little to see the pictures]:

http://www.biopython.org/DIST/docs/tutorial/Tutorial.html#htoc211

but using perl. Can Bio::Graphics do this? I can only find Gbrowse_syn, but this works within a web browser; I would like to generate these diagrams from a stand-alone script. If not, does anyone know any perl libraries that can be used to show genome comparisons?

Thanks

Adam

From cjfields at illinois.edu Thu Jan 17 08:37:04 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 17 Jan 2013 13:37:04 +0000
Subject: [Bioperl-l] drawing genome comparison cross linking tracks?
In-Reply-To: <50F7E464.3090906@sgul.ac.uk> References: <50F7E464.3090906@sgul.ac.uk> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF69BFAE17@CITESMBX5.ad.uillinois.edu> Gbrowse_syn, SynBrowse, and others can do this, but there isn't a pre-packaged way of drawing this that I know of analogous to Biopython's. It would be nice to have, though. chris On Jan 17, 2013, at 5:45 AM, Adam Witney wrote: > Hi, > > I would like to produces graphics as shown in this section (cross linking between genome tracks) [scroll down a little to see the pictures]: > > http://www.biopython.org/DIST/docs/tutorial/Tutorial.html#htoc211 > > but using perl. Can Bio::Graphics do this? I can only find Gbrowse_syn, but this works within a web browser, I would like to generate these diagrams from a stand alone script. If not does anyone know any perl libraries that can be used to show genome comparisons? > > Thanks > > Adam > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From awitney at sgul.ac.uk Thu Jan 17 12:33:42 2013 From: awitney at sgul.ac.uk (Adam Witney) Date: Thu, 17 Jan 2013 17:33:42 +0000 Subject: [Bioperl-l] drawing genome comparison cross linking tracks? In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF69BFAE17@CITESMBX5.ad.uillinois.edu> References: <50F7E464.3090906@sgul.ac.uk> <118F034CF4C3EF48A96F86CE585B94BF69BFAE17@CITESMBX5.ad.uillinois.edu> Message-ID: <50F835F6.5090603@sgul.ac.uk> Thanks Chris. I hadn't seen SynBrowse before, looks good. But I would like to do this without all the database/webserver overhead, I have a perl script that can interact with a gbrowse_syn database to generate images ie no webserver, I just need to separate out the database stuff now. 
Thanks again Adam On 17/01/2013 13:37, Fields, Christopher J wrote: > Gbrowse_syn, SynBrowse, and others can do this, but there isn't a pre-packaged way of drawing this that I know of analogous to Biopython's. It would be nice to have, though. > > chris > > On Jan 17, 2013, at 5:45 AM, Adam Witney wrote: > >> Hi, >> >> I would like to produces graphics as shown in this section (cross linking between genome tracks) [scroll down a little to see the pictures]: >> >> http://www.biopython.org/DIST/docs/tutorial/Tutorial.html#htoc211 >> >> but using perl. Can Bio::Graphics do this? I can only find Gbrowse_syn, but this works within a web browser, I would like to generate these diagrams from a stand alone script. If not does anyone know any perl libraries that can be used to show genome comparisons? >> >> Thanks >> >> Adam >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu Jan 17 12:41:47 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 17 Jan 2013 17:41:47 +0000 Subject: [Bioperl-l] drawing genome comparison cross linking tracks? In-Reply-To: <50F835F6.5090603@sgul.ac.uk> References: <50F7E464.3090906@sgul.ac.uk> <118F034CF4C3EF48A96F86CE585B94BF69BFAE17@CITESMBX5.ad.uillinois.edu> <50F835F6.5090603@sgul.ac.uk> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF69BFD4BA@CITESMBX5.ad.uillinois.edu> Let us know if this can be of use to the community, I'm sure others would find it useful as well. chris On Jan 17, 2013, at 11:33 AM, Adam Witney wrote: > > Thanks Chris. I hadn't seen SynBrowse before, looks good. 
> > But I would like to do this without all the database/webserver overhead, I have a perl script that can interact with a gbrowse_syn database to generate images ie no webserver, I just need to separate out the database stuff now. > > Thanks again > > Adam > > On 17/01/2013 13:37, Fields, Christopher J wrote: >> Gbrowse_syn, SynBrowse, and others can do this, but there isn't a pre-packaged way of drawing this that I know of analogous to Biopython's. It would be nice to have, though. >> >> chris >> >> On Jan 17, 2013, at 5:45 AM, Adam Witney wrote: >> >>> Hi, >>> >>> I would like to produces graphics as shown in this section (cross linking between genome tracks) [scroll down a little to see the pictures]: >>> >>> http://www.biopython.org/DIST/docs/tutorial/Tutorial.html#htoc211 >>> >>> but using perl. Can Bio::Graphics do this? I can only find Gbrowse_syn, but this works within a web browser, I would like to generate these diagrams from a stand alone script. If not does anyone know any perl libraries that can be used to show genome comparisons? >>> >>> Thanks >>> >>> Adam >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From buschj at hhu.de Sat Jan 19 11:12:54 2013 From: buschj at hhu.de (jobu) Date: Sat, 19 Jan 2013 17:12:54 +0100 Subject: [Bioperl-l] Parsing Blast-Report extracting "Features flanking .." 
Message-ID: <50FAC606.2060800@hhu.de>

Dear List,

I've got quite a lot of hits from blasting against nt-database that state:

"Features flanking this part of subject sequence: "

where the subject-accession only will give information about complete genomic sequences, the detailed information inside the "Features flanking" section of the HSP about annotations of upstream and downstream elements would be what I need.

Is there any elegant way to make Bio::SearchIO do this job for me?

greetings
Jochen

From saladi1 at illinois.edu Tue Jan 22 16:45:23 2013
From: saladi1 at illinois.edu (Shyam Saladi)
Date: Tue, 22 Jan 2013 13:45:23 -0800
Subject: [Bioperl-l] Bio::DB::Taxonomy example
Message-ID:

Hi,

There is an example on how to use Bio::DB::Taxonomy at
http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy

I tried the script but could not get it to work properly. After looking
through the documentation it seems that

    my @extant_children = grep { $_->rank eq 'species' }
        $node->get_all_Descendents;

should instead be

    my @extant_children = grep { $_->rank eq 'species' }
        $db->get_all_Descendents($node)

I found this solves the problem for me. If my fix is in fact correct, an
admin may want to change the example script online.

Thanks,
Shyam
**

From cjfields at illinois.edu Tue Jan 22 18:40:05 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 22 Jan 2013 23:40:05 +0000
Subject: [Bioperl-l] Bio::DB::Taxonomy example
In-Reply-To:
References:
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF6CDFE139@CITESMBX5.ad.uillinois.edu>

Done. Thanks for pointing that out!

chris

On Jan 23, 2013, at 5:45 AM, Shyam Saladi wrote:

> Hi,
>
> There is an example on how to use Bio::DB::Taxonomy at
> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
>
> I tried the script but could not get it to work properly. After looking
> through the documentation it seems that
>
> my @extant_children = grep { $_->rank eq 'species' }
> $node->get_all_Descendents;
>
> should instead be
>
> my @extant_children = grep { $_->rank eq 'species' }
> $db->get_all_Descendents($node)
>
> I found this solves the problem for me. If my fix is in fact correct, an
> admin may want to change the example script online.
>
> Thanks,
> Shyam
> **
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From buschj at hhu.de Fri Jan 25 05:20:22 2013
From: buschj at hhu.de (jobu)
Date: Fri, 25 Jan 2013 11:20:22 +0100
Subject: [Bioperl-l] Parsing Blast-Report extracting "Features flanking .."
In-Reply-To:
References: <50FAC606.2060800@hhu.de>
Message-ID: <51025C66.6060008@hhu.de>

On 22.01.2013 19:03, Mgavi Brathwaite wrote:
> What upstream and downstream elements are you interested in?

I've got a huge pile of short RNA reads. Part of the question now is whether those RNA fragments originate from siRNA events, or may represent miRNAs / parts of pre-miRNAs.

So I did an online blast search against database nt. The resulting report quite often just gives subject information like this:

-----
> gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence
Length=23459830
-----

Now I would like to get the hit's neighbouring regions for further analysis. Preferably I would like to do that in an automated way, but the only possible action with this kind of subject gi | description would be to fetch the entire chromosomal sequence I guess ?
However, right below the line above, the report states more precisely:

------
Features flanking this part of subject sequence:
8872 bp at 5' side: cytochrome P450 90B1
402 bp at 3' side: U1 small nuclear ribonucleoprotein-70K
------

Still I would like to have the possibility to automatically fetch the subject's sequence(s). As of now I think parsing the report with SearchIO won't let me acquire that information, because SearchIO does not recognize report sections like those.

I hope I did not miss any of SearchIO's capabilities, but I could not find any method covering my wish?!

Right now maybe the only way to get the information I want is to construct my own parser and write it out into a separate file, which in turn again I could read into a hash before processing the Blast-Report with SearchIO to combine both data for further automated work.

I am aware though that even successfully getting the flanking features would leave me with the more or less wide intergenic gap my hsp is located in.

However I'm in need of a way to get the flanking features including their annotation and the region spanning between them. But I hope I do not have to get complete sequences to accomplish that, as this would be kind of an overkill.

with kind regards
Jochen

From afonsomduarte at gmail.com Mon Jan 14 05:09:48 2013
From: afonsomduarte at gmail.com (Afonso Duarte)
Date: Mon, 14 Jan 2013 02:09:48 -0800 (PST)
Subject: [Bioperl-l] Parsing Blast results for different genus
Message-ID: <217cf8ac-5323-48f5-8a81-aba032cba68b@googlegroups.com>

Dear All,

I have obtained several .gb BLAST results that I want to split into different .gb or .fasta files according to genus. I searched the web for different ways to do it, but didn't find a way to do it when one has something like 50 genera in the blast. (To do it for a small number of genera I could use splitgb.pl from http://www.bioperl.org/wiki/HOWTO:SeqIO but it will not be feasible for a large number of genera.)

Is there a script or way to do it in an efficient way ?

Best

PS sorry if this message appears twice ... as I was having some trouble posting the first time.

From afonsomduarte at gmail.com Sun Jan 13 14:17:19 2013
From: afonsomduarte at gmail.com (Afonso Duarte)
Date: Sun, 13 Jan 2013 11:17:19 -0800 (PST)
Subject: [Bioperl-l] Split blast result by genus. How to parse it ?
Message-ID: <83857a34-14a0-43e2-a2b3-5ca538d41bcd@googlegroups.com>

Dear All,

I have obtained several .gb BLAST results that I want to split into different .gb or .fasta files according to genus. I searched the web for different ways to do it, but didn't find a way to do it when one has something like 50 genera in the blast. (To do it for a small number of genera I could use splitgb.pl from http://www.bioperl.org/wiki/HOWTO:SeqIO but it will not be feasible for a large number of genera.)

Is there a script or way to do it in an efficient way ?

Best

From afonsomduarte at gmail.com Mon Jan 21 04:03:44 2013
From: afonsomduarte at gmail.com (Afonso Duarte)
Date: Mon, 21 Jan 2013 01:03:44 -0800 (PST)
Subject: [Bioperl-l] spliting genbank file into smaller ones via organism
Message-ID:

Dear List,

I have obtained several .gb BLAST results that I want to split into different .gb or .fasta files according to genus. I searched the web for different ways to do it, but didn't find a way to do it when one has something like 50 genera in the blast. (To do it for a small number of genera I could use splitgb.pl from http://www.bioperl.org/wiki/HOWTO:SeqIO but it will not be feasible for a large number of genera.)

Is there a script or way to do it in an efficient way ?

Best

From jason.stajich at gmail.com Tue Jan 29 14:00:16 2013
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 29 Jan 2013 11:00:16 -0800
Subject: [Bioperl-l] Parsing Blast-Report extracting "Features flanking .."
In-Reply-To: <51025C66.6060008@hhu.de>
References: <50FAC606.2060800@hhu.de> <51025C66.6060008@hhu.de>
Message-ID: <6E83E3F3-C304-4DC4-9A11-FE1CA90F207D@gmail.com>

We don't parse the NCBI feature info from the BLAST reports per your query. To look up a specific feature you can use Bio::DB::GenBank to query for sequence from a specific feature by accession number - see the HOWTOs for that.

However, most people use tools that generate SAM/BAM files with short reads - then you can use a tool like bedtools to find overlaps of reads with the locations of features.

basically:
- download the genome and GFF for arabidopsis
- align your sRNA to the genome with a short read aligner - bowtie, bwa, others
- convert your sam to bam file with SAMtools or picard
- compare the location of features with the reads to get expression summaries or individual reads with BEDTools

On Jan 25, 2013, at 2:20 AM, jobu wrote:

> On 22.01.2013 19:03, Mgavi Brathwaite wrote:
>> What upstream and downstream elements are you interested in?
>
>
> I've got a huge pile of short RNA reads.
> Part of the question now is whether those RNA fragments originate from
> siRNA events,
> or may represent miRNAs / parts of pre-miRNAs.
>
> So I did an online blast search against database nt.
> The resulting report quite often just gives subject information like this:
>
> -----
>> gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence
> Length=23459830
> -----
>
> Now I would like to get the hit's neighbouring regions for further
> analysis.
> Preferably I would like to do that in an automized way, but the only
> possible action with this kind of subject gi | description would be to
> fetch the entire chromosomal sequence I guess ?
> > However, > right below the line above, the report states more precisely: > > ------ > Features flanking this part of subject sequence: > 8872 bp at 5' side: cytochrome P450 90B1 > 402 bp at 3' side: U1 small nuclear ribonucleoprotein-70K > ------ > > Still I would like to have the possibility to automatically fetch the > subject's sequence(s), > as of now I think parsing the report with SearchIO won't let me aquire > that information, because SearchIO does not recognize report sections > like those. > > I hope I did not miss any of SearchIOs capabilities, but I could not > find any method covering my wish?! > > Right now maybe the only way to get the information I want is to > construct my own parser and write it out into a separate file, which in > turn again I could read into a hash before processing the Blast-Report > with SearchIO to combine both data for further automized work. > > I am aware though that even successfully getting the flanking features > would leave me with the more or less wide intergenic gap my hsp is > located in. > > However I'm in need of a way to get the flanking features including > their annotation and the region spanning between them. > But I hope I do not have to get complete sequences to accomplish that, > as this would be kind of an overkill. > > with kind regards > Jochen > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From dr_kilburn59 at yahoo.com Wed Jan 30 16:40:26 2013 From: dr_kilburn59 at yahoo.com (Dan kilburn) Date: Wed, 30 Jan 2013 16:40:26 -0500 Subject: [Bioperl-l] Bioperl-l Digest, Vol 117, Issue 13 In-Reply-To: References: Message-ID: <575B184F-C051-4FEF-9BEB-7AB98E3C52A6@yahoo.com> Hi Jason, Are there any plans to keep SearchIO up to date with ncbi blast? 
I know they change formats ridiculously often, but I had to write my own parser to get sequence identity, which I would rather not have done. I realize that this job would be a big load on anyone who takes it, but it's so fundamental. Maybe I can help. --Dan Sent from my iPhone On Jan 30, 2013, at 12:00 PM, bioperl-l-request at lists.open-bio.org wrote: > Send Bioperl-l mailing list submissions to > bioperl-l at lists.open-bio.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://lists.open-bio.org/mailman/listinfo/bioperl-l > or, via email, send a message with subject or body 'help' to > bioperl-l-request at lists.open-bio.org > > You can reach the person managing the list at > bioperl-l-owner at lists.open-bio.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Bioperl-l digest..." > > > Today's Topics: > > 1. Re: Parsing Blast-Report extracting "Features flanking .." > (Jason Stajich) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Tue, 29 Jan 2013 11:00:16 -0800 > From: Jason Stajich > Subject: Re: [Bioperl-l] Parsing Blast-Report extracting "Features > flanking .." > To: buschj at hhu.de > Cc: bioperl-l at lists.open-bio.org > Message-ID: <6E83E3F3-C304-4DC4-9A11-FE1CA90F207D at gmail.com> > Content-Type: text/plain; charset=us-ascii > > We don't parse the NCBI feature info from the BLAST reports per your query. To look up a specific feature you can use Bio::DB::GenBank to query for sequence from a specific feature by accession number - see the HOWTOs for that. > > However, most people use tools that generate SAM/BAM files with short reads - then you can use a tool like bedtools to find overlaps of reads with the locations of features. 
> > basically: > - download the genome and GFF for arabidopsis > - align your sRNA to the genome with a short read aligner - bowtie, bwa, others > - convert your sam to bam file with SAMtools or picard > - compare the location of features with the reads to get expression summaries or individuals reads with BEDTools > > > On Jan 25, 2013, at 2:20 AM, jobu wrote: > >> Am 22.01.2013 19:03, schrieb Mgavi Brathwaite: >>> What upstream and downstream elements are you interested in? >> >> >> I've got a huge pile of short RNA reads. >> Part of the question now is whether those RNA fragments originate from >> siRNA events, >> or may represent miRNAs / parts of pre-miRNAs. >> >> So I did an online blast search against database nt. >> The resulting report quite often just gives subject information like this: >> >> ----- >>> gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence >> Length=23459830 >> ----- >> >> Now I would like to get the hit's neighbouring regions for further >> analysis. >> Preferably I would like to do that in an automized way, but the only >> possible action with this kind of subject gi | description would be to >> fetch the entire chromosomal sequence I guess ? >> >> However, >> right below the line above, the report states more precisely: >> >> ------ >> Features flanking this part of subject sequence: >> 8872 bp at 5' side: cytochrome P450 90B1 >> 402 bp at 3' side: U1 small nuclear ribonucleoprotein-70K >> ------ >> >> Still I would like to have the possibility to automatically fetch the >> subject's sequence(s), >> as of now I think parsing the report with SearchIO won't let me aquire >> that information, because SearchIO does not recognize report sections >> like those. >> >> I hope I did not miss any of SearchIOs capabilities, but I could not >> find any method covering my wish?! 
>> >> Right now maybe the only way to get the information I want is to >> construct my own parser and write it out into a separate file, which in >> turn again I could read into a hash before processing the Blast-Report >> with SearchIO to combine both data for further automized work. >> >> I am aware though that even successfully getting the flanking features >> would leave me with the more or less wide intergenic gap my hsp is >> located in. >> >> However I'm in need of a way to get the flanking features including >> their annotation and the region spanning between them. >> But I hope I do not have to get complete sequences to accomplish that, >> as this would be kind of an overkill. >> >> with kind regards >> Jochen >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > > > > ------------------------------ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > End of Bioperl-l Digest, Vol 117, Issue 13 > ****************************************** From cjfields at illinois.edu Wed Jan 30 18:13:49 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 30 Jan 2013 23:13:49 +0000 Subject: [Bioperl-l] Bioperl-l Digest, Vol 117, Issue 13 In-Reply-To: <575B184F-C051-4FEF-9BEB-7AB98E3C52A6@yahoo.com> References: <575B184F-C051-4FEF-9BEB-7AB98E3C52A6@yahoo.com> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF6CE1175C@CHIMBX5.ad.uillinois.edu> We certainly accept support for updates to the code. In fact, if you are familiar with git/github the process is fairly straightforward: 1) Fork the code to your github account 2) Make and commit changes 3) Submit a pull request 4) Post something to the list just in case. 
We also accept code patches; the best way to submit these is as a bug report to the redmine repository (doesn't hurt to post here as well): https://redmine.open-bio.org/ chris On Jan 30, 2013, at 3:40 PM, Dan kilburn wrote: > Hi Jason, > > Are there any plans to keep SearchIO up to date with ncbi blast? I know they change formats ridiculously often, but I had to write my own parser to get sequence identity, which I would rather not have done. I realize that this job would be a big load on anyone who takes it, but it's so fundamental. Maybe I can help. > > --Dan > Sent from my iPhone > > On Jan 30, 2013, at 12:00 PM, bioperl-l-request at lists.open-bio.org wrote: > >> Send Bioperl-l mailing list submissions to >> bioperl-l at lists.open-bio.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> or, via email, send a message with subject or body 'help' to >> bioperl-l-request at lists.open-bio.org >> >> You can reach the person managing the list at >> bioperl-l-owner at lists.open-bio.org >> >> When replying, please edit your Subject line so it is more specific >> than "Re: Contents of Bioperl-l digest..." >> >> >> Today's Topics: >> >> 1. Re: Parsing Blast-Report extracting "Features flanking .." >> (Jason Stajich) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Tue, 29 Jan 2013 11:00:16 -0800 >> From: Jason Stajich >> Subject: Re: [Bioperl-l] Parsing Blast-Report extracting "Features >> flanking .." >> To: buschj at hhu.de >> Cc: bioperl-l at lists.open-bio.org >> Message-ID: <6E83E3F3-C304-4DC4-9A11-FE1CA90F207D at gmail.com> >> Content-Type: text/plain; charset=us-ascii >> >> We don't parse the NCBI feature info from the BLAST reports per your query. To look up a specific feature you can use Bio::DB::GenBank to query for sequence from a specific feature by accession number - see the HOWTOs for that. 
>> >> However, most people use tools that generate SAM/BAM files with short reads - then you can use a tool like bedtools to find overlaps of reads with the locations of features. >> >> basically: >> - download the genome and GFF for arabidopsis >> - align your sRNA to the genome with a short read aligner - bowtie, bwa, others >> - convert your sam to bam file with SAMtools or picard >> - compare the location of features with the reads to get expression summaries or individuals reads with BEDTools >> >> >> On Jan 25, 2013, at 2:20 AM, jobu wrote: >> >>> Am 22.01.2013 19:03, schrieb Mgavi Brathwaite: >>>> What upstream and downstream elements are you interested in? >>> >>> >>> I've got a huge pile of short RNA reads. >>> Part of the question now is whether those RNA fragments originate from >>> siRNA events, >>> or may represent miRNAs / parts of pre-miRNAs. >>> >>> So I did an online blast search against database nt. >>> The resulting report quite often just gives subject information like this: >>> >>> ----- >>>> gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence >>> Length=23459830 >>> ----- >>> >>> Now I would like to get the hit's neighbouring regions for further >>> analysis. >>> Preferably I would like to do that in an automized way, but the only >>> possible action with this kind of subject gi | description would be to >>> fetch the entire chromosomal sequence I guess ? >>> >>> However, >>> right below the line above, the report states more precisely: >>> >>> ------ >>> Features flanking this part of subject sequence: >>> 8872 bp at 5' side: cytochrome P450 90B1 >>> 402 bp at 3' side: U1 small nuclear ribonucleoprotein-70K >>> ------ >>> >>> Still I would like to have the possibility to automatically fetch the >>> subject's sequence(s), >>> as of now I think parsing the report with SearchIO won't let me aquire >>> that information, because SearchIO does not recognize report sections >>> like those. 
>>> >>> I hope I did not miss any of SearchIOs capabilities, but I could not >>> find any method covering my wish?! >>> >>> Right now maybe the only way to get the information I want is to >>> construct my own parser and write it out into a separate file, which in >>> turn again I could read into a hash before processing the Blast-Report >>> with SearchIO to combine both data for further automized work. >>> >>> I am aware though that even successfully getting the flanking features >>> would leave me with the more or less wide intergenic gap my hsp is >>> located in. >>> >>> However I'm in need of a way to get the flanking features including >>> their annotation and the region spanning between them. >>> But I hope I do not have to get complete sequences to accomplish that, >>> as this would be kind of an overkill. >>> >>> with kind regards >>> Jochen >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> >> >> >> >> ------------------------------ >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> End of Bioperl-l Digest, Vol 117, Issue 13 >> ****************************************** > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From thiago.venancio at gmail.com Thu Jan 31 12:10:39 2013 From: thiago.venancio at gmail.com (Thiago Venancio) Date: Thu, 31 Jan 2013 09:10:39 -0800 (PST) Subject: [Bioperl-l] bioperl-ext Message-ID: <5523d5a2-b103-4cf3-8c84-bd33ef675a6d@googlegroups.com> Hi all, I am trying to install bioperl-ext and the compilation gives me the error: /usr/bin/ld: libs/libsw.a(aln.o): relocation R_X86_64_32 
against `.rodata' can not be used when making a shared object; recompile with -fPIC
libs/libsw.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
make[1]: *** [blib/arch/auto/Bio/Ext/Align/Align.so] Error 1
make[1]: Leaving directory `/home/thiago/temp/bioperl-ext-master/Bio/Ext/Align'
make: *** [subdirs] Error 2

However, the Makefile says that -fPIC is already being used, which is puzzling me.

Any clues ?

Thanks.
Thiago

From thiago.venancio at gmail.com Thu Jan 31 12:25:12 2013
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Thu, 31 Jan 2013 09:25:12 -0800 (PST)
Subject: [Bioperl-l] bioperl-ext
In-Reply-To: <5523d5a2-b103-4cf3-8c84-bd33ef675a6d@googlegroups.com>
References: <5523d5a2-b103-4cf3-8c84-bd33ef675a6d@googlegroups.com>
Message-ID:

Solved using the solution posted by jmf here :

https://groups.google.com/forum/?fromgroups=#!searchin/bioperl-l/bioperl-ext/bioperl-l/FgJo_g8UQE8/FB1DEjJkTooJ

Which I found after posting.

T

On Thursday, January 31, 2013 3:10:39 PM UTC-2, Thiago Venancio wrote:
>
> Hi all,
>
> I am trying to install bioperl-ext and the compilation gives me the error:
>
> /usr/bin/ld: libs/libsw.a(aln.o): relocation R_X86_64_32 against `.rodata'
> can not be used when making a shared object; recompile with -fPIC
> libs/libsw.a: could not read symbols: Bad value
> collect2: ld returned 1 exit status
> make[1]: *** [blib/arch/auto/Bio/Ext/Align/Align.so] Error 1
> make[1]: Leaving directory
> `/home/thiago/temp/bioperl-ext-master/Bio/Ext/Align'
> make: *** [subdirs] Error 2
>
> However, the Makefile says that -fPIC is already being used, which is
> puzzling me.
>
> Any clues ?
>
> Thanks.
> Thiago
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From koriege at googlemail.com Thu Jan 31 21:49:20 2013
From: koriege at googlemail.com (koriege at googlemail.com)
Date: Thu, 31 Jan 2013 18:49:20 -0800 (PST)
Subject: [Bioperl-l] problem with Bio::*::Fasta id_parser
Message-ID:

Hi,

I tried two methods to create a BioPerl FASTA database, but both fail when extracting the substring out of my headers. Can someone explain to me why I get the standard header, or show me a workaround?

thanks in advance.
pyr0

i)

    my $objDB = Bio::Index::Fasta->new(-filename => $PATHdbIdx, -write_flag => 1);
    $objDB->id_parser(\&get_id);
    $objDB->make_index(glob($objParameter->dbGenome()));

    sub get_id {
        my $header = shift;
        $header =~ /^>.*\bsp\|([A-Z]\d{5}\b)/;
        $1;
    }

output

    Use of uninitialized value $id in concatenation (.) or string at /usr/share/perl5/Bio/Index/Abstract.pm line 753, <$FASTA> line 1.
    Use of uninitialized value $id in exists at /usr/share/perl5/Bio/Index/Abstract.pm line 754, <$FASTA> line 1.
    Use of uninitialized value $id in hash element at /usr/share/perl5/Bio/Index/Abstract.pm line 757, <$FASTA> line 1.
    gi|376282008|ref|NC_016798.1|

ii)

    my $PATHdbIdx=catfile($objParameter->DIR,'data','db.idx');
    unlink($PATHdbIdx);
    my $objDB = Bio::DB::Fasta->new($objParameter->dbGenome(), -makeid => \&get_id);
    $objDBgenome->set(\$objDB);

output:

    Use of uninitialized value $key in pattern match (m//) at /usr/share/perl5/Bio/DB/Fasta.pm line 1178.
    Use of uninitialized value $id in exists at /usr/share/perl5/Bio/DB/Fasta.pm line 617.
    gi|376282008|ref|NC_016798.1|

From yuf228 at hotmail.com Thu Jan 31 23:15:15 2013
From: yuf228 at hotmail.com (Rob)
Date: Fri, 1 Feb 2013 04:15:15 +0000 (UTC)
Subject: [Bioperl-l] Where to get BLASTCLUST or equivalent?
References: <200305311150.h4VBopn2019091@localhost.localdomain>
Message-ID:

Cyril C.C.
Chua bmb.leeds.ac.uk> writes: > > Hi, > > I have some difficulty in sourcing for BLASTCLUST or related > programs/mods. Does any1 know exactly how to locate them? > > Regards > > Cyril Chua > Hi Cyril, I heard of the following programmes that might do similar things (I HAVEN'T used any of them yet): Afree - http://www.vicbioinformatics.com/software.afree.shtml Uclust - http://drive5.com/uclust/uclust_userguide_2_1.pdf Usearch - http://www.drive5.com/usearch/ DomClust - http://mbgd.genome.ad.jp/domclust/ or Check this: http://ppod.princeton.edu/help/help_tech.html God bless, Robert From jimhu at tamu.edu Wed Jan 2 14:44:56 2013 From: jimhu at tamu.edu (Jim Hu) Date: Wed, 2 Jan 2013 13:44:56 -0600 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) Message-ID: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google. I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and create gff with appropriate subfeatures. Thanks, Jim ===================================== Jim Hu Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From biojiangke at gmail.com Wed Jan 2 15:20:02 2013 From: biojiangke at gmail.com (Ke Jiang) Date: Wed, 2 Jan 2013 15:20:02 -0500 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) In-Reply-To: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> Message-ID: There is a blast2gff.pl floating around for Gbrowse to visualize blast results as tracks. Not sure about the blast+. One more reason to stick with the good old blast! https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl Ke ------------------------------------------- Ke Jiang, Ph.D. 
Post-doctoral Fellow Delbruck Laboratory Cold Spring Harbor Laboratory Cold Spring Harbor, NY 11724 From cjfields at illinois.edu Wed Jan 2 15:47:52 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 2 Jan 2013 20:47:52 +0000 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) In-Reply-To: References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF4CF5FD44@CHIMBX5.ad.uillinois.edu> ...except that NCBI no longer supports legacy BLAST. Caveat emptor. chris On Jan 2, 2013, at 2:20 PM, Ke Jiang wrote: > There is a blast2gff.pl floating around for Gbrowse to visualize blast > results as tracks. Not sure about the blast+. One more reason to stick with > the good old blast! > > https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl > > Ke > > ------------------------------------------- > Ke Jiang, Ph.D. > Post-doctoral Fellow > Delbruck Laboratory > Cold Spring Harbor Laboratory > Cold Spring Harbor, NY 11724 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Wed Jan 2 15:26:22 2013 From: bosborne11 at verizon.net (Brian Osborne) Date: Wed, 02 Jan 2013 15:26:22 -0500 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) In-Reply-To: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> Message-ID: Here's one: https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl Another one: ~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl #!perl # Author: Jason Stajich # Description: Turn SearchIO parseable report(s) into a GFF report # =head1 NAME bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report Brian O. On Jan 2, 2013, at 2:44 PM, Jim Hu wrote: > I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google.
> > I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and create gff with appropriate subfeatures. > > Thanks, > > Jim > ===================================== > Jim Hu > Professor > Dept. of Biochemistry and Biophysics > 2128 TAMU > Texas A&M Univ. > College Station, TX 77843-2128 > 979-862-4054 > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Wed Jan 2 16:32:16 2013 From: scott at scottcain.net (Scott Cain) Date: Wed, 2 Jan 2013 16:32:16 -0500 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) In-Reply-To: References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> Message-ID: Hi Brian, I was going to suggest the same thing--though that script is fairly old, it's not as old as the blast2gff script in the GBrowse distribution (which probably should be retired). I believe it supports GFF3, though I don't have any sample data with which to test it to be sure. I also don't know if it supports BLAST+ input--I haven't kept up with SearchIO (on which search2gff.pl depends); will it accept it? Scott On Wed, Jan 2, 2013 at 3:26 PM, Brian Osborne wrote: > Here's one: > > https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl > > Another one: > > ~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl > #!perl > > # Author: Jason Stajich > # Description: Turn SearchIO parseable report(s) into a GFF report > # > =head1 NAME > > bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report > > > > Brian O. > > On Jan 2, 2013, at 2:44 PM, Jim Hu wrote: > >> I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google. 
>> >> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and create gff with appropriate subfeatures. >> >> Thanks, >> >> Jim >> ===================================== >> Jim Hu >> Professor >> Dept. of Biochemistry and Biophysics >> 2128 TAMU >> Texas A&M Univ. >> College Station, TX 77843-2128 >> 979-862-4054 >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Wed Jan 2 17:26:55 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 2 Jan 2013 22:26:55 +0000 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) In-Reply-To: References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> It should (I recall using it at one point). If it doesn't we should fix it so it does. How does MAKER deal with this? IIRC it uses (a modified) SearchIO-based method... chris On Jan 2, 2013, at 3:32 PM, Scott Cain wrote: > Hi Brian, > > I was going to suggest the same thing--though that script is fairly > old, it's not as old as the blast2gff script in the GBrowse > distribution (which probably should be retired). I believe it > supports GFF3, though I don't have any sample data with which to test > it to be sure. I also don't know if it supports BLAST+ input--I > haven't kept up with SearchIO (on which search2gff.pl depends); will > it accept it? 
> > Scott > > > On Wed, Jan 2, 2013 at 3:26 PM, Brian Osborne wrote: >> Here's one: >> >> https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl >> >> Another one: >> >> ~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl >> #!perl >> >> # Author: Jason Stajich >> # Description: Turn SearchIO parseable report(s) into a GFF report >> # >> =head1 NAME >> >> bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report >> >> >> >> Brian O. >> >> On Jan 2, 2013, at 2:44 PM, Jim Hu wrote: >> >>> I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google. >>> >>> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and create gff with appropriate subfeatures. >>> >>> Thanks, >>> >>> Jim >>> ===================================== >>> Jim Hu >>> Professor >>> Dept. of Biochemistry and Biophysics >>> 2128 TAMU >>> Texas A&M Univ. >>> College Station, TX 77843-2128 >>> 979-862-4054 >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Wed Jan 2 17:53:50 2013 From: bosborne11 at verizon.net (Brian Osborne) Date: Wed, 02 Jan 2013 17:53:50 -0500 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> Message-ID: <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> Scott and Chris, I'll test it and see... Brian O. On Jan 2, 2013, at 5:26 PM, "Fields, Christopher J" wrote: > It should (I recall using it at one point). If it doesn't we should fix it so it does. > > How does MAKER deal with this? IIRC it uses (a modified) SearchIO-based method... > > chris > > On Jan 2, 2013, at 3:32 PM, Scott Cain wrote: > >> Hi Brian, >> >> I was going to suggest the same thing--though that script is fairly >> old, it's not as old as the blast2gff script in the GBrowse >> distribution (which probably should be retired). I believe it >> supports GFF3, though I don't have any sample data with which to test >> it to be sure. I also don't know if it supports BLAST+ input--I >> haven't kept up with SearchIO (on which search2gff.pl depends); will >> it accept it?
>> >> Scott >> >> >> On Wed, Jan 2, 2013 at 3:26 PM, Brian Osborne wrote: >>> Here's one: >>> >>> https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl >>> >>> Another one: >>> >>> ~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl >>> #!perl >>> >>> # Author: Jason Stajich >>> # Description: Turn SearchIO parseable report(s) into a GFF report >>> # >>> =head1 NAME >>> >>> bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report >>> >>> >>> >>> Brian O. >>> >>> On Jan 2, 2013, at 2:44 PM, Jim Hu wrote: >>> >>>> I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google. >>>> >>>> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and create gff with appropriate subfeatures. >>>> >>>> Thanks, >>>> >>>> Jim >>>> ===================================== >>>> Jim Hu >>>> Professor >>>> Dept. of Biochemistry and Biophysics >>>> 2128 TAMU >>>> Texas A&M Univ. >>>> College Station, TX 77843-2128 >>>> 979-862-4054 >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. 
scott at scottcain dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jimhu at tamu.edu Fri Jan 4 14:49:51 2013 From: jimhu at tamu.edu (Jim Hu) Date: Fri, 4 Jan 2013 13:49:51 -0600 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) In-Reply-To: <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> Message-ID: <7F69D24B-8D4C-41F5-B8A1-CB850EBB1C67@tamu.edu> Thanks for the replies, but... I can't tell what input formats for the blast results file are supported. Format 11 and format 6 give no output and no feedback. Putting some diagnostic print statements in the code suggests that I'm not getting any result objects from Bio::SearchIO. The script uses Bio::SearchIO, but does not seem to call the submodules for blast. Documentation links on the wiki seem to be broken, at least on this page: http://www.bioperl.org/wiki/Module:Bio::SearchIO Jim On Jan 2, 2013, at 4:53 PM, Brian Osborne wrote: > Scott and Chris, > > I'll test it and see? > > Brian O. > > > On Jan 2, 2013, at 5:26 PM, "Fields, Christopher J" wrote: > >> It should (I recall using it at one point). If it doesn't we should fix it so it does. >> >> How does MAKER deal with this? IIRC it uses (a modified) SearchIO-based method... >> >> chris >> >> On Jan 2, 2013, at 3:32 PM, Scott Cain wrote: >> >>> Hi Brian, >>> >>> I was going to suggest the same thing--though that script is fairly >>> old, it's not as old as the blast2gff script in the GBrowse >>> distribution (which probably should be retired). I believe it >>> supports GFF3, though I don't have any sample data with which to test >>> it to be sure. 
I also don't know if it supports BLAST+ input--I >>> haven't kept up with SearchIO (on which search2gff.pl depends); will >>> it accept it? >>> >>> Scott >>> >>> >>> On Wed, Jan 2, 2013 at 3:26 PM, Brian Osborne wrote: >>>> Here's one: >>>> >>>> https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl >>>> >>>> Another one: >>>> >>>> ~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl >>>> #!perl >>>> >>>> # Author: Jason Stajich >>>> # Description: Turn SearchIO parseable report(s) into a GFF report >>>> # >>>> =head1 NAME >>>> >>>> bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report >>>> >>>> >>>> >>>> Brian O. >>>> >>>> On Jan 2, 2013, at 2:44 PM, Jim Hu wrote: >>>> >>>>> I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google. >>>>> >>>>> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and create gff with appropriate subfeatures. >>>>> >>>>> Thanks, >>>>> >>>>> Jim >>>>> ===================================== >>>>> Jim Hu >>>>> Professor >>>>> Dept. of Biochemistry and Biophysics >>>>> 2128 TAMU >>>>> Texas A&M Univ. >>>>> College Station, TX 77843-2128 >>>>> 979-862-4054 >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> -- >>> ------------------------------------------------------------------------ >>> Scott Cain, Ph. D. 
scott at scottcain dot net >>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>> Ontario Institute for Cancer Research >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > ===================================== Jim Hu Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From bosborne11 at verizon.net Fri Jan 4 14:26:52 2013 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 04 Jan 2013 14:26:52 -0500 Subject: [Bioperl-l] strange error in the beginners howto In-Reply-To: References: Message-ID: Matthew, First, let's distinguish "warning" and "error". Errors stop execution of the app or script, what you're seeing is a warning, your script actually completes. Does the warning vanish if you remove "-w"? Second, seems like this is a Perl version thing: http://www.sysarchitects.com/bioperl BIO On Jan 4, 2013, at 1:55 PM, "Thornton, Matthew" wrote: > Hello, > > I am working through the first part of the BioPerl beginners howto and I am getting an error and I was wondering if you could tell me what is wrong. > > I made the seqio.pl script: > > #!/bin/perl -w > > use Bio::Seq; > > $seq_obj = Bio::Seq->new(-seq => "aaaatgggggggggggccccgtt", > -display_id => "#12345", > -desc => "example 1", > -alphabet => "dna" ); > > print $seq_obj->seq(); > > I execute the script and I get this error: > > gserver at gserver:~/bio$ perl seqio.pl > Replacement list is longer than search list at /usr/local/share/perl/5.14.2/Bio/Range.pm line 251. > aaaatgggggggggggccccgttgserver at gserver:~/bio$ > > Is there something wrong with my installation? > > Any help is greatly appreciated. > > Thanks! > > > Matthew E. Thornton > > > > Laboratory Technician III > Broad Center for Stem Cell and Regenerative Medicine > University of Southern California > 1501 San Pablo St. 
ZNI > Los Angeles, CA 90089-2821 > ph.:323-442-1688 > fx.:323-442-4040 > matthew.thornton at med.usc.edu > Brian O. -- Brian Osborne, PhD BioTeam: http://bioteam.net email: briano at bioteam.net mobile: 978-317-3101 From MEC at stowers.org Fri Jan 4 15:20:17 2013 From: MEC at stowers.org (Cook, Malcolm) Date: Fri, 4 Jan 2013 14:20:17 -0600 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) In-Reply-To: <7F69D24B-8D4C-41F5-B8A1-CB850EBB1C67@tamu.edu> References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> <7F69D24B-8D4C-41F5-B8A1-CB850EBB1C67@tamu.edu> Message-ID: <2C40E43D1F7A56408C4463FD245DDDF907958EE791@EXCHMB-02.stowers-institute.org> Jim, Getting to your original question: > I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and create gff with appropriate subfeatures. Exactly what/how do you want/expect to encode the blast output as GFF{1,2,2.5,3}?? If GFF3 per http://www.sequenceontology.org/gff3.shtml then are you hoping to get GFF3 marked up as described in section 'THE GAP ATTRIBUTE' or as in 'ALIGNMENTS'? I would guess not because neither of them has 'subfeatures'. If you could explain more fully with examples (hand cobbled or borrowed from someone else) of what you expect then I might have a better idea of what options might suit your needs. ~Malcolm .-----Original Message----- .From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jim Hu .Sent: Friday, January 04, 2013 1:50 PM .To: Brian Osborne .Cc: Fields, Christopher J; Scott Cain; bioperl-l at bioperl.org .Subject: Re: [Bioperl-l] Converting blast+ output to gff (with gaps) . .Thanks for the replies, but... . .I can't tell what input formats for the blast results file are supported. Format 11 and format 6 give no output and no feedback.
Putting .some diagnostic print statements in the code suggests that I'm not getting any result objects from Bio::SearchIO. . .The script uses Bio::SearchIO, but does not seem to call the submodules for blast. Documentation links on the wiki seem to be .broken, at least on this page: . . http://www.bioperl.org/wiki/Module:Bio::SearchIO . .Jim . . .On Jan 2, 2013, at 4:53 PM, Brian Osborne wrote: . .> Scott and Chris, .> .> I'll test it and see... .> .> Brian O. .> .> .> On Jan 2, 2013, at 5:26 PM, "Fields, Christopher J" wrote: .> .>> It should (I recall using it at one point). If it doesn't we should fix it so it does. .>> .>> How does MAKER deal with this? IIRC it uses (a modified) SearchIO-based method... .>> .>> chris .>> .>> On Jan 2, 2013, at 3:32 PM, Scott Cain wrote: .>> .>>> Hi Brian, .>>> .>>> I was going to suggest the same thing--though that script is fairly .>>> old, it's not as old as the blast2gff script in the GBrowse .>>> distribution (which probably should be retired). I believe it .>>> supports GFF3, though I don't have any sample data with which to test .>>> it to be sure. I also don't know if it supports BLAST+ input--I .>>> haven't kept up with SearchIO (on which search2gff.pl depends); will .>>> it accept it? .>>> .>>> Scott .>>> .>>> .>>> On Wed, Jan 2, 2013 at 3:26 PM, Brian Osborne wrote: .>>>> Here's one: .>>>> .>>>> https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl .>>>> .>>>> Another one: .>>>> .>>>> ~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl .>>>> #!perl .>>>> .>>>> # Author: Jason Stajich .>>>> # Description: Turn SearchIO parseable report(s) into a GFF report .>>>> # .>>>> =head1 NAME .>>>> .>>>> bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report .>>>> .>>>> .>>>> .>>>> Brian O. .>>>> .>>>> On Jan 2, 2013, at 2:44 PM, Jim Hu wrote: .>>>> .>>>>> I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google. 
.>>>>> .>>>>> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and .create gff with appropriate subfeatures. .>>>>> .>>>>> Thanks, .>>>>> .>>>>> Jim .>>>>> ===================================== .>>>>> Jim Hu .>>>>> Professor .>>>>> Dept. of Biochemistry and Biophysics .>>>>> 2128 TAMU .>>>>> Texas A&M Univ. .>>>>> College Station, TX 77843-2128 .>>>>> 979-862-4054 .>>>>> .>>>>> .>>>>> .>>>>> _______________________________________________ .>>>>> Bioperl-l mailing list .>>>>> Bioperl-l at lists.open-bio.org .>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l .>>>> .>>>> .>>>> _______________________________________________ .>>>> Bioperl-l mailing list .>>>> Bioperl-l at lists.open-bio.org .>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l .>>> .>>> .>>> .>>> -- .>>> ------------------------------------------------------------------------ .>>> Scott Cain, Ph. D. scott at scottcain dot net .>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 .>>> Ontario Institute for Cancer Research .>>> _______________________________________________ .>>> Bioperl-l mailing list .>>> Bioperl-l at lists.open-bio.org .>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l .>> .> . .===================================== .Jim Hu .Professor .Dept. of Biochemistry and Biophysics .2128 TAMU .Texas A&M Univ. .College Station, TX 77843-2128 .979-862-4054 . . . 
._______________________________________________ .Bioperl-l mailing list .Bioperl-l at lists.open-bio.org .http://lists.open-bio.org/mailman/listinfo/bioperl-l From jimhu at tamu.edu Fri Jan 4 16:57:38 2013 From: jimhu at tamu.edu (Jim Hu) Date: Fri, 4 Jan 2013 15:57:38 -0600 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) In-Reply-To: <2C40E43D1F7A56408C4463FD245DDDF907958EE791@EXCHMB-02.stowers-institute.org> References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> <7F69D24B-8D4C-41F5-B8A1-CB850EBB1C67@tamu.edu> <2C40E43D1F7A56408C4463FD245DDDF907958EE791@EXCHMB-02.stowers-institute.org> Message-ID: <2D58FAEB-ED4B-4965-A8CB-890D0012D098@tamu.edu> Malcolm, Thanks, I should have reread the GFF3 spec before posting! In the section on the Gap attribute and below on alignments, the spec discusses two ways to represent an alignment. I was originally thinking of something like the later example shown for cDNA vs genome. But the gap attribute representation would be fine too. So, I can see how the final output could be done in different ways, but I'm still stuck on how to get there. I don't have a specific application in mind; I'm mostly just trying to understand how to get from standalone blast+ output to things that look like the examples in the gff spec and the gbrowse documentation - really basic display of alignments that are gapped. For my teaching, we do EST vs genomic blast and want gapped cDNA alignments to show where the introns go. My other work is with bacteria where introns are rare, but there are times when I'd like to show an alignment that is interrupted by a transposable element, for example.
Excerpting from blastp -help *** Formatting options -outfmt alignment view options: 0 = pairwise, 1 = query-anchored showing identities, 2 = query-anchored no identities, 3 = flat query-anchored, show identities, 4 = flat query-anchored, no identities, 5 = XML Blast output, 6 = tabular, 7 = tabular with comment lines, 8 = Text ASN.1, 9 = Binary ASN.1, 10 = Comma-separated values, 11 = BLAST archive format (ASN.1) Several of these are "lossy" in terms of where the actual gaps occur (e.g. 6). Others seem to me to be more human readable than suited for parsing. So I was hoping to get pointed to an existing script that would generate either the single feature with gap attribute OR the multi-line match features OR a combination from one of these output formats. I'm probably missing something very, very obvious. Best, Jim On Jan 4, 2013, at 2:20 PM, Cook, Malcolm wrote: > Jim, > > Getting to your original question: > >> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and .create gff with appropriate subfeatures. > > Exactly what/how do you want/expect to encode the blast output as GFF{1,2,2.5,3}?? > > If GFF3 pe http://www.sequenceontology.org/gff3.shtml then are you hoping to get GFF3 marked up as described in section 'THE GAP ATTRIBUTE' or as in 'ALIGNMENTS' > > I would guess not because neither of them have 'subfeatures'. > > If you could explain more fully with examples (hand cobbled or borrowed from someone else) of what you expect then I might have a better idea of what options might suit your needs. > > > ~Malcolm > > > .-----Original Message----- > .From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jim Hu > .Sent: Friday, January 04, 2013 1:50 PM > .To: Brian Osborne > .Cc: Fields, Christopher J; Scott Cain; bioperl-l at bioperl.org > .Subject: Re: [Bioperl-l] Converting blast+ output to gff (with gaps) > . 
> .Thanks for the replies, but... > . > .I can't tell what input formats for the blast results file are supported. Format 11 and format 6 give no output and no feedback. Putting > .some diagnostic print statements in the code suggests that I'm not getting any result objects from Bio::SearchIO. > . > .The script uses Bio::SearchIO, but does not seem to call the submodules for blast. Documentation links on the wiki seem to be > .broken, at least on this page: > . > . http://www.bioperl.org/wiki/Module:Bio::SearchIO > . > .Jim > . > . > .On Jan 2, 2013, at 4:53 PM, Brian Osborne wrote: > . > .> Scott and Chris, > .> > .> I'll test it and see... > .> > .> Brian O. > .> > .> > .> On Jan 2, 2013, at 5:26 PM, "Fields, Christopher J" wrote: > .> > .>> It should (I recall using it at one point). If it doesn't we should fix it so it does. > .>> > .>> How does MAKER deal with this? IIRC it uses (a modified) SearchIO-based method... > .>> > .>> chris > .>> > .>> On Jan 2, 2013, at 3:32 PM, Scott Cain wrote: > .>> > .>>> Hi Brian, > .>>> > .>>> I was going to suggest the same thing--though that script is fairly > .>>> old, it's not as old as the blast2gff script in the GBrowse > .>>> distribution (which probably should be retired). I believe it > .>>> supports GFF3, though I don't have any sample data with which to test > .>>> it to be sure. I also don't know if it supports BLAST+ input--I > .>>> haven't kept up with SearchIO (on which search2gff.pl depends); will > .>>> it accept it? 
> .>>> > .>>> Scott > .>>> > .>>> > .>>> On Wed, Jan 2, 2013 at 3:26 PM, Brian Osborne wrote: > .>>>> Here's one: > .>>>> > .>>>> https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl > .>>>> > .>>>> Another one: > .>>>> > .>>>> ~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl > .>>>> #!perl > .>>>> > .>>>> # Author: Jason Stajich > .>>>> # Description: Turn SearchIO parseable report(s) into a GFF report > .>>>> # > .>>>> =head1 NAME > .>>>> > .>>>> bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report > .>>>> > .>>>> > .>>>> > .>>>> Brian O. > .>>>> > .>>>> On Jan 2, 2013, at 2:44 PM, Jim Hu wrote: > .>>>> > .>>>>> I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google. > .>>>>> > .>>>>> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and > .create gff with appropriate subfeatures. > .>>>>> > .>>>>> Thanks, > .>>>>> > .>>>>> Jim > .>>>>> ===================================== > .>>>>> Jim Hu > .>>>>> Professor > .>>>>> Dept. of Biochemistry and Biophysics > .>>>>> 2128 TAMU > .>>>>> Texas A&M Univ. > .>>>>> College Station, TX 77843-2128 > .>>>>> 979-862-4054 > .>>>>> > .>>>>> > .>>>>> > .>>>>> _______________________________________________ > .>>>>> Bioperl-l mailing list > .>>>>> Bioperl-l at lists.open-bio.org > .>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > .>>>> > .>>>> > .>>>> _______________________________________________ > .>>>> Bioperl-l mailing list > .>>>> Bioperl-l at lists.open-bio.org > .>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > .>>> > .>>> > .>>> > .>>> -- > .>>> ------------------------------------------------------------------------ > .>>> Scott Cain, Ph. D. 
scott at scottcain dot net > .>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 > .>>> Ontario Institute for Cancer Research > .>>> _______________________________________________ > .>>> Bioperl-l mailing list > .>>> Bioperl-l at lists.open-bio.org > .>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > .>> > .> > . > .===================================== > .Jim Hu > .Professor > .Dept. of Biochemistry and Biophysics > .2128 TAMU > .Texas A&M Univ. > .College Station, TX 77843-2128 > .979-862-4054 > . > . > . > ._______________________________________________ > .Bioperl-l mailing list > .Bioperl-l at lists.open-bio.org > .http://lists.open-bio.org/mailman/listinfo/bioperl-l ===================================== Jim Hu Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From MEC at stowers.org Fri Jan 4 18:33:04 2013 From: MEC at stowers.org (Cook, Malcolm) Date: Fri, 4 Jan 2013 17:33:04 -0600 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) In-Reply-To: <2D58FAEB-ED4B-4965-A8CB-890D0012D098@tamu.edu> References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> <7F69D24B-8D4C-41F5-B8A1-CB850EBB1C67@tamu.edu> <2C40E43D1F7A56408C4463FD245DDDF907958EE791@EXCHMB-02.stowers-institute.org> <2D58FAEB-ED4B-4965-A8CB-890D0012D098@tamu.edu> Message-ID: <2C40E43D1F7A56408C4463FD245DDDF907958EE7CB@EXCHMB-02.stowers-institute.org> Jim, To get a working example for further discussion.... Assuming you have bioperl and blast+ installed.... This command gets a known transcript from ncbi and blasts it back at ncbi into fly genome: > bp_download_query_genbank.pl --query 'NM_001259364' | blastn -remote -db refseq_genomic -entrez_query 'melanogaster[Taxid]' -outfmt 6 > test.blast.tab Take a look at the results (see below, or run it yourself). 
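As a rough illustration of working with this tabular output, here is a minimal Perl sketch that turns one `-outfmt 6` line into a GFF3 line. It assumes the default blast+ column order (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The sub names `match_id` and `blast6_to_gff3`, the `blastn` source tag, the generic `match` type and the `MatchN` ID scheme are all illustrative choices, not taken from any existing BioPerl or GBrowse script. Because the tabular format carries no gap positions, the sketch emits one line per HSP and cannot produce a Gap attribute.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# HSPs from the same query/subject pair share one ID so that a browser
# can group them into a single discontinuous match (hypothetical scheme).
my %id_for;
my $next_id = 0;

sub match_id {
    my ($query, $subject) = @_;
    my $key = "$query\t$subject";
    $id_for{$key} = 'Match' . ++$next_id unless exists $id_for{$key};
    return $id_for{$key};
}

# Convert one default-format tabular BLAST line into one GFF3 line.
sub blast6_to_gff3 {
    my ($line) = @_;
    chomp $line;
    my ($qseqid, $sseqid, $pident, $length, $mismatch, $gapopen,
        $qstart, $qend, $sstart, $send, $evalue, $bitscore) = split /\t/, $line;

    # GFF3 requires start <= end; a reversed subject range means minus strand.
    my $strand = '+';
    if ($sstart > $send) {
        ($sstart, $send) = ($send, $sstart);
        $strand = '-';
    }

    my $id    = match_id($qseqid, $sseqid);
    my $attrs = "ID=$id;Name=$qseqid;Target=$qseqid $qstart $qend";

    # Columns: seqid, source, type, start, end, score, strand, phase, attributes.
    # The E-value is used as the score; phase is not applicable, hence '.'.
    return join "\t", $sseqid, 'blastn', 'match', $sstart, $send,
                      $evalue, $strand, '.', $attrs;
}

# Sample input: the first hit line from the blastn run in this thread.
my $sample = join "\t", qw(NM_001259364 gi|116010442|ref|NT_033778.3|
                           100.00 2127 0 0 1869 3995 9330023 9327897 0.0 3928);

print "##gff-version 3\n";
print blast6_to_gff3($sample), "\n";
```

For the sample hit, the subject coordinates run from high to low, so the emitted feature spans 9327897 to 9330023 on the minus strand. The subject ID is passed through unchanged; trimming it to a bare accession such as NT_033778.3 would be a separate step that depends on how the reference sequences are named in the target browser.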
First line output looks like this:

NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 2127 0 0 1869 3995 9330023 9327897 0.0 3928

Old GFF did not have parent-child. Depending on where you are going with this, you MIGHT want to use this strategy:

http://gmod.org/wiki/GFF#Alignments

Where the ID in column 9 serves to group the features. There is an 'implicit' parent. I think this would be respected by both GBrowse and UCSC genome browser (but I'm not positive).

If so, converting blast output would yield one line of output per line of input. You would need to assign a strand, swap sstart and send where sstart>send, and do some slight re-formatting.

First line of GFF might look like:

NT_033778.3 est EST_match 9327897 9330023 . - . ID=Match1;Name=NM_001259364;Target=NM_001259364 1869 3995

Is this along the lines of what you need?

~Malcolm

Complete example blast+ output looks like this:

NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 2127 0 0 1869 3995 9330023 9327897 0.0 3928
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 1019 0 0 3995 5013 9327834 9326816 0.0 1882
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 374 0 0 345 718 9332643 9332270 0.0 691
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 300 0 0 1 300 9337978 9337679 1e-154 555
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 270 0 0 717 986 9332201 9331932 5e-138 499
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 243 0 0 1434 1676 9331200 9330958 5e-123 449
NM_001259364 gi|116010442|ref|NT_033778.3| 99.55 223 0 1 1154 1376 9331611 9331390 1e-109 405
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 196 0 0 1675 1870 9330634 9330439 7e-97 363
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 169 0 0 986 1154 9331847 9331679 7e-82 313
NM_001259364 gi|116010442|ref|NT_033778.3| 97.26 73 2 0 1372 1444 9331322 9331250 3e-25 124
NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 44 0 0 301 344 9333095 9333052 2e-12 82.4
NM_001259364 gi|195586644|ref|NT_167067.1| 96.52 2127 74 0 1869 3995 7777827 7775701 0.0 3518
NM_001259364 gi|195586644|ref|NT_167067.1| 97.03 404 7 3 4004 4402 7775629 7775226 0.0 675
NM_001259364 gi|195586644|ref|NT_167067.1| 93.58 374 24 0 345 718 7780392 7780019 8e-156 558
NM_001259364 gi|195586644|ref|NT_167067.1| 94.37 302 15 1 1 300 7786262 7785961 6e-127 462
NM_001259364 gi|195586644|ref|NT_167067.1| 96.25 267 10 0 720 986 7779949 7779683 1e-119 438
NM_001259364 gi|195586644|ref|NT_167067.1| 95.00 240 12 0 1434 1673 7778959 7778720 2e-101 377
NM_001259364 gi|195586644|ref|NT_167067.1| 96.86 223 6 1 1154 1376 7779370 7779149 1e-99 372
NM_001259364 gi|195586644|ref|NT_167067.1| 93.88 196 12 0 1675 1870 7778384 7778189 7e-77 296
NM_001259364 gi|195586644|ref|NT_167067.1| 95.19 187 5 4 4827 5013 7774635 7774453 9e-76 292
NM_001259364 gi|195586644|ref|NT_167067.1| 92.86 168 12 0 987 1154 7779598 7779431 3e-61 244
NM_001259364 gi|195586644|ref|NT_167067.1| 92.86 70 5 0 1374 1443 7779079 7779010 2e-18 102
NM_001259364 gi|195586644|ref|NT_167067.1| 100.00 44 0 0 301 344 7780833 7780790 2e-12 82.4
NM_001259364 gi|195489961|ref|NT_167063.1| 92.43 2127 161 0 1869 3995 7628910 7631036 0.0 3037
NM_001259364 gi|195489961|ref|NT_167063.1| 91.94 1030 38 20 3995 5013 7631102 7632097 0.0 1400
NM_001259364 gi|195489961|ref|NT_167063.1| 88.56 376 39 4 345 718 7626215 7626588 4e-124 453
NM_001259364 gi|195489961|ref|NT_167063.1| 93.67 300 18 1 1 300 7620777 7621075 2e-122 448
NM_001259364 gi|195489961|ref|NT_167063.1| 95.41 196 9 0 1675 1870 7628263 7628458 7e-82 313
NM_001259364 gi|195489961|ref|NT_167063.1| 91.93 223 17 1 1154 1376 7627224 7627445 2e-81 311
NM_001259364 gi|195489961|ref|NT_167063.1| 86.42 265 31 3 720 983 7626653 7626913 1e-73 285
NM_001259364 gi|195489961|ref|NT_167063.1| 86.18 246 25 4 1434 1673 7627635 7627877 3e-65 257
NM_001259364 gi|195489961|ref|NT_167063.1| 88.27 162 19 0 993 1154 7626999 7627160 3e-46 195
NM_001259364 gi|195489961|ref|NT_167063.1| 93.06 72 5 0 1372 1443 7627513 7627584 1e-19 106
NM_001259364 gi|195489961|ref|NT_167063.1| 100.00 44 0 0 301 344 7625770 7625813 2e-12 82.4

~Malcolm

.-----Original Message-----
.From: Jim Hu [mailto:jimhu at tamu.edu]
.Sent: Friday, January 04, 2013 3:58 PM
.To: Cook, Malcolm
.Cc: 'Brian Osborne'; 'Fields, Christopher J'; 'Scott Cain'; 'bioperl-l at bioperl.org'
.Subject: Re: [Bioperl-l] Converting blast+ output to gff (with gaps)
.
.Malcolm,
.
.Thanks, I should have reread the GFF3 spec before posting!
.
.In the section on the Gap attribute and below on alignments it discusses two ways to represent an alignment. I was originally thinking
.of something like the later example shown for cDNA vs genome. But the gap attribute representation would be fine too. So, I can see
.how the final output could be done in different ways, but I'm still stuck on how to get there.
.
.I don't have a specific application in mind; I'm mostly just trying to understand how to get from having standalone blast+ output to get
.to things that look like the examples in the gff spec and the gbrowse documentation - really basic display of alignments that are
.gapped. For my teaching, we do EST vs genomic blast and want gapped cDNA alignments to show where the introns go. My other
.work is with bacteria where introns are rare, but there are times when I'd like to show an alignment that is interrupted by a
.transposable element, for example.
.
.Excerpting from blastp -help
.
. *** Formatting options
.  -outfmt
.    alignment view options:
.      0 = pairwise,
.      1 = query-anchored showing identities,
.      2 = query-anchored no identities,
.      3 = flat query-anchored, show identities,
.      4 = flat query-anchored, no identities,
.      5 = XML Blast output,
.      6 = tabular,
.      7 = tabular with comment lines,
.      8 = Text ASN.1,
.      9 = Binary ASN.1,
.     10 = Comma-separated values,
.     11 = BLAST archive format (ASN.1)
.
.Several of these are "lossy" in terms of where the actual gaps occur (e.g. 6). Others seem to me to be more human readable than
.suited for parsing.
So I was hoping to get pointed to an existing script that would generate either the single feature with gap attribute
.OR the multi-line match features OR a combination from one of these output formats.
.
.I'm probably missing something very, very obvious.
.
.Best,
.
.Jim
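
The conversion Malcolm describes in the message above (one GFF line per tabular BLAST+ line, strand assigned from the subject coordinates, sstart and send swapped where sstart > send, a shared ID in column 9 to group the lines) might be sketched in Perl roughly as follows. This is only an illustration and not an existing BioPerl script: the sub name blast_tab_to_gff3, the "blastn" source column, the use of the e-value as the score, and the rule of grouping hits by query, subject and strand are all assumptions made for the example. The input is assumed to be the default 12-column -outfmt 6 layout.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl.
my %match_id;       # "query\tsubject\tstrand" => ID shared by all of its HSPs
my $next_id = 1;

# Turn one -outfmt 6 line (default 12 columns) into one GFF3 line.
# Returns undef (scalar context) for blank lines and comment lines.
sub blast_tab_to_gff3 {
    my ($line) = @_;
    return if $line =~ /^\s*(?:#|$)/;

    my ($qseqid, $sseqid, $pident, $length, $mismatch, $gapopen,
        $qstart, $qend, $sstart, $send, $evalue, $bitscore) = split ' ', $line;

    # "gi|116010442|ref|NT_033778.3|" -> "NT_033778.3"; plain IDs pass through.
    my @id_parts = grep { length } split /\|/, $sseqid;
    my $seqid    = $id_parts[-1];

    # GFF3 requires start <= end, so a reversed subject range becomes '-'.
    my $strand = '+';
    if ($sstart > $send) {
        ($sstart, $send) = ($send, $sstart);
        $strand = '-';
    }

    # Assumption: all HSPs of one query on one subject strand share one ID.
    my $key = join "\t", $qseqid, $seqid, $strand;
    my $id  = $match_id{$key} //= 'Match' . $next_id++;

    my $attributes = "ID=$id;Name=$qseqid;Target=$qseqid $qstart $qend";
    return join "\t", $seqid, 'blastn', 'EST_match', $sstart, $send,
                      $evalue, $strand, '.', $attributes;
}

# Three rows taken from the tabular output quoted in the message.
my @sample = (
    'NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 2127 0 0 1869 3995 9330023 9327897 0.0 3928',
    'NM_001259364 gi|116010442|ref|NT_033778.3| 100.00 1019 0 0 3995 5013 9327834 9326816 0.0 1882',
    'NM_001259364 gi|195489961|ref|NT_167063.1| 92.43 2127 161 0 1869 3995 7628910 7631036 0.0 3037',
);

print "##gff-version 3\n";
for my $line (@sample) {
    my $gff = blast_tab_to_gff3($line);
    print "$gff\n" if defined $gff;
}
```

To run this over a real file, the @sample loop could be replaced by a loop over lines read from the file. Note that -outfmt 6 only reports the number of gap openings per HSP, so a sketch like this cannot emit a Gap attribute; that would need a format that keeps the aligned sequences.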
From bosborne11 at verizon.net Sat Jan 5 17:25:37 2013
From: bosborne11 at verizon.net (Brian Osborne)
Date: Sat, 05 Jan 2013 17:25:37 -0500
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To: <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net>
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net>
Message-ID: <12E97C07-D3E0-4E21-8A24-A4223468FF70@verizon.net>

All,

The scripts/utilities/bp_search2gff.pl script works fine with blastn (BLAST+), no surprise there, and since it uses Bio::Tools::GFF it can produce GFF3.

Brian O.

From jason.stajich at gmail.com Mon Jan 7 04:35:23 2013
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 7 Jan 2013 01:35:23 -0800
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To: <12E97C07-D3E0-4E21-8A24-A4223468FF70@verizon.net>
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> <12E97C07-D3E0-4E21-8A24-A4223468FF70@verizon.net>
Message-ID:

With bp_search2gff I included a command-line option, -t, so you can decide which sequence is the reference (seqid column) and which is the target (9th column), since in some cases (BLASTX vs TBLASTN) you may want to display one or the other as the reference.

You can also pass the -m option, which builds the grouping match term for a parent; the sub-children are then the HSPs that have that parent, per Malcolm's description of the parent-child info.

Have you tried this with your output? It should work fine with regular vanilla text blast+ output, with the -outfmt 6 option (tell bp_search2gff the format is blasttable), or with -outfmt 5 if you tell it you are using blastxml format.

Jason

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From pengyu.ut at gmail.com Thu Jan 3 17:41:37 2013
From: pengyu.ut at gmail.com (Peng Yu)
Date: Thu, 3 Jan 2013 14:41:37 -0800 (PST)
Subject: [Bioperl-l] Error when using Bio::AlignIO on stockholm file from Pfam
Message-ID:

Hi,

I tried the following code on the Pfam stockholm file (downloaded from ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.full.gz), but it gives me the following error. Is it a bug in bioperl, or is it an error in the Pfam-A.full.gz file?

~/linux/bin/xplat/src/stk/stk2acc/perl$ cat main.pl
#!/usr/bin/env perl

use strict;
use warnings;

#use FindBin;
#use lib "$FindBin::Bin/.";

use Bio::AlignIO;
use Data::Dumper;

my $in = Bio::AlignIO->new(
  -format => 'stockholm',
  -file => '/dev/stdin'
);

while(my $aln=$in->next_aln) {
  print $aln->accession, "\n";
}

~/linux/bin/xplat/src/stk/stk2acc/perl$ cat main_full.sh
#!/usr/bin/env bash

zcat ~/db/Pfam-A.full.gz | ./main.pl

~/linux/bin/xplat/src/stk/stk2acc/perl$ ./main_full.sh

...

------------- EXCEPTION -------------
MSG: '1985D' is not an integer.

STACK Bio::Range::end /Users/py/Library/ActivePerl-5.14/lib/Bio/Range.pm:227
STACK Bio::Annotation::Target::new /Users/py/Library/ActivePerl-5.14/lib/Bio/Annotation/Target.pm:81
STACK Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target /Users/py/Library/ActivePerl-5.14/lib/Bio/AlignIO/Handler/GenericAlignHandler.pm:293
STACK Bio::AlignIO::Handler::GenericAlignHandler::data_handler /Users/py/Library/ActivePerl-5.14/lib/Bio/AlignIO/Handler/GenericAlignHandler.pm:73
STACK Bio::AlignIO::stockholm::next_aln /Users/py/Library/ActivePerl-5.14/lib/Bio/AlignIO/stockholm.pm:470
STACK toplevel ./main.pl:17
-------------------------------------

From thomas.girke at gmail.com Wed Jan 9 15:59:08 2013
From: thomas.girke at gmail.com (Thomas Girke)
Date: Wed, 9 Jan 2013 12:59:08 -0800 (PST)
Subject: [Bioperl-l] Bioinformatics Position Opening
Message-ID: <9a99afb9-4a67-4d20-89ce-4d416fa7a46b@googlegroups.com>

Dear List,

Below is an announcement for a Ph.D. level bioinformatics position at UCR. It is a long-term position with a competitive salary in a vibrant research environment with cutting-edge high-performance compute and genomics facilities. Application instructions are given in the announcement. Potential candidates are welcome to email me their questions about this position directly, e.g. before or after submitting a formal application.

Best,

Thomas

--
Thomas Girke
Associate Professor of Bioinformatics
Institute for Integrative Genome Biology (IIGB)
1207F Genomics Building
University of California
Riverside, CA 92521

E-mail: thomas.girke at ucr.edu
Ph: 951-905-5232
Fax: 951-827-5155

POSITION ANNOUNCEMENT

POSITION
The Institute for Integrative Genome Biology (IIGB) at the University of California, Riverside is seeking a Ph.D. level bioinformatician to manage its bioinformatics research activities and computing facility.

TITLE/RANK
Bioinformatics Facility Director. Salary will be competitive and commensurate with accomplishments.

LOCATION
University of California, Riverside.

BACKGROUND
Successful candidates will join an innovative and multidisciplinary Institute for Integrative Genome Biology (IIGB) that connects theoretical and experimental researchers from different departments in Life, Physical and Mathematical Sciences, Medicine, Engineering and various campus based Centers. The IIGB is organized around a 10,000 sq.ft. suite of Instrumentation Facilities that serve as a centralized, shared-use resource for faculty, staff and students, offering advanced tools in bioinformatics, microscopy, proteomics and genomics. Its bioinformatic component is equipped with an extensive high-performance compute (HPC) infrastructure.

QUALIFICATIONS
Applicants must have a Ph.D. from a recognized university in bioinformatics; or combined degrees in computer science and a biological science; or a degree in either computer science combined with relevant experience in biological science; or a degree in a biological science combined with relevant experience in computer science. The successful candidate will have at least 2 years of professional hands-on experience with next generation sequence data analysis, scientific data programming and high-performance computing. A strong publication record of bioinformatics research in collaboration with experimental biologists is expected. Another requirement is several years of professional experience with common programming languages/environments. This includes at least one statistical programming environment (preferably R), one or more general-purpose scripting languages (e.g. Python, Perl or Ruby), experience with web development frameworks and relational database design. Several years of computational research experience using HPC systems will be beneficial. The incumbent should also have experience with the analysis of modern biological data sets, such as microarrays, next generation sequence data (e.g. genotyping, RNA profiling, de novo assemblies), phylogenetics and/or molecular dynamics simulations.

RESPONSIBILITIES
The Bioinformatics Facility Director manages IIGB's computational infrastructure jointly with its bioinformatics staff, including an HPC/Linux systems administrator, one or more programmers and student workers. The incumbent will be required to provide data analysis support to collaborative research activities, make findings available through presentations, contribute as a team member to scientific publications, and participate in the preparation of joint grant applications and reports. The teaching expectations include the development of a state-of-the-art workshop program on large-scale data analysis and programming. Participation in collaborative equipment grants will be another core responsibility to secure future growth of the facility's computing resources.

TO APPLY
Review of applications will continue until the position is filled. Interested individuals should: (1) submit a curriculum vitae, (2) provide a statement of research interests, and (3) arrange to have three letters of reference sent on their behalf.
All information should be emailed to: thomas.girke at ucr.edu

WEBSITE
http://facility.bioinformatics.ucr.edu/position-opening

From frej.tulin at gmail.com Thu Jan 10 21:53:16 2013
From: frej.tulin at gmail.com (Frej Tulin)
Date: Thu, 10 Jan 2013 21:53:16 -0500
Subject: [Bioperl-l] stand alone blast plus problem
Message-ID:

Hi

I have been trying for a long time to control my blast searches with the Bio::Tools::Run::StandAloneBlastPlus module. The module is installed and gets loaded ok. However, when calling:

my $blast_db = Bio::Tools::Run::StandAloneBlastPlus -> new ( -db_data => \@seq_arr, -create => 1);
$blast_db -> make_db();

----------------------------------------------------------------------
I get this error

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: /home/frej/ncbi-blast-2.2.27+/bin/makeblastdb call crashed: There was a problem running /home/frej/ncbi-blast-2.2.27+/bin/makeblastdb : Error: NCBI C++ Exception:
"/am/ncbiapdata/release/blast/src/2.2.27/Linux64-Centos-icc/c++/ICC1010-ReleaseMT64--Linux64-Centos-icc/../src/objects/seq/../seqloc/Seq_id.cpp", line 1689: Error: ncbi::objects::CSeq_id::x_Init() - Unsupported ID type ATCG00500.1

STACK: Error::throw
STACK: Bio::Root::Root::throw /home/frej/perl5/lib/perl5/Bio/Root/Root.pm:472
STACK: Bio::Tools::Run::WrapperBase::_run /home/frej/perl5/lib/perl5/Bio/Tools/Run/WrapperBase/CommandExts.pm:1012
STACK: Bio::Tools::Run::StandAloneBlastPlus::make_db /home/frej/perl5/lib/perl5/Bio/Tools/Run/StandAloneBlastPlus.pm:776
STACK: comp2fasta.pl:57
----------------------------------------------------------------------

The general header formats of the fasta files I've tried are

>ATCG00500.1|PACid:19637947
or
>g8540.t1|PACid:26887931

I always get the same message. I have tried modifying the header by putting 'lcl|' in front, but nothing seems to help. When I build a database using makeblastdb directly from the command line it works ok; it just doesn't work from within bioperl. How can this be? What am I doing wrong?

thankful for any assistance

best
/frej

--
Frej Tulin
Fred Cross lab
Rockefeller University

From cjfields at illinois.edu Fri Jan 11 00:04:12 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 11 Jan 2013 05:04:12 +0000
Subject: [Bioperl-l] Error when using Bio::AlignIO on stockholm file from Pfam
In-Reply-To:
References:
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF4CF6A00B@CHIMBX5.ad.uillinois.edu>

I haven't used the '/dev/stdin' file like this, I typically do:

> my $in = Bio::AlignIO->new(
>   -format => 'stockholm',
>   -fh => \*STDIN
> );

but if it works...

That does look like a bug in the parser. Would you know the Pfam alignment accessions that triggered it, or maybe ones just prior to the fail?

chris

From Russell.Smithies at agresearch.co.nz Sun Jan 13 17:25:16 2013
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 14 Jan 2013 11:25:16 +1300
Subject: [Bioperl-l] stand alone blast plus problem
In-Reply-To:
References:
Message-ID: <18DF7D20DFEC044098A1062202F5FFF372E99EE4C1@exchsth.agresearch.co.nz>

Hi Frej,

Here's a few things to try:
Is it possible you need to escape the special chars in your IDs? i.e. the pipes and colons?
Have you tried it with simpler IDs?
Is your BioPerl sequence object collection created correctly?
Can you create the database by just passing it the fasta file?

$fac = Bio::Tools::Run::StandAloneBlastPlus->new(
  -db_name => 'mydb',
  -db_data => 'myseqs.fas',
  -create => 1
);

--Russell

From awitney at sgul.ac.uk Thu Jan 17 06:45:40 2013
From: awitney at sgul.ac.uk (Adam Witney)
Date: Thu, 17 Jan 2013 11:45:40 +0000
Subject: [Bioperl-l] drawing genome comparison cross linking tracks?
Message-ID: <50F7E464.3090906@sgul.ac.uk>

Hi,

I would like to produce graphics as shown in this section (cross linking between genome tracks) [scroll down a little to see the pictures]:

http://www.biopython.org/DIST/docs/tutorial/Tutorial.html#htoc211

but using perl. Can Bio::Graphics do this? I can only find Gbrowse_syn, but this works within a web browser; I would like to generate these diagrams from a stand-alone script. If not, does anyone know any perl libraries that can be used to show genome comparisons?

Thanks

Adam

From cjfields at illinois.edu Thu Jan 17 08:37:04 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 17 Jan 2013 13:37:04 +0000
Subject: [Bioperl-l] drawing genome comparison cross linking tracks?
In-Reply-To: <50F7E464.3090906@sgul.ac.uk> References: <50F7E464.3090906@sgul.ac.uk> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF69BFAE17@CITESMBX5.ad.uillinois.edu> Gbrowse_syn, SynBrowse, and others can do this, but there isn't a pre-packaged way of drawing this that I know of analogous to Biopython's. It would be nice to have, though. chris On Jan 17, 2013, at 5:45 AM, Adam Witney wrote: > Hi, > > I would like to produces graphics as shown in this section (cross linking between genome tracks) [scroll down a little to see the pictures]: > > http://www.biopython.org/DIST/docs/tutorial/Tutorial.html#htoc211 > > but using perl. Can Bio::Graphics do this? I can only find Gbrowse_syn, but this works within a web browser, I would like to generate these diagrams from a stand alone script. If not does anyone know any perl libraries that can be used to show genome comparisons? > > Thanks > > Adam > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From awitney at sgul.ac.uk Thu Jan 17 12:33:42 2013 From: awitney at sgul.ac.uk (Adam Witney) Date: Thu, 17 Jan 2013 17:33:42 +0000 Subject: [Bioperl-l] drawing genome comparison cross linking tracks? In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF69BFAE17@CITESMBX5.ad.uillinois.edu> References: <50F7E464.3090906@sgul.ac.uk> <118F034CF4C3EF48A96F86CE585B94BF69BFAE17@CITESMBX5.ad.uillinois.edu> Message-ID: <50F835F6.5090603@sgul.ac.uk> Thanks Chris. I hadn't seen SynBrowse before, looks good. But I would like to do this without all the database/webserver overhead, I have a perl script that can interact with a gbrowse_syn database to generate images ie no webserver, I just need to separate out the database stuff now. 
Thanks again Adam On 17/01/2013 13:37, Fields, Christopher J wrote: > Gbrowse_syn, SynBrowse, and others can do this, but there isn't a pre-packaged way of drawing this that I know of analogous to Biopython's. It would be nice to have, though. > > chris > > On Jan 17, 2013, at 5:45 AM, Adam Witney wrote: > >> Hi, >> >> I would like to produces graphics as shown in this section (cross linking between genome tracks) [scroll down a little to see the pictures]: >> >> http://www.biopython.org/DIST/docs/tutorial/Tutorial.html#htoc211 >> >> but using perl. Can Bio::Graphics do this? I can only find Gbrowse_syn, but this works within a web browser, I would like to generate these diagrams from a stand alone script. If not does anyone know any perl libraries that can be used to show genome comparisons? >> >> Thanks >> >> Adam >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu Jan 17 12:41:47 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 17 Jan 2013 17:41:47 +0000 Subject: [Bioperl-l] drawing genome comparison cross linking tracks? In-Reply-To: <50F835F6.5090603@sgul.ac.uk> References: <50F7E464.3090906@sgul.ac.uk> <118F034CF4C3EF48A96F86CE585B94BF69BFAE17@CITESMBX5.ad.uillinois.edu> <50F835F6.5090603@sgul.ac.uk> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF69BFD4BA@CITESMBX5.ad.uillinois.edu> Let us know if this can be of use to the community, I'm sure others would find it useful as well. chris On Jan 17, 2013, at 11:33 AM, Adam Witney wrote: > > Thanks Chris. I hadn't seen SynBrowse before, looks good. 
> > But I would like to do this without all the database/webserver overhead, I have a perl script that can interact with a gbrowse_syn database to generate images ie no webserver, I just need to separate out the database stuff now. > > Thanks again > > Adam > > On 17/01/2013 13:37, Fields, Christopher J wrote: >> Gbrowse_syn, SynBrowse, and others can do this, but there isn't a pre-packaged way of drawing this that I know of analogous to Biopython's. It would be nice to have, though. >> >> chris >> >> On Jan 17, 2013, at 5:45 AM, Adam Witney wrote: >> >>> Hi, >>> >>> I would like to produces graphics as shown in this section (cross linking between genome tracks) [scroll down a little to see the pictures]: >>> >>> http://www.biopython.org/DIST/docs/tutorial/Tutorial.html#htoc211 >>> >>> but using perl. Can Bio::Graphics do this? I can only find Gbrowse_syn, but this works within a web browser, I would like to generate these diagrams from a stand alone script. If not does anyone know any perl libraries that can be used to show genome comparisons? >>> >>> Thanks >>> >>> Adam >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From buschj at hhu.de Sat Jan 19 11:12:54 2013 From: buschj at hhu.de (jobu) Date: Sat, 19 Jan 2013 17:12:54 +0100 Subject: [Bioperl-l] Parsing Blast-Report extracting "Features flanking .." 
Message-ID: <50FAC606.2060800@hhu.de> Dear List, I've got quite a lot of hits from blasting against nt-database that state: "Features flanking this part of subject sequence: " where the subject-accession only will give information about complete genomic sequences, the detailed information inside the "Features flanking" section of the HSP about annotations of upstream and downstream elements would be what I need. Is there any elegant way to make Bio::SearchIO do this job for me? greetings Jochen From saladi1 at illinois.edu Tue Jan 22 16:45:23 2013 From: saladi1 at illinois.edu (Shyam Saladi) Date: Tue, 22 Jan 2013 13:45:23 -0800 Subject: [Bioperl-l] Bio::DB::Taxonomy example Message-ID: Hi, There is an example on how to use Bio::DB::Taxonomy at http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy I tried the script but could not get it to work properly. After looking through the documentation it seems that my @extant_children = grep { $_->rank eq 'species' } $node->get_all_Descendents; should instead be my @extant_children = grep { $_->rank eq 'species' } $db->get_all_Descendents($node) I found this solves the problem for me. If my fix is in fact correct, an admin may want to change the example script online. Thanks, Shyam ** From cjfields at illinois.edu Tue Jan 22 18:40:05 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 22 Jan 2013 23:40:05 +0000 Subject: [Bioperl-l] Bio::DB::Taxonomy example In-Reply-To: References: Message-ID: <118F034CF4C3EF48A96F86CE585B94BF6CDFE139@CITESMBX5.ad.uillinois.edu> Done. Thanks for pointing that out! chris On Jan 23, 2013, at 5:45 AM, Shyam Saladi wrote: > Hi, > > There is an example on how to use Bio::DB::Taxonomy at > http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy > > I tried the script but could not get it to work properly. 
After looking > through the documentation it seems that > > my @extant_children = grep { $_->rank eq 'species' } > $node->get_all_Descendents; > > should instead be > > my @extant_children = grep { $_->rank eq 'species' } > $db->get_all_Descendents($node) > > I found this solves the problem for me. If my fix is in fact correct, an > admin may want to change the example script online. > > Thanks, > Shyam > ** > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From buschj at hhu.de Fri Jan 25 05:20:22 2013 From: buschj at hhu.de (jobu) Date: Fri, 25 Jan 2013 11:20:22 +0100 Subject: [Bioperl-l] Parsing Blast-Report extracting "Features flanking .." In-Reply-To: References: <50FAC606.2060800@hhu.de> Message-ID: <51025C66.6060008@hhu.de> Am 22.01.2013 19:03, schrieb Mgavi Brathwaite: > What upstream and downstream elements are you interested in? I've got a huge pile of short RNA reads. Part of the question now is whether those RNA fragments originate from siRNA events, or may represent miRNAs / parts of pre-miRNAs. So I did an online blast search against database nt. The resulting report quite often just gives subject information like this: ----- > gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence Length=23459830 ----- Now I would like to get the hit's neighbouring regions for further analysis. Preferably I would like to do that in an automized way, but the only possible action with this kind of subject gi | description would be to fetch the entire chromosomal sequence I guess ? 
However, right below the line above, the report states more precisely: ------ Features flanking this part of subject sequence: 8872 bp at 5' side: cytochrome P450 90B1 402 bp at 3' side: U1 small nuclear ribonucleoprotein-70K ------ Still I would like to have the possibility to automatically fetch the subject's sequence(s), as of now I think parsing the report with SearchIO won't let me aquire that information, because SearchIO does not recognize report sections like those. I hope I did not miss any of SearchIOs capabilities, but I could not find any method covering my wish?! Right now maybe the only way to get the information I want is to construct my own parser and write it out into a separate file, which in turn again I could read into a hash before processing the Blast-Report with SearchIO to combine both data for further automized work. I am aware though that even successfully getting the flanking features would leave me with the more or less wide intergenic gap my hsp is located in. However I'm in need of a way to get the flanking features including their annotation and the region spanning between them. But I hope I do not have to get complete sequences to accomplish that, as this would be kind of an overkill. with kind regards Jochen From afonsomduarte at gmail.com Mon Jan 14 05:09:48 2013 From: afonsomduarte at gmail.com (Afonso Duarte) Date: Mon, 14 Jan 2013 02:09:48 -0800 (PST) Subject: [Bioperl-l] Parsing Blast results for different genus Message-ID: <217cf8ac-5323-48f5-8a81-aba032cba68b@googlegroups.com> Dear All, I have obtained several .gb BLAST results that I want to split in different .gb or .fasta files accordingly to genus. I searched for different ways to do it in the web, but didn't find a way to do it when one has something like 50 genus in the blast. (to do it for a short number or genus I could use splitgb.pl from http://www.bioperl.org/wiki/HOWTO:SeqIO but it will not be feasible for a large number of genus). 
Is there a script of way to do it in a efficient way ? Best PS sorry if this message appears twice ... as i was having some trouble posting the first time. From jason.stajich at gmail.com Tue Jan 29 14:00:16 2013 From: jason.stajich at gmail.com (Jason Stajich) Date: Tue, 29 Jan 2013 11:00:16 -0800 Subject: [Bioperl-l] Parsing Blast-Report extracting "Features flanking .."
In-Reply-To: <51025C66.6060008@hhu.de> References: <50FAC606.2060800@hhu.de> <51025C66.6060008@hhu.de> Message-ID: <6E83E3F3-C304-4DC4-9A11-FE1CA90F207D@gmail.com> We don't parse the NCBI feature info from the BLAST reports per your query. To look up a specific feature you can use Bio::DB::GenBank to query for sequence from a specific feature by accession number - see the HOWTOs for that. However, most people use tools that generate SAM/BAM files with short reads - then you can use a tool like bedtools to find overlaps of reads with the locations of features. basically: - download the genome and GFF for arabidopsis - align your sRNA to the genome with a short read aligner - bowtie, bwa, others - convert your sam to bam file with SAMtools or picard - compare the location of features with the reads to get expression summaries or individuals reads with BEDTools On Jan 25, 2013, at 2:20 AM, jobu wrote: > Am 22.01.2013 19:03, schrieb Mgavi Brathwaite: >> What upstream and downstream elements are you interested in? > > > I've got a huge pile of short RNA reads. > Part of the question now is whether those RNA fragments originate from > siRNA events, > or may represent miRNAs / parts of pre-miRNAs. > > So I did an online blast search against database nt. > The resulting report quite often just gives subject information like this: > > ----- >> gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence > Length=23459830 > ----- > > Now I would like to get the hit's neighbouring regions for further > analysis. > Preferably I would like to do that in an automized way, but the only > possible action with this kind of subject gi | description would be to > fetch the entire chromosomal sequence I guess ? 
> > However, > right below the line above, the report states more precisely: > > ------ > Features flanking this part of subject sequence: > 8872 bp at 5' side: cytochrome P450 90B1 > 402 bp at 3' side: U1 small nuclear ribonucleoprotein-70K > ------ > > Still I would like to have the possibility to automatically fetch the > subject's sequence(s), > as of now I think parsing the report with SearchIO won't let me aquire > that information, because SearchIO does not recognize report sections > like those. > > I hope I did not miss any of SearchIOs capabilities, but I could not > find any method covering my wish?! > > Right now maybe the only way to get the information I want is to > construct my own parser and write it out into a separate file, which in > turn again I could read into a hash before processing the Blast-Report > with SearchIO to combine both data for further automized work. > > I am aware though that even successfully getting the flanking features > would leave me with the more or less wide intergenic gap my hsp is > located in. > > However I'm in need of a way to get the flanking features including > their annotation and the region spanning between them. > But I hope I do not have to get complete sequences to accomplish that, > as this would be kind of an overkill. > > with kind regards > Jochen > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From dr_kilburn59 at yahoo.com Wed Jan 30 16:40:26 2013 From: dr_kilburn59 at yahoo.com (Dan kilburn) Date: Wed, 30 Jan 2013 16:40:26 -0500 Subject: [Bioperl-l] Bioperl-l Digest, Vol 117, Issue 13 In-Reply-To: References: Message-ID: <575B184F-C051-4FEF-9BEB-7AB98E3C52A6@yahoo.com> Hi Jason, Are there any plans to keep SearchIO up to date with ncbi blast? 
I know they change formats ridiculously often, but I had to write my own parser to get sequence identity, which I would rather not have done. I realize that this job would be a big load on anyone who takes it, but it's so fundamental. Maybe I can help. --Dan Sent from my iPhone On Jan 30, 2013, at 12:00 PM, bioperl-l-request at lists.open-bio.org wrote: > Send Bioperl-l mailing list submissions to > bioperl-l at lists.open-bio.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://lists.open-bio.org/mailman/listinfo/bioperl-l > or, via email, send a message with subject or body 'help' to > bioperl-l-request at lists.open-bio.org > > You can reach the person managing the list at > bioperl-l-owner at lists.open-bio.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Bioperl-l digest..." > > > Today's Topics: > > 1. Re: Parsing Blast-Report extracting "Features flanking .." > (Jason Stajich) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Tue, 29 Jan 2013 11:00:16 -0800 > From: Jason Stajich > Subject: Re: [Bioperl-l] Parsing Blast-Report extracting "Features > flanking .." > To: buschj at hhu.de > Cc: bioperl-l at lists.open-bio.org > Message-ID: <6E83E3F3-C304-4DC4-9A11-FE1CA90F207D at gmail.com> > Content-Type: text/plain; charset=us-ascii > > We don't parse the NCBI feature info from the BLAST reports per your query. To look up a specific feature you can use Bio::DB::GenBank to query for sequence from a specific feature by accession number - see the HOWTOs for that. > > However, most people use tools that generate SAM/BAM files with short reads - then you can use a tool like bedtools to find overlaps of reads with the locations of features. 
> > basically: > - download the genome and GFF for arabidopsis > - align your sRNA to the genome with a short read aligner - bowtie, bwa, others > - convert your sam to bam file with SAMtools or picard > - compare the location of features with the reads to get expression summaries or individuals reads with BEDTools > > > On Jan 25, 2013, at 2:20 AM, jobu wrote: > >> Am 22.01.2013 19:03, schrieb Mgavi Brathwaite: >>> What upstream and downstream elements are you interested in? >> >> >> I've got a huge pile of short RNA reads. >> Part of the question now is whether those RNA fragments originate from >> siRNA events, >> or may represent miRNAs / parts of pre-miRNAs. >> >> So I did an online blast search against database nt. >> The resulting report quite often just gives subject information like this: >> >> ----- >>> gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence >> Length=23459830 >> ----- >> >> Now I would like to get the hit's neighbouring regions for further >> analysis. >> Preferably I would like to do that in an automized way, but the only >> possible action with this kind of subject gi | description would be to >> fetch the entire chromosomal sequence I guess ? >> >> However, >> right below the line above, the report states more precisely: >> >> ------ >> Features flanking this part of subject sequence: >> 8872 bp at 5' side: cytochrome P450 90B1 >> 402 bp at 3' side: U1 small nuclear ribonucleoprotein-70K >> ------ >> >> Still I would like to have the possibility to automatically fetch the >> subject's sequence(s), >> as of now I think parsing the report with SearchIO won't let me aquire >> that information, because SearchIO does not recognize report sections >> like those. >> >> I hope I did not miss any of SearchIOs capabilities, but I could not >> find any method covering my wish?! 
>> >> Right now maybe the only way to get the information I want is to >> construct my own parser and write it out into a separate file, which in >> turn again I could read into a hash before processing the Blast-Report >> with SearchIO to combine both data for further automized work. >> >> I am aware though that even successfully getting the flanking features >> would leave me with the more or less wide intergenic gap my hsp is >> located in. >> >> However I'm in need of a way to get the flanking features including >> their annotation and the region spanning between them. >> But I hope I do not have to get complete sequences to accomplish that, >> as this would be kind of an overkill. >> >> with kind regards >> Jochen >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > > > > ------------------------------ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > End of Bioperl-l Digest, Vol 117, Issue 13 > ****************************************** From cjfields at illinois.edu Wed Jan 30 18:13:49 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 30 Jan 2013 23:13:49 +0000 Subject: [Bioperl-l] Bioperl-l Digest, Vol 117, Issue 13 In-Reply-To: <575B184F-C051-4FEF-9BEB-7AB98E3C52A6@yahoo.com> References: <575B184F-C051-4FEF-9BEB-7AB98E3C52A6@yahoo.com> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF6CE1175C@CHIMBX5.ad.uillinois.edu> We certainly accept support for updates to the code. In fact, if you are familiar with git/github the process is fairly straightforward: 1) Fork the code to your github account 2) Make and commit changes 3) Submit a pull request 4) Post something to the list just in case. 
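For readers new to git, steps 1-3 above might look roughly like the following. This is only a sketch under stated assumptions: a local bare repository stands in for the GitHub fork so the commands can be tried offline, and the branch name, file name, identity and commit message are all hypothetical. With a real fork you would clone your fork's URL instead and open the pull request from its page on GitHub.

```shell
#!/bin/sh
# Sketch of the fork / commit / push cycle, run entirely in a temp directory.
# A local bare repository plays the role of "your fork on GitHub";
# every name below is a hypothetical placeholder.
set -e

work=$(mktemp -d)
cd "$work"

# Step 1 (stand-in): normally created with the Fork button on GitHub.
git init -q --bare fork.git

# Clone the fork and set an identity for this throwaway working copy.
git clone -q fork.git bioperl-work 2>/dev/null
cd bioperl-work
git config user.name  "Example Contributor"
git config user.email "contributor@example.org"

# Step 2: make the change on a topic branch and commit it.
git checkout -q -b searchio-blastplus-fix
echo "placeholder change" > CHANGES.txt
git add CHANGES.txt
git commit -q -m "Describe the SearchIO fix here"

# Push the branch to the fork. Step 3, the pull request itself, is then
# opened from the fork's page on GitHub rather than from the command line.
git push -q origin searchio-blastplus-fix

# Confirm that the branch now exists in the fork.
git --git-dir="$work/fork.git" branch --list
```

Keeping the work on a topic branch rather than on the default branch makes it easier to update the pull request later without disturbing other changes.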
We also accept code patches; the best way to submit these is as a bug report to the redmine repository (doesn't hurt to post here as well): https://redmine.open-bio.org/ chris On Jan 30, 2013, at 3:40 PM, Dan kilburn wrote: > Hi Jason, > > Are there any plans to keep SearchIO up to date with ncbi blast? I know they change formats ridiculously often, but I had to write my own parser to get sequence identity, which I would rather not have done. I realize that this job would be a big load on anyone who takes it, but it's so fundamental. Maybe I can help. > > --Dan > Sent from my iPhone > > On Jan 30, 2013, at 12:00 PM, bioperl-l-request at lists.open-bio.org wrote: > >> Send Bioperl-l mailing list submissions to >> bioperl-l at lists.open-bio.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> or, via email, send a message with subject or body 'help' to >> bioperl-l-request at lists.open-bio.org >> >> You can reach the person managing the list at >> bioperl-l-owner at lists.open-bio.org >> >> When replying, please edit your Subject line so it is more specific >> than "Re: Contents of Bioperl-l digest..." >> >> >> Today's Topics: >> >> 1. Re: Parsing Blast-Report extracting "Features flanking .." >> (Jason Stajich) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Tue, 29 Jan 2013 11:00:16 -0800 >> From: Jason Stajich >> Subject: Re: [Bioperl-l] Parsing Blast-Report extracting "Features >> flanking .." >> To: buschj at hhu.de >> Cc: bioperl-l at lists.open-bio.org >> Message-ID: <6E83E3F3-C304-4DC4-9A11-FE1CA90F207D at gmail.com> >> Content-Type: text/plain; charset=us-ascii >> >> We don't parse the NCBI feature info from the BLAST reports per your query. To look up a specific feature you can use Bio::DB::GenBank to query for sequence from a specific feature by accession number - see the HOWTOs for that. 
>> >> However, most people use tools that generate SAM/BAM files with short reads - then you can use a tool like bedtools to find overlaps of reads with the locations of features. >> >> basically: >> - download the genome and GFF for arabidopsis >> - align your sRNA to the genome with a short read aligner - bowtie, bwa, others >> - convert your sam to bam file with SAMtools or picard >> - compare the location of features with the reads to get expression summaries or individuals reads with BEDTools >> >> >> On Jan 25, 2013, at 2:20 AM, jobu wrote: >> >>> Am 22.01.2013 19:03, schrieb Mgavi Brathwaite: >>>> What upstream and downstream elements are you interested in? >>> >>> >>> I've got a huge pile of short RNA reads. >>> Part of the question now is whether those RNA fragments originate from >>> siRNA events, >>> or may represent miRNAs / parts of pre-miRNAs. >>> >>> So I did an online blast search against database nt. >>> The resulting report quite often just gives subject information like this: >>> >>> ----- >>>> gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence >>> Length=23459830 >>> ----- >>> >>> Now I would like to get the hit's neighbouring regions for further >>> analysis. >>> Preferably I would like to do that in an automized way, but the only >>> possible action with this kind of subject gi | description would be to >>> fetch the entire chromosomal sequence I guess ? >>> >>> However, >>> right below the line above, the report states more precisely: >>> >>> ------ >>> Features flanking this part of subject sequence: >>> 8872 bp at 5' side: cytochrome P450 90B1 >>> 402 bp at 3' side: U1 small nuclear ribonucleoprotein-70K >>> ------ >>> >>> Still I would like to have the possibility to automatically fetch the >>> subject's sequence(s), >>> as of now I think parsing the report with SearchIO won't let me aquire >>> that information, because SearchIO does not recognize report sections >>> like those. 
>>> >>> I hope I did not miss any of SearchIOs capabilities, but I could not >>> find any method covering my wish?! >>> >>> Right now maybe the only way to get the information I want is to >>> construct my own parser and write it out into a separate file, which in >>> turn again I could read into a hash before processing the Blast-Report >>> with SearchIO to combine both data for further automized work. >>> >>> I am aware though that even successfully getting the flanking features >>> would leave me with the more or less wide intergenic gap my hsp is >>> located in. >>> >>> However I'm in need of a way to get the flanking features including >>> their annotation and the region spanning between them. >>> But I hope I do not have to get complete sequences to accomplish that, >>> as this would be kind of an overkill. >>> >>> with kind regards >>> Jochen >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> >> >> >> >> ------------------------------ >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> End of Bioperl-l Digest, Vol 117, Issue 13 >> ****************************************** > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From thiago.venancio at gmail.com Thu Jan 31 12:10:39 2013 From: thiago.venancio at gmail.com (Thiago Venancio) Date: Thu, 31 Jan 2013 09:10:39 -0800 (PST) Subject: [Bioperl-l] bioperl-ext Message-ID: <5523d5a2-b103-4cf3-8c84-bd33ef675a6d@googlegroups.com> Hi all, I am trying to install bioperl-ext and the compilation gives me the error: /usr/bin/ld: libs/libsw.a(aln.o): relocation R_X86_64_32 
against `.rodata' can not be used when making a shared object; recompile with -fPIC libs/libsw.a: could not read symbols: Bad value collect2: ld returned 1 exit status make[1]: *** [blib/arch/auto/Bio/Ext/Align/Align.so] Error 1 make[1]: Leaving directory `/home/thiago/temp/bioperl-ext-master/Bio/Ext/Align' make: *** [subdirs] Error 2 However, the Makefile says that -fPIC is already being used, which is puzzling me. Any clues ? Thanks. Thiago From thiago.venancio at gmail.com Thu Jan 31 12:25:12 2013 From: thiago.venancio at gmail.com (Thiago Venancio) Date: Thu, 31 Jan 2013 09:25:12 -0800 (PST) Subject: [Bioperl-l] bioperl-ext In-Reply-To: <5523d5a2-b103-4cf3-8c84-bd33ef675a6d@googlegroups.com> References: <5523d5a2-b103-4cf3-8c84-bd33ef675a6d@googlegroups.com> Message-ID: Solved using the solution posted by jmf here : https://groups.google.com/forum/?fromgroups=#!searchin/bioperl-l/bioperl-ext/bioperl-l/FgJo_g8UQE8/FB1DEjJkTooJ Which I found after posting. T On Thursday, January 31, 2013 3:10:39 PM UTC-2, Thiago Venancio wrote: > > Hi all, > > I am trying to install bioperl-ext and the compilation gives me the error: > > /usr/bin/ld: libs/libsw.a(aln.o): relocation R_X86_64_32 against `.rodata' > can not be used when making a shared object; recompile with -fPIC > libs/libsw.a: could not read symbols: Bad value > collect2: ld returned 1 exit status > make[1]: *** [blib/arch/auto/Bio/Ext/Align/Align.so] Error 1 > make[1]: Leaving directory > `/home/thiago/temp/bioperl-ext-master/Bio/Ext/Align' > make: *** [subdirs] Error 2 > > However, the Makefile says that -fPIC is already being used, which is > puzzling me. > > Any clues ? > > Thanks. 
> Thiago > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From koriege at googlemail.com Thu Jan 31 21:49:20 2013 From: koriege at googlemail.com (koriege at googlemail.com) Date: Thu, 31 Jan 2013 18:49:20 -0800 (PST) Subject: [Bioperl-l] problem with Bio::*::Fasta id_parser Message-ID: Hi, I tried two methods to create a bioperl FASTA database, but it failes by extracting the substring out of my headers. Can someone explain me why I get the standard header or show me a work around? thanks in advance. pyr0 i) my $objDB = Bio::Index::Fasta->new(-filename => $PATHdbIdx, -write_flag => 1); $objDB->id_parser(\&get_id); $objDB->make_index(glob($objParameter->dbGenome())); sub get_id { my $header = shift; $header =~ /^>.*\bsp\|([A-Z]\d{5}\b)/; $1; } output Use of uninitialized value $id in concatenation (.) or string at /usr/share/perl5/Bio/Index/Abstract.pm line 753, <$FASTA> line 1. Use of uninitialized value $id in exists at /usr/share/perl5/Bio/Index/Abstract.pm line 754, <$FASTA> line 1. Use of uninitialized value $id in hash element at /usr/share/perl5/Bio/Index/Abstract.pm line 757, <$FASTA> line 1. gi|376282008|ref|NC_016798.1| ii) my $PATHdbIdx=catfile($objParameter->DIR,'data','db.idx'); unlink($PATHdbIdx); my $objDB = Bio::DB::Fasta->new($objParameter->dbGenome(), -makeid => \&get_id); $objDBgenome->set(\$objDB); output: Use of uninitialized value $key in pattern match (m//) at /usr/share/perl5/Bio/DB/Fasta.pm line 1178. Use of uninitialized value $id in exists at /usr/share/perl5/Bio/DB/Fasta.pm line 617. gi|376282008|ref|NC_016798.1| From yuf228 at hotmail.com Thu Jan 31 23:15:15 2013 From: yuf228 at hotmail.com (Rob) Date: Fri, 1 Feb 2013 04:15:15 +0000 (UTC) Subject: [Bioperl-l] Where to get BLASTCLUST or equivalent? References: <200305311150.h4VBopn2019091@localhost.localdomain> Message-ID: Cyril C.C. 
Chua bmb.leeds.ac.uk> writes: > > Hi, > > I have some difficulty in sourcing for BLASTCLUST or related > programs/mods. Does any1 know exactly how to locate them? > > Regards > > Cyril Chua > Hi Cyril, I heard of the following programmes that might do similar things (I HAVEN'T used any of them yet): Afree - http://www.vicbioinformatics.com/software.afree.shtml Uclust - http://drive5.com/uclust/uclust_userguide_2_1.pdf Usearch - http://www.drive5.com/usearch/ DomClust - http://mbgd.genome.ad.jp/domclust/ or Check this: http://ppod.princeton.edu/help/help_tech.html God bless, Robert From scott at scottcain.net Wed Jan 2 21:32:16 2013 From: scott at scottcain.net (Scott Cain) Date: Wed, 2 Jan 2013 16:32:16 -0500 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) In-Reply-To: References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> Message-ID: Hi Brian, I was going to suggest the same thing--though that script is fairly old, it's not as old as the blast2gff script in the GBrowse distribution (which probably should be retired). I believe it supports GFF3, though I don't have any sample data with which to test it to be sure. I also don't know if it supports BLAST+ input--I haven't kept up with SearchIO (on which search2gff.pl depends); will it accept it? Scott On Wed, Jan 2, 2013 at 3:26 PM, Brian Osborne wrote: > Here's one: > > https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl > > Another one: > > ~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl > #!perl > > # Author: Jason Stajich > # Description: Turn SearchIO parseable report(s) into a GFF report > # > =head1 NAME > > bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report > > > > Brian O. > > On Jan 2, 2013, at 2:44 PM, Jim Hu wrote: > >> I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google.
>> >> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and create gff with appropriate subfeatures. >> >> Thanks, >> >> Jim >> ===================================== >> Jim Hu >> Professor >> Dept. of Biochemistry and Biophysics >> 2128 TAMU >> Texas A&M Univ. >> College Station, TX 77843-2128 >> 979-862-4054 >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Wed Jan 2 22:26:55 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 2 Jan 2013 22:26:55 +0000 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) In-Reply-To: References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> It should (I recall using it at one point). If it doesn't we should fix it so it does. How does MAKER deal with this? IIRC it uses (a modified) SearchIO-based method... chris On Jan 2, 2013, at 3:32 PM, Scott Cain wrote: > Hi Brian, > > I was going to suggest the same thing--though that script is fairly > old, it's not as old as the blast2gff script in the GBrowse > distribution (which probably should be retired). I believe it > supports GFF3, though I don't have any sample data with which to test > it to be sure. I also don't know if it supports BLAST+ input--I > haven't kept up with SearchIO (on which search2gff.pl depends); will > it accept it? 
> > Scott > > > On Wed, Jan 2, 2013 at 3:26 PM, Brian Osborne wrote: >> Here's one: >> >> https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl >> >> Another one: >> >> ~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl >> #!perl >> >> # Author: Jason Stajich >> # Description: Turn SearchIO parseable report(s) into a GFF report >> # >> =head1 NAME >> >> bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report >> >> >> >> Brian O. >> >> On Jan 2, 2013, at 2:44 PM, Jim Hu wrote: >> >>> I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google. >>> >>> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and create gff with appropriate subfeatures. >>> >>> Thanks, >>> >>> Jim >>> ===================================== >>> Jim Hu >>> Professor >>> Dept. of Biochemistry and Biophysics >>> 2128 TAMU >>> Texas A&M Univ. >>> College Station, TX 77843-2128 >>> 979-862-4054 >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Wed Jan 2 22:53:50 2013 From: bosborne11 at verizon.net (Brian Osborne) Date: Wed, 02 Jan 2013 17:53:50 -0500 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> Message-ID: <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> Scott and Chris, I'll test it and see? Brian O. On Jan 2, 2013, at 5:26 PM, "Fields, Christopher J" wrote: > It should (I recall using it at one point). If it doesn't we should fix it so it does. > > How does MAKER deal with this? IIRC it uses (a modified) SearchIO-based method... > > chris > > On Jan 2, 2013, at 3:32 PM, Scott Cain wrote: > >> Hi Brian, >> >> I was going to suggest the same thing--though that script is fairly >> old, it's not as old as the blast2gff script in the GBrowse >> distribution (which probably should be retired). I believe it >> supports GFF3, though I don't have any sample data with which to test >> it to be sure. I also don't know if it supports BLAST+ input--I >> haven't kept up with SearchIO (on which search2gff.pl depends); will >> it accept it? 
>> >> Scott >> >> >> On Wed, Jan 2, 2013 at 3:26 PM, Brian Osborne wrote: >>> Here's one: >>> >>> https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl >>> >>> Another one: >>> >>> ~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl >>> #!perl >>> >>> # Author: Jason Stajich >>> # Description: Turn SearchIO parseable report(s) into a GFF report >>> # >>> =head1 NAME >>> >>> bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report >>> >>> >>> >>> Brian O. >>> >>> On Jan 2, 2013, at 2:44 PM, Jim Hu wrote: >>> >>>> I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google. >>>> >>>> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and create gff with appropriate subfeatures. >>>> >>>> Thanks, >>>> >>>> Jim >>>> ===================================== >>>> Jim Hu >>>> Professor >>>> Dept. of Biochemistry and Biophysics >>>> 2128 TAMU >>>> Texas A&M Univ. >>>> College Station, TX 77843-2128 >>>> 979-862-4054 >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. 
scott at scottcain dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jimhu at tamu.edu Fri Jan 4 19:49:51 2013 From: jimhu at tamu.edu (Jim Hu) Date: Fri, 4 Jan 2013 13:49:51 -0600 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) In-Reply-To: <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> Message-ID: <7F69D24B-8D4C-41F5-B8A1-CB850EBB1C67@tamu.edu> Thanks for the replies, but... I can't tell what input formats for the blast results file are supported. Format 11 and format 6 give no output and no feedback. Putting some diagnostic print statements in the code suggests that I'm not getting any result objects from Bio::SearchIO. The script uses Bio::SearchIO, but does not seem to call the submodules for blast. Documentation links on the wiki seem to be broken, at least on this page: http://www.bioperl.org/wiki/Module:Bio::SearchIO Jim On Jan 2, 2013, at 4:53 PM, Brian Osborne wrote: > Scott and Chris, > > I'll test it and see? > > Brian O. > > > On Jan 2, 2013, at 5:26 PM, "Fields, Christopher J" wrote: > >> It should (I recall using it at one point). If it doesn't we should fix it so it does. >> >> How does MAKER deal with this? IIRC it uses (a modified) SearchIO-based method... >> >> chris >> >> On Jan 2, 2013, at 3:32 PM, Scott Cain wrote: >> >>> Hi Brian, >>> >>> I was going to suggest the same thing--though that script is fairly >>> old, it's not as old as the blast2gff script in the GBrowse >>> distribution (which probably should be retired). I believe it >>> supports GFF3, though I don't have any sample data with which to test >>> it to be sure. 
I also don't know if it supports BLAST+ input--I >>> haven't kept up with SearchIO (on which search2gff.pl depends); will >>> it accept it? >>> >>> Scott >>> >>> >>> On Wed, Jan 2, 2013 at 3:26 PM, Brian Osborne wrote: >>>> Here's one: >>>> >>>> https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl >>>> >>>> Another one: >>>> >>>> ~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl >>>> #!perl >>>> >>>> # Author: Jason Stajich >>>> # Description: Turn SearchIO parseable report(s) into a GFF report >>>> # >>>> =head1 NAME >>>> >>>> bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report >>>> >>>> >>>> >>>> Brian O. >>>> >>>> On Jan 2, 2013, at 2:44 PM, Jim Hu wrote: >>>> >>>>> I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google. >>>>> >>>>> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and create gff with appropriate subfeatures. >>>>> >>>>> Thanks, >>>>> >>>>> Jim >>>>> ===================================== >>>>> Jim Hu >>>>> Professor >>>>> Dept. of Biochemistry and Biophysics >>>>> 2128 TAMU >>>>> Texas A&M Univ. >>>>> College Station, TX 77843-2128 >>>>> 979-862-4054 >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> -- >>> ------------------------------------------------------------------------ >>> Scott Cain, Ph. D. 
scott at scottcain dot net >>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>> Ontario Institute for Cancer Research >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > ===================================== Jim Hu Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From bosborne11 at verizon.net Fri Jan 4 19:26:52 2013 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 04 Jan 2013 14:26:52 -0500 Subject: [Bioperl-l] strange error in the beginners howto In-Reply-To: References: Message-ID: Matthew, First, let's distinguish "warning" and "error". Errors stop execution of the app or script, what you're seeing is a warning, your script actually completes. Does the warning vanish if you remove "-w"? Second, seems like this is a Perl version thing: http://www.sysarchitects.com/bioperl BIO On Jan 4, 2013, at 1:55 PM, "Thornton, Matthew" wrote: > Hello, > > I am working through the first part of the BioPerl beginners howto and I am getting an error and I was wondering if you could tell me what is wrong. > > I made the seqio.pl script: > > #!/bin/perl -w > > use Bio::Seq; > > $seq_obj = Bio::Seq->new(-seq => "aaaatgggggggggggccccgtt", > -display_id => "#12345", > -desc => "example 1", > -alphabet => "dna" ); > > print $seq_obj->seq(); > > I execute the script and I get this error: > > gserver at gserver:~/bio$ perl seqio.pl > Replacement list is longer than search list at /usr/local/share/perl/5.14.2/Bio/Range.pm line 251. > aaaatgggggggggggccccgttgserver at gserver:~/bio$ > > Is there something wrong with my installation? > > Any help is greatly appreciated. > > Thanks! > > > Matthew E. Thornton > > > > Laboratory Technician III > Broad Center for Stem Cell and Regenerative Medicine > University of Southern California > 1501 San Pablo St. 
ZNI > Los Angeles, CA 90089-2821 > ph.:323-442-1688 > fx.:323-442-4040 > matthew.thornton at med.usc.edu > Brian O. -- Brian Osborne, PhD BioTeam: http://bioteam.net email: briano at bioteam.net mobile: 978-317-3101 From MEC at stowers.org Fri Jan 4 20:20:17 2013 From: MEC at stowers.org (Cook, Malcolm) Date: Fri, 4 Jan 2013 14:20:17 -0600 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) In-Reply-To: <7F69D24B-8D4C-41F5-B8A1-CB850EBB1C67@tamu.edu> References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> <7F69D24B-8D4C-41F5-B8A1-CB850EBB1C67@tamu.edu> Message-ID: <2C40E43D1F7A56408C4463FD245DDDF907958EE791@EXCHMB-02.stowers-institute.org> Jim, Getting to your original question: > I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and .create gff with appropriate subfeatures. Exactly what/how do you want/expect to encode the blast output as GFF{1,2,2.5,3}?? If GFF3 pe http://www.sequenceontology.org/gff3.shtml then are you hoping to get GFF3 marked up as described in section 'THE GAP ATTRIBUTE' or as in 'ALIGNMENTS' I would guess not because neither of them have 'subfeatures'. If you could explain more fully with examples (hand cobbled or borrowed from someone else) of what you expect then I might have a better idea of what options might suit your needs. ~Malcolm .-----Original Message----- .From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jim Hu .Sent: Friday, January 04, 2013 1:50 PM .To: Brian Osborne .Cc: Fields, Christopher J; Scott Cain; bioperl-l at bioperl.org .Subject: Re: [Bioperl-l] Converting blast+ output to gff (with gaps) . .Thanks for the replies, but... . .I can't tell what input formats for the blast results file are supported. Format 11 and format 6 give no output and no feedback. 
Putting .some diagnostic print statements in the code suggests that I'm not getting any result objects from Bio::SearchIO. . .The script uses Bio::SearchIO, but does not seem to call the submodules for blast. Documentation links on the wiki seem to be .broken, at least on this page: . . http://www.bioperl.org/wiki/Module:Bio::SearchIO . .Jim . . .On Jan 2, 2013, at 4:53 PM, Brian Osborne wrote: . .> Scott and Chris, .> .> I'll test it and see... .> .> Brian O. .> .> .> On Jan 2, 2013, at 5:26 PM, "Fields, Christopher J" wrote: .> .>> It should (I recall using it at one point). If it doesn't we should fix it so it does. .>> .>> How does MAKER deal with this? IIRC it uses (a modified) SearchIO-based method... .>> .>> chris .>> .>> On Jan 2, 2013, at 3:32 PM, Scott Cain wrote: .>> .>>> Hi Brian, .>>> .>>> I was going to suggest the same thing--though that script is fairly .>>> old, it's not as old as the blast2gff script in the GBrowse .>>> distribution (which probably should be retired). I believe it .>>> supports GFF3, though I don't have any sample data with which to test .>>> it to be sure. I also don't know if it supports BLAST+ input--I .>>> haven't kept up with SearchIO (on which search2gff.pl depends); will .>>> it accept it? .>>> .>>> Scott .>>> .>>> .>>> On Wed, Jan 2, 2013 at 3:26 PM, Brian Osborne wrote: .>>>> Here's one: .>>>> .>>>> https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl .>>>> .>>>> Another one: .>>>> .>>>> ~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl .>>>> #!perl .>>>> .>>>> # Author: Jason Stajich .>>>> # Description: Turn SearchIO parseable report(s) into a GFF report .>>>> # .>>>> =head1 NAME .>>>> .>>>> bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report .>>>> .>>>> .>>>> .>>>> Brian O. .>>>> .>>>> On Jan 2, 2013, at 2:44 PM, Jim Hu wrote: .>>>> .>>>>> I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google. 
.>>>>> .>>>>> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and .create gff with appropriate subfeatures. .>>>>> .>>>>> Thanks, .>>>>> .>>>>> Jim .>>>>> ===================================== .>>>>> Jim Hu .>>>>> Professor .>>>>> Dept. of Biochemistry and Biophysics .>>>>> 2128 TAMU .>>>>> Texas A&M Univ. .>>>>> College Station, TX 77843-2128 .>>>>> 979-862-4054 .>>>>> .>>>>> .>>>>> .>>>>> _______________________________________________ .>>>>> Bioperl-l mailing list .>>>>> Bioperl-l at lists.open-bio.org .>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l .>>>> .>>>> .>>>> _______________________________________________ .>>>> Bioperl-l mailing list .>>>> Bioperl-l at lists.open-bio.org .>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l .>>> .>>> .>>> .>>> -- .>>> ------------------------------------------------------------------------ .>>> Scott Cain, Ph. D. scott at scottcain dot net .>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 .>>> Ontario Institute for Cancer Research .>>> _______________________________________________ .>>> Bioperl-l mailing list .>>> Bioperl-l at lists.open-bio.org .>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l .>> .> . .===================================== .Jim Hu .Professor .Dept. of Biochemistry and Biophysics .2128 TAMU .Texas A&M Univ. .College Station, TX 77843-2128 .979-862-4054 . . . 
._______________________________________________
.Bioperl-l mailing list
.Bioperl-l at lists.open-bio.org
.http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jimhu at tamu.edu Fri Jan 4 21:57:38 2013
From: jimhu at tamu.edu (Jim Hu)
Date: Fri, 4 Jan 2013 15:57:38 -0600
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To: <2C40E43D1F7A56408C4463FD245DDDF907958EE791@EXCHMB-02.stowers-institute.org>
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> <7F69D24B-8D4C-41F5-B8A1-CB850EBB1C67@tamu.edu> <2C40E43D1F7A56408C4463FD245DDDF907958EE791@EXCHMB-02.stowers-institute.org>
Message-ID: <2D58FAEB-ED4B-4965-A8CB-890D0012D098@tamu.edu>

Malcolm,

Thanks, I should have reread the GFF3 spec before posting!

In the section on the Gap attribute and below on alignments it discusses two ways to represent an alignment. I was originally thinking of something like the later example shown for cDNA vs genome. But the gap attribute representation would be fine too. So, I can see how the final output could be done in different ways, but I'm still stuck on how to get there.

I don't have a specific application in mind; I'm mostly just trying to understand how to get from standalone blast+ output to things that look like the examples in the gff spec and the gbrowse documentation - really basic display of alignments that are gapped. For my teaching, we do EST vs genomic blast and want gapped cDNA alignments to show where the introns go. My other work is with bacteria where introns are rare, but there are times when I'd like to show an alignment that is interrupted by a transposable element, for example.

Excerpting from blastp -help

 *** Formatting options
 -outfmt
   alignment view options:
     0 = pairwise,
     1 = query-anchored showing identities,
     2 = query-anchored no identities,
     3 = flat query-anchored, show identities,
     4 = flat query-anchored, no identities,
     5 = XML Blast output,
     6 = tabular,
     7 = tabular with comment lines,
     8 = Text ASN.1,
     9 = Binary ASN.1,
    10 = Comma-separated values,
    11 = BLAST archive format (ASN.1)

Several of these are "lossy" in terms of where the actual gaps occur (e.g. 6). Others seem to me to be more human readable than suited for parsing. So I was hoping to get pointed to an existing script that would generate either the single feature with gap attribute OR the multi-line match features OR a combination from one of these output formats.

I'm probably missing something very, very obvious.

Best,

Jim

On Jan 4, 2013, at 2:20 PM, Cook, Malcolm wrote:

> Jim,
>
> Getting to your original question:
>
>> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and .create gff with appropriate subfeatures.
>
> Exactly what/how do you want/expect to encode the blast output as GFF{1,2,2.5,3}??
>
> If GFF3 pe http://www.sequenceontology.org/gff3.shtml then are you hoping to get GFF3 marked up as described in section 'THE GAP ATTRIBUTE' or as in 'ALIGNMENTS'
>
> I would guess not because neither of them have 'subfeatures'.
>
> If you could explain more fully with examples (hand cobbled or borrowed from someone else) of what you expect then I might have a better idea of what options might suit your needs.
>
>
> ~Malcolm
>
>
> .-----Original Message-----
> .From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jim Hu
> .Sent: Friday, January 04, 2013 1:50 PM
> .To: Brian Osborne
> .Cc: Fields, Christopher J; Scott Cain; bioperl-l at bioperl.org
> .Subject: Re: [Bioperl-l] Converting blast+ output to gff (with gaps)
> .
> .Thanks for the replies, but... > . > .I can't tell what input formats for the blast results file are supported. Format 11 and format 6 give no output and no feedback. Putting > .some diagnostic print statements in the code suggests that I'm not getting any result objects from Bio::SearchIO. > . > .The script uses Bio::SearchIO, but does not seem to call the submodules for blast. Documentation links on the wiki seem to be > .broken, at least on this page: > . > . http://www.bioperl.org/wiki/Module:Bio::SearchIO > . > .Jim > . > . > .On Jan 2, 2013, at 4:53 PM, Brian Osborne wrote: > . > .> Scott and Chris, > .> > .> I'll test it and see... > .> > .> Brian O. > .> > .> > .> On Jan 2, 2013, at 5:26 PM, "Fields, Christopher J" wrote: > .> > .>> It should (I recall using it at one point). If it doesn't we should fix it so it does. > .>> > .>> How does MAKER deal with this? IIRC it uses (a modified) SearchIO-based method... > .>> > .>> chris > .>> > .>> On Jan 2, 2013, at 3:32 PM, Scott Cain wrote: > .>> > .>>> Hi Brian, > .>>> > .>>> I was going to suggest the same thing--though that script is fairly > .>>> old, it's not as old as the blast2gff script in the GBrowse > .>>> distribution (which probably should be retired). I believe it > .>>> supports GFF3, though I don't have any sample data with which to test > .>>> it to be sure. I also don't know if it supports BLAST+ input--I > .>>> haven't kept up with SearchIO (on which search2gff.pl depends); will > .>>> it accept it? 
> .>>> > .>>> Scott > .>>> > .>>> > .>>> On Wed, Jan 2, 2013 at 3:26 PM, Brian Osborne wrote: > .>>>> Here's one: > .>>>> > .>>>> https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl > .>>>> > .>>>> Another one: > .>>>> > .>>>> ~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl > .>>>> #!perl > .>>>> > .>>>> # Author: Jason Stajich > .>>>> # Description: Turn SearchIO parseable report(s) into a GFF report > .>>>> # > .>>>> =head1 NAME > .>>>> > .>>>> bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report > .>>>> > .>>>> > .>>>> > .>>>> Brian O. > .>>>> > .>>>> On Jan 2, 2013, at 2:44 PM, Jim Hu wrote: > .>>>> > .>>>>> I assume this has already been done many times, but I can't seem to find it on bioperl.org or via google. > .>>>>> > .>>>>> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and > .create gff with appropriate subfeatures. > .>>>>> > .>>>>> Thanks, > .>>>>> > .>>>>> Jim > .>>>>> ===================================== > .>>>>> Jim Hu > .>>>>> Professor > .>>>>> Dept. of Biochemistry and Biophysics > .>>>>> 2128 TAMU > .>>>>> Texas A&M Univ. > .>>>>> College Station, TX 77843-2128 > .>>>>> 979-862-4054 > .>>>>> > .>>>>> > .>>>>> > .>>>>> _______________________________________________ > .>>>>> Bioperl-l mailing list > .>>>>> Bioperl-l at lists.open-bio.org > .>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > .>>>> > .>>>> > .>>>> _______________________________________________ > .>>>> Bioperl-l mailing list > .>>>> Bioperl-l at lists.open-bio.org > .>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > .>>> > .>>> > .>>> > .>>> -- > .>>> ------------------------------------------------------------------------ > .>>> Scott Cain, Ph. D. 
scott at scottcain dot net > .>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 > .>>> Ontario Institute for Cancer Research > .>>> _______________________________________________ > .>>> Bioperl-l mailing list > .>>> Bioperl-l at lists.open-bio.org > .>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > .>> > .> > . > .===================================== > .Jim Hu > .Professor > .Dept. of Biochemistry and Biophysics > .2128 TAMU > .Texas A&M Univ. > .College Station, TX 77843-2128 > .979-862-4054 > . > . > . > ._______________________________________________ > .Bioperl-l mailing list > .Bioperl-l at lists.open-bio.org > .http://lists.open-bio.org/mailman/listinfo/bioperl-l ===================================== Jim Hu Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From MEC at stowers.org Fri Jan 4 23:33:04 2013 From: MEC at stowers.org (Cook, Malcolm) Date: Fri, 4 Jan 2013 17:33:04 -0600 Subject: [Bioperl-l] Converting blast+ output to gff (with gaps) In-Reply-To: <2D58FAEB-ED4B-4965-A8CB-890D0012D098@tamu.edu> References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu> <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net> <7F69D24B-8D4C-41F5-B8A1-CB850EBB1C67@tamu.edu> <2C40E43D1F7A56408C4463FD245DDDF907958EE791@EXCHMB-02.stowers-institute.org> <2D58FAEB-ED4B-4965-A8CB-890D0012D098@tamu.edu> Message-ID: <2C40E43D1F7A56408C4463FD245DDDF907958EE7CB@EXCHMB-02.stowers-institute.org> Jim, To get a working example for further discussion.... Assuming you have bioperl and blast+ installed.... This command gets a known transcript from ncbi and blasts it back at ncbi into fly genome: > bp_download_query_genbank.pl --query 'NM_001259364' | blastn -remote -db refseq_genomic -entrez_query 'melanogaster[Taxid]' -outfmt 6 > test.blast.tab Take a look at the results (see below, or run it yourself). 
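
A minimal Perl sketch (not from the thread) of how each row of test.blast.tab could be turned into one GFF3 match line. It assumes the default 12-column -outfmt 6 layout (qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore) with tab separators. The helper name blast6_to_gff3, the source value "blastn", the type "EST_match", the "matchN" ID scheme and the use of the bit score in column 6 are illustrative choices only. Format 6 carries no gap positions, so each HSP comes out as a single ungapped-looking feature.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: convert one BLAST+ tabular row (-outfmt 6, default
# 12 columns, tab-separated) into one GFF3 line. $n is a caller-supplied
# counter used to build the ID attribute.
sub blast6_to_gff3 {
    my ($line, $n) = @_;
    chomp $line;
    my ($qseqid, $sseqid, $pident, $length, $mismatch, $gapopen,
        $qstart, $qend, $sstart, $send, $evalue, $bitscore) = split /\t/, $line;

    # Use the bare accession from "gi|...|ref|ACC|" when present,
    # otherwise fall back to the subject ID as given.
    my ($seqid) = $sseqid =~ /\|ref\|([^|]+)\|/;
    $seqid = $sseqid unless defined $seqid;

    # GFF3 requires start <= end; a reversed subject range is a
    # minus-strand hit, so swap the coordinates and record the strand.
    my $strand = '+';
    if ($sstart > $send) {
        ($sstart, $send) = ($send, $sstart);
        $strand = '-';
    }

    my $attrs = "ID=match$n;Name=$qseqid;Target=$qseqid $qstart $qend";
    return join "\t", $seqid, 'blastn', 'EST_match', $sstart, $send,
                      $bitscore, $strand, '.', $attrs;
}

# Example row: the first HSP of the NM_001259364 search in this message.
my $row = join "\t", qw(NM_001259364 gi|116010442|ref|NT_033778.3| 100.00
                        2127 0 0 1869 3995 9330023 9327897 0.0 3928);
print "##gff-version 3\n";
print blast6_to_gff3($row, 1), "\n";
```

For the example row this prints a feature on NT_033778.3 with start 9327897, end 9330023 and strand "-". To have a browser draw several HSPs as one gapped alignment, the caller would pass the same $n for every HSP of a given query/subject pair, so that the rows share an ID; how to decide which HSPs belong together is left open here.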
The first line of output looks like this:

NM_001259364  gi|116010442|ref|NT_033778.3|  100.00  2127    0   0  1869  3995  9330023  9327897  0.0     3928

Old GFF did not have parent-child. Depending on where you are going with this, you MIGHT want to use this strategy:

http://gmod.org/wiki/GFF#Alignments

There, the ID in column 9 serves to group the features. There is an 'implicit' parent. I think this would be respected by both GBrowse and the UCSC genome browser (but I'm not positive).

If so, converting blast output would yield one line of output for each line of input. You would need to assign a strand, swap sstart and send where sstart>send, and do some slight re-formatting.

Since sstart>send in the first hit, its coordinates are swapped and it goes on the minus strand, so the first line of GFF might look like:

NT_033778.3  est  EST_match  9327897  9330023  .  -  .  ID=Match1;Name=NM_001259364;Target=NM_001259364 1869 3995

Is this along the lines of what you need?

~Malcolm

Complete example blast+ output looks like this
(columns are the -outfmt 6 defaults: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore):

NM_001259364  gi|116010442|ref|NT_033778.3|  100.00  2127    0   0  1869  3995  9330023  9327897  0.0     3928
NM_001259364  gi|116010442|ref|NT_033778.3|  100.00  1019    0   0  3995  5013  9327834  9326816  0.0     1882
NM_001259364  gi|116010442|ref|NT_033778.3|  100.00   374    0   0   345   718  9332643  9332270  0.0     691
NM_001259364  gi|116010442|ref|NT_033778.3|  100.00   300    0   0     1   300  9337978  9337679  1e-154  555
NM_001259364  gi|116010442|ref|NT_033778.3|  100.00   270    0   0   717   986  9332201  9331932  5e-138  499
NM_001259364  gi|116010442|ref|NT_033778.3|  100.00   243    0   0  1434  1676  9331200  9330958  5e-123  449
NM_001259364  gi|116010442|ref|NT_033778.3|   99.55   223    0   1  1154  1376  9331611  9331390  1e-109  405
NM_001259364  gi|116010442|ref|NT_033778.3|  100.00   196    0   0  1675  1870  9330634  9330439  7e-97   363
NM_001259364  gi|116010442|ref|NT_033778.3|  100.00   169    0   0   986  1154  9331847  9331679  7e-82   313
NM_001259364  gi|116010442|ref|NT_033778.3|   97.26    73    2   0  1372  1444  9331322  9331250  3e-25   124
NM_001259364  gi|116010442|ref|NT_033778.3|  100.00    44    0   0   301   344  9333095  9333052  2e-12   82.4
NM_001259364  gi|195586644|ref|NT_167067.1|   96.52  2127   74   0  1869  3995  7777827  7775701  0.0     3518
NM_001259364  gi|195586644|ref|NT_167067.1|   97.03   404    7   3  4004  4402  7775629  7775226  0.0     675
NM_001259364  gi|195586644|ref|NT_167067.1|   93.58   374   24   0   345   718  7780392  7780019  8e-156  558
NM_001259364  gi|195586644|ref|NT_167067.1|   94.37   302   15   1     1   300  7786262  7785961  6e-127  462
NM_001259364  gi|195586644|ref|NT_167067.1|   96.25   267   10   0   720   986  7779949  7779683  1e-119  438
NM_001259364  gi|195586644|ref|NT_167067.1|   95.00   240   12   0  1434  1673  7778959  7778720  2e-101  377
NM_001259364  gi|195586644|ref|NT_167067.1|   96.86   223    6   1  1154  1376  7779370  7779149  1e-99   372
NM_001259364  gi|195586644|ref|NT_167067.1|   93.88   196   12   0  1675  1870  7778384  7778189  7e-77   296
NM_001259364  gi|195586644|ref|NT_167067.1|   95.19   187    5   4  4827  5013  7774635  7774453  9e-76   292
NM_001259364  gi|195586644|ref|NT_167067.1|   92.86   168   12   0   987  1154  7779598  7779431  3e-61   244
NM_001259364  gi|195586644|ref|NT_167067.1|   92.86    70    5   0  1374  1443  7779079  7779010  2e-18   102
NM_001259364  gi|195586644|ref|NT_167067.1|  100.00    44    0   0   301   344  7780833  7780790  2e-12   82.4
NM_001259364  gi|195489961|ref|NT_167063.1|   92.43  2127  161   0  1869  3995  7628910  7631036  0.0     3037
NM_001259364  gi|195489961|ref|NT_167063.1|   91.94  1030   38  20  3995  5013  7631102  7632097  0.0     1400
NM_001259364  gi|195489961|ref|NT_167063.1|   88.56   376   39   4   345   718  7626215  7626588  4e-124  453
NM_001259364  gi|195489961|ref|NT_167063.1|   93.67   300   18   1     1   300  7620777  7621075  2e-122  448
NM_001259364  gi|195489961|ref|NT_167063.1|   95.41   196    9   0  1675  1870  7628263  7628458  7e-82   313
NM_001259364  gi|195489961|ref|NT_167063.1|   91.93   223   17   1  1154  1376  7627224  7627445  2e-81   311
NM_001259364  gi|195489961|ref|NT_167063.1|   86.42   265   31   3   720   983  7626653  7626913  1e-73   285
NM_001259364  gi|195489961|ref|NT_167063.1|   86.18   246   25   4  1434  1673  7627635  7627877  3e-65   257
NM_001259364  gi|195489961|ref|NT_167063.1|   88.27   162   19   0   993  1154  7626999  7627160  3e-46   195
NM_001259364  gi|195489961|ref|NT_167063.1|   93.06    72    5   0  1372  1443  7627513  7627584  1e-19   106
NM_001259364  gi|195489961|ref|NT_167063.1|  100.00    44    0   0   301   344  7625770  7625813  2e-12   82.4

~Malcolm

.-----Original Message-----
.From: Jim Hu [mailto:jimhu at tamu.edu]
.Sent: Friday, January 04, 2013 3:58 PM
.To: Cook, Malcolm
.Cc: 'Brian Osborne'; 'Fields, Christopher J'; 'Scott Cain'; 'bioperl-l at bioperl.org'
.Subject: Re: [Bioperl-l] Converting blast+ output to gff (with gaps)
.
.Malcolm,
.
.Thanks, I should have reread the GFF3 spec before posting!
.
.In the section on the Gap attribute and below on alignments it discusses two ways to represent an alignment. I was originally thinking
.of something like the latter example shown for cDNA vs genome. But the gap attribute representation would be fine too. So, I can see
.how the final output could be done in different ways, but I'm still stuck on how to get there.
.
.I don't have a specific application in mind; I'm mostly just trying to understand how to get from having standalone blast+ output
.to things that look like the examples in the gff spec and the gbrowse documentation - really basic display of alignments that are
.gapped. For my teaching, we do EST vs genomic blast and want gapped cDNA alignments to show where the introns go. My other
.work is with bacteria where introns are rare, but there are times when I'd like to show an alignment that is interrupted by a
.transposable element, for example.
.
.Excerpting from blastp -help
.
. *** Formatting options
. -outfmt
.   alignment view options:
.     0 = pairwise,
.     1 = query-anchored showing identities,
.     2 = query-anchored no identities,
.     3 = flat query-anchored, show identities,
.     4 = flat query-anchored, no identities,
.     5 = XML Blast output,
.     6 = tabular,
.     7 = tabular with comment lines,
.     8 = Text ASN.1,
.     9 = Binary ASN.1,
.    10 = Comma-separated values,
.    11 = BLAST archive format (ASN.1)
.
.Several of these are "lossy" in terms of where the actual gaps occur (e.g. 6). Others seem to me to be more human readable than
.suited for parsing.
.So I was hoping to get pointed to an existing script that would generate either the single feature with gap attribute
.OR the multi-line match features OR a combination from one of these output formats.
.
.I'm probably missing something very, very obvious.
.
.Best,
.
.Jim
.
.
.On Jan 4, 2013, at 2:20 PM, Cook, Malcolm wrote:
.
.> Jim,
.>
.> Getting to your original question:
.>
.>> I'm looking for a script that will take one of the blast+ outformats that includes the positions of gaps and mismatches, and
.>> create gff with appropriate subfeatures.
.>
.> Exactly what/how do you want/expect to encode the blast output as GFF{1,2,2.5,3}??
.>
.> If GFF3 per http://www.sequenceontology.org/gff3.shtml then are you hoping to get GFF3 marked up as described in section
.> 'THE GAP ATTRIBUTE' or as in 'ALIGNMENTS'?
.>
.> I would guess not because neither of them has 'subfeatures'.
.>
.> If you could explain more fully with examples (hand cobbled or borrowed from someone else) of what you expect then I might have
.> a better idea of what options might suit your needs.
.>
.>
.> ~Malcolm
.>
.>
.> .-----Original Message-----
.> .From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jim Hu
.> .Sent: Friday, January 04, 2013 1:50 PM
.> .To: Brian Osborne
.> .Cc: Fields, Christopher J; Scott Cain; bioperl-l at bioperl.org
.> .Subject: Re: [Bioperl-l] Converting blast+ output to gff (with gaps)
.> .
.> .Thanks for the replies, but...
.> .
.> .I can't tell what input formats for the blast results file are supported. Format 11 and format 6 give no output and no feedback.
.> .Putting some diagnostic print statements in the code suggests that I'm not getting any result objects from Bio::SearchIO.
.> .
.> .The script uses Bio::SearchIO, but does not seem to call the submodules for blast. Documentation links on the wiki seem to be
.> .broken, at least on this page:
.> .
.> . http://www.bioperl.org/wiki/Module:Bio::SearchIO
.> .
.> .Jim
.> .
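
The one-line-per-hit conversion described in the message above (assign a strand, swap sstart and send where sstart>send, group hits through the ID in column 9) can be sketched roughly as follows. This is only an illustration in Python, not an existing BioPerl or GBrowse script: the function name blast_tab_to_gff3, the "blastn" source column, the MatchN IDs, the rule of sharing one ID per query/subject/strand combination, and the use of the E-value as the score are all assumptions. Because -outfmt 6 does not record where gaps fall, no Gap attribute is written.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def blast_tab_to_gff3(lines, source="blastn", feature_type="EST_match"):
    """Yield one GFF3 line per tabular BLAST+ hit line (hypothetical helper)."""
    match_ids = {}  # (query, subject, strand) -> shared ID; an assumed grouping rule
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and -outfmt 7 comment lines
        hit = dict(zip(FIELDS, line.split()))
        sstart, send = int(hit["sstart"]), int(hit["send"])
        # GFF3 requires start <= end; reversed subject coordinates mean minus strand.
        strand = "+" if sstart <= send else "-"
        start, end = min(sstart, send), max(sstart, send)
        # Reduce e.g. gi|116010442|ref|NT_033778.3| to the bare accession.
        seqid = [part for part in hit["sseqid"].split("|") if part][-1]
        key = (hit["qseqid"], seqid, strand)
        match_id = match_ids.setdefault(key, "Match%d" % (len(match_ids) + 1))
        attributes = "ID=%s;Name=%s;Target=%s %s %s" % (
            match_id, hit["qseqid"], hit["qseqid"], hit["qstart"], hit["qend"])
        yield "\t".join([seqid, source, feature_type, str(start), str(end),
                         hit["evalue"], strand, ".", attributes])
```

To try it on the example file, print a "##gff-version 3" line and then each string produced by blast_tab_to_gff3(open("test.blast.tab")). Grouping every hit of one query on one subject strand under a single ID is a simplification; hits from separate loci on the same sequence would need to be split apart first.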

From bosborne11 at verizon.net Sat Jan 5 22:25:37 2013
From: bosborne11 at verizon.net (Brian Osborne)
Date: Sat, 05 Jan 2013 17:25:37 -0500
Subject: [Bioperl-l] Converting blast+ output to gff (with gaps)
In-Reply-To: <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net>
References: <072DC877-6A75-4A2E-9E6B-027E1D874C4C@tamu.edu>
 <118F034CF4C3EF48A96F86CE585B94BF4CF5FFF5@CHIMBX5.ad.uillinois.edu>
 <80636318-ED50-4F05-85C2-35F12E4FF9D8@verizon.net>
Message-ID: <12E97C07-D3E0-4E21-8A24-A4223468FF70@verizon.net>

All,

The scripts/utilities/bp_search2gff.pl script works fine with blastn (BLAST+), no surprise there, and since it uses Bio::Tools::GFF it can produce GFF3.

Brian O.

On Jan 2, 2013, at 5:53 PM, Brian Osborne wrote:

> Scott and Chris,
>
> I'll test it and see...
>
> Brian O.
>
>
> On Jan 2, 2013, at 5:26 PM, "Fields, Christopher J" wrote:
>
>> It should (I recall using it at one point). If it doesn't we should fix it so it does.
>>
>> How does MAKER deal with this? IIRC it uses (a modified) SearchIO-based method...
>>
>> chris
>>
>> On Jan 2, 2013, at 3:32 PM, Scott Cain wrote:
>>
>>> Hi Brian,
>>>
>>> I was going to suggest the same thing--though that script is fairly
>>> old, it's not as old as the blast2gff script in the GBrowse
>>> distribution (which probably should be retired). I believe it
>>> supports GFF3, though I don't have any sample data with which to test
>>> it to be sure. I also don't know if it supports BLAST+ input--I
>>> haven't kept up with SearchIO (on which search2gff.pl depends); will
>>> it accept it?
>>>
>>> Scott
>>>
>>>
>>> On Wed, Jan 2, 2013 at 3:26 PM, Brian Osborne wrote:
>>>> Here's one:
>>>>
>>>> https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl
>>>>
>>>> Another one:
>>>>
>>>> ~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl
>>>> #!perl
>>>>
>>>> # Author: Jason Stajich