From jason at bioperl.org Mon Sep 1 03:42:26 2008
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 1 Sep 2008 00:42:26 -0700
Subject: [Bioperl-l] Bio::Tools::dpAlign feature request
In-Reply-To: <868888.19741.qm@web30406.mail.mud.yahoo.com>
References: <868888.19741.qm@web30406.mail.mud.yahoo.com>
Message-ID: <99A57523-94B3-4A56-A617-22675FA10AB8@bioperl.org>

Safe to ignore the tests. Those that are failing aren't even tests for Bio::Align::dpAlign - but were written to test a bug that has not been fixed in the EVD module, if I remember correctly. That is why they are marked in a TODO block, but I can't tell if the Test.pm is actually skipping these tests or not.

I think we probably need to deprecate some of these modules, as there is no maintainer of Ewan's code in here.

At a minimum we need to modularize the tests for these modules into a separate t dir, fix the need for multiple Makefile.PL in here, and probably move to Build.PL.

-jason

On Aug 28, 2008, at 11:00 AM, Yee Man Chan wrote:

>
> Hi Alexie
>
> My understanding is that you can ignore these failures.
>
> I believe test cases 17-20 were added by Jason Stajich before I
> added the feature you requested. I am not sure what he was doing
> there.
>
> I suppose he can give you the definitive answer to whether this is
> something important or not.
>
> By the way, did you try out the new feature? Does it work?
>
> Thanks
> Yee Man
>
> --- On Thu, 8/28/08, Alexie Papanicolaou wrote:
>
>> From: Alexie Papanicolaou
>> Subject: Re: Bio::Tools::dpAlign feature request
>> To: "Yee Man Chan"
>> Date: Thursday, August 28, 2008, 6:15 AM
>> hi
>>
>> is the version you emailed me newer or older than the
>> subversion one?
>>
>> i'm testing the subversion version for bioperl-ext and
>>
>> not ok 17 # TODO evalues vary based on platform, needs fixing
>> #   Failed (TODO) test at test.pl line 156.
>> #          got: '2027805538'
>> #     expected: '1764904'
>> not ok 18 # TODO evalues vary based on platform, needs fixing
>> #   Failed (TODO) test at test.pl line 157.
>> #          got: '-1375488148'
>> #     expected: '1764872'
>> not ok 19 # TODO evalues vary based on platform, needs fixing
>> #   Failed (TODO) test at test.pl line 158.
>> #          got: '-808808307'
>> #     expected: '1764872'
>> not ok 20 # TODO evalues vary based on platform, needs fixing
>> #   Failed (TODO) test at test.pl line 159.
>> #          got: '-2118162890'
>> #     expected: '1764872'
>>
>> you think these are ok to ignore?
>>
>>
>> On Fri, 2008-08-01 at 21:03 -0700, Yee Man Chan wrote:
>>
>>> Hi Alexie
>>>
>>> Attached are the files that contain the feature you requested.
>>> linspc.c is the one that does the work and test.pl has a test
>>> case for it. The scoring scheme is as described before. Please
>>> let me know if it works.
>>>
>>> Yee Man
>>>
>>> --- On Wed, 7/30/08, Alexie Papanicolaou wrote:
>>>
>>>> From: Alexie Papanicolaou
>>>> Subject: Re: Bio::Tools::dpAlign feature request
>>>> To: ymc at yahoo.com
>>>> Date: Wednesday, July 30, 2008, 2:44 PM
>>>> Oh sorry
>>>>
>>>> Say match=3 and mismatch=-1, gopen= -10, gext=-5
>>>> for aligning
>>>> seq1: ATG
>>>> seq2: ATT
>>>> match: 3,3,-1
>>>>
>>>> seq1: AT-G
>>>> seq2: ATTG
>>>> match: 3,3,-1,-10,3
>>>>
>>>> is that possible? or am I missing something? I was only
>>>> today wondering if it is even possible...
>>>>
>>>> a
>>>>
>>>>
>>>> Yee Man Chan wrote:
>>>>> Sorry, I don't quite get it. Can you give me an example
>>>>> of the output you want?
>>>>>
>>>>> Yee Man
>>>>>
>>>>> --- On Wed, 7/30/08, Alexie Papanicolaou wrote:
>>>>>
>>>>>> From: Alexie Papanicolaou
>>>>>> Subject: Re: Bio::Tools::dpAlign feature request
>>>>>> To: ymc at yahoo.com
>>>>>> Date: Wednesday, July 30, 2008, 9:50 AM
>>>>>> Dear Yee Man,
>>>>>>
>>>>>> Do you think it is possible to code a method for creating a
>>>>>> delimited (space or comma) "score-line"?
>>>>>>
>>>>>> I'd like to parse it into an array and have the individual
>>>>>> score for each alignment position. Is it easy to do?
>>>>>>
>>>>>> a
>>>>>>
>>>>>> Yee Man Chan wrote:
>>>>>>> Hi Alexie
>>>>>>>
>>>>>>> How about I implement the simple case?
>>>>>>>
>>>>>>> So for match = +3, mismatch = -1,
>>>>>>>
>>>>>>> A and R = +3
>>>>>>> A and Y = -1
>>>>>>> A and B = -1
>>>>>>> A and D = +3
>>>>>>> A and N = +3
>>>>>>> A and X = -1
>>>>>>>
>>>>>>> What do you think?
>>>>>>> Yee Man
>>>>>>>
>>>>>>> --- On Tue, 7/29/08, Alexie Papanicolaou wrote:
>>>>>>>
>>>>>>>> From: Alexie Papanicolaou
>>>>>>>> Subject: Re: Bio::Tools::dpAlign feature request
>>>>>>>> To: ymc at yahoo.com
>>>>>>>> Date: Tuesday, July 29, 2008, 10:58 AM
>>>>>>>> Dear Yee Man,
>>>>>>>> hello, I was wondering how this is progressing and if you
>>>>>>>> need help?
>>>>>>>>
>>>>>>>> many thanks
>>>>>>>> alexie
>>>>>>>>
>>>>>>>> Yee Man Chan wrote:
>>>>>>>>> Hi Alexie
>>>>>>>>>
>>>>>>>>> There are two ways to compute the score for each aligned
>>>>>>>>> basepair in the dpAlign module. One is match/mismatch if you
>>>>>>>>> specify your sequence as DNA and the other is a scoring
>>>>>>>>> matrix if you specify your sequence as protein. Obviously,
>>>>>>>>> the latter can completely dominate the former. If you take
>>>>>>>>> the time to type the scoring matrix file, then you can
>>>>>>>>> handle those IUPAC codes by specifying the sequence as
>>>>>>>>> proteins.
>>>>>>>>>
>>>>>>>>> If you think this is too troublesome, then I might be able
>>>>>>>>> to extend the match/mismatch route to handle IUPAC codes.
>>>>>>>>> But the problem here is, how should I score a match of A
>>>>>>>>> and W when match is +3 and mismatch is -1? Should it have a
>>>>>>>>> score of +3/3 = +1 for match or +3/3-1*2/3 = +1/3? Do you
>>>>>>>>> know what the convention is? If not, maybe you can tell me
>>>>>>>>> what you think the score will be?
>>>>>>>>>
>>>>>>>>> Yee Man
>>>>>>>>>
>>>>>>>>> --- On Thu, 6/26/08, Alexie Papanicolaou wrote:
>>>>>>>>>
>>>>>>>>>> From: Alexie Papanicolaou
>>>>>>>>>> Subject: Bio::Tools::dpAlign feature request
>>>>>>>>>> To: ymc at yahoo.com
>>>>>>>>>> Date: Thursday, June 26, 2008, 4:15 AM
>>>>>>>>>> Dear Yee Man Chan,
>>>>>>>>>>
>>>>>>>>>> Many thanks for this module. I like it very much. I was
>>>>>>>>>> wondering if it would be possible for you to allow for
>>>>>>>>>> IUPAC DNA codes. I see it is in your TODO list and I hoped
>>>>>>>>>> to inspire you :-)
>>>>>>>>>>
>>>>>>>>>> Even a simple measure with the degenerate base containing
>>>>>>>>>> the aligned base count as a (perfect) match would be very
>>>>>>>>>> useful to me (i'm sorry, i'm not a good coder to do it
>>>>>>>>>> myself).
>>>>>>>>>>
>>>>>>>>>> many thanks for your work so far.
>>>>>>>>>> alexie
>>>>>>>>>>
>>>>>>>>>> -- 
>>>>>>>>>> "Eppur si evolve" ("And yet it evolves")
>>>>>>>>>> -Galileo Jr (ca 21st century)
>>>>>>>>>>
>>>>>>>>>> -- 
>>>>>>>>>> Alexie Papanicolaou
>>>>>>>>>> Entomology
>>>>>>>>>> Max Planck Institute for Chemical Ecology
>>>>>>>>>> Hans Knoell Str 8
>>>>>>>>>> Jena 07745
>>>>>>>>>> Germany
>>>>>>>>>> Email apapanicolaou at ice.mpg.de
>>>>>>>>>> Tel +493641571561

Jason Stajich
jason at bioperl.org

From cjfields at illinois.edu Mon Sep 1 13:49:56 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Sep 2008 12:49:56 -0500
Subject: [Bioperl-l] Bio::Tools::dpAlign feature request
In-Reply-To: <99A57523-94B3-4A56-A617-22675FA10AB8@bioperl.org>
References: <868888.19741.qm@web30406.mail.mud.yahoo.com> <99A57523-94B3-4A56-A617-22675FA10AB8@bioperl.org>
Message-ID: <44D25826-BE42-4539-8567-468FE74571B4@illinois.edu>

On pSW: I agree. I don't think it is worth maintaining, considering there are actively supported C/C++-based toolkits with similar functionality (SeqAn), and Petr's BioLib initiative will likely be a more maintainable effort.

chris

On Sep 1, 2008, at 2:42 AM, Jason Stajich wrote:

> Safe to ignore the tests.
> Those that are failing aren't even tests
> for Bio::Align::dpAlign - but were written to test a bug that has
> not been fixed in the EVD module, if I remember correctly. That is why
> they are marked in a TODO block, but I can't tell if the Test.pm is
> actually skipping these tests or not.
>
> I think we probably need to deprecate some of these modules, as there
> is no maintainer of Ewan's code in here.
>
> At a minimum we need to modularize the tests for these modules into a
> separate t dir, fix the need for multiple Makefile.PL in here, and
> probably move to Build.PL.
>
> -jason
>
> Jason Stajich
> jason at bioperl.org
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign

From lmanchon at univ-montp2.fr Wed Sep 3 09:04:14 2008
From: lmanchon at univ-montp2.fr (Laurent Manchon)
Date: Wed, 03 Sep 2008 15:04:14 +0200
Subject: [Bioperl-l] parsing result of CAP3 (ACE file)
Message-ID: <5.0.2.1.2.20080903150203.00c0db18@pop.univ-montp2.fr>

-- Hi,

Does anybody have a piece of code to parse the output of the CAP3 assembly program, which is in ACE format?
I need to retrieve the alignment from this file.

thank you,
Laurent --

+---------------------------------------------+
Laurent Manchon
Email: lmanchon at univ-montp2.fr
+---------------------------------------------+

From osborne6 at gmail.com Wed Sep 3 10:33:15 2008
From: osborne6 at gmail.com (John Osborne)
Date: Wed, 3 Sep 2008 09:33:15 -0500
Subject: [Bioperl-l] interpro parsing enhancement?
Message-ID: <324fccf0809030733y5c5e8592t5f0617d0ff4d2203@mail.gmail.com>

Hi -

I'm wondering if anyone is working on adding functionality to Bio::SeqIO::interpro to grab the Gene Ontology/GO classifications out of the interpro XML output? I've started working on that myself, but wanted to check if anyone else is doing the same.

Thanks!

-- 
John Osborne
osborne6 at ieee.org/osborne6 at gmail.com/jro at freeshell.org

From gundalav at gmail.com Wed Sep 3 10:48:07 2008
From: gundalav at gmail.com (Gundala Viswanath)
Date: Wed, 3 Sep 2008 23:48:07 +0900
Subject: [Bioperl-l] Fitch's Parsimony Algorithm with Perl
Message-ID: <73f827b50809030748o725d3772m681af9da3c0c26c0@mail.gmail.com>

Hi,

What's the correct way to implement Fitch's parsimony algorithm? In particular, I want to compute the minimum number of substitutions per column in the aligned sequences. Is there a BioPerl module to do it?

For example:

CGGCGGAAAACTGTCCTCCGTGC mouse
CGACGGAACATTCTCCTCCGCGC rat
CGACGGAATATTCCCCTCCGTGC human
CGACGGAAGACTCTCCTCCGTGC chimp
00100000302011000000100 -> number of subst per site (max parsimony)

My code below doesn't seem to do the job.
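The code below sketches Fitch's (1971) small-parsimony pass for comparison with the distinct-base heuristic. It assumes the tree is hard-coded as nested array refs rather than parsed from the Newick string. `fitch` and `fitch_per_site` are hypothetical helper names, not an existing BioPerl API.

The pass uses the tree. At each internal node it intersects the children's candidate base sets. When that intersection is empty, it takes the union instead and counts one substitution. The Newick tree (mouse,rat,(human,chimp)) is an unrooted binary tree, and the Fitch score does not depend on where an unrooted tree is rooted, so the sketch roots it on its internal edge.

```perl
use strict;
use warnings;

# Fitch small-parsimony pass for one alignment column.
# A tree node is either a leaf name (string) or an arrayref [left, right];
# $col maps each leaf name to its base in the current column.
# Returns (candidate base set as a hashref, number of changes in the subtree).
sub fitch {
    my ( $node, $col ) = @_;
    return ( +{ $col->{$node} => 1 }, 0 ) unless ref $node;

    my ( $lset, $lcost ) = fitch( $node->[0], $col );
    my ( $rset, $rcost ) = fitch( $node->[1], $col );
    my %both = map { $_ => 1 } grep { $rset->{$_} } keys %$lset;

    # non-empty intersection: no extra substitution at this node
    return ( \%both, $lcost + $rcost ) if %both;

    # empty intersection: take the union and count one substitution
    return ( +{ %$lset, %$rset }, $lcost + $rcost + 1 );
}

# Minimum number of substitutions for every column of equal-length sequences.
sub fitch_per_site {
    my ( $tree, $seqs ) = @_;
    my ($first) = values %$seqs;
    my @counts;
    for my $pos ( 0 .. length($first) - 1 ) {
        my %col = map { $_ => substr( $seqs->{$_}, $pos, 1 ) } keys %$seqs;
        my ( undef, $changes ) = fitch( $tree, \%col );
        push @counts, $changes;
    }
    return @counts;
}

# (mouse,rat,(human,chimp)) rooted on its internal edge
my $tree = [ [ 'mouse', 'rat' ], [ 'human', 'chimp' ] ];
my %seqs = (
    mouse => 'CGGCGGAAAACTGTCCTCCGTGC',
    rat   => 'CGACGGAACATTCTCCTCCGCGC',
    human => 'CGACGGAATATTCCCCTCCGTGC',
    chimp => 'CGACGGAAGACTCTCCTCCGTGC',
);
print join( '', fitch_per_site( $tree, \%seqs ) ), "\n";
```

On the four example sequences this prints 00100000302011000000100, matching the expected line above. That includes 2 for column 11, where the distinct-base count gives 1.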
__BEGIN__
use strict;
use warnings;
use Data::Dumper;
use List::MoreUtils qw(uniq);

# The related phylogenetic tree in Newick format is:
my $tree = '(mouse,rat,(human,chimp))';

my $sites = [
    'CGGCGGAAAACTGTCCTCCGTGC',    # mouse
    'CGACGGAACATTCTCCTCCGCGC',    # rat
    'CGACGGAATATTCCCCTCCGTGC',    # human
    'CGACGGAAGACTCTCCTCCGTGC',    # chimp
];

my @val = my_parsimony($sites);
print Dumper \@val;

sub my_parsimony {
    my $tfbs    = shift;
    my $mlen    = length( $tfbs->[0] );
    my $sum_min = 0;
    my @mincol;

    foreach my $pos ( 0 .. $mlen - 1 ) {
        # collect the base of every sequence at this column
        my @colbp = ();
        foreach my $site ( @{$tfbs} ) {
            my $bp = substr( $site, $pos, 1 );
            push @colbp, $bp;
        }

        # this heuristic seems to be faulty:
        # for column 11 it predicts 1 instead of 2.
        # Not sure how I can make use of the tree.
        my $min_mm = scalar( uniq(@colbp) ) - 1;
        push @mincol, $min_mm;
    }
    return @mincol;
}
__END__

- Gundala Viswanath
Jakarta - Indonesia

From raulmendez at cbm.uam.es Wed Sep 3 10:33:46 2008
From: raulmendez at cbm.uam.es (Raul Mendez Giraldez)
Date: Wed, 03 Sep 2008 16:33:46 +0200
Subject: [Bioperl-l] SeqHound
In-Reply-To: <111DD141-75F8-4437-9EAD-E049BBADB515@uiuc.edu>
Message-ID: <1220452426.31595.92.camel@pepa.cbm.uam.es>

Hi Chris,

I'm trying to set up and run the BioPerl SeqHound module downloaded from:

http://bond.unleashedinformatics.com/downloads/api//seqhound-bioperl-4.0.tar.gz

and I always get connection error messages. Do you know which version of SeqHound I should use and how I can configure it to make it work? I've tried several possibilities for server1 in .shoundremrc, such as:

[remote]
server1 = bond.unleashedinformatics.com
CGI = /cgi-bin/seqrem
port=8080

Also, I would like to get all the possible protein-protein interactions for a set of protein sequences. Would this be possible using SeqHound?

Thanks,
Raúl

-- 
Raúl Méndez Giráldez, Ph.D.
Bioinformatics Unit
Centre for Molecular Biology "Severo Ochoa"
Universidad Autónoma de Madrid
C/ Nicolás Cabrera, 1
Cantoblanco 28049, Madrid
SPAIN
Phone: +34 91 196 4633

From jaudall at gmail.com Wed Sep 3 11:38:08 2008
From: jaudall at gmail.com (Joshua Udall)
Date: Wed, 3 Sep 2008 09:38:08 -0600
Subject: [Bioperl-l] parsing result of CAP3 (ACE file)
In-Reply-To: <5.0.2.1.2.20080903150203.00c0db18@pop.univ-montp2.fr>
References: <5.0.2.1.2.20080903150203.00c0db18@pop.univ-montp2.fr>
Message-ID: <52cea20c0809030838k6fb1498btc15b8e76d98f9d70@mail.gmail.com>

Laurent -

I have modified modules that will do it, as I recently ran into problems with the DB_File module in Assembly::IO. In addition, the current version of cap3 seems to put a contig length where a pad length is expected (based on the Ace format description). The modules I have will parse the ace file contig-by-contig rather than having the entire assembly slurped into memory (or a tied hash) all at once.

You are welcome to them if you are interested, and I'd like to get them into BioPerl at some point. Basically, there are three files - a modified Contig.pm, ContigIO.pm, and a modified ace.pm (in a ContigIO directory).

Josh

On Wed, Sep 3, 2008 at 7:04 AM, Laurent Manchon wrote:
> -- Hi,
>
> Does anybody have a piece of code to parse the output of the CAP3
> assembly program, which is in ACE format?
> I need to retrieve the alignment from this file.
>
> thank you,
> Laurent --

-- 
Joshua Udall
Assistant Professor
295 WIDB
Plant and Wildlife Science Dept.
Brigham Young University
Provo, UT 84602
801-422-9307
Fax: 801-422-0008
USA

From hartzell at alerce.com Wed Sep 3 19:19:45 2008
From: hartzell at alerce.com (George Hartzell)
Date: Wed, 3 Sep 2008 16:19:45 -0700
Subject: [Bioperl-l] What's up with line 248 of Bio::Coordinate::Pair?
Message-ID: <18623.7057.95449.99461@almost.alerce.com>

Ok, confess. None of you know what's up with line 248 of Bio::Coordinate::Pair, do you? You probably don't even know what's *on* that line. Wonder how many will go look.

Now that I either have your attention or have pissed you off (or both...), I think that creating a new Bio::Location::Split object in Bio::Coordinate::Pair::map() is a leftover or something, but I'm not quite sure enough to excise it and commit the change.

Anyone up for it?

g.

From cjfields at illinois.edu Wed Sep 3 21:29:49 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 3 Sep 2008 20:29:49 -0500
Subject: [Bioperl-l] What's up with line 248 of Bio::Coordinate::Pair?
In-Reply-To: <18623.7057.95449.99461@almost.alerce.com>
References: <18623.7057.95449.99461@almost.alerce.com>
Message-ID:

Well, it doesn't look like the SplitLocation is even used, so I think it is safe to remove.

chris

On Sep 3, 2008, at 6:19 PM, George Hartzell wrote:

>
> Ok, confess. None of you know what's up with line 248 of
> Bio::Coordinate::Pair, do you? You probably don't even know what's
> *on* that line. Wonder how many will go look.
>
> Now that I either have your attention or have pissed you off (or
> both...), I think that creating a new Bio::Location::Split object in
> Bio::Coordinate::Pair::map() is a leftover or something, but I'm not
> quite sure enough to excise it and commit the change.
>
> Anyone up for it?
>
> g.

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign

From jason at bioperl.org Thu Sep 4 00:40:38 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 3 Sep 2008 21:40:38 -0700
Subject: [Bioperl-l] What's up with line 248 of Bio::Coordinate::Pair?
In-Reply-To:
References: <18623.7057.95449.99461@almost.alerce.com>
Message-ID: <222DB5D8-BCCB-448E-BDEE-068A4A432660@bioperl.org>

Agreed - I don't know if that was something that was changed mid-stream, but removing it should cause no pain...
-j

On Sep 3, 2008, at 6:29 PM, Chris Fields wrote:

> Well, it doesn't look like the SplitLocation is even used, so I
> think it is safe to remove.
>
> chris

Jason Stajich
jason at bioperl.org

From heikki at sanbi.ac.za Thu Sep 4 02:17:31 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 4 Sep 2008 08:17:31 +0200
Subject: [Bioperl-l] What's up with line 248 of Bio::Coordinate::Pair?
In-Reply-To: <222DB5D8-BCCB-448E-BDEE-068A4A432660@bioperl.org>
References: <18623.7057.95449.99461@almost.alerce.com> <222DB5D8-BCCB-448E-BDEE-068A4A432660@bioperl.org>
Message-ID: <200809040817.31916.heikki@sanbi.ac.za>

Guilty. So I removed the line.

George, please do not try to piss us off. You can get all the attention you want from us. :)

What are you planning to do to Bio::Coordinate classes?

 -Heikki

On Thursday 04 September 2008 06:40:38 Jason Stajich wrote:
> Agreed - I don't know if that was something that was changed mid-stream,
> but removing it should cause no pain...
> -j

-- 
Heikki Lehvaslaiho    heikki at_sanbi _ac _za
Senior Scientist      skype: heikki_lehvaslaiho
SANBI, South African National Bioinformatics Institute
University of Western Cape, South Africa
Phone: +27 21 959 2096   FAX: +27 21 959 2512

From hartzell at alerce.com Thu Sep 4 12:00:21 2008
From: hartzell at alerce.com (George Hartzell)
Date: Thu, 4 Sep 2008 09:00:21 -0700
Subject: [Bioperl-l] What's up with line 248 of Bio::Coordinate::Pair?
In-Reply-To: <200809040817.31916.heikki@sanbi.ac.za>
References: <18623.7057.95449.99461@almost.alerce.com> <222DB5D8-BCCB-448E-BDEE-068A4A432660@bioperl.org> <200809040817.31916.heikki@sanbi.ac.za>
Message-ID: <18624.1557.758033.258065@almost.alerce.com>

Heikki Lehvaslaiho writes:
 > Guilty. So I removed the line.
 >
 > George, please do not try to piss us off. You can get all the attention you
 > want from us. :)

Try not. Do... or do not. There is no try.

 > What are you planning to do to Bio::Coordinate classes?

It's a project for a paying customer (yikes...). Pretty much exactly what GeneMapper does, though I'll probably end up dangling a couple more named coordinate spaces off of it. Nothing earth shattering.

The classes look *great*.

g.
From hartzell at alerce.com Fri Sep 5 15:01:35 2008 From: hartzell at alerce.com (George Hartzell) Date: Fri, 05 Sep 2008 12:01:35 -0700 Subject: [Bioperl-l] Bio::Coordinate::Collection could DoWhatIMean better (w/ patch) Message-ID: Hi all, Bio::Coordinate::Collection surprised me a bit. At first I thought there was a bug, but it's clearly doing what it's supposed to. Now I'm wondering if what it's supposed to be doing makes sense in some context, or if what I expected would be better functionality. t/CoordinateMapper.t sets up the following scenario:

#
# Collection
#
#         1   5     6   10
#         |---|     |---|
#-----|-----------------------
#     1   5   9     15  19
#         pair1     pair2

Then goes on to do the following query:

# match more than two
$pos = Bio::Location::Simple->new (-start => 5, -end => 19 );
ok $res = $transcribe->map($pos);
is $res->each_gap, 2;
is $res->each_match, 2;

I was surprised to see that there were two gaps, one from gene:10-19 and one from gene:5-14. Looking at the code, what's really happening is that for the exon1 mapper there's a match with gene:5-9 and a gap with gene:10-19, and for the exon2 mapper there's a gap with gene:5-14 and a match with gene:15-19. All four Results just get tossed into the return value. The result my intuition wants is that there are two matches (gene:5-9 with exon1 and gene:15-19 with exon2) and a gap (gene:10-14). Yes, I guess that I could just synthesize these myself from the result in my app. It still seems that the current result is a bug though, since there's no way of knowing when you're walking through $res->each_Location that the first "gap" is with respect to the exon1 mapper and that the second "gap" is with respect to the exon2 mapper. The gaps are meaningless. I "fixed" it to work the way I think it should (two matches, one gap). I actually extended the test case a bit so that there's a multi-base gap, a match, another multi-base gap, another match, then a single-base gap (just to make sure I got that right...).
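The "two matches, one gap" result George describes can be derived from the per-mapper matches alone. The following is a minimal pure-Perl sketch under stated assumptions: matches arrive as plain 1-based, inclusive [start, end] pairs on the query ("in") coordinate system, all mappers share one "in" seq_id, and merge_matches_and_gaps is a hypothetical helper. It is not the code in George's patch, uses neither Set::IntSpan nor Bio::RangeI, and returns bare array refs rather than Bio::Coordinate::Result objects.

    use strict;
    use warnings;

    # Hypothetical helper, not part of BioPerl: merge the ranges that the
    # individual mappers matched and report the uncovered parts of the query
    # as gaps. Coordinates are 1-based and inclusive on the query ("in") side.
    sub merge_matches_and_gaps {
        my ($q_start, $q_end, @matches) = @_;

        # Clip every match to the query, drop empty ones, sort by start.
        my @clipped =
            sort { $a->[0] <=> $b->[0] }
            grep { $_->[0] <= $_->[1] }
            map  { [ ($_->[0] > $q_start ? $_->[0] : $q_start),
                     ($_->[1] < $q_end   ? $_->[1] : $q_end) ] } @matches;

        # Merge matches that overlap or abut.
        my @merged;
        for my $m (@clipped) {
            if (@merged && $m->[0] <= $merged[-1][1] + 1) {
                $merged[-1][1] = $m->[1] if $m->[1] > $merged[-1][1];
            }
            else {
                push @merged, [@$m];
            }
        }

        # Everything between (and around) the merged matches is a gap.
        my @gaps;
        my $next = $q_start;    # first query position not yet accounted for
        for my $m (@merged) {
            push @gaps, [ $next, $m->[0] - 1 ] if $m->[0] > $next;
            $next = $m->[1] + 1;
        }
        push @gaps, [ $next, $q_end ] if $next <= $q_end;

        return (\@merged, \@gaps);
    }

    # The t/CoordinateMapper.t scenario: query gene:5-19, where exon1 matched
    # gene:5-9 and exon2 matched gene:15-19.
    my ($matches, $gaps) = merge_matches_and_gaps(5, 19, [5, 9], [15, 19]);
    printf "match %d-%d\n", @$_ for @$matches;    # match 5-9, match 15-19
    printf "gap   %d-%d\n", @$_ for @$gaps;       # gap   10-14

Merging abutting matches (start <= previous end + 1) keeps a spurious zero-length gap from appearing between two matches that touch, so every query position ends up in exactly one merged match or exactly one gap.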
I had to touch up the test file a bit to account for my new test. The gaps that I return have a strand of 'undef', which seems to be The Right Thing. There's also a bit of funny business where I hang onto the seq_id of the gapped sequence. It assumes that the "in" sequence is the same for all of the mappers. This seems safe since otherwise the entire query is kind of weird.... There's a patch to today's svn head at: http://shrimp.alerce.com/bioperl/collection-diffs.txt The patch changes Build.PL to include a dependency on Set::IntSpan, CoordinateMapper.t to update the tests, and Bio/CoordinateMapper/Collection.pm for the new code. Whose code would this break? If anyone's relying on the current behaviour re: gaps, what's the situation in which you find it useful? Thanks! g. From ajmackey at gmail.com Fri Sep 5 16:54:56 2008 From: ajmackey at gmail.com (Aaron Mackey) Date: Fri, 5 Sep 2008 16:54:56 -0400 Subject: [Bioperl-l] Bio::Coordinate::Collection could DoWhatIMean better (w/ patch) In-Reply-To: References: Message-ID: <24c96eca0809051354h5b7218edtaa720140901d023f@mail.gmail.com> There are two uses for Collection: 1) all the "in" seq_id's are the same, and George's patch makes sense to me (i.e. agrees with my intuition) 2) all the "in" seq_id's are *not* the same (i.e. the collection is just a hash of individual pairs), in which case my query would only match the subset of pairs having identical seq_id's to that specified by the query ... and then you're back to case #1 So overall, it looks like this was a bug, but I'd of course want to hear Heikki's opinion. Thanks for raising this, -Aaron On Fri, Sep 5, 2008 at 3:01 PM, George Hartzell wrote: > > Hi all, > > Bio::Coordinate::Collection surprised me a bit. At first I thought > there was a bug, but it's clearly doing what it's supposed to. Now > I'm wondering if what it's supposed to be doing makes sense in some > context, or if what I expected would be better functionality.
> > t/CoordinateMapper.t sets up the following scenario: > > # > # Collection > # > # 1 5 6 10 > # |---| |---| > #-----|----------------------- > # 1 5 9 15 19 > # pair1 pair2 > > Then goes on to do the following query: > > # match more than two > $pos = Bio::Location::Simple->new (-start => 5, -end => 19 ); > ok $res = $transcribe->map($pos); > is $res->each_gap, 2; > is $res->each_match, 2; > > I was surprised to see that there were two gaps, one gene:10-19 and > one from gene:5-14. Looking at the code, what's really happening is > that, for the exon1 mapper there's match with gene:5-9 and a gap with > gene:10-19 and for the exon2 mapper there's a gap with gene:5-14 and a > match with gene:15-19. All four Result's just get tossed into the > return value. > > The result my intuition wants is that there are two matches (gene:5-9 > with exon1 and gene:15-19 with exon2) and a gap (gene:10-14). > > Yes, I guess that I could just synthesize these myself from the result > in my app. > > It still seems that the current result is a bug though, since there's > no way of knowing when you're walking through $res->each_Location that > the first "gap" is with respect to the exon1 mapper and that the > second "gap" is with respect to the exon2 mapper. The gaps are > meaningless. > > I "fixed" it to work the way I think it should (two matches, one > gap). I actually extended the test case a bit so that there's a > multi-base gap, a match, another multibase-gap, another match, then a > single base gap (just to make sure I got that right...). I had to > touch up the test file a bit to account for my new test. > > The gaps that I return have a strand of 'undef', which seems to be The > Right Thing. There's also a bit of funny business where I hang onto > the seq_id of the gapped sequence. It assumes that the "in" sequence > is the same for all of the mappers. This seems safe since otherwise > the entire query is kind of weird.... 
> > There's a patch to todays svn head at: > > http://shrimp.alerce.com/bioperl/collection-diffs.txt > > The patch changes Build.PL to include a dependency on Set::IntSpan, > CoordinateMapper.t to update the tests, and > Bio/CoordinateMapper/Collection.pm for the new code. > > Who's code would this break. > > If anyone's relying on the current behaviour re: gaps, what's the > situation in which you find it useful? > > Thanks! > > g. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jimhu at tamu.edu Mon Sep 8 14:44:22 2008 From: jimhu at tamu.edu (Jim Hu) Date: Mon, 8 Sep 2008 13:44:22 -0500 Subject: [Bioperl-l] Circular genomes in Chado/BioPerl Message-ID: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> In discussions with GMOD about Gbrowse, we've come up with a proposal for handling circular genomes and features that cross the origin in such genomes. This applies to lots of prokaryotic and viral genomes, and might be valuable for some ways of representing terminally redundant linear genomes. 1) Keep the requirement that start < end 2) allow end > parent feature length 3) parent feature gets an is_circular boolean 4) use modular arithmetic to calculate the real position of end on the parent feature. We'd like to do this in a way that will be consistent with Chado and BioPerl representation of features as much as possible (realizing that there is the usual interbase or not coordinate issue). What do people think? Lincoln is on board for modifying the GFF3 spec. Thanks! Jim Hu ===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. 
College Station, TX 77843-2128 979-862-4054 From ajmackey at gmail.com Mon Sep 8 15:57:50 2008 From: ajmackey at gmail.com (Aaron Mackey) Date: Mon, 8 Sep 2008 15:57:50 -0400 Subject: [Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl In-Reply-To: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> References: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> Message-ID: <24c96eca0809081257l16461b23uaefe8154ed038bea@mail.gmail.com> How can you handle features that may cross the origin more than once? The modulus, though simple, seems to be only half the solution. It also makes it difficult to place features in the genome "by eye" (having to do the modulus subtraction in my head), or in sorting/filtering operations. I have an alternative that I wondered if you considered: allow the start/end to have an additional "circular revolution" prefix: a typical range tuple like: 100 200 - is thus shorthand for: 0:100 0:200 - (i.e. both the 100 and 200 are in the same "revolution" around the genome) and is then distinguishable from an "around the genome + 100" feature of: 1:100 0:200 - Just an alternative to consider (if you haven't already). I'm not wedded to the syntax, but I wouldn't want to see new columns in GFF just for this. Essentially, what you want is some form of compound polar coordinates, it seems. -Aaron On Mon, Sep 8, 2008 at 2:44 PM, Jim Hu wrote: > In discussions with GMOD about Gbrowse, we've come up with a proposal for > handling circular genomes and features that cross the origin in such > genomes. This applies to lots of prokaryotic and viral genomes, and might > be valuable for some ways of representing terminally redundant linear > genomes. > 1) Keep the requirement that start < end > 2) allow end > parent feature length > 3) parent feature gets an is_circular boolean > 4) use modular arithmetic to calculate the real position of end on the > parent feature. 
> We'd like to do this in a way that will be consistent with Chado and BioPerl > representation of features as much as possible (realizing that there is the > usual interbase or not coordinate issue). What do people think? Lincoln is > on board for modifying the GFF3 spec. > Thanks! > Jim Hu > > ===================================== > > Jim Hu > > Associate Professor > > Dept. of Biochemistry and Biophysics > > 2128 TAMU > > Texas A&M Univ. > > College Station, TX 77843-2128 > > 979-862-4054 > > > ------------------------------------------------------------------------- > This SF.Net email is sponsored by the Moblin Your Move Developer's challenge > Build the coolest Linux based applications with Moblin SDK & win great > prizes > Grand prize is a trip for two to an Open Source event anywhere in the world > http://moblin-contest.org/redirect.php?banner_id=100&url=/ > _______________________________________________ > Gmod-schema mailing list > Gmod-schema at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-schema > > From js5 at sanger.ac.uk Mon Sep 8 16:13:12 2008 From: js5 at sanger.ac.uk (James Smith) Date: Mon, 8 Sep 2008 21:13:12 +0100 (BST) Subject: [Bioperl-l] Circular genomes in Chado/BioPerl In-Reply-To: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> References: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> Message-ID: On Mon, 8 Sep 2008, Jim Hu wrote: > In discussions with GMOD about Gbrowse, we've come up with a proposal for > handling circular genomes and features that cross the origin in such genomes. > This applies to lots of prokaryotic and viral genomes, and might be valuable > for some ways of representing terminally redundant linear genomes. > > 1) Keep the requirement that start < end > 2) allow end > parent feature length > 3) parent feature gets an is_circular boolean > 4) use modular arithmetic to calculate the real position of end on the parent > feature. 
This is how we are considering handling features in Ensembl as well (the Ensembl genomes project will be setting up websites for bacterial and viral genomes) > > We'd like to do this in a way that will be consistent with Chado and BioPerl > representation of features as much as possible (realizing that there is the > usual interbase or not coordinate issue). What do people think? Lincoln is > on board for modifying the GFF3 spec. > > Thanks! > > Jim Hu > > ===================================== > Jim Hu > Associate Professor > Dept. of Biochemistry and Biophysics > 2128 TAMU > Texas A&M Univ. > College Station, TX 77843-2128 > 979-862-4054 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From heikki at sanbi.ac.za Tue Sep 9 03:50:11 2008 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Tue, 9 Sep 2008 09:50:11 +0200 Subject: [Bioperl-l] Bio::Coordinate::Collection could DoWhatIMean better (w/ patch) In-Reply-To: References: Message-ID: <200809090950.11821.heikki@sanbi.ac.za> George, This is an error from my side. Great that you have a fix already. My only worry is the number of external dependencies in BioPerl. To limit these we have recoded number of functionalities into BioPerl-specific modules. Before you commit the fix, could you see if Bio::RangeI could be used or easily extended to be used instead of Set::IntSpan? Thanks, -Heikki On Friday 05 September 2008 21:01:35 George Hartzell wrote: > Hi all, > > Bio::Coordinate::Collection surprised me a bit. At first I thought > there was a bug, but it's clearly doing what it's supposed to. 
Now > I'm wondering if what it's supposed to be doing makes sense in some > context, or if what I expected would be better functionality. > > t/CoordinateMapper.t sets up the following scenario: > > # > # Collection > # > # 1 5 6 10 > # |---| |---| > #-----|----------------------- > # 1 5 9 15 19 > # pair1 pair2 > > Then goes on to do the following query: > > # match more than two > $pos = Bio::Location::Simple->new (-start => 5, -end => 19 ); > ok $res = $transcribe->map($pos); > is $res->each_gap, 2; > is $res->each_match, 2; > > I was surprised to see that there were two gaps, one gene:10-19 and > one from gene:5-14. Looking at the code, what's really happening is > that, for the exon1 mapper there's match with gene:5-9 and a gap with > gene:10-19 and for the exon2 mapper there's a gap with gene:5-14 and a > match with gene:15-19. All four Result's just get tossed into the > return value. > > The result my intuition wants is that there are two matches (gene:5-9 > with exon1 and gene:15-19 with exon2) and a gap (gene:10-14). > > Yes, I guess that I could just synthesize these myself from the result > in my app. > > It still seems that the current result is a bug though, since there's > no way of knowing when you're walking through $res->each_Location that > the first "gap" is with respect to the exon1 mapper and that the > second "gap" is with respect to the exon2 mapper. The gaps are > meaningless. > > I "fixed" it to work the way I think it should (two matches, one > gap). I actually extended the test case a bit so that there's a > multi-base gap, a match, another multibase-gap, another match, then a > single base gap (just to make sure I got that right...). I had to > touch up the test file a bit to account for my new test. > > The gaps that I return have a strand of 'undef', which seems to be The > Right Thing. There's also a bit of funny business where I hang onto > the seq_id of the gapped sequence. 
It assumes that the "in" sequence > is the same for all of the mappers. This seems safe since otherwise > the entire query is kind of weird.... > > There's a patch to todays svn head at: > > http://shrimp.alerce.com/bioperl/collection-diffs.txt > > The patch changes Build.PL to include a dependency on Set::IntSpan, > CoordinateMapper.t to update the tests, and > Bio/CoordinateMapper/Collection.pm for the new code. > > Who's code would this break. > > If anyone's relying on the current behaviour re: gaps, what's the > situation in which you find it useful? > > Thanks! > > g. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________
From Frigerio at pierroton.inra.fr Tue Sep 9 04:45:19 2008 From: Frigerio at pierroton.inra.fr (Jean-Marc FRIGERIO) Date: Tue, 9 Sep 2008 10:45:19 +0200 Subject: [Bioperl-l] parsing result of CAP3 (ACE file) Message-ID: <200809091045.19249.Frigerio@pierroton.inra.fr> > > -- Hi, > > > > Is somebody have a piece of code to parse result of CAP3 assembly program > > which > > format is ACE ? > > I need to retrieve the alignment from this file.
> > > > thank you, > > Laurent -- > > > > > > > > > > +---------------------------------------------+ > > Laurent Manchon > > Email: lmanchon at univ-montp2.fr > > +---------------------------------------------+ > > Laurent - > > I have modified modules that will do it as I recently ran into problems > with the DB_FILE module in Assembly::IO. In addition, the current version > of cap3 seems to put a contig length where a pad length is expected (based > on the Ace format description). The modules I have will parse the ace file > contig-by-contig rather than having the entire assembly slurped into memory > (or a tied hash) all at once. You are welcome to them if you are > interested and I'd like to get them in Bioperl at some point. Basically, > there are three files - a modified Contig.pm, ContigIO.pm, and a modified > ace.pm (in a ContigIO directory). > > Josh
Hi, Here are 2 pieces of code running on an ace file (output of phrap; is that the same as cap3's?)

----------------------------- 1 -----------------------------------------
use Bio::Assembly::IO;
# $file and $pos are assumed to be set earlier
my $assembly = Bio::Assembly::IO->new('-file' => $file, '-format' => 'ace')->next_assembly;
for my $contig ($assembly->all_contigs) {
    my $ct_seq = $contig->get_consensus_sequence;
    (my $ref_seq = uc $ct_seq->seq) =~ s/-//g;
    my $debut = $pos - 100 > 0 ? $pos - 100 : 1;
    my $fin = $pos + 100 <= length $ref_seq ? $pos + 100 : length $ref_seq;
    my $coll = $contig->get_features_collection;
    my @coll = $coll->features_in_range('-start' => $debut, '-end' => $fin);
    for my $tag (@coll) {
        next unless $tag->primary_tag eq 'comment';
        #print "TAG: ",$tag->start,"\n";
        my $tag_pos = $contig->change_coord('gapped consensus','ungapped consensus',$tag->start);
        #print "TAG POS: $tag_pos\n";
        next if $pos == $tag_pos;
        substr($ref_seq,$tag_pos-1,1,'N');
    }
}

------------------------------------ 2 -------------------
my $assembly = Bio::Assembly::IO->new( '-file' => $file, '-format' => 'ace')->next_assembly;
for my $contig ($assembly->all_contigs) {
    for my $seq ($contig->each_seq) {
        my $id = $seq->id;
        my $s = $seq->seq;
        my ($start,$end) = ($contig->change_coord("aligned $id","ungapped consensus",$seq->start),
                            $contig->change_coord("aligned $id","ungapped consensus",$seq->end));
        my $dir = $seq->strand < 0 ? 'R' : 'F';
        ......
    }
}

-- Jean-Marc
From zheboyang at gmail.com Tue Sep 9 07:05:15 2008 From: zheboyang at gmail.com (boyang zhe) Date: Tue, 9 Sep 2008 19:05:15 +0800 Subject: [Bioperl-l] help:HMM parsing error Message-ID: <127e75f60809090405h644e51eftcab073e8bf179720@mail.gmail.com> I wrote a script to parse the HMMER report; it is as follows:

#!/usr/bin/perl -w
#TODO: Parse the HMMER report
use strict;
use Bio::SearchIO;

my $directory="./HMM/";
opendir(HMMDIR, $directory) or die "Can't open the directory!";
my @filelist=readdir(HMMDIR);
foreach my $filename(@filelist) {
    if ($filename !~/^\./) {
        my $infile="$directory"."$filename";
        my $outfile="$infile"."HMMParse";
        my $in = new Bio::SearchIO(-format => 'hmmer',-file =>"$infile");
        while (my $result= $in->next_result ) {
            # get a Bio::Search::Result::HMMERResult object
            # get hits numbers
            my $hitnumber=$result->num_hits;
            if ($hitnumber != 0) {
                open(OUT, ">$outfile") or die "can't open the output file!!!!";
                while (my $hits= $result->next_hit ) {
                    my $value=$hits->significance;
                    if ($value <=0.01) {
                        print OUT $hits->name,"\t",$hits->description,"\t",$hits->significance,"\n";
                    }
                }
                close OUT;
            }
        }
    }
}
closedir(HMMDIR);
##############################################################

When it runs, you will see this:

-------------------- WARNING ---------------------
MSG: unrecognized line: +E +L i T eek+ e+ ++ +l++H Y+ I+ +
---------------------------------------------------

Why? I hope to get your help, thanks very much!
From bix at sendu.me.uk Tue Sep 9 07:46:39 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 09 Sep 2008 12:46:39 +0100 Subject: [Bioperl-l] help:HMM parsing error In-Reply-To: <127e75f60809090405h644e51eftcab073e8bf179720@mail.gmail.com> References: <127e75f60809090405h644e51eftcab073e8bf179720@mail.gmail.com> Message-ID: <48C6621F.8040304@sendu.me.uk> boyang zhe wrote: > I write a script to parse the HMMER report ,it is as follows: [...] > my $in = new Bio::SearchIO(-format => 'hmmer',-file =>"$infile"); [...] > -------------------- WARNING --------------------- > MSG: unrecognized line: +E +L i T eek+ e+ ++ +l++H > Y+ I+ + > > --------------------------------------------------- > > why? I hope to get your help, hanks very much! I didn't check your code, but the easiest thing to try would be to use -format => 'hmmer_pull' to use an alternate parser that may be able to recognise that line. You might need to install the latest Bioperl from SVN (or at least 1.5.2) to get access to the hmmer_pull parser. From bosborne11 at verizon.net Tue Sep 9 10:50:38 2008 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 9 Sep 2008 10:50:38 -0400 Subject: [Bioperl-l] SeqHound In-Reply-To: <1220452426.31595.92.camel@pepa.cbm.uam.es> References: <1220452426.31595.92.camel@pepa.cbm.uam.es> Message-ID: <55655D67-352A-4E0D-B402-7FC30628C1B1@verizon.net> Raul, After spending a few minutes at bond.unleashedinformatics.com I have to admit that it's not clear how one accesses their free version of BOND. There are no examples that I can see in their packages.
If you are interested in looking at protein-protein networks in the Bioperl context you can also check out the bioperl-network package: http://www.bioperl.org/wiki/Network_package If you don't care what language you're using then you should consider Cytoscape, it's probably the package with the most capability. Brian O. On Sep 3, 2008, at 10:33 AM, Raul Mendez Giraldez wrote: > Hi Chris, > > I'm trying to set up and run bioperl Seqhound downloaded from: > > http://bond.unleashedinformatics.com/downloads/api//seqhound-bioperl-4.0.tar.gz > > and I always get connection error messages. Do you know which > version of > SeqHound should I use and how can I configure to make it work? I've > tried several possibilities for server1 at .shoundremrc as > > [remote] > server1 = bond.unleashedinformatics.com > CGI = /cgi-bin/seqrem > port=8080 > > Also, I would like to get all the possible protein-protein interaction > for a set of protein sequences. Would this be possible using SeqHound? > > Thanks, > Raúl > > > > > -- > Raúl Méndez Giráldez, Ph.D. > Bioinformatics Unit > Centre for Molecular Biology "Severo Ochoa" > Universidad Autónoma de Madrid > C/ Nicolás Cabrera, 1 > Cantoblanco 28049, Madrid > SPAIN > > Phone: +34 91 196 4633 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jimhu at tamu.edu Tue Sep 9 12:05:59 2008 From: jimhu at tamu.edu (Jim Hu) Date: Tue, 9 Sep 2008 11:05:59 -0500 Subject: [Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl In-Reply-To: <24c96eca0809081257l16461b23uaefe8154ed038bea@mail.gmail.com> References: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> <24c96eca0809081257l16461b23uaefe8154ed038bea@mail.gmail.com> Message-ID: Hi Aaron, I was thinking this would be handled by making the end=parent feature length x 2 + end coord. end/parent length = number of times crosses origin.
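Jim's scheme can be sketched in Perl: keep start < end, let end run past the parent length, and recover the real end with modular arithmetic, so that a feature crossing the origin twice has end = 2 x parent length + real end. This is a minimal sketch, not a BioPerl, Chado or GFF3 API. circular_end and circular_segments are hypothetical helpers, and they assume 1-based inclusive coordinates (not Chado's interbase ones) and a start already within 1..parent length.

    use strict;
    use warnings;

    # Hypothetical helpers illustrating the proposal; they are not BioPerl,
    # Chado or GFF3 APIs. Coordinates are 1-based and inclusive, start lies
    # in 1..$len, start <= end, and end may run past $len on a circular parent.

    # Map an end coordinate back onto the parent and count origin crossings.
    sub circular_end {
        my ($end, $len) = @_;
        my $real_end  = ($end - 1) % $len + 1;     # back into 1..$len
        my $crossings = int(($end - 1) / $len);    # times the feature passes the origin
        return ($real_end, $crossings);
    }

    # Split a feature into pieces that each lie within 1..$len, e.g. for drawing.
    sub circular_segments {
        my ($start, $end, $len) = @_;
        die "need 1 <= start <= len and start <= end\n"
            unless $start >= 1 && $start <= $len && $start <= $end;
        my @segs;
        my $offset = 0;         # unwrapped coordinate just before the current lap
        my $s      = $start;    # unwrapped start of the current piece
        while ($end > $offset + $len) {
            push @segs, [ $s - $offset, $len ];    # run up to the origin
            $offset += $len;
            $s = $offset + 1;
        }
        push @segs, [ $s - $offset, $end - $offset ];
        return @segs;
    }

    # Example: a 5,000 bp circular parent and a feature from 4900 to 5100.
    my $len = 5000;
    my ($real_end, $crossings) = circular_end(5100, $len);
    print "end maps to $real_end, crossing the origin $crossings time(s)\n";
    print join(', ', map { "$_->[0]-$_->[1]" } circular_segments(4900, 5100, $len)), "\n";
    # prints: 4900-5000, 1-100

Using ($end - 1) rather than $end in both formulas keeps a feature that ends exactly on the last base of the parent from being counted as crossing the origin.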
Jim On Sep 8, 2008, at 2:57 PM, Aaron Mackey wrote: > How can you handle features that may cross the origin more than once? > The modulus, though simple, seems to be only half the solution. It > also makes it difficult to place features in the genome "by eye" > (having to do the modulus subtraction in my head), or in > sorting/filtering operations. > > I have an alternative that I wondered if you considered: allow the > start/end to have an additional "circular revolution" prefix: > > a typical range tuple like: 100 200 - > is thus shorthand for: 0:100 0:200 - > (i.e. both the 100 and 200 are in the same "revolution" around the > genome) > > and is then distinguishable from an "around the genome + 100" > feature of: > 1:100 0:200 - > > Just an alternative to consider (if you haven't already). I'm not > wedded to the syntax, but I wouldn't want to see new columns in GFF > just for this. Essentially, what you want is some form of compound > polar coordinates, it seems. > > -Aaron > > On Mon, Sep 8, 2008 at 2:44 PM, Jim Hu wrote: >> In discussions with GMOD about Gbrowse, we've come up with a >> proposal for >> handling circular genomes and features that cross the origin in such >> genomes. This applies to lots of prokaryotic and viral genomes, >> and might >> be valuable for some ways of representing terminally redundant linear >> genomes. >> 1) Keep the requirement that start < end >> 2) allow end > parent feature length >> 3) parent feature gets an is_circular boolean >> 4) use modular arithmetic to calculate the real position of end on >> the >> parent feature. >> We'd like to do this in a way that will be consistent with Chado >> and BioPerl >> representation of features as much as possible (realizing that >> there is the >> usual interbase or not coordinate issue). What do people think? >> Lincoln is >> on board for modifying the GFF3 spec. >> Thanks! >> Jim Hu >> >> ===================================== >> >> Jim Hu >> >> Associate Professor >> >> Dept. 
of Biochemistry and Biophysics >> >> 2128 TAMU >> >> Texas A&M Univ. >> >> College Station, TX 77843-2128 >> >> 979-862-4054 >> >> >> ------------------------------------------------------------------------- >> This SF.Net email is sponsored by the Moblin Your Move Developer's >> challenge >> Build the coolest Linux based applications with Moblin SDK & win >> great >> prizes >> Grand prize is a trip for two to an Open Source event anywhere in >> the world >> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >> _______________________________________________ >> Gmod-schema mailing list >> Gmod-schema at lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/gmod-schema >> >> ===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From jason at bioperl.org Tue Sep 9 12:07:45 2008 From: jason at bioperl.org (Jason Stajich) Date: Tue, 9 Sep 2008 09:07:45 -0700 Subject: [Bioperl-l] help:HMM parsing error In-Reply-To: <48C6621F.8040304@sendu.me.uk> References: <127e75f60809090405h644e51eftcab073e8bf179720@mail.gmail.com> <48C6621F.8040304@sendu.me.uk> Message-ID: Although it would be good to fix the parser as well, the best solution is to submit that report as a bug to Bugzilla: http://bugzilla.open-bio.org/ -jason On Sep 9, 2008, at 4:46 AM, Sendu Bala wrote: > boyang zhe wrote: >> I wrote a script to parse the HMMER report; it is as follows: > [...] >> my $in = new Bio::SearchIO(-format => 'hmmer',-file =>"$infile"); > [...] >> -------------------- WARNING --------------------- >> MSG: unrecognized line: +E +L i T eek+ e+ + >> + +l++H >> Y+ I+ + >> --------------------------------------------------- >> why? I hope to get your help, thanks very much! > > I didn't check your code, but the easiest thing to try would be to > use -format => 'hmmer_pull' to use an alternate parser that may be > able to recognise that line.
You might need to install the latest > Bioperl from SVN (or at least 1.5.2) to get access to the hmmer_pull > parser. Jason Stajich jason at bioperl.org From cain.cshl at gmail.com Tue Sep 9 13:33:12 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Tue, 9 Sep 2008 13:33:12 -0400 Subject: [Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl In-Reply-To: References: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> <24c96eca0809081257l16461b23uaefe8154ed038bea@mail.gmail.com> Message-ID: <536f21b00809091033v1e412f4ft54d8e139c347a20a@mail.gmail.com> Hi Jim and All, While I agree with Aaron's point that it is not easy to place features by visual inspection, this seems like a fairly minor point. The vast majority of GFF3 manipulation will be done in software, so as long as the API handles everything correctly, life is good. If we discount that objection, there doesn't seem to be much advantage in using Aaron's suggested method over Jim's. (As a side note, I have the same complaint about anything in XML: it is awful for a human to read. I still live with XML when I have to, though :-) Additionally, the fact that Ensembl is using the same method Jim describes is a fairly powerful argument for doing the same. Hopefully there can be some code reuse. Scott On Tue, Sep 9, 2008 at 12:05 PM, Jim Hu wrote: > Hi Aaron, > I was thinking this would be handled by making the end=parent feature length > x 2 + end coord. end/parent length = number of times crosses origin. > Jim > On Sep 8, 2008, at 2:57 PM, Aaron Mackey wrote: > > How can you handle features that may cross the origin more than once? > The modulus, though simple, seems to be only half the solution.
It > also makes it difficult to place features in the genome "by eye" > (having to do the modulus subtraction in my head), or in > sorting/filtering operations. > > I have an alternative that I wondered if you considered: allow the > start/end to have an additional "circular revolution" prefix: > > a typical range tuple like: 100 200 - > is thus shorthand for: 0:100 0:200 - > (i.e. both the 100 and 200 are in the same "revolution" around the genome) > > and is then distinguishable from an "around the genome + 100" feature of: > 1:100 0:200 - > > Just an alternative to consider (if you haven't already). I'm not > wedded to the syntax, but I wouldn't want to see new columns in GFF > just for this. Essentially, what you want is some form of compound > polar coordinates, it seems. > > -Aaron > > On Mon, Sep 8, 2008 at 2:44 PM, Jim Hu wrote: > > In discussions with GMOD about Gbrowse, we've come up with a proposal for > > handling circular genomes and features that cross the origin in such > > genomes. This applies to lots of prokaryotic and viral genomes, and might > > be valuable for some ways of representing terminally redundant linear > > genomes. > > 1) Keep the requirement that start < end > > 2) allow end > parent feature length > > 3) parent feature gets an is_circular boolean > > 4) use modular arithmetic to calculate the real position of end on the > > parent feature. > > We'd like to do this in a way that will be consistent with Chado and BioPerl > > representation of features as much as possible (realizing that there is the > > usual interbase or not coordinate issue). What do people think? Lincoln is > > on board for modifying the GFF3 spec. > > Thanks! > > Jim Hu > > ===================================== > > Jim Hu > > Associate Professor > > Dept. of Biochemistry and Biophysics > > 2128 TAMU > > Texas A&M Univ. 
> > College Station, TX 77843-2128 > > 979-862-4054 > > > > ===================================== > > Jim Hu > > Associate Professor > > Dept. of Biochemistry and Biophysics > > 2128 TAMU > > Texas A&M Univ. > > College Station, TX 77843-2128 > > 979-862-4054 > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.
cain.cshl at gmail.com GMOD Coordinator (http://gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From lincoln.stein at gmail.com Tue Sep 9 13:52:36 2008 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Tue, 9 Sep 2008 13:52:36 -0400 Subject: [Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl In-Reply-To: <24c96eca0809081257l16461b23uaefe8154ed038bea@mail.gmail.com> References: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> <24c96eca0809081257l16461b23uaefe8154ed038bea@mail.gmail.com> Message-ID: <6dce9a0b0809091052g1c398a84tfe8f89d1bf9132c8@mail.gmail.com> It seems to me that the proposed modulus syntax handles multiple revolutions. Consider a 100 bp genome (to make it simple) and a feature that starts at 50, goes around twice, and ends at position 60: start = 50 end = 260 length = end - start + 1 revolutions = int (length/genome) stop position = length % genome + 1 Lincoln On Mon, Sep 8, 2008 at 3:57 PM, Aaron Mackey wrote: > How can you handle features that may cross the origin more than once? > The modulus, though simple, seems to be only half the solution. It > also makes it difficult to place features in the genome "by eye" > (having to do the modulus subtraction in my head), or in > sorting/filtering operations. > > I have an alternative that I wondered if you considered: allow the > start/end to have an additional "circular revolution" prefix: > > a typical range tuple like: 100 200 - > is thus shorthand for: 0:100 0:200 - > (i.e. both the 100 and 200 are in the same "revolution" around the genome) > > and is then distinguishable from an "around the genome + 100" feature of: > 1:100 0:200 - > > Just an alternative to consider (if you haven't already). I'm not > wedded to the syntax, but I wouldn't want to see new columns in GFF > just for this. Essentially, what you want is some form of compound > polar coordinates, it seems. 
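A minimal Perl sketch of the modulus arithmetic discussed in this thread, assuming 1-based inclusive GFF-style coordinates where end may exceed the genome length. `circular_span` is a hypothetical helper, not a BioPerl, Chado or GBrowse API. The wrapped stop is computed from `end` as ((end - 1) % genome) + 1, which yields 60 for the 100 bp example; the expression `length % genome + 1` evaluates to 12 for the same numbers (length 211), so the sketch does not use it.

```perl
use strict;
use warnings;

# Hypothetical helper: length, whole revolutions and wrapped stop position
# for a feature on a circular genome. Coordinates are 1-based, inclusive;
# end may be larger than the genome length.
sub circular_span {
    my ($start, $end, $genome_len) = @_;
    die "start must not exceed end\n"    if $start > $end;
    die "start must lie on the genome\n" unless $start >= 1 && $start <= $genome_len;

    my $length      = $end - $start + 1;            # bases covered
    my $revolutions = int($length / $genome_len);   # whole genome lengths covered
    my $stop        = (($end - 1) % $genome_len) + 1;   # end wrapped onto 1..genome_len

    return ($length, $revolutions, $stop);
}

# Lincoln's example: 100 bp genome, feature from 50 to 260.
my ($len, $revs, $stop) = circular_span(50, 260, 100);
print "length=$len revolutions=$revs stop=$stop\n";   # length=211 revolutions=2 stop=60
```

A feature that crosses the origin once, such as 90..110 on the same genome, gives 0 revolutions and a stop of 10.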
> > -Aaron > > On Mon, Sep 8, 2008 at 2:44 PM, Jim Hu wrote: > > In discussions with GMOD about Gbrowse, we've come up with a proposal for > > handling circular genomes and features that cross the origin in such > > genomes. This applies to lots of prokaryotic and viral genomes, and > might > > be valuable for some ways of representing terminally redundant linear > > genomes. > > 1) Keep the requirement that start < end > > 2) allow end > parent feature length > > 3) parent feature gets an is_circular boolean > > 4) use modular arithmetic to calculate the real position of end on the > > parent feature. > > We'd like to do this in a way that will be consistent with Chado and > BioPerl > > representation of features as much as possible (realizing that there is > the > > usual interbase or not coordinate issue). What do people think? Lincoln > is > > on board for modifying the GFF3 spec. > > Thanks! > > Jim Hu > > > > ===================================== > > > > Jim Hu > > > > Associate Professor > > > > Dept. of Biochemistry and Biophysics > > > > 2128 TAMU > > > > Texas A&M Univ. > > > > College Station, TX 77843-2128 > > > > 979-862-4054 > > -- Lincoln D.
Stein Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Stacey Quinn Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 USA (516) 367-8380 Assistant: Sandra Michelsen From cjfields at illinois.edu Tue Sep 9 14:24:49 2008 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 9 Sep 2008 13:24:49 -0500 Subject: [Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl In-Reply-To: References: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> <24c96eca0809081257l16461b23uaefe8154ed038bea@mail.gmail.com> Message-ID: Is there any particular reason we don't treat this similarly to the way BioPerl does, which is to simply treat the origin-overlapping feature as a split location? GenBank treats this similarly. For a faux example, the bug I just fixed in Bugzilla has one: http://bugzilla.open-bio.org/show_bug.cgi?id=2579 An actual GenBank case is the Sulfolobus solfataricus genome (NC_002754), and I'm sure Jim could come up with more. The only caveat is whether we should represent this As for multiple revolutions, I'm not sure the hand-wringing about specifics is worth it unless we have explicit workable examples to test against (preferably examples which would potentially pop up), but Lincoln's proposal sounds fine. chris On Sep 9, 2008, at 11:05 AM, Jim Hu wrote: > Hi Aaron, > > I was thinking this would be handled by making the end=parent > feature length x 2 + end coord. end/parent length = number of times > crosses origin. > > Jim > > On Sep 8, 2008, at 2:57 PM, Aaron Mackey wrote: > >> How can you handle features that may cross the origin more than once? >> The modulus, though simple, seems to be only half the solution. It >> also makes it difficult to place features in the genome "by eye" >> (having to do the modulus subtraction in my head), or in >> sorting/filtering operations.
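The split-location approach can be sketched in plain Perl without BioPerl. A feature written in unrolled coordinates (end greater than the genome length) is cut into linear segments that never cross the origin, and these are printed as a GenBank-style join(). The helper names are hypothetical, and the sketch assumes 1-based inclusive coordinates on the forward strand. In BioPerl itself the segments would typically be held by a Bio::Location::Split object.

```perl
use strict;
use warnings;

# Hypothetical helper: cut a feature given in unrolled coordinates
# (end may exceed the genome length) into linear segments that never
# cross the origin. 1-based, inclusive, forward strand assumed.
sub origin_split {
    my ($start, $end, $genome_len) = @_;
    die "start must not exceed end\n"    if $start > $end;
    die "start must lie on the genome\n" unless $start >= 1 && $start <= $genome_len;

    my @segments;
    my $seg_start = $start;
    my $remaining = $end - $start + 1;             # bases still to place
    while ($remaining > 0) {
        my $room = $genome_len - $seg_start + 1;   # bases left before the origin
        my $take = $remaining < $room ? $remaining : $room;
        push @segments, [ $seg_start, $seg_start + $take - 1 ];
        $remaining -= $take;
        $seg_start = 1;                            # continue from position 1
    }
    return @segments;
}

# Render segments as a GenBank-style location string.
sub as_genbank_join {
    my @segments = @_;
    my $body = join ',', map { sprintf '%d..%d', @$_ } @segments;
    return @segments > 1 ? "join($body)" : $body;
}

print as_genbank_join(origin_split(1950, 2040, 2000)), "\n";   # join(1950..2000,1..40)
print as_genbank_join(origin_split(50, 260, 100)), "\n";       # join(50..100,1..100,1..60)
```

A feature that crosses the origin once produces two segments. A multi-revolution feature produces one full-length 1..L segment for each extra pass.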
>> >> I have an alternative that I wondered if you considered: allow the >> start/end to have an additional "circular revolution" prefix: >> >> a typical range tuple like: 100 200 - >> is thus shorthand for: 0:100 0:200 - >> (i.e. both the 100 and 200 are in the same "revolution" around the >> genome) >> >> and is then distinguishable from an "around the genome + 100" >> feature of: >> 1:100 0:200 - >> >> Just an alternative to consider (if you haven't already). I'm not >> wedded to the syntax, but I wouldn't want to see new columns in GFF >> just for this. Essentially, what you want is some form of compound >> polar coordinates, it seems. >> >> -Aaron >> >> On Mon, Sep 8, 2008 at 2:44 PM, Jim Hu wrote: >>> In discussions with GMOD about Gbrowse, we've come up with a >>> proposal for >>> handling circular genomes and features that cross the origin in such >>> genomes. This applies to lots of prokaryotic and viral genomes, >>> and might >>> be valuable for some ways of representing terminally redundant >>> linear >>> genomes. >>> 1) Keep the requirement that start < end >>> 2) allow end > parent feature length >>> 3) parent feature gets an is_circular boolean >>> 4) use modular arithmetic to calculate the real position of end on >>> the >>> parent feature. >>> We'd like to do this in a way that will be consistent with Chado >>> and BioPerl >>> representation of features as much as possible (realizing that >>> there is the >>> usual interbase or not coordinate issue). What do people think? >>> Lincoln is >>> on board for modifying the GFF3 spec. >>> Thanks! >>> Jim Hu >>> >>> ===================================== >>> >>> Jim Hu >>> >>> Associate Professor >>> >>> Dept. of Biochemistry and Biophysics >>> >>> 2128 TAMU >>> >>> Texas A&M Univ. 
>>> >>> College Station, TX 77843-2128 >>> >>> 979-862-4054 > > ===================================== > Jim Hu > Associate Professor > Dept. of Biochemistry and Biophysics > 2128 TAMU > Texas A&M Univ. > College Station, TX 77843-2128 > 979-862-4054 Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From ajmackey at gmail.com Tue Sep 9 14:48:12 2008 From: ajmackey at gmail.com (Aaron Mackey) Date: Tue, 9 Sep 2008 14:48:12 -0400 Subject: [Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl In-Reply-To: <6dce9a0b0809091052g1c398a84tfe8f89d1bf9132c8@mail.gmail.com> References: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> <24c96eca0809081257l16461b23uaefe8154ed038bea@mail.gmail.com> <6dce9a0b0809091052g1c398a84tfe8f89d1bf9132c8@mail.gmail.com> Message-ID: <24c96eca0809091148l738604a7q13fba54ac05de01c@mail.gmail.com> Right, the modulus calculation continues to work, but for instance, what'll happen when I now ask Gbrowse (or Ensembl) to show me positions 50..260? Will it show me 50 .. 60, 1:100, or "unroll" the genome twice from 50..260 (that'd be a pretty cute trick, by the way!)
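The packing and unpacking discussed in this thread can be sketched in Perl as a round trip between a "revolution:position" pair and a single unrolled coordinate. The helpers are hypothetical. They assume that revolutions count from 0 and positions are 1-based within 1..genome length, so that the single value is revolution x genome length + position. This is one possible reading of the proposed compound notation, not an agreed GFF3 syntax.

```perl
use strict;
use warnings;

# Hypothetical: compound "revolution:position" -> single unrolled coordinate.
sub pack_coord {
    my ($compound, $genome_len) = @_;
    my ($rev, $pos) = $compound =~ /^(\d+):(\d+)$/
        or die "not a revolution:position pair: $compound\n";
    die "position out of range: $pos\n" unless $pos >= 1 && $pos <= $genome_len;
    return $rev * $genome_len + $pos;
}

# Hypothetical: single unrolled coordinate -> compound "revolution:position".
sub unpack_coord {
    my ($coord, $genome_len) = @_;
    die "coordinate must be >= 1\n" unless $coord >= 1;
    my $rev = int(($coord - 1) / $genome_len);
    my $pos = (($coord - 1) % $genome_len) + 1;
    return "$rev:$pos";
}

# On a 100 bp genome, the unrolled end coordinate 260 is position 60 on revolution 2.
print unpack_coord(260, 100), "\n";    # 2:60
print pack_coord('2:60', 100), "\n";   # 260
```

The conversion is lossless in both directions. The open question in the thread is therefore about readability and explicitness in the exchange format, not about arithmetic.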
You're (re)using simple arithmetic to compress a compound coordinate into a single-valued coordinate (which I realize can be trivially packed and unpacked by software), but I worry about the downstream consequences of software having to always remember that the coordinates given may have to be unpacked or not, and not being able to immediately identify whether "260" is a real or compound coordinate. To say it another way, I'm happy (that is, don't care much) whether Chado or any other underlying data storage uses such compound coordinates, because only Chado-reliant tools will need to care; but I do worry about GFF3 as a (relatively) simple exchange format having that kind of silent bug-causing complexity. I'd much rather see GFF be syntactically explicit, and not quite so cleverly implicit. Just one GFF user's two cents, thanks for listening, -Aaron On Tue, Sep 9, 2008 at 1:52 PM, Lincoln Stein wrote: > It seems to me that the proposed modulus syntax handles multiple > revolutions. Consider a 100 bp genome (to make it simple) and a feature that > starts at 50, goes around twice, and ends at position 60: > > start = 50 > end = 260 > > length = end - start + 1 > revolutions = int (length/genome) > stop position = length % genome + 1 > > Lincoln > > On Mon, Sep 8, 2008 at 3:57 PM, Aaron Mackey wrote: >> >> How can you handle features that may cross the origin more than once? >> The modulus, though simple, seems to be only half the solution. It >> also makes it difficult to place features in the genome "by eye" >> (having to do the modulus subtraction in my head), or in >> sorting/filtering operations. >> >> I have an alternative that I wondered if you considered: allow the >> start/end to have an additional "circular revolution" prefix: >> >> a typical range tuple like: 100 200 - >> is thus shorthand for: 0:100 0:200 - >> (i.e. 
both the 100 and 200 are in the same "revolution" around the genome) >> >> and is then distinguishable from an "around the genome + 100" feature of: >> 1:100 0:200 - >> >> Just an alternative to consider (if you haven't already). I'm not >> wedded to the syntax, but I wouldn't want to see new columns in GFF >> just for this. Essentially, what you want is some form of compound >> polar coordinates, it seems. >> >> -Aaron >> >> On Mon, Sep 8, 2008 at 2:44 PM, Jim Hu wrote: >> > In discussions with GMOD about Gbrowse, we've come up with a proposal >> > for >> > handling circular genomes and features that cross the origin in such >> > genomes. This applies to lots of prokaryotic and viral genomes, and >> > might >> > be valuable for some ways of representing terminally redundant linear >> > genomes. >> > 1) Keep the requirement that start < end >> > 2) allow end > parent feature length >> > 3) parent feature gets an is_circular boolean >> > 4) use modular arithmetic to calculate the real position of end on the >> > parent feature. >> > We'd like to do this in a way that will be consistent with Chado and >> > BioPerl >> > representation of features as much as possible (realizing that there is >> > the >> > usual interbase or not coordinate issue). What do people think? >> > Lincoln is >> > on board for modifying the GFF3 spec. >> > Thanks! >> > Jim Hu >> > >> > ===================================== >> > >> > Jim Hu >> > >> > Associate Professor >> > >> > Dept. of Biochemistry and Biophysics >> > >> > 2128 TAMU >> > >> > Texas A&M Univ. 
>> > >> > College Station, TX 77843-2128 >> > >> > 979-862-4054 > > > > -- > Lincoln D. Stein > > Ontario Institute for Cancer Research > 101 College St., Suite 800 > Toronto, ON, Canada M5G0A3 > 416 673-8514 > Assistant: Stacey Quinn > > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 USA > (516) 367-8380 > Assistant: Sandra Michelsen > From cjfields at illinois.edu Tue Sep 9 14:49:13 2008 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 9 Sep 2008 13:49:13 -0500 Subject: [Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl In-Reply-To: References: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> <24c96eca0809081257l16461b23uaefe8154ed038bea@mail.gmail.com>

Message-ID: <8C6E5EBC-25B1-41E3-BB89-FD3C228A6B70@illinois.edu> Sent just a bit too early! On Sep 9, 2008, at 1:24 PM, Chris Fields wrote: > Is there any particular reason we don't treat this similarly to the > way BioPerl does, which is to simply treat the origin-overlapping > feature as a split location? GenBank treats this similarly. For a > faux example, the bug I just fixed in Bugzilla has one: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2579 > > An actual GenBank case is the Sulfolobus solfataricus genome > (NC_002754), and I'm sure Jim could come up with more. The only > caveat is whether we should represent this ... as a 'special case' for features overlapping the origin in a circular sequence. > As for multiple revolutions, I'm not sure the hand-wringing about > specifics is worth it unless we have explicit workable examples to > test against (preferably examples which would potentially pop up), > but Lincoln's proposal sounds fine. > > chris From Russell.Smithies at agresearch.co.nz Tue Sep 9 16:46:26 2008 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 10 Sep 2008 08:46:26 +1200 Subject: [Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl In-Reply-To: <6dce9a0b0809091052g1c398a84tfe8f89d1bf9132c8@mail.gmail.com> References: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu><24c96eca0809081257l16461b23uaefe8154ed038bea@mail.gmail.com> <6dce9a0b0809091052g1c398a84tfe8f89d1bf9132c8@mail.gmail.com> Message-ID: Excuse my ignorance (I'm not a biologist), but is it biologically possible or likely for a gene or feature to wrap more than once around a genome? Anyone got an example? Russell Smithies Bioinformatics Applications Developer Invermay Research Centre Puddle Alley, Mosgiel, New Zealand > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lincoln Stein > Sent: Wednesday, 10 September 2008 5:53 a.m.
> To: Aaron Mackey > Cc: GMOD Schema List; Jim Hu; Roy Welch; bioperl-l at bioperl.org; Mike Gribskov > Subject: Re: [Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl > > It seems to me that the proposed modulus syntax handles multiple > revolutions. Consider a 100 bp genome (to make it simple) and a feature that > starts at 50, goes around twice, and ends at position 60: > > start = 50 > end = 260 > > length = end - start + 1 > revolutions = int (length/genome) > stop position = length % genome + 1 > > Lincoln > > On Mon, Sep 8, 2008 at 3:57 PM, Aaron Mackey wrote: > > > How can you handle features that may cross the origin more than once? > > The modulus, though simple, seems to be only half the solution. It > > also makes it difficult to place features in the genome "by eye" > > (having to do the modulus subtraction in my head), or in > > sorting/filtering operations. > > > > I have an alternative that I wondered if you considered: allow the > > start/end to have an additional "circular revolution" prefix: > > > > a typical range tuple like: 100 200 - > > is thus shorthand for: 0:100 0:200 - > > (i.e. both the 100 and 200 are in the same "revolution" around the genome) > > > > and is then distinguishable from an "around the genome + 100" feature of: > > 1:100 0:200 - > > > > Just an alternative to consider (if you haven't already). I'm not > > wedded to the syntax, but I wouldn't want to see new columns in GFF > > just for this. Essentially, what you want is some form of compound > > polar coordinates, it seems. > > > > -Aaron > > > > On Mon, Sep 8, 2008 at 2:44 PM, Jim Hu wrote: > > > In discussions with GMOD about Gbrowse, we've come up with a proposal for > > > handling circular genomes and features that cross the origin in such > > > genomes. This applies to lots of prokaryotic and viral genomes, and > > might > > > be valuable for some ways of representing terminally redundant linear > > > genomes. 
> > > 1) Keep the requirement that start < end > > > 2) allow end > parent feature length > > > 3) parent feature gets an is_circular boolean > > > 4) use modular arithmetic to calculate the real position of end on the > > > parent feature. > > > We'd like to do this in a way that will be consistent with Chado and > > BioPerl > > > representation of features as much as possible (realizing that there is > > the > > > usual interbase or not coordinate issue). What do people think? Lincoln > > is > > > on board for modifying the GFF3 spec. > > > Thanks! > > > Jim Hu > > > > > > ===================================== > > > > > > Jim Hu > > > > > > Associate Professor > > > > > > Dept. of Biochemistry and Biophysics > > > > > > 2128 TAMU > > > > > > Texas A&M Univ. > > > > > > College Station, TX 77843-2128 > > > > > > 979-862-4054 > > > > > > -- > Lincoln D.
Stein > > Ontario Institute for Cancer Research > 101 College St., Suite 800 > Toronto, ON, Canada M5G0A3 > 416 673-8514 > Assistant: Stacey Quinn > > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 USA > (516) 367-8380 > Assistant: Sandra Michelsen ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From cjm at berkeleybop.org Tue Sep 9 18:56:55 2008 From: cjm at berkeleybop.org (Chris Mungall) Date: Tue, 9 Sep 2008 15:56:55 -0700 Subject: [Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl In-Reply-To: <6dce9a0b0809091052g1c398a84tfe8f89d1bf9132c8@mail.gmail.com> References: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> <24c96eca0809081257l16461b23uaefe8154ed038bea@mail.gmail.com> <6dce9a0b0809091052g1c398a84tfe8f89d1bf9132c8@mail.gmail.com> Message-ID: <18A0C3BE-ED04-4494-9231-B945158D6CE4@berkeleybop.org> I think I am happy with the modulo approach, though I believe we first of all need a formal specification of genome interval semantics that is independent of any particular syntax or implementation.
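One way to state interval semantics independently of any file syntax is as a pair of small functions over interbase coordinates. This Perl sketch assumes the usual mapping from a 1-based inclusive GFF start/end to Chado-style interbase fmin/fmax (fmin = start - 1, fmax = end). It also assumes a hypothetical is_circular flag on the source sequence. Revolutions are counted only when that flag is set and are 0 otherwise. The function names are illustrative only.

```perl
use strict;
use warnings;

# Map a 1-based, inclusive GFF start/end to interbase (fmin, fmax),
# the convention used by Chado's featureloc table.
sub gff_to_interbase {
    my ($start, $end) = @_;
    return ($start - 1, $end);
}

# Context-dependent revolution count: only defined as non-zero when the
# source sequence carries the (hypothetical) is_circular flag.
sub revolutions {
    my (%args) = @_;    # keys: fmin, fmax, seqlen, is_circular
    return 0 unless $args{is_circular};
    my $length = $args{fmax} - $args{fmin};    # interbase length, no "+ 1" needed
    return int($length / $args{seqlen});
}

my ($fmin, $fmax) = gff_to_interbase(50, 260);   # (49, 260)
print revolutions(fmin => $fmin, fmax => $fmax, seqlen => 100, is_circular => 1), "\n";   # 2
print revolutions(fmin => $fmin, fmax => $fmax, seqlen => 100, is_circular => 0), "\n";   # 0
```

In interbase form, the feature length is simply fmax - fmin. This is one reason an interbase normative version keeps derived rules such as the revolution count free of off-by-one corrections.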
This can be a fairly short specification, along the lines of what Lincoln has written below (although I would naturally prefer the normative version to be interbase; this doesn't preclude derived axioms in GFF coordinates). This spec should also define and standardize the terminology used: Lincoln draws a distinction between 'stop' and 'end'. I'm relatively happy with these terms; however, the choice we make needs to become enshrined, otherwise we'll end up with confusion and mismatches between software and specification. One clarification: > revolutions = int (length/genome) This axiom is presumably contextual on the genome being circular, which will have to be indicated using a new flag, as Jim suggests, yep? So the context-independent axiom would be: > revolutions = IF src_is_circular THEN int (length/genome) ELSE 0 On Sep 9, 2008, at 10:52 AM, Lincoln Stein wrote: > It seems to me that the proposed modulus syntax handles multiple > revolutions. Consider a 100 bp genome (to make it simple) and a > feature that > starts at 50, goes around twice, and ends at position 60: > > start = 50 > end = 260 > > length = end - start + 1 > revolutions = int (length/genome) > stop position = length % genome + 1 > > Lincoln > > On Mon, Sep 8, 2008 at 3:57 PM, Aaron Mackey > wrote: > >> How can you handle features that may cross the origin more than once? >> The modulus, though simple, seems to be only half the solution. It >> also makes it difficult to place features in the genome "by eye" >> (having to do the modulus subtraction in my head), or in >> sorting/filtering operations. >> >> I have an alternative that I wondered if you considered: allow the >> start/end to have an additional "circular revolution" prefix: >> >> a typical range tuple like: 100 200 - >> is thus shorthand for: 0:100 0:200 - >> (i.e.
both the 100 and 200 are in the same "revolution" around the >> genome) >> >> and is then distinguishable from an "around the genome + 100" >> feature of: >> 1:100 0:200 - >> >> Just an alternative to consider (if you haven't already). I'm not >> wedded to the syntax, but I wouldn't want to see new columns in GFF >> just for this. Essentially, what you want is some form of compound >> polar coordinates, it seems. >> >> -Aaron >> >> On Mon, Sep 8, 2008 at 2:44 PM, Jim Hu wrote: >>> In discussions with GMOD about Gbrowse, we've come up with a >>> proposal for >>> handling circular genomes and features that cross the origin in such >>> genomes. This applies to lots of prokaryotic and viral genomes, and >> might >>> be valuable for some ways of representing terminally redundant >>> linear >>> genomes. >>> 1) Keep the requirement that start < end >>> 2) allow end > parent feature length >>> 3) parent feature gets an is_circular boolean >>> 4) use modular arithmetic to calculate the real position of end on >>> the >>> parent feature. >>> We'd like to do this in a way that will be consistent with Chado and >> BioPerl >>> representation of features as much as possible (realizing that >>> there is >> the >>> usual interbase or not coordinate issue). What do people think? >>> Lincoln >> is >>> on board for modifying the GFF3 spec. >>> Thanks! >>> Jim Hu >>> >>> ===================================== >>> >>> Jim Hu >>> >>> Associate Professor >>> >>> Dept. of Biochemistry and Biophysics >>> >>> 2128 TAMU >>> >>> Texas A&M Univ. 
>>> College Station, TX 77843-2128
>>> 979-862-4054
>>>
>>> _______________________________________________
>>> Gmod-schema mailing list
>>> Gmod-schema at lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Lincoln D. Stein
>
> Ontario Institute for Cancer Research
> 101 College St., Suite 800
> Toronto, ON, Canada M5G0A3
> 416 673-8514
> Assistant: Stacey Quinn
>
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724 USA
> (516) 367-8380
> Assistant: Sandra Michelsen

From heikki at sanbi.ac.za Wed Sep 10 07:32:06 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 10 Sep 2008 13:32:06 +0200
Subject: [Bioperl-l] phylogeny-trait association methods into BioPerl
Message-ID: <200809101332.07137.heikki@sanbi.ac.za>

FYI,

I've been recently writing code to analyse phylogeny-trait associations. These traits are typically geographical location of the sequence but they can be any phenotypic characters associated with the sequences.

This involves trees, i.e. Bio::Tree::Tree and Bio::Tree::Node objects and strings describing the traits. I've been using tags to store trait values within nodes.
The tag methods are:

Bio::Tree::Node::add_tag_value
Bio::Tree::Node::get_all_tags
Bio::Tree::Node::get_tag_values
Bio::Tree::Node::has_tag
Bio::Tree::Node::remove_all_tags
Bio::Tree::Node::remove_tag

Question: Is there any particular reason why there is no set_tag_value(scalar|@array) method?

I am getting tired of writing:

$node->remove_tag($key);
$node->add_tag_value($key, $_) for @values;

so I am going to implement that unless there are strong objections.

Otherwise it has been smooth sailing. I am going to add Bio::Tree::TreeFunctions::is_binary() and start populating Bio::Tree::Statistics soon with these methods:

ps() - Parsimony Score (PS) from Fitch 1971
ai() - Association index (AI) of Wang et al. 2001
mc() - Monophyletic Clade (MC) size statistics by Salemi et al. 2005
cherries() - number of leaf node pairs

If you have any comments, please feel free to post them here.

-Heikki

--
Heikki Lehvaslaiho heikki at_sanbi _ac _za
Senior Scientist skype: heikki_lehvaslaiho
SANBI, South African National Bioinformatics Institute
University of Western Cape, South Africa
Phone: +27 21 959 2096 FAX: +27 21 959 2512

From hlapp at gmx.net Wed Sep 10 09:44:27 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 10 Sep 2008 07:44:27 -0600
Subject: [Bioperl-l] phylogeny-trait association methods into BioPerl
In-Reply-To: <200809101332.07137.heikki@sanbi.ac.za>
References: <200809101332.07137.heikki@sanbi.ac.za>
Message-ID: <0FA6C4A1-9D83-4850-BC0A-4A55ED528C65@gmx.net>

Sounds great Heikki!

Just FYI, there is a considerable amount of code and packages for comparative analysis in R. You might want to look into the resources linked to from the http://r-phylo.org page. There is also an R-SIG-Phylo special interest group mailing list (should be linked from the aforementioned site).
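Heikki's list only names ps(). The following minimal core-Perl sketch shows the Fitch (1971) small-parsimony count that such a method would compute. It is not the Bio::Tree::Statistics code. The nested-array tree and the names fitch() and parsimony_score() are hypothetical. The sketch scores one trait on a rooted, strictly binary tree, which is the property is_binary() would check.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fitch (1971) small-parsimony count for one character on a rooted,
# binary tree. Hypothetical representation (not Bio::Tree::Node):
# a leaf is its trait state as a plain string, and an internal node
# is an array ref holding exactly two children.
sub fitch {
    my ($node) = @_;

    # Leaf: the candidate set is just its own state, at zero cost.
    return (+{ $node => 1 }, 0) unless ref $node;

    die "Fitch scoring expects a binary tree\n" unless @$node == 2;
    my ($left_set,  $left_cost)  = fitch($node->[0]);
    my ($right_set, $right_cost) = fitch($node->[1]);

    # Non-empty intersection: keep it, no extra state change.
    my %common = map { $_ => 1 } grep { $right_set->{$_} } keys %$left_set;
    return (\%common, $left_cost + $right_cost) if %common;

    # Empty intersection: take the union and count one change.
    my %union = (%$left_set, %$right_set);
    return (\%union, $left_cost + $right_cost + 1);
}

# Minimum number of state changes needed to explain the leaf states.
sub parsimony_score {
    my ($tree) = @_;
    my (undef, $score) = fitch($tree);
    return $score;
}

# ((A,A),(B,(A,B))), e.g. A/B = two sampling locations.
my $tree = [ [ 'A', 'A' ], [ 'B', [ 'A', 'B' ] ] ];
print parsimony_score($tree), "\n";    # prints 2
```

The example tree needs two state changes: one from A to B for the right-hand clade, and one back to A for the single A leaf inside it.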
-hilmar

On Sep 10, 2008, at 5:32 AM, Heikki Lehvaslaiho wrote:

> FYI,
>
> I've been recently writing code to analyse phylogeny-trait associations. These
> traits are typically geographical location of the sequence but they can be any
> phenotypic characters associated with the sequences.
>
> This involves trees, i.e. Bio::Tree::Tree and Bio::Tree::Node objects and
> strings describing the traits. I've been using tags to store trait values
> within nodes. The tag methods are:
>
> Bio::Tree::Node::add_tag_value
> Bio::Tree::Node::get_all_tags
> Bio::Tree::Node::get_tag_values
> Bio::Tree::Node::has_tag
> Bio::Tree::Node::remove_all_tags
> Bio::Tree::Node::remove_tag
>
> Question: Is there any particular reason why there is no
> set_tag_value(scalar|@array) method?
>
> I am getting tired of writing:
> $node->remove_tag($key);
> $node->add_tag_value($key, $_) for @values;
> so I am going to implement that unless there are strong objections.
>
> Otherwise it has been smooth sailing. I am going to add
> Bio::Tree::TreeFunctions::is_binary() and start populating
> Bio::Tree::Statistics soon with these methods:
>
> ps() - Parsimony Score (PS) from Fitch 1971
> ai() - Association index (AI) of Wang et al. 2001
> mc() - Monophyletic Clade (MC) size statistics by Salemi et al. 2005
> cherries() - number of leaf node pairs
>
> If you have any comments, please feel free to post them here.
>
> -Heikki

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From Caroline.Johnston at iop.kcl.ac.uk Wed Sep 10 11:43:26 2008
From: Caroline.Johnston at iop.kcl.ac.uk (Johnston, Caroline)
Date: Wed, 10 Sep 2008 16:43:26 +0100
Subject: [Bioperl-l] seqs, seqfeatures, locations etc
Message-ID: <5626ED9CB91C814197079FB9940312E506445F81@MAIL.bc.iop.kcl.ac.uk>

Hello,

I'm trying to get my head around the various classes for storing sequences, features and locations and was hoping someone could give me some implementation advice:

I've got a Bio::EnsEMBL::Slice and I want to turn it into a Bio::Seq or SeqFeature object, with Bio::SeqFeature::Gene::GeneStructure/Transcript/Exon info attached. I can create a Bio::Seq fine, but I also need to keep track of the chromosomal co-ordinates (chr, start, end, species, strand, genome release, database name) and I can't figure out how to store this in Bioperl. I was thinking that what I needed was some extension of a standard Bio::Seq to have genome-coordinate data attached and associated methods to translate the SeqFeature positions (relative to the Bio::Seq) to genome positions.
I guess it's probably already possible to store this type of info in some collection of Bioperl objects, but between Bioperl and the EnsEMBL API I'm getting lost in perl modules. Can someone point me in the right direction?

Thanks,
Cass

From bosborne11 at verizon.net Wed Sep 10 13:34:35 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 10 Sep 2008 13:34:35 -0400
Subject: [Bioperl-l] seqs, seqfeatures, locations etc
In-Reply-To: <5626ED9CB91C814197079FB9940312E506445F81@MAIL.bc.iop.kcl.ac.uk>
References: <5626ED9CB91C814197079FB9940312E506445F81@MAIL.bc.iop.kcl.ac.uk>
Message-ID:

Cass,

There is a HOWTO about these Bioperl objects:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

I think it addresses your questions.

Brian O.

On Sep 10, 2008, at 11:43 AM, Johnston, Caroline wrote:

> Hello,
>
> I'm trying to get my head around the various classes for storing
> sequences, features and locations and was hoping someone could give
> me some implementation advice:
>
> I've got a Bio::EnsEMBL::Slice and I want to turn it into a Bio::Seq
> or SeqFeature object, with Bio::SeqFeature::Gene::GeneStructure/
> Transcript/Exon info attached. I can create a Bio::Seq fine, but I
> also need to keep track of the chromosomal co-ordinates (chr, start,
> end, species, strand, genome release, database name) and I can't
> figure out how to store this in Bioperl. I was thinking that what I
> needed was some extension of a standard Bio::Seq to have genome-
> coordinate data attached and associated methods to translate the
> SeqFeature positions (relative to the Bio::Seq) to genome positions.
> I guess it's probably already possible to store this type of info in
> some collection of Bioperl objects, but between Bioperl and the
> EnsEMBL API I'm getting lost in perl modules. Can someone point me
> in the right direction?
>
> Thanks,
> Cass

From jason at bioperl.org Wed Sep 10 13:44:25 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 10 Sep 2008 10:44:25 -0700
Subject: [Bioperl-l] phylogeny-trait association methods into BioPerl
In-Reply-To: <200809101332.07137.heikki@sanbi.ac.za>
References: <200809101332.07137.heikki@sanbi.ac.za>
Message-ID: <24F25525-0394-45BE-859F-A5735DE610DF@bioperl.org>

Those are just taken from the Bio::SeqFeature::Generic tag manipulation methods, so please feel free to add a better one - maybe you can propagate that new method to Bio::SeqFeature::Generic as well?

It would be nice to see those other methods added as well, so I am glad to see them.

-jason

On Sep 10, 2008, at 4:32 AM, Heikki Lehvaslaiho wrote:

> FYI,
>
> I've been recently writing code to analyse phylogeny-trait associations. These
> traits are typically geographical location of the sequence but they can be any
> phenotypic characters associated with the sequences.
>
> This involves trees, i.e. Bio::Tree::Tree and Bio::Tree::Node objects and
> strings describing the traits. I've been using tags to store trait values
> within nodes. The tag methods are:
>
> Bio::Tree::Node::add_tag_value
> Bio::Tree::Node::get_all_tags
> Bio::Tree::Node::get_tag_values
> Bio::Tree::Node::has_tag
> Bio::Tree::Node::remove_all_tags
> Bio::Tree::Node::remove_tag
>
> Question: Is there any particular reason why there is no
> set_tag_value(scalar|@array) method?
>
> I am getting tired of writing:
> $node->remove_tag($key);
> $node->add_tag_value($key, $_) for @values;
> so I am going to implement that unless there are strong objections.
>
> Otherwise it has been smooth sailing.
> I am going to add
> Bio::Tree::TreeFunctions::is_binary() and start populating
> Bio::Tree::Statistics soon with these methods:
>
> ps() - Parsimony Score (PS) from Fitch 1971
> ai() - Association index (AI) of Wang et al. 2001
> mc() - Monophyletic Clade (MC) size statistics by Salemi et al. 2005
> cherries() - number of leaf node pairs
>
> If you have any comments, please feel free to post them here.
>
> -Heikki

Jason Stajich
jason at bioperl.org

From philsf79 at gmail.com Thu Sep 11 04:40:26 2008
From: philsf79 at gmail.com (Felipe Figueiredo)
Date: Thu, 11 Sep 2008 05:40:26 -0300
Subject: [Bioperl-l] difference in opening file from @ARGV and STDIN?
Message-ID: <1221122426.8059.49.camel@localhost>

I'm not sure if this is related to bioperl (Bio::AlignIO) or if it's a general perl error on my part, but I find it strange that the following code gives different results depending on how I input the alignment:

--- test.pl ---
#!/usr/bin/perl
use warnings;
use strict;
use Bio::AlignIO;

my $file;
if (@ARGV) {
    $file = shift @ARGV;
} else {
    $file = "-";
}
my $align = Bio::AlignIO->new(-file=>$file)->next_aln;
printf "Sequences: %s\n",$align->no_sequences;
--- test.pl ---

If I run this using a file containing 4 sequences, the following happens:

--- run tests ---
$ ./test.pl exemplo-alinhamento.fasta
Sequences: 4
$ ./test.pl < exemplo-alinhamento.fasta
Sequences: 3
$ cat exemplo-alinhamento.fasta | ./test.pl
Sequences: 3
--- run tests ---

The missing sequence is always the first one. Am I missing something? Is my code for reading stdin mistaken, or is it a bug in Bio::AlignIO?

I'm using bioperl 1.5.2.102-1ubuntu1, in Ubuntu 8.04 Hardy.

best regards
FF

From David.Messina at sbc.su.se Thu Sep 11 07:21:51 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 11 Sep 2008 13:21:51 +0200
Subject: [Bioperl-l] difference in opening file from @ARGV and STDIN?
In-Reply-To: <1221122426.8059.49.camel@localhost>
References: <1221122426.8059.49.camel@localhost>
Message-ID: <628aabb70809110421x3668606dx4a64a203ab0ff9d2@mail.gmail.com>

Hi Felipe,

Specifying STDIN via a '-' argument to the -file parameter is not valid. While that is a convention with some UNIX tools, it's not, as far as I know, something you should be able to count on.

In BioPerl, one can specify STDIN by passing the \*STDIN filehandle glob to the -fh parameter (NOT to -file). In other words,

my $align = Bio::AlignIO->new(-fh => \*STDIN)->next_aln;

That is a convention in BioPerl, so the -file and -fh parameters should work the same way in AlignIO, SearchIO, SeqIO, etc.

Take a look at the beginners' HOWTO for some examples.
http://www.bioperl.org/wiki/HOWTO:Beginners

Dave

From bosborne11 at verizon.net Thu Sep 11 11:01:32 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 11 Sep 2008 11:01:32 -0400
Subject: [Bioperl-l] SeqHound
Message-ID: <0E295726-33FE-4083-98A1-9035E37455BD@verizon.net>

Raul,

Good question about BIND, I don't know if the public version is up-to-date. For the latest protein-protein interaction data I look at public databases like IntAct. If you use bioperl-network you should be able to read IntAct data into a graph, then find the interactions that you're interested in in that graph. There are a few qualifications to this statement though, like whether you have the "right" identifiers or names.

So you're right, bioperl-network constructs graphs from the XML files, but the interactions you want are in those graphs. Something like:

use Bio::Network::IO;
my $graphio = Bio::Network::IO->new(-file => 'human.xml',
                                    -format => 'psi25');
my $graph = $graphio->next_network();
my $node = $graph->get_nodes_by_id('UniProt:P12345');
my @neighbors = $graph->neighbors($node);

Brian O.

On Sep 11, 2008, at 7:32 AM, Raul Mendez Giraldez wrote:

> Hi Brian,
>
> Actually I realized later that SeqHound is a part of Bioperl itself, and
> that BIND (at least the public database) is reachable through the
> BIND SOAP protocol, which can be implemented in perl through the module
> SOAP::Lite. I still don't know whether the public BIND database is out
> of date, or which part of the BOND database it covers.
>
> Regarding the Bio::Network packages in the Bioperl suite, I guess they are
> rather for representing protein - protein interaction graphs, aren't they?
> That could be interesting to me, but in a second step. I am more
> concerned now with getting this protein - protein interaction data for a
> set of proteins some biologists gave me. I don't know anything about
> Cytoscape; normally I'm trying to exploit perl data management
> capabilities.
>
> Thanks for the info.
>
> Cheers,
>
> Raul

From MEC at stowers-institute.org Thu Sep 11 14:01:01 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 11 Sep 2008 13:01:01 -0500
Subject: [Bioperl-l] difference in opening file from @ARGV and STDIN?
In-Reply-To: <628aabb70809110421x3668606dx4a64a203ab0ff9d2@mail.gmail.com>
References: <1221122426.8059.49.camel@localhost> <628aabb70809110421x3668606dx4a64a203ab0ff9d2@mail.gmail.com>
Message-ID:

Felipe and Dave,

I find that the following works generically for SeqIO and AlignIO (at least)...

use Bio::AlignIO;
# after processing all options using GetOpt,
# any remaining arguments should name files to process...
@ARGV = ('-') unless @ARGV; # Default to standard input
my %inopt;
$inopt{-fh} ||= \*ARGV;
my $AlignIO = Bio::AlignIO->new( %inopt )
    or die "calling Bio::AlignIO->new failed";

--Malcolm

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dave Messina
Sent: Thursday, September 11, 2008 6:22 AM
To: Felipe Figueiredo
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] difference in opening file from @ARGV and STDIN?

Hi Felipe,

Specifying STDIN via a '-' argument to the -file parameter is not valid. While that is a convention with some UNIX tools, it's not, as far as I know, something you should be able to count on.

In BioPerl, one can specify STDIN by passing the \*STDIN filehandle glob to the -fh parameter (NOT to -file). In other words,

my $align = Bio::AlignIO->new(-fh => \*STDIN)->next_aln;

That is a convention in BioPerl, so the -file and -fh parameters should work the same way in AlignIO, SearchIO, SeqIO, etc.

Take a look at the beginners' HOWTO for some examples.
http://www.bioperl.org/wiki/HOWTO:Beginners

Dave

From David.Messina at sbc.su.se Thu Sep 11 14:47:04 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 11 Sep 2008 20:47:04 +0200
Subject: [Bioperl-l] difference in opening file from @ARGV and STDIN?
In-Reply-To:
References: <1221122426.8059.49.camel@localhost> <628aabb70809110421x3668606dx4a64a203ab0ff9d2@mail.gmail.com>
Message-ID: <628aabb70809111147y7d99e1bdh414a4ab1037c0990@mail.gmail.com>

Thanks, Malcolm.

So then, '-' as STDIN does work?

D

From MEC at stowers-institute.org Thu Sep 11 16:19:25 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 11 Sep 2008 15:19:25 -0500
Subject: [Bioperl-l] difference in opening file from @ARGV and STDIN?
In-Reply-To: <628aabb70809111147y7d99e1bdh414a4ab1037c0990@mail.gmail.com>
References: <1221122426.8059.49.camel@localhost> <628aabb70809110421x3668606dx4a64a203ab0ff9d2@mail.gmail.com> <628aabb70809111147y7d99e1bdh414a4ab1037c0990@mail.gmail.com>
Message-ID:

Not exactly the way I would put it. Look at the difference between the first command and the second in the following transcript:

> echo -e ">asdf\natgc\n" | perl -MBio::SeqIO -e 'my $s = Bio::SeqIO->new(-format => qw{fasta}, -fh => \*ARGV); print $ARGV[0] . qq{ has } . $s->next_seq()->seq . qq{\n}' -- '-'
- has atgc
> echo -e ">asdf\natgc\n" | perl -MBio::SeqIO -e 'my $s = Bio::SeqIO->new(-format => qw{fasta}, -fh => \*ARGV); print $ARGV[0] . qq{ has } . $s->next_seq()->seq . qq{\n}' -- 'NoSuchFile'
Can't open NoSuchFile: No such file or directory at /home/mec/cvs/bioperl-live/Bio/Root/IO.pm line 458.
Can't call method "seq" on an undefined value at -e line 1.

The only difference is that @ARGV is the singleton list composed of '-' in the first call, and is the singleton list composed of 'NoSuchFile' in the second.
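The '-' handling can also be made explicit instead of relying on the magic ARGV filehandle. The following is a minimal core-Perl sketch. It assumes each resulting handle is then passed to the BioPerl reader through -fh, as in Dave's `-fh => \*STDIN` example. input_handles() is a hypothetical helper, not a BioPerl function. It uses three-argument open, which treats '-' as an ordinary file name, so the '-' case is mapped to STDIN by hand.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn command-line names into input filehandles.
# An empty list, or the name '-', stands for standard input.
sub input_handles {
    my @names = @_ ? @_ : ('-');
    my @handles;
    for my $name (@names) {
        if ($name eq '-') {
            push @handles, \*STDIN;    # explicit, no open() magic needed
            next;
        }
        # Three-argument open treats '-' as a literal file name,
        # which is why the case above is handled separately.
        open my $fh, '<', $name or die "Can't open $name: $!";
        push @handles, $fh;
    }
    return @handles;
}

# Usage (not run here, since reading STDIN would block):
#   for my $fh (input_handles(@ARGV)) {
#       my $in = Bio::AlignIO->new(-fh => $fh, -format => 'fasta');
#       ...
#   }
```

One difference from the \*ARGV approach is that each input gets its own handle. The files are therefore read as separate streams instead of one concatenated stream. A missing file also fails immediately with a clear message.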
If you passed in a list of multiple files that actually do exist, it should work fine. It is really a matter of ARGV processing magic; from http://perldoc.perl.org/perlop.html:

The null filehandle <> is special: it can be used to emulate the behavior of sed and awk. Input from <> comes either from standard input, or from each file listed on the command line. Here's how it works: the first time <> is evaluated, the @ARGV array is checked, and if it is empty, $ARGV[0] is set to "-", which when opened gives you standard input. The @ARGV array is then processed as a list of filenames. The loop

    while (<>) {
        ...         # code for each line
    }

is equivalent to the following Perl-like pseudo code:

    unshift(@ARGV, '-') unless @ARGV;
    while ($ARGV = shift) {
        open(ARGV, $ARGV);
        while (<ARGV>) {
            ...     # code for each line
        }
    }

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri

________________________________

From: dave at davemessina.com [mailto:dave at davemessina.com] On Behalf Of Dave Messina
Sent: Thursday, September 11, 2008 1:47 PM
To: Cook, Malcolm
Cc: Felipe Figueiredo; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] difference in opening file from @ARGV and STDIN?

Thanks, Malcolm. So then, '-' as STDIN does work?

D

From David.Messina at sbc.su.se Thu Sep 11 17:15:44 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 11 Sep 2008 23:15:44 +0200
Subject: [Bioperl-l] difference in opening file from @ARGV and STDIN?
In-Reply-To:
References: <1221122426.8059.49.camel@localhost> <628aabb70809110421x3668606dx4a64a203ab0ff9d2@mail.gmail.com> <628aabb70809111147y7d99e1bdh414a4ab1037c0990@mail.gmail.com>
Message-ID: <628aabb70809111415w27044d57jd3121adb40900e45@mail.gmail.com>

Cool, thanks for the explanation Malcolm!
At the risk of belaboring this point and your patience, one thing still confuses me, though:

> and if [@ARGV] is empty, $ARGV[0] is set to "-"

If $ARGV[0] is set (by Perl's ARGV processing magic) to '-', then why in your earlier example do you manually set $ARGV[0] to '-' instead of simply leaving @ARGV empty?

@ARGV = ('-') unless @ARGV;

If I run your example and omit '-' as an argument, it still works:

> echo -e ">asdf\natgc\n" | perl -MBio::SeqIO -e 'my $s = Bio::SeqIO->new(-format => qw{fasta}, -fh => \*ARGV); print $ARGV[0] . qq{ has } . $s->next_seq()->seq . qq{\n}'
has atgc

Dave

From acouperthwaite at gmail.com Fri Sep 12 17:17:53 2008
From: acouperthwaite at gmail.com (Andrew Couperthwaite)
Date: Fri, 12 Sep 2008 15:17:53 -0600
Subject: [Bioperl-l] Bio::DB::Query::GenBank question
Message-ID:

Hi,

I'm having difficulty using the Bio::DB::Query::GenBank module. The sample script on the page http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/Query/GenBank.html doesn't seem to work.

I'm trying to use this and the Bio::DB::GenBank module to find and download a set of sequences from GenBank...
I'm rather new to bioperl, can anyone point me in the right direction?

Thanks,
-Andrew

From jason at bioperl.org Sat Sep 13 02:42:28 2008
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 12 Sep 2008 23:42:28 -0700
Subject: [Bioperl-l] Bio::DB::Query::GenBank question
In-Reply-To:
References:
Message-ID: <2CCFFD09-B7B0-4172-A10F-224875A6E968@bioperl.org>

Hi Andrew -

a) what is the exact script code you are trying, what are the error messages?
b) what version of bioperl?

The first thing we'll suggest is: did you get the latest code from SVN yet or a nightly build?

-jason

On Sep 12, 2008, at 2:17 PM, Andrew Couperthwaite wrote:

> Hi,
>
> I'm having difficulty using the Bio::DB::Query::GenBank module.
> The sample script on the page http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/Query/GenBank.html
> doesn't seem to work.
>
> I'm trying to use this and the Bio::DB::GenBank module to find and
> download a set of sequences from GenBank...
> I'm rather new to bioperl, can anyone point me in the right direction?
>
> Thanks,
> -Andrew

Jason Stajich
jason at bioperl.org

From cjfields at illinois.edu Mon Sep 15 00:13:57 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 14 Sep 2008 23:13:57 -0500
Subject: [Bioperl-l] significant bug with Bio::LocatableSeq
Message-ID: <98938314-4AF1-44C2-8867-B383A44E542E@illinois.edu>

While debugging some tests in bioperl, I noticed a fairly significant issue with Bio::LocatableSeq which is probably due to some inconsistencies with start/end coordinates. For some reason this started popping up with error messages recently when running AlignIO tests on bioperl-live (i.e. something changed which exposed the bug, maybe the verbosity level):

1..295
ok 1 - use Bio::AlignIO;
ok 2 - The object isa Bio::AlignIO
ok 3 - The object isa Bio::Align::AlignI
ok 4
ok 5
ok 6 - The object isa Bio::AlignIO

--------------------- WARNING ---------------------
MSG: In sequence 02 residue count gives end value 399.
Overriding value [355] with value 399 for Bio::LocatableSeq::end().
STACK Bio::LocatableSeq::end /Users/cjfields/bioperl/bioperl-live/blib/lib/Bio/LocatableSeq.pm:150
STACK Bio::LocatableSeq::new /Users/cjfields/bioperl/bioperl-live/blib/lib/Bio/LocatableSeq.pm:103
STACK Bio::AlignIO::arp::next_aln /Users/cjfields/bioperl/bioperl-live/blib/lib/Bio/AlignIO/arp.pm:106
STACK toplevel t/AlignIO.t:34
---------------------------------------------------

.... followed by tons of similar errors. The problem is, no change is ever made.
This is demonstrated by the following:

-----------------------------
#!/usr/bin/perl -w
use strict;
use warnings;
use Bio::LocatableSeq;

my $seq = Bio::LocatableSeq->new(
    -id     => 'foo',
    -seq    => 'A----TGCGCTTCCTCGCTTCCG',
    -start  => 10,
    -end    => 100,    # intentionally bad
    -strand => -1);

print $seq->end."\n";
-----------------------------

Results:

--------------------- WARNING ---------------------
MSG: In sequence foo residue count gives end value 28.
Overriding value [100] with value 28 for Bio::LocatableSeq::end().
STACK Bio::LocatableSeq::end /Users/cjfields/bioperl/bioperl-live/Bio/LocatableSeq.pm:150
STACK Bio::LocatableSeq::new /Users/cjfields/bioperl/bioperl-live/Bio/LocatableSeq.pm:103
STACK toplevel seq.pl:7
---------------------------------------------------
100

The warning pops up when -end is passed to LocatableSeq::new and indicates that the passed coordinate doesn't match up with the one calculated from the sequence (minus gaps). I've isolated the bug down to the end() method and am working on fixing it. Note that this affects LocatableSeq::length as well.

This appears to affect arp, nexus, stockholm, and a few other AlignIO parsers as well.
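The check behind the warning is simple arithmetic. With 1-based, inclusive coordinates, a gapped sequence that starts at -start should end at start + (number of non-gap residues) - 1. In the example, 'A----TGCGCTTCCTCGCTTCCG' has 19 residues, so 10 + 19 - 1 = 28, which matches the value reported in the warning. The following is a minimal core-Perl sketch of that computation. It is not the Bio::LocatableSeq code, and it assumes '-' and '.' are the only gap characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Expected end coordinate of a gapped sequence, using 1-based inclusive
# coordinates. Assumption: '-' and '.' are the only gap characters.
sub expected_end {
    my ($start, $gapped_seq) = @_;
    # Count residues, i.e. every character that is not a gap symbol.
    my $residues = () = $gapped_seq =~ /[^.\-]/g;
    return $start + $residues - 1;
}

# Values from the test script above: -start => 10, -end => 100.
my ($start, $claimed_end) = (10, 100);
my $seq = 'A----TGCGCTTCCTCGCTTCCG';
my $end = expected_end($start, $seq);
print "residue count gives end value $end\n";    # prints 28
warn "supplied end $claimed_end disagrees with $end\n" if $end != $claimed_end;
```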
chris

From acouperthwaite at gmail.com Mon Sep 15 15:05:13 2008
From: acouperthwaite at gmail.com (Andrew Couperthwaite)
Date: Mon, 15 Sep 2008 13:05:13 -0600
Subject: [Bioperl-l] Bio::DB::Query::GenBank question
In-Reply-To: <2CCFFD09-B7B0-4172-A10F-224875A6E968@bioperl.org>
References: <2CCFFD09-B7B0-4172-A10F-224875A6E968@bioperl.org>
Message-ID: <2BA47AC9-5A04-460D-BA56-64B152AFFF16@gmail.com>

The code I'm starting with is this:

=====
use Bio::DB::Query::GenBank;
use Bio::DB::GenBank;

my $query_string = 'Oryza[Organism] AND EST[Keyword]';
my $query = Bio::DB::Query::GenBank->new(-query => 'Oryza[Organism] AND EST[Keyword]',
                                         -db => 'nucleotide');
my $count = $query->count;
my @ids = $query->ids;
# get a genbank database handle
my $gb = Bio::DB::GenBank->new();
my $stream = $gb->get_Stream_by_query($query);

while (my $seq = $stream->next_seq) {
    # do something with the sequence object
    print "hello";
}
=====

It doesn't produce any error messages, it simply doesn't enter the while loop. It seems as though it isn't getting any results from GenBank, or isn't contacting GenBank at all.

We are running version 1.5.2. Haven't updated since late May.

-Andrew

On 13-Sep-08, at 12:42 AM, Jason Stajich wrote:

> Hi Andrew -
>
> a) what is the exact script code you are trying, what are the error
> messages?
> b) what version of bioperl?
>
> The first thing we'll suggest is: did you get the latest code from
> SVN yet or a nightly build?
>
> -jason
> On Sep 12, 2008, at 2:17 PM, Andrew Couperthwaite wrote:
>
>> Hi,
>>
>> I'm having difficulty using the Bio::DB::Query::GenBank module.
>> The sample script on the page http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/Query/GenBank.html
>> doesn't seem to work.
>>
>> I'm trying to use this and the Bio::DB::GenBank module to find and
>> download a set of sequences from GenBank...
>> I'm rather new to bioperl, can anyone point me in the right direction?
>>
>> Thanks,
>> -Andrew
>
> Jason Stajich
> jason at bioperl.org

From jason at bioperl.org Mon Sep 15 15:25:49 2008
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 15 Sep 2008 12:25:49 -0700
Subject: [Bioperl-l] Bio::DB::Query::GenBank question
In-Reply-To: <2BA47AC9-5A04-460D-BA56-64B152AFFF16@gmail.com>
References: <2CCFFD09-B7B0-4172-A10F-224875A6E968@bioperl.org> <2BA47AC9-5A04-460D-BA56-64B152AFFF16@gmail.com>
Message-ID: <25A13987-7306-42E8-BE4E-E50A70538E7D@bioperl.org>

Try removing the line

my @ids = $query->ids;

I don't think GenBank supports this part of the query API now, or you have to execute the query first before you can ask for the number of IDs from that object. I don't remember exactly what changed, but I actually get an error with that line:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Id list has been truncated even after maxids requested
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/lib/perl5/Bio/Root/Root.pm:357
STACK: Bio::DB::Query::WebQuery::_fetch_ids /usr/local/lib/perl5/Bio/DB/Query/WebQuery.pm:237
STACK: Bio::DB::Query::WebQuery::ids /usr/local/lib/perl5/Bio/DB/Query/WebQuery.pm:201
STACK: query.pl:9
-----------------------------------------------------------

This is the query script I use that works fine for me for my downloads.
http://fungalgenomes.org/~stajich/scripts/download_query_genbank.pl

I'm using SVN code so it is possible things have changed since 1.5.2.
-jason

On Sep 15, 2008, at 12:05 PM, Andrew Couperthwaite wrote:

> The code I'm starting with is this:
> =====
> use Bio::DB::Query::GenBank;
> use Bio::DB::GenBank;
>
> my $query_string = 'Oryza[Organism] AND EST[Keyword]';
> my $query = Bio::DB::Query::GenBank->new(-query => 'Oryza[Organism] AND EST[Keyword]',
>                                          -db => 'nucleotide');
> my $count = $query->count;
> my @ids = $query->ids;
> # get a genbank database handle
> my $gb = Bio::DB::GenBank->new();
> my $stream = $gb->get_Stream_by_query($query);
>
> while (my $seq = $stream->next_seq) {
>     # do something with the sequence object
>     print "hello";
> }
> =====
> It doesn't produce any error messages, it simply doesn't enter the
> while loop. It seems as though it isn't getting any results from
> GenBank, or isn't contacting GenBank at all.
>
> We are running version 1.5.2. Haven't updated since late May.
>
> -Andrew
>
> On 13-Sep-08, at 12:42 AM, Jason Stajich wrote:
>
>> Hi Andrew -
>>
>> a) what is the exact script code you are trying, what are the error
>> messages?
>> b) what version of bioperl?
>>
>> The first thing we'll suggest is: did you get the latest code from
>> SVN yet or a nightly build?
>>
>> -jason
>> On Sep 12, 2008, at 2:17 PM, Andrew Couperthwaite wrote:
>>
>>> Hi,
>>>
>>> I'm having difficulty using the Bio::DB::Query::GenBank module.
>>> The sample script on the page http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/Query/GenBank.html
>>> doesn't seem to work.
>>>
>>> I'm trying to use this and the Bio::DB::GenBank module to find and
>>> download a set of sequences from GenBank...
>>> I'm rather new to bioperl, can anyone point me in the right direction?
>>>
>>> Thanks,
>>> -Andrew
>>
>> Jason Stajich
>> jason at bioperl.org
>
Jason Stajich
jason at bioperl.org

From biology0046 at hotmail.com Mon Sep 15 16:00:02 2008
From: biology0046 at hotmail.com (=?gb2312?B?va3OxOL9?=)
Date: Mon, 15 Sep 2008 20:00:02 +0000
Subject: [Bioperl-l] HSP tiling problem
In-Reply-To:
References:
Message-ID:

Dear all:

I am trying to use bioperl's SearchIO module to parse BLAST output. The output contains many HSPs from the same hit, and each HSP comes with a count of identical residues and an alignment length, but what I want is the global identity and alignment length of the query sequence and hit sequence, which may be done by tiling the HSPs and building an HSP contig.

The bioperl BLASTutil module contains "hsp_tiling and length_aln, frac_identical" methods that fulfill my purpose, but reading through the mailing list, some users said the method used by bioperl is not precise in many cases, and they suggest using WU-BLAST.

But I don't know which WU-BLAST option I should use. I used the "links" option, but the output still gives me lots of HSPs. Are these HSPs created from some small HSPs?

best regards!
wenkai
chinese academy of sciences

From jason at bioperl.org Mon Sep 15 16:27:02 2008
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 15 Sep 2008 13:27:02 -0700
Subject: [Bioperl-l] HSP tiling problem
In-Reply-To:
References:

Message-ID: <90A30B97-1618-4AAF-86D9-B51929697CDE@bioperl.org> The -links option gives you a logical ordering of non-overlapping HSPs so it is the tiling you desire. You just process each HSP from the list provided by -links. There may be several alternative paths so you can compute a %id for all of them or just take the longest one, etc. You need to just decide how you want to compute global identity - an average of the HSP identities or just sum up the number of identical bases across them all divided by the total length of sequence that is aligned. All those pieces of information are available and described in the SearchIO HOWTO on the website. Or, requiring less coding, re-align the sequences with an aligner like SSEARCH to get a single alignment and a single %id/%sim number for the sequence pair. -jason On Sep 15, 2008, at 1:00 PM, ?????? wrote: > > Dear all: > I try to use bioperl's SearhIO module to parse the blast output, > the output of blast contain many HSPs from the same hit, each HSP > come out with an count of identity residues and alignment length, > but what i want to get is the global identiy and alignment length of > the query sequence and hit sequence, which may be done by tiling the > HSPs and build a HSP contig, > > the bioperl BLASTutil module contain "hsp_tiling and length_aln, > frac_identical" method fullfill my purpose, but i read through the > mailing list, some user said the method used by bioperl were not > precise in many cases, they sugguest to use WUBLAST. > > but i don't know which WUBLAST opinion should i use, i use "links" > opinion, but the output still give me lots of HSPs, are these HSPs > created from some small HSPs?best regards!wenkaichinese academy of > sciences
Jason Stajich jason at bioperl.org From bosborne11 at verizon.net Mon Sep 15 22:22:54 2008 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 15 Sep 2008 22:22:54 -0400 Subject: [Bioperl-l] Bio::DB::Query::GenBank question In-Reply-To: <25A13987-7306-42E8-BE4E-E50A70538E7D@bioperl.org> References: <2CCFFD09-B7B0-4172-A10F-224875A6E968@bioperl.org> <2BA47AC9-5A04-460D-BA56-64B152AFFF16@gmail.com> <25A13987-7306-42E8-BE4E-E50A70538E7D@bioperl.org> Message-ID: <583390B6-1075-4EB0-A22A-E1D07D2EDFA8@verizon.net> Jason, I added that script to scripts/, hope you don't mind! Brian O. On Sep 15, 2008, at 3:25 PM, Jason Stajich wrote: > This is the query script I use that works fine for me for my > downloads. From mirhan at indiana.edu Tue Sep 16 01:18:47 2008 From: mirhan at indiana.edu (Han, Mira) Date: Tue, 16 Sep 2008 01:18:47 -0400 Subject: [Bioperl-l] [Wg-phyloinformatics] Summer of Code: project summaries In-Reply-To: <2033D247-5E34-4BDE-A98E-BE08D9910951@duke.edu> Message-ID: Hi, Just writing to summarize the final product of this summer. (even though it's overdue) The phyloXML support in BioPerl project is finished officially, I've created three new modules, Bio::TreeIO::phyloxml, Bio::Tree::AnnotatableNode, and Bio::Annotation::Relation. The parsing is done for all elements except for And the validation step is still missing. I'm planning to add those in in the future. I was contemplating to make a Bio::Annotation::XML till the end, But decided not to since there was already a new module Bio::Annotation::TagTree, that looked like the same thing. Currently, the module does not use TagTree, it uses nested AnnotationCollections instead, because TagTree was a temporary stub at the time.
There are methods in phyloxml that seem out of place there, For example, the print_annotation and print_attr functions that print the annotations in phyloXML format. Or the read_annotation function that reads the value of the annotation for a certain object. I feel like these functions should go with the Annotation object, But since there were no Annotation object that is phyloXML specific, I just put them in the TreeIO::phyloxml. This may change too in the future. The documentations are all finished and can be seen in http://www.bioperl.org/wiki/Module:Bio::TreeIO::phyloxml http://www.bioperl.org/wiki/Module:Bio::Tree::AnnotatableNode http://www.bioperl.org/wiki/Module:Bio::Annotation::Relation The detailed progress of the project is documented in http://www.bioperl.org/wiki/PhyloXML_support_in_BioPerl and there is also a demo page that describes the use in http://www.bioperl.org/wiki/Phyloxml_Project_Demo Finally I'd like to thank Chris Fields and Hilmar Lapp, for the constant support during the summer. And the people on the list for providing helpful suggestions when I came with a question. See you around. Mira Han On 9/14/08 1:15 PM, "Hilmar Lapp" wrote: Hi all, the 2008 Google Summer of Code program is officially over now. First off, I'd like to congratulate all of our students: you all passed your end-of-term evaluation, and you all passed solidly. Thanks for all your hard work, beginning with writing and refining your project applications and finally wrapping it up with documentation and a demonstration project. Based on what I can see, the students have all uploaded the code you produced over the summer to the Google Code repository, as requested by Google, so I'm not going to bore you with repeating the instructions for how to do this. If you are still having trouble with this process or have other questions regarding this though, don't hesitate to let me know and ask any question you have. 
For our imminently due Annual Report to NSF, we will need to document your projects, and I am asking you for your help in this. Essentially, we will need to produce similar information to what we composed for the 2007 fall NESCent newsletter: for each project we need a short summary of the project and the benefits of the results, written for a scientific but otherwise general audience, a headshot of you (the student), and a visual of the results. See http://www.nescent.org/news/newsletter_10_07.php#google for the example from last year as to what we have in mind. Kistin Jenkins or Jory Weintraub from NESCent's Education and Outreach group (both copied) will be in touch with you to possibly edit the summaries and fine-tune the other material, but if the students could please send your draft summary text, headshot, and visual as soon as you can to eog at nescent.org that would be great. If you are not sure what to use for the visual, I'd encourage you to speak with your mentor; however, feel free to ask me as well. Thanks again for working with us, and on your projects, and I hope you all not only had a most enjoyable summer but also will keep involved in your project as much as possible. I will also arrange for an end-of- term teleconference with the students shortly, and one of the things that I'd like to talk about is how we can best facilitate you remaining involved beyond the official end of the program. 
Cheers, -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : =========================================================== _______________________________________________ Wg-phyloinformatics mailing list Wg-phyloinformatics at nescent.org https://lists.nescent.org/mailman/listinfo/wg-phyloinformatics From hartzell at alerce.com Wed Sep 17 00:21:36 2008 From: hartzell at alerce.com (George Hartzell) Date: Tue, 16 Sep 2008 21:21:36 -0700 Subject: [Bioperl-l] Bio::Coordinate::Collection could DoWhatIMean better (w/ patch) In-Reply-To: <200809090950.11821.heikki@sanbi.ac.za> References: <200809090950.11821.heikki@sanbi.ac.za> Message-ID: <18640.34256.305270.924007@almost.alerce.com> Heikki Lehvaslaiho writes: > George, > > This is an error from my side. Great that you have a fix already. > > My only worry is the number of external dependencies in BioPerl. To limit > these we have recoded number of functionalities into BioPerl-specific modules. > Before you commit the fix, could you see if Bio::RangeI could be used or easily > extended to be used instead of Set::IntSpan? > > Thanks, > > -Heikki > > On Friday 05 September 2008 21:01:35 George Hartzell wrote: > > Hi all, > > > > Bio::Coordinate::Collection surprised me a bit. At first I thought > > there was a bug, but it's clearly doing what it's supposed to. Now > > I'm wondering if what it's supposed to be doing makes sense in some > > context, or if what I expected would be better functionality. 
> > > > t/CoordinateMapper.t sets up the following scenario: > > > > # > > # Collection > > # > > # 1 5 6 10 > > # |---| |---| > > #-----|----------------------- > > # 1 5 9 15 19 > > # pair1 pair2 > > > > Then goes on to do the following query: > > > > # match more than two > > $pos = Bio::Location::Simple->new (-start => 5, -end => 19 ); > > ok $res = $transcribe->map($pos); > > is $res->each_gap, 2; > > is $res->each_match, 2; > > > > I was surprised to see that there were two gaps, one gene:10-19 and > > one from gene:5-14. Looking at the code, what's really happening is > > that, for the exon1 mapper there's match with gene:5-9 and a gap with > > gene:10-19 and for the exon2 mapper there's a gap with gene:5-14 and a > > match with gene:15-19. All four Result's just get tossed into the > > return value. > > > > The result my intuition wants is that there are two matches (gene:5-9 > > with exon1 and gene:15-19 with exon2) and a gap (gene:10-14). > > > > Yes, I guess that I could just synthesize these myself from the result > > in my app. > > > > It still seems that the current result is a bug though, since there's > > no way of knowing when you're walking through $res->each_Location that > > the first "gap" is with respect to the exon1 mapper and that the > > second "gap" is with respect to the exon2 mapper. The gaps are > > meaningless. > > > > I "fixed" it to work the way I think it should (two matches, one > > gap). I actually extended the test case a bit so that there's a > > multi-base gap, a match, another multibase-gap, another match, then a > > single base gap (just to make sure I got that right...). I had to > > touch up the test file a bit to account for my new test. > > > > The gaps that I return have a strand of 'undef', which seems to be The > > Right Thing. There's also a bit of funny business where I hang onto > > the seq_id of the gapped sequence. It assumes that the "in" sequence > > is the same for all of the mappers. 
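George's desired result has two parts: report each mapper's match, and report as a gap only the query stretch that no mapper covers. Together these amount to subtracting the union of the matched intervals from the query interval. Below is a minimal plain-Perl sketch of that idea over [start, end] pairs. It uses neither Set::IntSpan (in line with Heikki's dependency concern) nor Bio::RangeI. `matches_and_gaps` is a hypothetical helper, and the sketch assumes closed coordinates with every match already lying inside the query.

```perl
use strict;
use warnings;

# Hypothetical helper: given a query interval and the matched sub-intervals
# reported by the individual mappers, return the matches sorted by start and
# the gaps, i.e. the parts of the query that no mapper covers.
sub matches_and_gaps {
    my ($q_start, $q_end, @matches) = @_;
    my @sorted = sort { $a->[0] <=> $b->[0] } @matches;
    my @gaps;
    my $next = $q_start;    # first query position not yet accounted for
    for my $m (@sorted) {
        my ($s, $e) = @$m;
        push @gaps, [ $next, $s - 1 ] if $s > $next;    # uncovered stretch before this match
        $next = $e + 1 if $e + 1 > $next;               # tolerate overlapping matches
    }
    push @gaps, [ $next, $q_end ] if $next <= $q_end;   # uncovered tail of the query
    return (\@sorted, \@gaps);
}

# The scenario from t/CoordinateMapper.t: query gene:5-19,
# exon1 matches gene:5-9 and exon2 matches gene:15-19.
my ($matches, $gaps) = matches_and_gaps(5, 19, [5, 9], [15, 19]);
printf "match %d-%d\n", @$_ for @$matches;
printf "gap   %d-%d\n", @$_ for @$gaps;
```

For the test scenario this yields two matches (5-9, 15-19) and the single gap 10-14, which is the result George describes. Multi-base and single-base gaps fall out of the same loop.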
This seems safe since otherwise > > the entire query is kind of weird.... > > > > There's a patch to today's svn head at: > > > > http://shrimp.alerce.com/bioperl/collection-diffs.txt > > > > The patch changes Build.PL to include a dependency on Set::IntSpan, > > CoordinateMapper.t to update the tests, and > > Bio/CoordinateMapper/Collection.pm for the new code. > > > > Whose code would this break? > > > > If anyone's relying on the current behaviour re: gaps, what's the > > situation in which you find it useful? > > > > Thanks! > > > > g. I suspect that I can redo it w/ RangeI's. It'll be a good project. I'm pushing on a project at the moment but will get to it this weekend or next week. thanks, g. From cjfields at illinois.edu Wed Sep 17 12:41:52 2008 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 17 Sep 2008 11:41:52 -0500 Subject: [Bioperl-l] [Wg-phyloinformatics] Summer of Code: project summaries In-Reply-To: References: Message-ID: <25D73268-25A1-4780-A2D2-F27D47081D83@illinois.edu> On Sep 16, 2008, at 12:18 AM, Han, Mira wrote: > Hi, > Just writing to summarize the final product of this summer. (even > though it's overdue) BLTN > The phyloXML support in BioPerl project is finished officially, > I've created three new modules, > Bio::TreeIO::phyloxml, > Bio::Tree::AnnotatableNode, > and Bio::Annotation::Relation. > The parsing is done for all elements except for > And the validation step is still missing. > I'm planning to add those in in the future. Great work! One issue that will come up: have you tested round-tripping? (i.e. where object data read in from the original phyloxml data is the same as object data read in from bioperl-derived phyloxml output) > I was contemplating to make a Bio::Annotation::XML till the end, > But decided not to since there was already a new module > Bio::Annotation::TagTree, that looked like the same thing.
> Currently, the module does not use TagTree, it uses nested > AnnotationCollections instead, because TagTree was a temporary stub > at the time. I would consider it no longer temporary (just changed it in svn). The 'stub' part was left over from the initial commit; it has now been tested with bioperl-db (via swissprot tests). It may need more rigorous testing via bioperl-db but it appears to work just fine (famous last words...). There was one issue with Data::Stag (repeated 'Use of undefined variable' errors that popped up with Annotation.t tests), but Chris M. released a new Data::Stag to CPAN just recently which fixes that. It wasn't a bioperl bug and was relatively harmless. > There are methods in phyloxml that seem out of place there, > For example, the print_annotation and print_attr functions that > print the annotations in phyloXML format. > Or the read_annotation function that reads the value of the > annotation for a certain object. > I feel like these functions should go with the Annotation object, > But since there were no Annotation object that is phyloXML specific, > I just put them in the TreeIO::phyloxml. > This may change too in the future. I don't see a problem with this; if you look at the various IO parsers there are several helper methods present (see Bio::SeqIO::genbank for an example). If you were to go that route, I would add a method to the (generic) Annotation class that accepts some customization via a callback, maybe using a default callback if one isn't supplied. See the various Bio::AnnotationI display_text methods, which were added in as a more explicit replacement for operator overloading. 
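Chris suggests a display method on the annotation class that takes an optional formatting callback and falls back to a default. That could look like the plain-Perl sketch below. `My::Annotation` and its fields are hypothetical. The shape (an optional code reference, a default used when none is given, and a check that the argument really is code) is assumed here to resemble the display_text pattern he mentions; it is not copied from BioPerl.

```perl
use strict;
use warnings;

package My::Annotation;    # hypothetical class, not a BioPerl module

sub new {
    my ($class, %args) = @_;
    return bless { tagname => $args{-tagname}, value => $args{-value} }, $class;
}

# Default renderer used when the caller supplies no callback: just the value.
my $DEFAULT_CB = sub { my $self = shift; return $self->{value} };

# Render the annotation as text, delegating the formatting to a callback.
sub display_text {
    my ($self, $cb) = @_;
    $cb ||= $DEFAULT_CB;
    die "display_text: callback must be a code reference\n"
        unless ref $cb eq 'CODE';
    return $cb->($self);
}

package main;

my $ann = My::Annotation->new(-tagname => 'confidence', -value => 0.95);

# Default rendering.
print $ann->display_text, "\n";

# A phyloXML-style rendering supplied by the caller (e.g. a TreeIO writer).
my $xml = $ann->display_text(sub {
    my $obj = shift;
    return "<$obj->{tagname}>$obj->{value}</$obj->{tagname}>";
});
print "$xml\n";
```

With this design the phyloXML-specific formatting can stay in TreeIO::phyloxml as a callback, while the annotation class only knows how to call it.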
> The documentations are all finished and can be seen in > http://www.bioperl.org/wiki/Module:Bio::TreeIO::phyloxml > http://www.bioperl.org/wiki/Module:Bio::Tree::AnnotatableNode > http://www.bioperl.org/wiki/Module:Bio::Annotation::Relation > > The detailed progress of the project is documented in > http://www.bioperl.org/wiki/PhyloXML_support_in_BioPerl > > and there is also a demo page that describes the use in > http://www.bioperl.org/wiki/Phyloxml_Project_Demo I'll give them a look over, but that's a great start! > Finally I'd like to thank Chris Fields and Hilmar Lapp, > for the constant support during the summer. > And the people on the list for providing helpful suggestions when I > came with a question. > See you around. > > Mira Han Thanks, and congrats on completing the summer of code! Hope you stick around a bit longer; we need all the hands we can get. chris From gundalav at gmail.com Thu Sep 18 04:51:54 2008 From: gundalav at gmail.com (Gundala Viswanath) Date: Thu, 18 Sep 2008 17:51:54 +0900 Subject: [Bioperl-l] Converting EnSeMBL Probe names to Gene Symbol with Bioperl Message-ID: <73f827b50809180151u3ef0fad8k74623036072dfba6@mail.gmail.com> Dear all, Is there a way I can convert such EnSeMBL probe names into Gene Symbol with BioPerl? ENSG00000000003_at ENSG00000000005_at ENSG00000000419_at ENSG00000000457_at ENSG00000000460_at ENSG00000000938_at ENSG00000000971_at ENSG00000001036_at ENSG00000001084_at ENSG00000001167_at ENSG00000001461_at ENSG00000001497_at ENSG00000001561_at ... etc ... Or perhaps there is a flat file for conversion? - Gundala Viswanath Jakarta - Indonesia From cjfields at illinois.edu Thu Sep 18 08:07:52 2008 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 18 Sep 2008 07:07:52 -0500 Subject: [Bioperl-l] Species.t tests failing with bioperl-live Message-ID: I'm getting a couple of failed tests with Christopher Fields Postdoctoral Researcher Lab of Dr.
Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From cjfields at illinois.edu Thu Sep 18 08:15:03 2008 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 18 Sep 2008 07:15:03 -0500 Subject: [Bioperl-l] Species.t tests failing with bioperl-live In-Reply-To: References: Message-ID: <1501D3A7-42BA-4489-95E3-5267FFC8A856@illinois.edu> On Sep 18, 2008, at 7:07 AM, Chris Fields wrote: > I'm getting a couple of failed tests with Um, that was supposed to be: I'm getting a couple of failed tests with Species.t; I think these are due to the last commit to Bio::Species: http://code.open-bio.org/svnweb/index.cgi/bioperl/revision/?rev=14793 Tests pass after commenting out that line. chris From bernd.web at gmail.com Thu Sep 18 13:34:49 2008 From: bernd.web at gmail.com (Bernd Web) Date: Thu, 18 Sep 2008 19:34:49 +0200 Subject: [Bioperl-l] SimpleAlign/AlignIO Message-ID: <716af09c0809181034w2f9cc44do6b2a02502ea81fea@mail.gmail.com> Hi, Possibly I overlooked this, but is it not possible to supply an alphabet to SimpleAlign? When SimpleAlign starts making sequence objects, _guess_alphabet is executed, and if the sequence (slice) contains only gaps, a warning is raised because the alphabet cannot be guessed. As all sequences in an alignment would (generally) be of the same type, it could be useful to set the alphabet for the alignment as a whole; this would also avoid executing _guess_alphabet for every sequence. Is this possible?
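A gap-only slice gives an alphabet guesser nothing to inspect. One option is therefore to use an alignment-wide alphabet only as a fallback for such rows and still guess the other rows individually, which also keeps mixed-alphabet alignments possible. The sketch below is a simplified residue-counting heuristic in plain Perl. It is not BioPerl's `_guess_alphabet`. The `guess_alphabet` helper, the set of gap symbols and the 0.85 cutoff are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: guess an alphabet for one aligned row, ignoring gap
# symbols; fall back to an alignment-wide default when no residues remain.
sub guess_alphabet {
    my ($row, $default) = @_;
    (my $residues = $row) =~ s/[-.~?]//g;    # strip gap/unknown symbols
    my $total = length $residues;
    return $default if $total == 0;           # gap-only slice: nothing to guess from

    my $acgtn = ($residues =~ tr/ACGTNacgtn//);    # count nucleotide-like letters
    my $u     = ($residues =~ tr/Uu//);
    return 'dna' if $acgtn / $total > 0.85;            # 0.85 is an assumed cutoff
    return 'rna' if ($acgtn + $u) / $total > 0.85;
    return 'protein';
}

my $default = 'protein';    # e.g. an alphabet supplied once for the whole alignment
for my $row ('ACGT-ACG', 'MKV--LLA', '--------') {
    printf "%-10s %s\n", $row, guess_alphabet($row, $default);
}
```

Only the all-gap row uses the supplied default, so no warning is needed for it, and rows with residues are still classified on their own.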
Regards, Bernd From cjfields at illinois.edu Thu Sep 18 13:56:56 2008 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 18 Sep 2008 12:56:56 -0500 Subject: [Bioperl-l] SimpleAlign/AlignIO In-Reply-To: <716af09c0809181034w2f9cc44do6b2a02502ea81fea@mail.gmail.com> References: <716af09c0809181034w2f9cc44do6b2a02502ea81fea@mail.gmail.com> Message-ID: <9012F474-5762-4ABA-ADAB-7F0ECB38ACC3@illinois.edu> Bernd, I'm not sure it's a good idea to constrain the entire alignment (and all sequences contained within) to a single alphabet. For instance, what if I wanted to align a translated (protein) sequence, possibly as 3- or 6-frame, to a DNA sequence of interest? Or if I wanted to align a Meta sequence w/o an alphabet (i.e. a structural consensus sequence) with a DNA/RNA/protein sequence? The former example is fairly common and may not be entirely possible with SimpleAlign (I haven't tried it to tell the truth), but the latter is what I do with the Stockholm parser. chris On Sep 18, 2008, at 12:34 PM, Bernd Web wrote: > Hi, > > Possibly I overlooked this, but is it not possible to supply an > alphabet to SimpleAlign? > When SimpleAlign starts making sequence objects _guess_alphabet is > executed and in case the sequence (slice) only contains gaps a warning > is raised as the alphabet cannot be guessed. > As all sequences in the alignment (generally) would be of the same > type, it could be good to set the alpabet for the alignment; this also > would prevent the executing of _guess_alphabet for all sequences. Is > this possible? > > > Regards, > Bernd > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From MEC at stowers-institute.org Thu Sep 18 17:20:02 2008 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Thu, 18 Sep 2008 16:20:02 -0500 Subject: [Bioperl-l] Converting EnSeMBL Probe names to Gene Symbol with Bioperl In-Reply-To: <73f827b50809180151u3ef0fad8k74623036072dfba6@mail.gmail.com> References: <73f827b50809180151u3ef0fad8k74623036072dfba6@mail.gmail.com> Message-ID: Gundala, Here is an option - if you want a script... http://research.stowers-institute.org/mec/software/scripts/affyprobeset2ensg/ It is not bioperl. It is perl. It uses DBI to query the ensembl MySQL database using SQL. It may be outdated due to changes in Ensembl schema. Good luck, Malcolm Cook Stowers Institute for Medical Research - Kansas City, Missouri -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Gundala Viswanath Sent: Thursday, September 18, 2008 3:52 AM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Converting EnSeMBL Probe names to Gene Symbol with Bioperl Dear all, Is there a way I can convert such EnSeMBL probe names into Gene Symbol with BioPerl? ENSG00000000003_at ENSG00000000005_at ENSG00000000419_at ENSG00000000457_at ENSG00000000460_at ENSG00000000938_at ENSG00000000971_at ENSG00000001036_at ENSG00000001084_at ENSG00000001167_at ENSG00000001461_at ENSG00000001497_at ENSG00000001561_at ... etc ... Or perhaps there is a flat file for conversion? 
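The names in the question appear to be Ensembl gene IDs with an `_at` probe-set suffix. That suggests a flat-file route: strip the suffix, then look each ID up in a two-column gene-ID-to-symbol table, such as an export from Ensembl BioMart. Below is a minimal sketch. The tab-separated layout with a single header line, the helper names and the placeholder symbols GENE_A/GENE_B are assumptions, and a real export may have different or additional columns.

```perl
use strict;
use warnings;

# Strip the "_at" probe-set suffix to recover the Ensembl gene ID
# (assumes names of the form ENSG..._at, as in the question).
sub probe_to_gene_id {
    my ($probe) = @_;
    (my $gene_id = $probe) =~ s/_at\z//;
    return $gene_id;
}

# Read a tab-separated "gene_id<TAB>symbol" table that has one header line.
sub load_mapping {
    my ($fh) = @_;
    my %symbol_for;
    my $header = <$fh>;    # discard the header line
    while (my $line = <$fh>) {
        chomp $line;
        my ($gene_id, $symbol) = split /\t/, $line;
        next unless defined $symbol && length $symbol;    # not every gene has a symbol
        $symbol_for{$gene_id} = $symbol;
    }
    return \%symbol_for;
}

# Illustrative in-memory table; GENE_A and GENE_B are placeholders, not real symbols.
my $mapping_text = join '', map { join("\t", @$_) . "\n" }
    [ 'gene_id', 'symbol' ],
    [ 'ENSG00000000003', 'GENE_A' ],
    [ 'ENSG00000000005', 'GENE_B' ];

open my $fh, '<', \$mapping_text or die "cannot open mapping: $!";
my $symbol_for = load_mapping($fh);
close $fh;

for my $probe (qw(ENSG00000000003_at ENSG00000000005_at ENSG00000000419_at)) {
    my $gene_id = probe_to_gene_id($probe);
    my $symbol  = exists $symbol_for->{$gene_id} ? $symbol_for->{$gene_id} : 'NA';
    print join("\t", $probe, $gene_id, $symbol), "\n";
}
```

To use a real export, open the exported file by name instead of the in-memory string. IDs that are missing from the table are reported as NA rather than dropped.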
- Gundala Viswanath Jakarta - Indonesia _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From anjan.purkayastha at gmail.com Thu Sep 18 17:27:06 2008 From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA) Date: Thu, 18 Sep 2008 17:27:06 -0400 Subject: [Bioperl-l] Retrieving taxonomy information from a GenBank file Message-ID: Hi, I'm using the following code to access the "ORGANISM" tag value for the record NC_002526. I get a value of 0 even though the ORGANISM tag has a value. Any idea how this might be corrected? Thanks. Anjan my $gb= new Bio::DB::GenBank; my $seq = $gb->get_Seq_by_acc('NC_002526'); my $des= $seq->get_tag_values("ORGANISM"); -- ============================= anjan purkayastha, phd bioinformatics analyst whitehead institute for biomedical research nine cambridge center cambridge, ma 02142 purkayas [at] wi [dot] mit [dot] edu 703.740.6939 From jason at bioperl.org Thu Sep 18 17:51:54 2008 From: jason at bioperl.org (Jason Stajich) Date: Thu, 18 Sep 2008 14:51:54 -0700 Subject: [Bioperl-l] Converting EnSeMBL Probe names to Gene Symbol with Bioperl In-Reply-To: References: <73f827b50809180151u3ef0fad8k74623036072dfba6@mail.gmail.com> Message-ID: seems like something that biomart was designed for? There is a scripting interface too. http://biomart.org/ On Sep 18, 2008, at 2:20 PM, Cook, Malcolm wrote: > Gundala, > > Here is an option - if you want a script... > > http://research.stowers-institute.org/mec/software/scripts/affyprobeset2ensg/ > > It is not bioperl. It is perl. It uses DBI to query the ensembl > MySQL database using SQL. It may be outdated due to changes in > Ensembl schema. 
> > Good luck, > > Malcolm Cook > Stowers Institute for Medical Research - Kansas City, Missouri > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org > ] On Behalf Of Gundala Viswanath > Sent: Thursday, September 18, 2008 3:52 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Converting EnSeMBL Probe names to Gene Symbol > with Bioperl > > Dear all, > > Is there a way I can convert such EnSeMBL probe names into Gene > Symbol with BioPerl? > > ENSG00000000003_at > ENSG00000000005_at > ENSG00000000419_at > ENSG00000000457_at > ENSG00000000460_at > ENSG00000000938_at > ENSG00000000971_at > ENSG00000001036_at > ENSG00000001084_at > ENSG00000001167_at > ENSG00000001461_at > ENSG00000001497_at > ENSG00000001561_at > ... etc ... > > Or perhaps there is a flat file for conversion? > > > > - Gundala Viswanath > Jakarta - Indonesia > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From jason at bioperl.org Thu Sep 18 17:56:18 2008 From: jason at bioperl.org (Jason Stajich) Date: Thu, 18 Sep 2008 14:56:18 -0700 Subject: [Bioperl-l] Retrieving taxonomy information from a GenBank file In-Reply-To: References: Message-ID: You want the species object ($seq->species) -- see the Seq HOWTO, much of the Seq API is explained. http://bioperl.org/wiki/HOWTO:Feature-Annotation#The_Species_Object -jason On Sep 18, 2008, at 2:27 PM, ANJAN PURKAYASTHA wrote: > Hi, > I'm using the following code to access the "ORGANISM" tag value for > the > record NC_002526. > I get a value of 0 even though the ORGANISM tag has a value. Any > idea how > this might be corrected? > Thanks. 
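There is a likely reason Anjan's `$des` came back as 0, assuming `get_tag_values` hands back its values as an array. That reason is Perl context: assigning a list-returning call to a scalar gives the number of elements, and there is no ORGANISM tag to count. The pure-Perl sketch below shows the difference. The `tag_values` helper and the `%tags` hash are hypothetical stand-ins, not BioPerl code.

```perl
use strict;
use warnings;

# Hypothetical tag store standing in for an object's tag/value data.
my %tags = ( note => ['complete genome'] );

# Returns all values for a tag as a list (an empty list if the tag is absent).
sub tag_values {
    my ($tag) = @_;
    my @values = exists $tags{$tag} ? @{ $tags{$tag} } : ();
    return @values;    # an array returned in scalar context yields its size
}

my $count   = tag_values('ORGANISM');    # scalar context: number of values (0 here)
my ($first) = tag_values('note');        # list context: first value
my @all     = tag_values('note');        # list context: every value

print "count=$count first=$first n=", scalar(@all), "\n";
```

Capturing the result in list context (`my ($value) = ...` or `my @values = ...`) avoids the silent count. For the organism itself, Jason's answer applies: the Bio::Species object returned by `$seq->species`, for example `$seq->species->binomial`.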
> Anjan > > > my $gb= new Bio::DB::GenBank; > > my $seq = $gb->get_Seq_by_acc('NC_002526'); > > my $des= $seq->get_tag_values("ORGANISM"); > > -- > ============================= > anjan purkayastha, phd > bioinformatics analyst > whitehead institute for biomedical research > nine cambridge center > cambridge, ma 02142 > > purkayas [at] wi [dot] mit [dot] edu > 703.740.6939 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From cjfields at illinois.edu Thu Sep 18 17:48:26 2008 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 18 Sep 2008 16:48:26 -0500 Subject: [Bioperl-l] Converting EnSeMBL Probe names to Gene Symbol with Bioperl In-Reply-To: References: <73f827b50809180151u3ef0fad8k74623036072dfba6@mail.gmail.com> Message-ID: <0F56A0A4-AE10-4930-B479-C4D0818EAC7A@illinois.edu> You can probably use the Ensembl perl API alone or in combination with bioperl. I'm not sure it is working with the latest code (bioperl- live), but it's worth a try: http://www.ensembl.org/info/using/api/core/core_tutorial.html http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_the_Perl_API_at_ENSEMBL chris On Sep 18, 2008, at 4:20 PM, Cook, Malcolm wrote: > Gundala, > > Here is an option - if you want a script... > > http://research.stowers-institute.org/mec/software/scripts/affyprobeset2ensg/ > > It is not bioperl. It is perl. It uses DBI to query the ensembl > MySQL database using SQL. It may be outdated due to changes in > Ensembl schema. 
> > Good luck, > > Malcolm Cook > Stowers Institute for Medical Research - Kansas City, Missouri > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org > ] On Behalf Of Gundala Viswanath > Sent: Thursday, September 18, 2008 3:52 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Converting EnSeMBL Probe names to Gene Symbol > with Bioperl > > Dear all, > > Is there a way I can convert such EnSeMBL probe names into Gene > Symbol with BioPerl? > > ENSG00000000003_at > ENSG00000000005_at > ENSG00000000419_at > ENSG00000000457_at > ENSG00000000460_at > ENSG00000000938_at > ENSG00000000971_at > ENSG00000001036_at > ENSG00000001084_at > ENSG00000001167_at > ENSG00000001461_at > ENSG00000001497_at > ENSG00000001561_at > ... etc ... > > Or perhaps there is a flat file for conversion? > > > > - Gundala Viswanath > Jakarta - Indonesia > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From spiros at lokku.com Fri Sep 19 06:17:52 2008 From: spiros at lokku.com (Spiros Denaxas) Date: Fri, 19 Sep 2008 11:17:52 +0100 Subject: [Bioperl-l] code coverage metrics Message-ID: Hello, I recently sent Chris an email about an idea I had, quantifying (and improving) the test coverage that we currently have on the bioperl core. I know there's a list on the wiki [1] with a list of tests that have low coverage and/or complete lack of tests. My idea was to somehow standardize and automate the generation of this list, preferably on a weekly basis initially. 
We can then more easily see where help is needed and possibly assign individual tasks to tackle the most-suffering modules. There are currently several very good CPAN modules that do this, like Devel::Cover [2]. Is there any objection if I kick this off and start doing some work, aiming into creating a more detailed report on code coverage using the current HEAD and test suite? Spiros [1] http://www.bioperl.org/wiki/Untested_Modules_in_BioPerl [2] http://search.cpan.org/dist/Devel-Cover/lib/Devel/Cover.pm From spiros at lokku.com Fri Sep 19 07:51:29 2008 From: spiros at lokku.com (Spiros Denaxas) Date: Fri, 19 Sep 2008 12:51:29 +0100 Subject: [Bioperl-l] code coverage metrics In-Reply-To: <48D391BD.5010105@sendu.me.uk> References: <48D391BD.5010105@sendu.me.uk> Message-ID: On Fri, Sep 19, 2008 at 12:49 PM, Sendu Bala wrote: > Spiros Denaxas wrote: >> >> There are currently several very good CPAN modules that do this, like >> Devel::Cover [2]. Is there any objection if I kick this off and start >> doing some work, aiming into creating a more detailed report on code >> coverage using the current HEAD and test suite? > > You should read our previous discussion here: > http://thread.gmane.org/gmane.comp.lang.perl.bio.general/14586/focus=14588 > > Basically I think we have it all in place now; someone just needs to take > that last step and run a cron job that runs the Build commands. We also need > the website side of things sorted out I think. > > If you could make those things happen and take care of any issues that crop > up, that would be great. > Enlightening, thank you for pointing that out. I will have a look, I don't see a problem with the plan. Spiros From spiros at lokku.com Fri Sep 19 08:01:15 2008 From: spiros at lokku.com (Spiros Denaxas) Date: Fri, 19 Sep 2008 13:01:15 +0100 Subject: [Bioperl-l] code coverage metrics In-Reply-To: References: <48D391BD.5010105@sendu.me.uk> Message-ID: > > Enlightening, thank you for pointing that out. I will have a look, I
I will have a look, I > don't see a problem with the plan. > > Spiros > I have also updated the wiki to make sure we avoid duplicate effort on this, http://www.bioperl.org/wiki/Project_priority_list Spiros From bix at sendu.me.uk Fri Sep 19 07:49:17 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 19 Sep 2008 12:49:17 +0100 Subject: [Bioperl-l] code coverage metrics In-Reply-To: References: Message-ID: <48D391BD.5010105@sendu.me.uk> Spiros Denaxas wrote: > There are currently several very good CPAN modules that do this, like > Devel::Cover [2]. Is there any objection if I kick this off and start > doing some work, aiming into creating a more detailed report on code > coverage using the current HEAD and test suite? You should read our previous discussion here: http://thread.gmane.org/gmane.comp.lang.perl.bio.general/14586/focus=14588 Basically I think we have it all in place now; someone just needs to take that last step and run a cron job that runs the Build commands. We also need the website side of things sorted out I think. If you could make those things happen and take care of any issues that crop up, that would be great. From cjfields at illinois.edu Fri Sep 19 08:22:02 2008 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 19 Sep 2008 07:22:02 -0500 Subject: [Bioperl-l] code coverage metrics In-Reply-To: References: <48D391BD.5010105@sendu.me.uk> Message-ID: <0C6AAAEE-5D80-4102-B03D-FE0C045492AA@illinois.edu> On Sep 19, 2008, at 6:51 AM, Spiros Denaxas wrote: > On Fri, Sep 19, 2008 at 12:49 PM, Sendu Bala wrote: >> Spiros Denaxas wrote: >>> >>> There are currently several very good CPAN modules that do this, >>> like >>> Devel::Cover [2]. Is there any objection if I kick this off and >>> start >>> doing some work, aiming into creating a more detailed report on code >>> coverage using the current HEAD and test suite? 
>> >> You should read our previous discussion here: >> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/14586/focus=14588 >> >> Basically I think we have it all in place now; someone just needs >> to take >> that last step and run a cron job that runs the Build commands. We >> also need >> the website side of things sorted out I think. >> >> If you could make those things happen and take care of any issues >> that crop >> up, that would be great. >> > > Enlightening, thank you for pointing that out. I will have a look, I > don't see a problem with the plan. > > Spiros Didn't know it was that far along! Any help pushing this along would be wonderful. chris From MEC at stowers-institute.org Fri Sep 19 09:06:08 2008 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Fri, 19 Sep 2008 08:06:08 -0500 Subject: [Bioperl-l] Converting EnSeMBL Probe names to Gene Symbol with Bioperl In-Reply-To: References: <73f827b50809180151u3ef0fad8k74623036072dfba6@mail.gmail.com>

Message-ID: Jason, Yes .... Biomart .... If I remember correctly, at the time I wrote the script, the problem with using biomart interactively for this task was that though you could query for genes or transcripts by affy id, the results would not include the term you queried with, so, you just get a list of genes that match any of your affyids without knowing which goes with which. But perhaps I was wrong. Certainly I see now that it is possible interactively to return the affyid for any gene for many standard probesets. Cheers, Malcolm Cook -----Original Message----- From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason Stajich Sent: Thursday, September 18, 2008 4:52 PM To: Cook, Malcolm Cc: 'Gundala Viswanath'; 'bioperl-l at lists.open-bio.org' Subject: Re: [Bioperl-l] Converting EnSeMBL Probe names to Gene Symbol with Bioperl seems like something that biomart was designed for? There is a scripting interface too. http://biomart.org/ On Sep 18, 2008, at 2:20 PM, Cook, Malcolm wrote: > Gundala, > > Here is an option - if you want a script... > > http://research.stowers-institute.org/mec/software/scripts/affyprobese > t2ensg/ > > It is not bioperl. It is perl. It uses DBI to query the ensembl > MySQL database using SQL. It may be outdated due to changes in > Ensembl schema. > > Good luck, > > Malcolm Cook > Stowers Institute for Medical Research - Kansas City, Missouri > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org > ] On Behalf Of Gundala Viswanath > Sent: Thursday, September 18, 2008 3:52 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Converting EnSeMBL Probe names to Gene Symbol > with Bioperl > > Dear all, > > Is there a way I can convert such EnSeMBL probe names into Gene Symbol > with BioPerl? 
> > ENSG00000000003_at > ENSG00000000005_at > ENSG00000000419_at > ENSG00000000457_at > ENSG00000000460_at > ENSG00000000938_at > ENSG00000000971_at > ENSG00000001036_at > ENSG00000001084_at > ENSG00000001167_at > ENSG00000001461_at > ENSG00000001497_at > ENSG00000001561_at > ... etc ... > > Or perhaps there is a flat file for conversion? > > > > - Gundala Viswanath > Jakarta - Indonesia > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From MEC at stowers-institute.org Fri Sep 19 09:09:00 2008 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Fri, 19 Sep 2008 08:09:00 -0500 Subject: [Bioperl-l] Converting EnSeMBL Probe names to Gene Symbol with Bioperl In-Reply-To: <0F56A0A4-AE10-4930-B479-C4D0818EAC7A@illinois.edu> References: <73f827b50809180151u3ef0fad8k74623036072dfba6@mail.gmail.com> <0F56A0A4-AE10-4930-B479-C4D0818EAC7A@illinois.edu> Message-ID: Chris is right- the Ensembl perl API works fine. If you care to go this route and want to crib the code, you will see that my script has a commented out implementation using Ensembl perl API - it suffered from being MUCH TOO SLOOOOOOOOOO - accessing the tables directly was MUCH faster (unpublished correspondence with self) However, the API is much more likely to be backwards compatible in the face of database schema changes at Ensembl....
Malcolm Cook -----Original Message----- From: Chris Fields [mailto:cjfields at illinois.edu] Sent: Thursday, September 18, 2008 4:48 PM To: Cook, Malcolm Cc: 'Gundala Viswanath'; 'bioperl-l at lists.open-bio.org' Subject: Re: [Bioperl-l] Converting EnSeMBL Probe names to Gene Symbol with Bioperl You can probably use the Ensembl perl API alone or in combination with bioperl. I'm not sure it is working with the latest code (bioperl- live), but it's worth a try: http://www.ensembl.org/info/using/api/core/core_tutorial.html http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_the_Perl_API_at_ENSEMBL chris On Sep 18, 2008, at 4:20 PM, Cook, Malcolm wrote: > Gundala, > > Here is an option - if you want a script... > > http://research.stowers-institute.org/mec/software/scripts/affyprobese > t2ensg/ > > It is not bioperl. It is perl. It uses DBI to query the ensembl > MySQL database using SQL. It may be outdated due to changes in > Ensembl schema. > > Good luck, > > Malcolm Cook > Stowers Institute for Medical Research - Kansas City, Missouri > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org > ] On Behalf Of Gundala Viswanath > Sent: Thursday, September 18, 2008 3:52 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Converting EnSeMBL Probe names to Gene Symbol > with Bioperl > > Dear all, > > Is there a way I can convert such EnSeMBL probe names into Gene Symbol > with BioPerl? > > ENSG00000000003_at > ENSG00000000005_at > ENSG00000000419_at > ENSG00000000457_at > ENSG00000000460_at > ENSG00000000938_at > ENSG00000000971_at > ENSG00000001036_at > ENSG00000001084_at > ENSG00000001167_at > ENSG00000001461_at > ENSG00000001497_at > ENSG00000001561_at > ... etc ... > > Or perhaps there is a flat file for conversion?
> > > > - Gundala Viswanath > Jakarta - Indonesia > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From mauricio at open-bio.org Fri Sep 19 10:06:22 2008 From: mauricio at open-bio.org (Mauricio Herrera Cuadra) Date: Fri, 19 Sep 2008 09:06:22 -0500 Subject: [Bioperl-l] code coverage metrics In-Reply-To: <0C6AAAEE-5D80-4102-B03D-FE0C045492AA@illinois.edu> References: <48D391BD.5010105@sendu.me.uk> <0C6AAAEE-5D80-4102-B03D-FE0C045492AA@illinois.edu> Message-ID: <48D3B1DE.7070505@open-bio.org> Sorry guys, this has been my fault due to lack of time. I have a very old ticket waiting for me at the OBF Helpdesk to get the website side of things done. Spiros, I have some spare time today to try setting it up, assign it to me in the wiki list if you want, I'll send an update in a couple of days. Thanks, Mauricio. Chris Fields wrote: > > On Sep 19, 2008, at 6:51 AM, Spiros Denaxas wrote: > >> On Fri, Sep 19, 2008 at 12:49 PM, Sendu Bala wrote: >>> Spiros Denaxas wrote: >>>> >>>> There are currently several very good CPAN modules that do this, like >>>> Devel::Cover [2]. Is there any objection if I kick this off and start >>>> doing some work, aiming into creating a more detailed report on code >>>> coverage using the current HEAD and test suite? >>> >>> You should read our previous discussion here: >>> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/14586/focus=14588 >>> >>> >>> Basically I think we have it all in place now; someone just needs to >>> take >>> that last step and run a cron job that runs the Build commands. 
We >>> also need >>> the website side of things sorted out I think. >>> >>> If you could make those things happen and take care of any issues >>> that crop >>> up, that would be great. >>> >> >> Enlightening, thank you for pointing that out. I will have a look, I >> don't see a problem with the plan. >> >> Spiros > > Didn't know it was that far along! Any help pushing this along would be > wonderful. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From heikki at sanbi.ac.za Fri Sep 19 12:22:23 2008 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 19 Sep 2008 18:22:23 +0200 Subject: [Bioperl-l] phylogeny-trait association methods into BioPerl In-Reply-To: <200809101332.07137.heikki@sanbi.ac.za> References: <200809101332.07137.heikki@sanbi.ac.za> Message-ID: <200809191822.23503.heikki@sanbi.ac.za> The code is now in SVN. Bio::Tree::TreeFunctionsI::add_trait() can be used to set trait values in a tree. Enjoy, -Heikki On Wednesday 10 September 2008 13:32:06 Heikki Lehvaslaiho wrote: > FYI, > > I've been recently writing code to analyse phylogeny-trait associations. > These traits are typically geographical location of the sequence but they > can be any phenotypic characters associated with the sequences. > > This involves trees, i.e. Bio::Tree::Tree and Bio::Tree::Node objects and > strings describing the traits. I've been using tags to store trait values > within nodes. The tag methods are: > > Bio::Tree::Node::add_tag_value > Bio::Tree::Node::get_all_tags > Bio::Tree::Node::get_tag_values > Bio::Tree::Node::has_tag > Bio::Tree::Node::remove_all_tags > Bio::Tree::Node::remove_tag > > Question: Is there any particular reason why there is no > set_tag_value(scalar|@array) method? 
> > I am getting tired of writing: > $node->remove_tag($key); > map {$node->add_tag_value($key, $_)} @values ; > so I am going to implement that unless there are strong objections. > > Otherwise it has been smooth sailing. I am going to add > Bio::Tree::TreeFunctions::is_binary() and start populating > Bio::Tree::Statistics soon with these methods: > > ps() - Parsimony Score (PS) from Fitch 1971 > ai() - Association index (AI) of Whang et al. 2001 > mc() - Monophyletic Clade (MC) size statistics by Salemi et al. 2005 > cherries() - number of leaf node pairs > > If you have any comments, please feel free to post them here. > > -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From hartzell at alerce.com Tue Sep 23 17:16:48 2008 From: hartzell at alerce.com (George Hartzell) Date: Tue, 23 Sep 2008 14:16:48 -0700 Subject: [Bioperl-l] confused about version numbers. Message-ID: <18649.23744.832478.551299@almost.alerce.com> I recently realized that I needed a feature from Bio::Search::Hit::GenericHit that exists in the svn trunk but isn't in the CPAN's version bioperl-1.5.2_102. I thought I'd just specify a version number in my Build.PL and that way I could sleep soundly. But, it looks like $Version in CPAN's Bio::Root::Version is 1.005002_102 while in the trunk it's 1.005002_100. What's the proper way for a Build.PL to be sure that it has a current bioperl? g. From sac at bioperl.org Tue Sep 23 18:05:15 2008 From: sac at bioperl.org (Steve Chervitz) Date: Tue, 23 Sep 2008 15:05:15 -0700 Subject: [Bioperl-l] confused about version numbers.
In-Reply-To: <18649.23744.832478.551299@almost.alerce.com> References: <18649.23744.832478.551299@almost.alerce.com> Message-ID: <8f200b4c0809231505p6b3f9e77i278f52bcc355489@mail.gmail.com> Here's the general recommendation from the FAQ: http://bio.perl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F Not sure of the recommended way to do this from a Build.PL. You could do such checking within a BEGIN block in your Build.PL, perhaps. Steve On Tue, Sep 23, 2008 at 2:16 PM, George Hartzell wrote: > > I recently realized that I needed a feature from > Bio::Search::Hit::GenericHit that exists in the svn trunk but isn't in > the CPAN's version bioperl-1.5.2_102. > > I thought I'd just specify a version number in my Build.PL and that > way I could sleep soundly. > > But, it looks like $Version in CPAN's Bio::Root::Version is > 1.005002_102 while in the trunk it's 1.005002_100. > > What's the proper way for a Build.PL to be sure that it has a current > bioperl? > > g. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bix at sendu.me.uk Tue Sep 23 17:51:57 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 23 Sep 2008 22:51:57 +0100 Subject: [Bioperl-l] confused about version numbers. In-Reply-To: <18649.23744.832478.551299@almost.alerce.com> References: <18649.23744.832478.551299@almost.alerce.com> Message-ID: <48D964FD.7060007@sendu.me.uk> George Hartzell wrote: > I recently realized that I needed a feature from > Bio::Search::Hit::GenericHit that exists in the svn trunk but isn't in > the CPAN's version bioperl-1.5.2_102. > > I thought I'd just specify a version number in my Build.PL and that > way I could sleep soundly. > > But, it looks like $Version in CPAN's Bio::Root::Version is > 1.005002_102 while in the trunk it's 1.005002_100. Fixed. It's now 1.005002_103.
> What's the proper way for a Build.PL to be sure that it has a current > bioperl? Your method was sound, I think. From cjfields at illinois.edu Tue Sep 23 21:23:25 2008 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 23 Sep 2008 20:23:25 -0500 Subject: [Bioperl-l] Bio::Species bug, was Species.t tests failing with bioperl-live References: <1501D3A7-42BA-4489-95E3-5267FFC8A856@illinois.edu> Message-ID: I'm reposting this in case it was lost (or forgotten). I'll file it as a bug if it's not addressed fairly soonish, but I would consider this a serious blocker for a 1.6 release. Any help would be appreciated. The last commit to Bio::Species, which was an effort to fix a memory leak, has unfortunately broken Species.t tests. The relevant commit used weaken(); my guess is something is being garbage collected way too early. Any ideas? chris Begin forwarded message: > From: Chris Fields > Date: September 18, 2008 7:15:03 AM CDT > To: Chris Fields > Cc: BioPerl List > Subject: Re: [Bioperl-l] Species.t tests failing with bioperl-live > > > On Sep 18, 2008, at 7:07 AM, Chris Fields wrote: > >> I'm getting a couple of failed tests with > > > Um, that was supposed to be: > > I'm getting a couple of failed tests with Species.t; I think these > are due to the last commit to Bio::Species: > > http://code.open-bio.org/svnweb/index.cgi/bioperl/revision/?rev=14793 > > Tests pass after commenting out that line. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Caroline.Johnston at iop.kcl.ac.uk Wed Sep 24 10:19:05 2008 From: Caroline.Johnston at iop.kcl.ac.uk (Johnston, Caroline) Date: Wed, 24 Sep 2008 15:19:05 +0100 Subject: [Bioperl-l] Using Storable with SeqFeatures Message-ID: <5626ED9CB91C814197079FB9940312E506445F9B@MAIL.bc.iop.kcl.ac.uk> Hello. 
I'm trying to use Storable to save a Bio::Seq object and Storable seems to be having a weird problem dealing with freezing and thawing the code ref to Bio::SeqFeature::Generic cleanup_generic. If I change one line (931) in that function from foreach my $t ( keys %{$self->{'_gsf_tag_hash'} || {} } ) { ------ to foreach my $t ( keys %{$self->{'_gsf_tag_hash'} } ) { it works fine. I've pasted an example script at http://sial.org/pbot/32320. Any ideas why this syntax would break Storable? Would wrapping the foreach in an if(defined $self->{'_gsf_tag_hash'}) serve to replace the || {} ? I get the same problem using Bio::Root::Storable. Cxx From cain.cshl at gmail.com Wed Sep 24 18:00:35 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Wed, 24 Sep 2008 18:00:35 -0400 Subject: [Bioperl-l] genbank2gff.pl choking on CONTIG sections Message-ID: <536f21b00809241500x5414289ayea143c7124758f5e@mail.gmail.com> Hi all, The BioPerl script bp_genbank2gff.pl, which will either convert a Genbank record to GFF or load it directly to a Bio::DB::GFF database, is choking on GenBank records with CONTIG sections. Since I don't think these would ever be useful for generating GFF or loading into a database (ie, the user will want to get all of the features on the parts, not know what the parts are), is there a way to force a Bio::DB::WebDBSeqI/Bio::DB::BioFetch to get the full record (like specifying view=gbwithparts in the url at ncbi)? Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
cain.cshl at gmail.com GMOD Coordinator (http://gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From jason at bioperl.org Wed Sep 24 19:05:18 2008 From: jason at bioperl.org (Jason Stajich) Date: Wed, 24 Sep 2008 16:05:18 -0700 Subject: [Bioperl-l] genbank2gff.pl choking on CONTIG sections In-Reply-To: <536f21b00809241500x5414289ayea143c7124758f5e@mail.gmail.com> References: <536f21b00809241500x5414289ayea143c7124758f5e@mail.gmail.com> Message-ID: <5D6FEF22-76B7-4770-B749-A643AFF99169@bioperl.org> It should already if it is using Bio::DB::GenBank -- do you have example of a fail? There seems to be some defaulting to EMBL for the source in the biofetch code so it might be worth twiddling. from Bio::DB::GenBank Note that when querying for GenBank accessions starting with 'NT_' you will need to call $gb->request_format('fasta') beforehand, because in GenBank format (the default) the sequence part will be left out (the reason is that NT contigs are rather annotation with references to clones). Some work has been done to automatically detect and retrieve whole NT_ clones when the data is in that format (NCBI RefSeq clones). The former behavior prior to bioperl 1.6 was to retrieve these from EBI, but now these are retrieved directly from NCBI. The older behavior can be regained by setting the 'redirect_refseq' flag to a value evaluating to TRUE. On Sep 24, 2008, at 3:00 PM, Scott Cain wrote: > Hi all, > > The BioPerl script bp_genbank2gff.pl, which will either convert a > Genbank record to GFF or load it directly to a Bio::DB::GFF database, > is choking on GenBank records with CONTIG sections. Since I don't > think these would ever be useful for generating GFF or loading into a > database (ie, the user will want to get all of the features on the > parts, not know what the parts are), is there a way to force a > Bio::DB::WebDBSeqI/Bio::DB::BioFetch to get the full record (like > specifying view=gbwithparts in the url at ncbi)? 
> > Thanks, > Scott > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. cain.cshl at gmail.com > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Cold Spring Harbor Laboratory > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From hartzell at alerce.com Thu Sep 25 01:17:54 2008 From: hartzell at alerce.com (George Hartzell) Date: Wed, 24 Sep 2008 22:17:54 -0700 Subject: [Bioperl-l] Bio::Coordinate::Collection could DoWhatIMean better (w/ patch) In-Reply-To: <200809090950.11821.heikki@sanbi.ac.za> References: <200809090950.11821.heikki@sanbi.ac.za> Message-ID: <18651.7938.496813.710345@almost.alerce.com> Heikki Lehvaslaiho writes: > George, > > This is an error from my side. Great that you have a fix already. > > My only worry is the number of external dependencies in BioPerl. To limit > these we have recoded number of functionalities into BioPerl-specific modules. > Before you commit the fix, could you see if Bio::RangeI could be used or easily > extended to be used instead of Set::IntSpan? > > Thanks, > > -Heikki > > On Friday 05 September 2008 21:01:35 George Hartzell wrote: > > Hi all, > > > > Bio::Coordinate::Collection surprised me a bit. At first I thought > > there was a bug, but it's clearly doing what it's supposed to. Now > > I'm wondering if what it's supposed to be doing makes sense in some > > context, or if what I expected would be better functionality. 
> > > > t/CoordinateMapper.t sets up the following scenario: > > > > # > > # Collection > > # > > # 1 5 6 10 > > # |---| |---| > > #-----|----------------------- > > # 1 5 9 15 19 > > # pair1 pair2 > > > > Then goes on to do the following query: > > > > # match more than two > > $pos = Bio::Location::Simple->new (-start => 5, -end => 19 ); > > ok $res = $transcribe->map($pos); > > is $res->each_gap, 2; > > is $res->each_match, 2; > > > > I was surprised to see that there were two gaps, one gene:10-19 and > > one from gene:5-14. Looking at the code, what's really happening is > > that, for the exon1 mapper there's match with gene:5-9 and a gap with > > gene:10-19 and for the exon2 mapper there's a gap with gene:5-14 and a > > match with gene:15-19. All four Result's just get tossed into the > > return value. > > > > The result my intuition wants is that there are two matches (gene:5-9 > > with exon1 and gene:15-19 with exon2) and a gap (gene:10-14). > > > > Yes, I guess that I could just synthesize these myself from the result > > in my app. > > > > It still seems that the current result is a bug though, since there's > > no way of knowing when you're walking through $res->each_Location that > > the first "gap" is with respect to the exon1 mapper and that the > > second "gap" is with respect to the exon2 mapper. The gaps are > > meaningless. > > > > I "fixed" it to work the way I think it should (two matches, one > > gap). I actually extended the test case a bit so that there's a > > multi-base gap, a match, another multibase-gap, another match, then a > > single base gap (just to make sure I got that right...). I had to > > touch up the test file a bit to account for my new test. > > > > The gaps that I return have a strand of 'undef', which seems to be The > > Right Thing. There's also a bit of funny business where I hang onto > > the seq_id of the gapped sequence. It assumes that the "in" sequence > > is the same for all of the mappers. 
This seems safe since otherwise > > the entire query is kind of weird.... > > > > There's a patch to todays svn head at: > > > > http://shrimp.alerce.com/bioperl/collection-diffs.txt > > > > The patch changes Build.PL to include a dependency on Set::IntSpan, > > CoordinateMapper.t to update the tests, and > > Bio/CoordinateMapper/Collection.pm for the new code. > > > > Who's code would this break. > > > > If anyone's relying on the current behaviour re: gaps, what's the > > situation in which you find it useful? > > > > Thanks! > > > > g. > [...] So, I'm out of round 'tuit's trying to think of an _easy_ way to do this using Range/RangeI. I might just not be thinking clearly, and I've definitely trained myself to solve these kinds of problems with Set::IntSpan problems so I may just be waving my favorite hammer around. There may also be a BioPerl native way to do it using some fancier object than a RangeI (split locations?). It's easy to get the set of gaps for each pair, but is there already something that takes the intersection of a set of non-intersecting Range's? g. From tatedger at googlemail.com Thu Sep 25 11:23:46 2008 From: tatedger at googlemail.com (john paul) Date: Thu, 25 Sep 2008 16:23:46 +0100 Subject: [Bioperl-l] Can't locate object method "get_dbxrefs" Message-ID: Hello guys, I need to pick your brain on this.
I was trying to load some sequences in a fresh RH build using bioperl-db and got the following error: [tatedger at localhost biosql]$ perl load_seqdatabase.pl --host localhost --dbuser root --dbname biosql --namespace swissprot --format swiss /home/tatedger/tmp/uniprot_sprot.dat --testonly Loading /home/tatedger/tmp/uniprot_sprot.dat ... Could not store Q4U9M9: Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm line 552, line 70. I have seen some posts in this regard (http://bioperl.org/pipermail/bioperl-l/2008-April/027544.html) but it wasn't clear to me what the solution would be. My configuration: - mysql version 4.1.7 - Red Hat Enterprise Linux ES release 4 (Nahant) To install bioperl and bioperl-db I followed the instructions on the website: - cpan>install S/SE/SENDU/bioperl-1.5.2_102.tar.gz - svn co svn://code.open-bio.org/bioperl/bioperl-db/trunk bioperl-db - biosql schema is loaded and load_ncbi_taxonomy.pl worked fine. bioperl-db test 04 shows the same error. [tatedger at localhost bioperl-db]$ perl t/04swiss.t 1..52 ok 1 ok 2 ok 3 ok 4 ok 5 Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm line 552, line 78.
my @INC file looks like /usr/lib/perl5/5.8.5/i386-linux-thread-multi /usr/lib/perl5/5.8.5 /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl/5.8.4 /usr/lib/perl5/site_perl/5.8.3 /usr/lib/perl5/site_perl/5.8.2 /usr/lib/perl5/site_perl/5.8.1 /usr/lib/perl5/site_perl/5.8.0 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/vendor_perl/5.8.4 /usr/lib/perl5/vendor_perl/5.8.3 /usr/lib/perl5/vendor_perl/5.8.2 /usr/lib/perl5/vendor_perl/5.8.1 /usr/lib/perl5/vendor_perl/5.8.0 /usr/lib/perl5/vendor_perl What am I doing wrong? Could you help? Thanks in advance for your help Tom From bosborne11 at verizon.net Thu Sep 25 11:40:45 2008 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 25 Sep 2008 11:40:45 -0400 Subject: [Bioperl-l] Can't locate object method "get_dbxrefs" In-Reply-To: References: Message-ID: <2A46D690-D925-4188-9DDF-468E31CE83BE@verizon.net> John, What does: >locate Bio/Ontology/Term.pm Say? Brian O. On Sep 25, 2008, at 11:23 AM, john paul wrote: > Hello guys, > > I need to pick your brain on this. 
I was trying to load some > sequences in a > fresh RH build using bioperl-db and got the following error: > > [tatedger at localhost biosql]$ perl load_seqdatabase.pl --host localhost > --dbuser root --dbname biosql --namespace swissprot --format swiss > /home/tatedger/tmp/uniprot_sprot.dat --testonly > Loading /home/tatedger/tmp/uniprot_sprot.dat ... > Could not store Q4U9M9: Can't locate object method "get_dbxrefs" via > package > "Bio::Ontology::Term" at > /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm > line > 552, line 70. > > > I have seen some posts in this regard ( > http://bioperl.org/pipermail/bioperl-l/2008-April/027544.html) but > it wasn't > clear to me what the solution would be. > > > My configuration: > - mysql version 4.1.7 > - Red Hat Enterprise Linux ES release 4 (Nahant) > > To install bioperl and bioperl-db I followed the instructions on the > website: > - cpan>install S/SE/SENDU/bioperl-1.5.2_102.tar.gz > - svn co svn://code.open-bio.org/bioperl/bioperl-db/trunk bioperl-db > - biosql schema is loaded and load_ncbi_taxonomy.pl worked fine. > > bioperl-db test 04 shows the same error. > > [tatedger at localhost bioperl-db]$ perl t/04swiss.t > 1..52 > ok 1 > ok 2 > ok 3 > ok 4 > ok 5 > Can't locate object method "get_dbxrefs" via package > "Bio::Ontology::Term" > at /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/ > PersistentObject.pm line > 552, line 78.
> > > my @INC file looks like > /usr/lib/perl5/5.8.5/i386-linux-thread-multi > /usr/lib/perl5/5.8.5 > /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi > /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi > /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi > /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi > /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi > /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi > /usr/lib/perl5/site_perl/5.8.5 > /usr/lib/perl5/site_perl/5.8.4 > /usr/lib/perl5/site_perl/5.8.3 > /usr/lib/perl5/site_perl/5.8.2 > /usr/lib/perl5/site_perl/5.8.1 > /usr/lib/perl5/site_perl/5.8.0 > /usr/lib/perl5/site_perl > /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi > /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi > /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi > /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi > /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi > /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi > /usr/lib/perl5/vendor_perl/5.8.5 > /usr/lib/perl5/vendor_perl/5.8.4 > /usr/lib/perl5/vendor_perl/5.8.3 > /usr/lib/perl5/vendor_perl/5.8.2 > /usr/lib/perl5/vendor_perl/5.8.1 > /usr/lib/perl5/vendor_perl/5.8.0 > /usr/lib/perl5/vendor_perl > > What am I doing wrong? Could you help? > > Thanks in advance for your help > Tom > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From reece at harts.net Thu Sep 25 12:15:23 2008 From: reece at harts.net (Reece Hart) Date: Thu, 25 Sep 2008 09:15:23 -0700 Subject: [Bioperl-l] Bio::Coordinate::Collection could DoWhatIMean better (w/ patch) In-Reply-To: <18651.7938.496813.710345@almost.alerce.com> References: <200809090950.11821.heikki@sanbi.ac.za> <18651.7938.496813.710345@almost.alerce.com> Message-ID: <1222359323.15677.4.camel@snafu> An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From tatedger at googlemail.com Thu Sep 25 12:27:29 2008 From: tatedger at googlemail.com (john paul) Date: Thu, 25 Sep 2008 17:27:29 +0100 Subject: [Bioperl-l] Can't locate object method "get_dbxrefs" In-Reply-To: <2A46D690-D925-4188-9DDF-468E31CE83BE@verizon.net> References: <2A46D690-D925-4188-9DDF-468E31CE83BE@verizon.net> Message-ID: On Thu, Sep 25, 2008 at 4:40 PM, Brian Osborne wrote: > John, > > What does: > > >locate Bio/Ontology/Term.pm > > Say? > Silly me. $ locate Bio/Ontology/Term.pm /usr/lib/perl5/site_perl/5.8.5/Bio/Ontology/Term.pm and $ cat /usr/lib/perl5/site_perl/5.8.5/Bio/Ontology/Term.pm|grep get_dbxrefs produced no output. The latest version posted by Sendu on CPAN says this for that module: # $Id: Term.pm,v 1.31.4.1 2006/10/02 23:10:22 sendu Exp $ # # BioPerl module for Bio::Ontology::Term and the live version at ( http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/Bio/Ontology/Term.pm ) says # $Id: Term.pm 14708 2008-06-10 00:08:17Z heikki $ # # BioPerl module for Bio::Ontology::Term # I was using an old distribution from CPAN. Reinstalling from the main trunk with svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live solved the problem. Thanks again Tom > > Brian O. > > > > On Sep 25, 2008, at 11:23 AM, john paul wrote: > > Hello guys, >> >> I need to pick your brain on this. I was trying to load some sequences in >> a >> fresh RH build using bioperl-db and got the following error: >> >> [tatedger at localhost biosql]$ perl load_seqdatabase.pl --host localhost >> --dbuser root --dbname biosql --namespace swissprot --format swiss >> /home/tatedger/tmp/uniprot_sprot.dat --testonly >> Loading /home/tatedger/tmp/uniprot_sprot.dat ... >> Could not store Q4U9M9: Can't locate object method "get_dbxrefs" via >> package >> "Bio::Ontology::Term" at >> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm line >> 552, line 70.
>> >> >> I have seen some posts in this regard ( >> http://bioperl.org/pipermail/bioperl-l/2008-April/027544.html) but it >> wasn't >> clear to me what the solution would be. >> >> >> My configuration: >> - mysql version 4.1.7 >> - Red Hat Enterprise Linux ES release 4 (Nahant) >> >> To install bioperl and bioperl-db I followed the instructions on the >> website: >> - cpan>install S/SE/SENDU/bioperl-1.5.2_102.tar.gz >> - svn co svn://code.open-bio.org/bioperl/bioperl-db/trunk bioperl-db >> - biosql schema is loaded and load_ncbi_taxonomy.pl worked fine. >> >> bioperl-db test 04 shows the same error. >> >> [tatedger at localhost bioperl-db]$ perl t/04swiss.t >> 1..52 >> ok 1 >> ok 2 >> ok 3 >> ok 4 >> ok 5 >> Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" >> at /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm >> line >> 552, line 78. >> >> >> my @INC file looks like >> /usr/lib/perl5/5.8.5/i386-linux-thread-multi >> /usr/lib/perl5/5.8.5 >> /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi >> /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi >> /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi >> /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi >> /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi >> /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi >> /usr/lib/perl5/site_perl/5.8.5 >> /usr/lib/perl5/site_perl/5.8.4 >> /usr/lib/perl5/site_perl/5.8.3 >> /usr/lib/perl5/site_perl/5.8.2 >> /usr/lib/perl5/site_perl/5.8.1 >> /usr/lib/perl5/site_perl/5.8.0 >> /usr/lib/perl5/site_perl >> /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi >> /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi >> /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi >> /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi >> /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi >> /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi >> /usr/lib/perl5/vendor_perl/5.8.5 >>
/usr/lib/perl5/vendor_perl/5.8.4 >> /usr/lib/perl5/vendor_perl/5.8.3 >> /usr/lib/perl5/vendor_perl/5.8.2 >> /usr/lib/perl5/vendor_perl/5.8.1 >> /usr/lib/perl5/vendor_perl/5.8.0 >> /usr/lib/perl5/vendor_perl >> >> What am I doing wrong? Could you help? >> >> Thanks in advance for your help >> Tom >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > From bix at sendu.me.uk Thu Sep 25 13:12:22 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 25 Sep 2008 18:12:22 +0100 Subject: [Bioperl-l] Can't locate object method "get_dbxrefs" In-Reply-To: References: Message-ID: <48DBC676.7090704@sendu.me.uk> john paul wrote: > - cpan>install S/SE/SENDU/bioperl-1.5.2_102.tar.gz > - svn co svn://code.open-bio.org/bioperl/bioperl-db/trunk bioperl-db [...] > Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" [...] > What am I doing wrong? Could you help? You're using the svn version of bioperl-db but the release version of core (aka live). Don't mix and match; use the same version of both. So either install bioperl-live from svn as well, or get rid of your svn bioperl-db and install bioperl-db 1.5.2: http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz I'd recommend using svn for both. From hlapp at gmx.net Thu Sep 25 18:24:37 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 25 Sep 2008 18:24:37 -0400 Subject: [Bioperl-l] Using Storable with SeqFeatures In-Reply-To: <5626ED9CB91C814197079FB9940312E506445F9B@MAIL.bc.iop.kcl.ac.uk> References: <5626ED9CB91C814197079FB9940312E506445F9B@MAIL.bc.iop.kcl.ac.uk> Message-ID: <517FAB85-BC7F-4135-B510-A2087267A314@gmx.net> On Sep 24, 2008, at 10:19 AM, Johnston, Caroline wrote: > Hello. > > I'm trying to use Storable to save a Bio::Seq object and Storable > seems to be having a weird problem dealing with freezing and thawing > the code ref to Bio::SeqFeature::Generic cleanup_generic.
If I > change one line (931) in that function from > > foreach my $t ( keys %{$self->{'_gsf_tag_hash'} || {} } ) { > ------ > to > > foreach my $t ( keys %{$self->{'_gsf_tag_hash'} } ) { > > it works fine. I've pasted an example script at http://sial.org/pbot/32320 > . > > Any ideas why this syntax would break Storable? Would wrapping the > foreach in an if(defined $self->{'_gsf_tag_hash'}) serve to replace > the || {} ? This should have exactly the same behavior as far as I can see. Would you mind filing this as a bug report with patch? -hilmar > > > I get the same problem using Bio::Root::Storable. > > Cxx > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From mdschultz at ucdavis.edu Thu Sep 25 18:06:02 2008 From: mdschultz at ucdavis.edu (Matthew Schultz) Date: Thu, 25 Sep 2008 15:06:02 -0700 (PDT) Subject: [Bioperl-l] BioPerl Installation Help Message-ID: <200809252206.m8PM62Se004142@phaenicia.ucdavis.edu> Hi Bioperl, I've been trying to install bioperl on my Ubuntu machine. I'm new to Bioperl and don't have much experience in Unix, so I'm at a loss for what to do next. I tried to follow the instructions on using CPAN to install Bioperl, but the installation failed. At first I thought it was because of the few warnings I received, but after trying the "force install" command without success either, I'm not so sure. I was about to try the alternate option using Build.PL, but am not sure where the Bioperl installation should go (or will the Build.PL script place it in the right folder?). Any help or advice you could give would be appreciated. Thanks for your time. -Matt Schultz P.S.
Here are the failed test results:

Failed Test          Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/BioFetch_DB.t                    27    4  14.81%  8 20-21 27
t/DB.t                             78    2   2.56%  30-31
t/EMBL_DB.t                        15    3  20.00%  6 13-14
t/GuessSeqFormat.t                 46    1   2.17%  11
t/InterProParser.t      2   512    47    1   2.13%  2
t/tutorial.t            2   512    21    6  28.57%  19-21
18 subtests skipped.
Failed 6/179 test scripts, 96.65% okay. 14/8122 subtests failed, 99.83% okay.
make: *** [test_dynamic] Error 255
  /usr/bin/make test -- NOT OK
Running make install
  make test had returned bad status, won't install without force

From caroline.johnston at iop.kcl.ac.uk Thu Sep 25 23:19:38 2008 From: caroline.johnston at iop.kcl.ac.uk (Caroline) Date: Fri, 26 Sep 2008 04:19:38 +0100 Subject: [Bioperl-l] Using Storable with SeqFeatures In-Reply-To: <517FAB85-BC7F-4135-B510-A2087267A314@gmx.net> References: <5626ED9CB91C814197079FB9940312E506445F9B@MAIL.bc.iop.kcl.ac.uk> <517FAB85-BC7F-4135-B510-A2087267A314@gmx.net> Message-ID: <1222399178.18153.0.camel@clive> On Thu, 2008-09-25 at 18:24 -0400, Hilmar Lapp wrote: > Would you mind filing this as a bug report with patch? > > -hilmar > Done, I think. I haven't used Bugzilla before, so give me a shout if I haven't submitted it properly. Cxx From hartzell at alerce.com Fri Sep 26 00:25:08 2008 From: hartzell at alerce.com (George Hartzell) Date: Thu, 25 Sep 2008 21:25:08 -0700 Subject: [Bioperl-l] Bio::Coordinate::Collection could DoWhatIMean better (w/ patch) In-Reply-To: <200809090950.11821.heikki@sanbi.ac.za> References: <200809090950.11821.heikki@sanbi.ac.za> Message-ID: <18652.25636.694188.904749@almost.alerce.com> What would people think about a Bio::Range::Set (and/or SetI) which behaves like Set::IntSpan (which alone would be a waste of effort) but also embraces RangeI's concepts of strandedness and strong/weak/unspecified comparisons? g. (who sometimes thinks that 'tuits grow on trees...)
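For the Bio::Coordinate::Collection case discussed in this thread (query gene:5-19, with exon1 matching gene:5-9 and exon2 matching gene:15-19), the single gap George expected, gene:10-14, is the part of the query that no mapper matched. Equivalently, it is the intersection of the per-mapper gaps (gene:10-19 and gene:5-14). Below is a minimal pure-Perl sketch of that computation. It assumes plain [start, end] pairs with inclusive integer coordinates and ignores strand. `uncovered_gaps` is a hypothetical helper, not part of BioPerl or Set::IntSpan:

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl or Set::IntSpan API): return the parts of
# the closed query interval [$qstart, $qend] that are not covered by any match.
# Each match is an array ref [start, end]; coordinates are inclusive integers
# and strand is ignored.
sub uncovered_gaps {
    my ($qstart, $qend, @matches) = @_;
    my @gaps;
    my $next = $qstart;    # first query position not yet accounted for

    # Sweep the matches from left to right.
    for my $m (sort { $a->[0] <=> $b->[0] } @matches) {
        my ($s, $e) = @$m;
        next if $e < $next;    # adds nothing beyond what is already accounted for
        last if $s > $qend;    # starts after the query ends; so do all later ones
        push @gaps, [ $next, $s - 1 ] if $s > $next;    # uncovered stretch before this match
        $next = $e + 1;        # $e >= $next here, so this only moves forward
    }
    push @gaps, [ $next, $qend ] if $next <= $qend;    # uncovered tail of the query
    return @gaps;
}

# Scenario from t/CoordinateMapper.t: query gene:5-19,
# exon1 matches gene:5-9, exon2 matches gene:15-19.
my @gaps = uncovered_gaps(5, 19, [5, 9], [15, 19]);
print join(', ', map { "$_->[0]-$_->[1]" } @gaps), "\n";    # prints "10-14"
```

Sorting the matches once and sweeping left to right keeps the helper dependency-free. If Set::IntSpan is available, the same result should come from `Set::IntSpan->new('5-19')->diff('5-9,15-19')->run_list`, which ought to return `10-14`. A RangeI-aware Bio::Range::Set would need to support this kind of operation while also respecting strand.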
From heikki at sanbi.ac.za Fri Sep 26 02:25:54 2008 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 26 Sep 2008 08:25:54 +0200 Subject: [Bioperl-l] Bio::Coordinate::Collection could DoWhatIMean better (w/ patch) In-Reply-To: <18652.25636.694188.904749@almost.alerce.com> References: <200809090950.11821.heikki@sanbi.ac.za> <18652.25636.694188.904749@almost.alerce.com> Message-ID: <200809260825.54638.heikki@sanbi.ac.za> George, Bio::Range::Set sounds like a great idea. Sorry to put you in a tight spot with the request to keep external dependencies to a minimum. I should have stressed that you should follow it only if it can be done easily. If you look at the actual dependencies of BioPerl at the moment, you'll realize not many have the patience to follow that guideline. The main thing is to get working code in. If you then have the energy to code a module that gets rid of the dependency, that's an added bonus. Thanks for your effort, Yours, -Heikki On Friday 26 September 2008 06:25:08 George Hartzell wrote: > What would people think about a Bio::Range::Set (and/or SetI) which > behaves like Set::IntSpan (which alone would be a waste of effort) but > also embraces RangeI's concepts of strandedness and > strong/weak/unspecified comparisons? > > g. (who sometimes thinks that 'tuits grow on trees...)
-- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From tatedger at googlemail.com Thu Sep 25 12:27:29 2008 From: tatedger at googlemail.com (john paul) Date: Thu, 25 Sep 2008 17:27:29 +0100 Subject: [Bioperl-l] [BioSQL-l] Can't locate object method "get_dbxrefs" In-Reply-To: <2A46D690-D925-4188-9DDF-468E31CE83BE@verizon.net> References: <2A46D690-D925-4188-9DDF-468E31CE83BE@verizon.net> Message-ID: On Thu, Sep 25, 2008 at 4:40 PM, Brian Osborne wrote: > John, > > What does: > > >locate Bio/Ontology/Term.pm > > Say? > Silly me. $ locate Bio/Ontology/Term.pm /usr/lib/perl5/site_perl/5.8.5/Bio/Ontology/Term.pm and $ cat /usr/lib/perl5/site_perl/5.8.5/Bio/Ontology/Term.pm|grep get_dbxrefs produced no output. The latest version posted by Sendu on CPAN says this for that module: # $Id: Term.pm,v 1.31.4.1 2006/10/02 23:10:22 sendu Exp $ # # BioPerl module for Bio::Ontology::Term and the live version at ( http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/Bio/Ontology/Term.pm ) says # $Id: Term.pm 14708 2008-06-10 00:08:17Z heikki $ # # BioPerl module for Bio::Ontology::Term # I was using an old distribution from CPAN. Reinstalling from the main trunk with svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live solved the problem. Thanks again Tom > > Brian O. > > > > On Sep 25, 2008, at 11:23 AM, john paul wrote: > > Hello guys, >> >> I need to pick your brain on this.
I was trying to load some sequences in >> a >> fresh RH build using bioperl-db and got the following error: >> >> [tatedger at localhost biosql]$ perl load_seqdatabase.pl --host localhost >> --dbuser root --dbname biosql --namespace swissprot --format swiss >> /home/tatedger/tmp/uniprot_sprot.dat --testonly >> Loading /home/tatedger/tmp/uniprot_sprot.dat ... >> Could not store Q4U9M9: Can't locate object method "get_dbxrefs" via >> package >> "Bio::Ontology::Term" at >> /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm line >> 552, line 70. >> >> >> I have seen some posts in this regard ( >> http://bioperl.org/pipermail/bioperl-l/2008-April/027544.html) but it >> wasn't >> clear to me what the solution would be. >> >> >> My configuration: >> - mysql version 4.1.7 >> - Red Hat Enterprise Linux ES release 4 (Nahant) >> >> To install bioperl and bioperl-db I followed the instructions on the >> website: >> - cpan>install S/SE/SENDU/bioperl-1.5.2_102.tar.gz >> - svn co svn://code.open-bio.org/bioperl/bioperl-db/trunk bioperl-db >> - biosql schema is loaded and load_ncbi_taxonomy.pl worked fine. >> >> bioperl-db test 04 shows the same error. >> >> [tatedger at localhost bioperl-db]$ perl t/04swiss.t >> 1..52 >> ok 1 >> ok 2 >> ok 3 >> ok 4 >> ok 5 >> Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" >> at /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm >> line >> 552, line 78.
>> >> >> my @INC file looks like >> /usr/lib/perl5/5.8.5/i386-linux-thread-multi >> /usr/lib/perl5/5.8.5 >> /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi >> /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi >> /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi >> /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi >> /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi >> /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi >> /usr/lib/perl5/site_perl/5.8.5 >> /usr/lib/perl5/site_perl/5.8.4 >> /usr/lib/perl5/site_perl/5.8.3 >> /usr/lib/perl5/site_perl/5.8.2 >> /usr/lib/perl5/site_perl/5.8.1 >> /usr/lib/perl5/site_perl/5.8.0 >> /usr/lib/perl5/site_perl >> /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi >> /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi >> /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi >> /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi >> /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi >> /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi >> /usr/lib/perl5/vendor_perl/5.8.5 >> /usr/lib/perl5/vendor_perl/5.8.4 >> /usr/lib/perl5/vendor_perl/5.8.3 >> /usr/lib/perl5/vendor_perl/5.8.2 >> /usr/lib/perl5/vendor_perl/5.8.1 >> /usr/lib/perl5/vendor_perl/5.8.0 >> /usr/lib/perl5/vendor_perl >> >> What am I doing wrong? Could you help? 
>> >> Thanks in advance for your help >> Tom >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ BioSQL-l mailing list BioSQL-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biosql-l From bix at sendu.me.uk Fri Sep 26 04:26:36 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 26 Sep 2008 09:26:36 +0100 Subject: [Bioperl-l] Bio::Species bug, was Species.t tests failing with bioperl-live In-Reply-To: References: <1501D3A7-42BA-4489-95E3-5267FFC8A856@illinois.edu> Message-ID: <48DC9CBC.1090909@sendu.me.uk> Chris Fields wrote: > The last commit to Bio::Species, which was an effort to fix a memory > leak, has unfortunately broken Species.t tests. The relevant commit > used weaken(); my guess is something is being garbage collected way too > early. Any ideas? Was that last commit motivated by a bug report, and/or is there an associated test? If not, perhaps it can just be reverted? From tatedger at googlemail.com Thu Sep 25 11:23:46 2008 From: tatedger at googlemail.com (john paul) Date: Thu, 25 Sep 2008 16:23:46 +0100 Subject: [Bioperl-l] [BioSQL-l] Can't locate object method "get_dbxrefs" Message-ID: Hello guys, I need to pick your brain on this. I was trying to load some sequences in a fresh RH build using bioperl-db and got the following error: [tatedger at localhost biosql]$ perl load_seqdatabase.pl --host localhost --dbuser root --dbname biosql --namespace swissprot --format swiss /home/tatedger/tmp/uniprot_sprot.dat --testonly Loading /home/tatedger/tmp/uniprot_sprot.dat ... Could not store Q4U9M9: Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm line 552, line 70. 
I have seen some posts in this regard (http://bioperl.org/pipermail/bioperl-l/2008-April/027544.html) but it wasn't clear to me what the solution would be. My configuration: - mysql version 4.1.7 - Red Hat Enterprise Linux ES release 4 (Nahant) To install bioperl and bioperl-db I followed the instructions on the website: - cpan>install S/SE/SENDU/bioperl-1.5.2_102.tar.gz - svn co svn://code.open-bio.org/bioperl/bioperl-db/trunk bioperl-db - biosql schema is loaded and load_ncbi_taxonomy.pl worked fine. bioperl-db test 04 shows the same error. [tatedger at localhost bioperl-db]$ perl t/04swiss.t 1..52 ok 1 ok 2 ok 3 ok 4 ok 5 Can't locate object method "get_dbxrefs" via package "Bio::Ontology::Term" at /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm line 552, line 78. my @INC file looks like /usr/lib/perl5/5.8.5/i386-linux-thread-multi /usr/lib/perl5/5.8.5 /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl/5.8.4 /usr/lib/perl5/site_perl/5.8.3 /usr/lib/perl5/site_perl/5.8.2 /usr/lib/perl5/site_perl/5.8.1 /usr/lib/perl5/site_perl/5.8.0 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/vendor_perl/5.8.4 /usr/lib/perl5/vendor_perl/5.8.3 /usr/lib/perl5/vendor_perl/5.8.2 /usr/lib/perl5/vendor_perl/5.8.1 /usr/lib/perl5/vendor_perl/5.8.0
/usr/lib/perl5/vendor_perl What am I doing wrong? Could you help? Thanks in advance for your help Tom _______________________________________________ BioSQL-l mailing list BioSQL-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biosql-l From cjfields at illinois.edu Fri Sep 26 08:20:45 2008 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 26 Sep 2008 07:20:45 -0500 Subject: [Bioperl-l] Bio::Species bug, was Species.t tests failing with bioperl-live In-Reply-To: <48DC9CBC.1090909@sendu.me.uk> References: <1501D3A7-42BA-4489-95E3-5267FFC8A856@illinois.edu> <48DC9CBC.1090909@sendu.me.uk> Message-ID: On Sep 26, 2008, at 3:26 AM, Sendu Bala wrote: > Chris Fields wrote: >> The last commit to Bio::Species, which was an effort to fix a >> memory leak, has unfortunately broken Species.t tests. The >> relevant commit used weaken(); my guess is something is being >> garbage collected way too early. Any ideas? > > Was that last commit motivated by a bug report, and/or is there an > associated test? If not, perhaps it can just be reverted? Here's the relevant list thread: http://thread.gmane.org/gmane.comp.lang.perl.bio.general/17541/focus=17550 The proposed patch was George's, which gets rid of the leak but causes the Species.t failures. I say we revert the patch, but we'll still need to address the original problem addressed by Rutger Vos, i.e. that there is a significant memory leak in Bio::Species. 
chris From bix at sendu.me.uk Fri Sep 26 09:45:21 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 26 Sep 2008 14:45:21 +0100 Subject: [Bioperl-l] Bio::Species bug, was Species.t tests failing with bioperl-live In-Reply-To: References: <1501D3A7-42BA-4489-95E3-5267FFC8A856@illinois.edu> <48DC9CBC.1090909@sendu.me.uk> Message-ID: <48DCE771.8050001@sendu.me.uk> Chris Fields wrote: > > On Sep 26, 2008, at 3:26 AM, Sendu Bala wrote: > >> Chris Fields wrote: >>> The last commit to Bio::Species, which was an effort to fix a memory >>> leak, has unfortunately broken Species.t tests. The relevant commit >>> used weaken(); my guess is something is being garbage collected way >>> too early. Any ideas? >> >> Was that last commit motivated by a bug report, and/or is there an >> associated test? If not, perhaps it can just be reverted? > > Here's the relevant list thread: > > http://thread.gmane.org/gmane.comp.lang.perl.bio.general/17541/focus=17550 > > The proposed patch was George's, which gets rid of the leak but causes > the Species.t failures. I say we revert the patch, but we'll still need > to address the original problem addressed by Rutger Vos, i.e. that there > is a significant memory leak in Bio::Species. Thanks. Rather than revert I've kludged it for now. Everything should be fine, it's just that I fear my solution isn't 'correct' and could mess things up we don't happen to have tests for. 
See http://bugzilla.open-bio.org/show_bug.cgi?id=2594 From cjfields at illinois.edu Fri Sep 26 10:54:55 2008 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 26 Sep 2008 09:54:55 -0500 Subject: [Bioperl-l] Bio::Species bug, was Species.t tests failing with bioperl-live In-Reply-To: <48DCE771.8050001@sendu.me.uk> References: <1501D3A7-42BA-4489-95E3-5267FFC8A856@illinois.edu> <48DC9CBC.1090909@sendu.me.uk> <48DCE771.8050001@sendu.me.uk> Message-ID: <9C49A138-CC07-4716-AFD6-FFB2F217C825@illinois.edu> On Sep 26, 2008, at 8:45 AM, Sendu Bala wrote: > Chris Fields wrote: >> On Sep 26, 2008, at 3:26 AM, Sendu Bala wrote: >>> Chris Fields wrote: >>>> The last commit to Bio::Species, which was an effort to fix a >>>> memory leak, has unfortunately broken Species.t tests. The >>>> relevant commit used weaken(); my guess is something is being >>>> garbage collected way too early. Any ideas? >>> >>> Was that last commit motivated by a bug report, and/or is there an >>> associated test? If not, perhaps it can just be reverted? >> Here's the relevant list thread: >> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/17541/focus=17550 >> The proposed patch was George's, which gets rid of the leak but >> causes the Species.t failures. I say we revert the patch, but >> we'll still need to address the original problem addressed by >> Rutger Vos, i.e. that there is a significant memory leak in >> Bio::Species. > > Thanks. Rather than revert I've kludged it for now. Everything > should be fine, it's just that I fear my solution isn't 'correct' > and could mess things up we don't happen to have tests for. > > See http://bugzilla.open-bio.org/show_bug.cgi?id=2594 Probably something that will go away when we quit using Bio::Species altogether (in favor of Bio::Taxon) after 1.6. I'm not sure whether it will be worth the trouble, but it's worth discussing. 
Anyway, I'm still a bit unfamiliar with some aspects of Bio::Species (and I'm pretty busy ATM) so it may take a few days. chris From hartzell at alerce.com Fri Sep 26 20:21:41 2008 From: hartzell at alerce.com (George Hartzell) Date: Fri, 26 Sep 2008 17:21:41 -0700 Subject: [Bioperl-l] Bio::Species bug, was Species.t tests failing with bioperl-live In-Reply-To: <9C49A138-CC07-4716-AFD6-FFB2F217C825@illinois.edu> References: <1501D3A7-42BA-4489-95E3-5267FFC8A856@illinois.edu> <48DC9CBC.1090909@sendu.me.uk> <48DCE771.8050001@sendu.me.uk> <9C49A138-CC07-4716-AFD6-FFB2F217C825@illinois.edu> Message-ID: <18653.31893.762357.718899@almost.alerce.com> Chris Fields writes: > > On Sep 26, 2008, at 8:45 AM, Sendu Bala wrote: > > > Chris Fields wrote: > >> On Sep 26, 2008, at 3:26 AM, Sendu Bala wrote: > >>> Chris Fields wrote: > >>>> The last commit to Bio::Species, which was an effort to fix a > >>>> memory leak, has unfortunately broken Species.t tests. The > >>>> relevant commit used weaken(); my guess is something is being > >>>> garbage collected way too early. Any ideas? > >>> > >>> Was that last commit motivated by a bug report, and/or is there an > >>> associated test? If not, perhaps it can just be reverted? > >> Here's the relevant list thread: > >> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/17541/focus=17550 > >> The proposed patch was George's, which gets rid of the leak but > >> causes the Species.t failures. I say we revert the patch, but > >> we'll still need to address the original problem addressed by > >> Rutger Vos, i.e. that there is a significant memory leak in > >> Bio::Species. > > > > Thanks. Rather than revert I've kludged it for now. Everything > > should be fine, it's just that I fear my solution isn't 'correct' > > and could mess things up we don't happen to have tests for.
> > > > See http://bugzilla.open-bio.org/show_bug.cgi?id=2594 > > Probably something that will go away when we quit using Bio::Species > altogether (in favor of Bio::Taxon) after 1.6. I'm not sure whether > it will be worth the trouble, but it's worth discussing. > > Anyway, I'm still a bit unfamiliar with some aspects of Bio::Species > (and I'm pretty busy ATM) so it make take a few days. I can try to take a look at it again and see if there's a way to make it work [more] correctly. g. From torsten.seemann at infotech.monash.edu.au Sun Sep 28 21:46:06 2008 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Mon, 29 Sep 2008 11:46:06 +1000 Subject: [Bioperl-l] Retrieving taxonomy information from a GenBank file In-Reply-To: References:

Message-ID: Anjan, >> I'm using the following code to access the "ORGANISM" tag value for the >> record NC_002526. >> I get a value of 0 even though the ORGANISM tag has a value. Any idea >> how >> this might be corrected? >> my $des= $seq->get_tag_values("ORGANISM"); > You want the species object ($seq->species) -- see the Seq HOWTO, much of > the Seq API is explained. > http://bioperl.org/wiki/HOWTO:Feature-Annotation#The_Species_Object Of course Jason is correct here but there was still a semantic bug in your code anyway which you should probably be aware of. $seq->get_tag_values() returns an ARRAY which you are trying to store in a SCALAR ($des). An array, in a scalar context, returns the SIZE of the array, which in your case was zero (0) as there were no ORGANISM tags. What you probably meant was:

my($des) = $seq->get_tag_values("ORGANISM");
# or
my $des = ($seq->get_tag_values("ORGANISM")) [ 0 ];

-- --Torsten Seemann --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash University --Tel: 9905 9010 / 0422 923 024 From schwarz at tenaya.caltech.edu Mon Sep 29 01:57:43 2008 From: schwarz at tenaya.caltech.edu (Erich Schwarz) Date: Sun, 28 Sep 2008 22:57:43 -0700 (PDT) Subject: [Bioperl-l] exporting contigs with CDSes, stored via Bio::DB::GFF, into individual GenBank records? Message-ID: Hi all, I have newly sequenced contigs, with CDS predictions, loaded into a Bio::DB::GFF-readable format (i.e., loaded into a MySQL database via Bio::DB::GFF). I'd like to export each contig, with its annotated CDSes, into a single GenBank-formatted record for each contig (in order to be able to submit this stuff to GenBank, without having to waste time with Sequin). Is there some straightforward way of getting Bio::DB::GFF to do that?
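Torsten's explanation of the 0 in his reply above can be reproduced without BioPerl. The sketch below uses a hypothetical `tag_values()` sub as a stand-in for `get_tag_values()`. Like that method, it returns the contents of an array, so the value it produces depends on the calling context. One caveat: the "size of the array" rule applies only when a sub returns an array variable. A sub that ends in `return ('a', 'b')` yields its *last* element in scalar context instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an accessor such as get_tag_values():
# it returns the contents of an array, so the caller's context matters.
sub tag_values {
    my @values = @_;
    return @values;
}

my $count   = tag_values( 'Homo sapiens', 'human' );          # scalar context: the size (2)
my ($first) = tag_values( 'Homo sapiens', 'human' );          # list context: first element
my $sliced  = ( tag_values( 'Homo sapiens', 'human' ) )[0];   # list slice: first element
my $empty   = tag_values();                                   # no values: 0, as Anjan saw

print "$count | $first | $sliced | $empty\n";
```

The parentheses in `my ($first) = ...` are what switch the assignment to list context. The `(...)[0]` slice does the same job without declaring a list on the left-hand side.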
Some time ago, when I last had to decipher BioPerl, I came up with code that would let me export protein translations of the contigs' CDSes in GenBank format:

-------------------------------------------------------------------

#!/usr/bin/env perl

use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GFF;

my $query_database = $ARGV[0];
my $dna = q{};
my $db = Bio::DB::GFF->new( -dsn => $query_database);

my $gb_file = 'example.gb';
my $seq_out = Bio::SeqIO->new( -file => ">$gb_file", -format => 'genbank', );

my @contigs = sort
              map { $_->display_id }
              $db->features( -types => 'contig:assembly' );

foreach my $contig (@contigs) {
    my $segment1 = $db->segment($contig);
    my @p_txs = $segment1->features('processed_transcript');
    foreach my $p_t (sort @p_txs) {
        $dna = q{};
        my @CDSes = $p_t->CDS;
        my $cds_name = $CDSes[0]->display_id();
        foreach my $cds (@CDSes) {
            # $cds->seq == Bio::PrimarySeq, *not* plain nt seq.!
            $dna = $dna . $cds->seq->seq;
        }
        my $full_cds = Bio::Seq->new( -display_id => $cds_name,
                                      -seq => $dna, );
        my $prot = $full_cds->translate;
        $seq_out->write_seq($prot);
    }
}

--------------------------------------------------------------------

Returning to this, I tried using

$db->get_Seq_by_id($contig)

to give me a Bio::Seq object for each contig (which I could then output into GenBank form), but that proved futile.

I'm willing to work this out on my own if I have to, but if somebody can answer this in 30 seconds, it will save me a lot of time -- and be very appreciated!

--Erich
From jason at bioperl.org Tue Sep 30 12:48:17 2008 From: jason at bioperl.org (Jason Stajich) Date: Tue, 30 Sep 2008 09:48:17 -0700 Subject: [Bioperl-l] exporting contigs with CDSes, stored via Bio::DB::GFF, into individual GenBank records? In-Reply-To: References: Message-ID: <9880E773-812C-4F16-806F-7DB854346330@bioperl.org> Eric. CC-ing Gbrowse since this is regarding Gbrowse data-store.
I've definitely done exactly this, although I remember I had to tweak the features a bit to make sure I had some of the necessary stuff for Sequin. If you want to get a specific segment you just do what you already have in your code:

my $segment = $db->segment($contig_name);

Or you can iterate through all the features - depends on how you named your segments/contigs/chromosomes, I named mine "contig:scaffold" for type:source

my $iterator = $db->get_seq_stream(-type=>'scaffold');
while (my $s = $iterator->next_seq) {
}

Now you *should* be able to pass this segment object to $seq_out->write_seq($segment); however, Bio::DB::GFF::Feature doesn't implement the whole SeqI API, so you probably have to create your own sequence and move the features over:

use Bio::SeqFeature::Generic;

my $iterator = $db->get_seq_stream(-type=>'scaffold');
while (my $s = $iterator->next_seq) {
    my $seq = Bio::Seq->new();
    $seq->primary_seq($s->seq);
    for my $feature ( $s->features('processed_transcript') ) {
        my $f = Bio::SeqFeature::Generic->new(-location    => $feature->location,
                                              -primary_tag => $feature->primary_tag,
                                              -source_tag  => $feature->source_tag,
                                              -score       => $feature->score,
                                              -seq_id      => $feature->seq_id);
        $f->add_tag_value('locus_tag',$feature->name);
        # might also add all other tag/value pairs from this feature like DBXREF, etc.
        # derive a CDS feature from this feature as well.
        # perhaps derive a gene feature that only has start/end for the feature
        # might add a translation tag/value pair for CDS features
        $seq->add_SeqFeature($f);
    }
    $seq_out->write_seq($seq);
}

I suspect you'll have to edit the feature objects some to:
a) remove the ones you don't want to output
b) add additional info like translation frame -- I think there is a slightly different way that NCBI wants things labeled wrt translation frame
c) add in other annotation information that may or may not be encoded as tag/values

It is also possible I am missing some things in here, but hopefully it gets you started.
I think it would be quite useful for us to try and write a generic script for this but maybe focus on Bio::DB::SeqFeature dbs since it will be easier to assume that all the 3-level gene->mRNA->CDS/exon relationships will be explicitly specified. -jason On Sep 28, 2008, at 10:57 PM, Erich Schwarz wrote: > Hi all, > > I have newly sequenced contigs, with CDS predictions, loaded > into a Bio::DB::GFF-readable format (i.e., loaded into a MySQL > database via Bio::DB::GFF). I'd like to export each contig, with > its annotated CDSes, into a single GenBank-formatted record for each > contig (in order to be able submit this stuff to GenBank, without > having to waste time with Sequin). Is there some straightforward > way of getting Bio::DB::GFF to do that? > > Some time ago, when I last had to decipher BioPerl, I came up > with code that would let me export protein translations of the > contigs' CDSes in GenBank format: > > ------------------------------------------------------------------- > > #!/usr/bin/env perl > > use strict; > use warnings; > use Bio::Seq; > use Bio::SeqIO; > use Bio::DB::GFF; > > my $query_database = $ARGV[0]; > my $dna = q{}; > my $db = Bio::DB::GFF->new( -dsn => $query_database); > > my $gb_file = 'example.gb'; > my $seq_out = Bio::SeqIO->new( -file => ">$gb_file", -format => > 'genbank', ); > > my @contigs = sort > map { $_->display_id } > $db->features( -types => 'contig:assembly' ); > > foreach my $contig (@contigs) { > my $segment1 = $db->segment($contig); > my @p_txs = $segment1->features('processed_transcript'); > foreach my $p_t (sort @p_txs) { > $dna = q{}; > my @CDSes = $p_t->CDS; > my $cds_name = $CDSes[0]->display_id(); > foreach my $cds (@CDSes) { > # $cds->seq == Bio::PrimarySeq, *not* plain nt seq.! > $dna = $dna .
$cds->seq->seq; > } > my $full_cds = Bio::Seq->new( -display_id => $cds_name, > -seq => $dna, ); > my $prot = $full_cds->translate; > $seq_out->write_seq($prot); > } > } > > -------------------------------------------------------------------- > > Returning to this, I tried using > > $db->get_Seq_by_id($contig) > > to give me a Bio::Seq object for each contig (which I could then > output into GenBank form), but that proved futile. > > I'm willing to work this out on my own if I have to, but if > somebody can answer this in 30 seconds, it will save me a lot of > time -- and be very appreciated! > > > --Erich > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From gilbertd at cricket.bio.indiana.edu Tue Sep 30 13:23:56 2008 From: gilbertd at cricket.bio.indiana.edu (Don Gilbert) Date: Tue, 30 Sep 2008 12:23:56 -0500 (EST) Subject: [Bioperl-l] [Gmod-gbrowse] exporting contigs with CDSes, stored via Bio::DB::GFF, into individual GenBank records? Message-ID: <200809301723.m8UHNuc04293@cricket.bio.indiana.edu> Eric, If the Bio::DB::GFF database to Genbank submission route doesn't get you where you want, you can also look at storing your data in a GMOD Chado database, then using Bulkfiles to produce the Genbank Submission file set. 
- Don Gilbert Find a GenBank Submit output from Chado dbs in this tool release http://eugenes.org/gmod/GMODTools/ GMODTools-1.2b.zip 20-Jun-2008 - adding (in progress) Genbank Submission table writer, 'bulkfiles -format=genbanktbl', with output suited to submit to NCBI as per these specifications http://www.ncbi.nlm.nih.gov/Genbank/eukaryotic_genome_submission.html see also http://gmod.org/wiki/GMODTools and this test case with genbank-submit output http://gmod.org/wiki/GMODTools_TestCase From mauricio at open-bio.org Tue Sep 30 17:56:04 2008 From: mauricio at open-bio.org (Mauricio Herrera Cuadra) Date: Tue, 30 Sep 2008 16:56:04 -0500 Subject: [Bioperl-l] Test coverage for BioPerl now available Message-ID: <48E2A074.5060305@open-bio.org> Hi all, Daily-updated test coverage reports are now available for those BioPerl packages which make use of the Build.PL mechanism (except bioperl-db): http://bioperl.org/test-coverage/bioperl-live/ http://bioperl.org/test-coverage/bioperl-network/ http://bioperl.org/test-coverage/bioperl-run/ These reports will help us to know the current 'quality' of the code in SVN for most of the BioPerl modules. This idea was started by Nathan Haigh and Sendu a long time ago, and it was my fault for not implementing in time the script needed to run the process on a daily basis, so apologies for that. There are still a few things to be done in order to have this working as it should:
- Nathan, the current Devel::Cover module from CPAN doesn't include the JS modifications to make table columns sortable. Do you know what happened to the code you contributed to the author for that?
- Reports could be generated for the rest of the BioPerl packages as soon as they're migrated to the Build.PL infrastructure. Anyone up for that?
- bioperl-db tests require BioSQL to be set up on the webserver machine, and the same goes for bioperl-run's tests with ALL of its dependencies.
The bioperl.org site is co-hosted with all of the other OBF projects and that machine also takes care of other things (mailing lists, etc), so I would like your feedback on possible workarounds to not overload the server if we want to setup such test reports. Thanks & regards, Mauricio. From cjfields at illinois.edu Mon Sep 1 13:49:56 2008 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 1 Sep 2008 12:49:56 -0500 Subject: [Bioperl-l] Bio::Tools::dpAlign feature request In-Reply-To: <99A57523-94B3-4A56-A617-22675FA10AB8@bioperl.org> References: <868888.19741.qm@web30406.mail.mud.yahoo.com> <99A57523-94B3-4A56-A617-22675FA10AB8@bioperl.org> Message-ID: <44D25826-BE42-4539-8567-468FE74571B4@illinois.edu> On pSW: I agree, I don't think it is worth maintaining it considering there are actively supported C/C++-based toolkits with similar functionality (SeqAn) and Petr's BioLib initiative will likely be a more maintainable effort. chris On Sep 1, 2008, at 2:42 AM, Jason Stajich wrote: > Safe to ignore the tests.
Those that are failing aren't even test > for Bio::Align::dpAlign - but were written to test a bug that has > not been fixed in the EVD module if I remember correctly that is why > they are marked in a TODO block, but I can't tell if the Test.pm is > actually skipping these tests or not. > > I think we probably need to deprecate some of these modules as there > is no maintainer of Ewan's code in here. > > At a minimum we need to modularlize the tests for these modules into > separate t dir and fix the need for multiple Makefile.PL in here and > probably move to Build.PL > > -jason > On Aug 28, 2008, at 11:00 AM, Yee Man Chan wrote: > >> >> Hi Alexie >> >> My understanding is that you can ignore these failures. >> >> I believe test cases 17-20 were added by Jason Stajich before I >> added the feature you requested. I am not sure what he was doing >> there. >> >> I suppose he can give you the definite answer to whether this is >> something important or not. >> >> By the way, did you try out the new feature? Does it work? >> >> Thanks >> Yee Man >> >> --- On Thu, 8/28/08, Alexie Papanicolaou >> wrote: >> >>> From: Alexie Papanicolaou >>> Subject: Re: Bio::Tools::dpAlign feature request >>> To: "Yee Man Chan" >>> Date: Thursday, August 28, 2008, 6:15 AM >>> hi >>> >>> is the version you emailed me newer or older than the >>> subversion one? >>> >>> i'm testing the subversion version for bioperl-ext and >>> >>> not ok 17 # TODO evalues vary based on platform, needs >>> fixing >>> # Failed (TODO) test at test.pl line 156. >>> # got: '2027805538' >>> # expected: '1764904' >>> not ok 18 # TODO evalues vary based on platform, needs >>> fixing >>> # Failed (TODO) test at test.pl line 157. >>> # got: '-1375488148' >>> # expected: '1764872' >>> not ok 19 # TODO evalues vary based on platform, needs >>> fixing >>> # Failed (TODO) test at test.pl line 158. 
>>> # got: '-808808307' >>> # expected: '1764872' >>> not ok 20 # TODO evalues vary based on platform, needs >>> fixing >>> # Failed (TODO) test at test.pl line 159. >>> # got: '-2118162890' >>> # expected: '1764872' >>> >>> you think these are ok to ignore? >>> >>> >>> On Fri, 2008-08-01 at 21:03 -0700, Yee Man Chan wrote: >>> >>>> Hi Alexie >>>> >>>> Attached are the files that contains the feature >>> you requested. linspc.c is the one that does the work and >>> test.pl has a test case for it. The scoring scheme is as >>> described before. Please let me know if it works. >>>> >>>> Yee Man >>>> >>>> --- On Wed, 7/30/08, Alexie Papanicolaou >>> wrote: >>>> >>>>> From: Alexie Papanicolaou >>> >>>>> Subject: Re: Bio::Tools::dpAlign feature request >>>>> To: ymc at yahoo.com >>>>> Date: Wednesday, July 30, 2008, 2:44 PM >>>>> Oh sorry >>>>> >>>>> Say match=3 and mismatch=-1, gopen= -10, gext=-5 >>>>> for aligning >>>>> seq1: ATG >>>>> seq2: ATT >>>>> match: 3,3,-1 >>>>> >>>>> seq1: AT-G >>>>> seq2: ATTG >>>>> match: 3,3,-1,-10,3 >>>>> >>>>> is that possible? or am I missing something? I >>> was only >>>>> today wondering >>>>> if it is even possible... >>>>> >>>>> a >>>>> >>>>> >>>>> Yee Man Chan wrote: >>>>>> Sorry, I don't quite get it. Can you >>> give me an >>>>> example of the output you want? >>>>>> >>>>>> Yee Man >>>>>> >>>>>> --- On Wed, 7/30/08, Alexie Papanicolaou >>>>> wrote: >>>>>> >>>>>> >>>>>>> From: Alexie Papanicolaou >>>>> >>>>>>> Subject: Re: Bio::Tools::dpAlign feature >>> request >>>>>>> To: ymc at yahoo.com >>>>>>> Date: Wednesday, July 30, 2008, 9:50 AM >>>>>>> Dear Yee Man, >>>>>>> >>>>>>> Do you think it is possible to code a >>> method for >>>>> creating a >>>>>>> delimited >>>>>>> (space or comma) "score-line"? >>>>>>> >>>>>>> I'd like to parse it into an array >>> and have >>>>> the >>>>>>> individual score for >>>>>>> each alignment position. Is it easy to >>> do? 
>>>>>>> >>>>>>> a >>>>>>> >>>>>>> Yee Man Chan wrote: >>>>>>> >>>>>>>> Hi Alexie >>>>>>>> >>>>>>>> How about I implement the simple >>> case? >>>>>>>> >>>>>>>> So for match = +3, mismatch = -1, >>>>>>>> >>>>>>>> A and R = +3 >>>>>>>> A and Y = -1 >>>>>>>> A and B = -1 >>>>>>>> A and D = +3 >>>>>>>> A and N = +3 >>>>>>>> A and X = -1 >>>>>>>> >>>>>>>> What do you think? >>>>>>>> Yee Man >>>>>>>> >>>>>>>> >>>>>>>> --- On Tue, 7/29/08, Alexie >>> Papanicolaou >>>>>>>> >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> From: Alexie Papanicolaou >>>>>>>>> >>>>>>> >>>>>>> >>>>>>>>> Subject: Re: Bio::Tools::dpAlign >>> feature >>>>> request >>>>>>>>> To: ymc at yahoo.com >>>>>>>>> Date: Tuesday, July 29, 2008, >>> 10:58 AM >>>>>>>>> Dear Yee Man, >>>>>>>>> hello, I was wondering how is >>> this >>>>> progressing and >>>>>>>>> >>>>>>> if you >>>>>>> >>>>>>>>> need help? >>>>>>>>> >>>>>>>>> many thanks >>>>>>>>> alexie >>>>>>>>> >>>>>>>>> Yee Man Chan wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>>> Hi Alexie >>>>>>>>>> >>>>>>>>>> There are two ways to >>> compute the >>>>> score for >>>>>>>>>> >>>>>>> each >>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> aligned basepair in dpAlign >>> module. One is >>>>>>>>> >>>>>>> match/mismatch >>>>>>> >>>>>>>>> if you specify your sequence as >>> DNA and >>>>> the other >>>>>>>>> >>>>>>> is a >>>>>>> >>>>>>>>> scoring matrix if you specify >>> your >>>>> sequence as >>>>>>>>> >>>>>>> protein. >>>>>>> >>>>>>>>> Obviously, the latter can >>> completely >>>>> dominate the >>>>>>>>> >>>>>>> former. >>>>>>> >>>>>>>>> If you take the time to type the >>> scoring >>>>> matrix >>>>>>>>> >>>>>>> file, then >>>>>>> >>>>>>>>> you can handle those IUPAC code >>> by >>>>> specifying the >>>>>>>>> >>>>>>> sequence >>>>>>> >>>>>>>>> as proteins. 
>>>>>>>>> >>>>>>>>> >>>>>>>>>> If you think this is too >>>>> troublesome, then >>>>>>>>>> >>>>>>> I might >>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> be able to extend the >>> match/mismatch route >>>>> to >>>>>>>>> >>>>>>> handle IUPAC >>>>>>> >>>>>>>>> codes. But the problem here is, >>> how should >>>>> I score >>>>>>>>> >>>>>>> a match >>>>>>> >>>>>>>>> of A and W when match is +3 and >>> mismatch >>>>> is -1? >>>>>>>>> >>>>>>> Should it >>>>>>> >>>>>>>>> have a score of +3/3 = +1 for >>> match or >>>>> +3/3-1*2/3 >>>>>>>>> >>>>>>> = +1/3? >>>>>>> >>>>>>>>> Do you know what the convention >>> is? If >>>>> not, maybe >>>>>>>>> >>>>>>> you can >>>>>>> >>>>>>>>> tell me what you think the score >>> will be? >>>>>>>>> >>>>>>>>> >>>>>>>>>> Yee Man >>>>>>>>>> >>>>>>>>>> --- On Thu, 6/26/08, Alexie >>>>> Papanicolaou >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> From: Alexie >>> Papanicolaou >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>>> Subject: >>> Bio::Tools::dpAlign >>>>> feature >>>>>>>>>>> >>>>>>> request >>>>>>> >>>>>>>>>>> To: ymc at yahoo.com >>>>>>>>>>> Date: Thursday, June 26, >>> 2008, >>>>> 4:15 AM >>>>>>>>>>> Dear Yee Man Chan, >>>>>>>>>>> >>>>>>>>>>> Many thank you for this >>> module. I >>>>> like it >>>>>>>>>>> >>>>>>> very >>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> much. I was >>>>>>>>> >>>>>>>>> >>>>>>>>>>> wondering if >>>>>>>>>>> it would be possible for >>> you to >>>>> allow for >>>>>>>>>>> >>>>>>> IUPAC >>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> DNA codes. 
>>>>>>>>> >>>>>>>>> >>>>>>>>>>> I see it is in your TODO >>> list and >>>>> I hoped >>>>>>>>>>> >>>>>>> to >>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> inspire you >>>>>>>>> >>>>>>>>> >>>>>>>>>>> :-) >>>>>>>>>>> >>>>>>>>>>> Even a simple measure >>> with the >>>>> degenerate >>>>>>>>>>> >>>>>>> base >>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> containing >>>>>>>>> >>>>>>>>> >>>>>>>>>>> the aligned >>>>>>>>>>> base count as a >>> (perfect) match >>>>> would be >>>>>>>>>>> >>>>>>> very >>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> useful to me >>>>>>>>> >>>>>>>>> >>>>>>>>>>> (i'm sorry, i >>>>>>>>>>> 'm not a good coder >>> to do it >>>>> myself). >>>>>>>>>>> >>>>>>>>>>> many thanks for your >>> work so far. >>>>>>>>>>> alexie >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> -- >>>>>>>>>>> "Eppur si >>> evolve" >>>>> ("And yet >>>>>>>>>>> >>>>>>> it >>>>>>> >>>>>>>>>>> evolves") >>>>>>>>>>> -Galileo Jr (ca 21st >>> century) >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Alexie Papanicolaou >>>>>>>>>>> Entomology >>>>>>>>>>> Max Planck Institute for >>> Chemical >>>>> Ecology >>>>>>>>>>> Hans Knoell Str 8 >>>>>>>>>>> Jena 07745 >>>>>>>>>>> Germany >>>>>>>>>>> Email >>> apapanicolaou at ice.mpg.de >>>>>>>>>>> Tel +493641571561 >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> -- >>>>>>>>> -- >>>>>>>>> "Eppur si evolve" >>> ("And yet >>>>> it >>>>>>>>> evolves") >>>>>>>>> -Galileo Jr (ca 21st century) >>>>>>>>> >>>>>>>>> "One Galileo in two >>> thousand years is >>>>>>>>> >>>>>>> enough." 
>>>>>>> >>>>>>>>> -Pope Pius XII >>>>>>>>> -- >>>>>>>>> Alexie Papanicolaou >>>>>>>>> Entomology >>>>>>>>> Max Planck Institute for >>> Chemical Ecology >>>>>>>>> Hans Knoell Str 8 >>>>>>>>> Jena 07745 >>>>>>>>> Germany >>>>>>>>> Email apapanicolaou at ice.mpg.de >>>>>>>>> Tel +493641571561 >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> -- >>>>>>> -- >>>>>>> "Eppur si evolve" ("And >>> yet it >>>>>>> evolves") >>>>>>> -Galileo Jr (ca 21st century) >>>>>>> >>>>>>> "One Galileo in two thousand years >>> is >>>>> enough." >>>>>>> -Pope Pius XII >>>>>>> -- >>>>>>> Alexie Papanicolaou >>>>>>> Entomology >>>>>>> Max Planck Institute for Chemical >>> Ecology >>>>>>> Hans Knoell Str 8 >>>>>>> Jena 07745 >>>>>>> Germany >>>>>>> Email apapanicolaou at ice.mpg.de >>>>>>> Tel +493641571561 >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> -- >>>>> -- >>>>> "Eppur si evolve" ("And yet it >>>>> evolves") >>>>> -Galileo Jr (ca 21st century) >>>>> >>>>> "One Galileo in two thousand years is >>> enough." >>>>> -Pope Pius XII >>>>> -- >>>>> Alexie Papanicolaou >>>>> Entomology >>>>> Max Planck Institute for Chemical Ecology >>>>> Hans Knoell Str 8 >>>>> Jena 07745 >>>>> Germany >>>>> Email apapanicolaou at ice.mpg.de >>>>> Tel +493641571561 >>>> >>>> >>>> >>> >>> -- >>> -- >>> "Eppur si evolve" ("And yet it >>> evolves") >>> -Galileo Jr (ca 21st century) >>> >>> "One Galileo in two thousand years is enough." >>> -Pope Pius XII >>> -- >>> Alexie Papanicolaou >>> Entomology >>> Max Planck Institute for Chemical Ecology >>> Hans Knoell Str 8 >>> Jena 07745 >>> Germany >>> Email apapanicolaou at ice.mpg.de >>> Tel +493641571561 >> >> >> > > Jason Stajich > jason at bioperl.org > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From lmanchon at univ-montp2.fr Wed Sep 3 09:04:14 2008 From: lmanchon at univ-montp2.fr (Laurent Manchon) Date: Wed, 03 Sep 2008 15:04:14 +0200 Subject: [Bioperl-l] parsing result of CAP3 (ACE file) Message-ID: <5.0.2.1.2.20080903150203.00c0db18@pop.univ-montp2.fr> -- Hi, Is somebody have a piece of code to parse result of CAP3 assembly program which format is ACE ? I need to retrieve the alignment from this file. thank you, Laurent -- +---------------------------------------------+ Laurent Manchon Email: lmanchon at univ-montp2.fr +---------------------------------------------+ From osborne6 at gmail.com Wed Sep 3 10:33:15 2008 From: osborne6 at gmail.com (John Osborne) Date: Wed, 3 Sep 2008 09:33:15 -0500 Subject: [Bioperl-l] interpro parsing enhancement? Message-ID: <324fccf0809030733y5c5e8592t5f0617d0ff4d2203@mail.gmail.com> Hi - I'm wondering if anyone is working on adding functionality to Bio::SeqIO::interpro to grab the Gene Ontology/GO classifications out of the interpro xml output? I've started working on that myself, but wanted to check if anyone else is doing the same. Thanks! -- John Osborne osborne6 at ieee.org/osborne6 at gmail.com/jro at freeshell.org From gundalav at gmail.com Wed Sep 3 10:48:07 2008 From: gundalav at gmail.com (Gundala Viswanath) Date: Wed, 3 Sep 2008 23:48:07 +0900 Subject: [Bioperl-l] Fitch's Parsimony Algorithm with Perl Message-ID: <73f827b50809030748o725d3772m681af9da3c0c26c0@mail.gmail.com> Hi, What's a correct way to implement Fitch's parsimony algorithm? In particular, I want to compute the minimum substitution rate per column in the aligned sequences. Is there a Bioperl module to do it? For example:

CGGCGGAAAACTGTCCTCCGTGC mouse
CGACGGAACATTCTCCTCCGCGC rat
CGACGGAATATTCCCCTCCGTGC human
CGACGGAAGACTCTCCTCCGTGC chimp
00100000302011000000100 -> number of subst per site (max parsimony)

My code below doesn't seem to do the job.
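For comparison with counting distinct bases per column, here is a minimal sketch of Fitch's bottom-up pass in plain Perl (no BioPerl module assumed). It hard-codes the tree as nested array refs instead of parsing the Newick string, and writes the trifurcating tree (mouse,rat,(human,chimp)) as the binary tree ((mouse,rat),(human,chimp)). Both describe the same unrooted topology, and the Fitch count does not depend on where the root is placed. The `fitch` helper is illustrative, not an existing API.

```perl
use strict;
use warnings;

my %seqs = (
    mouse => 'CGGCGGAAAACTGTCCTCCGTGC',
    rat   => 'CGACGGAACATTCTCCTCCGCGC',
    human => 'CGACGGAATATTCCCCTCCGTGC',
    chimp => 'CGACGGAAGACTCTCCTCCGTGC',
);

# Binary tree as nested array refs; leaves are keys of %seqs.
my $tree = [ [ 'mouse', 'rat' ], [ 'human', 'chimp' ] ];

# Fitch bottom-up pass for one alignment column. Returns the candidate
# state set (hash ref) and the number of substitutions in the subtree.
sub fitch {
    my ($node, $pos) = @_;
    return (+{ substr($seqs{$node}, $pos, 1) => 1 }, 0) unless ref $node;

    my ($lset, $lcost) = fitch($node->[0], $pos);
    my ($rset, $rcost) = fitch($node->[1], $pos);

    # Non-empty intersection: keep it, no extra substitution.
    my %inter = map { $_ => 1 } grep { $rset->{$_} } keys %$lset;
    return (\%inter, $lcost + $rcost) if %inter;

    # Empty intersection: take the union and count one substitution.
    my %union = (%$lset, %$rset);
    return (\%union, $lcost + $rcost + 1);
}

my $len    = length $seqs{mouse};
my @counts = map { (fitch($tree, $_))[1] } 0 .. $len - 1;
print join('', @counts), "\n";    # prints 00100000302011000000100
```

The tree is what the distinct-base count misses: in column 11 (C, T, T, C) each sister pair needs one change of its own, so the minimum is 2 rather than 1.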
__BEGIN__
use Data::Dumper;
use List::MoreUtils qw(uniq);

# The related phylogenetic tree in Newick format is:
my $tree = ' (mouse,rat,(human,chimp))';
my $sites = [
    'CGGCGGAAAACTGTCCTCCGTGC',    # mouse
    'CGACGGAACATTCTCCTCCGCGC',    # rat
    'CGACGGAATATTCCCCTCCGTGC',    # human
    'CGACGGAAGACTCTCCTCCGTGC',    # chimp
];
my @val = my_parsimony($sites);
print Dumper \@val;

sub my_parsimony {
    my $tfbs = shift;
    my $mlen = length($tfbs->[0]);
    my $sum_min = 0;
    my @mincol;
    foreach my $pos ( 0 .. $mlen-1 ) {
        my @colbp = ();
        foreach my $site ( @{$tfbs} ) {
            my $bp = substr($site,$pos,1);
            push @colbp, $bp;
        }
        # this heuristic seems to be faulty
        # Column 11 it predicts 1 instead of 2
        # Not sure how I can make use of the tree
        my $min_mm = scalar( uniq(@colbp) ) - 1;
        push @mincol, $min_mm;
    }
    return @mincol;
}
__END__

- Gundala Viswanath Jakarta - Indonesia From raulmendez at cbm.uam.es Wed Sep 3 10:33:46 2008 From: raulmendez at cbm.uam.es (Raul Mendez Giraldez) Date: Wed, 03 Sep 2008 16:33:46 +0200 Subject: [Bioperl-l] SeqHound In-Reply-To: <111DD141-75F8-4437-9EAD-E049BBADB515@uiuc.edu> Message-ID: <1220452426.31595.92.camel@pepa.cbm.uam.es> Hi Chris, I'm trying to set up and run bioperl Seqhound downloaded from: http://bond.unleashedinformatics.com/downloads/api//seqhound-bioperl-4.0.tar.gz and I always get connection error messages. Do you know which version of SeqHound I should use and how I can configure it to make it work? I've tried several possibilities for server1 at .shoundremrc as

[remote]
server1 = bond.unleashedinformatics.com
CGI = /cgi-bin/seqrem
port=8080

Also, I would like to get all the possible protein-protein interactions for a set of protein sequences. Would this be possible using SeqHound? Thanks, Raúl -- Raúl Méndez Giráldez, Ph.D.
Bioinformatics Unit Centre for Molecular Biology "Severo Ochoa" Universidad Autónoma de Madrid C/ Nicolás Cabrera, 1 Cantoblanco 28049, Madrid SPAIN Phone: +34 91 196 4633 From jaudall at gmail.com Wed Sep 3 11:38:08 2008 From: jaudall at gmail.com (Joshua Udall) Date: Wed, 3 Sep 2008 09:38:08 -0600 Subject: [Bioperl-l] parsing result of CAP3 (ACE file) In-Reply-To: <5.0.2.1.2.20080903150203.00c0db18@pop.univ-montp2.fr> References: <5.0.2.1.2.20080903150203.00c0db18@pop.univ-montp2.fr> Message-ID: <52cea20c0809030838k6fb1498btc15b8e76d98f9d70@mail.gmail.com> Laurent - I have modified modules that will do it as I recently ran into problems with the DB_FILE module in Assembly::IO. In addition, the current version of cap3 seems to put a contig length where a pad length is expected (based on the Ace format description). The modules I have will parse the ace file contig-by-contig rather than having the entire assembly slurped into memory (or a tied hash) all at once. You are welcome to them if you are interested and I'd like to get them in Bioperl at some point. Basically, there are three files - a modified Contig.pm, ContigIO.pm, and a modified ace.pm (in a ContigIO directory). Josh On Wed, Sep 3, 2008 at 7:04 AM, Laurent Manchon wrote: > -- Hi, > > Is somebody have a piece of code to parse result of CAP3 assembly program > which > format is ACE ? > I need to retrieve the alignment from this file. > > thank you, > Laurent -- > > > > > +---------------------------------------------+ > Laurent Manchon > Email: lmanchon at univ-montp2.fr > +---------------------------------------------+ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Joshua Udall Assistant Professor 295 WIDB Plant and Wildlife Science Dept.
Brigham Young University Provo, UT 84602 801-422-9307 Fax: 801-422-0008 USA From hartzell at alerce.com Wed Sep 3 19:19:45 2008 From: hartzell at alerce.com (George Hartzell) Date: Wed, 3 Sep 2008 16:19:45 -0700 Subject: [Bioperl-l] What's up with line 248 of Bio::Coordinate::Pair? Message-ID: <18623.7057.95449.99461@almost.alerce.com> Ok, confess. None of you know what's up with line 248 of Bio::Coordinate::Pair, do you? You probably don't even know what's *on* that line. Wonder how many will go look. Now that I either have your attention or have pissed you off (or both...), I think that creating a new Bio::Location::Split object in Bio::Coordinate::Pair::map() is a leftover or something, but I'm not quite sure enough to excise it and commit the change. Anyone up for it? g. From cjfields at illinois.edu Wed Sep 3 21:29:49 2008 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 3 Sep 2008 20:29:49 -0500 Subject: [Bioperl-l] What's up with line 248 of Bio::Coordinate::Pair? In-Reply-To: <18623.7057.95449.99461@almost.alerce.com> References: <18623.7057.95449.99461@almost.alerce.com> Message-ID: Well, it doesn't look like the SplitLocation is even used, so I think it is safe to remove. chris On Sep 3, 2008, at 6:19 PM, George Hartzell wrote: > > Ok, confess. None of you know what's up with line 248 of > Bio::Coordinate::Pair, do you? You probably don't even know what's > *on* that line. Wonder how many will go look. > > Now that I either have your attention or have pissed you off (or > both...), I think that creating a new Bio::Location::Split object in > Bio::Coordinate::Pair::map() is a leftover or something, but I'm not > quite sure enough to excise it and commit the change. > > Anyone up for it? > > g. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From jason at bioperl.org Thu Sep 4 00:40:38 2008 From: jason at bioperl.org (Jason Stajich) Date: Wed, 3 Sep 2008 21:40:38 -0700 Subject: [Bioperl-l] What's up with line 248 of Bio::Coordinate::Pair? In-Reply-To: References: <18623.7057.95449.99461@almost.alerce.com> Message-ID: <222DB5D8-BCCB-448E-BDEE-068A4A432660@bioperl.org> Agreed - I don't know if that was something was changed mid-stream, but removing it should cause no pain... -j On Sep 3, 2008, at 6:29 PM, Chris Fields wrote: > Well, it doesn't look like the SplitLocation is even used, so I > think it is safe to remove. > > chris > > On Sep 3, 2008, at 6:19 PM, George Hartzell wrote: > >> >> Ok, confess. None of you know what's up with line 248 of >> Bio::Coordinate::Pair, do you? You probably don't even know what's >> *on* that line. Wonder how many will go look. >> >> Now that I either have your attention or have pissed you off (or >> both...), I think that creating a new Bio::Location::Split object in >> Bio::Coordinate::Pair::map() is a leftover or something, but I'm not >> quite sure enough to excise it and commit the change. >> >> Anyone up for it? >> >> g. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Marie-Claude Hofmann > College of Veterinary Medicine > University of Illinois Urbana-Champaign > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From heikki at sanbi.ac.za Thu Sep 4 02:17:31 2008 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 4 Sep 2008 08:17:31 +0200 Subject: [Bioperl-l] What's up with line 248 of Bio::Coordinate::Pair? 
In-Reply-To: <222DB5D8-BCCB-448E-BDEE-068A4A432660@bioperl.org> References: <18623.7057.95449.99461@almost.alerce.com> <222DB5D8-BCCB-448E-BDEE-068A4A432660@bioperl.org> Message-ID: <200809040817.31916.heikki@sanbi.ac.za> Quilty. So I removed the line. George, please do not try to piss us off. You can get all the attention you want from us. :) What are you planning to do to Bio::Coordinate classes? -Heikki On Thursday 04 September 2008 06:40:38 Jason Stajich wrote: > Agreed - I don't know if that was something was changed mid-stream, > but removing it should cause no pain... > -j > > On Sep 3, 2008, at 6:29 PM, Chris Fields wrote: > > Well, it doesn't look like the SplitLocation is even used, so I > > think it is safe to remove. > > > > chris > > > > On Sep 3, 2008, at 6:19 PM, George Hartzell wrote: > >> Ok, confess. None of you know what's up with line 248 of > >> Bio::Coordinate::Pair, do you? You probably don't even know what's > >> *on* that line. Wonder how many will go look. > >> > >> Now that I either have your attention or have pissed you off (or > >> both...), I think that creating a new Bio::Location::Split object in > >> Bio::Coordinate::Pair::map() is a leftover or something, but I'm not > >> quite sure enough to excise it and commit the change. > >> > >> Anyone up for it? > >> > >> g. > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. 
Marie-Claude Hofmann > > College of Veterinary Medicine > > University of Illinois Urbana-Champaign > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason at bioperl.org > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From hartzell at alerce.com Thu Sep 4 12:00:21 2008 From: hartzell at alerce.com (George Hartzell) Date: Thu, 4 Sep 2008 09:00:21 -0700 Subject: [Bioperl-l] What's up with line 248 of Bio::Coordinate::Pair? In-Reply-To: <200809040817.31916.heikki@sanbi.ac.za> References: <18623.7057.95449.99461@almost.alerce.com> <222DB5D8-BCCB-448E-BDEE-068A4A432660@bioperl.org> <200809040817.31916.heikki@sanbi.ac.za> Message-ID: <18624.1557.758033.258065@almost.alerce.com> Heikki Lehvaslaiho writes: > Quilty. So I removed the line. > > George, please do not try to piss us off. You can get all the attention you > want from us. :) Try not. Do... or do not. There is no try. > What are you planning to do to Bio::Coordinate classes? It's a project for a paying customer (yikes...). Pretty much exactly what GeneMapper does, though I'll probably end up dangling a couple more named coordinate spaces off of it. Nothing earth shattering. The classes look *great*. g. 
From hartzell at alerce.com Fri Sep 5 15:01:35 2008 From: hartzell at alerce.com (George Hartzell) Date: Fri, 05 Sep 2008 12:01:35 -0700 Subject: [Bioperl-l] Bio::Coordinate::Collection could DoWhatIMean better (w/ patch) Message-ID: Hi all, Bio::Coordinate::Collection surprised me a bit. At first I thought there was a bug, but it's clearly doing what it's supposed to. Now I'm wondering if what it's supposed to be doing makes sense in some context, or if what I expected would be better functionality. t/CoordinateMapper.t sets up the following scenario:

    #
    # Collection
    #
    #         1   5     6   10
    #         |---|     |---|
    #-----|-----------------------
    #     1   5   9    15   19
    #         pair1     pair2

Then goes on to do the following query:

    # match more than two
    $pos = Bio::Location::Simple->new (-start => 5, -end => 19 );
    ok $res = $transcribe->map($pos);
    is $res->each_gap, 2;
    is $res->each_match, 2;

I was surprised to see that there were two gaps, one from gene:10-19 and one from gene:5-14. Looking at the code, what's really happening is that, for the exon1 mapper there's a match with gene:5-9 and a gap with gene:10-19 and for the exon2 mapper there's a gap with gene:5-14 and a match with gene:15-19. All four Results just get tossed into the return value. The result my intuition wants is that there are two matches (gene:5-9 with exon1 and gene:15-19 with exon2) and a gap (gene:10-14). Yes, I guess that I could just synthesize these myself from the result in my app. It still seems that the current result is a bug though, since there's no way of knowing when you're walking through $res->each_Location that the first "gap" is with respect to the exon1 mapper and that the second "gap" is with respect to the exon2 mapper. The gaps are meaningless. I "fixed" it to work the way I think it should (two matches, one gap). I actually extended the test case a bit so that there's a multi-base gap, a match, another multi-base gap, another match, then a single-base gap (just to make sure I got that right...).
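The single gap George expects can be derived by subtracting the matched input ranges from the query range. A minimal pure-Perl sketch of that interval arithmetic, assuming 1-based inclusive coordinates and that every match lies on the same "in" sequence; `uncovered_gaps` is a hypothetical helper, not the code in the patch (which uses Set::IntSpan).

```perl
use strict;
use warnings;

# Hypothetical helper: return the parts of the query range [$qstart, $qend]
# that none of the matched ranges cover. Ranges are [start, end] array refs
# in 1-based, inclusive coordinates.
sub uncovered_gaps {
    my ($qstart, $qend, @matches) = @_;
    my @gaps;
    my $next = $qstart;    # first query position not yet accounted for
    for my $m (sort { $a->[0] <=> $b->[0] } @matches) {
        my ($mstart, $mend) = @$m;
        last if $mstart > $qend;    # this and later matches lie past the query
        push @gaps, [ $next, $mstart - 1 ] if $mstart > $next;
        $next = $mend + 1 if $mend + 1 > $next;
    }
    push @gaps, [ $next, $qend ] if $next <= $qend;
    return @gaps;
}

# Scenario from t/CoordinateMapper.t: query gene:5-19, pair1 matches
# gene:5-9 and pair2 matches gene:15-19.
my @gaps = uncovered_gaps(5, 19, [ 5, 9 ], [ 15, 19 ]);
print join(', ', map { "$_->[0]-$_->[1]" } @gaps), "\n";    # prints 10-14
```

For a query of gene:1-20 with the same two matches it returns 1-4, 10-14 and 20-20: a multi-base gap, the inner gap and a single-base gap.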
I had to touch up the test file a bit to account for my new test. The gaps that I return have a strand of 'undef', which seems to be The Right Thing. There's also a bit of funny business where I hang onto the seq_id of the gapped sequence. It assumes that the "in" sequence is the same for all of the mappers. This seems safe since otherwise the entire query is kind of weird.... There's a patch to today's svn head at: http://shrimp.alerce.com/bioperl/collection-diffs.txt The patch changes Build.PL to include a dependency on Set::IntSpan, CoordinateMapper.t to update the tests, and Bio/CoordinateMapper/Collection.pm for the new code. Whose code would this break? If anyone's relying on the current behaviour re: gaps, what's the situation in which you find it useful? Thanks! g. From ajmackey at gmail.com Fri Sep 5 16:54:56 2008 From: ajmackey at gmail.com (Aaron Mackey) Date: Fri, 5 Sep 2008 16:54:56 -0400 Subject: [Bioperl-l] Bio::Coordinate::Collection could DoWhatIMean better (w/ patch) In-Reply-To: References: Message-ID: <24c96eca0809051354h5b7218edtaa720140901d023f@mail.gmail.com> There are two uses for Collection: 1) all the "in" seq_id's are the same, and George's patch makes sense to me (i.e. agrees with my intuition) 2) all the "in" seq_id's are *not* the same (i.e. the collection is just a hash of individual pairs), in which case my query would only match the subset of pairs having identical seq_id's to that specified by the query ... and then you're back to case #1 So overall, it looks like this was a bug, but I'd of course want to hear Heikki's opinion. Thanks for raising this, -Aaron On Fri, Sep 5, 2008 at 3:01 PM, George Hartzell wrote: > > Hi all, > > Bio::Coordinate::Collection surprised me a bit. At first I thought > there was a bug, but it's clearly doing what it's supposed to. Now > I'm wondering if what it's supposed to be doing makes sense in some > context, or if what I expected would be better functionality.
> > t/CoordinateMapper.t sets up the following scenario: > > # > # Collection > # > # 1 5 6 10 > # |---| |---| > #-----|----------------------- > # 1 5 9 15 19 > # pair1 pair2 > > Then goes on to do the following query: > > # match more than two > $pos = Bio::Location::Simple->new (-start => 5, -end => 19 ); > ok $res = $transcribe->map($pos); > is $res->each_gap, 2; > is $res->each_match, 2; > > I was surprised to see that there were two gaps, one gene:10-19 and > one from gene:5-14. Looking at the code, what's really happening is > that, for the exon1 mapper there's match with gene:5-9 and a gap with > gene:10-19 and for the exon2 mapper there's a gap with gene:5-14 and a > match with gene:15-19. All four Result's just get tossed into the > return value. > > The result my intuition wants is that there are two matches (gene:5-9 > with exon1 and gene:15-19 with exon2) and a gap (gene:10-14). > > Yes, I guess that I could just synthesize these myself from the result > in my app. > > It still seems that the current result is a bug though, since there's > no way of knowing when you're walking through $res->each_Location that > the first "gap" is with respect to the exon1 mapper and that the > second "gap" is with respect to the exon2 mapper. The gaps are > meaningless. > > I "fixed" it to work the way I think it should (two matches, one > gap). I actually extended the test case a bit so that there's a > multi-base gap, a match, another multibase-gap, another match, then a > single base gap (just to make sure I got that right...). I had to > touch up the test file a bit to account for my new test. > > The gaps that I return have a strand of 'undef', which seems to be The > Right Thing. There's also a bit of funny business where I hang onto > the seq_id of the gapped sequence. It assumes that the "in" sequence > is the same for all of the mappers. This seems safe since otherwise > the entire query is kind of weird.... 
> > There's a patch to todays svn head at: > > http://shrimp.alerce.com/bioperl/collection-diffs.txt > > The patch changes Build.PL to include a dependency on Set::IntSpan, > CoordinateMapper.t to update the tests, and > Bio/CoordinateMapper/Collection.pm for the new code. > > Who's code would this break. > > If anyone's relying on the current behaviour re: gaps, what's the > situation in which you find it useful? > > Thanks! > > g. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jimhu at tamu.edu Mon Sep 8 14:44:22 2008 From: jimhu at tamu.edu (Jim Hu) Date: Mon, 8 Sep 2008 13:44:22 -0500 Subject: [Bioperl-l] Circular genomes in Chado/BioPerl Message-ID: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> In discussions with GMOD about Gbrowse, we've come up with a proposal for handling circular genomes and features that cross the origin in such genomes. This applies to lots of prokaryotic and viral genomes, and might be valuable for some ways of representing terminally redundant linear genomes. 1) Keep the requirement that start < end 2) allow end > parent feature length 3) parent feature gets an is_circular boolean 4) use modular arithmetic to calculate the real position of end on the parent feature. We'd like to do this in a way that will be consistent with Chado and BioPerl representation of features as much as possible (realizing that there is the usual interbase or not coordinate issue). What do people think? Lincoln is on board for modifying the GFF3 spec. Thanks! Jim Hu ===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. 
College Station, TX 77843-2128 979-862-4054 From ajmackey at gmail.com Mon Sep 8 15:57:50 2008 From: ajmackey at gmail.com (Aaron Mackey) Date: Mon, 8 Sep 2008 15:57:50 -0400 Subject: [Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl In-Reply-To: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> References: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> Message-ID: <24c96eca0809081257l16461b23uaefe8154ed038bea@mail.gmail.com> How can you handle features that may cross the origin more than once? The modulus, though simple, seems to be only half the solution. It also makes it difficult to place features in the genome "by eye" (having to do the modulus subtraction in my head), or in sorting/filtering operations. I have an alternative that I wondered if you considered: allow the start/end to have an additional "circular revolution" prefix: a typical range tuple like: 100 200 - is thus shorthand for: 0:100 0:200 - (i.e. both the 100 and 200 are in the same "revolution" around the genome) and is then distinguishable from an "around the genome + 100" feature of: 1:100 0:200 - Just an alternative to consider (if you haven't already). I'm not wedded to the syntax, but I wouldn't want to see new columns in GFF just for this. Essentially, what you want is some form of compound polar coordinates, it seems. -Aaron On Mon, Sep 8, 2008 at 2:44 PM, Jim Hu wrote: > In discussions with GMOD about Gbrowse, we've come up with a proposal for > handling circular genomes and features that cross the origin in such > genomes. This applies to lots of prokaryotic and viral genomes, and might > be valuable for some ways of representing terminally redundant linear > genomes. > 1) Keep the requirement that start < end > 2) allow end > parent feature length > 3) parent feature gets an is_circular boolean > 4) use modular arithmetic to calculate the real position of end on the > parent feature. 
> We'd like to do this in a way that will be consistent with Chado and BioPerl > representation of features as much as possible (realizing that there is the > usual interbase or not coordinate issue). What do people think? Lincoln is > on board for modifying the GFF3 spec. > Thanks! > Jim Hu > > ===================================== > > Jim Hu > > Associate Professor > > Dept. of Biochemistry and Biophysics > > 2128 TAMU > > Texas A&M Univ. > > College Station, TX 77843-2128 > > 979-862-4054 > > > ------------------------------------------------------------------------- > This SF.Net email is sponsored by the Moblin Your Move Developer's challenge > Build the coolest Linux based applications with Moblin SDK & win great > prizes > Grand prize is a trip for two to an Open Source event anywhere in the world > http://moblin-contest.org/redirect.php?banner_id=100&url=/ > _______________________________________________ > Gmod-schema mailing list > Gmod-schema at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-schema > > From js5 at sanger.ac.uk Mon Sep 8 16:13:12 2008 From: js5 at sanger.ac.uk (James Smith) Date: Mon, 8 Sep 2008 21:13:12 +0100 (BST) Subject: [Bioperl-l] Circular genomes in Chado/BioPerl In-Reply-To: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> References: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> Message-ID: On Mon, 8 Sep 2008, Jim Hu wrote: > In discussions with GMOD about Gbrowse, we've come up with a proposal for > handling circular genomes and features that cross the origin in such genomes. > This applies to lots of prokaryotic and viral genomes, and might be valuable > for some ways of representing terminally redundant linear genomes. > > 1) Keep the requirement that start < end > 2) allow end > parent feature length > 3) parent feature gets an is_circular boolean > 4) use modular arithmetic to calculate the real position of end on the parent > feature. 
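The modular-arithmetic proposal (start < end, end allowed past the parent length, real position computed modulo the length) can be illustrated with a minimal sketch. It assumes 1-based inclusive coordinates (the interbase question is left open), a start that already lies within 1..length, and that it is only applied when the parent's is_circular flag is set. `circular_segments` is a hypothetical helper, not part of BioPerl, Chado or GBrowse. Because it keeps consuming sequence until it reaches the end coordinate, it also handles a feature that wraps the origin more than once, as long as start < end holds.

```perl
use strict;
use warnings;

# Hypothetical helper: split a feature on a circular parent of length $len
# into segments that each lie within 1..$len. Coordinates are 1-based and
# inclusive; $start <= $end is required, but $end may exceed $len.
sub circular_segments {
    my ($start, $end, $len) = @_;
    die "start must not exceed end\n"     if $start > $end;
    die "start must lie within 1..$len\n" if $start < 1 || $start > $len;
    my @segments;
    my $pos = $start;
    while ($pos <= $end) {
        # Real position on the parent via modular arithmetic.
        my $real = (($pos - 1) % $len) + 1;
        # Take bases up to the origin or up to the feature end,
        # whichever comes first.
        my $chunk = $len - $real + 1;
        my $left  = $end - $pos + 1;
        $chunk = $left if $left < $chunk;
        push @segments, [ $real, $real + $chunk - 1 ];
        $pos += $chunk;
    }
    return @segments;
}

# A 21 bp feature crossing the origin of a 5,000 bp circular sequence.
for my $seg (circular_segments(4990, 5010, 5000)) {
    print "$seg->[0]..$seg->[1]\n";
}
```

For that feature it yields 4990..5000 and 1..10; a feature stored as 4000..11000 on the same sequence yields 4000..5000, 1..5000 and 1..1000.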
This is how we are considering handling features in Ensembl as well (the Ensembl genomes project will be setting up websites for bacterial and viral genomes) > > We'd like to do this in a way that will be consistent with Chado and BioPerl > representation of features as much as possible (realizing that there is the > usual interbase or not coordinate issue). What do people think? Lincoln is > on board for modifying the GFF3 spec. > > Thanks! > > Jim Hu > > ===================================== > Jim Hu > Associate Professor > Dept. of Biochemistry and Biophysics > 2128 TAMU > Texas A&M Univ. > College Station, TX 77843-2128 > 979-862-4054 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From heikki at sanbi.ac.za Tue Sep 9 03:50:11 2008 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Tue, 9 Sep 2008 09:50:11 +0200 Subject: [Bioperl-l] Bio::Coordinate::Collection could DoWhatIMean better (w/ patch) In-Reply-To: References: Message-ID: <200809090950.11821.heikki@sanbi.ac.za> George, This is an error from my side. Great that you have a fix already. My only worry is the number of external dependencies in BioPerl. To limit these we have recoded number of functionalities into BioPerl-specific modules. Before you commit the fix, could you see if Bio::RangeI could be used or easily extended to be used instead of Set::IntSpan? Thanks, -Heikki On Friday 05 September 2008 21:01:35 George Hartzell wrote: > Hi all, > > Bio::Coordinate::Collection surprised me a bit. At first I thought > there was a bug, but it's clearly doing what it's supposed to. 
Now > I'm wondering if what it's supposed to be doing makes sense in some > context, or if what I expected would be better functionality. > > t/CoordinateMapper.t sets up the following scenario: > > # > # Collection > # > # 1 5 6 10 > # |---| |---| > #-----|----------------------- > # 1 5 9 15 19 > # pair1 pair2 > > Then goes on to do the following query: > > # match more than two > $pos = Bio::Location::Simple->new (-start => 5, -end => 19 ); > ok $res = $transcribe->map($pos); > is $res->each_gap, 2; > is $res->each_match, 2; > > I was surprised to see that there were two gaps, one gene:10-19 and > one from gene:5-14. Looking at the code, what's really happening is > that, for the exon1 mapper there's match with gene:5-9 and a gap with > gene:10-19 and for the exon2 mapper there's a gap with gene:5-14 and a > match with gene:15-19. All four Result's just get tossed into the > return value. > > The result my intuition wants is that there are two matches (gene:5-9 > with exon1 and gene:15-19 with exon2) and a gap (gene:10-14). > > Yes, I guess that I could just synthesize these myself from the result > in my app. > > It still seems that the current result is a bug though, since there's > no way of knowing when you're walking through $res->each_Location that > the first "gap" is with respect to the exon1 mapper and that the > second "gap" is with respect to the exon2 mapper. The gaps are > meaningless. > > I "fixed" it to work the way I think it should (two matches, one > gap). I actually extended the test case a bit so that there's a > multi-base gap, a match, another multibase-gap, another match, then a > single base gap (just to make sure I got that right...). I had to > touch up the test file a bit to account for my new test. > > The gaps that I return have a strand of 'undef', which seems to be The > Right Thing. There's also a bit of funny business where I hang onto > the seq_id of the gapped sequence. 
It assumes that the "in" sequence > is the same for all of the mappers. This seems safe since otherwise > the entire query is kind of weird.... > > There's a patch to todays svn head at: > > http://shrimp.alerce.com/bioperl/collection-diffs.txt > > The patch changes Build.PL to include a dependency on Set::IntSpan, > CoordinateMapper.t to update the tests, and > Bio/CoordinateMapper/Collection.pm for the new code. > > Who's code would this break. > > If anyone's relying on the current behaviour re: gaps, what's the > situation in which you find it useful? > > Thanks! > > g. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From Frigerio at pierroton.inra.fr Tue Sep 9 04:45:19 2008 From: Frigerio at pierroton.inra.fr (Jean-Marc FRIGERIO) Date: Tue, 9 Sep 2008 10:45:19 +0200 Subject: [Bioperl-l] parsing result of CAP3 (ACE file) Message-ID: <200809091045.19249.Frigerio@pierroton.inra.fr> > > -- Hi, > > > > Is somebody have a piece of code to parse result of CAP3 assembly program > > which > > format is ACE ? > > I need to retrieve the alignment from this file.
> >
> > thank you,
> > Laurent
> > --
> > +---------------------------------------------+
> > Laurent Manchon
> > Email: lmanchon at univ-montp2.fr
> > +---------------------------------------------+
>
> Laurent -
>
> I have modified modules that will do it as I recently ran into problems
> with the DB_File module in Assembly::IO. In addition, the current version
> of cap3 seems to put a contig length where a pad length is expected (based
> on the Ace format description). The modules I have will parse the ace file
> contig-by-contig rather than having the entire assembly slurped into memory
> (or a tied hash) all at once. You are welcome to them if you are
> interested and I'd like to get them in Bioperl at some point. Basically,
> there are three files - a modified Contig.pm, ContigIO.pm, and a modified
> ace.pm (in a ContigIO directory).
>
> Josh

Hi,

Here are two pieces of code that work on an ACE file (the output of phrap;
is that the same as CAP3's?).

----------------------------- 1 -----------------------------------------
use strict;
use warnings;
use Bio::Assembly::IO;

# Inputs assumed here: the ACE file and a position on the ungapped consensus
my ($file, $pos) = @ARGV;

my $assembly = Bio::Assembly::IO->new('-file'   => $file,
                                      '-format' => 'ace')->next_assembly;
for my $contig ($assembly->all_contigs) {
    my $ct_seq = $contig->get_consensus_sequence;
    (my $ref_seq = uc $ct_seq->seq) =~ s/-//g;
    # window of up to 100 bases on either side of $pos
    my $debut = $pos - 100 > 0 ? $pos - 100 : 1;
    my $fin   = $pos + 100 <= length $ref_seq ? $pos + 100 : length $ref_seq;
    my $coll  = $contig->get_features_collection;
    my @coll  = $coll->features_in_range('-start' => $debut, '-end' => $fin);
    for my $tag (@coll) {
        next unless $tag->primary_tag eq 'comment';
        #print "TAG: ",$tag->start,"\n";
        my $tag_pos = $contig->change_coord('gapped consensus','ungapped consensus',$tag->start);
        #print "TAG POS: $tag_pos\n";
        next if $pos == $tag_pos;
        substr($ref_seq,$tag_pos-1,1,'N');
    }
}

------------------------------------ 2 -------------------
# same setup as in snippet 1 (use Bio::Assembly::IO; $file)
my $assembly = Bio::Assembly::IO->new( '-file' => $file, '-format' => 'ace')->next_assembly;
for my $contig ($assembly->all_contigs) {
    for my $seq ($contig->each_seq) {
        my $id = $seq->id;
        my $s  = $seq->seq;
        # read start/end converted to ungapped consensus coordinates
        my ($start,$end) = ($contig->change_coord("aligned $id","ungapped consensus",$seq->start),
                            $contig->change_coord("aligned $id","ungapped consensus",$seq->end));
        my $dir = $seq->strand < 0 ? 'R' : 'F';
        # ...... (further processing omitted)
    }
}

--
Jean-Marc

From zheboyang at gmail.com Tue Sep 9 07:05:15 2008
From: zheboyang at gmail.com (boyang zhe)
Date: Tue, 9 Sep 2008 19:05:15 +0800
Subject: [Bioperl-l] help:HMM parsing error
Message-ID: <127e75f60809090405h644e51eftcab073e8bf179720@mail.gmail.com>

I wrote a script to parse the HMMER report; it is as follows:

#!/usr/bin/perl -w
#TODO: Parse the HMMER report
use strict;
use Bio::SearchIO;

my $directory="./HMM/";
opendir(HMMDIR, $directory), or die "Can't open the directory!";
my @filelist=readdir(HMMDIR);
foreach my $filename(@filelist) {
    if ($filename !~/^\./) {
        my $infile="$directory"."$filename";
        my $outfile="$infile"."HMMParse";
        my $in = new Bio::SearchIO(-format => 'hmmer',-file =>"$infile");
        while (my $result= $in->next_result ) {
            # get a Bio::Search::Result::HMMERResult object
            # get hits numbers
            my $hitnumber=$result->num_hits;
            if ($hitnumber != 0) {
                open(OUT, ">$outfile"), or die "can't open the output file!!!!";
                while (my $hits= $result->next_hit ) {
                    my $value=$hits->significance;
                    if ($value <=0.01) {
                        print OUT $hits->name,"\t",$hits->description,"\t",$hits->significance,"\n";
                    }
                }
                close OUT;
            }
        }
    }
}
closedir(HMMDIR);
##############################################################

When it runs, you will see this:

-------------------- WARNING ---------------------
MSG: unrecognized line: +E +L i T eek+ e+ ++ +l++H Y+ I+ +
---------------------------------------------------

Why? I hope to get your help, thanks very much!

From bix at sendu.me.uk Tue Sep 9 07:46:39 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 09 Sep 2008 12:46:39 +0100
Subject: [Bioperl-l] help:HMM parsing error
In-Reply-To: <127e75f60809090405h644e51eftcab073e8bf179720@mail.gmail.com>
References: <127e75f60809090405h644e51eftcab073e8bf179720@mail.gmail.com>
Message-ID: <48C6621F.8040304@sendu.me.uk>

boyang zhe wrote:
> I write a script to parse the HMMER report ,it is as follows:
[...]
> my $in = new Bio::SearchIO(-format => 'hmmer',-file =>"$infile");
[...]
> -------------------- WARNING ---------------------
> MSG: unrecognized line: +E +L i T eek+ e+ ++ +l++H
> Y+ I+ +
>
> ---------------------------------------------------
>
> why? I hope to get your help, hanks very much!

I didn't check your code, but the easiest thing to try would be to use
-format => 'hmmer_pull' to use an alternate parser that may be able to
recognise that line. You might need to install the latest Bioperl from
SVN (or at least 1.5.2) to get access to the hmmer_pull parser.

From bosborne11 at verizon.net Tue Sep 9 10:50:38 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Tue, 9 Sep 2008 10:50:38 -0400
Subject: [Bioperl-l] SeqHound
In-Reply-To: <1220452426.31595.92.camel@pepa.cbm.uam.es>
References: <1220452426.31595.92.camel@pepa.cbm.uam.es>
Message-ID: <55655D67-352A-4E0D-B402-7FC30628C1B1@verizon.net>

Raul,

After spending a few minutes at bond.unleashedinformatics.com I have to
admit that it's not clear how one accesses their free version of BOND.
There are no examples that I can see in their packages.
If you are interested in looking at protein-protein networks in the
Bioperl context you can also check out the bioperl-network package:

http://www.bioperl.org/wiki/Network_package

If you don't care what language you're using then you should consider
Cytoscape; it's probably the package with the most capability.

Brian O.

On Sep 3, 2008, at 10:33 AM, Raul Mendez Giraldez wrote:

> Hi Chris,
>
> I'm trying to set up and run bioperl Seqhound downloaded from:
>
> http://bond.unleashedinformatics.com/downloads/api//seqhound-bioperl-4.0.tar.gz
>
> and I always get connection error messages. Do you know which version of
> SeqHound should I use and how can I configure to make it work? I've
> tried several possibilities for server1 at .shoundremrc as
>
> [remote]
> server1 = bond.unleashedinformatics.com
> CGI = /cgi-bin/seqrem
> port=8080
>
> Also, I would like to get all the possible protein-protein interaction
> for a set of protein sequences. Would this be possible using SeqHound?
>
> Thanks,
> Raúl
>
> --
> Raúl Méndez Giráldez, Ph.D.
> Bioinformatics Unit
> Centre for Molecular Biology "Severo Ochoa"
> Universidad Autónoma de Madrid
> C/ Nicolás Cabrera, 1
> Cantoblanco 28049, Madrid
> SPAIN
>
> Phone: +34 91 196 4633

From jimhu at tamu.edu Tue Sep 9 12:05:59 2008
From: jimhu at tamu.edu (Jim Hu)
Date: Tue, 9 Sep 2008 11:05:59 -0500
Subject: [Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl
In-Reply-To: <24c96eca0809081257l16461b23uaefe8154ed038bea@mail.gmail.com>
References: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> <24c96eca0809081257l16461b23uaefe8154ed038bea@mail.gmail.com>
Message-ID:

Hi Aaron,

I was thinking this would be handled by making the end = parent feature
length x 2 + end coord. end/parent length = number of times it crosses
the origin.
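Jim's encoding stores the end as (number of origin crossings) x (parent length) + the real end coordinate. A minimal sketch of packing and unpacking that value, assuming 1-based inclusive coordinates; `encode_end` and `decode_end` are hypothetical names, not BioPerl or Chado functions. Dividing (end - 1) rather than end avoids counting an extra crossing for a feature whose real end is the last base of the parent.

```perl
use strict;
use warnings;

# Hypothetical helpers for the proposed encoding on a circular parent of
# length $len (1-based, inclusive coordinates):
#   encoded end = (number of origin crossings) * $len + real end coordinate
sub encode_end {
    my ($real_end, $crossings, $len) = @_;
    return $crossings * $len + $real_end;
}

# Recover (crossings, real end). Working from ($end - 1) keeps a feature
# that ends exactly on the last base ($real_end == $len) from being
# counted as one extra crossing.
sub decode_end {
    my ($end, $len) = @_;
    my $crossings = int(($end - 1) / $len);
    my $real_end  = ($end - 1) % $len + 1;
    return ($crossings, $real_end);
}

# 100 bp circle; the feature crosses the origin twice and ends at base 60.
my $end = encode_end(60, 2, 100);
my ($n, $pos) = decode_end($end, 100);
print "end=$end crossings=$n real_end=$pos\n";   # end=260 crossings=2 real_end=60
```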
Jim On Sep 8, 2008, at 2:57 PM, Aaron Mackey wrote: > How can you handle features that may cross the origin more than once? > The modulus, though simple, seems to be only half the solution. It > also makes it difficult to place features in the genome "by eye" > (having to do the modulus subtraction in my head), or in > sorting/filtering operations. > > I have an alternative that I wondered if you considered: allow the > start/end to have an additional "circular revolution" prefix: > > a typical range tuple like: 100 200 - > is thus shorthand for: 0:100 0:200 - > (i.e. both the 100 and 200 are in the same "revolution" around the > genome) > > and is then distinguishable from an "around the genome + 100" > feature of: > 1:100 0:200 - > > Just an alternative to consider (if you haven't already). I'm not > wedded to the syntax, but I wouldn't want to see new columns in GFF > just for this. Essentially, what you want is some form of compound > polar coordinates, it seems. > > -Aaron > > On Mon, Sep 8, 2008 at 2:44 PM, Jim Hu wrote: >> In discussions with GMOD about Gbrowse, we've come up with a >> proposal for >> handling circular genomes and features that cross the origin in such >> genomes. This applies to lots of prokaryotic and viral genomes, >> and might >> be valuable for some ways of representing terminally redundant linear >> genomes. >> 1) Keep the requirement that start < end >> 2) allow end > parent feature length >> 3) parent feature gets an is_circular boolean >> 4) use modular arithmetic to calculate the real position of end on >> the >> parent feature. >> We'd like to do this in a way that will be consistent with Chado >> and BioPerl >> representation of features as much as possible (realizing that >> there is the >> usual interbase or not coordinate issue). What do people think? >> Lincoln is >> on board for modifying the GFF3 spec. >> Thanks! >> Jim Hu >> >> ===================================== >> >> Jim Hu >> >> Associate Professor >> >> Dept. 
of Biochemistry and Biophysics >> >> 2128 TAMU >> >> Texas A&M Univ. >> >> College Station, TX 77843-2128 >> >> 979-862-4054 >> >> >> ------------------------------------------------------------------------- >> This SF.Net email is sponsored by the Moblin Your Move Developer's >> challenge >> Build the coolest Linux based applications with Moblin SDK & win >> great >> prizes >> Grand prize is a trip for two to an Open Source event anywhere in >> the world >> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >> _______________________________________________ >> Gmod-schema mailing list >> Gmod-schema at lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/gmod-schema >> >> ===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From jason at bioperl.org Tue Sep 9 12:07:45 2008 From: jason at bioperl.org (Jason Stajich) Date: Tue, 9 Sep 2008 09:07:45 -0700 Subject: [Bioperl-l] help:HMM parsing error In-Reply-To: <48C6621F.8040304@sendu.me.uk> References: <127e75f60809090405h644e51eftcab073e8bf179720@mail.gmail.com> <48C6621F.8040304@sendu.me.uk> Message-ID: Although it would be good to fix the parser as well -- best solution is to submit that report as a bug to bugzilla http://bugzilla.open-bio.org/ -jason On Sep 9, 2008, at 4:46 AM, Sendu Bala wrote: > boyang zhe wrote: >> I write a script to parse the HMMER report ,it is as follows: > [...] >> my $in = new Bio::SearchIO(-format => 'hmmer',-file =>"$infile"); > [...] >> -------------------- WARNING --------------------- >> MSG: unrecognized line: +E +L i T eek+ e+ + >> + +l++H >> Y+ I+ + >> --------------------------------------------------- >> why? I hope to get your help, hanks very much! > > I didn't check your code, but the easiest thing to try would be to > use -format => 'hmmer_pull' to use an alternate parser that may be > able to recognise that line. 
You might need to install the latest > Bioperl from SVN (or at least 1.5.2) to get access to the hmmer_pull > parser. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From cain.cshl at gmail.com Tue Sep 9 13:33:12 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Tue, 9 Sep 2008 13:33:12 -0400 Subject: [Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl In-Reply-To: References: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> <24c96eca0809081257l16461b23uaefe8154ed038bea@mail.gmail.com> Message-ID: <536f21b00809091033v1e412f4ft54d8e139c347a20a@mail.gmail.com> Hi Jim and All, While I agree with Aaron's point that it is not easy to place features by visual inspection, this seems like a fairly minor point. The vast majority of GFF3 manipulation will be done in software, so as long as the API handles everything correctly, life is good. If we discount that objection, there doesn't seem to be much advantage of using Aaron's suggested method over Jim's. (As a side note--I have the same complaint about anything in XML--it is awful for a human to read. I still live with XML when I have to though :-) Additionally, the fact that Ensembl is using the same method as what Jim describes is a fairly powerful argument for doing the same. Hopefully there can be some code reuse. Scott On Tue, Sep 9, 2008 at 12:05 PM, Jim Hu wrote: > Hi Aaron, > I was thinking this would be handled by making the end=parent feature length > x 2 + end coord. end/parent length = number of times crosses origin. > Jim > On Sep 8, 2008, at 2:57 PM, Aaron Mackey wrote: > > How can you handle features that may cross the origin more than once? > The modulus, though simple, seems to be only half the solution. 
It > also makes it difficult to place features in the genome "by eye" > (having to do the modulus subtraction in my head), or in > sorting/filtering operations. > > I have an alternative that I wondered if you considered: allow the > start/end to have an additional "circular revolution" prefix: > > a typical range tuple like: 100 200 - > is thus shorthand for: 0:100 0:200 - > (i.e. both the 100 and 200 are in the same "revolution" around the genome) > > and is then distinguishable from an "around the genome + 100" feature of: > 1:100 0:200 - > > Just an alternative to consider (if you haven't already). I'm not > wedded to the syntax, but I wouldn't want to see new columns in GFF > just for this. Essentially, what you want is some form of compound > polar coordinates, it seems. > > -Aaron > > On Mon, Sep 8, 2008 at 2:44 PM, Jim Hu wrote: > > In discussions with GMOD about Gbrowse, we've come up with a proposal for > > handling circular genomes and features that cross the origin in such > > genomes. This applies to lots of prokaryotic and viral genomes, and might > > be valuable for some ways of representing terminally redundant linear > > genomes. > > 1) Keep the requirement that start < end > > 2) allow end > parent feature length > > 3) parent feature gets an is_circular boolean > > 4) use modular arithmetic to calculate the real position of end on the > > parent feature. > > We'd like to do this in a way that will be consistent with Chado and BioPerl > > representation of features as much as possible (realizing that there is the > > usual interbase or not coordinate issue). What do people think? Lincoln is > > on board for modifying the GFF3 spec. > > Thanks! > > Jim Hu > > ===================================== > > Jim Hu > > Associate Professor > > Dept. of Biochemistry and Biophysics > > 2128 TAMU > > Texas A&M Univ. 
> > College Station, TX 77843-2128 > > 979-862-4054 > > > > ===================================== > > Jim Hu > > Associate Professor > > Dept. of Biochemistry and Biophysics > > 2128 TAMU > > Texas A&M Univ. > > College Station, TX 77843-2128 > > 979-862-4054 > > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.
cain.cshl at gmail.com GMOD Coordinator (http://gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From lincoln.stein at gmail.com Tue Sep 9 13:52:36 2008 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Tue, 9 Sep 2008 13:52:36 -0400 Subject: [Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl In-Reply-To: <24c96eca0809081257l16461b23uaefe8154ed038bea@mail.gmail.com> References: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> <24c96eca0809081257l16461b23uaefe8154ed038bea@mail.gmail.com> Message-ID: <6dce9a0b0809091052g1c398a84tfe8f89d1bf9132c8@mail.gmail.com> It seems to me that the proposed modulus syntax handles multiple revolutions. Consider a 100 bp genome (to make it simple) and a feature that starts at 50, goes around twice, and ends at position 60: start = 50 end = 260 length = end - start + 1 revolutions = int (length/genome) stop position = length % genome + 1 Lincoln On Mon, Sep 8, 2008 at 3:57 PM, Aaron Mackey wrote: > How can you handle features that may cross the origin more than once? > The modulus, though simple, seems to be only half the solution. It > also makes it difficult to place features in the genome "by eye" > (having to do the modulus subtraction in my head), or in > sorting/filtering operations. > > I have an alternative that I wondered if you considered: allow the > start/end to have an additional "circular revolution" prefix: > > a typical range tuple like: 100 200 - > is thus shorthand for: 0:100 0:200 - > (i.e. both the 100 and 200 are in the same "revolution" around the genome) > > and is then distinguishable from an "around the genome + 100" feature of: > 1:100 0:200 - > > Just an alternative to consider (if you haven't already). I'm not > wedded to the syntax, but I wouldn't want to see new columns in GFF > just for this. Essentially, what you want is some form of compound > polar coordinates, it seems. 
> > -Aaron > > On Mon, Sep 8, 2008 at 2:44 PM, Jim Hu wrote: > > In discussions with GMOD about Gbrowse, we've come up with a proposal for > > handling circular genomes and features that cross the origin in such > > genomes. This applies to lots of prokaryotic and viral genomes, and > might > > be valuable for some ways of representing terminally redundant linear > > genomes. > > 1) Keep the requirement that start < end > > 2) allow end > parent feature length > > 3) parent feature gets an is_circular boolean > > 4) use modular arithmetic to calculate the real position of end on the > > parent feature. > > We'd like to do this in a way that will be consistent with Chado and > BioPerl > > representation of features as much as possible (realizing that there is > the > > usual interbase or not coordinate issue). What do people think? Lincoln > is > > on board for modifying the GFF3 spec. > > Thanks! > > Jim Hu > > > > ===================================== > > > > Jim Hu > > > > Associate Professor > > > > Dept. of Biochemistry and Biophysics > > > > 2128 TAMU > > > > Texas A&M Univ. > > > > College Station, TX 77843-2128 > > > > 979-862-4054 > > > > > -- Lincoln D.
Stein Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Stacey Quinn Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 USA (516) 367-8380 Assistant: Sandra Michelsen From cjfields at illinois.edu Tue Sep 9 14:24:49 2008 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 9 Sep 2008 13:24:49 -0500 Subject: [Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl In-Reply-To: References: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> <24c96eca0809081257l16461b23uaefe8154ed038bea@mail.gmail.com> Message-ID: Is there any particular reason we don't treat this similarly to the way BioPerl does, which is to simply treat the origin-overlapping feature as a split location? GenBank treats this similarly. For an faux example, the bug I just fixed for bugzilla has one: http://bugzilla.open-bio.org/show_bug.cgi?id=2579 An actual GenBank case is the Sulfolobus solfataricus genome (NC_002754), and I'm sure Jim could come up with more. The only caveat is whether we should represent this As for multiple revolutions, I'm not sure the hand-wringing about specifics is worth it unless we have explicit workable examples to test against (preferably examples which would potentially pop up), but Lincoln's proposal sounds fine. chris On Sep 9, 2008, at 11:05 AM, Jim Hu wrote: > Hi Aaron, > > I was thinking this would be handled by making the end=parent > feature length x 2 + end coord. end/parent length = number of times > crosses origin. > > Jim > > On Sep 8, 2008, at 2:57 PM, Aaron Mackey wrote: > >> How can you handle features that may cross the origin more than once? >> The modulus, though simple, seems to be only half the solution. It >> also makes it difficult to place features in the genome "by eye" >> (having to do the modulus subtraction in my head), or in >> sorting/filtering operations. 
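Chris's alternative, representing an origin-spanning feature as a split location the way GenBank does with join(), can be derived mechanically from the start/end form. The following is a sketch assuming 1-based inclusive coordinates, start <= end, and an end that may exceed the parent length; `split_circular` and `as_join` are hypothetical helpers and do not use BioPerl's Bio::Location::Split.

```perl
use strict;
use warnings;

# Hypothetical sketch: split a feature on a circular sequence of length $len
# into linear segments (a GenBank-style split location). Coordinates are
# 1-based and inclusive; $start <= $end, and $end may exceed $len when the
# feature crosses the origin.
sub split_circular {
    my ($start, $end, $len) = @_;
    my @segments;
    my $seg_start = $start;
    while ($seg_start <= $end) {
        # last base of the current pass around the circle, in unwrapped coordinates
        my $pass_end = (int(($seg_start - 1) / $len) + 1) * $len;
        my $seg_end  = $end < $pass_end ? $end : $pass_end;
        # map both ends back onto 1..$len
        push @segments, [ ($seg_start - 1) % $len + 1, ($seg_end - 1) % $len + 1 ];
        $seg_start = $seg_end + 1;
    }
    return @segments;
}

# Format segments as a GenBank-style location string.
sub as_join {
    my @segs = map { join('..', @$_) } @_;
    return @segs == 1 ? $segs[0] : 'join(' . join(',', @segs) . ')';
}

# 5000 bp circle, feature from base 4901 across the origin to base 100.
printf "%s\n", as_join(split_circular(4901, 5100, 5000));
# prints "join(4901..5000,1..100)"
```

The same function turns a multi-revolution feature such as 50..260 on a 100 bp circle into join(50..100,1..100,1..60), so the split form stays explicit about each pass at the cost of repeating full-length segments.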
>> >> I have an alternative that I wondered if you considered: allow the >> start/end to have an additional "circular revolution" prefix: >> >> a typical range tuple like: 100 200 - >> is thus shorthand for: 0:100 0:200 - >> (i.e. both the 100 and 200 are in the same "revolution" around the >> genome) >> >> and is then distinguishable from an "around the genome + 100" >> feature of: >> 1:100 0:200 - >> >> Just an alternative to consider (if you haven't already). I'm not >> wedded to the syntax, but I wouldn't want to see new columns in GFF >> just for this. Essentially, what you want is some form of compound >> polar coordinates, it seems. >> >> -Aaron >> >> On Mon, Sep 8, 2008 at 2:44 PM, Jim Hu wrote: >>> In discussions with GMOD about Gbrowse, we've come up with a >>> proposal for >>> handling circular genomes and features that cross the origin in such >>> genomes. This applies to lots of prokaryotic and viral genomes, >>> and might >>> be valuable for some ways of representing terminally redundant >>> linear >>> genomes. >>> 1) Keep the requirement that start < end >>> 2) allow end > parent feature length >>> 3) parent feature gets an is_circular boolean >>> 4) use modular arithmetic to calculate the real position of end on >>> the >>> parent feature. >>> We'd like to do this in a way that will be consistent with Chado >>> and BioPerl >>> representation of features as much as possible (realizing that >>> there is the >>> usual interbase or not coordinate issue). What do people think? >>> Lincoln is >>> on board for modifying the GFF3 spec. >>> Thanks! >>> Jim Hu >>> >>> ===================================== >>> >>> Jim Hu >>> >>> Associate Professor >>> >>> Dept. of Biochemistry and Biophysics >>> >>> 2128 TAMU >>> >>> Texas A&M Univ. 
>>> >>> College Station, TX 77843-2128 >>> >>> 979-862-4054 >>> >>> > > ===================================== > Jim Hu > Associate Professor > Dept. of Biochemistry and Biophysics > 2128 TAMU > Texas A&M Univ. > College Station, TX 77843-2128 > 979-862-4054 > > Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From ajmackey at gmail.com Tue Sep 9 14:48:12 2008 From: ajmackey at gmail.com (Aaron Mackey) Date: Tue, 9 Sep 2008 14:48:12 -0400 Subject: [Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl In-Reply-To: <6dce9a0b0809091052g1c398a84tfe8f89d1bf9132c8@mail.gmail.com> References: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> <24c96eca0809081257l16461b23uaefe8154ed038bea@mail.gmail.com> <6dce9a0b0809091052g1c398a84tfe8f89d1bf9132c8@mail.gmail.com> Message-ID: <24c96eca0809091148l738604a7q13fba54ac05de01c@mail.gmail.com> Right, the modulus calculation continues to work, but for instance, what'll happen when I now ask Gbrowse (or Ensembl) to show me positions 50..260? Will it show me 50 .. 60, 1:100, or "unroll" the genome twice from 50..260 (that'd be a pretty cute trick, by the way!)
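Aaron's concern is that a single-valued coordinate such as 260 silently packs a revolution count and a position. Packing and unpacking between the two forms is simple, as the sketch below shows: it converts between a single-valued coordinate and the "revolution:position" notation from Aaron's earlier message, treating a bare position as revolution 0. For Lincoln's example (100 bp genome, feature 50..260) it yields 0:50 and 2:60, i.e. a stop at base 60. Note that `length % genome + 1` evaluates to 12 for these numbers, so the sketch derives the position from the end coordinate rather than from the feature length. `to_compound` and `from_compound` are hypothetical names, and 1-based inclusive coordinates are assumed.

```perl
use strict;
use warnings;

# Hypothetical conversion between a single-valued coordinate on a circular
# sequence of length $len (as in the modulus proposal) and the compound
# "revolution:position" notation, where a bare "100" is shorthand for
# "0:100". 1-based, inclusive coordinates are assumed.
sub to_compound {
    my ($coord, $len) = @_;
    my $rev = int(($coord - 1) / $len);
    my $pos = ($coord - 1) % $len + 1;
    return "$rev:$pos";
}

sub from_compound {
    my ($text, $len) = @_;
    my ($rev, $pos) = $text =~ /^(?:(\d+):)?(\d+)$/
        or die "not a coordinate: $text\n";
    $rev //= 0;    # a bare position means revolution 0
    return $rev * $len + $pos;
}

# Lincoln's example: 100 bp genome, feature 50..260.
printf "%s %s\n", to_compound(50, 100), to_compound(260, 100);   # "0:50 2:60"
```

Whichever form GFF3 adopts, a converter like this only works if the reader knows the parent is circular and knows its length, which is the explicitness Aaron is asking for.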
You're (re)using simple arithmetic to compress a compound coordinate into a single-valued coordinate (which I realize can be trivially packed and unpacked by software), but I worry about the downstream consequences of software having to always remember that the coordinates given may have to be unpacked or not, and not being able to immediately identify whether "260" is a real or compound coordinate. To say it another way, I'm happy (that is, don't care much) whether Chado or any other underlying data storage uses such compound coordinates, because only Chado-reliant tools will need to care; but I do worry about GFF3 as a (relatively) simple exchange format having that kind of silent bug-causing complexity. I'd much rather see GFF be syntactically explicit, and not quite so cleverly implicit. Just one GFF user's two cents, thanks for listening, -Aaron On Tue, Sep 9, 2008 at 1:52 PM, Lincoln Stein wrote: > It seems to me that the proposed modulus syntax handles multiple > revolutions. Consider a 100 bp genome (to make it simple) and a feature that > starts at 50, goes around twice, and ends at position 60: > > start = 50 > end = 260 > > length = end - start + 1 > revolutions = int (length/genome) > stop position = length % genome + 1 > > Lincoln > > On Mon, Sep 8, 2008 at 3:57 PM, Aaron Mackey wrote: >> >> How can you handle features that may cross the origin more than once? >> The modulus, though simple, seems to be only half the solution. It >> also makes it difficult to place features in the genome "by eye" >> (having to do the modulus subtraction in my head), or in >> sorting/filtering operations. >> >> I have an alternative that I wondered if you considered: allow the >> start/end to have an additional "circular revolution" prefix: >> >> a typical range tuple like: 100 200 - >> is thus shorthand for: 0:100 0:200 - >> (i.e. 
both the 100 and 200 are in the same "revolution" around the genome) >> >> and is then distinguishable from an "around the genome + 100" feature of: >> 1:100 0:200 - >> >> Just an alternative to consider (if you haven't already). I'm not >> wedded to the syntax, but I wouldn't want to see new columns in GFF >> just for this. Essentially, what you want is some form of compound >> polar coordinates, it seems. >> >> -Aaron >> >> On Mon, Sep 8, 2008 at 2:44 PM, Jim Hu wrote: >> > In discussions with GMOD about Gbrowse, we've come up with a proposal >> > for >> > handling circular genomes and features that cross the origin in such >> > genomes. This applies to lots of prokaryotic and viral genomes, and >> > might >> > be valuable for some ways of representing terminally redundant linear >> > genomes. >> > 1) Keep the requirement that start < end >> > 2) allow end > parent feature length >> > 3) parent feature gets an is_circular boolean >> > 4) use modular arithmetic to calculate the real position of end on the >> > parent feature. >> > We'd like to do this in a way that will be consistent with Chado and >> > BioPerl >> > representation of features as much as possible (realizing that there is >> > the >> > usual interbase or not coordinate issue). What do people think? >> > Lincoln is >> > on board for modifying the GFF3 spec. >> > Thanks! >> > Jim Hu >> > >> > ===================================== >> > >> > Jim Hu >> > >> > Associate Professor >> > >> > Dept. of Biochemistry and Biophysics >> > >> > 2128 TAMU >> > >> > Texas A&M Univ. 
>> > >> > College Station, TX 77843-2128 >> > >> > 979-862-4054 >> > >> > > > > -- > Lincoln D. Stein > > Ontario Institute for Cancer Research > 101 College St., Suite 800 > Toronto, ON, Canada M5G0A3 > 416 673-8514 > Assistant: Stacey Quinn > > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 USA > (516) 367-8380 > Assistant: Sandra Michelsen > From cjfields at illinois.edu Tue Sep 9 14:49:13 2008 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 9 Sep 2008 13:49:13 -0500 Subject: [Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl In-Reply-To: References: <87B87004-A586-44A0-BB10-D4AA3FE9E669@tamu.edu> <24c96eca0809081257l16461b23uaefe8154ed038bea@mail.gmail.com>