From shameer at ncbs.res.in Tue May 1 07:36:31 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Tue, 1 May 2007 17:06:31 +0530 (IST)
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To:
References: <10259461.post@talk.nabble.com>
Message-ID: <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>

Dear All,

I am trying to implement a bioperl based program to generate a dynamic, clickable image. I have used Dr. Lincoln Stein's code provided in example 3 at this URL:
http://stein.cshl.org/genome_informatics/BioGraphics/blast3.html
which seems to be perfect for my purpose.

I need to add a few modifications to the image. I referred to the Bio::Graphics HOWTO, the Creating_Imagemaps documents and other old bioperl list mails (maybe I am missing something important?) but I couldn't get a quick solution, so I thought I would ask the experts for some tips and tricks.

This is what I am looking for:

1. I need an image of exactly the same size, with the scale (0.1k .. 0.9k) changed according to the length of the sequence. My sequence length is usually in the range of 70 - 200.

2. I also need to make the image interactive / clickable, with the various blue bars as different hyperlinks to NCBI / PDB using an ID (these IDs will be used instead of the names of the blast hits).

Many thanks in advance for your inputs,
--
Shameer Khadar
Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in

From shameer at ncbs.res.in Tue May 1 12:04:13 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Tue, 1 May 2007 21:34:13 +0530 (IST)
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <1178028249.2644.13.camel@localhost.localdomain>
References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> <1178028249.2644.13.camel@localhost.localdomain>
Message-ID: <42403.192.168.1.1.1178035453.squirrel@mail.ncbs.res.in>

Dear Scott,

> There is a fair amount of documentation in the perldoc for
> Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have
> you read that?

I agree, but I couldn't find the exact information I needed :( (maybe I missed something important).

> Also, for changing the scale, that should happen
> automatically--have you tried yet?

I tried changing Lincoln's program (blast3.pl) from

my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);

to

my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300);

But it gave me a smaller scale, of length up to 300. I was looking for an option that keeps the same width and height as the given image, with dynamic start and end values depending on the length of my sequence. Since I couldn't accomplish this, I thought of getting some help from you guys. I think I need to play a little with the values to reformat the scale to accommodate my hits as well.
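As a rough sketch of the two things asked for in this thread (a fixed image width whose scale follows the sequence length, and clickable bars), the following core-Perl example maps base positions to pixels linearly and emits an HTML image map. It does not call Bio::Graphics: base_to_x, escape_html, image_map, the 800-pixel width, the row geometry, the hit IDs and the link URL are all assumptions made up for illustration. In a real script the rectangles would come from the panel itself, as covered by the 'Creating Imagemaps' section of the Bio::Graphics::Panel perldoc mentioned above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed layout: fixed image width and side padding, both in pixels.
my $WIDTH = 800;
my $PAD   = 10;

# Map a 1-based sequence position to an x pixel on a linear scale that
# always spans the same drawable width, whatever the sequence length.
sub base_to_x {
    my ($pos, $seq_length) = @_;
    my $drawable = $WIDTH - 2 * $PAD;
    return $PAD + int(($pos - 1) * $drawable / $seq_length + 0.5);
}

# Escape the characters that are special inside HTML attribute values.
sub escape_html {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Build a client-side image map. $hits is an array ref of hash refs with
# keys id, start and end; each hit sits on its own row (assumed layout).
# $url_for is a code ref that turns a hit ID into a link target.
sub image_map {
    my ($name, $seq_length, $hits, $url_for) = @_;
    my ($top, $row_height) = (30, 12);
    my @areas;
    my $row = 0;
    for my $hit (@$hits) {
        my $x1 = base_to_x($hit->{start},   $seq_length);
        my $x2 = base_to_x($hit->{end} + 1, $seq_length);
        my $y1 = $top + $row++ * $row_height;
        my $y2 = $y1 + $row_height - 2;
        push @areas, sprintf(
            '<area shape="rect" coords="%d,%d,%d,%d" href="%s" title="%s">',
            $x1, $y1, $x2, $y2,
            escape_html($url_for->($hit->{id})), escape_html($hit->{id}));
    }
    return join("\n", '<map name="' . escape_html($name) . '">', @areas, '</map>') . "\n";
}

# Example: two hypothetical hits on a 150-residue query, linked by ID.
# The URL pattern is an assumption; substitute whatever NCBI / PDB link is needed.
my @hits = (
    { id => '1ABC', start => 5,  end => 80  },
    { id => '2XYZ', start => 60, end => 150 },
);
print image_map('hits', 150, \@hits,
                sub { 'https://www.rcsb.org/structure/' . $_[0] });
```

With the width held constant, a 70-residue and a 200-residue query both span the same drawable pixels; only the pixels-per-residue ratio changes, which is the behaviour being asked for.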

Thanks a lot for your inputs,
--
Shameer Khadar
Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in

From cain at cshl.edu Tue May 1 10:04:09 2007
From: cain at cshl.edu (Scott Cain)
Date: Tue, 01 May 2007 10:04:09 -0400
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
Message-ID: <1178028249.2644.13.camel@localhost.localdomain>

Hi Shameer,

There is a fair amount of documentation in the perldoc for Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have you read that? Also, for changing the scale, that should happen automatically--have you tried yet?

Scott

On Tue, 2007-05-01 at 17:06 +0530, Shameer Khadar wrote:
> Dear All,
>
> I am trying to implement a bioperl based program to generate a dynamic,
> clickable image. I have used Dr. Lincoln Stein's code provided in
> example3 at this URL :
> http://stein.cshl.org/genome_informatics/BioGraphics/blast3.html seems to
> be perfect for my purpose.
>
> I need to add few modifications to the image. I referred the Bio::Graphics
> HOWTO, Creating_Imagemaps documents and other old bio-perl list mails
> (may be am missing something imp.. ? ) but I couldn't get a quick
> solution, Thought I will ask about it to the experts for some tips and
> tricks.
>
> This is what I am looking for :
>
> 1. I need image of exactly same size and the scale (0.1k .. 0.9k) to be
> changed according to length of the sequence. My sequence length is usually
> in a range of 70 - 200.
>
> 2. 
I also need to make the image interactive / clickable on the various
> blue bar as different hyperlink to NCBI / PDB using ID (This ids will be
> used instead of name of the blast hits)
>
>
> Many thanks in advance for your inputs,

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu Tue May 1 13:10:10 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 1 May 2007 12:10:10 -0500
Subject: [Bioperl-l] Pb makefile
In-Reply-To:
References:
Message-ID:

Is there any reason you want to install bioperl 1.4 (which is over 3 yrs old)? The latest is v.1.5.2 (Dec. 2006); man page generation has been fixed for that version, which uses Module::Build. The man page generation was turned off prior to 1.4, though I may be wrong.

Based on the ExtUtils::MakeMaker FAQ you should be able to prevent man page generation this way:

perl Makefile.PL INSTALLMAN1DIR=none INSTALLMAN3DIR=none

chris

On Apr 30, 2007, at 5:35 AM, Francoise.LECOMTE at biogemma.com wrote:

> Hi
> I try to install bioperl 1.4 on Tru64 platform and I've got a message
> "make:line too long" when I run the command make install
> How can I solve it ? How disable man pages installation in
> Makefile.PL if
> it can solve this problem
>
> Best regards
>
> Françoise Lecomte

From cain.cshl at gmail.com Tue May 1 15:50:42 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 01 May 2007 15:50:42 -0400
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> <1178028249.2644.13.camel@localhost.localdomain> <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
Message-ID: <1178049042.2644.36.camel@localhost.localdomain>

Perhaps if you provided some code and sample data we might be able to help you better.

Scott

On Tue, 2007-05-01 at 21:34 +0530, Shameer Khadar wrote:
> Dear Scott,
>
> > There is a fair amount of documentation in the perldoc for
> > Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have
> > you read that?
>
> I agreed, but I couldn't find the exact information I needed :( (may be I missed
> something important).
>
> > Also, for changing the scale, that should happen
> > automatically--have you tried yet?
>
> I tried by changing the Lincoln's program eg: blast3.pl
> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
> to my
> $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300);
>
> But it had given me a smaller scale of length upto 300. I was looking for
> an option where I need same width and height of given image and a dynamic
> start and end values depending on length of my sequence. Since I couldn't
> accomplish, I thought of getting some help from you guys. I think I need
> to play a little bit with the value for reformat the scale to accommodate
> my hits as well.
>
> Thanks a lot for your inputs,

--
------------------------------------------------------------------------
Scott Cain, Ph. D.
cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

From agathman at semo.edu Tue May 1 19:10:20 2007
From: agathman at semo.edu (Gathman, Allen)
Date: Tue, 1 May 2007 18:10:20 -0500
Subject: [Bioperl-l] Problem with spliced_seq in BioPerl 1.5.2
Message-ID: <2DA21E6CECCDE7419541C7A6DB798F0C080CB704@EXCHANGE.semo.edu>

Hi, all --

I've been using BioPerl 1.4 for a while; recently I installed 1.5.2, and found that scripts that had been using spliced_seq are now broken. Any thoughts on what might be going on?

Here's a sample script:

*********************************************

#!/usr/bin/perl -w

use strict;
use Bio::DB::GFF;

my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                            -dsn     => 'dbi:mysql:database=cc;host=localhost',
                            -fasta   => '/gbrowse/databases/cc'
                          );
$db->add_aggregator('transcript{CDS/mRNA}');
my $seg=$db->segment('ccin_Contig120');
my @genes=$seg->features(-types=>('transcript:GLEAN_alt'));

for my $gene (@genes) {
    my $gid = $gene->display_id;

    print STDERR "Gene is $gid\n";
    my $splgene = $gene->spliced_seq();
}

********************************************

The line with "spliced_seq" in it crashes the program. Here's the STDERR output:

Gene is Jan06m400_GLEAN_11487

-------------------- WARNING ---------------------
MSG: Calling spliced_seq with a Bio::Das::SegmentI which does have
absolute set to 1 -- be warned you may not be getting things on the
correct strand
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: seq doesn't validate, mismatch is
::,(0,88211,0),::,(0,8821170),::,(0,8821260),::,(0,8821308),::,(0,881935
,),::,(0,881,468),::,(0,881,4,4),::,(0,8818,0),::,(0,8819098)
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Attempting to set the sequence to
[Bio::PrimarySeq=HASH(0x88211d0)Bio::PrimarySeq=HASH(0x8821170)Bio::Prim
arySeq=HASH(0x8821260)Bio::PrimarySeq=HASH(0x8821308)Bio::PrimarySeq=HAS
H(0x881935c)Bio::PrimarySeq=HASH(0x881f468)Bio::PrimarySeq=HASH(0x881f4a
4)Bio::PrimarySeq=HASH(0x8818ff0)Bio::PrimarySeq=HASH(0x8819098)] which
does not look healthy
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:359
STACK: Bio::PrimarySeq::seq /usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:258
STACK: Bio::PrimarySeq::new /usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:210
STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.6/Bio/Seq.pm:484
STACK: Bio::SeqFeatureI::spliced_seq /usr/lib/perl5/site_perl/5.8.6/Bio/SeqFeatureI.pm:498
STACK: /transfer/testsplice.pl:20
-----------------------------------------------------------

Allen Gathman
http://cstl-csm.semo.edu/gathman

From cjfields at uiuc.edu Tue May 1 20:27:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 1 May 2007 19:27:46 -0500
Subject: [Bioperl-l] Problem with spliced_seq in BioPerl 1.5.2
In-Reply-To: <2DA21E6CECCDE7419541C7A6DB798F0C080CB704@EXCHANGE.semo.edu>
References: <2DA21E6CECCDE7419541C7A6DB798F0C080CB704@EXCHANGE.semo.edu>
Message-ID: <9F00B020-AFF0-40DB-9694-6061B5A11A73@uiuc.edu>

Can you file a bug on this? 
Attach the script and maybe detail what data is loaded into your local MySQL database (if possible).

chris

On May 1, 2007, at 6:10 PM, Gathman, Allen wrote:

> Hi, all --
>
> I've been using BioPerl 1.4 for a while; recently I installed
> 1.5.2, and
> found that scripts that had been using spliced_seq are now broken.
> Any
> thoughts on what might be going on?
>
> Here's a sample script:
>
> *********************************************
>
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::DB::GFF;
>
> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>     -dsn =>
> 'dbi:mysql:database=cc;host=localhost',
>     -fasta => '/gbrowse/databases/cc'
> );
> $db->add_aggregator('transcript{CDS/mRNA}');
> my $seg=$db->segment('ccin_Contig120');
> my @genes=$seg->features(-types=>('transcript:GLEAN_alt'));
>
> for my $gene (@genes) {
>     my $gid = $gene->display_id;
>
>     print STDERR "Gene is $gid\n";
>     my $splgene = $gene->spliced_seq();
> }
>
> ********************************************
> The line with "spliced_seq" in it crashes the program. Here's the
> STDERR output:
>
> Gene is Jan06m400_GLEAN_11487
>
> -------------------- WARNING ---------------------
>
> MSG: Calling spliced_seq with a Bio::Das::SegmentI which does have
> absolute set to 1 -- be warned you may not be getting things on the
> correct strand
>
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
>
> MSG: seq doesn't validate, mismatch is
> ::,(0,88211,0),::,(0,8821170),::,(0,8821260),::,(0,8821308),::,
> (0,881935
> ,),::,(0,881,468),::,(0,881,4,4),::,(0,8818,0),::,(0,8819098)
>
> ---------------------------------------------------
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
>
> MSG: Attempting to set the sequence to
> [Bio::PrimarySeq=HASH(0x88211d0)Bio::PrimarySeq=HASH(0x8821170)
> Bio::Prim
> arySeq=HASH(0x8821260)Bio::PrimarySeq=HASH(0x8821308)
> Bio::PrimarySeq=HAS
> H(0x881935c)Bio::PrimarySeq=HASH(0x881f468)Bio::PrimarySeq=HASH
> (0x881f4a
> 4)Bio::PrimarySeq=HASH(0x8818ff0)Bio::PrimarySeq=HASH(0x8819098)]
> which
> does not look healthy
>
> STACK: Error::throw
>
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:359
>
> STACK: Bio::PrimarySeq::seq
> /usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:258
>
> STACK: Bio::PrimarySeq::new
> /usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:210
>
> STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.6/Bio/Seq.pm:484
>
> STACK: Bio::SeqFeatureI::spliced_seq
> /usr/lib/perl5/site_perl/5.8.6/Bio/SeqFeatureI.pm:498
>
> STACK: /transfer/testsplice.pl:20
>
> -----------------------------------------------------------
>
> Allen Gathman
>
> http://cstl-csm.semo.edu/gathman
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From shameer at ncbs.res.in Tue May 1 23:46:59 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Wed, 2 May 2007 09:16:59 +0530 (IST)
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <1178049042.2644.36.camel@localhost.localdomain>
References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> <1178028249.2644.13.camel@localhost.localdomain> <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in> <1178049042.2644.36.camel@localhost.localdomain>
Message-ID: <59122.192.168.1.1.1178077619.squirrel@mail.ncbs.res.in>

Dear Scott,

Once again, thanks a lot for your inputs.

I am following the same data format as in
http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast_hits.txt
The only difference is that instead of hits, I will be using PFAMID/PDBID. All the blue boxes (features) should be clickable, like hot-spot/imagemap images. The purpose is to display these results in a web page.

I am using the program in Stein's Bio::Graphics example
http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast3.pl

I need exactly the same image as in
http://stein.cshl.org/genome_informatics/BioGraphics/fig3.png
The only difference is that I need the scale (0.1k - 0.9k) to be a simple range of 1-XXX, where XXX depends on the length of the input sequence.

Many thanks for your help,

> Perhaps if you provided some code and sample data we might be able to
> help you better.
>
> Scott
>

--
Shameer Khadar
Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in

From sdavis2 at mail.nih.gov Wed May 2 06:02:48 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 2 May 2007 06:02:48 -0400
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <59122.192.168.1.1.1178077619.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com> <1178049042.2644.36.camel@localhost.localdomain> <59122.192.168.1.1.1178077619.squirrel@mail.ncbs.res.in>
Message-ID: <200705020602.48404.sdavis2@mail.nih.gov>

On Tuesday 01 May 2007 23:46, Shameer Khadar wrote:
> Dear Scott,
>
> Once again, thanks a lot for your inputs.
>
> I am following same data formats as in
> http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast_hits.txt
> Only difference is instead of Hits, I will be using PFAMID/PDBID. All the
> blue boxes (feature) should be clickable like a hot-spot/imagemap images.
> The purpose is to display these results in a web page.

Do you have your data loaded into bioperl objects? What code did you use for that (post that code)?

> I am using the program in Stein's Bio::Graphics example
> http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast3.pl

Does this example run on your computer? Have you been able to use the bioperl objects you created in the first step in the creation of a graphic? If not, what have you tried (post the code), and what error messages did you get?

> I need exactly same image as in
> http://stein.cshl.org/genome_informatics/BioGraphics/fig3.png
> only difference is I need the scale (0.1k - 0.9k) in a range of simple
> 1-XXX , here XXX depends on the length of the sequence input.

Again, what have you tried? Posting code is helpful here, also.

I'm not an expert in bioperl graphics, but it really does help those who know to see the code that you have written, so they know how best to help.

Sean

From lzlgboy at gmail.com Wed May 2 09:58:14 2007
From: lzlgboy at gmail.com (kenzy ken)
Date: Wed, 2 May 2007 21:58:14 +0800
Subject: [Bioperl-l] Extract CDS from CDNA given Protein SEQs
Message-ID:

Hi, everyone

I have a task to extract CDS sequences from cDNA, and I have the protein sequence for each cDNA. What should I do? Should I try a 3-frame translation? If so, how?

Thanks.

--
Chen,Kenian
===========================
School of Life Science, Sun Yat-Sen University
===========================
Xingang Xilu 135
Guangzhou, Guangdong 510275
P. R. China
===========================
Phone: (86) 20-84113677; (86) 20-34474683;
Fax: (86) 20-34022356
===========================
Email:lzlgboy at gmail.com; chenkn at mail2.sysu.edu.cn

From MEC at stowers-institute.org Wed May 2 18:38:31 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Wed, 2 May 2007 17:38:31 -0500
Subject: [Bioperl-l] Handling discontiguous feature locations in Bio::DB::SeqFeature::Store -- proposed patch to Bio::Graphics::FeatureBase
In-Reply-To:
References: <6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com>
Message-ID:

Lincoln,

Here for your comment and review is a very reworked version of Bio::Graphics::FeatureBase->gff3_string.

The main difference is that homogeneous children get ALL their attributes except for start/stop from the parent, including the group.

I also provide an option as to whether or not to "remove extraneous level of parentage", called $preserveHomegenousParent.

There is an in-line comment and question for you in the code body.

It works well in my hands for my use cases, but I'm not positive it is in the spirit of your intentions.

Cheers,

Malcolm

sub gff3_string {
  my ($self,
      $recurse,
      $preserveHomegenousParent,
      # Note: the following parameters, whose name begins with '$_',
      # are intended for recursive call only.
      $_parent,
      $_self_is_hsf,     # is $self the child in a homogeneous parent/child relationship?
      $_hsf_parentgroup, # if so, what is the group (GFF column 9) of the parent
     ) = @_;

  # PURPOSE: Return GFF3 format for the feature $self. Optionally
  # $recurse to include GFF for any subfeatures of the feature. If
  # recursing, provide special handling to "remove an extraneous level
  # of parentage" (unless $preserveHomegenousParent) for features
  # which have subfeatures all of whose types are the same as the
  # feature itself (the "homogenous parent/child" case). This usage is
  # a convention for representing discontiguous features; they may be
  # created by using the -segment directive without specifying a
  # distinct -subtype in to `new` when creating a
  # Bio::Graphics::FeatureBase (i.e. Bio::DB::SeqFeature,
  # Bio::Graphics::Feature). Such homogenous subfeatures created in
  # this fashion DO NOT have the parent (GFF column 9) attributes
  # propagated to them; so, since they are all part of the same
  # parent, the ONLY difference relevant to GFF production SHOULD be
  # the $start and $end coordinates for their segment, and ALL THEIR
  # OTHER ATTRIBUTES should be copied down from the parent (including:
  # strand, score, Name, ID, Parent, etc).

  my $hparentORself = $_self_is_hsf ? $_parent : $self;
  # $self's parent, if it is a homogenous child, otherwise $self.

  if ($recurse && (my @ssf = $self->sub_SeqFeature)) {
    my $homogenous = ! grep {$_->type ne $self->type} @ssf;
    # will be TRUE only if all subfeatures are the same type as $self.
    my $mygroup =
      # compute $self's group if it is needed to be passed down to
      # subfeatures, unless it is already being passed down (in which
      # case there are (at least) 3 levels of homogenous parent child
      # (will this ever happen in practice???))
      ! $homogenous ? '' :
      $_self_is_hsf ? $_hsf_parentgroup :
      $self->format_attributes($_parent);
    return (join("\n",
                 (($preserveHomegenousParent ? ($self->gff3_string(0)) : ()) ,
                  map {$_->gff3_string($recurse,$preserveHomegenousParent,$hparentORself,$homogenous,$mygroup)} @ssf)));
  } else {
    my $name = $hparentORself->name;
    my $class = $hparentORself->class;
    my $group = $_self_is_hsf ? $_hsf_parentgroup : $self->format_attributes($_parent);
    my $strand = ('-','.','+')[$self->strand+1];
    # TODO: understand conditions under which this could be other than
    # hparentORself->strand. In particular, why does add_segment flip
    # the strand when start > stop? I thought this was not allowed!
    # Lincoln - any ideas?
    my $p = join("\t",
                 $hparentORself->ref||'.',$hparentORself->source||'.',$hparentORself->method||'.',
                 $self->start||'.',$self->stop||'.',
                 defined($hparentORself->score) ? $hparentORself->score : '.',
                 $strand||'.',
                 defined($hparentORself->phase) ? $hparentORself->phase : '.',
                 $group||'');
  }
}

________________________________

From: Cook, Malcolm
Sent: Friday, April 27, 2007 1:45 PM
To: 'lincoln.stein at gmail.com'
Cc: 'lstein at cshl.org'; 'bioperl list'
Subject: RE: Handling discontiguous feature locations in Bio::DB::SeqFeature::Store -- proposed patch to Bio::Graphics::FeatureBase

Hi Lincoln,

Cool. The principle of what I figured out still holds, I think, but the implementation is slightly broken. Improved patch forthcoming next week.

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri

________________________________

From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
Sent: Friday, April 27, 2007 12:45 PM
To: Cook, Malcolm
Cc: lstein at cshl.org; bioperl list
Subject: Re: Handling discontiguous feature locations in Bio::DB::SeqFeature::Store -- proposed patch to Bio::Graphics::FeatureBase

Hi Malcolm,

This is absolutely ok and you can go ahead and commit. Thanks for figuring this out!

Lincoln

On 4/26/07, Cook, Malcolm < MEC at stowers-institute.org > wrote:

Lincoln, et al,

I find that the gff3_string for Bio::DB::SeqFeature objects retrieved from a Bio::DB::SeqFeature::Store that were initially created with -segments (i.e. whose location was discontiguous) does not display any other attributes in column 9 than "Name".

What do you think of the following patch to Bio::Graphics::FeatureBase, whose effect is to "contrive to return (duplicated) common group values" (which otherwise get lost when "collapsing" "homogenous" parent/child features)?

Another approach would be to copy the attributes from the parent to the children when the -segments are first created.

Another approach would be to use Bio::SeqFeature::Generic as the db's -seqfeature_class and save with -location being a Bio::Location::Split, but this was fraught with other problems.

Any other suggestions?

Do you want me to commit this patch?

Cheers,

Malcolm

Patch follows:

Index: FeatureBase.pm
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v
retrieving revision 1.29
diff -c -r1.29 FeatureBase.pm
*** FeatureBase.pm 16 Apr 2007 19:55:33 -0000 1.29
--- FeatureBase.pm 26 Apr 2007 16:30:23 -0000
***************
*** 581,587 ****
      foreach (@children) {
        s/Parent=/ID=/g;
      } # replace Parent tag with ID
!     return join "\n",@children;
    }

    return join("\n",$p,@children);
--- 581,589 ----
      foreach (@children) {
        s/Parent=/ID=/g;
      } # replace Parent tag with ID
!     #return join "\n",@children;
!     # Instead of above, additionally, contrive to return (duplicated) common group values
!     return(join("$group\n",@children) . $group);
    }

    return join("\n",$p,@children);

--
Lincoln D. 
Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From lstein at cshl.edu Thu May 3 12:01:38 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 3 May 2007 12:01:38 -0400
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> <1178028249.2644.13.camel@localhost.localdomain> <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
Message-ID: <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com>

The width of the image is determined by the -width attribute and is given in pixels. You cannot control the height of the image as it is computed dynamically based on the number of features and bumping options.

Lincoln

On 5/1/07, Shameer Khadar wrote:
>
> Dear Scott,
>
> > There is a fair amount of documentation in the perldoc for
> > Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have
> > you read that?
>
> I agreed, but I couldn't find the exact information I needed :( (may be I missed
> something important).
>
> > Also, for changing the scale, that should happen
> > automatically--have you tried yet?
>
> I tried by changing the Lincoln's program eg: blast3.pl
> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
> to my
> $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300);
>
> But it had given me a smaller scale of length upto 300. I was looking for
> an option where I need same width and height of given image and a dynamic
> start and end values depending on length of my sequence. Since I couldn't
> accomplish, I thought of getting some help from you guys. I think I need
> to play a little bit with the value for reformat the scale to accommodate
> my hits as well.
>
> Thanks a lot for your inputs,
> --
> Shameer Khadar
> Lab (# 25) The Computational Biology Group
> National Centre for Biological Sciences (TIFR)
> GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
> T - 91-080-23666001 EXT - 6251
> W - http://www.ncbs.res.in
>

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From bioperlanand at yahoo.com Thu May 3 16:09:18 2007
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Thu, 3 May 2007 13:09:18 -0700 (PDT)
Subject: [Bioperl-l] a query on Obtaining UniProt sequences
Message-ID: <922386.19570.qm@web36808.mail.mud.yahoo.com>

Hi

I am using Bioperl 1.4 and I am trying to obtain protein sequences for specific UniProt records.

For some records (ROA1_HUMAN), it prints the correct sequence, but it first prints the warning "Use of uninitialized value in substitution (s///) at /usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, line 43."

For other records (BOLA_HAEIN), it prints the correct sequence (without any warnings).

Here is the code:
-------------------------------------------------------------------------------------------
#!/usr/bin/perl -w
use strict;
use Bio::Perl;
use Bio::DB::SwissProt;

my $sp = new Bio::DB::SwissProt;
#my $seq_object = $sp->get_Seq_by_id('ROA1_HUMAN');
my $seq_object = $sp->get_Seq_by_id('BOLA_HAEIN');
my $sequence_as_a_string = $seq_object->seq();
print "$sequence_as_a_string\n";
-------------------------------------------------------------------------------------------

Is there something I need to fix?

Thanks in advance for the help.
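
One way to narrow down which records trigger a warning like this, without losing the sequence, is to trap warnings around the fetch. The sketch below is core Perl only. fetch_sequence is a hypothetical stand-in for the $sp->get_Seq_by_id(...)->seq call (it deliberately runs s/// on an undefined field for one placeholder record), and the record names and sequences are made up. It illustrates the trapping technique only; it says nothing about the actual cause of the warning in swiss.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a code ref and collect any warnings it raises instead of printing them.
# Returns the code ref's (scalar) result and an array ref of warning strings.
sub with_warnings_captured {
    my ($code) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $result = $code->();
    return ($result, \@warnings);
}

# Placeholder records: names, sequences and the 'note' field are invented.
my %records = (
    RECORD_A => { seq => 'AAAAAAAAAA', note => undef },
    RECORD_B => { seq => 'CCCCCCCC',   note => 'some text  ' },
);

# Hypothetical stand-in for the database fetch plus parsing step.
sub fetch_sequence {
    my ($id) = @_;
    my $record = $records{$id} or die "no such record: $id\n";
    my $note = $record->{note};
    $note =~ s/\s+$//;    # warns under 'use warnings' when $note is undef
    return $record->{seq};
}

# Report, per record, how many warnings the fetch produced.
for my $id (sort keys %records) {
    my ($seq, $warnings) = with_warnings_captured(sub { fetch_sequence($id) });
    printf "%s: %d residues, %d warning(s)\n", $id, length($seq), scalar @$warnings;
    print "    $_" for @$warnings;
}
```

Swapping the body of the anonymous sub for the real fetch would show, record by record, whether the sequence still comes back intact when the warning fires.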
Anand

From MEC at stowers-institute.org Thu May 3 16:19:00 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 3 May 2007 15:19:00 -0500
Subject: [Bioperl-l] Handling discontiguous feature locations in Bio::DB::SeqFeature::Store -- proposed patch to Bio::Graphics::FeatureBase
In-Reply-To: <6dce9a0b0705030745u3a1afffew68538f515c6b663b@mail.gmail.com>
References: <6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com> <6dce9a0b0705030745u3a1afffew68538f515c6b663b@mail.gmail.com>
Message-ID:

Lincoln,

Ah, yes, round-tripping GFF, the holy grail....

Unfortunately, I don't really have a baseline to go against for an example that roundtrips successfully now. Do you?

For example, after loading test data:

> bp_seqfeature_load.PLS bioperl-live/t/data/biodbgff/test.gff3

the Contig1 portion of which looks like this:

##gff-version 3
## sequence-region Contig1 1 37450
Contig1 confirmed transcript 1001 2000 42 + . ID=Transcript:trans-1;Gene=abc-1;Gene=xyz-2;Note=function+unknown
Contig1 confirmed exon 1001 1100 . + . ID=Transcript:trans-1
Contig1 confirmed exon 1201 1300 . + . ID=Transcript:trans-1
Contig1 confirmed exon 1401 1450 . + . ID=Transcript:trans-1
Contig1 confirmed CDS 1051 1100 . + 0 ID=Transcript:trans-1
Contig1 confirmed CDS 1201 1300 . + 2 ID=Transcript:trans-1
Contig1 confirmed CDS 1401 1440 . + 0 ID=Transcript:trans-1
Contig1 est similarity 1001 1100 96 . . Target=EST:CEESC13F 1 100 +
Contig1 est similarity 1201 1300 99 . . Target=EST:CEESC13F 101 200 +
Contig1 est similarity 1401 1450 99 . . Target=EST:CEESC13F 201 250 +
Contig1 tc1 transposon 5001 6000 . + . ID=Transposon:c128.1
Contig1 tc1 transposon 8001 9000 . - . ID=Transposon:c128.2
Contig1 confirmed transcript 30001 31000 . - . ID=Transcript:trans-2;Gene=xyz-2;Note=Terribly+interesting
Contig1 confirmed exon 30001 30100 . - . ID=Transcript:trans-2;Gene=abc-1;Note=function+unknown
Contig1 confirmed exon 30701 30800 . - . ID=Transcript:trans-2
Contig1 confirmed exon 30801 31000 . - . ID=Transcript:trans-2

and then generating output with

>bp_seqfeature_gff3.PLS --gff=1 -- seq_id Contig1 # using a script I just committed - I hope you like it. Note: gff=1 => recurse

we get output gff with problems such as:

1 IDs get turned into Aliases
2 the seqid of a Target attribute gets copied into the feature's Name attribute
3 suppression of parents of homogeneous subfeatures doesn't work when the parent has other subfeatures than those with its same type (i.e. the transcript feature also has exon subfeatures)

look:

Contig1 est similarity 1001 1100 96 . . Name=EST:CEESC13F;ID=3;Target=EST:CEESC13F 1 100 +
Contig1 est similarity 1201 1300 99 . . Name=EST:CEESC13F;ID=4;Target=EST:CEESC13F 101 200 +
Contig1 est similarity 1401 1450 99 . . Name=EST:CEESC13F;ID=5;Target=EST:CEESC13F 201 250 +
Contig1 confirmed transcript 1001 2000 42 + . ID=2;Alias=Transcript:trans-1;Gene=abc-1,xyz-2;Note=function+unknown
Contig1 confirmed transcript 1001 2000 42 + . Parent=2;Alias=Transcript:trans-1;Note=function+unknown;Gene=abc-1,xyz-2
Contig1 confirmed exon 1001 1100 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed exon 1201 1300 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed exon 1401 1450 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1051 1100 . + 0 Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1201 1300 . + 2 Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1401 1440 . + 0 Parent=2;Alias=Transcript:trans-1
Contig1 tc1 transposon 5001 6000 . + . ID=6;Alias=Transposon:c128.1
Contig1 tc1 transposon 8001 9000 . - . ID=7;Alias=Transposon:c128.2
Contig1 confirmed transcript 30001 31000 . - . ID=9;Alias=Transcript:trans-2;Gene=xyz-2;Note=Terribly+interesting
Contig1 confirmed transcript 30001 31000 . - . Parent=9;Alias=Transcript:trans-2;Note=Terribly+interesting;Gene=xyz-2
Contig1 confirmed exon 30001 30100 . - . Parent=9;Alias=Transcript:trans-2;Gene=abc-1;Note=function+unknown
Contig1 confirmed exon 30701 30800 . - . Parent=9;Alias=Transcript:trans-2
Contig1 confirmed exon 30801 31000 . - . Parent=9;Alias=Transcript:trans-2
Contig1 . region 1 37450 . . . Name=Contig1;ID=1

with my new version of gff3_string (not yet committed), only the 3rd problem is addressed, generating

bp_seqfeature_gff3.PLS --gff 1 -- seq_id Contig1

Contig1 est similarity 1001 1100 96 . . Name=EST:CEESC13F;ID=3;Target=EST:CEESC13F 1 100 +
Contig1 est similarity 1201 1300 99 . . Name=EST:CEESC13F;ID=4;Target=EST:CEESC13F 101 200 +
Contig1 est similarity 1401 1450 99 . . Name=EST:CEESC13F;ID=5;Target=EST:CEESC13F 201 250 +
Contig1 confirmed transcript 1001 2000 42 + . ID=2;Alias=Transcript:trans-1;Gene=abc-1,xyz-2;Note=function+unknown
Contig1 confirmed exon 1001 1100 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed exon 1201 1300 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed exon 1401 1450 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1051 1100 . + 0 Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1201 1300 . + 2 Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1401 1440 . + 0 Parent=2;Alias=Transcript:trans-1
Contig1 tc1 transposon 5001 6000 . + . ID=6;Alias=Transposon:c128.1
Contig1 tc1 transposon 8001 9000 . - . ID=7;Alias=Transposon:c128.2
Contig1 confirmed transcript 30001 31000 . - . ID=9;Alias=Transcript:trans-2;Gene=xyz-2;Note=Terribly+interesting
Contig1 confirmed exon 30001 30100 . - . Parent=9;Alias=Transcript:trans-2;Gene=abc-1;Note=function+unknown
Contig1 confirmed exon 30701 30800 . - . Parent=9;Alias=Transcript:trans-2
Contig1 confirmed exon 30801 31000 . - . Parent=9;Alias=Transcript:trans-2
Contig1 . region 1 37450 . . . Name=Contig1;ID=1

I had to make another change to get this output though, since I had to change the behaviour to

  # provide special handling to "remove an extraneous level
  # of parentage" (unless $preserveHomegenousParent) for features
  # which have at least one subfeature with the same type as the
  # feature itself (thus redefining Lincoln's "homogenous
  # parent/child" case, which previously required all children to have
  # the same type as parent)

I think you will agree this is the more desirable behaviour.

I would be happy to test any other GFF you suggest might be (more or less) roundtripped.

What think you?

--Malcolm

________________________________

From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
Sent: Thursday, May 03, 2007 9:46 AM
To: Cook, Malcolm
Subject: Re: Handling discontiguous feature locations in Bio::DB::SeqFeature::Store -- proposed patch to Bio::Graphics::FeatureBase

Hi Malcolm,

For me, the major use case is that GFF3 files round-trip correctly through the database. Do any of your use cases cover that?

Lincoln

On 5/2/07, Cook, Malcolm wrote:

Lincoln,

Here for your comment and review is a very reworked version of Bio::Graphics::FeatureBase->gff3_string.

The main difference is that homogenous children get ALL their attributes except for start/stop from the parent, including the group.

I also provide an option as to whether or not to "remove extraneous level of parentage", called $preserveHomegenousParent.

There is an in-line comment and question for you in the code body.

It works well in my hands for my use cases, but I'm not positive it is in the spirit of your intentions.

Cheers,

Malcolm

sub gff3_string {
  my ($self,
      $recurse,
      $preserveHomegenousParent,
      # Note: the following parameters, whose name begins with '$_',
      # are intended for recursive call only.
      $_parent,
      $_self_is_hsf, # is $self the child in a homogeneous parent/child relationship?
      $_hsf_parentgroup, # if so, what is the group (GFF column 9) of the parent
     ) = @_;

  # PURPOSE: Return GFF3 format for the feature $self. Optionally
  # $recurse to include GFF for any subfeatures of the feature. If
  # recursing, provide special handling to "remove an extraneous level
  # of parentage" (unless $preserveHomegenousParent) for features
  # which have subfeatures all of whose types are the same as the
  # feature itself (the "homogenous parent/child" case). This usage is
  # a convention for representing discontiguous features; they may be
  # created by using the -segment directive without specifying a
  # distinct -subtype in to `new` when creating a
  # Bio::Graphics::FeatureBase (i.e. Bio::DB::SeqFeature,
  # Bio::Graphics::Feature). Such homogenous subfeatures created in
  # this fashion DO NOT have the parent (GFF column 9) attributes
  # propagated to them; so, since they are all part of the same
  # parent, the ONLY difference relevant to GFF production SHOULD be
  # the $start and $end coordinates for their segment, and ALL THEIR
  # OTHER ATTRIBUTES should be copied down from the parent (including:
  # strand, score, Name, ID, Parent, etc).

  my $hparentORself = $_self_is_hsf ? $_parent : $self;
  # $self's parent, if it is a homogenous child, otherwise $self.

  if ($recurse && (my @ssf = $self->sub_SeqFeature)) {
    my $homogenous = ! grep {$_->type ne $self->type} @ssf;
    # will be TRUE only if all subfeatures are the same type as $self.
    my $mygroup =
      # compute $self's group if it is needed to be passed down to
      # subfeatures, unless it is already being passed down (in which
      # case there are (at least) 3 levels of homogenous parent child
      # (will this ever happen in practice???))
      ! $homogenous ? '' :
      $_self_is_hsf ? $_hsf_parentgroup :
      $self->format_attributes($_parent);
    return (join("\n",
                 (($preserveHomegenousParent ? ($self->gff3_string(0)) : ()),
                  map {$_->gff3_string($recurse,$preserveHomegenousParent,$hparentORself,$homogenous,$mygroup)} @ssf)));
  } else {
    my $name = $hparentORself->name;
    my $class = $hparentORself->class;
    my $group = $_self_is_hsf ? $_hsf_parentgroup : $self->format_attributes($_parent);
    my $strand = ('-','.','+')[$self->strand+1];
    # TODO: understand conditions under which this could be other than
    # hparentORself->strand. In particular, why does add_segment flip
    # the strand when start > stop? I thought this was not allowed!
    # Lincoln - any ideas?
    my $p = join("\t",
                 $hparentORself->ref||'.',$hparentORself->source||'.',$hparentORself->method||'.',
                 $self->start||'.',$self->stop||'.',
                 defined($hparentORself->score) ? $hparentORself->score : '.',
                 $strand||'.',
                 defined($hparentORself->phase) ? $hparentORself->phase : '.',
                 $group||'');
  }
}

________________________________

From: Cook, Malcolm
Sent: Friday, April 27, 2007 1:45 PM
To: 'lincoln.stein at gmail.com'
Cc: 'lstein at cshl.org'; 'bioperl list'
Subject: RE: Handling discontiguous feature locations in Bio::DB::SeqFeature::Store -- proposed patch to Bio::Graphics::FeatureBase

Hi Lincoln,

Cool.

The principle of what I figured out I still think holds, but the implementation is slightly broken. Improved patch forthcoming next week.

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri

________________________________

From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
Sent: Friday, April 27, 2007 12:45 PM
To: Cook, Malcolm
Cc: lstein at cshl.org; bioperl list
Subject: Re: Handling discontiguous feature locations in Bio::DB::SeqFeature::Store -- proposed patch to Bio::Graphics::FeatureBase

Hi Malcolm,

This is absolutely ok and you can go ahead and commit. Thanks for figuring this out!
Lincoln

On 4/26/07, Cook, Malcolm < MEC at stowers-institute.org > wrote:

Lincoln, et al,

I find that the gff3_string for Bio::DB::SeqFeature objects retrieved from a Bio::DB::SeqFeature::Store that were initially created with -segments (i.e. whose location was discontiguous) does not display any other attributes in column 9 than "Name".

What do you think of the following patch to Bio::Graphics::FeatureBase, whose effect is to "contrive to return (duplicated) common group values" (which otherwise get lost when "collapsing" "homogenous" parent/child features)?

Another approach would be to copy the attributes from the parent to the children when the -segments are first created.

Another approach would be to use Bio::SeqFeature::Generic as the db's -seqfeature_class and save with -location being a Bio::Location::Split, but this was fraught with other problems.

Any other suggestions? Do you want me to commit this patch?

Cheers,

Malcolm

Patch follows:

Index: FeatureBase.pm
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v
retrieving revision 1.29
diff -c -r1.29 FeatureBase.pm
*** FeatureBase.pm 16 Apr 2007 19:55:33 -0000 1.29
--- FeatureBase.pm 26 Apr 2007 16:30:23 -0000
***************
*** 581,587 ****
      foreach (@children) {
        s/Parent=/ID=/g;
      } # replace Parent tag with ID
!     return join "\n",@children;
    }

    return join("\n",$p,@children);
--- 581,589 ----
      foreach (@children) {
        s/Parent=/ID=/g;
      } # replace Parent tag with ID
!     #return join "\n",@children;
!     # Instead of above, additionally, contrive to return (duplicated) common group values
!     return(join("$group\n",@children) . $group);
    }

    return join("\n",$p,@children);

From cjfields at uiuc.edu Thu May 3 16:57:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 3 May 2007 15:57:43 -0500
Subject: [Bioperl-l] a query on Obtaining UniProt sequences
In-Reply-To: <922386.19570.qm@web36808.mail.mud.yahoo.com>
References: <922386.19570.qm@web36808.mail.mud.yahoo.com>
Message-ID: <2930F3F1-2BFB-4320-9A2C-50DFE6F808A1@uiuc.edu>

I would update to BioPerl 1.5.2. v.1.4 is 3 yrs old and there have been tons of changes both for sequence retrieval and parsers. We can't predict when a new 'stable' release will be available but 1.5.2 works well for most purposes.

chris

On May 3, 2007, at 3:09 PM, Anand Venkatraman wrote:

> Hi
>
> I am using Bioperl 1.4 and I am trying to obtain protein sequences
> for specific Uniprot records.
> ...
> Is there something I need to fix.
>
> Thanks in advance for the help.
>
> Anand

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From thiago.venancio at gmail.com Thu May 3 17:12:35 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Thu, 3 May 2007 18:12:35 -0300
Subject: [Bioperl-l] extracting coding sequence from BLAST
In-Reply-To: <54F53FA0-4ED6-4DE8-A853-750AE5930FC2@bioperl.org>
References: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com> <8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org> <44255ea80704131320t79bc5c64kc519c5c90ebe4ed@mail.gmail.com> <54F53FA0-4ED6-4DE8-A853-750AE5930FC2@bioperl.org>
Message-ID: <44255ea80705031412n7abef247je70d2681bb3cc7ed@mail.gmail.com>

Hi all,

Just for the record. I am getting good results extracting CDS from protein x DNA alignments by using the following procedure:

- BLASTX to identify the hits for each DNA sequence (if you want to process sequences for further multiple sequence alignment, it is important to record the frames);
- fastx/y to refine the alignment between the protein and the DNA.

FASTX/Y is quite good, because it performs well with frame shifts and allows better identification of premature stop codons. In addition, the alignment (and the CDS prediction) is better. This is interesting to note, to avoid analysis of "phantom" mRNAs, which are sequences that have stops, so merely looking at the BLAST output can sometimes give misleading results.

Best.

Thiago

On 4/13/07, Jason Stajich wrote:
>
> Hi -
> There are some tools that do this for you -- I've listed a few from a
> google search or from what I remember reading. It would be great if you
> (and others!) are willing to contribute a little of the info of what you
> find that works for you to the wiki, that would be great as well. A little
> HOWTO would be cool - here or on openwetware.org.
>
> Prot4EST http://zeldia.cap.ed.ac.uk/bioinformatics/prot4EST/index.shtml
> EST-PAC: doi: http://dx.doi.org/10.1186/1751-0473-1-2
>
> Ewan Birney's estwise as part of wise package also can help if you have a
> likely protein from BLAST you want to align to the est - estwise can handle
> frameshifts, but can be too slow for some people. Exonerate's protein2dna
> model may also work here, but I haven't tried it.
>
> -jason
> On Apr 13, 2007, at 1:20 PM, Thiago Venancio wrote:
>
> Thanks Jason.
>
> I have a large dataset (assembled ESTs) and several BLASTX or TBLASTX
> comparisons and want to extract some translated coding regions for further
> multiple alignment and phylogenetic analysis.
>
> Best.
>
> Thiago
>
> On 4/13/07, Jason Stajich wrote:
> >
> > Depends on how far away the query protein is, but I don't trust BLAST for
> > the actual alignment. Find the boundaries, add a little slop, and refine
> > the alignment of protein to genome with a good alignment program designed
> > to like genewise or exonerate or even FASTX/Y.
> > -jason
> > On Apr 13, 2007, at 12:05 PM, Thiago Venancio wrote:
> >
> > Hi all.
> >
> > What is the best way to extract coding region from a nucleotide sequence
> > based on a BLASTX or TBLASTX comparisons ?
> >
> > Thanks in advance.
> >
> > Thiago
> >
> > --
> > Jason Stajich
> > jason at bioperl.org
> > http://jason.open-bio.org/
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>

--
"The way to get started is to quit talking and begin doing."
Walt Disney
========================
Thiago Motta Venancio, MSc
PhD student in Bioinformatics
University of Sao Paulo
========================

From lstein at cshl.edu Thu May 3 17:35:57 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 3 May 2007 17:35:57 -0400
Subject: [Bioperl-l] CSHL is hiring
Message-ID: <6dce9a0b0705031435r3bc2d2ddlfca5ac02844b4ef0@mail.gmail.com>

Hi Folks,

Sorry for the spam. My group at CSHL is looking for a scientific programmer with good software development credentials and some experience in bioinformatics. Experience in object-oriented Perl programming is a strict requirement. This is to work on user interface development for several projects including:

- BioMart (data warehouse) project (www.biomart.org)
- GBrowse genome browser (www.gmod.org/GBrowse)
- Reactome pathways database (www.reactome.org)

I can offer salaries in the 60-80K range, depending on level of experience.

Please reply to lstein at cshl.edu.

Best,

Lincoln

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From MEC at stowers-institute.org Tue May 8 12:59:10 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 8 May 2007 11:59:10 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates??
Message-ID:

Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates,

as in:
   ($start,$stop) = ($stop,$start) if defined($start) && defined($stop)
   && $start > $stop;

I thought it is not legal for a feature to be so composed.

Anyone know?
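
For anyone wanting to try the quoted condition in isolation, here is a minimal standalone sketch. It is not code from Bio::DB::GFF: the helper name normalize_range is hypothetical, and it copies only the one quoted statement, to show that a reversed pair comes back ordered while an already ordered pair is left alone. What the real method does with the strand is not covered here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): mirrors the condition quoted
# from gff3_string, swapping the two coordinates only when both are
# defined and start is greater than stop.
sub normalize_range {
    my ($start, $stop) = @_;
    ($start, $stop) = ($stop, $start)
        if defined($start) && defined($stop) && $start > $stop;
    return ($start, $stop);
}

my @swapped   = normalize_range(2000, 1001);   # reversed input
my @unchanged = normalize_range(1001, 2000);   # already ordered input

print "@swapped\n";     # prints "1001 2000"
print "@unchanged\n";   # prints "1001 2000"
```

The sketch deliberately passes undefined values through untouched, as the quoted condition does.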
Cheers,

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri

From cjfields at uiuc.edu Tue May 8 13:12:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 8 May 2007 12:12:45 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates??
In-Reply-To:
References:
Message-ID: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>

I believe all seqfeature location coordinates are designed to have start < stop for consistency; in cases where the strand matters (CDS, gene, etc.) then the strand is set to 1 or -1. When start > stop, the two are reversed and the strand is flipped; at least that's the way locations are set up in BioPerl.

chris

On May 8, 2007, at 11:59 AM, Cook, Malcolm wrote:

> Why does Bio::DB::GFF::Feature::gff3_string swap start and stop
> coordinates,
>
> as in:
>    ($start,$stop) = ($stop,$start) if defined($start) && defined($stop)
>    && $start > $stop;
>
> I thought it is not legal for a feature to be so composed.
>
> Anyone know?
>
> Cheers,
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From juheymann at yahoo.com Tue May 8 14:37:20 2007
From: juheymann at yahoo.com (Bohr)
Date: Tue, 8 May 2007 11:37:20 -0700 (PDT)
Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#");
Message-ID: <10381379.post@talk.nabble.com>

Hi,

I installed bioperl under OSX Tiger via Fink. I tested the installation using the test tutorial via: perl -w bptutorial.pl 5

The script failed, indicating that the file to retrieve was missing. To identify the problem, I used a script using 'get_sequence' that will retrieve a file from 'genbank' or 'embl'. Both succeeded. If I replace it with 'swiss' or 'swissprot' and substitute the ID with the identical ID as in the tutorial, I recreate the problem found with bptutorial.pl. Other IDs do the same.

Any pointers on the origin of this finding would be greatly appreciated.
--
View this message in context: http://www.nabble.com/problem-with-Bioperl-get_sequence-%28%27swiss%27%2C-%22acc-%22%29--tf3711391.html#a10381379
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From cjfields at uiuc.edu Tue May 8 17:53:04 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 8 May 2007 16:53:04 -0500
Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#");
In-Reply-To: <10381379.post@talk.nabble.com>
References: <10381379.post@talk.nabble.com>
Message-ID: <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>

The Fink BioPerl distribution is 1.5.1. You'll need to update to v 1.5.2 due to changes on the various remote servers (NCBI, UniProt, etc) accessed via bioperl.

As a note, the bptutorial.pl has been moved to the bioperl wiki:

http://www.bioperl.org/wiki/Bptutorial

chris

On May 8, 2007, at 1:37 PM, Bohr wrote:

>
> Hi,
>
> I installed bioperl under OSX Tiger via Fink. I tested the
> installation
> using the test tutorial via: perl -w bptutorial.pl 5
>
> The script failed indicating that the file to retrieve was missing. To
> identify the problem, I used a script using 'get_sequence' that will
> retrieve a file from 'genbank' or 'embl'. Both succeeded. If I
> replace it
> with 'swiss' or 'swissprot' and substitute the ID with the
> identical ID as
> in the tutorial, I am recreating the problem found with
> bptutorial.pl. Other
> ID's do the same.
>
> Any pointers on the origin of this finding would be greatly
> appreciated.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From juheymann at yahoo.com Wed May 9 18:17:27 2007
From: juheymann at yahoo.com (Bohr)
Date: Wed, 9 May 2007 15:17:27 -0700 (PDT)
Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#");
In-Reply-To: <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>
References: <10381379.post@talk.nabble.com> <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>
Message-ID: <10403903.post@talk.nabble.com>

Thank you for the feedback and the suggestion.

I installed 1.5.2 via Build.pl and the results were the same, i.e. embl and genbank worked fine, swissprot failed.

Here is the output:

MSG: acc (CALX_YEAST) does not exist
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Did not provide a valid Bio::PrimarySeqI object
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
STACK: Bio::SeqIO::fasta::write_seq /sw/lib/perl5/5.8.6/Bio/SeqIO/fasta.pm:181

Before contemplating too much, here is my question: how do I verify the update to 1.5.2? (I ran ./Build test and that came back positive.) And what else could have gone wrong here? What might be a clever way to troubleshoot this?
---------------------------------------------------------------------------

Chris Fields wrote:
>
> The Fink BioPerl distribution is 1.5.1. You'll need to update to v
> 1.5.2 due to changes on the various remote servers (NCBI, UniProt,
> etc) accessed via bioperl.
>
> As a note, the bptutorial.pl has been moved to the bioperl wiki:
>
> http://www.bioperl.org/wiki/Bptutorial
>
> chris
>

From ursula_cox at btinternet.com Wed May 9 18:12:26 2007
From: ursula_cox at btinternet.com (Ursula at BT)
Date: Wed, 9 May 2007 23:12:26 +0100
Subject: [Bioperl-l] Getting a Subset of an Existing EnzymeCollection
Message-ID: <002e01c79287$20bbfe60$4101a8c0@AMDDualCore>

Dear BioPerl List,

I'm new to BioPerl (and Perl for that matter). I have an array of enzyme names, and a larger collection of enzymes (guaranteed to be a superset by the way it's constructed).
I need to make a new collection containing just the enzymes corresponding to the names I have in the array.

I was hoping that something like:

my $all_rebase = Bio::Restriction::IO->new(-file=>'bionet.704',-format=>'bionet');

my $all_rebase_collection = $all_rebase->read();

my @enzymes = ('AasI','AatI','AccII','AatII','AauI','Acc113I','Acc16I','Acc65I','AccB1I','AccB7I','AccI');

my $new_collection = Bio::Restriction::EnzymeCollection(-empty => 1);

foreach $enzyme (all_rebase_collection)
{
    $new_collection($enzyme) if grep $_ eq $enzyme->name, @enzymes;
}

would work, but I get a syntax error near "$new_collection(".

Any clues much appreciated,

Ursula Cox

From juheymann at yahoo.com Wed May 9 18:38:42 2007
From: juheymann at yahoo.com (Bohr)
Date: Wed, 9 May 2007 15:38:42 -0700 (PDT)
Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#");
In-Reply-To: <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>
References: <10381379.post@talk.nabble.com> <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>
Message-ID: <10404211.post@talk.nabble.com>

Thank you for pointing that out! I installed 1.5.2 via Build.pl. The scripts work as expected now.

Chris Fields wrote:
>
> The Fink BioPerl distribution is 1.5.1. You'll need to update to v
> 1.5.2 due to changes on the various remote servers (NCBI, UniProt,
> etc) accessed via bioperl.
>
> As a note, the bptutorial.pl has been moved to the bioperl wiki:
>
> http://www.bioperl.org/wiki/Bptutorial
>
> chris
>

From cjfields at uiuc.edu Wed May 9 19:37:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 9 May 2007 18:37:33 -0500
Subject: [Bioperl-l] Getting a Subset of an Existing EnzymeCollection
In-Reply-To: <002e01c79287$20bbfe60$4101a8c0@AMDDualCore>
References: <002e01c79287$20bbfe60$4101a8c0@AMDDualCore>
Message-ID:

On May 9, 2007, at 5:12 PM, Ursula at BT wrote:

> Dear BioPerl List,
>
> I'm new to BioPerl (and Perl for that matter). I have an array of
> enzyme
> names, and a larger collection of enzymes (guaranteed to be a
> superset by
> the way it's constructed). I need to make a new collection
> containing just
> the enzymes corresponding to the names I have in the array.

First, prior to using BioPerl you should really brush up on perl itself (Learning Perl, or James Tisdall's Perl for Bioinformatics books, the former preferred). Though there are several scripts available to get you started with Bioperl, much of the code is written with the expectation that you can write and debug a basic perl script (and there is some expectation that you are somewhat familiar with OO Perl).

Saying that, let's see what's wrong...

> I was hoping that something like:
>
> my $all_rebase =
> Bio::Restriction::IO->new(-file=>'bionet.704',-format=>'bionet');
>
> my $all_rebase_collection = $all_rebase->read();

The 'bionet' format is not supported; only 'withrefm', 'itype2', 'bairoch' are (the latter only experimentally). See 'perldoc Bio::Restriction::IO'.

> my @enzymes =
> ('AasI','AatI','AccII','AatII','AauI','Acc113I','Acc16I','Acc65I','AccB1I','AccB7I','AccI');
>
> my $new_collection = Bio::Restriction::EnzymeCollection(-empty => 1);

Missing a new() constructor here.

> foreach $enzyme (all_rebase_collection)

Not sure what this is. No '$' sigil for $all_rebase_collection will make the compiler look for (and fail to find) the sub all_rebase_collection().

> {
>
> $new_collection($enzyme) if grep $_ eq $enzyme->name,
> @enzymes;
>
> }
>
> would work, but I get a syntax error near "$new_collection(".

Yep. You don't have your grep sub block in brackets {}, hence the error. See 'perldoc -f grep'.

> Any clues much appreciated,
>
> Ursula Cox

No prob, but again you might want to brush up on perl.

chris

From darin.london at duke.edu Thu May 10 12:17:38 2007
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Thu, 10 May 2007 12:17:38 -0400
Subject: [Bioperl-l] BOSC 2007 Second Call For Papers
Message-ID: <200705101617.l4AGHceI002463@tenero.duhs.duke.edu>

The BOSC Organizing Committee are proud to announce BOSC 2007, occurring in Vienna, Austria on July 19th, 20th.
The conference this year promises to be exciting, as the BOSC developers attempt to define and solve currently intractable problems in Bioinformatics. Please refer to the following website for complete information, and requests for submissions. Thank you, and we hope to see you in Vienna. http://open-bio.org/wiki/BOSC_2007 The BOSC organizing Committee Please pass this email on to anyone that would be interested. From lstein at cshl.edu Thu May 10 13:13:09 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 10 May 2007 13:12:09 -0401 Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates?? In-Reply-To: References: Message-ID: <6dce9a0b0705101013w1923c173l5ec5d9288c67c9a2@mail.gmail.com> It's a workaround for some broken data sources. It should "never happen." Lincoln On 5/8/07, Cook, Malcolm wrote: > > Why does Bio::DB::GFF::Feature::gff3_string swap start and stop > coordinates, > > as in: > ($start,$stop) = ($stop,$start) if defined($start) && defined($stop) > && $start > $stop; > > I thought it is not legal for a feature to be so composed. > > Anyone know? > > Cheers, > > Malcolm Cook > Stowers Institute for Medical Research - Kansas City, Missouri > > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From Bank.Beszteri at awi.de Thu May 10 12:13:00 2007 From: Bank.Beszteri at awi.de (Bank Beszteri) Date: Thu, 10 May 2007 18:13:00 +0200 Subject: [Bioperl-l] Bio::Tree::Tree -- rerooting & bootstrap problem Message-ID: <4643448C.4000807@awi.de> Dear Bioperl folks, I?m trying to use Bio::Tree::Tree for manipulating phylogenetic trees, but in some things it did not behave as I expected it to, so I had to look inside a bit. In particular, I had problems with mixed up bootstrap values after re-rooting. 
After looking into the Bio::Tree::Tree data structures, it seems that

a) bootstrap values are stored as attributes of nodes of the tree [to my
understanding, they should rather be attributes of branches but
Bio::Tree::Tree apparently tries to simplify away branches]; each node
stores the bootstrap value belonging to the branch that connects it to
its ancestor node (I'm reading in trees from Newick strings, and
bootstrap values arrive in the id fields of internal branches)

b) when re-rooting a tree, bootstrap values stay with the same node
where they were before. Because the node that used to be the ancestor of
a particular node in the original tree might have become its descendant
after re-rooting, the bootstrap values are being mixed up.

Can you confirm my conclusion? Whether yes or no, have you got an easy
workaround or alternative solution to re-rooting trees (without having
to touch the reroot method) or any other hints that could be useful for
me to deal with this issue?

Cheers,

Bank

--
Dr. Bánk Beszteri
Alfred Wegener Institute for Polar and Marine Research

From dmessina at wustl.edu Thu May 10 16:16:48 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 10 May 2007 15:16:48 -0500
Subject: [Bioperl-l] Cross_match parser and Search::Result object
Message-ID: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>

Hi everyone,

Shin Leong here at the Wash U GSC has written SearchIO-compliant
cross_match parsing and result modules. Specifically,
Bio::SearchIO::cross_match and Bio::Search::Result::CrossMatchResult.

To my knowledge this functionality doesn't exist in BioPerl. Any
comments or objections before I commit these to CVS?

Thanks,
Dave

--
Dave Messina
Senior Analyst, Assembly Group
Genome Sequencing Center
Washington University
St.
Louis, MO From aperezp at uma.es Thu May 10 13:58:32 2007 From: aperezp at uma.es (=?ISO-8859-1?Q?=22Antonio_J=2E_P=E9rez=22?=) Date: Thu, 10 May 2007 19:58:32 +0200 Subject: [Bioperl-l] Get Swiss Entry Message-ID: <46435D48.4020309@uma.es> An HTML attachment was scrubbed... URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070510/ca4e893e/attachment.html From jason at bioperl.org Thu May 10 16:53:28 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 10 May 2007 13:53:28 -0700 Subject: [Bioperl-l] Cross_match parser and Search::Result object In-Reply-To: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu> References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu> Message-ID: Awesome! On May 10, 2007, at 1:16 PM, David Messina wrote: > Hi everyone, > > Shin Leong here at the Wash U GSC has written SearchIO-compliant > cross_match parsing and result modules. Specifically, > Bio::SearchIO::cross_match and Bio::Search::Result::CrossMatchResult. > > To my knowledge this functionality doesn't exist in BioPerl. Any > comments or objections before I commit these to CVS? > > Thanks, > Dave > > > -- > Dave Messina > Senior Analyst, Assembly Group > Genome Sequencing Center > Washington University > St. Louis, MO > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070510/b841b428/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 2613 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070510/b841b428/attachment.bin From cjfields at uiuc.edu Fri May 11 00:55:05 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 10 May 2007 23:55:05 -0500 Subject: [Bioperl-l] Cross_match parser and Search::Result object In-Reply-To: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu> References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu> Message-ID: <1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu> Sounds good to me! Any tests to be added? chris On May 10, 2007, at 3:16 PM, David Messina wrote: > Hi everyone, > > Shin Leong here at the Wash U GSC has written SearchIO-compliant > cross_match parsing and result modules. Specifically, > Bio::SearchIO::cross_match and Bio::Search::Result::CrossMatchResult. > > To my knowledge this functionality doesn't exist in BioPerl. Any > comments or objections before I commit these to CVS? > > Thanks, > Dave > > > -- > Dave Messina > Senior Analyst, Assembly Group > Genome Sequencing Center > Washington University > St. Louis, MO > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From dmessina at wustl.edu Fri May 11 01:42:53 2007 From: dmessina at wustl.edu (David Messina) Date: Fri, 11 May 2007 00:42:53 -0500 Subject: [Bioperl-l] Cross_match parser and Search::Result object In-Reply-To: <1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu> References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu> <1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu> Message-ID: <9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu> > Sounds good to me! Any tests to be added? No tests right now as far as I can tell. 
I'm swamped personally, but perhaps I can persuade Mark Johnson over here to crank out a few. From cjfields at uiuc.edu Fri May 11 11:25:34 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 11 May 2007 10:25:34 -0500 Subject: [Bioperl-l] Cross_match parser and Search::Result object In-Reply-To: <57045.10.0.1.216.1178896496.squirrel@gscmail.wustl.edu> References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu> <1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu> <9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu> <57045.10.0.1.216.1178896496.squirrel@gscmail.wustl.edu> Message-ID: Thanks Mark! I don't think you'll need to add a ton of tests; just enough to demo anything that you feel is necessary or specific to the parser. These could go into SearchIO.t or their own test suite. chris On May 11, 2007, at 10:14 AM, Mark Johnson wrote: >>> Sounds good to me! Any tests to be added? >> >> No tests right now as far as I can tell. I'm swamped personally, but >> perhaps I can persuade Mark Johnson over here to crank out a few. > > I'll see what I can do. I just had to open my mouth about getting > this > contributed back after I noticed it, so I suppose this is appropriate > retribution. 8) > > From mjohnson at watson.wustl.edu Fri May 11 11:14:56 2007 From: mjohnson at watson.wustl.edu (Mark Johnson) Date: Fri, 11 May 2007 10:14:56 -0500 (CDT) Subject: [Bioperl-l] Cross_match parser and Search::Result object In-Reply-To: <9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu> References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu> <1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu> <9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu> Message-ID: <57045.10.0.1.216.1178896496.squirrel@gscmail.wustl.edu> >> Sounds good to me! Any tests to be added? > > No tests right now as far as I can tell. I'm swamped personally, but > perhaps I can persuade Mark Johnson over here to crank out a few. I'll see what I can do. 
I just had to open my mouth about getting this contributed back after I noticed it, so I suppose this is appropriate retribution. 8) From golharam at umdnj.edu Fri May 11 16:20:41 2007 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 11 May 2007 16:20:41 -0400 Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up after itself Message-ID: <000501c79409$d8c03480$f6028a0a@PICO> I'm running a large series of clustalw alignments. After a large number of alignments, my perl script would die indicating too many links were open. I checked my /tmp directory (while the script is running) and noticed that the temp directory created for ClustalW are not removed until after the script exists. How can I force the cleanup of these directories after I am done with the alignment? My code is essentially this; $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new(); $aa_aln = $aln_factory->align(\@aa_seqs); open(STDOUT, ">&OLDOUT"); $dna_aln = &aa_to_dna_aln($aa_aln, \%dna_seqs); Ryan From jason at bioperl.org Fri May 11 16:53:19 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 11 May 2007 13:53:19 -0700 Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up after itself In-Reply-To: <000501c79409$d8c03480$f6028a0a@PICO> References: <000501c79409$d8c03480$f6028a0a@PICO> Message-ID: Did you try adding this after your calls getting the CDS aln. $aln_factory->cleanup(); -jason On May 11, 2007, at 1:20 PM, Ryan Golhar wrote: > I'm running a large series of clustalw alignments. After a large > number of > alignments, my perl script would die indicating too many links were > open. I > checked my /tmp directory (while the script is running) and noticed > that the > temp directory created for ClustalW are not removed until after the > script > exists. > How can I force the cleanup of these directories after I am done > with the > alignment? 
> > My code is essentially this; > > $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new(); > $aa_aln = $aln_factory->align(\@aa_seqs); > open(STDOUT, ">&OLDOUT"); > $dna_aln = &aa_to_dna_aln($aa_aln, \%dna_seqs); > > > Ryan > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From cjfields at uiuc.edu Fri May 11 16:57:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 11 May 2007 15:57:23 -0500 Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up after itself In-Reply-To: <000501c79409$d8c03480$f6028a0a@PICO> References: <000501c79409$d8c03480$f6028a0a@PICO> Message-ID: <41E91E58-48A5-4E29-B6BA-E9417BF17513@uiuc.edu> cleanup() is supposed to clean up temp directory stuff; it's inherited from Bio::Tools::Run::WrapperBase. chris On May 11, 2007, at 3:20 PM, Ryan Golhar wrote: > I'm running a large series of clustalw alignments. After a large > number of > alignments, my perl script would die indicating too many links were > open. I > checked my /tmp directory (while the script is running) and noticed > that the > temp directory created for ClustalW are not removed until after the > script > exists. > How can I force the cleanup of these directories after I am done > with the > alignment? > > My code is essentially this; > > $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new(); > $aa_aln = $aln_factory->align(\@aa_seqs); > open(STDOUT, ">&OLDOUT"); > $dna_aln = &aa_to_dna_aln($aa_aln, \%dna_seqs); > > > Ryan > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From golharam at umdnj.edu Fri May 11 18:11:47 2007 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 11 May 2007 18:11:47 -0400 Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up after itself In-Reply-To: Message-ID: <001301c79419$5e794e90$f6028a0a@PICO> No, I didn't, but I will now. Thanks. Interestingly enough ClustalW removes the files from within the temp directory, but not the temp directory itself. -----Original Message----- From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason Stajich Sent: Friday, May 11, 2007 4:53 PM To: golharam at umdnj.edu Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up after itself Did you try adding this after your calls getting the CDS aln. $aln_factory->cleanup(); -jason On May 11, 2007, at 1:20 PM, Ryan Golhar wrote: I'm running a large series of clustalw alignments. After a large number of alignments, my perl script would die indicating too many links were open. I checked my /tmp directory (while the script is running) and noticed that the temp directory created for ClustalW are not removed until after the script exists. How can I force the cleanup of these directories after I am done with the alignment? 
My code is essentially this; $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new(); $aa_aln = $aln_factory->align(\@aa_seqs); open(STDOUT, ">&OLDOUT"); $dna_aln = &aa_to_dna_aln($aa_aln, \%dna_seqs); Ryan _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From goshng at gmail.com Sat May 12 11:21:59 2007 From: goshng at gmail.com (Sang Chul Choi) Date: Sat, 12 May 2007 11:21:59 -0400 Subject: [Bioperl-l] How can I change only one letter of Bio::Seq object without making another object? Message-ID: <33f36270705120821g15c53932wad96d8627ef5b5b7@mail.gmail.com> Hi, One Bio::Seq's sequence is "ACGT" and I want this object to have "ACGA" by changing the fouth letter from T to A. I thought I could do this by reading sequence string through the method of seq(), changing the string by perl's general function, and generating another Bio::Seq object with the new string. This seems to be silly, a little bit. Is there any simple way to do this? Or, is there any method of Bio::Seq to do this: to change one letter at a particular position, or additionally to change letters with some range? Thank you, Sang Chul From jason at bioperl.org Sat May 12 12:50:10 2007 From: jason at bioperl.org (Jason Stajich) Date: Sat, 12 May 2007 09:50:10 -0700 Subject: [Bioperl-l] How can I change only one letter of Bio::Seq object without making another object? In-Reply-To: <33f36270705120821g15c53932wad96d8627ef5b5b7@mail.gmail.com> References: <33f36270705120821g15c53932wad96d8627ef5b5b7@mail.gmail.com> Message-ID: <22C99635-C22D-4F51-AADD-5CCF595222DF@bioperl.org> You can get/set the seq data via the seq() method. 

use Bio::Seq;

my $seq = Bio::Seq->new(-seq => 'ACGT');
my $str = $seq->seq;          # copy the sequence string out of the object
print $str, "\n";
substr($str,3,1,'A');         # offsets are 0-based: replace the 4th letter
$seq->seq($str);              # write the edited string back into the same object
print $seq->seq, "\n";

On May 12, 2007, at 8:21 AM, Sang Chul Choi wrote:

> Hi,
>
> One Bio::Seq's sequence is "ACGT" and I want this object to have
> "ACGA" by changing the fourth letter from T to A. I thought I could do
> this by reading sequence string through the method of seq(), changing
> the string by perl's general function, and generating another Bio::Seq
> object with the new string. This seems to be silly, a little bit.
>
> Is there any simple way to do this? Or, is there any method of
> Bio::Seq to do this: to change one letter at a particular position, or
> additionally to change letters with some range?
>
> Thank you,
>
> Sang Chul
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/

From jason at bioperl.org Sat May 12 18:12:56 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 12 May 2007 15:12:56 -0700
Subject: [Bioperl-l] Bio::Tree::Tree -- rerooting & bootstrap problem
In-Reply-To: <4643448C.4000807@awi.de>
References: <4643448C.4000807@awi.de>
Message-ID: <1369AFDC-2082-4021-8603-55E8ED032D41@bioperl.org>

On May 10, 2007, at 9:13 AM, Bank Beszteri wrote:

> Dear Bioperl folks,
>
> I'm trying to use Bio::Tree::Tree for manipulating phylogenetic trees,
> but in some things it did not behave as I expected it to, so I had to
> look inside a bit.
> In particular, I had problems with mixed up bootstrap values after
> re-rooting.
> After looking into the Bio::Tree::Tree data structures, it
> seems that
>
> a) bootstrap values are stored as attributes of nodes of the tree
> [to my
> understanding, they should rather be attributes of branches but
> Bio::Tree::Tree apparently tries to simplify away branches]; each node
> stores the bootstrap value belonging to the branch that connects it to
> its ancestor node (I'm reading in trees from Newick strings, and
> bootstrap values arrive in the id fields of internal branches)

Please feel free to suggest an alternative implementation if you don't
agree with the object model. It has worked quite well in our hands so
I'd be all ears for someone wanting to get in and do some more work on
it.

We have answered the question as to why bootstrap values are internal
ids many times on this list and I believe on the wiki -- the parser
can't tell the difference between a node id and a bootstrap value
because nexus uses the same slot for both. If you know you have
bootstrap values in the internal nodes it is trivial to process your
tree and copy the values over.

for my $node ( grep { ! $_->is_Leaf } $tree->get_nodes ) {
    $node->bootstrap($node->id);
    $node->id('');
}

I just added this as a method to TreeFunctionsI so that it can be easily
called now to help satisfy everyone who hopes that the toolkit can guess
whether the internal nodes are bootstraps or identifiers.

>
> b) when re-rooting a tree, bootstrap values stay with the same node
> where they were before. Because the node that used to be the
> ancestor of
> a particular node in the original tree might have become its
> descendant
> after re-rooting, the bootstrap values are being mixed up.
>
> Can you confirm my conclusion? Whether yes or no, have you got an easy
> workaround or alternative solution to re-rooting trees (without having
> to touch the reroot method) or any other hints that could be useful
> for
> me to deal with this issue?
>

I think you are right, but it is not clear to me what the value for the
internal node attached to the root should be now.

Note that it is always helpful to provide example code illustrating your
problem. Here is an example which I think illustrates your problem.

use Bio::TreeIO;

my $in = Bio::TreeIO->new(-format => 'newick',
                          -fh     => \*DATA);
my $out = Bio::TreeIO->new(-format => 'newick');
while( my $t = $in->next_tree ){
    # look up the leaf to reroot on by its id
    my ($node_a) = $t->find_node(-id =>"A");
    $out->write_tree($t);    # tree as read
    $t->reroot($node_a);
    $out->write_tree($t);    # tree after rerooting
}
__DATA__
(((A:5,B:5)90:2,C:4)25:3,D:10);

> Cheers,
>
> Bank
>
>
>
> --
> Dr. Bánk Beszteri
> Alfred Wegener Institute for Polar and Marine Research
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/

From darin.london at duke.edu Mon May 14 10:44:56 2007
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Mon, 14 May 2007 10:44:56 -0400
Subject: [Bioperl-l] BOSC 2007 Abstract Submission Deadline Extended
Message-ID: <200705141444.l4EEium2026969@tenero.duhs.duke.edu>

Due to technical difficulties in sending out the 2nd call for papers,
the BOSC organizers are extending the deadline for abstract submissions
to Monday May 21st. The announcement day will remain the same so that it
remains before the Early Discount Date.

http://open-bio.org/wiki/BOSC_2007

The BOSC organizing Committee

Please pass this email on to anyone that would be interested.

From thiago.venancio at gmail.com Mon May 14 14:54:44 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Mon, 14 May 2007 15:54:44 -0300
Subject: [Bioperl-l] get regions
Message-ID: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>

Hi all,

Using Bio::Seq, is there any easy way to get the coordinates where a
regular expression matches or should I build a sliding window?
For example, looking for a given promoter region in a FASTA file. If the region is found, I would like to recover exactly the coordinates where it matches. Thanks in advance. Thiago -- "Doubt is not a pleasant condition, but certainty is absurd." Voltaire ======================== Thiago Motta Venancio, MSc PhD student in Bioinformatics University of Sao Paulo ======================== From jason at bioperl.org Mon May 14 15:06:11 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 14 May 2007 12:06:11 -0700 Subject: [Bioperl-l] get regions In-Reply-To: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com> References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com> Message-ID: <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org> I assume you are doing the matches on the string with =~ so Bio::Seq doesn't really help you here I don't think. See the $` variable in Perl for how to capture the position of where a regexp matches. -jason On May 14, 2007, at 11:54 AM, Thiago Venancio wrote: > Hi all, > > Using Bio::Seq, is there any easy way to get the coordinates where a > regular expression matches or should I build a sliding window? > > For example, looking for a given promoter region in a FASTA file. If > the region is found, I would like to recover exactly the coordinates > where it matches. > > Thanks in advance. > > Thiago > -- > "Doubt is not a pleasant condition, but certainty is absurd." 
> Voltaire > > ======================== > Thiago Motta Venancio, MSc > PhD student in Bioinformatics > University of Sao Paulo > ======================== > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From Kevin.M.Brown at asu.edu Mon May 14 15:15:09 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 14 May 2007 12:15:09 -0700 Subject: [Bioperl-l] get regions In-Reply-To: <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org> References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com> <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org> Message-ID: <1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu> I do this in perl with the pos() function. This requires the use of the match operator (m) like if ($gene =~ m/$pattern/gi) { $start = pos($gene) - length($pattern) + 1; } pos() returns the location of the pointer where the regex left off after finding a match. I remove the length of my pattern (which is just a string with a few placeholder (.) wildcards, so I know how long the match will always be). > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Jason Stajich > Sent: Monday, May 14, 2007 12:06 PM > To: Thiago Venancio > Cc: bioperl-l list > Subject: Re: [Bioperl-l] get regions > > I assume you are doing the matches on the string with =~ so > Bio::Seq doesn't really help you here I don't think. > See the $` variable in Perl for how to capture the position > of where a regexp matches. > > -jason > On May 14, 2007, at 11:54 AM, Thiago Venancio wrote: > > > Hi all, > > > > Using Bio::Seq, is there any easy way to get the > coordinates where a > > regular expression matches or should I build a sliding window? 
> > > > For example, looking for a given promoter region in a FASTA > file. If > > the region is found, I would like to recover exactly the > coordinates > > where it matches. > > > > Thanks in advance. > > > > Thiago > > -- > > "Doubt is not a pleasant condition, but certainty is absurd." > > Voltaire > > > > ======================== > > Thiago Motta Venancio, MSc > > PhD student in Bioinformatics > > University of Sao Paulo > > ======================== > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Bank.Beszteri at awi.de Mon May 14 09:20:07 2007 From: Bank.Beszteri at awi.de (Bank Beszteri) Date: Mon, 14 May 2007 15:20:07 +0200 Subject: [Bioperl-l] Bio::Tree::Tree -- rerooting & bootstrap problem In-Reply-To: <1369AFDC-2082-4021-8603-55E8ED032D41@bioperl.org> References: <4643448C.4000807@awi.de> <1369AFDC-2082-4021-8603-55E8ED032D41@bioperl.org> Message-ID: <46486207.60304@awi.de> Dear Jason, thanks for your answer! Sorry about having been ambiguous - it is clear that bootstrap values are parsed as ids from newick files, I had no problem with that, it was only the first step of the explanation of my problem, which was the rerooting issue. Thanks for your example code as well, it is indeed really useful to illustrate the problem. I modified the original tree a bit to make my point clearer: In your example, there are two internal node ids in a four-taxon tree. 
This is not a realistic situation for bootstrap values, because
bootstrap values are attached to bipartitions of terminal nodes, i.e.,
edges / branches of a tree (in what proportion of the bootstrap
replicates was a particular bipartition recovered - an alternative
representation of bootstraps, like produced e.g. by PAUP, is indeed a
"taxon bipartition table"). This means that in a four-taxon tree, we can
have at most one bootstrap value - corresponding to the single
non-trivial bipartition (all other bipartitions are trivial, i.e., they
separate a terminal node from the rest).

So here is an example 4-taxon tree with a bootstrap value:

(A:52,(B:46,C:50)68:11,D:70);

After rerooting at node B (using your example code) it looks like

((B:46,C:50,(A:52,D:70):11)68);

Now there are two problems:

1) this seems to be a small problem with TreeIO rather than with
rerooting: there is an extra pair of parentheses around the whole tree;
but more importantly:

2) the bootstrap value appears at the root node, which is not sensible
according to the convention that "each node stores the bootstrap value
belonging to the branch linking it to its ancestor". You would like the
bootstrap value to appear at the node connecting A & D in this
situation, which would look like

(B:46,C:50,(A:52,D:70)68:11);

because in this new situation, this position would correspond to the
same bipartition as in the original tree [which is (A,D)(B,C)].

In the meantime, I got a mail showing me the solution (thx Daniel!),
which is in fact pretty simple: all that has to be done is go through
the nodes on the path from the old to the new root after rerooting, and
for each node, take the bootstrap value from its ancestor (and remove it
from the ancestor). This leaves the root node without a bootstrap value,
which is exactly what you want (because it has no branch connecting it
to its ancestor, there is no sensible bootstrap value attached to a root
node).
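
The bookkeeping described above can be sketched in plain Perl. This is
only an illustration on a toy parent-pointer hash, not Bio::Tree code:
the %node layout, the 'support' key and reroot_keeping_support() are
made-up names, and the tree is the four-taxon example above with its one
internal node labelled BC. Here the parent pointers are flipped and the
support values shifted in the same pass, which should be equivalent to
doing the shift after rerooting.

```perl
use strict;
use warnings;

# Toy tree stored as parent pointers. Each node records its parent and the
# support (bootstrap) value of the branch leading to that parent; leaves and
# the root carry none. Hypothetical stand-in, NOT the Bio::Tree API.
my %node = (
    root => { parent => undef,  support => undef },
    A    => { parent => 'root', support => undef },
    D    => { parent => 'root', support => undef },
    BC   => { parent => 'root', support => 68    },
    B    => { parent => 'BC',   support => undef },
    C    => { parent => 'BC',   support => undef },
);

sub reroot_keeping_support {
    my ($tree, $new_root) = @_;

    # Collect the path from the new root up to the old root,
    # following the parent pointers as they are before rerooting.
    my @path = ($new_root);
    while (defined(my $p = $tree->{ $path[-1] }{parent})) {
        push @path, $p;
    }

    # Walk from the old root down towards the new root. Each branch on the
    # path flips direction, so its support value moves from the lower node
    # to the upper node, which is about to become the child.
    for my $i (reverse 1 .. $#path) {
        my ($upper, $lower) = @path[$i, $i - 1];
        $tree->{$upper}{parent}  = $lower;
        $tree->{$upper}{support} = $tree->{$lower}{support};
    }

    # The new root has no branch above it, hence no support value.
    $tree->{$new_root}{parent}  = undef;
    $tree->{$new_root}{support} = undef;
    return $tree;
}

reroot_keeping_support(\%node, 'BC');

for my $id (sort keys %node) {
    printf "%-4s parent=%-4s support=%s\n",
        $id,
        $node{$id}{parent}  // '-',
        $node{$id}{support} // '-';
}
```

In this toy case the value 68 ends up on the old root, i.e. on the node
joining A and D, and the new root carries no value, which matches the
tree I would like to get.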
So this exercise tells me that bootstraps and "real" node ids should be handled in different manners when rerooting: real ids should of course stay with the nodes, whereas bootstrap values on the path between the new and old root should move over to the other end of the corresponding branch. Best wishes, Bank Jason Stajich wrote: > > On May 10, 2007, at 9:13 AM, Bank Beszteri wrote: > >> Dear Bioperl folks, >> >> I?m trying to use Bio::Tree::Tree for manipulating phylogenetic trees, >> but in some things it did not behave as I expected it to, so I had to >> look inside a bit. >> In particular, I had problems with mixed up bootstrap values after >> re-rooting. After looking into the Bio::Tree::Tree data structures, it >> seems that >> >> a) bootstrap values are stored as attributes of nodes of the tree [to my >> understanding, they should rather be attributes of branches but >> Bio::Tree::Tree apparently tries to simplify away branches]; each node >> stores the bootstrap value belonging to the branch that connects it to >> its ancestor node (I?m reading in trees from Newick strings, and >> bootstrap values arrive in the id fields of internal branches) > > Please feel free to suggest an alternative implementation if you don't > agree with the object model. It has worked quite well in our hands > so I'd be all ears for someone wanting to get in an do some more work > on it. > > We have answered the question as to why bootstrap values are internal > ids many times on this list and I believe on the wiki -- the parser > can't tell the difference between a node id and a bootstrap value > because nexus uses the same slot for both. if you know you have > bootstrap values in the internal node it is trivial to process your > tree and copy the values over. > > > for my $node ( grep { ! 
$_->is_Leaf } $tree->get_all_nodes ) { > $node->bootstrap($node->id); > $node->id(''); > } > > I just added this as a method to TreeFunctionI so that it can be > easily called now to help satisfy everyone who hopes that the toolkit > can guess whether the internal nodes are bootstraps or identifiers. > > >> >> b) when re-rooting a tree, bootstrap values stay with the same node >> where they were before. Because the node that used to be the ancestor of >> a particular node in the original tree might have become its descendant >> after re-rooting, the bootstrap values are being mixed up. >> >> Can you confirm my conclusion? Whether yes or no, have you got an easy >> workaround or alternative solution to re-rooting trees (without having >> to touch the reroot method) or any other hints that could be useful for >> me to deal with this issue? >> > > I think you are right, but I am not clear what should be value for the > internal node attached to the root now. > > Note that is always helpful to provide example code illustrating your > problem. Here is an example which I think illustrates your problem. > > use Bio::TreeIO; > > my $in = Bio::TreeIO->new(-format => 'newick', > -fh => \*DATA); > my $out = Bio::TreeIO->new(-format => 'newick'); > while( my $t = $in->next_tree ){ > my ($a) = $t->find_node(-id =>"A"); > $out->write_tree($t); > $t->reroot($a); > $out->write_tree($t); > } > __DATA__ > (((A:5,B:5)90:2,C:4)25:3,D:10); > > >> Cheers, >> >> Bank >> >> >> >> -- >> Dr. 
B?nk Beszteri >> Alfred Wegener Institute for Polar and Marine Research >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > From basu at pharm.sunysb.edu Mon May 14 15:10:33 2007 From: basu at pharm.sunysb.edu (Siddhartha Basu) Date: Mon, 14 May 2007 15:10:33 -0400 Subject: [Bioperl-l] get regions In-Reply-To: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com> References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com> Message-ID: <4648B429.2030907@pharm.sunysb.edu> Thiago Venancio wrote: > Hi all, > > Using Bio::Seq, is there any easy way to get the coordinates where a > regular expression matches or should I build a sliding window? The perl core function "pos" should help you in this case. Do a 'perldoc -f pos' for details. -sidd > > For example, looking for a given promoter region in a FASTA file. If > the region is found, I would like to recover exactly the coordinates > where it matches. > > Thanks in advance. > > Thiago From cjfields at uiuc.edu Mon May 14 16:48:36 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 14 May 2007 15:48:36 -0500 Subject: [Bioperl-l] get regions In-Reply-To: <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org> References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com> <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org> Message-ID: <1A37A8AB-4C9F-4BB6-BC91-3493C68A84DA@uiuc.edu> I use pos() with m{}g; the quoted globals tend to slow things down for me. Ah, see Kevin's answered that... chris On May 14, 2007, at 2:06 PM, Jason Stajich wrote: > I assume you are doing the matches on the string with =~ so Bio::Seq > doesn't really help you here I don't think. > See the $` variable in Perl for how to capture the position of where > a regexp matches. 
>
> -jason
> On May 14, 2007, at 11:54 AM, Thiago Venancio wrote:
>
>> Hi all,
>>
>> Using Bio::Seq, is there any easy way to get the coordinates where a
>> regular expression matches or should I build a sliding window?
>>
>> For example, looking for a given promoter region in a FASTA file. If
>> the region is found, I would like to recover exactly the coordinates
>> where it matches.
>>
>> Thanks in advance.
>>
>> Thiago
>> --
>> "Doubt is not a pleasant condition, but certainty is absurd."
>> Voltaire
>>
>> ========================
>> Thiago Motta Venancio, MSc
>> PhD student in Bioinformatics
>> University of Sao Paulo
>> ========================
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From jason at bioperl.org Mon May 14 17:50:09 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 14 May 2007 14:50:09 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <1A37A8AB-4C9F-4BB6-BC91-3493C68A84DA@uiuc.edu>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A37A8AB-4C9F-4BB6-BC91-3493C68A84DA@uiuc.edu>
Message-ID:

yep you are right pos() much better and faster for getting the position.

-j
On May 14, 2007, at 1:48 PM, Chris Fields wrote:

> I use pos() with m{}g; the quoted globals tend to slow things down
> for me.
>
> Ah, see Kevin's answered that...
>
> chris
>
> On May 14, 2007, at 2:06 PM, Jason Stajich wrote:
>
>> I assume you are doing the matches on the string with =~ so Bio::Seq
>> doesn't really help you here I don't think.
>> See the $` variable in Perl for how to capture the position of where
>> a regexp matches.
>>
>> -jason
>> On May 14, 2007, at 11:54 AM, Thiago Venancio wrote:
>>
>>> Hi all,
>>>
>>> Using Bio::Seq, is there any easy way to get the coordinates where a
>>> regular expression matches or should I build a sliding window?
>>>
>>> For example, looking for a given promoter region in a FASTA file. If
>>> the region is found, I would like to recover exactly the coordinates
>>> where it matches.
>>>
>>> Thanks in advance.
>>>
>>> Thiago
>>> --
>>> "Doubt is not a pleasant condition, but certainty is absurd."
>>> Voltaire
>>>
>>> ========================
>>> Thiago Motta Venancio, MSc
>>> PhD student in Bioinformatics
>>> University of Sao Paulo
>>> ========================
>>
>> --
>> Jason Stajich
>> jason at bioperl.org
>> http://jason.open-bio.org/
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr.
Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/

From sac at bioperl.org Mon May 14 21:46:55 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Mon, 14 May 2007 18:46:55 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
Message-ID: <8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>

On 5/14/07, Kevin Brown wrote:
> I do this in perl with the pos() function. This requires the use of the
> match operator (m) like
>
> if ($gene =~ m/$pattern/gi)
> {
>     $start = pos($gene) - length($pattern) + 1;
> }
>
> pos() returns the location of the pointer where the regex left off after
> finding a match.

Cool. I hadn't known that was possible.

> I remove the length of my pattern (which is just a
> string with a few placeholder (.) wildcards, so I know how long the
> match will always be).

To generalize your code so that it will work for any pattern, such as
one that can match strings of variable length like "A{5,10}", just
subtract the length of the actual string that was matched:

if ($gene =~ m/$pattern/gi)
{
    $start = pos($gene) - length($&) + 1;
}

Steve

> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> > Jason Stajich
> > Sent: Monday, May 14, 2007 12:06 PM
> > To: Thiago Venancio
> > Cc: bioperl-l list
> > Subject: Re: [Bioperl-l] get regions
> >
> > I assume you are doing the matches on the string with =~ so
> > Bio::Seq doesn't really help you here I don't think.
> > See the $` variable in Perl for how to capture the position
> > of where a regexp matches.
> >
> > -jason
> > On May 14, 2007, at 11:54 AM, Thiago Venancio wrote:
> >
> > > Hi all,
> > >
> > > Using Bio::Seq, is there any easy way to get the coordinates where a
> > > regular expression matches or should I build a sliding window?
> > >
> > > For example, looking for a given promoter region in a FASTA file. If
> > > the region is found, I would like to recover exactly the coordinates
> > > where it matches.
> > >
> > > Thanks in advance.
> > >
> > > Thiago
> > > --
> > > "Doubt is not a pleasant condition, but certainty is absurd."
> > > Voltaire
> > >
> > > ========================
> > > Thiago Motta Venancio, MSc
> > > PhD student in Bioinformatics
> > > University of Sao Paulo
> > > ========================
> >
> > --
> > Jason Stajich
> > jason at bioperl.org
> > http://jason.open-bio.org/
> >
>

From shameer at ncbs.res.in Mon May 14 23:03:57 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Tue, 15 May 2007 08:33:57 +0530 (IST)
Subject: [Bioperl-l] How to produce Bio::Graphics images using PROSITE output ?
In-Reply-To: <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com>
References: <10259461.post@talk.nabble.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
	<1178028249.2644.13.camel@localhost.localdomain>
	<42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
	<6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com>
Message-ID: <49697.192.168.1.1.1179198237.squirrel@mail.ncbs.res.in>

Dear All,

Thanks a lot for all your inputs [Help : Imagemaps using Bio::Graphics ].
I am still working on the other part of this project. Now I am sure that
I can implement it using Bio::Graphics. I will come back to imagemaps
within a week or two. Meanwhile, I need to parse PROSITE output and
present it as a Bio::Graphics image.

Has anyone tried using Bio::Graphics to create images from PROSITE output?
I looked in the HOWTO but couldn't find anything related to PROSITE.

My output looks like this:

>Sequence : PS00001 ASN_GLYCOSYLATION N-glycosylation site.
     75 - 78  NGSM
>Sequence : PS00005 PKC_PHOSPHO_SITE Protein kinase C phosphorylation site.
     41 - 43  SpK
>Sequence : PS00008 MYRISTYL N-myristoylation site.
     6 - 11  GTitNQ
>Sequence : PS00009 AMIDATION Amidation site.
     78 - 81  mGKR

I need to implement an image like the blast-parser image.
Thanks for any inputs/pointers.

> The width of the image is determined by the -width attribute and is given
> in pixels. You cannot control the height of the image as it is computed
> dynamically based on the number of features and bumping options.
>
> Lincoln
>
> On 5/1/07, Shameer Khadar wrote:
>>
>> Dear Scot,
>>
>> > There is a fair amount of documentation in the perldoc for
>> > Bio::Graphics::Panel under the section called 'Creating Imagemaps';
>> > have you read that?
>>
>> I agree, but I couldn't find the exact information I needed :( (maybe I
>> missed something important).
>>
>> > Also, for changing the scale, that should happen
>> > automatically--have you tried yet?
>>
>> I tried changing Lincoln's program (blast3.pl) from
>> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
>> to
>> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300);
>>
>> But that gave me a smaller scale of length up to 300. I was looking for
>> an option that keeps the same width and height as the given image, with
>> dynamic start and end values depending on the length of my sequence.
>> Since I couldn't accomplish that, I thought of getting some help from you
>> guys. I think I need to play a little bit with the values to reformat the
>> scale to accommodate my hits as well.
>>
>> Thanks a lot for your inputs,
>> --
>> Shameer Khadar

--
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25)
The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in

From bix at sendu.me.uk Tue May 15 04:23:52 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 15 May 2007 09:23:52 +0100
Subject: [Bioperl-l] New Blast parser
Message-ID: <46496E18.1000809@sendu.me.uk>

Back in August of last year I introduced Bio::PullParserI, a module that
aids in the creation of fast SearchIO and Search modules. I've finally
gotten around to implementing a Blast parser using the interface, which
I've called Bio::SearchIO::blast_pull.

To use it you say:

my $sio = Bio::SearchIO->new(-format => "blast_pull", -file => "file");

or in the near future (when I've committed StandAloneBlast changes):

my $sab = Bio::Tools::Run::StandAloneBlast->new(-_READMETHOD => "blast_pull");

Currently the parser is incomplete: I've only tested it with NCBI BLASTN
and BLASTP. However, results are promising.
In one particular real-world usage-case involving running and parsing
multiple Blast jobs via StandAloneBlast (amongst other things), changing
only the _READMETHOD from 'blast' to 'blast_pull' in the code dropped run
time from 20223s to 951s (~20x faster) and memory usage from over 8GB to
less than 5GB (~40% less).

Please try it out and feed-back any bugs you discover.

Cheers,
Sendu.

From aaron.j.mackey at gsk.com Tue May 15 10:30:13 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Tue, 15 May 2007 10:30:13 -0400
Subject: [Bioperl-l] get regions
In-Reply-To: <8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
Message-ID:

Or, use a zero-width, positive look ahead assertion, and don't incur the
penalty of either $` or $&:

if ($gene =~ m/(?=$pattern)/gi) {
    $start = pos($gene) + 1;
}

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 05/14/2007 09:46:55 PM:

> On 5/14/07, Kevin Brown wrote:
> > I do this in perl with the pos() function. This requires the use of the
> > match operator (m) like
> >
> > if ($gene =~ m/$pattern/gi)
> > {
> >     $start = pos($gene) - length($pattern) + 1;
> > }
> >
> > pos() returns the location of the pointer where the regex left off after
> > finding a match.
>
> Cool. I hadn't known that was possible.
>
> > I remove the length of my pattern (which is just a
> > string with a few placeholder (.) wildcards, so I know how long the
> > match will always be).
>
> To generalize your code so that it will work for any pattern, such as
> one that can match strings of variable length like "A{5,10}", just
> subtract the length of the actual string that was matched:
>
> if ($gene =~ m/$pattern/gi)
> {
>     $start = pos($gene) - length($&) + 1;
> }
>
> Steve
>
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org
> > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> > > Jason Stajich
> > > Sent: Monday, May 14, 2007 12:06 PM
> > > To: Thiago Venancio
> > > Cc: bioperl-l list
> > > Subject: Re: [Bioperl-l] get regions
> > >
> > > I assume you are doing the matches on the string with =~ so
> > > Bio::Seq doesn't really help you here I don't think.
> > > See the $` variable in Perl for how to capture the position
> > > of where a regexp matches.
> > >
> > > -jason
> > > On May 14, 2007, at 11:54 AM, Thiago Venancio wrote:
> > >
> > > > Hi all,
> > > >
> > > > Using Bio::Seq, is there any easy way to get the coordinates where a
> > > > regular expression matches or should I build a sliding window?
> > > >
> > > > For example, looking for a given promoter region in a FASTA file. If
> > > > the region is found, I would like to recover exactly the coordinates
> > > > where it matches.
> > > >
> > > > Thanks in advance.
> > > >
> > > > Thiago
> > > > --
> > > > "Doubt is not a pleasant condition, but certainty is absurd."
> > > > Voltaire
> > > >
> > > > ========================
> > > > Thiago Motta Venancio, MSc
> > > > PhD student in Bioinformatics
> > > > University of Sao Paulo
> > > > ========================
> > >
> > > --
> > > Jason Stajich
> > > jason at bioperl.org
> > > http://jason.open-bio.org/
> > >
>

From diogoat at gmail.com Tue May 15 18:44:59 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Tue, 15 May 2007 19:44:59 -0300
Subject: [Bioperl-l] Downloading a sequence in genbank format
Message-ID: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>

Dear All,

I need to download a lot of Leishmania major sequences in GenBank format...
But I can't download them from the NCBI page, because the downloaded files
are corrupted...
when I use a browser to download these sequences.
Then I looked for a script to download these files and found
something like this:

#########################################################
use strict;
use warnings;

use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

my $query = Bio::DB::Query::GenBank->new
                        (-query =>'Leishmania major [Organism]',
                         -db    => 'nucleotide');
my $gb = new Bio::DB::GenBank;
my $seqio = $gb->get_Stream_by_query($query);

my $out = Bio::SeqIO->new(-format => 'genbank',
                          -file   => '>>teste6.gb');
$out->write_seq($seqio);
#########################################################

And the system returns these errors:
[diogo1 at genome perl]$ perl teste6.pl

-------------------- WARNING ---------------------
MSG: Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant module.
Attempting to dump, but may fail!
---------------------------------------------------
Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
/usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692.

Any idea?

Thanks,

Diogo Tschoeke
Laboratory of Molecular Biology of Trypanosomatides
Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
http://biowebdb.org

From diogoat at gmail.com Tue May 15 19:27:05 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Tue, 15 May 2007 20:27:05 -0300
Subject: [Bioperl-l] Downloading a sequence in genbank format
In-Reply-To:
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
Message-ID: <638512560705151627t2e25f17cg7f820f3097a67748@mail.gmail.com>

Thanks for your help, Barry!!

It works very well and I'm using the script... like you said...
The error was in the printing, is that right? I need to use a while loop
to print all the sequences...

Thanks a lot,

Diogo Tschoeke
Laboratory of Molecular Biology of Trypanosomatides
Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
http://biowebdb.org

2007/5/15, Barry Moore :
>
> Diogo-
>
> write_seq expects to be given a Bio::Seq object, not a Bio::SeqIO
> object.
Try this
>
> use strict;
> use warnings;
>
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> my $query = Bio::DB::Query::GenBank->new
>                         (-query =>'Leishmania major [Organism]',
>                          -db    => 'nucleotide');
> my $gb = new Bio::DB::GenBank;
> my $seqio = $gb->get_Stream_by_query($query);
>
> my $out = Bio::SeqIO->new(-format => 'genbank',
>                           -file   => '>>teste6.gb');
> while (my $seq = $seqio->next_seq) {
>     $out->write_seq($seq);
> }
>
> Barry
>
> On May 15, 2007, at 4:44 PM, Diogo Tschoeke wrote:
>
> > Dear All,
> >
> > I need to download a lot of Leishmania major sequences in GenBank
> > format...
> > But I can't download them from the NCBI page, because the downloaded
> > files are corrupted... when I use a browser to download these sequences.
> > Then I looked for a script to download these files and found
> > something like this:
> >
> >
> > #########################################################
> > use strict;
> > use warnings;
> >
> > use Bio::Seq;
> > use Bio::SeqIO;
> > use Bio::DB::GenBank;
> >
> > my $query = Bio::DB::Query::GenBank->new
> >                         (-query =>'Leishmania major [Organism]',
> >                          -db    => 'nucleotide');
> > my $gb = new Bio::DB::GenBank;
> > my $seqio = $gb->get_Stream_by_query($query);
> >
> > my $out = Bio::SeqIO->new(-format => 'genbank',
> >                           -file   => '>>teste6.gb');
> > $out->write_seq($seqio);
> > #########################################################
> >
> > And the system returns these errors:
> > [diogo1 at genome perl]$ perl teste6.pl
> >
> > -------------------- WARNING ---------------------
> > MSG: Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant
> > module.
> > Attempting to dump, but may fail!
> > ---------------------------------------------------
> > Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
> > /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692.
> >
> > Any idea?
> >
> > Thanks,
> >
> > Diogo Tschoeke
> > Laboratory of Molecular Biology of Trypanosomatides
> > Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
> > http://biowebdb.org
> >
>
>

From barry.moore at genetics.utah.edu Tue May 15 19:17:39 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue, 15 May 2007 17:17:39 -0600
Subject: [Bioperl-l] Downloading a sequence in genbank format
In-Reply-To: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
Message-ID:

Diogo-

write_seq expects to be given a Bio::Seq object, not a Bio::SeqIO
object. Try this

use strict;
use warnings;

use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

my $query = Bio::DB::Query::GenBank->new
                        (-query =>'Leishmania major [Organism]',
                         -db    => 'nucleotide');
my $gb = new Bio::DB::GenBank;
my $seqio = $gb->get_Stream_by_query($query);

my $out = Bio::SeqIO->new(-format => 'genbank',
                          -file   => '>>teste6.gb');
while (my $seq = $seqio->next_seq) {
    $out->write_seq($seq);
}

Barry

On May 15, 2007, at 4:44 PM, Diogo Tschoeke wrote:

> Dear All,
>
> I need to download a lot of Leishmania major sequences in GenBank
> format...
> But I can't download them from the NCBI page, because the downloaded
> files are corrupted...
when I use a browser to download these sequences.
> Then I looked for a script to download these files and found
> something like this:
>
>
> #########################################################
> use strict;
> use warnings;
>
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> my $query = Bio::DB::Query::GenBank->new
>                         (-query =>'Leishmania major [Organism]',
>                          -db    => 'nucleotide');
> my $gb = new Bio::DB::GenBank;
> my $seqio = $gb->get_Stream_by_query($query);
>
> my $out = Bio::SeqIO->new(-format => 'genbank',
>                           -file   => '>>teste6.gb');
> $out->write_seq($seqio);
> #########################################################
>
> And the system returns these errors:
> [diogo1 at genome perl]$ perl teste6.pl
>
> -------------------- WARNING ---------------------
> MSG: Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant
> module.
> Attempting to dump, but may fail!
> ---------------------------------------------------
> Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
> /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692.
>
> Any idea?
>
> Thanks,
>
> Diogo Tschoeke
> Laboratory of Molecular Biology of Trypanosomatides
> Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
> http://biowebdb.org
>

From cjfields at uiuc.edu Tue May 15 22:44:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 15 May 2007 21:44:43 -0500
Subject: [Bioperl-l] get regions
In-Reply-To: <8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
	<8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
Message-ID: <6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu>

On May 14, 2007, at 8:46 PM, Steve Chervitz wrote:
...

> To generalize your code so that it will work for any pattern, such as
> one that can match strings of variable length like "A{5,10}", just
> subtract the length of the actual string that was matched:
>
> if ($gene =~ m/$pattern/gi)
> {
>     $start = pos($gene) - length($&) + 1;
> }
>
> Steve

Right, but $& (as well as $` and $') inflict a significant penalty
for their use, as Aaron alludes to. Their use, even indirectly via a
library module, can cause a significant performance hit.
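
One way to sidestep that penalty while still coping with variable-length
matches is Perl's built-in @- and @+ arrays, which hold the start and end
offsets of the last successful match. What follows is only a minimal
sketch, assuming the sequence is already in a plain string; the sub name
find_regions and the sample sequence are illustrative and not part of
BioPerl:

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): return 1-based
# [start, end] pairs for each non-overlapping match of $pattern in $seq.
sub find_regions {
    my ($seq, $pattern) = @_;
    my @regions;
    while ($seq =~ /$pattern/gi) {
        # $-[0] is the 0-based offset where the whole match starts;
        # $+[0] is the 0-based offset just past its end, which equals
        # the 1-based inclusive end. Neither uses $&, $` or $'.
        push @regions, [ $-[0] + 1, $+[0] ];
    }
    return @regions;
}

my @hits = find_regions('TTTAAAAAAAAGG', 'A{5,10}');
printf "hit %d: %d..%d\n", $_ + 1, @{ $hits[$_] } for 0 .. $#hits;
# prints "hit 1: 4..11"
```

Because the match itself consumes the matched text, each hit is reported
once with both coordinates, at the cost of not seeing overlapping hits.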

chris

From sac at bioperl.org Wed May 16 04:16:38 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 16 May 2007 01:16:38 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
	<8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
	<6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu>
Message-ID: <8f200b4c0705160116j265f9e8eu1174d6e41e6ebbdc@mail.gmail.com>

On 5/15/07, Chris Fields wrote:
>
> On May 14, 2007, at 8:46 PM, Steve Chervitz wrote:
> ...
>
> > To generalize your code so that it will work for any pattern, such as
> > one that can match strings of variable length like "A{5,10}", just
> > subtract the length of the actual string that was matched:
> >
> > if ($gene =~ m/$pattern/gi)
> > {
> >     $start = pos($gene) - length($&) + 1;
> > }
> >
> > Steve
>
> Right, but $& (as well as $` and $') inflict a significant penalty
> for their use, as Aaron alludes to. Their use, even indirectly via a
> library module, can cause a significant performance hit.
>
> chris

Yes. I had forgotten how poisonous $&, $` and $' were to regex
performance. Please forgive me. We might consider regularly auditing
the bioperl module tree for use of these in committed code.

But regarding the use of the look ahead assertion, there's a problem
if you want to find *all* occurrences of the pattern in a target string
and the pattern can have variable length hits: it may report
overlapping hits because it only collects the starting points of the
match, and does not determine how long each match would be.
For example:

$gene = 'TTTAAAAAAAAGG';
$pattern = "A{5,10}";
while ($gene =~ m/(?=$pattern)/gi) {
    $start = pos($gene) + 1;
    print ++$hit, " hit starts at $start\n";
}

Generates:

1 hit starts at 4
2 hit starts at 5
3 hit starts at 6
4 hit starts at 7

You could get around this by imposing a constraint to avoid trivial
overlaps. OK if you know the length of the pattern, but not so good for
more complex patterns. If there was a way to get the look ahead to match
the longest string possible for a variable length patte