From brunovecchi at yahoo.com.ar Fri Aug 1 00:16:16 2008
From: brunovecchi at yahoo.com.ar (Bruno Vecchi)
Date: Fri, 01 Aug 2008 01:16:16 -0300
Subject: [Bioperl-l] Bio::Biblio doesn't find articles [SOLVED]
Message-ID: <48928E10.7090903@yahoo.com.ar>

An HTML attachment was scrubbed...
URL:

From Kevin.Clancy at invitrogen.com Fri Aug 1 18:30:30 2008
From: Kevin.Clancy at invitrogen.com (Clancy, Kevin)
Date: Fri, 1 Aug 2008 15:30:30 -0700
Subject: [Bioperl-l] Reference to a staden module under Bio::SeqIO.pm
Message-ID: <28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>

Hi Folks

I am using the Windows version of Bioperl 1.5.2_100. I was recently compiling a tool with ActiveState's PerlApp that included Bioperl modules. I received an error for the Bio::SeqIO module, which was calling for the Bio::SeqIO::staden::read method(?) on lines 312-314 of the Bio::SeqIO.pm module. I don't appear to have a copy of the staden module under the Bio::SeqIO directory, and it doesn't appear to be present in the current BioPerl trunk. I simply commented this out of my SeqIO.pm file to perform my build and it's all running normally. Was this simply a reference to a non-existent module, or am I missing something?

Thank you for your help.

kevin

Kevin Clancy, PhD
Senior Scientist, Informatic Sciences
Invitrogen Corp
Carlsbad, CA 92008
Phone: (768) 268 8356
Email: kevin.clancy at invitrogen.com

From jason at bioperl.org Sat Aug 2 08:58:05 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 2 Aug 2008 07:58:05 -0500
Subject: [Bioperl-l] Inframe stop codon
In-Reply-To: <516747.39380.qm@web36405.mail.mud.yahoo.com>
References: <516747.39380.qm@web36405.mail.mud.yahoo.com>
Message-ID:

[regarding PAML analyses]
You would need to translate the cDNA sequence and identify where the stop codon is, then remove that codon or remove that sequence from your bulk analyses.
It depends on why you think the stop codon is in the sequence: mis-annotation, a pseudogene, or something else? If this affects only a small percentage of a lot of sequences, I would probably just skip them. If it is the terminal stop codon that is being included in the sequences, you just need to remove the last codon from each sequence before providing it to PAML.

The Seq HOWTO has many examples of how to manipulate a sequence object with substr, trunc, as well as just the simple seq() method that gives you the sequence as a string, which you can manipulate, then update the sequence object afterwards. As in

  my $str = $seq->seq;
  # remove the last codon from this cDNA sequence
  substr($str, -3, 3, '');
  $seq->seq($str);

Alternatively you can use trunc to truncate the sequence

  my $trunc = $seq->trunc(1, $seq->length - 3);
  $seq = $trunc;

You can translate the sequence with the $seq->translate command, then test for the presence of a stop codon (this is exactly the code that is running in the pairwise_kaks script that is in the scripts/utilities/ directory). If you have a stop codon, you need to figure out whether it is at the end of the sequence or not. If it is the terminal codon, you can just lop off the last codon on all your sequences, but if it is internal, you need to decide what you want to do with this sequence. If there are multiple stop codons, I am not sure it is appropriate to run PAML here, unless you are interested in some sort of pseudo-rate calculation that has many of the codons omitted. Otherwise you may just want to calculate a DNA substitution rate for the sequences to make a comparison.

I suggest working through a single file by hand to get the appropriate steps down; coding it up will then be easier. I am sure folks on the list can help too, so it is important to post to the mailing list - I don't see any messages from you on the list about this query.
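
The terminal-versus-internal decision described above can be sketched in plain Perl on the sequence string (the kind of string $seq->seq returns). This is only a minimal sketch: it assumes the standard genetic code (TAA, TAG, TGA), assumes the string starts in frame, and the sub names and example sequences are made up for illustration rather than taken from BioPerl or from pairwise_kaks.

```perl
use strict;
use warnings;

# Stop codons of the standard genetic code (an assumption; other
# codon tables use different stops).
my %is_stop = map { $_ => 1 } qw(TAA TAG TGA);

# Return the 0-based codon indices of every in-frame stop in a cDNA string.
sub stop_codon_indices {
    my ($cdna) = @_;
    my @hits;
    for (my $i = 0; $i + 3 <= length $cdna; $i += 3) {
        my $codon = uc substr($cdna, $i, 3);
        push @hits, $i / 3 if $is_stop{$codon};
    }
    return @hits;
}

# Classify a cDNA string as 'none', 'terminal' or 'internal'.
# 'terminal' means the only stop is the last complete codon.
sub classify_stops {
    my ($cdna) = @_;
    my @hits = stop_codon_indices($cdna);
    return 'none' unless @hits;
    my $last_codon = int(length($cdna) / 3) - 1;
    return (@hits == 1 && $hits[0] == $last_codon) ? 'terminal' : 'internal';
}

# Hypothetical example sequences, not taken from the thread.
for my $cdna (qw(ATGGCCTAA ATGTAAGCC ATGGCCGCC)) {
    my $class = classify_stops($cdna);
    # Only a terminal stop is trimmed; internal stops need a decision.
    my $trimmed = $class eq 'terminal' ? substr($cdna, 0, -3) : $cdna;
    print "$cdna\t$class\t$trimmed\n";
}
```

Sequences classified as 'internal' are deliberately left untouched here, since what to do with them is a judgment call.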
-jason

On Aug 2, 2008, at 5:42 AM, Tannistha wrote:

>
> Hi Jason,
>
> Please suggest how to filter the in-frame stop codons;
> aa_to_dna_aln returns the sequence with in-frame stop codons.
> I have posted my query along with the input files to the forum.
>
> Thanks for your earlier advice, runmode =0 is working for me.
>
> Look forward to your reply
>
> Best Regards
> Tannistha
>
>
> Dr. Tannistha Nandi
> email: tannistha3 at yahoo.com
>
>
>

From David.Messina at sbc.su.se Sun Aug 3 15:10:18 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 3 Aug 2008 21:10:18 +0200
Subject: [Bioperl-l] Reference to a staden module under Bio::SeqIO.pm
In-Reply-To: <28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>
References: <28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>
Message-ID: <628aabb70808031210u28f46f1fp5f40cd3443134d6c@mail.gmail.com>

Hi Kevin,

The staden module is an oddball one, to be sure.

A search on the BioPerl website turns up this FAQ entry:
http://www.bioperl.org/wiki/FAQ#bioperl-ext_won.27t_compile_the_staden_IO_lib_part_-_what_do_I_do.3F

Also the Windows install page
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

says:

> Some external programs such as Staden and
> the EMBOSS suite of programs can only
> be installed on Windows by using Cygwin and its gcc
> C compiler (see Bioperl in Cygwin, below)
>

In any case, the staden module (and associated external libraries) is used only if you are trying to read the scf, abi, alf, pln, exp, ctf, or ztr binary formats. So your edit shouldn't cause you any problems otherwise.
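
For readers unfamiliar with optional dependencies in Perl, here is a minimal sketch of how code can probe for a module without dying when it is absent. It assumes the reference Kevin hit is this kind of probe; module_available is a made-up helper name, not BioPerl's own code, and only core Perl is used.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded, 0 otherwise.
# The eval block traps the exception that require would otherwise raise.
sub module_available {
    my ($module) = @_;
    (my $path = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return eval { require $path; 1 } ? 1 : 0;
}

# Bio::SeqIO::staden::read is the module named in this thread; whether it
# loads depends entirely on the local installation.
for my $module ('File::Spec', 'Bio::SeqIO::staden::read') {
    printf "%-26s %s\n", $module,
        module_available($module) ? 'available' : 'not available';
}
```

A script that only reads text formats never needs the probe to succeed, which fits the observation that commenting the reference out did no harm.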
Dave

From cjfields at uiuc.edu Sun Aug 3 16:20:52 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 3 Aug 2008 15:20:52 -0500
Subject: [Bioperl-l] Reference to a staden module under Bio::SeqIO.pm
In-Reply-To: <628aabb70808031210u28f46f1fp5f40cd3443134d6c@mail.gmail.com>
References: <28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net> <628aabb70808031210u28f46f1fp5f40cd3443134d6c@mail.gmail.com>
Message-ID:

This seems to be a problem with PerlApp and eval{}; judging by a quick Google search this isn't the only module affected. The line in question is wrapped in an eval{} to check for the availability of Bio::SeqIO::staden::read (but not die on it).

BTW, the eval was moved into the relevant plugin modules post-1.5.2, so the eval{} is checked when the module is loaded dynamically (i.e. when a format requiring it is passed in). It was causing other issues with ActivePerl installations and was redundant, so it was removed.

http://bugzilla.open-bio.org/show_bug.cgi?id=2295

chris

On Aug 3, 2008, at 2:10 PM, Dave Messina wrote:

> Hi Kevin,
>
> The staden module is an oddball one, to be sure.
>
> A search on the BioPerl website turns up this FAQ entry:
> http://www.bioperl.org/wiki/FAQ#bioperl-ext_won.27t_compile_the_staden_IO_lib_part_-_what_do_I_do.3F
>
> Also the Windows install page
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
>
> says:
>
>> Some external programs such as Staden and
>> the EMBOSS suite of programs can only
>> be installed on Windows by using Cygwin and its gcc
>> C compiler (see Bioperl in Cygwin, below)
>>
>
> In any case, the staden module (and associated external libraries) is used
> only if you are trying to read the scf, abi, alf, pln, exp, ctf, or ztr
> binary formats. So your edit shouldn't cause you any problems otherwise.
>
> Dave
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign

From btemperton at googlemail.com Sat Aug 2 16:05:37 2008
From: btemperton at googlemail.com (Benbo)
Date: Sat, 2 Aug 2008 13:05:37 -0700 (PDT)
Subject: [Bioperl-l] Finding possible primers regex
Message-ID: <18792782.post@talk.nabble.com>

Hi there,
I'm trying to write a perl script to scan an aligned multiple entry fasta file and find possible primers. So far I've produced a string which contains bases which match all sequences and * where they don't match e.g.
1) TTAGCCTAA
2) TTAGCAGAA
3) TTACCCTAA

would give TTA*C**AA.

I want to parse this string and pull out all sequences which are 18-21 bp in length and have no more than 4 * in them.

So far, I've got this:

  while($fragment_match =~ /([GTAC*]{18,21})/g){
      print "$1\n";
  }

hoping to match all fragments 18-21 characters in length. However even that doesn't work as it has essentially chunked it into 21 char blocks, rather than what I hoped for of
0-18
0-19
0-20
0-21
1-19
1-20
1-21
1-22

etc.

Can anyone let me know if this is already possible in BioPerl, or how one would go about it with regex. Sadly I'm fairly new to perl and getting to grips with BioPerl, so please treat me gently :).

Many thanks,

Ben
--
View this message in context: http://www.nabble.com/Finding-possible-primers-regex-tp18792782p18792782.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
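
The enumeration asked for above (every start position, every length from 18 to 21, at most four * characters) can also be written without a regex, as two nested loops over substr. The following is a minimal sketch of that alternative; candidate_windows and the example consensus string are made up for illustration.

```perl
use strict;
use warnings;

# Enumerate every window of $min..$max characters in $consensus that
# contains at most $max_mismatch '*' characters.
# Returns a list of [start, length, fragment] with 0-based starts.
sub candidate_windows {
    my ($consensus, $min, $max, $max_mismatch) = @_;
    my @found;
    for my $start (0 .. length($consensus) - $min) {
        for my $len ($min .. $max) {
            last if $start + $len > length $consensus;
            my $fragment = substr($consensus, $start, $len);
            my $stars = ($fragment =~ tr/*//);   # counts '*' without changing the string
            push @found, [$start, $len, $fragment] if $stars <= $max_mismatch;
        }
    }
    return @found;
}

# Hypothetical 24-character consensus string, made up for illustration.
my $consensus = 'TTA*C**AAGGCTTAACC*GTACG';
for my $w (candidate_windows($consensus, 18, 21, 4)) {
    printf "%2d-%-2d %s\n", $w->[0], $w->[0] + $w->[1], $w->[2];
}
```

The original loop appears to chunk the string because each /g match resumes where the previous one ended, and the greedy {18,21} takes the longest fragment it can. A zero-width lookahead such as /(?=([GTAC*]{18,21}))/g restarts at every position, but still reports only the longest fragment per start.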
From cjfields at uiuc.edu Mon Aug 4 00:08:51 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 3 Aug 2008 23:08:51 -0500
Subject: [Bioperl-l] Finding possible primers regex
In-Reply-To: <18792782.post@talk.nabble.com>
References: <18792782.post@talk.nabble.com>
Message-ID: <33A8975C-2A88-4697-8298-7D16CB03CEAE@uiuc.edu>

On Aug 2, 2008, at 3:05 PM, Benbo wrote:

>
> Hi there,
> I'm trying to write a perl script to scan an aligned multiple entry fasta
> file and find possible primers. So far I've produced a string which contains
> bases which match all sequences and * where they don't match e.g.
> 1) TTAGCCTAA
> 2) TTAGCAGAA
> 3) TTACCCTAA
>
> would give TTA*C**AA.
>
> I want to parse this string and pull out all sequences which are 18-21 bp in
> length and have no more than 4 * in them.
>
> So far, I've got this:
>
> while($fragment_match =~ /([GTAC*]{18,21})/g){
> print "$1\n";
> }
>
> hoping to match all fragments 18-21 characters in length. However even that
> doesn't work as it has essentially chunked it into 21 char blocks, rather
> than what I hoped for of
> 0-18
> 0-19
> 0-20
> 0-21
> 1-19
> 1-20
> 1-21
> 1-22
>
> etc.
>
> Can anyone let me know if this is already possible in BioPerl, or how one
> would go about it with regex. Sadly I'm fairly new to perl and getting to
> grips with BioPerl, so please treat me gently :).
>
> Many thanks,
>
> Ben

There is a trick to this which is discussed more extensively in 'Mastering Regular Expressions'. Essentially you have to embed code into the regex and trick the parser into backtracking using a negative lookahead. The match itself fails (i.e. no match is returned), but the embedded code is executed for each match attempt.

The following script is a slight modification of one I used. It checks the consensus string from the input alignment (in aligned FASTA format here), extracts the alignment slice using that match, then spits the alignment out to STDOUT in clustalw format.
This should work for perl 5.8 and up, but it's only been tested on perl 5.10. You should be able to use this to fit what you want.

  use strict;
  use warnings;
  use Bio::AlignIO;

  # assumed: the aligned FASTA file is given on the command line
  my $file = shift @ARGV;

  my $in  = Bio::AlignIO->new(-file => $file, -format => 'fasta');
  my $out = Bio::AlignIO->new(-fh => \*STDOUT, -format => 'clustalw');

  while (my $aln = $in->next_aln) {
      # columns below 100% identity show up as '?' in the consensus
      my $c = $aln->consensus_string(100);
      my @matches;
      # the (?{...}) block runs for every candidate fragment; (?!) then
      # forces a failure so the engine backtracks and tries the next one
      $c =~ m/
          ([GTAC?]{18,21})
          (?{
              my $match = check_match($1);
              push @matches, [$match, pos(), length($match)]
                  if defined $match;
          })
          (?!)
      /xig;
      for my $match (@matches) {
          my ($hit, $st, $end) = ($match->[0],
                                  $match->[1] - $match->[2] + 1,
                                  $match->[1]);
          my $newaln = $aln->slice($st, $end);
          $out->write_aln($newaln);
      }
  }

  # return the fragment if it has at most 4 '?' columns, otherwise undef
  sub check_match {
      my $match = shift;
      return unless $match;
      my $ct = $match =~ tr/?/?/;
      # explicit undef: "return $match if $ct <= 4;" as the last statement
      # would hand back a defined false value when the test fails
      return $ct <= 4 ? $match : undef;
  }

chris

From heikki at sanbi.ac.za Mon Aug 4 02:42:57 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Mon, 4 Aug 2008 08:42:57 +0200
Subject: [Bioperl-l] Bio::Coordinate::Pair
In-Reply-To:
References:
Message-ID: <200808040842.57466.heikki@sanbi.ac.za>

Prashanth,

Your example coordinates do not do the conversion but more or less report the locations of your features in some third coordinate system.

The way to think of coordinate pairs is to use them as HSPs. You tell the pair object what the matching segment is in the pair of sequences. The synopsis in the Bio::Coordinate::Pair class file gives the following example:

  use Bio::Location::Simple;
  use Bio::Coordinate::Pair;

  my $match1 = Bio::Location::Simple->new
      (-seq_id => 'propeptide', -start => 21, -end => 40, -strand=>1 );
  my $match2 = Bio::Location::Simple->new
      (-seq_id => 'peptide', -start => 1, -end => 20, -strand=>1 );
  my $pair = Bio::Coordinate::Pair->new(-in => $match1,
                                        -out => $match2 );

  # location to match
  my $pos = Bio::Location::Simple->new
      (-start => 25, -end => 25, -strand=> -1 );
  my $res = $pair->map($pos);
  print $res->match->start; # 5

In other words, region 21-40 in the propeptide matches locations 1-20 in the final peptide.
Therefore conversion from 25 gives 5:

      signalp       21  25             40
--------------------|---|--------------|
                    1   5     pep      20

I hope this clarifies it. The advantage of using these objects over manual conversion is that the code has been debugged (none of the all-too-easy +/-1 errors) and that they can be chained together.

Yours,
-Heikki

On Tuesday 29 July 2008 22:07:55 Prashanth Athri wrote:
> Dear Professor Lehvaslaiho:
>
> I had a quick question about the module- Bio::Coordinate::Pair
>
> The BioPerl tutorial has the following example:
>
> $input_coordinates = Bio::Location::Simple->new
> (-seq_id => 'propeptide', -start => 1000, -end => 2000, -strand=>1 );
>
> $output_coordinates = Bio::Location::Simple->new
> (-seq_id => 'peptide', -start => 1100, -end => 2100, -strand=>1 );
>
> $pair = Bio::Coordinate::Pair->new
> (-in => $input_coordinates , -out => $output_coordinates );
>
> $pos = Bio::Location::Simple->new (-start => 500, -end => 500 );
>
> $res = $pair->map($pos);
> $converted_start = $res->start;
>
> The way I understand it, $converted_start should return "1600". But when I
> run this snippet, it returns "500". Could you please let me know how
> $pair->map($pos) is processed?
>
> I appreciate your time and thanks in advance.
>
> Regards,
> Prashanth

--
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist      skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From lengjingmao at gmail.com Tue Aug 5 03:36:23 2008
From: lengjingmao at gmail.com (Shaohua Fan)
Date: Tue, 5 Aug 2008 15:36:23 +0800
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
References: <18792782.post@talk.nabble.com>
Message-ID: <79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>

Hi there,

I have a sequence dataset which contains about 200 sequences. Some of the sequences in it are identical. Is there a BioPerl module which can remove those identical sequences?

Thanks a lot.
Yours,
shaohua

----- Original Message -----
From: "Benbo"
To:
Sent: Sunday, August 03, 2008 4:05 AM
Subject: [Bioperl-l] Finding possible primers regex

>
> Hi there,
> I'm trying to write a perl script to scan an aligned multiple entry fasta
> file and find possible primers. So far I've produced a string which contains
> bases which match all sequences and * where they don't match e.g.
> 1) TTAGCCTAA
> 2) TTAGCAGAA
> 3) TTACCCTAA
>
> would give TTA*C**AA.
>
> I want to parse this string and pull out all sequences which are 18-21 bp in
> length and have no more than 4 * in them.
>
> So far, I've got this:
>
> while($fragment_match =~ /([GTAC*]{18,21})/g){
> print "$1\n";
> }
>
> hoping to match all fragments 18-21 characters in length. However even that
> doesn't work as it has essentially chunked it into 21 char blocks, rather
> than what I hoped for of
> 0-18
> 0-19
> 0-20
> 0-21
> 1-19
> 1-20
> 1-21
> 1-22
>
> etc.
>
> Can anyone let me know if this is already possible in BioPerl, or how one
> would go about it with regex.
> Sadly I'm fairly new to perl and getting to
> grips with BioPerl, so please treat me gently :).
>
> Many thanks,
>
> Ben
>
>
>
> --
> View this message in context: http://www.nabble.com/Finding-possible-primers-regex-tp18792782p18792782.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bernd.web at gmail.com Tue Aug 5 05:49:55 2008
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 5 Aug 2008 11:49:55 +0200
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
In-Reply-To: <79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
References: <18792782.post@talk.nabble.com> <79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
Message-ID: <716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>

Hi,

There is a BioPerl Utility script doing this.
See http://www.bioperl.org/wiki/Bioperl_scripts under the Utilities header.

" scripts/utilities/bp_nrdb.PLS
Make a non-redundant database based on sequence, not id. Requires
Digest::MD5."

Alternatively, you can make a hash using the sequences as keys.

Regards,
Bernd

On Tue, Aug 5, 2008 at 9:36 AM, Shaohua Fan wrote:
> Hi, there ,
>
> I have a sequence dataset which contains about 200 sequences. there are some identical sequences in this. is there any bioperl modules which can remove those identical sequences?
>
> thanks a lot.
> yours,
> shaohua
> ----- Original Message -----
> From: "Benbo"
> To:
> Sent: Sunday, August 03, 2008 4:05 AM
> Subject: [Bioperl-l] Finding possible primers regex
>
>
>>
>> Hi there,
>> I'm trying to write a perl script to scan an aligned multiple entry fasta
>> file and find possible primers. So far I've produced a string which contains
>> bases which match all sequences and * where they don't match e.g.
>> 1) TTAGCCTAA
>> 2) TTAGCAGAA
>> 3) TTACCCTAA
>>
>> would give TTA*C**AA.
>>
>> I want to parse this string and pull out all sequences which are 18-21 bp in
>> length and have no more than 4 * in them.
>>
>> So far, I've got this:
>>
>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>> print "$1\n";
>> }
>>
>> hoping to match all fragments 18-21 characters in length. However even that
>> doesn't work as it has essentially chunked it into 21 char blocks, rather
>> than what I hoped for of
>> 0-18
>> 0-19
>> 0-20
>> 0-21
>> 1-19
>> 1-20
>> 1-21
>> 1-22
>>
>> etc.
>>
>> Can anyone let me know if this is already possible in BioPerl, or how one
>> would go about it with regex. Sadly I'm fairly new to perl and getting to
>> grips with BioPerl, so please treat me gently :).
>>
>> Many thanks,
>>
>> Ben
>>
>>
>>
>> --
>> View this message in context: http://www.nabble.com/Finding-possible-primers-regex-tp18792782p18792782.html
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From diriano at uni-potsdam.de Tue Aug 5 06:28:58 2008
From: diriano at uni-potsdam.de (Diego Mauricio Riano Pachon)
Date: Tue, 05 Aug 2008 12:28:58 +0200
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
In-Reply-To: <716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>
References: <18792782.post@talk.nabble.com> <79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F> <716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>
Message-ID: <48982B6A.4050304@uni-potsdam.de>

Hi all,

Or you might try a non-bioperl solution that works pretty well, check:

http://blast.wustl.edu/pub/nrdb/executables/nrdb.linux-x86

Best,

Diego

Bernd Web wrote:
> Hi,
>
> There is a BioPerl Utility script doing this.
> See http://www.bioperl.org/wiki/Bioperl_scripts under the Utilities header.
>
> " scripts/utilities/bp_nrdb.PLS
> Make a non-redundant database based on sequence, not id. Requires
> Digest::MD5."
>
> Alternatively, you can make a hash using the sequences as keys.
>
>
> Regards,
> Bernd
>
> On Tue, Aug 5, 2008 at 9:36 AM, Shaohua Fan wrote:
>> Hi, there ,
>>
>> I have a sequence dataset which contains about 200 sequences. there are some identical sequences in this. is there any bioperl modules which can remove those identical sequences?
>>
>> thanks a lot.
>> yours,
>> shaohua
>> ----- Original Message -----
>> From: "Benbo"
>> To:
>> Sent: Sunday, August 03, 2008 4:05 AM
>> Subject: [Bioperl-l] Finding possible primers regex
>>
>>
>>> Hi there,
>>> I'm trying to write a perl script to scan an aligned multiple entry fasta
>>> file and find possible primers. So far I've produced a string which contains
>>> bases which match all sequences and * where they don't match e.g.
>>> 1) TTAGCCTAA
>>> 2) TTAGCAGAA
>>> 3) TTACCCTAA
>>>
>>> would give TTA*C**AA.
>>>
>>> I want to parse this string and pull out all sequences which are 18-21 bp in
>>> length and have no more than 4 * in them.
>>>
>>> So far, I've got this:
>>>
>>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>>> print "$1\n";
>>> }
>>>
>>> hoping to match all fragments 18-21 characters in length. However even that
>>> doesn't work as it has essentially chunked it into 21 char blocks, rather
>>> than what I hoped for of
>>> 0-18
>>> 0-19
>>> 0-20
>>> 0-21
>>> 1-19
>>> 1-20
>>> 1-21
>>> 1-22
>>>
>>> etc.
>>>
>>> Can anyone let me know if this is already possible in BioPerl, or how one
>>> would go about it with regex. Sadly I'm fairly new to perl and getting to
>>> grips with BioPerl, so please treat me gently :).
>>>
>>> Many thanks,
>>>
>>> Ben
>>>
>>>
>>>
>>> --
>>> View this message in context: http://www.nabble.com/Finding-possible-primers-regex-tp18792782p18792782.html
>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
___________________________________
Diego Mauricio Riaño Pachón
Biologist - PhD student
AG Mueller-Roeber
Institute for Biochemistry and Biology
University of Potsdam

Address: Karl-Liebknecht-Str. 24-25
Haus 20
14476 Golm
Germany

Tel: +49 331 977 2809
Fax: +49 331 977 2512

web: http://www.geocities.com/dmrp.geo

From cjfields at uiuc.edu Tue Aug 5 11:19:54 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 5 Aug 2008 10:19:54 -0500
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
In-Reply-To: <48982B6A.4050304@uni-potsdam.de>
References: <18792782.post@talk.nabble.com> <79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F> <716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com> <48982B6A.4050304@uni-potsdam.de>
Message-ID: <4DDBF772-170A-414A-9468-A2607498F3E2@uiuc.edu>

Here are two links which go into detail (the last is a specific implementation):

http://en.wikipedia.org/wiki/Sequence_clustering
http://www.bioinformatics.org/cd-hit/

chris

On Aug 5, 2008, at 5:28 AM, Diego Mauricio Riano Pachon wrote:

> Hi all,
>
> Or you might try a non-bioperl solution that works pretty well, check:
>
> http://blast.wustl.edu/pub/nrdb/executables/nrdb.linux-x86
>
> Best,
>
> Diego
>
> Bernd Web wrote:
>> Hi,
>> There is a BioPerl Utility script doing this.
>> See http://www.bioperl.org/wiki/Bioperl_scripts under the Utilities
>> header.
>> " scripts/utilities/bp_nrdb.PLS
>> Make a non-redundant database based on sequence, not id. Requires
>> Digest::MD5."
>> Alternatively, you can make a hash using the sequences as keys.
>> Regards,
>> Bernd
>> On Tue, Aug 5, 2008 at 9:36 AM, Shaohua Fan wrote:
>>> Hi, there ,
>>>
>>> I have a sequence dataset which contains about 200 sequences.
>>> there are some identical sequences in this. is there any bioperl
>>> modules which can remove those identical sequences?
>>>
>>> thanks a lot.
>>> yours,
>>> shaohua
>>> ----- Original Message -----
>>> From: "Benbo"
>>> To:
>>> Sent: Sunday, August 03, 2008 4:05 AM
>>> Subject: [Bioperl-l] Finding possible primers regex
>>>
>>>
>>>> Hi there,
>>>> I'm trying to write a perl script to scan an aligned multiple entry fasta
>>>> file and find possible primers. So far I've produced a string which contains
>>>> bases which match all sequences and * where they don't match e.g.
>>>> 1) TTAGCCTAA
>>>> 2) TTAGCAGAA
>>>> 3) TTACCCTAA
>>>>
>>>> would give TTA*C**AA.
>>>>
>>>> I want to parse this string and pull out all sequences which are 18-21 bp in
>>>> length and have no more than 4 * in them.
>>>>
>>>> So far, I've got this:
>>>>
>>>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>>>> print "$1\n";
>>>> }
>>>>
>>>> hoping to match all fragments 18-21 characters in length. However even that
>>>> doesn't work as it has essentially chunked it into 21 char blocks, rather
>>>> than what I hoped for of
>>>> 0-18
>>>> 0-19
>>>> 0-20
>>>> 0-21
>>>> 1-19
>>>> 1-20
>>>> 1-21
>>>> 1-22
>>>>
>>>> etc.
>>>>
>>>> Can anyone let me know if this is already possible in BioPerl, or how one
>>>> would go about it with regex. Sadly I'm fairly new to perl and getting to
>>>> grips with BioPerl, so please treat me gently :).
>>>>
>>>> Many thanks,
>>>>
>>>> Ben
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context: http://www.nabble.com/Finding-possible-primers-regex-tp18792782p18792782.html
>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> --
> ___________________________________
> Diego Mauricio Riaño Pachón
> Biologist - PhD student
> AG Mueller-Roeber
> Institute for Biochemistry and Biology
> University of Potsdam
>
> Address: Karl-Liebknecht-Str. 24-25
> Haus 20
> 14476 Golm
> Germany
>
> Tel: +49 331 977 2809
> Fax: +49 331 977 2512
>
> web: http://www.geocities.com/dmrp.geo
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign

From lengjingmao at gmail.com Tue Aug 5 11:24:22 2008
From: lengjingmao at gmail.com (Shaohua Fan)
Date: Tue, 5 Aug 2008 23:24:22 +0800
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
References: <18792782.post@talk.nabble.com> <79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F> <716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com> <48982B6A.4050304@uni-potsdam.de> <4DDBF772-170A-414A-9468-A2607498F3E2@uiuc.edu>
Message-ID: <3A95AD6D18A749F3B73C135CCC8E7C90@6B2F7FFC298C46F>

hi,
thanks a lot for the help!
cheers,
shaohua

----- Original Message -----
From: "Chris Fields"
To: "Diego Mauricio Riano Pachon"
Cc: "Bernd Web" ; ; "Shaohua Fan"
Sent: Tuesday, August 05, 2008 11:19 PM
Subject: Re: [Bioperl-l] how to remove indentical sequences from a dataset

Here are two links which go into detail (the last is a specific implementation):

http://en.wikipedia.org/wiki/Sequence_clustering
http://www.bioinformatics.org/cd-hit/

chris

On Aug 5, 2008, at 5:28 AM, Diego Mauricio Riano Pachon wrote:

> Hi all,
>
> Or you might try a non-bioperl solution that works pretty well, check:
>
> http://blast.wustl.edu/pub/nrdb/executables/nrdb.linux-x86
>
> Best,
>
> Diego
>
> Bernd Web wrote:
>> Hi,
>> There is a BioPerl Utility script doing this.
>> See http://www.bioperl.org/wiki/Bioperl_scripts under the Utilities
>> header.
>> " scripts/utilities/bp_nrdb.PLS
>> Make a non-redundant database based on sequence, not id. Requires
>> Digest::MD5."
>> Alternatively, you can make a hash using the sequences as keys.
>> Regards, >> Bernd >> On Tue, Aug 5, 2008 at 9:36 AM, Shaohua Fan >> wrote: >>> Hi, there , >>> >>> I have a sequence dataset which contains about 200 sequences. >>> there are some identical sequences in this. is there any bioperl >>> modules which can remove those identical sequences? >>> >>> thanks a lot. >>> yours, >>> shaohua >>> ----- Original Message ----- >>> From: "Benbo" >>> To: >>> Sent: Sunday, August 03, 2008 4:05 AM >>> Subject: [Bioperl-l] Finding possible primers regex >>> >>> >>>> Hi there, >>>> I'm trying to write a perl script to scan an aligned multiple >>>> entry fasta >>>> file and find possible primers. So far I've produced a string >>>> which contains >>>> bases which match all sequences and * where they don't match e.g. >>>> 1) TTAGCCTAA >>>> 2) TTAGCAGAA >>>> 3) TTACCCTAA >>>> >>>> would give TTA*C**AA. >>>> >>>> I want to parse this string and pull out all sequences which are >>>> 18-21 bp in >>>> length and have no more than 4 * in them. >>>> >>>> So far, I've got this: >>>> >>>> while($fragment_match =~ /([GTAC*]{18,21})/g){ >>>> print "$1\n"; >>>> } >>>> >>>> hoping to match all fragments 18-21 characters in length. However >>>> even that >>>> doesn't work as it has essentially chunked it into 21 char >>>> blocks, rather >>>> than what I hoped for of >>>> 0-18 >>>> 0-19 >>>> 0-20 >>>> 0-21 >>>> 1-19 >>>> 1-20 >>>> 1-21 >>>> 1-22 >>>> >>>> etc. >>>> >>>> Can anyone let me know if this is already possible in BioPerl, or >>>> how one >>>> would go about it with regex. Sadly I'm fairly new to perl and >>>> getting to >>>> grips with BioPerl, so please treat me gently :). >>>> >>>> Many thanks, >>>> >>>> Ben >>>> >>>> >>>> >>>> -- >>>> View this message in context: http://www.nabble.com/Finding-possible-primers-regex-tp18792782p18792782.html >>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- > ___________________________________ > Diego Mauricio Ria?o Pach?n > Biologist - PhD student > AG Mueller-Roeber > Institute for Biochemistry and Biology > University of Potsdam > > Address: Karl-Liebknecht-Str. 24-25 > Haus 20 > 14476 Golm > Germany > > Tel: +49 331 977 2809 > Fax: +49 331 977 2512 > > web: http://www.geocities.com/dmrp.geo > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From martin.senger at gmail.com Tue Aug 5 22:53:07 2008 From: martin.senger at gmail.com (Martin Senger) Date: Wed, 6 Aug 2008 10:53:07 +0800 Subject: [Bioperl-l] Bio::Biblio doesn't find articles Message-ID: <4d93f07c0808051953k4cb7511cg5ec4cd93f53cfd0f@mail.gmail.com> I am afraid that the server that serves the MEDLINE database to the Bio::Biblio module (using the SOAP protocol), and that is running at EBI, may be not fully supported. I am not working at EBI anymore and I have stopped to monitor their servers. I am still their collaborator - but I am not, unfortunately, involved in the MEDLINE tools anymore. I would be happy to continue to maintain the Bio::Biblio module but it relies on a server that I do not anymore control. 
Cheers,
Martin

--
Martin Senger
email: martin.senger at gmail.com, m.senger at cgiar.org
skype: martinsenger

From Russell.Smithies at agresearch.co.nz Wed Aug 6 17:20:04 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 7 Aug 2008 09:20:04 +1200
Subject: [Bioperl-l] not BioPerl
Message-ID:

Has anyone taken a look at the new Perl interface to the NCBI C++ Toolkit?
Unfortunately, I can't even get their examples working, as I'm behind a firewall and the documentation on proxy settings is virtually nonexistent :-(

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From cjfields at illinois.edu Wed Aug 6 17:33:27 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 6 Aug 2008 16:33:27 -0500
Subject: [Bioperl-l] not BioPerl
In-Reply-To:
References:
Message-ID:

Looks like they're binary releases for 32- and 64-bit Linux (quite large, at 25 MB). It would be nice to have the C++ bindings for other OSes (my guess is this was set up via SWIG). I have access to a Linux cluster, so I may give this a try soon.
chris On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote: > Has anyone taken a look at the new Perl interface to the NCBI C++ > Toolkit? > Unfortunately, I can't even get their examples working as I'm behind a > firewall and documentation on setting proxy stuff is virtually > non-existant :-( > > > Russell Smithies > > Bioinformatics Applications Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > > > > = > ====================================================================== > Attention: The information contained in this message and/or > attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or > privileged > material. Any review, retransmission, dissemination or other use of, > or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by > AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > = > ====================================================================== > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From vinaykmittal at gatech.edu Wed Aug 6 16:56:22 2008 From: vinaykmittal at gatech.edu (Mittal, Vinay K) Date: Wed, 6 Aug 2008 16:56:22 -0400 (EDT) Subject: [Bioperl-l] Error installing Biopel Module Message-ID: <469631287.3995201218056182383.JavaMail.root@mail5.gatech.edu> Hi, I just installed Active perl 5.10.0 and was trying to install Bioperl Modules. 
While installing BioPerl through the package manager (ppm), I am getting the following errors:

ppm install failed: Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
Can't find any package that provides Convert::Binary::C for Bundle-BioPerl-Core

I don't know what the problem is. I have never used BioPerl modules before.

Thanks.
--
--------
Vinay Kumar Mittal
MS, Bioinformatics
Georgia Institute of Technology

From rfrancis at ichr.uwa.edu.au Wed Aug 6 21:11:28 2008
From: rfrancis at ichr.uwa.edu.au (Richard Francis)
Date: Thu, 07 Aug 2008 09:11:28 +0800
Subject: [Bioperl-l] AlignIO::clustalw match_line query
Message-ID: <1218071488.3074.2.camel@acs-pc-a0966.ichr.uwa.edu.au>

Dear List,

I wonder if you can help.

I'm having trouble finding out which criteria the conserved and semi-conserved substitution decisions are based on for a match line produced by the match_line function in AlignIO.

I note that match_line produces the same output as an alignment match line from ClustalW, and indeed is used in the AlignIO::clustalw module, but are the substitution decisions based on the same Venn diagram at http://www.ebi.ac.uk/Tools/clustalw2/clustalw_help.html#color, i.e. are they faithful to the generation of the match line within ClustalW itself?

I need to know this as part of a paper I'm writing, so I would really appreciate your help with this.

Kind regards and thanks in advance,

Richard Francis

From jason at bioperl.org Wed Aug 6 22:26:06 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 6 Aug 2008 19:26:06 -0700
Subject: [Bioperl-l] AlignIO::clustalw match_line query
In-Reply-To: <1218071488.3074.2.camel@acs-pc-a0966.ichr.uwa.edu.au>
References: <1218071488.3074.2.camel@acs-pc-a0966.ichr.uwa.edu.au>
Message-ID:

Implemented independently, but it was based on what the clustalw documentation says. The main code is in the match_line function in Bio::SimpleAlign. See the CONSERVATION_GROUPS hash, which looks like this:

    'strong' => [ qw( STA NEQK NHQK NDEQ QHRK MILV MILF HY FYW )],
    'weak'   => [ qw( CSA ATV SAG STNK STPA SGND SNDEQK NDEQHK NEQHRK FVLIM HFY )],);
    }

So a 'strong' (":") on the match line would be coded where the residues seen in a column are only 'S', 'T', or 'A' (for example).

It was checked against clustalw output by hand when it was implemented. If you know of any inconsistencies, let us know.

-jason

On Aug 6, 2008, at 6:11 PM, Richard Francis wrote:

> Dear List,
>
> I wonder if you can help.
>
> I'm having trouble finding out on which criteria the conserved and
> semi-conserved substitution decisions for a match line produced
> from the
> match_line function in AlignIO are based.
>
> I note that match_line produces the same output as an alignment match
> line would from ClustalW and indeed is used in the AlignIO::clustalw
> module, but are the substitution decisions based on the same Venn
> diagram at http://www.ebi.ac.uk/Tools/clustalw2/clustalw_help.html#color
> ie are they faithful to the generation of the match line from within
> ClustalW itself?
> > I need to know this as part of a paper I?m writing so I would really > appreciate your help with this. > > Kind regards and thanks in advance, > > Richard Francis > ###################################################################### > ############### > This e-mail message has been scanned for Viruses and Content and > cleared > by MailMarshal > ###################################################################### > ############### > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From betts at embl.de Thu Aug 7 08:42:59 2008 From: betts at embl.de (Matthew Betts) Date: Thu, 7 Aug 2008 14:42:59 +0200 (CEST) Subject: [Bioperl-l] Bio:Graphics for drawing secondary structure Message-ID: Hi, Has any one tried to draw secondary structure with Bio::Graphics? i.e. two different types of glyph with different colours on the same track. Could use a hash reference to get the different glyph types (would be nice if there was a cylinder glyph and a thick arrow glyph), or heterogeneous segments to get the different colours, but I can't see how to do both at the same time. Any example code or suggestions on how I could implement it would be great. Thanks, Matthew -- Matthew Betts PhD, Russell Group (Structural Bioinformatics) EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany phone: +49 (0)6221 387 8305; mailto:betts at embl.de From cain.cshl at gmail.com Thu Aug 7 10:08:39 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Thu, 7 Aug 2008 10:08:39 -0400 Subject: [Bioperl-l] Bio:Graphics for drawing secondary structure In-Reply-To: References: Message-ID: <536f21b00808070708q6180d4fft279078f2a28ac93d@mail.gmail.com> Hi Matthew, I don't have any code examples, but people have used GBrowse for protein secondary structure, which uses Bio::Graphics underneath the hood. 

If you want to put more than one glyph and/or more than one color in a track, it is fairly easy. You just need to provide a callback for each option when you create the track, like this:

  $panel->add_track($features_array_ref,
                    -glyph   => sub { # code to set the glyph according to the attributes of the feature
                                    },
                    -bgcolor => sub { # code to set the color
                                    },
                    -fgcolor => 'black',
                    ...etc...
                   );

For more information, see the Bio::Graphics HOWTO:

http://www.bioperl.org/wiki/HOWTO:Graphics

Scott

On Thu, Aug 7, 2008 at 8:42 AM, Matthew Betts wrote:
>
> Hi,
>
> Has any one tried to draw secondary structure with Bio::Graphics? i.e. two
> different types of glyph with different colours on the same track.
>
> Could use a hash reference to get the different glyph types (would be nice
> if there was a cylinder glyph and a thick arrow glyph), or heterogeneous
> segments to get the different colours, but I can't see how to do both at
> the same time.
>
> Any example code or suggestions on how I could implement it would be
> great.
>
> Thanks,
>
> Matthew
>
> --
> Matthew Betts PhD, Russell Group (Structural Bioinformatics)
> EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
> phone: +49 (0)6221 387 8305; mailto:betts at embl.de
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.
cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

From betts at embl.de Thu Aug 7 12:27:28 2008
From: betts at embl.de (Matthew Betts)
Date: Thu, 7 Aug 2008 18:27:28 +0200 (CEST)
Subject: [Bioperl-l] Bio:Graphics for drawing secondary structure
In-Reply-To: <536f21b00808070708q6180d4fft279078f2a28ac93d@mail.gmail.com>
References: <536f21b00808070708q6180d4fft279078f2a28ac93d@mail.gmail.com>
Message-ID:

Hi Scott,

Thanks for that, was a great help - I didn't realise I could use a code ref for anything other than the glyph name. I'm now doing this:

  $panel->add_track(
    '-bgcolor' => sub {
      my($feature) = @_;
      $feature->display_name eq 'strand' ? 'cyan' : 'magenta';
    },
    '-strand_arrow' => sub {
      my($feature) = @_;
      $feature->display_name eq 'strand' ? 1 : 0;
    },
  );

Matthew

On Thu, 7 Aug 2008, Scott Cain wrote:

> Hi Matthew,
>
> I don't have any code examples, but people have used GBrowse for
> protein secondary structure, which uses Bio::Graphics underneath the
> hood.
>
> If you want to put more than one glyph and/or more than one color in a
> track, it is fairly easy. You just need to provide a callback for
> each option when you create the track, like this:
>
> $panel->add_track($features_array_ref,
>                   -glyph => sub { #code to set the glyph
> according the attributes of the feature },
>                   -bgcolor => sub { #code to set the color },
>                   -fgcolor => 'black',
>                   ...etc...
>                  );
>
> For more information, see the biographics howto:
>
> http://www.bioperl.org/wiki/HOWTO:Graphics
>
> Scott
>
>
>
> On Thu, Aug 7, 2008 at 8:42 AM, Matthew Betts wrote:
> >
> > Hi,
> >
> > Has any one tried to draw secondary structure with Bio::Graphics? i.e. two
> > different types of glyph with different colours on the same track.
> > > > Could use a hash reference to get the different glyph types (would be nice > > if there was a cylinder glyph and a thick arrow glyph), or heterogeneous > > segments to get the different colours, but I can't see how to do both at > > the same time. > > > > Any example code or suggestions on how I could implement it would be > > great. > > > > Thanks, > > > > Matthew > > > > -- > > Matthew Betts PhD, Russell Group (Structural Bioinformatics) > > EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany > > phone: +49 (0)6221 387 8305; mailto:betts at embl.de > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > From jay at jays.net Thu Aug 7 12:32:29 2008 From: jay at jays.net (Jay Hannah) Date: Thu, 07 Aug 2008 11:32:29 -0500 Subject: [Bioperl-l] not BioPerl In-Reply-To: References: Message-ID: <489B239D.8060305@jays.net> Smithies, Russell wrote: > Has anyone taken a look at the new Perl interface to the NCBI C++ Toolkit? > Unfortunately, I can't even get their examples working as I'm behind a > firewall and documentation on setting proxy stuff is virtually > non-existant :-( > Do people actually use the NCBI C++ Toolkit for things outside of NCBI? What? I tried to leverage it a year or so ago for an Entrez/sequence/search project and got nowhere. j From jcherry at ncbi.nlm.nih.gov Thu Aug 7 13:06:28 2008 From: jcherry at ncbi.nlm.nih.gov (Josh Cherry) Date: Thu, 7 Aug 2008 13:06:28 -0400 (EDT) Subject: [Bioperl-l] NCBI C++ Toolkit wrapper (was: not BioPerl) Message-ID: For those who may be wondering what this is about, a Perl interface to the NCBI C++ Toolkit is available at ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/. The C++ Toolkit is the main code base that we develop and use at NCBI. 
It includes many things that may be of interest to BioPerl users, such as sequence analysis algorithms, means for interacting with NCBI databases, and facilities for reading, writing, and manipulating NCBI data model objects (usually defined by ASN.1 specifications; writeable as ASN.1, XML, and JSON, and readable from ASN.1 and XML). Russell, I think you can make things work from behind a firewall by setting some environment variables: set CONN_FIREWALL to 1, possibly set CONN_STATELESS to 1, and set CONN_HTTP_PROXY_HOST and CONN_HTTP_PROXY_PORT as appropriate. Please email me if you can't get things to work. I'll see that decent instructions for this are included in the next release. Josh Cherry On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote: > Has anyone taken a look at the new Perl interface to the NCBI C++ > Toolkit? > Unfortunately, I can't even get their examples working as I'm behind a > firewall and documentation on setting proxy stuff is virtually > non-existant :-( > > > Russell Smithies From tristan.lefebure at gmail.com Thu Aug 7 13:35:24 2008 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Thu, 7 Aug 2008 13:35:24 -0400 Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on large trees Message-ID: <200808071335.24668.tristan.lefebure@gmail.com> Hi list, I'm using a script very similar to bp_taxonomy2tree.pl distributed with BioPerl (with the only difference that I'm using taxids instead of taxon names). Basically, the script generates a taxonomic tree given a list of taxids using the NCBI taxonomy db. For each taxon, it generates a taxon object, and then merge this object to a tree object that keeps growing. It runs very well with a small number of taxa, but with many taxa (>1000), it is very very very slow (about a week for 3000 taxa). The slowness is due to the function merge_lineage (line 65), which merges the existing tree object with a new taxon object. I guess that the difficulty with a big tree (i.e. 
more than 1000 leaf) is to find the nodes in common between the tree and the new taxon object... Would you have any idea of how to get around the problem? Should I look under the hood of merge_lineage to try to improve it for large trees? Thanks! Version: bioperl-1.5.2_102 OS: GNU/Linux -Tristan From cjfields at illinois.edu Thu Aug 7 13:38:53 2008 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 7 Aug 2008 12:38:53 -0500 Subject: [Bioperl-l] NCBI C++ Toolkit wrapper (was: not BioPerl) In-Reply-To: References: Message-ID: Josh, Thanks for the update. I saw that these are only binaries for linux 32/64-bit. Are there plans to either support other OS's (OS X, Win, etc) or to maybe make a release with the XS-bindings so users can work towards that? With additional support I can see this easily fitting into several spots in BioPerl, but otherwise I'm unsure. chris On Aug 7, 2008, at 12:06 PM, Josh Cherry wrote: > For those who may be wondering what this is about, a Perl interface > to the NCBI C++ Toolkit is available at ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/ > . The C++ Toolkit is the main code base that we develop and use at > NCBI. It includes many things that may be of interest to BioPerl > users, such as sequence analysis algorithms, means for interacting > with NCBI databases, and facilities for reading, writing, and > manipulating NCBI data model objects (usually defined by ASN.1 > specifications; writeable as ASN.1, XML, and JSON, and readable from > ASN.1 and XML). > > Russell, I think you can make things work from behind a firewall by > setting some environment variables: set CONN_FIREWALL to 1, possibly > set CONN_STATELESS to 1, and set CONN_HTTP_PROXY_HOST and > CONN_HTTP_PROXY_PORT as appropriate. Please email me if you can't > get things to work. I'll see that decent instructions for this are > included in the next release. 
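The proxy settings described above could also be applied from inside a Perl script, before the toolkit bindings are loaded. This is only a sketch: it assumes the toolkit reads these variables from the process environment when it initialises, the proxy host and port are placeholder values, and whether CONN_STATELESS is needed depends on the local setup.

```perl
use strict;
use warnings;

# Set the connection variables in a BEGIN block so that they are in place
# before any later 'use' statement loads the toolkit bindings.
BEGIN {
    $ENV{CONN_FIREWALL}        = 1;
    $ENV{CONN_STATELESS}       = 1;                      # possibly needed
    $ENV{CONN_HTTP_PROXY_HOST} = 'proxy.example.org';    # placeholder host
    $ENV{CONN_HTTP_PROXY_PORT} = 8080;                   # placeholder port
}

# Print the settings so they can be checked before running the examples.
for my $name ( sort grep { /^CONN_/ } keys %ENV ) {
    print "$name=$ENV{$name}\n";
}
```

Exporting the same variables in the shell before starting perl should have the same effect and avoids touching the script.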
> > Josh Cherry > > > On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote: > >> Has anyone taken a look at the new Perl interface to the NCBI C++ >> Toolkit? >> Unfortunately, I can't even get their examples working as I'm >> behind a >> firewall and documentation on setting proxy stuff is virtually >> non-existant :-( >> >> >> Russell Smithies > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From jcherry at ncbi.nlm.nih.gov Thu Aug 7 14:04:17 2008 From: jcherry at ncbi.nlm.nih.gov (Josh Cherry) Date: Thu, 7 Aug 2008 14:04:17 -0400 (EDT) Subject: [Bioperl-l] NCBI C++ Toolkit wrapper (was: not BioPerl) In-Reply-To: References: Message-ID: Chris, Support for other OS's is definitely a possibility, depending on community feedback (how useful are the wrappers in general, and how much demand is there for them on other platforms?). I wish I could magically make them available for Windows and OS X, but there are some technical issues to work out. Josh On Thu, 7 Aug 2008, Chris Fields wrote: > Josh, > > Thanks for the update. I saw that these are only binaries for linux > 32/64-bit. Are there plans to either support other OS's (OS X, Win, etc) or > to maybe make a release with the XS-bindings so users can work towards that? > With additional support I can see this easily fitting into several spots in > BioPerl, but otherwise I'm unsure. > > chris > > On Aug 7, 2008, at 12:06 PM, Josh Cherry wrote: > >> For those who may be wondering what this is about, a Perl interface to the >> NCBI C++ Toolkit is available at >> ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/. The C++ Toolkit is >> the main code base that we develop and use at NCBI. 
It includes many >> things that may be of interest to BioPerl users, such as sequence analysis >> algorithms, means for interacting with NCBI databases, and facilities for >> reading, writing, and manipulating NCBI data model objects (usually defined >> by ASN.1 specifications; writeable as ASN.1, XML, and JSON, and readable >> from ASN.1 and XML). >> >> Russell, I think you can make things work from behind a firewall by setting >> some environment variables: set CONN_FIREWALL to 1, possibly set >> CONN_STATELESS to 1, and set CONN_HTTP_PROXY_HOST and CONN_HTTP_PROXY_PORT >> as appropriate. Please email me if you can't get things to work. I'll see >> that decent instructions for this are included in the next release. >> >> Josh Cherry >> >> >> On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote: >> >>> Has anyone taken a look at the new Perl interface to the NCBI C++ >>> Toolkit? >>> Unfortunately, I can't even get their examples working as I'm behind a >>> firewall and documentation on setting proxy stuff is virtually >>> non-existant :-( >>> >>> >>> Russell Smithies >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Marie-Claude Hofmann > College of Veterinary Medicine > University of Illinois Urbana-Champaign > > > > From bix at sendu.me.uk Thu Aug 7 18:20:29 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 07 Aug 2008 23:20:29 +0100 Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on large trees In-Reply-To: <200808071335.24668.tristan.lefebure@gmail.com> References: <200808071335.24668.tristan.lefebure@gmail.com> Message-ID: <489B752D.2080209@sendu.me.uk> Tristan Lefebure wrote: > I'm using a script very similar to bp_taxonomy2tree.pl distributed with > BioPerl (with the only difference that I'm using taxids instead of taxon > names). 
Basically, the script generates a taxonomic tree given a list of > taxids using the NCBI taxonomy db. For each taxon, it generates a taxon > object, and then merge this object to a tree object that keeps growing. It > runs very well with a small number of taxa, but with many taxa (>1000), it is > very very very slow (about a week for 3000 taxa). > > The slowness is due to the function merge_lineage (line 65), which merges the > existing tree object with a new taxon object. I guess that the difficulty > with a big tree (i.e. more than 1000 leaf) is to find the nodes in common > between the tree and the new taxon object... > > Would you have any idea of how to get around the problem? Should I look under > the hood of merge_lineage to try to improve it for large trees? Yes, please do. It might have been me that wrote that, in which case I didn't do anything fancy or consider the above problem. From cjfields at illinois.edu Thu Aug 7 20:42:16 2008 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 7 Aug 2008 19:42:16 -0500 Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on large trees In-Reply-To: <489B752D.2080209@sendu.me.uk> References: <200808071335.24668.tristan.lefebure@gmail.com> <489B752D.2080209@sendu.me.uk> Message-ID: <7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu> On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote: > Tristan Lefebure wrote: >> I'm using a script very similar to bp_taxonomy2tree.pl distributed >> with BioPerl (with the only difference that I'm using taxids >> instead of taxon names). Basically, the script generates a >> taxonomic tree given a list of taxids using the NCBI taxonomy db. >> For each taxon, it generates a taxon object, and then merge this >> object to a tree object that keeps growing. It runs very well with >> a small number of taxa, but with many taxa (>1000), it is very very >> very slow (about a week for 3000 taxa). 
>> The slowness is due to the function merge_lineage (line 65), which >> merges the existing tree object with a new taxon object. I guess >> that the difficulty with a big tree (i.e. more than 1000 leaf) is >> to find the nodes in common between the tree and the new taxon >> object... >> Would you have any idea of how to get around the problem? Should I >> look under the hood of merge_lineage to try to improve it for large >> trees? > > Yes, please do. It might have been me that wrote that, in which case > I didn't do anything fancy or consider the above problem. Actually I thought that was fixed; wasn't some caching added for that script at one point? chris From bix at sendu.me.uk Fri Aug 8 03:50:50 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 08 Aug 2008 08:50:50 +0100 Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on large trees In-Reply-To: <7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu> References: <200808071335.24668.tristan.lefebure@gmail.com> <489B752D.2080209@sendu.me.uk> <7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu> Message-ID: <489BFADA.1060308@sendu.me.uk> Chris Fields wrote: > > On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote: > >> Tristan Lefebure wrote: >>> I'm using a script very similar to bp_taxonomy2tree.pl distributed >>> with BioPerl (with the only difference that I'm using taxids instead >>> of taxon names). Basically, the script generates a taxonomic tree >>> given a list of taxids using the NCBI taxonomy db. For each taxon, it >>> generates a taxon object, and then merge this object to a tree object >>> that keeps growing. It runs very well with a small number of taxa, >>> but with many taxa (>1000), it is very very very slow (about a week >>> for 3000 taxa). >>> The slowness is due to the function merge_lineage (line 65), which >>> merges the existing tree object with a new taxon object. I guess that >>> the difficulty with a big tree (i.e. 
more than 1000 leaf) is to find >>> the nodes in common between the tree and the new taxon object... >>> Would you have any idea of how to get around the problem? Should I >>> look under the hood of merge_lineage to try to improve it for large >>> trees? >> >> Yes, please do. It might have been me that wrote that, in which case I >> didn't do anything fancy or consider the above problem. > > Actually I thought that was fixed; Oh yeah. Looks like I did something related to 'speedup for merge_lineage()' on the 18th Dec 2006. Tristan, checkout Bio/Tree/TreeFunctionsI.pm from svn and see if that solves your problem. From tristan.lefebure at gmail.com Fri Aug 8 12:02:32 2008 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Fri, 8 Aug 2008 12:02:32 -0400 Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on large trees In-Reply-To: <489BFADA.1060308@sendu.me.uk> References: <200808071335.24668.tristan.lefebure@gmail.com> <489B752D.2080209@sendu.me.uk> <7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu> <489BFADA.1060308@sendu.me.uk> Message-ID: Yes indeed, with the svn code it took 10 minutes (compared to one week!) Thanks, -Tristan On Fri, Aug 8, 2008 at 3:50 AM, Sendu Bala wrote: > Chris Fields wrote: > >> >> On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote: >> >> Tristan Lefebure wrote: >>> >>>> I'm using a script very similar to bp_taxonomy2tree.pl distributed with >>>> BioPerl (with the only difference that I'm using taxids instead of taxon >>>> names). Basically, the script generates a taxonomic tree given a list of >>>> taxids using the NCBI taxonomy db. For each taxon, it generates a taxon >>>> object, and then merge this object to a tree object that keeps growing. It >>>> runs very well with a small number of taxa, but with many taxa (>1000), it >>>> is very very very slow (about a week for 3000 taxa). >>>> The slowness is due to the function merge_lineage (line 65), which >>>> merges the existing tree object with a new taxon object. 
I guess that the >>>> difficulty with a big tree (i.e. more than 1000 leaf) is to find the nodes >>>> in common between the tree and the new taxon object... >>>> Would you have any idea of how to get around the problem? Should I look >>>> under the hood of merge_lineage to try to improve it for large trees? >>>> >>> >>> Yes, please do. It might have been me that wrote that, in which case I >>> didn't do anything fancy or consider the above problem. >>> >> >> Actually I thought that was fixed; >> > > Oh yeah. Looks like I did something related to 'speedup for > merge_lineage()' on the 18th Dec 2006. Tristan, checkout > Bio/Tree/TreeFunctionsI.pm from svn and see if that solves your problem. > From rvos at interchange.ubc.ca Fri Aug 8 19:59:20 2008 From: rvos at interchange.ubc.ca (Rutger Vos) Date: Fri, 8 Aug 2008 16:59:20 -0700 Subject: [Bioperl-l] malloc errors while using Bio::SeqIO? Message-ID: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com> Hi, while going through a large genbank file (ftp://ftp.ncbi.nlm.nih.gov/genbank/gbpri21.seq.gz) I ran into malloc errors. Just for the record (I doubt this does anyone any good), I got: perl(391) malloc: *** vm_allocate(size=8421376) failed (error code=3) perl(391) malloc: *** error: can't allocate region perl(391) malloc: *** set a breakpoint in szone_error to debug Out of memory! What I was trying to do is go through the file, and only write out those seq objects that aren't human, and that have CDS features, i.e.: ################################################ #!/usr/bin/perl use strict; use warnings; use Bio::SeqIO; my $dir = shift @ARGV; # the directory with *.gz files my $out = shift @ARGV; # the directory to write to... 
mkdir $out if not -d $out;     # ...which may need to be created

opendir my $dirhandle, $dir or die $!;
for my $archive ( readdir $dirhandle ) {
    next if $archive !~ /\.gz$/;
    my $file = $archive;
    $file =~ s/\.gz$//;

    # external call to the gunzip utility,
    # such that we keep the archive
    system( "gunzip -c \"${dir}/${archive}\" > \"${dir}/${file}\"" );

    # object that parses genbank files,
    # returns Bio::Seq objects
    my $reader = Bio::SeqIO->new(
        '-format' => 'genbank',
        '-file'   => "${dir}/${file}"
    );

    # object that receives Bio::Seq objects,
    # writes genbank files
    my $writer = Bio::SeqIO->new(
        '-format' => 'genbank',
        '-file'   => ">${out}/${file}",
    );

    while ( my $seq = $reader->next_seq ) {
        my $name = $seq->species->binomial;
        if ( $name ne 'Homo sapiens' ) {

            # search for coding sequences among the features
            my $HasCDS = 0;
            FEATURE: for my $f ( $seq->get_SeqFeatures ) {
                if ( $f->primary_tag eq 'CDS' ) {
                    $HasCDS++;
                    last FEATURE;
                }
            }

            # write the sequence to file
            if ( $HasCDS ) {
                $writer->write_seq( $seq );
            }
        }
    }

    # delete the extracted, unfiltered file
    unlink "${dir}/${file}";
}
################################################

Okay, so it runs out of memory. Can I do something to fix that? Should
I flush on either of the I/O objects after each $seq? Could there be
memory leaks in the Bio::Seq objects? Should I $seq->DESTROY them
explicitly or something like that?

Thanks,

Rutger

-- 
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com

From David.Messina at sbc.su.se Sat Aug 9 07:04:04 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 9 Aug 2008 13:04:04 +0200
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
Message-ID: <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>

Hi Rutger,

I ran your script on the same genbank file and, while I did not run out
of memory, I did see what appears to be a memory leak. Even when I
manually undef'd the reader and writer objects every 1000 records,
memory usage continued to grow.

I can't quite figure out what's going on, though. If I run a different
program using SeqIO (the simple sequence converter from the SeqIO HOWTO)
on the same input file, I don't see this same runaway growth.

Also, the problem seems a lot worse on perl 5.10 than on 5.8.8; on 5.8.8
the sequence converter holds steady at about 12MB of real memory,
whereas on 5.10 it grows, albeit slowly, for as long as the program is
executing. When I killed it about 20% of the way through the file, it
was up to about 44MB of real memory.

Anyone else have a chance to look at this?

Dave

From rvos at interchange.ubc.ca Sat Aug 9 07:36:20 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Sat, 9 Aug 2008 04:36:20 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
 <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
Message-ID: <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>

Hi Dave,

thanks for the reply. The memory usage is in fact much more atrocious
than just 44 MB: I'm actually looping over all 36 such archives (the
genbank primates), and on my macbook it steadily increases to over 1 GB
of memory.
What seemed to be helping somewhat is to call $reader->flush and $writer->flush after each seq, at least to the extent that I make it through that one file, but last time I tried I didn't get much further: the whole terminal process died shortly after instead. I seem to vaguely recall that even if perl free()'s memory, that doesn't necessarily mean that the memory is returned to the OS for the runtime of the program - depending on the OS and perl version. What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel. Rutger On Sat, Aug 9, 2008 at 4:04 AM, Dave Messina wrote: > Hi Rutger, > I ran your script on the same genbank file and, while I did not run out of > memory, I did see what appears to be a memory leak. Even when I manually > undef'd the reader and writer object every 1000 records, memory usage > continued to grow. > > I can't quite figure out what's going on, though. > If I run a different program using SeqIO (the simple sequence converter from > the SeqIO HOWTO) on the same input file, I don't see this same runaway > growth. > Also, the problem seems a lot worse on perl 5.10 than on 5.8.8; on 5.8.8 the > sequence converter holds steady at about 12MB of real memory, whereas on > 5.10 it grows, albeit slowly, for as long as the program is executing. When > I killed it about 20% of the way through the file, it was up to about 44MB > of real memory. > Anyone else have a chance to look at this? > > Dave > -- Dr. Rutger A. Vos Department of zoology University of British Columbia http://www.nexml.org http://rutgervos.blogspot.com From David.Messina at sbc.su.se Sat Aug 9 08:58:56 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 9 Aug 2008 14:58:56 +0200 Subject: [Bioperl-l] malloc errors while using Bio::SeqIO? 
In-Reply-To: <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com> References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com> <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com> <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com> Message-ID: <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com> > > I seem to vaguely recall that even if perl free()'s memory that doesn't > necessarily mean that the memory is returned to the OS for the runtime of > the program I believe that's correct. > What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel. > perl 5.10 or 5.8.8 on OS X 10.5.4 Intel. Dave From cjfields at illinois.edu Sat Aug 9 09:56:19 2008 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 9 Aug 2008 08:56:19 -0500 Subject: [Bioperl-l] malloc errors while using Bio::SeqIO? In-Reply-To: <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com> References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com> <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com> <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com> <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com> Message-ID: <57147D88-ABE6-44E0-8D76-790B0C735438@illinois.edu> There is definitely a memory leak. I can confirm it on OSX 10.4/10.5 using bioperl-live. I'll try looking into it this weekend, but I can't promise when it'll be fixed; my laptop is on the fritz. chris On Aug 9, 2008, at 7:58 AM, Dave Messina wrote: >> >> I seem to vaguely recall that even if perl free()'s memory that >> doesn't >> necessarily mean that the memory is returned to the OS for the >> runtime of >> the program > > > I believe that's correct. > > > >> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel. >> > > perl 5.10 or 5.8.8 on OS X 10.5.4 Intel. 
> > > Dave From cjfields at illinois.edu Sat Aug 9 10:15:23 2008 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 9 Aug 2008 09:15:23 -0500 Subject: [Bioperl-l] malloc errors while using Bio::SeqIO? In-Reply-To: <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com> References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com> <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com> <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com> <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com> Message-ID: <9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu> Forgot to mention, maybe we can file this as a bug? It's a pretty serious one but it should be easy to narrow down; the change had to be introduced fairly recently. chris On Aug 9, 2008, at 7:58 AM, Dave Messina wrote: >> >> I seem to vaguely recall that even if perl free()'s memory that >> doesn't >> necessarily mean that the memory is returned to the OS for the >> runtime of >> the program > > > I believe that's correct. > > > >> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel. >> > > perl 5.10 or 5.8.8 on OS X 10.5.4 Intel. > > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From hlapp at gmx.net Sat Aug 9 12:00:46 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 9 Aug 2008 12:00:46 -0400 Subject: [Bioperl-l] malloc errors while using Bio::SeqIO? 
In-Reply-To: <9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu> References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com> <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com> <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com> <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com> <9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu> Message-ID: <897A8CAC-EDAF-4F26-B6E3-A8CF0F918A70@gmx.net> This smells of circular references somewhere. I think the first point I would go looking is the species storing - does the problem go away if you turn that off? Maybe the version of weaken() is at play here? -hilmar On Aug 9, 2008, at 10:15 AM, Chris Fields wrote: > Forgot to mention, maybe we can file this as a bug? It's a pretty > serious one but it should be easy to narrow down; the change had to > be introduced fairly recently. > > chris > > On Aug 9, 2008, at 7:58 AM, Dave Messina wrote: > >>> >>> I seem to vaguely recall that even if perl free()'s memory that >>> doesn't >>> necessarily mean that the memory is returned to the OS for the >>> runtime of >>> the program >> >> >> I believe that's correct. >> >> >> >>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel. >>> >> >> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel. >> >> >> Dave >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Marie-Claude Hofmann > College of Veterinary Medicine > University of Illinois Urbana-Champaign > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Aug 9 12:07:30 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 9 Aug 2008 12:07:30 -0400 Subject: [Bioperl-l] Finding possible primers regex In-Reply-To: <33A8975C-2A88-4697-8298-7D16CB03CEAE@uiuc.edu> References: <18792782.post@talk.nabble.com> <33A8975C-2A88-4697-8298-7D16CB03CEAE@uiuc.edu> Message-ID: <591AE8EB-4D45-4859-A93E-EA9BF01CA9C6@gmx.net> This looks like a neat trick. Do you think it's worth including as a SimpleAlign method (obviously w/o the printing to STDOUT)? I can imagine that a lot of people might appreciate it. -hilmar On Aug 4, 2008, at 12:08 AM, Chris Fields wrote: > On Aug 2, 2008, at 3:05 PM, Benbo wrote: > >> >> Hi there, >> I'm trying to write a perl script to scan an aligned multiple entry >> fasta >> file and find possible primers. So far I've produced a string which >> contains >> bases which match all sequences and * where they don't match e.g. >> 1) TTAGCCTAA >> 2) TTAGCAGAA >> 3) TTACCCTAA >> >> would give TTA*C**AA. >> >> I want to parse this string and pull out all sequences which are >> 18-21 bp in >> length and have no more than 4 * in them. >> >> So far, I've got this: >> >> while($fragment_match =~ /([GTAC*]{18,21})/g){ >> print "$1\n"; >> } >> >> hoping to match all fragments 18-21 characters in length. However >> even that >> doesn't work as it has essentially chunked it into 21 char blocks, >> rather >> than what I hoped for of >> 0-18 >> 0-19 >> 0-20 >> 0-21 >> 1-19 >> 1-20 >> 1-21 >> 1-22 >> >> etc. 
>> >> Can anyone let me know if this is already possible in BioPerl, or >> how one >> would go about it with regex. Sadly I'm fairly new to perl and >> getting to >> grips with BioPerl, so please treat me gently :). >> >> Many thanks, >> >> Ben > > There is a trick to this which is discussed more extensively in > 'Mastering Regular Expressions'. Essentially you have to embed code > into the regex and trick the parser into backtracking using a > negative lookahead. The match itself fails (i.e. no match is > returned), but the embedded code is executed for each match attempt, > > The following script is a slight modification of one I used which > checks the consensus string from the input alignment (in aligned > FASTA format here), extracts the alignment slice using that match, > then spit the alignment out to STDOUT in clustalw format. This > should work for perl 5.8 and up, but it's only been tested on perl > 5.10. You should be able to use this to fit what you want. > > my $in = Bio::AlignIO->new(-file => $file, > -format => 'fasta'); > my $out = Bio::AlignIO->new(-fh => \*STDOUT, > -format => 'clustalw'); > > while (my $aln = $in->next_aln) { > my $c = $aln->consensus_string(100); > my @matches; > $c =~ m/ > ([GTAC?]{18,21}) > (?{my $match = check_match($1); > push @matches, [$match, > pos(), > length($match)] > if defined $match;}) > (?!) 
> /xig; > for my $match (@matches) { > my ($hit, $st, $end) = ($match->[0], > $match->[1] - $match->[2] + 1, > $match->[1]); > my $newaln = $aln->slice($st, $end); > $out->write_aln($newaln); > } > } > > sub check_match { > my $match = shift; > return unless $match; > my $ct = $match =~ tr/?/?/; > return $match if $ct <= 4; > } > > > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From rvos at interchange.ubc.ca Sat Aug 9 13:47:33 2008 From: rvos at interchange.ubc.ca (Rutger Vos) Date: Sat, 9 Aug 2008 10:47:33 -0700 Subject: [Bioperl-l] malloc errors while using Bio::SeqIO? In-Reply-To: <897A8CAC-EDAF-4F26-B6E3-A8CF0F918A70@gmx.net> References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com> <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com> <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com> <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com> <9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu> <897A8CAC-EDAF-4F26-B6E3-A8CF0F918A70@gmx.net> Message-ID: <2bb9b24a0808091047t46a6bfa8r7e11a3a1537180@mail.gmail.com> I am sure my version of weaken() works as advertised. Is there a way to turn off species storing from outside the code base or do you mean I go and start commenting bits out in Bio::SeqIO::genbank (or wherever)? On Sat, Aug 9, 2008 at 9:00 AM, Hilmar Lapp wrote: > This smells of circular references somewhere. I think the first point I > would go looking is the species storing - does the problem go away if you > turn that off? Maybe the version of weaken() is at play here? > > -hilmar > > On Aug 9, 2008, at 10:15 AM, Chris Fields wrote: > >> Forgot to mention, maybe we can file this as a bug? 
It's a pretty serious >> one but it should be easy to narrow down; the change had to be introduced >> fairly recently. >> >> chris >> >> On Aug 9, 2008, at 7:58 AM, Dave Messina wrote: >> >>>> >>>> I seem to vaguely recall that even if perl free()'s memory that doesn't >>>> necessarily mean that the memory is returned to the OS for the runtime >>>> of >>>> the program >>> >>> >>> I believe that's correct. >>> >>> >>> >>>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel. >>>> >>> >>> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel. >>> >>> >>> Dave >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Marie-Claude Hofmann >> College of Veterinary Medicine >> University of Illinois Urbana-Champaign >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > -- Dr. Rutger A. Vos Department of zoology University of British Columbia http://www.nexml.org http://rutgervos.blogspot.com From hartzell at alerce.com Sat Aug 9 14:33:51 2008 From: hartzell at alerce.com (George Hartzell) Date: Sat, 9 Aug 2008 11:33:51 -0700 Subject: [Bioperl-l] malloc errors while using Bio::SeqIO? In-Reply-To: <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com> References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com> <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com> <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com> Message-ID: <18589.58127.57270.352974@almost.alerce.com> I'm pretty sure that this fixes the problem: g. 
Index: Bio/Species.pm
===================================================================
--- Bio/Species.pm (revision 14791)
+++ Bio/Species.pm (working copy)
@@ -340,6 +340,7 @@
         }

         $self->{_species} = $species;
+        weaken($self->{tree}->{'_rootnode'}) unless isweak($self->{tree}->{'_rootnode'});
     }
     return $self->{_species};
 }

From cjfields at illinois.edu Sat Aug 9 15:08:27 2008
From: cjfields at illinois.edu (Christopher Fields)
Date: Sat, 9 Aug 2008 14:08:27 -0500 (CDT)
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
Message-ID: <20080809140827.BHN28056@expms6.cites.uiuc.edu>

I'm pretty sure it's not due to a particular version of weaken(), though
it does sound like a circular reference issue. I have tried this with
perl 5.8.6, 5.8.8, and 5.10 (all Mac OS, either 10.4 or 10.5); all have
the same memory leak issues.

You can try using SeqBuilder to get rid of the Bio::Species object. I'll
give that a try when I can. Unfortunately my laptop is now with the
local Apple geniuses awaiting a motherboard, so I can't get to it right
away (I'll give it a try on my wife's laptop).

chris

---- Original message ----
>Date: Sat, 9 Aug 2008 10:47:33 -0700
>From: "Rutger Vos"
>Subject: Re: [Bioperl-l] malloc errors while using Bio::SeqIO?
>To: "Hilmar Lapp"
>Cc: Chris Fields , bioperl list
>
>I am sure my version of weaken() works as advertised. Is there a way
>to turn off species storing from outside the code base or do you mean
>I go and start commenting bits out in Bio::SeqIO::genbank (or
>wherever)?
>
>On Sat, Aug 9, 2008 at 9:00 AM, Hilmar Lapp wrote:
>> This smells of circular references somewhere. I think the first point I
>> would go looking is the species storing - does the problem go away if you
>> turn that off? Maybe the version of weaken() is at play here?
>>
>> -hilmar
>>
>> On Aug 9, 2008, at 10:15 AM, Chris Fields wrote:
>>
>>> Forgot to mention, maybe we can file this as a bug?
It's a pretty serious >>> one but it should be easy to narrow down; the change had to be introduced >>> fairly recently. >>> >>> chris >>> >>> On Aug 9, 2008, at 7:58 AM, Dave Messina wrote: >>> >>>>> >>>>> I seem to vaguely recall that even if perl free()'s memory that doesn't >>>>> necessarily mean that the memory is returned to the OS for the runtime >>>>> of >>>>> the program >>>> >>>> >>>> I believe that's correct. >>>> >>>> >>>> >>>>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel. >>>>> >>>> >>>> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel. >>>> >>>> >>>> Dave >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. Marie-Claude Hofmann >>> College of Veterinary Medicine >>> University of Illinois Urbana-Champaign >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> ================================================= ========== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> ================================================= ========== >> >> >> >> > > > >-- >Dr. Rutger A. Vos >Department of zoology >University of British Columbia >http://www.nexml.org >http://rutgervos.blogspot.com >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From hartzell at alerce.com Sat Aug 9 20:17:52 2008 From: hartzell at alerce.com (George Hartzell) Date: Sat, 9 Aug 2008 17:17:52 -0700 Subject: [Bioperl-l] malloc errors while using Bio::SeqIO? 
In-Reply-To: <18589.58127.57270.352974@almost.alerce.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
 <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
 <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
 <18589.58127.57270.352974@almost.alerce.com>
Message-ID: <18590.13232.892714.952555@almost.alerce.com>

George Hartzell writes:
 >
 > I'm pretty sure that this fixes the problem:
 >
 > g.
 >
 > Index: Bio/Species.pm
 > ===================================================================
 > --- Bio/Species.pm (revision 14791)
 > +++ Bio/Species.pm (working copy)
 > @@ -340,6 +340,7 @@
 >          }
 >
 >          $self->{_species} = $species;
 > +        weaken($self->{tree}->{'_rootnode'}) unless isweak($self->{tree}->{'_rootnode'});
 >      }
 >      return $self->{_species};
 >  }

Actually, it's a bit clearer with the weaken moved up in the block so
that it's closer to where the new tree is allocated.

Chris suggested that I go ahead and commit it.

g.

From David.Messina at sbc.su.se Sun Aug 10 05:57:07 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 10 Aug 2008 11:57:07 +0200
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <628aabb70808100257o1c905255vf1d3a6b9912e21de@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
 <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
 <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
 <18589.58127.57270.352974@almost.alerce.com>
 <18590.13232.892714.952555@almost.alerce.com>
 <628aabb70808100257o1c905255vf1d3a6b9912e21de@mail.gmail.com>
Message-ID: <18591.7323.244987.436383@almost.alerce.com>

Dave Messina writes:
 > Nice, George -- holds steady at about 32MB now.
 > Much better. :)

Good to hear.

Bonus points go to rvos@ for providing such a nice clean bug report and
test case; it made running it down much more appealing.

g.

From valiente at lsi.upc.edu Mon Aug 11 04:09:37 2008
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Mon, 11 Aug 2008 11:09:37 +0300
Subject: [Bioperl-l] get_lca method very slow on many nodes
In-Reply-To:
References:
Message-ID:

Despite the speedup for merge_lineage, the get_lca method still runs
very slowly on a large number of nodes (say, 1500 nodes), and it does
not rely on merge_lineage. In the get_lca method, all the lineages are
first collected in @paths in order to later find their $lca, while it
might be faster to process each $path as soon as it is obtained with the
get_lineage_nodes method.

Any other ideas on how to speed up the get_lca method?

Thanks,

Gabriel

From lmanchon at univ-montp2.fr Mon Aug 11 12:32:20 2008
From: lmanchon at univ-montp2.fr (Laurent Manchon)
Date: Mon, 11 Aug 2008 18:32:20 +0200
Subject: [Bioperl-l] protein pattern scan
Message-ID: <5.0.2.1.2.20080811182952.00bebff0@pop.univ-montp2.fr>

Hi,

do you know if it's possible to search for a protein motif in a
multifasta protein file using bioperl, returning the motif, the position
and the name of the corresponding sequence?

thank you for your help.
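
For illustration, a minimal sketch of such a scan in plain Perl, so that it runs without a BioPerl installation. Everything named here is an assumption and not from the thread: `scan_motif` is a hypothetical helper, the two sequences are made up, and the pattern is the PROSITE-style N-glycosylation motif N-{P}-[ST]-{P} written as a regex. A lookahead is used so that overlapping occurrences are reported too.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): returns one
# [id, start, matched text] triple per occurrence of $pattern in
# $sequence. Start positions are 1-based.
sub scan_motif {
    my ( $id, $sequence, $pattern ) = @_;
    my @hits;

    # The lookahead matches zero characters, so the regex engine moves
    # on by one position after each hit and overlapping motifs are found.
    while ( $sequence =~ /(?=($pattern))/g ) {
        # $-[0] is the 0-based offset at which this match starts
        push @hits, [ $id, $-[0] + 1, $1 ];
    }
    return @hits;
}

# Illustrative data only: [ sequence name, protein sequence ]
my @proteins = (
    [ 'seq1', 'MKNGTAANLSQ' ],
    [ 'seq2', 'MPNPSTK' ],
);

# N-{P}-[ST]-{P} expressed as a Perl regex
my $motif = qr/N[^P][ST][^P]/;

for my $protein (@proteins) {
    my ( $id, $sequence ) = @$protein;
    for my $hit ( scan_motif( $id, $sequence, $motif ) ) {
        print join( "\t", @$hit ), "\n";
    }
}
```

With BioPerl, the name and the sequence string would presumably come from a Bio::SeqIO FASTA reader (next_seq, then display_id and seq on each sequence object) rather than from the inline list.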
+---------------------------------------------+
 Laurent Manchon
 Email: lmanchon at univ-montp2.fr
+---------------------------------------------+

From cjfields at illinois.edu Mon Aug 11 13:32:05 2008
From: cjfields at illinois.edu (Christopher Fields)
Date: Mon, 11 Aug 2008 12:32:05 -0500 (CDT)
Subject: [Bioperl-l] protein pattern scan
Message-ID: <20080811123205.BHO45474@expms6.cites.uiuc.edu>

This is covered in the FAQ:

http://www.bioperl.org/wiki/FAQ#How_do_I_do_motif_searches_with_BioPerl.3F_Can_I_do_.22find_all_sequences_that_are_75.25_identical.22_to_a_given_motif.3F

chris

---- Original message ----
>Date: Mon, 11 Aug 2008 18:32:20 +0200
>From: Laurent Manchon
>Subject: [Bioperl-l] protein pattern scan
>To: bioperl-l at lists.open-bio.org
>
>Hi,
>
>do you know if it's possible to search protein motif in a multifasta
>protein file
>using bioperl to return the motif, the position and the name of the
>corresponding sequence ?
>
>thank you for your help.
>
>
>+---------------------------------------------+
> Laurent Manchon
> Email: lmanchon at univ-montp2.fr
>+---------------------------------------------+
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bix at sendu.me.uk Mon Aug 11 13:44:37 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 11 Aug 2008 18:44:37 +0100
Subject: [Bioperl-l] get_lca method very slow on many nodes
In-Reply-To:
References:
Message-ID: <48A07A85.6050601@sendu.me.uk>

Gabriel Valiente wrote:
> Despite the speedup for merge_lineage, the get_lca method still runs
> very slow on a large number of nodes (say, 1500 nodes) and it does not
> rely on merge_lineage. In the get_lca method, all the lineages are first
> collected in @paths in order to later find their $lca, while it might be
> faster to process each $path as soon as it is obtained with the
> get_lineage_nodes method.
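
The idea in the quoted paragraph can be sketched in plain Perl roughly as follows. This is not the code in Bio/Tree/TreeFunctionsI.pm: `incremental_lca` is a hypothetical helper, lineages are assumed to be plain arrays of node names ordered root first, and the taxa are illustrative. The point is only that a running common prefix can be trimmed as each path arrives, so the full list of paths never has to be held or compared at the end.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: each argument is an array ref holding one
# lineage, root first and leaf last. Returns the deepest node shared by
# all lineages, or undef if they share nothing.
sub incremental_lca {
    my @lineages = @_;
    return unless @lineages;

    # Start with a copy of the first lineage as the running common prefix.
    my @common = @{ shift @lineages };

    for my $path (@lineages) {
        # Count how many leading nodes the prefix shares with this path.
        my $i = 0;
        $i++ while $i < @common
                && $i < @$path
                && $common[$i] eq $path->[$i];

        # Trim the prefix to the shared part.
        $#common = $i - 1;

        # Nothing shared any more, so later paths cannot change the result.
        last unless @common;
    }
    return @common ? $common[-1] : undef;
}

my $lca = incremental_lca(
    [qw(root Eukaryota Metazoa Chordata Homo)],
    [qw(root Eukaryota Metazoa Arthropoda Drosophila)],
    [qw(root Eukaryota Fungi Saccharomyces)],
);
print defined $lca ? "$lca\n" : "no common ancestor\n";
```

In a real tree, each lineage would be fetched and folded into the running prefix one node at a time, and whether that is actually faster than the current approach would still need measuring.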
If you try that idea out and it works, please do commit it. I've no further suggestions atm, but I haven't had a chance to look at it to remind myself what happens. From cjfields at illinois.edu Mon Aug 11 15:50:38 2008 From: cjfields at illinois.edu (Christopher Fields) Date: Mon, 11 Aug 2008 14:50:38 -0500 (CDT) Subject: [Bioperl-l] Finding possible primers regex Message-ID: <20080811145038.BHO59267@expms6.cites.uiuc.edu> When I can I could try generating a method which accepts a regex/Bio::Tools::SeqPattern and returns an AlignIO stream or array of SimpleAlign instances (the former could be attached to a temp file for iteration). Any preference? chris ---- Original message ---- >Date: Sat, 9 Aug 2008 12:07:30 -0400 >From: Hilmar Lapp >Subject: Re: [Bioperl-l] Finding possible primers regex >To: Chris Fields >Cc: Benbo , Bioperl-l at lists.open-bio.org > >This looks like a neat trick. Do you think it's worth including as a >SimpleAlign method (obviously w/o the printing to STDOUT)? I can >imagine that a lot of people might appreciate it. > > -hilmar > >On Aug 4, 2008, at 12:08 AM, Chris Fields wrote: > >> On Aug 2, 2008, at 3:05 PM, Benbo wrote: >> >>> >>> Hi there, >>> I'm trying to write a perl script to scan an aligned multiple entry >>> fasta >>> file and find possible primers. So far I've produced a string which >>> contains >>> bases which match all sequences and * where they don't match e.g. >>> 1) TTAGCCTAA >>> 2) TTAGCAGAA >>> 3) TTACCCTAA >>> >>> would give TTA*C**AA. >>> >>> I want to parse this string and pull out all sequences which are >>> 18-21 bp in >>> length and have no more than 4 * in them. >>> >>> So far, I've got this: >>> >>> while($fragment_match =~ /([GTAC*]{18,21})/g){ >>> print "$1\n"; >>> } >>> >>> hoping to match all fragments 18-21 characters in length. 
However >>> even that >>> doesn't work as it has essentially chunked it into 21 char blocks, >>> rather >>> than what I hoped for of >>> 0-18 >>> 0-19 >>> 0-20 >>> 0-21 >>> 1-19 >>> 1-20 >>> 1-21 >>> 1-22 >>> >>> etc. >>> >>> Can anyone let me know if this is already possible in BioPerl, or >>> how one >>> would go about it with regex. Sadly I'm fairly new to perl and >>> getting to >>> grips with BioPerl, so please treat me gently :). >>> >>> Many thanks, >>> >>> Ben >> >> There is a trick to this which is discussed more extensively in >> 'Mastering Regular Expressions'. Essentially you have to embed code >> into the regex and trick the parser into backtracking using a >> negative lookahead. The match itself fails (i.e. no match is >> returned), but the embedded code is executed for each match attempt, >> >> The following script is a slight modification of one I used which >> checks the consensus string from the input alignment (in aligned >> FASTA format here), extracts the alignment slice using that match, >> then spit the alignment out to STDOUT in clustalw format. This >> should work for perl 5.8 and up, but it's only been tested on perl >> 5.10. You should be able to use this to fit what you want. >> >> my $in = Bio::AlignIO->new(-file => $file, >> -format => 'fasta'); >> my $out = Bio::AlignIO->new(-fh => \*STDOUT, >> -format => 'clustalw'); >> >> while (my $aln = $in->next_aln) { >> my $c = $aln->consensus_string(100); >> my @matches; >> $c =~ m/ >> ([GTAC?]{18,21}) >> (?{my $match = check_match($1); >> push @matches, [$match, >> pos(), >> length($match)] >> if defined $match;}) >> (?!) 
>> /xig; >> for my $match (@matches) { >> my ($hit, $st, $end) = ($match->[0], >> $match->[1] - $match->[2] + 1, >> $match->[1]); >> my $newaln = $aln->slice($st, $end); >> $out->write_aln($newaln); >> } >> } >> >> sub check_match { >> my $match = shift; >> return unless $match; >> my $ct = $match =~ tr/?/?/; >> return $match if $ct <= 4; >> } >> >> >> chris >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >-- >=========================================================== >: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >=========================================================== > > > From hlapp at gmx.net Mon Aug 11 22:35:13 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 11 Aug 2008 22:35:13 -0400 Subject: [Bioperl-l] Finding possible primers regex In-Reply-To: <20080811145038.BHO59267@expms6.cites.uiuc.edu> References: <20080811145038.BHO59267@expms6.cites.uiuc.edu> Message-ID: Actually, now that you ask I'm wondering whether one wouldn't sometimes want to retain the relationship between the match and the resulting spliced alignment? If so, neither AlignIO nor array would accomplish that, right? Other than that I myself don't have a strong preference either way. I suppose AlignIO stream is somewhat more extensible, since as you say it could be coupled to a file if the resulting set of alignments is really large. -hilmar On Aug 11, 2008, at 3:50 PM, Christopher Fields wrote: > When I can I could try generating a method which accepts a regex/ > Bio::Tools::SeqPattern and returns an AlignIO stream or array of > SimpleAlign instances (the former could be attached to a temp file > for iteration). Any preference? 

--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================

From mirhan at indiana.edu  Mon Aug 11 23:46:35 2008
From: mirhan at indiana.edu (Han, Mira)
Date: Mon, 11 Aug 2008 23:46:35 -0400
Subject:
[Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To:
Message-ID:

Hi,

Yes it is true that it's similar to get_all_Annotations, it's basically a recursive version of it. I wanted to provide a method to get at nested annotations without going through all the "if (isa collection) do recursive call" steps every time, because most of the xml elements are implemented as nested annotation collections to the nodes. (I am contemplating using tagtrees instead of nested annotation collections in the future, but as of now, Annotation::tagtrees was documented as a temporary implementation, so I passed on that option.)

I forgot about the interface part. At least for my purpose I would think it's a good function to have in the interface. I agree that adding a recursive option to get_all_Annotations would be better.

Mira

On 8/11/08 11:28 PM, "Hilmar Lapp" wrote:

Hi Mira -

On Aug 11, 2008, at 11:33 AM, Han, Mira wrote:

> Added get_deep_Annotations in Annotation::Collection
> in order to get annotations that are within nested collections.

I hope I'm not contradicting Chris here, but we will probably want to think about this a bit more. Your implementation won't work as it is assuming an interface function that isn't defined on the interface (both get_deep_Annotations() and _deep_annotation_helper()). Also, it does nearly the same as get_all_Annotations(), and passing on the keys to nested collections should maybe simply be an option to that method. Alternatively, one could add an option -recurse to get_Annotations.

The other difference you note is that your method does not flatten the nested annotations, but unless I am missing something your implementation does flatten annotations from nested collections. So even if we need a separate method for this, something like get_nested_Annotations() would probably be a more appropriate name, and if we do need a separate method, it should be compelling enough to add it to the interface too (as otherwise your code will only work with certain implementation classes).

-hilmar
--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at duke dot edu :
===========================================================

From mirhan at indiana.edu  Tue Aug 12 00:00:28 2008
From: mirhan at indiana.edu (Han, Mira)
Date: Tue, 12 Aug 2008 00:00:28 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <9E53DAE8-3A8F-4EEC-B2B4-741214907D90@duke.edu>
Message-ID:

Oh yes, I meant get_Annotations. I want a get_Annotations that is recursive and passes the keys to the recursive calls.

On 8/11/08 11:54 PM, "Hilmar Lapp" wrote:

Hi Mira -

On Aug 11, 2008, at 11:46 PM, Han, Mira wrote:

> Yes it is true that it's similar to get_all_Annotations, it's
> basically a recursive version of it.

I suppose you mean get_Annotations(), right?
(get_all_Annotations() is already recursive)

-hilmar
--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at duke dot edu :
===========================================================

From hlapp at duke.edu  Mon Aug 11 23:54:43 2008
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 11 Aug 2008 23:54:43 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To:
References:
Message-ID: <9E53DAE8-3A8F-4EEC-B2B4-741214907D90@duke.edu>

Hi Mira -

On Aug 11, 2008, at 11:46 PM, Han, Mira wrote:

> Yes it is true that it's similar to get_all_Annotations, it's
> basically a recursive version of it.

I suppose you mean get_Annotations(), right? (get_all_Annotations() is already recursive)

-hilmar
--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at duke dot edu :
===========================================================

From mrphysh at juno.com  Tue Aug 12 10:30:36 2008
From: mrphysh at juno.com (mrphysh at juno.com)
Date: Tue, 12 Aug 2008 14:30:36 GMT
Subject: [Bioperl-l] Can't locate IO/String.pm._._..install problem
Message-ID: <20080812.083036.25924.0@webmail02.vgs.untd.com>

I am studying bioperl and making progress. I have been struggling with the database retrieval from on-line databases. This is an example:

#!/usr/bin/perl -w
use Bio::Perl;
$seq_object = get_sequence('swiss',"ROA1_HUMAN");
write_sequence(">roa1.fasta",'fasta',$seq_object);
exit;

This script gives

Can't locate IO/String.pm in @INC (@INC contains: /etc/perl /usr/local/lib/perl/5.8.8 /usr/local/share/perl/5.8.8 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.8 /usr/share/perl/5.8 /usr/local/lib/site_perl .) at ee_bpo.pl line 12.
BEGIN failed--compilation aborted at ee_bpo.pl line 12.

I have chased around with the paths in @INC, using 'use lib'. This is an install problem. The original installation was with perl Makefile.PL. I reinstalled over the old with cpan. Stuff like this:

cpan> o conf prerequisites_policy follow
cpan> i /bioperl/
cpan> install Bundle::BioPerl
cpan> install B/BI/BIRNEY/bioperl-1.2.1.tar.gz
cpan> force install B/BI/BIRNEY/bioperl-1.2.1.tar.gz

This all seemed to proceed smoothly. This guy did not produce an error:

use Bio::Perl;

I am almost thinking that the problem is with the perl. But regular ftp through perl works:

use Net::FTP;   # I found this in usr/share/perl/5.8.8/Net

As a perl command this module seems to work.

I looked in the archives and found nothing. I think I have done my homework. Any ideas? I run Ubuntu on a pentium III (and love it). The version of Ubuntu is new. The Perl (and MySQL) came with the OS: perl 5.8.8

John Brigham in Denver.

From jay at jays.net  Tue Aug 12 11:08:59 2008
From: jay at jays.net (Jay Hannah)
Date: Tue, 12 Aug 2008 10:08:59 -0500
Subject: [Bioperl-l] Can't locate IO/String.pm._._..install problem
In-Reply-To: <20080812.083036.25924.0@webmail02.vgs.untd.com>
References: <20080812.083036.25924.0@webmail02.vgs.untd.com>
Message-ID:

On Aug 12, 2008, at 2:30 PM, mrphysh at juno.com wrote:
> Can't locate IO/String.pm in @INC
...
> cpan> install Bundle::BioPerl
> cpan> install B/BI/BIRNEY/bioperl-1.2.1.tar.gz
> cpan> force install B/BI/BIRNEY/bioperl-1.2.1.tar.gz
> This all seemed to proceed smoothly

bioperl-1.2.1 is very old. Apparently Bundle::BioPerl is out of date?

Here's lots of info about installing BioPerl:

http://www.bioperl.org/wiki/Getting_BioPerl

I recommend using bioperl-live directly from SVN, but I'm sort of a rebel like that. :)

Alternately, you could try just doing a

cpan> install IO::String

HTH,

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

From heikki at sanbi.ac.za  Thu Aug 14 09:14:48 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 14 Aug 2008 15:14:48 +0200
Subject: [Bioperl-l] TreeFunctionsI::findnode_by_id ?
Message-ID: <200808141514.49124.heikki@sanbi.ac.za>

A generic method for retrieving nodes from Bio::Tree::TreeI objects is Bio::Tree::TreeFunctionsI::find_node. It defaults to searching the 'id' attribute unless a field is given. I can retrieve nodes based on internal id like this:

$tree->find_node(-internal_id => $internal_id);

I now found Bio::Tree::TreeFunctionsI::findnode_by_id() that retrieves by id. However, the POD documentation claims that it retrieves by internal id.

What needs to be done?

A. Fix the doc to speak about id
B. Fix the code to retrieve by internal_id
C. Fix the doc and create findnode_by_internal_id()
C. Remove findnode_by_id() as redundant and confusing
D. Deprecate findnode_by_id() as redundant and confusing

There are no tests for findnode_by_id() which to me tilts selection to D and A for now.

Any other opinions?

-Heikki

--
Heikki Lehvaslaiho    heikki at_sanbi _ac _za
Senior Scientist      skype: heikki_lehvaslaiho
SANBI, South African National Bioinformatics Institute
University of Western Cape, South Africa
Phone: +27 21 959 2096   FAX: +27 21 959 2512

From hlapp at gmx.net  Thu Aug 14 18:28:20 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 14 Aug 2008 18:28:20 -0400
Subject: [Bioperl-l] [Obo-discuss] software developer resources, OBO API?
In-Reply-To: <48A448DD.4000206@psb.ugent.be>
References: <6caff30c0808140627ucdfc25cj7c11a7ffb255c06a@mail.gmail.com> <48A448DD.4000206@psb.ugent.be>
Message-ID: <1CFC1BF0-7718-4641-82DB-C094E4C56A53@gmx.net>

Hi Erick,

how did you determine that go-perl is specific to GO? I've found it to work quite well for any kind of OBO-formatted ontology.

Also, you note that BioPerl doesn't have the ability to write in certain formats, and to intersect and "unify" (would you mind explaining what you mean by that?) ontologies. It seems that your implementation of RDF etc export isn't really reusable or modular in any way, but I'd love to bring the intersection function over to BioPerl (BTW when you decided to roll your own ontology API, did you get the impression that BioPerl isn't receptive to you adding to it?). Would you mind pointing me to the place in the code where I would find that, as I can't seem to find it.

-hilmar

On Aug 14, 2008, at 11:01 AM, Erick Antezana wrote:

> Hi Arne,
>
> if you plan to work with PERL, you might take a look at ONTO-PERL :
>
> http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btn042
> http://search.cpan.org/dist/ONTO-PERL/
> http://search.cpan.org/src/EASR/ONTO-PERL-1.13/doc/example00.html
>
> ONTO-PERL has been used intensively to build the Cell Cycle Ontology.
>
> cheers,
> Erick
>
> Arne Muller wrote:
>> Dear All,
>>
>> I'm new to this list and don't know much about ontologies in general
>> (I worked a bit with GO some time ago).
>>
>> Let me explain my problem: We have several related vocabularies
>> (non-hierarchical and redundant because of different spellings etc
>> ...) to describe organs and tissues in our department, and we need to
>> map each of these vocabs to all of our other legacy vocabs that
>> describe similar concepts. We'd like to use the adult mouse anatomy
>> ontology and modify/extend it with additional terms (if necessary),
>> synonyms and dbXrefs. Most of our vocabs should be mapped as dbXrefs
>> to existing terms in the MA ontology. The goal is that different
>> units in our department use slightly different vocabulary to describe
>> samples, and we now need to link these different systems (always the
>> same old story ... ;-).
>>
>> For the moment I'm not planning to turn our messy legacy vocabs into
>> OBO formatted ontologies and to map them via cross products and the
>> OBO relation ontology - though this might be the most proper way to
>> do ... (comments are welcome).
>>
>> I'll have to write an "easy to use" tool that allows our data curator
>> to easily map the legacy vocabs as dbXrefs of terms in the MA
>> ontology. The question is, how am I gonna do this? I've a fairly good
>> idea of how my software (java webapp) should look like, but are there
>> any APIs and implementations of the OBO model as well as a DB schema
>> and mappings between the model and the schema?
>>
>> I've had a look into the OLS from the EBI that seems to be fairly
>> simple (which is good ;-) and that uses the oboedit.jar somewhere at
>> the back-end. I've also found something like an obo api on
>> http://wiki.geneontology.org/index.php/OBO-Edit:_Getting_the_Source_Code#.28Optional.29_Getting_the_OBO_API_from_Subclipse
>> but so far I've not found any documentation nor examples on how to
>> get started.
>>
>> I'd be happy to hear how developers and bioinformatics people use obo
>> in their own tools (I better ask before going DIY ...).
>>
>> thanks a lot for your comments and help
>> +kind regards,
>>
>> Arne
>>
>> _______________________________________________
>> Obo-discuss mailing list
>> Obo-discuss at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/obo-discuss
>>
>
> --
> ==================================================================
> Erick Antezana                  http://www.cellcycleontology.org
> PhD student
> Tel:+32 (0)9 331 38 24          fax:+32 (0)9 3313809
> VIB Department of Plant Systems Biology, Ghent University
> Technologiepark 927, 9052 Gent, BELGIUM
> erant at psb.ugent.be          http://www.psb.ugent.be/~erant
> ==================================================================

--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================

From mjanis at chem.ucla.edu  Thu Aug 14 19:37:05 2008
From: mjanis at chem.ucla.edu (Michael Janis)
Date: Thu, 14 Aug 2008 16:37:05 -0700
Subject: [Bioperl-l] Code to contribute
Message-ID: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu>

Hi,

I've had some perl code lying around for what seems like forever and I'd like to contribute it to bioperl, if such facilities don't already exist in bioperl. The code implements shuffling (DNA or RNA) keeping the dinucleotide composition (and codon usage) intact through an Eulerian path approach as described in Altschul and Erickson (1985). The code seeds the Eulerian paths by keeping the first and last nucleotide invariant in the shuffle - which has minimal detrimental effects to the purpose of the algorithm, in my experience.

A quick search on the bioperl website shows that there is a mutation.pls script, and facilities for using Sean Eddy's SQUID C library, which implements the same function (I wrote this particular function before I knew how to use C). As such, it's probably not as elegant as Sean Eddy's implementation, but it works - and it's entirely in perl.

The bioperl developer pages suggest a post to the mailing list as the best place to start contributing to bioperl. Is this a useful function to add to the project?
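
A minimal sketch of the kind of shuffle described above, assuming the edge-ordering idea from Altschul and Erickson: pick one exit edge per letter so that every letter leads to the final letter, shuffle the remaining edges, then walk them from the first letter. This is not Michael's code. The names dinucleotide_shuffle(), dinucleotide_counts() and _reaches_last() are hypothetical, the example sequence is made up, and only dinucleotide counts (not codon usage) are preserved here.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Count overlapping dinucleotides, e.g. 'ACGA' gives AC, CG, GA.
sub dinucleotide_counts {
    my ($seq) = @_;
    my %count;
    $count{ substr($seq, $_, 2) }++ for 0 .. length($seq) - 2;
    return \%count;
}

# True if following the chosen exit edge from every letter ends at $last.
sub _reaches_last {
    my ($succ, $exit, $last) = @_;
    my $limit = keys %$succ;          # a longer path must contain a cycle
    for my $v (keys %$exit) {
        my ($u, $steps) = ($v, 0);
        while ($u ne $last) {
            return 0 if ++$steps > $limit;
            $u = $succ->{$u}[ $exit->{$u} ];
        }
    }
    return 1;
}

# Shuffle $seq, keeping its dinucleotide counts and its first and last letters.
sub dinucleotide_shuffle {
    my ($seq) = @_;
    my @s = split //, $seq;
    return $seq if @s < 3;
    my ($first, $last) = @s[0, -1];

    # Edge lists: for each letter, the letters that follow it in the input.
    my %succ;
    push @{ $succ{ $s[$_] } }, $s[$_ + 1] for 0 .. $#s - 1;

    # Choose a random exit edge (by index) for every letter except the
    # final one, retrying until all of them lead to the final letter.
    my %exit;
    do {
        %exit = ();
        for my $v (grep { $_ ne $last } keys %succ) {
            $exit{$v} = int rand @{ $succ{$v} };
        }
    } until _reaches_last(\%succ, \%exit, $last);

    # Shuffle the other edges of each letter and put the exit edge last.
    my %order;
    for my $v (keys %succ) {
        my @edges = @{ $succ{$v} };
        my @tail;
        @tail = splice(@edges, $exit{$v}, 1) if exists $exit{$v};
        $order{$v} = [ shuffle(@edges), @tail ];
    }

    # Walk the ordered edges, starting from the first letter.
    my ($cur, $out) = ($first, $first);
    for (1 .. $#s) {
        $cur = shift @{ $order{$cur} };
        $out .= $cur;
    }
    return $out;
}

my $seq = 'ACGTTGCAACGGTTAACCGGATAT';    # made-up example sequence
print "$seq\n", dinucleotide_shuffle($seq), "\n";
```

Comparing dinucleotide_counts() for the input and the output is a quick way to check that a given shuffle kept the composition intact.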

Best Regards,

Michael

-------------------------------
Michael Janis
mjanis at chem.ucla.edu
-------------------------------

From rvos at interchange.ubc.ca  Thu Aug 14 19:51:43 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Thu, 14 Aug 2008 16:51:43 -0700
Subject: [Bioperl-l] Fwd: Code to contribute
In-Reply-To: <2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu> <2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
Message-ID: <2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>

Sounds exciting! I bet the general advice you'll get is to
i) check out the latest code from svn
ii) see which bioperl objects/interfaces (e.g. Bio::Seq) you'd use to integrate your algorithm into bioperl
iii) write a class that performs the algorithm as some sort of analysis factory taking the sequence object (or ideally object interface) as an input
iv) run that class by the mailing list
v) check it into svn.

On Thu, Aug 14, 2008 at 4:37 PM, Michael Janis wrote:
> Hi,
>
> I've had some perl code lying around for what seems like forever and I'd
> like to contribute it to bioperl, if such facilities don't already exist in
> bioperl. The code implements shuffling (DNA or RNA) keeping the
> dinucleotide composition (and codon usage) intact through an Eulerian path
> approach as described in Altschul and Erickson (1985). The code seeds the
> Eulerian paths by keeping the first and last nucleotide invariant in the
> shuffle - which has minimal detrimental effects to the purpose of the
> algorithm, in my experience.
>
> A quick search on the bioperl website shows that there is a mutation.pls
> script, and facilities for using Sean Eddy's SQUID C library, which
> implements the same function (I wrote this particular function before I knew
> how to use C). As such, it's probably not as elegant as Sean Eddy's
> implementation, but it works - and it's entirely in perl.
>
> The bioperl developer pages suggest a post to the mailing list as the best
> place to start contributing to bioperl. Is this a useful function to add to
> the project?
>
> Best Regards,
>
> Michael
>
> -------------------------------
> Michael Janis
> mjanis at chem.ucla.edu
> -------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com

From mjanis at chem.ucla.edu  Thu Aug 14 19:55:04 2008
From: mjanis at chem.ucla.edu (Michael Janis)
Date: Thu, 14 Aug 2008 16:55:04 -0700
Subject: [Bioperl-l] Fwd: Code to contribute
In-Reply-To: <2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu> <2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com> <2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
Message-ID: <008701c8fe69$2cee6020$86cb2060$@ucla.edu>

Thanks, Rutger, I'll do exactly that! (give me a few days)

Best Regards,

Michael

-------------------------------
Michael Janis
mjanis at chem.ucla.edu
-------------------------------

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rutger Vos
Sent: Thursday, August 14, 2008 4:52 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Fwd: Code to contribute

Sounds exciting! I bet the general advice you'll get is to
i) check out the latest code from svn
ii) see which bioperl objects/interfaces (e.g.
Bio::Seq) you'd use to integrate your algorithm into bioperl
iii) write a class that performs the algorithm as some sort of analysis factory taking the sequence object (or ideally object interface) as an input
iv) run that class by the mailing list
v) check it into svn.

On Thu, Aug 14, 2008 at 4:37 PM, Michael Janis wrote:
> Hi,
>
> I've had some perl code lying around for what seems like forever and I'd
> like to contribute it to bioperl, if such facilities don't already exist in
> bioperl. The code implements shuffling (DNA or RNA) keeping the
> dinucleotide composition (and codon usage) intact through an Eulerian path
> approach as described in Altschul and Erickson (1985). The code seeds the
> Eulerian paths by keeping the first and last nucleotide invariant in the
> shuffle - which has minimal detrimental effects to the purpose of the
> algorithm, in my experience.
>
> A quick search on the bioperl website shows that there is a mutation.pls
> script, and facilities for using Sean Eddy's SQUID C library, which
> implements the same function (I wrote this particular function before I knew
> how to use C). As such, it's probably not as elegant as Sean Eddy's
> implementation, but it works - and it's entirely in perl.
>
> The bioperl developer pages suggest a post to the mailing list as the best
> place to start contributing to bioperl. Is this a useful function to add to
> the project?
>
> Best Regards,
>
> Michael
>
> -------------------------------
> Michael Janis
> mjanis at chem.ucla.edu
> -------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com

--
Dr. Rutger A. Vos
Department of zoology
University of British Co