From bill at genenformics.com Sun Jun 1 00:28:04 2008
From: bill at genenformics.com (bill at genenformics.com)
Date: Sat, 31 May 2008 21:28:04 -0700 (PDT)
Subject: [Bioperl-l] How to extract list of SNPs for a given gene?
In-Reply-To:
References: <6F230E9769AA8D4EB4BC401DF133EDB7180C4A@NIHCESMLBX15.nih.gov>
Message-ID: <61887.98.218.182.229.1212294484.squirrel@webmail.dreamhost.com>

Hi, Abhijit,

Gene2Snp, a standalone application that finds SNPs for given Entrez Genes,
can be freely downloaded from

http://www.genenformics.com/download.html

A sample output is available at

http://www.genenformics.com/Gene2Snp_example_result.txt

This application may consume a lot of CPU and memory, depending on the
complexity of the locus region.

Bill at genenformics.com

> From: artendulkar at gmail.com [mailto:artendulkar at gmail.com]
> Sent: Tuesday, May 20, 2008 4:21 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] How to extract list of SNPs for a given gene?
>
> Hi,
> Can anyone please tell me how to get list of SNPs in any particular gene
> using BioPerl, given NCBI Gene ID?
> Is there any method, which takes NCBI gene ID as argument and returns
> list of SNPs by connecting to dbSNP?
> Thank you.
> Abhijit
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From cjfields at uiuc.edu Sun Jun 1 12:51:38 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 1 Jun 2008 11:51:38 -0500
Subject: [Bioperl-l] Fwd: BPlite
In-Reply-To:
References: <48412383.5080201@ucsf.edu>
Message-ID: <5DB8D706-5312-4D70-B71A-60915A84C825@uiuc.edu>

The problem is that BPlite has been officially deprecated in favor of
Bio::SearchIO BLAST parsing (including Sendu's BLAST-based pull parser).
If there is interest in resurrecting BPlite, we would need someone to
actively maintain it.
chris

On May 31, 2008, at 4:53 PM, Jason Stajich wrote:

>
>
> Begin forwarded message:
>
>> From: Anatoly Urisman
>> Date: May 31, 2008 5:08:03 AM CDT
>> To: jason...
>> Subject: BPlite
>>
>> Hi Jason,
>> I was wondering if you are aware of a fix to the BPlite.pm module
>> that supports the new NCBI blastall output (i.e. reports are not
>> delimited by something like BLASTN 2.2.8 [Jan-05-2004]).
>> Thanks.
>> Anatoly Urisman, MD-PhD
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Tue Jun 3 11:25:26 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 3 Jun 2008 10:25:26 -0500
Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl and the Google Summer of Code
Message-ID:

On behalf of the BioPerl core developers, Google, and the National
Evolutionary Synthesis Center (NESCent), I would like to congratulate Mira
Han on being accepted as a student for the Google Summer of Code (GSoC) and
welcome her to the BioPerl community. Mira's accepted project proposal
involves developing phyloXML support for BioPerl. Following is the proposal
abstract:

"PhyloXML is an XML document model for phylogenetic data that incorporates
various annotation types, including user customized data. The format is
currently not supported by BioPerl. I propose a SAX based data structure and
interface for PhyloXML support in BioPerl. I will use most of the existing
IO structures such as TreeIO and TreeEventBuilder and subclass them to
extend the functions specific to PhyloXML. The objects will be connected to
various existing BioPerl modules, such as SeqI, TaxonI, AnnotationI by
reference in order to accommodate different phyloXML elements."

NESCent, under the Phyloinformatics Summer of Code, is participating as a
mentoring organization in the GSoC for the second year. This year, five
projects (including Mira's) are being funded by Google, with a sixth project
being funded by external sources. Mira's co-mentors for this project are
myself, Jason Stajich, Rutger Vos, and Christian Zmasek (the primary
developer of phyloXML). However, I encourage Mira to ask questions on the
BioPerl mail list for feedback from the greater BioPerl community.

Further information on phyloXML:

http://www.phyloxml.org/

Further information on NESCent's Phyloinformatics Summer of Code, including
all funded projects:

https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2008

Sincerely,

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
The Institute for Genomic Biology
University of Illinois Urbana-Champaign

From miraceti at gmail.com Tue Jun 3 11:58:00 2008
From: miraceti at gmail.com (miraceti)
Date: Tue, 3 Jun 2008 11:58:00 -0400
Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl and the Google Summer of Code
In-Reply-To:
References:
Message-ID:

Thanks for the welcome. I'm very excited to be part of this great community.
Here is the wiki page for the project:

http://www.bioperl.org/wiki/PhyloXML_support_in_BioPerl

I look forward to interacting with you and getting lots of help!

Mira Han

From bamboowarrior at gmail.com Tue Jun 3 16:50:37 2008
From: bamboowarrior at gmail.com (Arkady)
Date: Tue, 3 Jun 2008 15:50:37 -0500
Subject: [Bioperl-l] liftOver API
Message-ID: <91656c3f0806031350i442c6359v9c3461247e4340c6@mail.gmail.com>

Hi folks,

I've occasionally seen references to an Ensembl API or a BioPerl module for
converting between (human) genome assemblies (e.g. hg17 to hg18).
I'm also, of course, aware of liftOver and of the chain file format, but
I'm more interested in the API. Does this still exist? Where can I find
it? If not, does anyone have something that does this?

Cheers,
John Woods

From ousmane.diallo at crchum.qc.ca Tue Jun 3 17:21:42 2008
From: ousmane.diallo at crchum.qc.ca (Ousmane Diallo)
Date: Tue, 03 Jun 2008 17:21:42 -0400
Subject: [Bioperl-l] How to get protein ID and get protein accession from GI
Message-ID: <4845B5E6.4050703@crchum.qc.ca>

Hello,

Could somebody help me get the protein ID and accession using the mRNA GI
or accession?

use Bio::DB::GenBank;

my $db_obj  = Bio::DB::GenBank->new();
my $seq_obj = $db_obj->get_Seq_by_acc('123456'); # I pass the mRNA acc here to get the seq_obj
my $gi      = $seq_obj->primary_id;              # I get the GI here

I need to get the protein ID and protein ACCESSION.

THANKS!!

From pallavi.sarmah at igib.res.in Wed Jun 4 08:33:08 2008
From: pallavi.sarmah at igib.res.in (Pallavi Sarmah)
Date: Wed, 4 Jun 2008 18:03:08 +0530
Subject: [Bioperl-l] Bioperl-ext installation error
Message-ID: <4C33FA201D55F743B5DE794497FCA8971F0C88@n1ex>

Hi,

I am trying to install Bioperl-ext, and when I run Makefile.PL it gives me
the following error:

ERROR from evaluation of
/home/pallavi/Pallavi/downloads/bioperl-ext-1.5.1/Bio/SeqIO/staden/Makefile.PL:
Invalid version '' for Bio::SeqIO::staden::read.

Can anyone let me know the remedy for this? I have been stuck on this for
the last 2-3 days.

Pallavi

From sidd.basu at gmail.com Wed Jun 4 10:54:13 2008
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Wed, 4 Jun 2008 09:54:13 -0500
Subject: [Bioperl-l] Re: How to get protein ID and get protein accession from GI
In-Reply-To: <4845B5E6.4050703@crchum.qc.ca>
References: <4845B5E6.4050703@crchum.qc.ca>
Message-ID: <4846ac97.c505be0a.0446.1dc2@mx.google.com>

Hi,

You have to get the 'Feature' object for that.

On Tue, 03 Jun 2008, Ousmane Diallo wrote:

> hello,
> Could somebody help me on how to get the protein ID and ACCESSION using the mRNA gi or accession.
>
>
> my $db_obj = Bio::DB::GenBank->new();
> my $seq_obj = $db_obj->get_Seq_by_acc('123456') ; # I pass here the mrna acc to get the seq_obj
> my $gi = $seq_obj->primary_id ; # I get here the gi

my ($feat) = grep { $_->primary_tag() eq 'Protein' } $seq_obj->get_SeqFeatures();
print $feat->seq_id(), "\n";

For details and explanations, read the HOWTO here:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

-siddhartha

>
>
> I need to get the protein ID and protein ACCESSION
>
> THANKS!!
>

From jay at jays.net Wed Jun 4 10:19:03 2008
From: jay at jays.net (Jay Hannah)
Date: Wed, 4 Jun 2008 09:19:03 -0500
Subject: [Bioperl-l] How to get protein ID and get protein accession from GI
In-Reply-To: <4845B5E6.4050703@crchum.qc.ca>
References: <4845B5E6.4050703@crchum.qc.ca>
Message-ID: <200F7B8D-3066-48A8-9965-9E4E215749D9@jays.net>

On Jun 3, 2008, at 4:21 PM, Ousmane Diallo wrote:
> Could somebody help me on how to get the protein ID and ACCESSION
> using the mRNA gi or accession.
>
> my $db_obj = Bio::DB::GenBank->new();
> my $seq_obj = $db_obj->get_Seq_by_acc('123456') ; # I pass here the mrna acc to get the seq_obj
> my $gi = $seq_obj->primary_id ; # I get here the gi
>
> I need to get the protein ID and protein ACCESSION

Please provide a real mRNA accession number you're interested in. I prefer
sending example code that I know actually works on your data of interest.
:)

Thanks,

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

From bill at genenformics.com Thu Jun 5 01:16:07 2008
From: bill at genenformics.com (bill at genenformics.com)
Date: Wed, 4 Jun 2008 22:16:07 -0700 (PDT)
Subject: [Bioperl-l] How to get protein ID and get protein accession from GI
Message-ID: <62133.98.218.171.90.1212642967.squirrel@webmail.dreamhost.com>

Hi, Ousmane,

IdConvert, a standalone application that converts protein/nucleotide GIs
and accessions, can be freely downloaded from

http://www.genenformics.com/download.html

A sample output is available at

http://www.genenformics.com/IdConvert_example_result.txt

The following is a test run:

>IdConvert.exe 300,NM_005252,399,NP_001225
#Input     Nuc_GI    Nuc_Acc      Pro_GI   Pro_Acc      Desc
300        299       X59693.1     300      CAA42214.1   ubiquinol--cytochrome c reductase [Bos taurus]
NM_005252  6552332   NM_005252.2  4885241  NP_005243.1  v-fos FBJ murine osteosarcoma viral oncogene homolog [Homo sapiens]
399        399       V00111.1     400      CAA23445.1   unnamed protein product [Bos taurus]
NP_001225  15451858  NM_001234.3  4502589  NP_001225.1  caveolin 3 [Homo sapiens]

Bill at genenformics.com

From bug-bioperl at rt.cpan.org Thu Jun 5 12:01:13 2008
From: bug-bioperl at rt.cpan.org (Cédric Cabau via RT)
Date: Thu, 05 Jun 2008 12:01:13 -0400
Subject: [Bioperl-l] [rt.cpan.org #36480] Bug in Bio::Search::SearchUtils.pm
In-Reply-To: <00ba01c8c725$49c454a0$dd4cfde0$@Cabau@tours.inra.fr>
References: <00ba01c8c725$49c454a0$dd4cfde0$@Cabau@tours.inra.fr>
Message-ID:

Thu Jun 05 12:01:11 2008: Request 36480 was acted upon.
Transaction: Ticket created by Cedric.Cabau at tours.inra.fr
Queue: bioperl
Subject: Bug in Bio::Search::SearchUtils.pm
Broken in: (no value)
Severity: (no value)
Owner: Nobody
Requestors: Cedric.Cabau at tours.inra.fr
Status: new
Ticket

BioPerl version: bioperl-1.5.2_102
Module: Bio::Search::SearchUtils.pm
OS: CentOS Linux version 2.6.18-53.1.14.el5 (gcc version 4.1.2 20070626 (Red Hat 4.1.2-14))
Perl: v5.8.8 built for x86_64-linux-thread-multi

Bug description:

The tile_hsps method checks (among other things) whether the alignment is
ambiguous. Inside the loop

    foreach $hsp ( $sbjct->hsps() ) {

at line 177, the calls $qoverlap = &_adjust_contigs() (line 200) and
$soverlap = &_adjust_contigs() (line 206) determine whether the current
HSP overlaps previous ones on the query and on the subject.

The problem is that, in this loop, $qoverlap and $soverlap only keep the
result of the last comparison, i.e. the last HSP against the previous
ones. After the loop, we find (line 299):

    if($qoverlap) {
        if($soverlap) { $sbjct->ambiguous_aln('qs'); }
        else { $sbjct->ambiguous_aln('q'); }
    } elsif($soverlap) {
        $sbjct->ambiguous_aln('s');
    }

Only the result of the last comparison is stored in $sbjct, so the
ambiguous_aln method of Bio::Search::Hit::GenericHit returns the wrong
value when the alignment has overlapping HSPs but the last HSP does not
overlap any previous one.
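
The failure mode described above can be reproduced outside BioPerl. The sketch below is a simplified stand-in, not the actual tile_hsps code: overlaps_previous() is a hypothetical helper that plays the role _adjust_contigs() has in the report (it returns 1 when the current HSP overlaps one already seen, and records the HSP as a side effect). HSPs are assumed to be plain [start, stop] pairs on a single sequence.

```perl
use strict;
use warnings;

# Hypothetical stand-in for _adjust_contigs(): returns 1 if the
# interval [$start, $stop] overlaps any interval already in @$seen,
# 0 otherwise, and then records the interval as seen (side effect).
sub overlaps_previous {
    my ( $start, $stop, $seen ) = @_;
    my $hit = 0;
    for my $iv (@$seen) {
        $hit = 1 if $start <= $iv->[1] && $stop >= $iv->[0];
    }
    push @$seen, [ $start, $stop ];
    return $hit;
}

# Query coordinates of three HSPs: the second overlaps the first,
# the last one overlaps nothing.
my @hsps = ( [ 1, 100 ], [ 90, 150 ], [ 300, 400 ] );

my ( $last_only, $accumulated ) = ( 0, 0 );
my ( @seen_a, @seen_b );
for my $hsp (@hsps) {
    # Buggy pattern: each iteration overwrites the previous result.
    $last_only = overlaps_previous( @$hsp, \@seen_a );
    # Fixed pattern from the report: keep a running total.
    $accumulated += overlaps_previous( @$hsp, \@seen_b );
}

print "with '=':  ", ( $last_only   ? 'ambiguous' : 'not ambiguous' ), "\n";
print "with '+=': ", ( $accumulated ? 'ambiguous' : 'not ambiguous' ), "\n";
```

With '=' the overlap between the first two HSPs is forgotten once the third HSP is processed. Note that '||=' would not be a safe substitute here: it skips the call once the flag is true, and the helper (like the real one) also has to update its list of contigs on every call.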
To solve this bug, I just changed line 200:

    $qoverlap = &_adjust_contigs('query', $hsp, $qstart, $qstop, \@qcontigs, $max_overlap, $frame, $qstrand);

to

    $qoverlap += &_adjust_contigs('query', $hsp, $qstart, $qstop, \@qcontigs, $max_overlap, $frame, $qstrand);

and line 206:

    $soverlap = &_adjust_contigs('sbjct', $hsp, $sstart, $sstop, \@scontigs, $max_overlap, $frame, $sstrand);

to

    $soverlap += &_adjust_contigs('sbjct', $hsp, $sstart, $sstop, \@scontigs, $max_overlap, $frame, $sstrand);

to keep track of overlaps across the whole HSP screening process.

Regards,

Cedric

--
+--------------------------------------------------------------+
| Cédric Cabau                  INRA - SIGENAE - URA           |
| Tel : 02.47.42.75.42          Fax : 02.47.42.77.78           |
| http://www.sigenae.org        INRA - UR 83 - 37380 Nouzilly  |
+--------------------------------------------------------------+

From jvaughn7 at utk.edu Thu Jun 5 11:53:54 2008
From: jvaughn7 at utk.edu (JustinV)
Date: Thu, 5 Jun 2008 08:53:54 -0700 (PDT)
Subject: [Bioperl-l] updating a reciprocal blast file
Message-ID: <17673277.post@talk.nabble.com>

I have a large reciprocal BLAST file that contains 3 proteomes. I'd like to
integrate another proteome for downstream clustering. I imagine a
command-line script that takes as input the new proteome in FASTA format,
the directory of the old proteomes in FASTA format, and the pre-existing
reciprocal BLAST file, and then performs the proper BLASTs and updates the
pre-existing reciprocal BLAST file accordingly. I am using BLAST locally,
and the downstream processing is done by OrthoMCL. I assume this has been
handled before, but I can't track down the code. If anyone is familiar with
a pre-existing script or has pertinent advice, I'd be much obliged.

Justin
--
View this message in context: http://www.nabble.com/updating-a-reciprocal-blast-file-tp17673277p17673277.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From jason at bioperl.org Thu Jun 5 16:18:40 2008
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 5 Jun 2008 13:18:40 -0700
Subject: [Bioperl-l] updating a reciprocal blast file
In-Reply-To: <17673277.post@talk.nabble.com>
References: <17673277.post@talk.nabble.com>
Message-ID: <167EE308-F6F6-4EE9-A773-255A68906ABB@bioperl.org>

Are you keeping each pairwise comparison in a separate file and then
combining them all?

http://fungalgenomes.org/~stajich/scripts/pairwise_blast_jobs_big.pl

Are you fixing E-values so they are scaled across different-sized
databases? You will probably want to add a Z= parameter to ensure the
values are usable.

I also had to hack OrthoMCL locally to cache things in DB_Files, as it was
too memory-intensive the way it runs on my big datasets.

-jason

From jvaughn7 at utk.edu Fri Jun 6 10:18:16 2008
From: jvaughn7 at utk.edu (JustinV)
Date: Fri, 6 Jun 2008 07:18:16 -0700 (PDT)
Subject: [Bioperl-l] updating a reciprocal blast file
In-Reply-To: <167EE308-F6F6-4EE9-A773-255A68906ABB@bioperl.org>
References: <17673277.post@talk.nabble.com> <167EE308-F6F6-4EE9-A773-255A68906ABB@bioperl.org>
Message-ID: <17693254.post@talk.nabble.com>

Jason,

Thanks for the suggestions. The BLAST report is a single file containing
each query in the 3 proteomes against a database of the 3 proteomes. As
you probably remember, OrthoMCL offers many modes of running that head off
various levels of processing. In any case, the reciprocal BLAST is the
bottleneck. Mode -3 forgoes this step, but it requires a single, exhaustive
BLAST report.

Perhaps I could run the reciprocal BLASTs separately, as you suggested, and
then integrate them with your pairwise_blast_jobs_big.pl. I have to say,
this code looks a little spooky. Wouldn't it be possible to just re-sort
(by normalized E-value, discussed below) and reprint each set of query hits
based on a hash or index of the results of that query against the new
proteome, and then tack the results of the new proteome against the
updated database onto the end of the total BLAST report?

In terms of normalizing the E-value, since I am using a consistent scoring
matrix, can't I just recalculate the scores based on the new database size:

    (new E-value) = (new database size) * (old E-value) / (old database size)

as in http://www.springerlink.com/content/55m318wwqdgtw85h/ (methods) and
elsewhere?

As of yet, I've been satisfied with the run time downstream of the
reciprocal BLAST, but, as I've said, I'm currently only using three plant
(dicot) proteomes.
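
The rescaling proposed above follows from the Karlin-Altschul relation E = K * m * n * exp(-lambda * S). With the same scoring system (same K and lambda), the same query length m and the same raw score S, E is proportional to the database length n. Below is a minimal sketch under those assumptions. Database sizes are taken as total residue counts, BLAST's edge-effect corrections to effective lengths are ignored, rescale_evalue() is a hypothetical helper, and the hits and sizes are made-up illustration values.

```perl
use strict;
use warnings;

# Hypothetical helper: rescale an E-value computed against a database
# of $old_size residues to what it would be against $new_size residues.
# Assumes the same scoring matrix and gap penalties (same K and lambda)
# and ignores edge-effect corrections, so E scales linearly with size.
sub rescale_evalue {
    my ( $evalue, $old_size, $new_size ) = @_;
    die "database sizes must be positive\n"
        unless $old_size > 0 && $new_size > 0;
    return $evalue * $new_size / $old_size;
}

# Made-up hits from a search against the old 3-proteome database.
my @hits = (
    { subject => 'protB', evalue => 2e-10 },
    { subject => 'protA', evalue => 1e-50 },
);

my $old_size = 30_000_000;    # residues in the old database (made up)
my $new_size = 40_000_000;    # residues after adding a 4th proteome (made up)

$_->{evalue} = rescale_evalue( $_->{evalue}, $old_size, $new_size ) for @hits;

# Re-sort best-first, as would be needed before merging in the hits
# against the new proteome.
@hits = sort { $a->{evalue} <=> $b->{evalue} } @hits;
printf "%s\t%.3g\n", $_->{subject}, $_->{evalue} for @hits;
```

Because every old E-value is multiplied by the same factor, rescaling alone never changes the order within one query's hit list. Re-sorting matters only once the hits against the new proteome are merged in. Fixing the database size up front, as with the Z= parameter Jason mentions, would avoid the rescaling step altogether.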
By "hack ORTHOMCL locally to cache things in DB_Files", do you mean
serializing the blastSeq objects from early BLASTs in the blast_parse
subroutine using bioperl-db or something? Maybe this is a dumb assumption.
In any case, I'd be curious to see your modified version of the OrthoMCL
script.

Justin

From whuang.ustc at gmail.com Sun Jun 8 23:27:42 2008
From: whuang.ustc at gmail.com (Wen Huang)
Date: Sun, 8 Jun 2008 22:27:42 -0500
Subject: [Bioperl-l] EMBL format field
Message-ID: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com>

Hi all,

I have an EMBL file from which I want to extract one of the lines:

###file###
ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
XX
PA   AB000170.1
XX
DE   Sus scrofa (pig) endopeptidase 24.16 type M1
XX
OS   Sus scrofa (pig)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus.
OX   NCBI_TaxID=9823;
.........

I want the accession number in the line that starts with PA, AB000170 in
this example.

Can anybody kindly help and tell me which module and method I should use?
I tried various things like $seq_obj->primary_id, display_id,
get_secondary_id, etc., but they did not work...

Thanks a lot!

Wen

From Marc.Logghe at ablynx.com Mon Jun 9 04:47:11 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Mon, 9 Jun 2008 10:47:11 +0200
Subject: [Bioperl-l] EMBL format field
In-Reply-To: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com>
References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com>
Message-ID: <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com>

Hi Wen,

A dump of that sequence object (Data::Dumper is your friend!) reveals that
the PA EMBL field is not saved into the object.
However, you will find the string 'AB000170.1' in the embedded CDS feature,
more precisely as the seq_id of the location object. I don't know whether
that is always the case, but it is in your particular example. So, to get
your hands on that value, you have to do:

my ($cds) = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;
my $parent_id = $cds->location->seq_id;

HTH,
Marc

Marc Logghe
Senior Bioinformatician
Ablynx nv

From hlapp at gmx.net Mon Jun 9 08:30:07 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 9 Jun 2008 08:30:07 -0400
Subject: [Bioperl-l] EMBL format field
In-Reply-To: <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com>
References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com>
Message-ID:

If this is the case with the latest version of BioPerl, it should be filed
as a bug report for the EMBL parser. The ID ought to be reported in
$seq->get_secondary_accessions() (which returns an array). If it isn't, it
sounds like a bug to me.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From whuang.ustc at gmail.com Mon Jun 9 10:05:35 2008
From: whuang.ustc at gmail.com (Wen Huang)
Date: Mon, 9 Jun 2008 09:05:35 -0500
Subject: [Bioperl-l] EMBL format field
In-Reply-To: <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com>
References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com>
Message-ID:

Hi Marc,

Thanks a lot! It does work!!

Wen

From whuang.ustc at gmail.com Mon Jun 9 10:07:28 2008
From: whuang.ustc at gmail.com (Wen Huang)
Date: Mon, 9 Jun 2008 09:07:28 -0500
Subject: [Bioperl-l] EMBL format field
In-Reply-To:
References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com>
Message-ID: <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com>

Hilmar,

I tried that; it did not work. Marc's way does work.
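
If the EMBL parser in a given BioPerl version does not expose the PA line, a parser-independent fallback is to read it straight from the flat file. This is a minimal core-Perl sketch, not a BioPerl method. It assumes one record per string and the standard EMBL layout, where each line starts with a two-letter code followed by spaces. parent_accession() is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: return the accession (version stripped) and the
# full versioned string from the first "PA" line of an EMBL record.
# Returns an empty list if the record has no PA line.
sub parent_accession {
    my ($record) = @_;
    # /m lets ^ match at the start of every line in the record.
    return unless $record =~ /^PA\s+(\S+)/m;
    my $versioned = $1;                                  # e.g. AB000170.1
    ( my $accession = $versioned ) =~ s/\.\d+$//;        # e.g. AB000170
    return ( $accession, $versioned );
}

my $record = <<'EMBL';
ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
XX
PA   AB000170.1
XX
DE   Sus scrofa (pig) endopeptidase 24.16 type M1
EMBL

my ( $acc, $versioned ) = parent_accession($record);
print "$acc ($versioned)\n";
```

To apply it to a multi-record file, one option is to read record by record with `local $/ = "//\n";`, since each EMBL entry ends with a `//` line.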
Thanks, Wen On Jun 9, 2008, at 7:30 AM, Hilmar Lapp wrote: > If this is the case with the latest version of BioPerl it should be > filed as a bug report for the embl parser. The ID ought to be > reported in $seq->get_secondary_accessions() (which returns an > array). If it doesn't, it sounds like a bug to me. > > -hilmar > > On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote: >> Hi Wen, >> A dump of that sequence object (Data::Dumper is your friend !) >> reveals >> that the PA EMBL field is not saved into the object. However, you >> will >> find the string 'AB000170.1' in the embedded CDS feature, more >> precisely >> the seqid of the location object. I don't know whether that is always >> the case, but it is in your particular example. >> So, to get your hands on that value you have to do: >> >> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures; >> my $parent_id = $cds->location->seq_id; >> >> HTH, >> Marc >> >> Marc Logghe >> Senior Bioinformatician >> Ablynx nv >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Wen Huang >>> Sent: Monday, June 09, 2008 5:28 AM >>> To: bioperl-l at lists.open-bio.org >>> Subject: [Bioperl-l] EMBL format field >>> >>> Hi all, >>> >>> I have a EMBL file that I want to extract one of the line >>> >>> ###file### >>> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. >>> XX >>> PA AB000170.1 >>> XX >>> DE Sus scrofa (pig) endopeptidase 24.16 type M1 >>> XX >>> OS Sus scrofa (pig) >>> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; >>> Euteleostomi; >>> Mammalia; >>> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus. >>> OX NCBI_TaxID=9823; >>> ......... >>> >>> I want the accession number in the line that starts with PA, >>> AB000170 >>> in this example. >>> >>> Can anybody kindly help, tell me which module and method I should >>> use? 
>>> I tried various things like $seq_obj -> primary_id, display_id, >>> get_secondary_id, etc.. they did not work... >>> >>> Thanks a lot! >>> >>> Wen >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From cjfields at uiuc.edu Mon Jun 9 14:12:29 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 9 Jun 2008 13:12:29 -0500 Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report In-Reply-To: References: Message-ID: <87FB1AA1-943D-4BCC-8F00-2CC5F05FFEAB@uiuc.edu> [cross-posting to bioperl-l for archiving] On Jun 8, 2008, at 11:32 PM, Han, Mira wrote: > ... > issues: > There are a lot of s when processing the elements, > I tried to make a hash of function references that point to the > member functions, > But when I tried calling it through the hash, it was giving me an > error that I'm trying to call a method on an unblessed object. I ran into something similar when setting up a few SeqIO modules (Bio::SeqIO::gbdriver being one of them), which pass data chunks on to method handlers. It has to do with how you refer to the method when you call it. It's a little tricky because you run into semantic issues with perl's 'hammered-on' OO, but it can be done. If you call using '$self->{lookup}->{$tag}->(@args)' directly, what happens is that the subroutine itself is found and called, since the code reference points straight at it.
However, since you aren't calling from the invocant ($self) directly but rather from a reference in the invocant, it treats the call like a subroutine instead of a method. Therefore no invocant is passed as the first argument (you will instead get either the first element in @args or 'undef' assigned to $self within the method). Not sure if this is supposed to be a feature or a bug. Regardless, any attempt within the method to do something with $self will die with an error such as 'Can't call method ... on unblessed reference' or 'Not a HASH reference'. There are two solutions, both of which work. If you have method references stored in a hash table in the invocant: $self->{lookup}->{tag1} = \&foo; $self->{lookup}->{tag2} = \&bar; .... you can grab the actual code reference (checking using 'exists') and call it on the invocant with method syntax, NOT as a plain code reference. Perl accepts a code reference in the method position and calls it with the invocant as the first argument (I think it's supposed to be DWIM-my): if (exists $self->{lookup}->{$tag}) { my $method = $self->{lookup}->{$tag}; $self->$method(@args); } else {...} The above also works if you use strings in the lookup table which contain the names of the methods (Perl then looks the method up by name, which 'use strict' allows): $self->{lookup}->{tag1} = 'foo'; $self->{lookup}->{tag2} = 'bar'; Alternatively, you can pass the invocant in explicitly (which looks weird to me, hence my above solution): if (exists $self->{lookup}->{$tag}) { $self->{lookup}->{$tag}->($self, @args); } else {...} perl6 fixes a lot of these issues, but of course it won't be out for a while longer. > I'd like to figure out how to do it, > But before that, is hashing really better than lots of if-elses? Using a stack of if-elsifs isn't as efficient as a lookup since you would test each case in succession (so something that is further down the if-elsif chain would have passed through and failed each previous test before succeeding). A lookup table needs only a single hash lookup on the key (tag), however many tags there are.
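A self-contained sketch of the lookup-table dispatch described above. The `Handler` class, its `start_clade`/`start_name` methods, and the `dispatch`/`dispatch_inline` names are hypothetical stand-ins, not BioPerl's TreeIO code. The point it illustrates is that `$self->$handler(@args)` passes the invocant whether `$handler` holds a code reference or a method name, while calling the stored code reference directly does not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Handler;    # hypothetical stand-in for a TreeIO parser class

sub new {
    my ($class) = @_;
    my $self = bless { seen => [] }, $class;
    # Lookup table: XML tag name => code reference or method name
    $self->{lookup} = {
        clade => \&start_clade,    # code reference
        name  => 'start_name',     # method name as a string
    };
    return $self;
}

sub start_clade { my ($self, @args) = @_; push @{ $self->{seen} }, "clade(@args)"; }
sub start_name  { my ($self, @args) = @_; push @{ $self->{seen} }, "name(@args)"; }

sub dispatch {
    my ($self, $tag, @args) = @_;
    return 0 unless exists $self->{lookup}{$tag};    # unknown tag
    my $handler = $self->{lookup}{$tag};
    # Method-call syntax passes $self as the first argument for both a
    # code ref and a method name. Calling $self->{lookup}{$tag}->(@args)
    # instead would run the sub without $self as its first argument.
    $self->$handler(@args);
    return 1;
}

sub dispatch_inline {
    my ($self, $tag, @args) = @_;
    return 0 unless exists $self->{lookup}{$tag};
    # Same call without a temporary variable: ${ \ EXPR } yields the stored
    # value for use in method position. Writing ${ EXPR } (no backslash)
    # would try to dereference the code ref as a scalar ref and die.
    $self->${ \ $self->{lookup}{$tag} }(@args);
    return 1;
}

package main;

my $h = Handler->new;
$h->dispatch(clade => 1);
$h->dispatch(name  => 'A');
$h->dispatch(unknown => 'x');                 # ignored, returns 0
print join(', ', @{ $h->{seen} }), "\n";      # prints "clade(1), name(A)"
```

One hash lookup replaces a chain of string comparisons, and every unknown tag falls through to a single default branch, which keeps the handler list easy to extend.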
An alternative is to use 5.10 features (smart matching and given-when, which is like a switch statement), but that will limit usage for those still using 5.8.8, which is probably a majority of users, since 5.10 came out just last December. chris > > > Mira > > > > On 6/2/08 10:29 AM, "Han, Mira" wrote: > > > > Last week (May 26-30): > 1. made skeleton files for TreeIO:: PhyloEventBuilder, > TreeIO::phyloXML, Tree::NodePhyloXML > 2. managed to connect and load them up but there is a bus error > problem. > I think it's probably due to some of the function calls that I'm > making > That I haven't looked into properly. I'm suspecting it will go away > once I properly > build in the end_element for > > This week (Jun 2-6): > 1. implement start_element, and end_element for and > > - start_element: : add treelevel, : push data > to current_items. > - end_element: : minus treelevel, : pop data > from current_elements, use new() to build node from popped data. > 2. get rid of that bus error > 3. TreeIO::phyloXML::Next_tree() : look for element > _______________________________________________ > Wg-phyloinformatics mailing list > Wg-phyloinformatics at nescent.org > https://lists.nescent.org/mailman/listinfo/wg-phyloinformatics Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From cjfields at uiuc.edu Mon Jun 9 15:08:23 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 9 Jun 2008 14:08:23 -0500 Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report In-Reply-To: References: Message-ID: <763316E4-F480-45FC-BC81-A450D61CAB2D@uiuc.edu> Yes, that works as well. 
chris On Jun 9, 2008, at 1:30 PM, aaron.j.mackey at gsk.com wrote: > How about just: > > $self->${ $self->{lookup}->{tag} }(@args) > > i.e., shorthand for: > > $method = $self->{lookup}->{tag} > $self->$method(@args); > > -Aaron > > wg-phyloinformatics-bounces at nescent.org wrote on 06/09/2008 02:12:29 > PM: > >> [cross-posting to bioperl-l for archiving] >> >> On Jun 8, 2008, at 11:32 PM, Han, Mira wrote: >> >>> ... >>> issues: >>> There are a lot of s when processing the elements, >>> I tried to make a hash of function references that point to the >>> member functions, >>> But when I tried calling it through the hash, it was giving me an >>> error that I'm trying to call a method on an unblessed object. >> >> I ran into something similar when setting up a few SeqIO modules >> (Bio::SeqIO::gbdriver being on of them) which passed on data chunks >> to >> method handlers. It has something to do with how the method is set >> up >> in the class (package) namespace and how you refer to it. It's a >> little tricky b/c you run into semantic issues with perl's 'hammered- >> on' OO, but it can be done. >> >> If you call using '$self->{lookup}->{$tag}->(@args)' directly, what >> happens is you can successfully call the method since you are still >> in >> the proper module namespace. However, since you aren't calling from >> the invocant ($self) directly but rather from a reference in the >> invocant, it treats the call like a subroutine instead of a method. >> Therefore no invocant is passed as the first argument (you will >> instead get either the first element in @args or 'undef' assigned to >> $self within the method). Not sure if this is supposed to be a >> feature or a bug. Regardless, any attempt within the method to do >> something with $self will result in a 'using an unblessed reference' >> or 'not a hash reference'. >> >> There are two solutions, both of which work. 
If you have method >> references stored in a hash table in the invocant: >> >> $self->{lookup}->{tag1} = \&foo; >> $self->{lookup}->{tag2} = \&bar; >> .... >> >> you can grab the actual code reference (checking using 'exists') and >> use it directly on the invocant, but NOT as a code reference. This >> acts as a symbolic reference, which is allowed for subroutine and >> method calls (I think it's supposed to be DWIM-my): >> >> if (exists $self->{lookup}->{$tag}) { >> my $method = $self->{lookup}->{$tag}; >> $self->$method(@args); >> } else {...} >> >> The above also works if you use strings in the lookup table which >> contain the name of the methods (again, symbolic reference): >> >> $self->{lookup}->{tag1} = 'foo'; >> $self->{lookup}->{tag2} = 'bar'; >> >> Alternately, you can pass the invocant in explicitly (which looks >> weird to me, hence my above solution): >> >> if (exists $self->{lookup}->{$tag}) { >> $self->{lookup}->{$tag}->($self, @args); >> } else {...} >> >> perl6 fixes a lot of these issues, but of course it won't be out >> for a >> while longer. >> >>> I'd like to figure out how to do it, >>> But before that, is hashing really better than lots of if-elses? >> >> Using a stack of if-elsifs isn't as efficient as a lookup since you >> would test each case in succession (so something that is further down >> the if-elseif test stack would have passed through and failed each >> previous test case before success). A lookup table would test simply >> based on the existence of a value stored under a key (tag). >> >> An alternative is to use 5.10 features (smart matching and given- >> when, >> which is like a switch statement), but that will limit usage for >> those >> still using 5.8.8, which is probably a majority of users, since 5.10 >> came out just last December. >> >> chris >> >>> >>> >>> Mira >>> >>> >>> >>> On 6/2/08 10:29 AM, "Han, Mira" wrote: >>> >>> >>> >>> Last week (May 26-30): >>> 1. 
made skeleton files for TreeIO:: PhyloEventBuilder, >>> TreeIO::phyloXML, Tree::NodePhyloXML >>> 2. managed to connect and load them up but there is a bus error >>> problem. >>> I think it's probably due to some of the function calls that I'm >>> making >>> That I haven't looked into properly. I'm suspecting it will go away >>> once I properly >>> build in the end_element for >>> >>> This week (Jun 2-6): >>> 1. implement start_element, and end_element for and >>> >>> - start_element: : add treelevel, : push data >>> to current_items. >>> - end_element: : minus treelevel, : pop data >>> from current_elements, use new() to build node from popped data. >>> 2. get rid of that bus error >>> 3. TreeIO::phyloXML::Next_tree() : look for element >>> _______________________________________________ >>> Wg-phyloinformatics mailing list >>> Wg-phyloinformatics at nescent.org >>> https://lists.nescent.org/mailman/listinfo/wg-phyloinformatics >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Marie-Claude Hofmann >> College of Veterinary Medicine >> University of Illinois Urbana-Champaign >> >> >> >> >> _______________________________________________ >> Wg-phyloinformatics mailing list >> Wg-phyloinformatics at nescent.org >> https://lists.nescent.org/mailman/listinfo/wg-phyloinformatics >> > > Christopher Fields Postdoctoral Researcher Lab of Dr. 
Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From aaron.j.mackey at gsk.com Mon Jun 9 14:30:22 2008 From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com) Date: Mon, 9 Jun 2008 14:30:22 -0400 Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report In-Reply-To: <87FB1AA1-943D-4BCC-8F00-2CC5F05FFEAB@uiuc.edu> Message-ID: How about just: $self->${ $self->{lookup}->{tag} }(@args) i.e., shorthand for: $method = $self->{lookup}->{tag}; $self->$method(@args); -Aaron wg-phyloinformatics-bounces at nescent.org wrote on 06/09/2008 02:12:29 PM: > [cross-posting to bioperl-l for archiving] > > On Jun 8, 2008, at 11:32 PM, Han, Mira wrote: > > > ... > > issues: > > There are a lot of s when processing the elements, > > I tried to make a hash of function references that point to the > > member functions, > > But when I tried calling it through the hash, it was giving me an > > error that I'm trying to call a method on an unblessed object. > > I ran into something similar when setting up a few SeqIO modules > (Bio::SeqIO::gbdriver being on of them) which passed on data chunks to > method handlers. It has something to do with how the method is set up > in the class (package) namespace and how you refer to it. It's a > little tricky b/c you run into semantic issues with perl's 'hammered- > on' OO, but it can be done. > > If you call using '$self->{lookup}->{$tag}->(@args)' directly, what > happens is you can successfully call the method since you are still in > the proper module namespace. However, since you aren't calling from > the invocant ($self) directly but rather from a reference in the > invocant, it treats the call like a subroutine instead of a method. > Therefore no invocant is passed as the first argument (you will > instead get either the first element in @args or 'undef' assigned to > $self within the method). Not sure if this is supposed to be a > feature or a bug.
Regardless, any attempt within the method to do > something with $self will result in a 'using an unblessed reference' > or 'not a hash reference'. > > There are two solutions, both of which work. If you have method > references stored in a hash table in the invocant: > > $self->{lookup}->{tag1} = \&foo; > $self->{lookup}->{tag2} = \&bar; > .... > > you can grab the actual code reference (checking using 'exists') and > use it directly on the invocant, but NOT as a code reference. This > acts as a symbolic reference, which is allowed for subroutine and > method calls (I think it's supposed to be DWIM-my): > > if (exists $self->{lookup}->{$tag}) { > my $method = $self->{lookup}->{$tag}; > $self->$method(@args); > } else {...} > > The above also works if you use strings in the lookup table which > contain the name of the methods (again, symbolic reference): > > $self->{lookup}->{tag1} = 'foo'; > $self->{lookup}->{tag2} = 'bar'; > > Alternately, you can pass the invocant in explicitly (which looks > weird to me, hence my above solution): > > if (exists $self->{lookup}->{$tag}) { > $self->{lookup}->{$tag}->($self, @args); > } else {...} > > perl6 fixes a lot of these issues, but of course it won't be out for a > while longer. > > > I'd like to figure out how to do it, > > But before that, is hashing really better than lots of if-elses? > > Using a stack of if-elsifs isn't as efficient as a lookup since you > would test each case in succession (so something that is further down > the if-elseif test stack would have passed through and failed each > previous test case before success). A lookup table would test simply > based on the existence of a value stored under a key (tag). > > An alternative is to use 5.10 features (smart matching and given-when, > which is like a switch statement), but that will limit usage for those > still using 5.8.8, which is probably a majority of users, since 5.10 > came out just last December. 
> > chris > > > > > > > Mira > > > > > > > > On 6/2/08 10:29 AM, "Han, Mira" wrote: > > > > > > > > Last week (May 26-30): > > 1. made skeleton files for TreeIO:: PhyloEventBuilder, > > TreeIO::phyloXML, Tree::NodePhyloXML > > 2. managed to connect and load them up but there is a bus error > > problem. > > I think it's probably due to some of the function calls that I'm > > making > > That I haven't looked into properly. I'm suspecting it will go away > > once I properly > > build in the end_element for > > > > This week (Jun 2-6): > > 1. implement start_element, and end_element for and > > > > - start_element: : add treelevel, : push data > > to current_items. > > - end_element: : minus treelevel, : pop data > > from current_elements, use new() to build node from popped data. > > 2. get rid of that bus error > > 3. TreeIO::phyloXML::Next_tree() : look for element > > _______________________________________________ > > Wg-phyloinformatics mailing list > > Wg-phyloinformatics at nescent.org > > https://lists.nescent.org/mailman/listinfo/wg-phyloinformatics > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Marie-Claude Hofmann > College of Veterinary Medicine > University of Illinois Urbana-Champaign > > > > > _______________________________________________ > Wg-phyloinformatics mailing list > Wg-phyloinformatics at nescent.org > https://lists.nescent.org/mailman/listinfo/wg-phyloinformatics > From miraceti at gmail.com Tue Jun 10 00:37:03 2008 From: miraceti at gmail.com (miraceti) Date: Tue, 10 Jun 2008 00:37:03 -0400 Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report In-Reply-To: <763316E4-F480-45FC-BC81-A450D61CAB2D@uiuc.edu> References: <763316E4-F480-45FC-BC81-A450D61CAB2D@uiuc.edu> Message-ID: Thanks for that, It now works. FYI, my $method = $self->{'_start_element'}->{$reader->name}; $self->$method(); Worked. $self->{'_start_element'}->{$reader->name}($self); Worked. 
$self->${$self->{'_start_element'}->{$reader->name}}; Gave error: Not a SCALAR reference at /usr/local/src/bioperl-live/Bio/TreeIO/phyloxml.pm line 197 $self->${\scalar($self->{'_start_element'}->{$reader->name})}; Worked. I'm using the first way. Mira On Mon, Jun 9, 2008 at 3:08 PM, Chris Fields wrote: > Yes, that works as well. > > chris > > > On Jun 9, 2008, at 1:30 PM, aaron.j.mackey at gsk.com wrote: > > How about just: >> >> $self->${ $self->{lookup}->{tag} }(@args) >> >> i.e., shorthand for: >> >> $method = $self->{lookup}->{tag} >> $self->$method(@args); >> >> -Aaron >> >> wg-phyloinformatics-bounces at nescent.org wrote on 06/09/2008 02:12:29 PM: >> >> [cross-posting to bioperl-l for archiving] >>> >>> On Jun 8, 2008, at 11:32 PM, Han, Mira wrote: >>> >>> ... >>>> issues: >>>> There are a lot of s when processing the elements, >>>> I tried to make a hash of function references that point to the >>>> member functions, >>>> But when I tried calling it through the hash, it was giving me an >>>> error that I'm trying to call a method on an unblessed object. >>>> >>> >>> I ran into something similar when setting up a few SeqIO modules >>> (Bio::SeqIO::gbdriver being on of them) which passed on data chunks to >>> method handlers. It has something to do with how the method is set up >>> in the class (package) namespace and how you refer to it. It's a >>> little tricky b/c you run into semantic issues with perl's 'hammered- >>> on' OO, but it can be done. >>> >>> If you call using '$self->{lookup}->{$tag}->(@args)' directly, what >>> happens is you can successfully call the method since you are still in >>> the proper module namespace. However, since you aren't calling from >>> the invocant ($self) directly but rather from a reference in the >>> invocant, it treats the call like a subroutine instead of a method. 
>>> Therefore no invocant is passed as the first argument (you will >>> instead get either the first element in @args or 'undef' assigned to >>> $self within the method). Not sure if this is supposed to be a >>> feature or a bug. Regardless, any attempt within the method to do >>> something with $self will result in a 'using an unblessed reference' >>> or 'not a hash reference'. >>> >>> There are two solutions, both of which work. If you have method >>> references stored in a hash table in the invocant: >>> >>> $self->{lookup}->{tag1} = \&foo; >>> $self->{lookup}->{tag2} = \&bar; >>> .... >>> >>> you can grab the actual code reference (checking using 'exists') and >>> use it directly on the invocant, but NOT as a code reference. This >>> acts as a symbolic reference, which is allowed for subroutine and >>> method calls (I think it's supposed to be DWIM-my): >>> >>> if (exists $self->{lookup}->{$tag}) { >>> my $method = $self->{lookup}->{$tag}; >>> $self->$method(@args); >>> } else {...} >>> >>> The above also works if you use strings in the lookup table which >>> contain the name of the methods (again, symbolic reference): >>> >>> $self->{lookup}->{tag1} = 'foo'; >>> $self->{lookup}->{tag2} = 'bar'; >>> >>> Alternately, you can pass the invocant in explicitly (which looks >>> weird to me, hence my above solution): >>> >>> if (exists $self->{lookup}->{$tag}) { >>> $self->{lookup}->{$tag}->($self, @args); >>> } else {...} >>> >>> perl6 fixes a lot of these issues, but of course it won't be out for a >>> while longer. >>> >>> I'd like to figure out how to do it, >>>> But before that, is hashing really better than lots of if-elses? >>>> >>> >>> Using a stack of if-elsifs isn't as efficient as a lookup since you >>> would test each case in succession (so something that is further down >>> the if-elseif test stack would have passed through and failed each >>> previous test case before success). 
A lookup table would test simply >>> based on the existence of a value stored under a key (tag). >>> >>> An alternative is to use 5.10 features (smart matching and given-when, >>> which is like a switch statement), but that will limit usage for those >>> still using 5.8.8, which is probably a majority of users, since 5.10 >>> came out just last December. >>> >>> chris >>> >>> >>>> >>>> Mira >>>> >>>> >>>> >>>> On 6/2/08 10:29 AM, "Han, Mira" wrote: >>>> >>>> >>>> >>>> Last week (May 26-30): >>>> 1. made skeleton files for TreeIO:: PhyloEventBuilder, >>>> TreeIO::phyloXML, Tree::NodePhyloXML >>>> 2. managed to connect and load them up but there is a bus error >>>> problem. >>>> I think it's probably due to some of the function calls that I'm >>>> making >>>> That I haven't looked into properly. I'm suspecting it will go away >>>> once I properly >>>> build in the end_element for >>>> >>>> This week (Jun 2-6): >>>> 1. implement start_element, and end_element for and >>>> >>>> - start_element: : add treelevel, : push data >>>> to current_items. >>>> - end_element: : minus treelevel, : pop data >>>> from current_elements, use new() to build node from popped data. >>>> 2. get rid of that bus error >>>> 3. TreeIO::phyloXML::Next_tree() : look for element >>>> _______________________________________________ >>>> Wg-phyloinformatics mailing list >>>> Wg-phyloinformatics at nescent.org >>>> https://lists.nescent.org/mailman/listinfo/wg-phyloinformatics >>>> >>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. Marie-Claude Hofmann >>> College of Veterinary Medicine >>> University of Illinois Urbana-Champaign >>> >>> >>> >>> >>> _______________________________________________ >>> Wg-phyloinformatics mailing list >>> Wg-phyloinformatics at nescent.org >>> https://lists.nescent.org/mailman/listinfo/wg-phyloinformatics >>> >>> >> >> > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Marie-Claude Hofmann > College of Veterinary Medicine > University of Illinois Urbana-Champaign > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From yezhiqiang at gmail.com Tue Jun 10 07:43:50 2008 From: yezhiqiang at gmail.com (Zhi-Qiang Ye) Date: Tue, 10 Jun 2008 19:43:50 +0800 Subject: [Bioperl-l] EMBL format field In-Reply-To: References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> Message-ID: <34198fe40806100443g6c18f47dj881d68c0bf14ba8f@mail.gmail.com> That's weird. I have run into this problem too. I tried an EMBL-format file like this: ID CB271253; SV 1; linear; mRNA; EST; INV; 591 BP. XX AC CB271253; XX DT 24-FEB-2003 (Rel. 74, Created) DT 24-FEB-2003 (Rel. 74, Last updated, Version 1) XX DE taa17c02.x2 Hydra EST -II Hydra magnipapillata cDNA 3' similar to DE SW:OPSD_RABIT P49912 RHODOPSIN. ;, mRNA sequence. from: http://www.ebi.ac.uk/cgi-bin/dbfetch?db=embl&id=CB271253&style=raw The $seq object's ->id and ->display_id are both "unknown id" ... ZQ Ye 2008/6/9 Hilmar Lapp : > If this is the case with the latest version of BioPerl it should be filed as > a bug report for the embl parser. The ID ought to be reported in > $seq->get_secondary_accessions() (which returns an array). If it doesn't, it > sounds like a bug to me. > > -hilmar > > On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote: >> >> Hi Wen, >> A dump of that sequence object (Data::Dumper is your friend !) reveals >> that the PA EMBL field is not saved into the object. However, you will >> find the string 'AB000170.1' in the embedded CDS feature, more precisely >> the seqid of the location object. I don't know whether that is always >> the case, but it is in your particular example.
>> So, to get your hands on that value you have to do: >> >> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures; >> my $parent_id = $cds->location->seq_id; >> >> HTH, >> Marc >> >> Marc Logghe >> Senior Bioinformatician >> Ablynx nv >>> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Wen Huang >>> Sent: Monday, June 09, 2008 5:28 AM >>> To: bioperl-l at lists.open-bio.org >>> Subject: [Bioperl-l] EMBL format field >>> >>> Hi all, >>> >>> I have a EMBL file that I want to extract one of the line >>> >>> ###file### >>> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. >>> XX >>> PA AB000170.1 >>> XX >>> DE Sus scrofa (pig) endopeptidase 24.16 type M1 >>> XX >>> OS Sus scrofa (pig) >>> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; >>> Mammalia; >>> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus. >>> OX NCBI_TaxID=9823; >>> ......... >>> >>> I want the accession number in the line that starts with PA, AB000170 >>> in this example. >>> >>> Can anybody kindly help, tell me which module and method I should use? >>> I tried various things like $seq_obj -> primary_id, display_id, >>> get_secondary_id, etc.. they did not work... >>> >>> Thanks a lot! 
>>> >>> Wen >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jeremydt at gmail.com Tue Jun 10 14:50:19 2008 From: jeremydt at gmail.com (Jeremy Davis-Turak) Date: Tue, 10 Jun 2008 11:50:19 -0700 Subject: [Bioperl-l] Error installing bioperl Message-ID: <378b225b0806101150i5fe6ff2as831d6ee8b5254b4c@mail.gmail.com> Hi, I'm getting the following error, either using CPAN or make with bioperl-1.4 (also with bioperl-1.2) Writing Makefile for Bio make: *** No rule to make target `/usr/lib64/perl5/5.10.0/x86_64-linux-thread-multi/CORE/config.h', needed by `Makefile'. Stop. Can you please help? Thanks, Jeremy From Kevin.M.Brown at asu.edu Tue Jun 10 15:10:40 2008 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Tue, 10 Jun 2008 12:10:40 -0700 Subject: [Bioperl-l] Error installing bioperl In-Reply-To: <378b225b0806101150i5fe6ff2as831d6ee8b5254b4c@mail.gmail.com> References: <378b225b0806101150i5fe6ff2as831d6ee8b5254b4c@mail.gmail.com> Message-ID: <1A4207F8295607498283FE9E93B775B404F2AFCD@EX02.asurite.ad.asu.edu> Bioperl 1.4 is a very old version. 
Try following the install directions at http://www.bioperl.org/wiki/Installing_BioPerl > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Jeremy Davis-Turak > Sent: Tuesday, June 10, 2008 11:50 AM > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] Error installing bioperl > > Hi, I'm getting the following error, either using CPAN or make with > bioperl-1.4 (also with bioperl-1.2) > > Writing Makefile for Bio > make: *** No rule to make target > `/usr/lib64/perl5/5.10.0/x86_64-linux-thread-multi/CORE/config.h', needed by > `Makefile'. Stop. > > > Can you please help? > > Thanks, > > Jeremy > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From heikki at sanbi.ac.za Tue Jun 10 19:22:10 2008 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Wed, 11 Jun 2008 01:22:10 +0200 Subject: [Bioperl-l] A lot of POD fixes in bioperl-live and bioperl run Message-ID: <200806110122.10982.heikki@sanbi.ac.za> I have recently made a lot of fixes to the inline Plain Old Documentation (POD) texts in bioperl-live and bioperl-run. The last ones (hopefully) were committed a few minutes ago. This has resulted in quite large updates from SVN. I want to apologize for the inconvenience and to explain the reasons for these small and pedantic fixes. Unlike Perl, POD is sensitive to whitespace. This makes it relatively difficult to find and fix all minor errors in POD. I've now gone to the trouble of fixing every POD mistake that causes even the smallest warning from podchecker. The main reason for doing this was to reduce the number of warnings reported by pod.pl, the BioPerl maintenance tool. Too many minor warnings make it difficult to recognise more serious errors affecting the integrity and readability of POD documentation.
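A classic error of this kind is an indented block that should print verbatim but has no blank line in front of it. POD sets a paragraph's type from its first line, so the indented lines get folded into the preceding ordinary paragraph and reflowed. The sketch below checks for only that one mistake. `touching_verbatim` is a hypothetical helper written for illustration. It is not part of podchecker or of BioPerl's pod.pl, and it ignores every other kind of POD problem.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical mini-check (not podchecker, not BioPerl's pod.pl): POD decides
# a paragraph's type from its FIRST line, so indented lines inside a paragraph
# that starts unindented are reflowed as ordinary text, not kept verbatim.
# Returns the 1-based numbers of such lines.
sub touching_verbatim {
    my ($pod) = @_;
    my @hits;
    my $state  = 'between';    # 'between', 'ordinary' or 'verbatim'
    my $lineno = 0;
    for my $line (split /\n/, $pod) {
        $lineno++;
        if ($line =~ /^\s*$/) {          # blank line ends the paragraph
            $state = 'between';
            next;
        }
        if ($state eq 'between') {       # first line sets the paragraph type
            $state = $line =~ /^[ \t]/ ? 'verbatim' : 'ordinary';
            next;
        }
        # Indented line glued to an ordinary (or command) paragraph
        push @hits, $lineno if $state eq 'ordinary' && $line =~ /^[ \t]/;
    }
    return @hits;
}

my $pod = join "\n",
    'Parse a report like this:',
    '    my $in = Bio::SearchIO->new(-format => q(blast));',   # no blank line before it
    '',
    'With a blank line it stays verbatim:',
    '',
    '    my $in = Bio::SearchIO->new(-format => q(blast));',
    '';

my @bad = touching_verbatim($pod);
print "Indented line(s) reflowed as ordinary text: @bad\n";   # prints "Indented line(s) reflowed as ordinary text: 2"
```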
One example is a paragraph that was supposed to be verbatim but touches the previous paragraph; the POD engine then reflows it and destroys the intended ASCII graph or table. The only way the POD engine can report this is by warning about unescaped special characters. -Heikki -- Heikki Lehvaslaiho heikki at_sanbi _ac _za Senior Scientist skype: heikki_lehvaslaiho SANBI, South African National Bioinformatics Institute University of Western Cape, South Africa Phone: +27 21 959 2096 FAX: +27 21 959 2512 From jason at bioperl.org Tue Jun 10 19:55:56 2008 From: jason at bioperl.org (Jason Stajich) Date: Tue, 10 Jun 2008 16:55:56 -0700 Subject: [Bioperl-l] EMBL format field In-Reply-To: <34198fe40806100443g6c18f47dj881d68c0bf14ba8f@mail.gmail.com> References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <34198fe40806100443g6c18f47dj881d68c0bf14ba8f@mail.gmail.com> Message-ID: <46E681F5-CB56-4ADA-BEB9-00083CFA78F9@bioperl.org> What version of BioPerl? It works for me: using this code, I get 'CB271253' printed out.

#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

my $in = Bio::SeqIO->new(-format => 'embl', -file => shift);
while( my $seq = $in->next_seq ) {
    print $seq->id,"\n";
}

On Jun 10, 2008, at 4:43 AM, Zhi-Qiang Ye wrote: > That's weird. I also met this problem. I tried a embl-format file > like this: > > ID CB271253; SV 1; linear; mRNA; EST; INV; 591 BP. > XX > AC CB271253; > XX > DT 24-FEB-2003 (Rel. 74, Created) > DT 24-FEB-2003 (Rel. 74, Last updated, Version 1) > XX > DE taa17c02.x2 Hydra EST -II Hydra magnipapillata cDNA 3' similar to > DE SW:OPSD_RABIT P49912 RHODOPSIN. ;, mRNA sequence. > > from: http://www.ebi.ac.uk/cgi-bin/dbfetch?
> db=embl&id=CB271253&style=raw > > the $seq object's ->id, ->display_id are "unkown id" ... > > > > ZQ Ye > > 2008/6/9 Hilmar Lapp : >> If this is the case with the latest version of BioPerl it should >> be filed as >> a bug report for the embl parser. The ID ought to be reported in >> $seq->get_secondary_accessions() (which returns an array). If it >> doesn't, it >> sounds like a bug to me. >> >> -hilmar >> >> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote: >>> >>> Hi Wen, >>> A dump of that sequence object (Data::Dumper is your friend !) >>> reveals >>> that the PA EMBL field is not saved into the object. However, you >>> will >>> find the string 'AB000170.1' in the embedded CDS feature, more >>> precisely >>> the seqid of the location object. I don't know whether that is >>> always >>> the case, but it is in your particular example. >>> So, to get your hands on that value you have to do: >>> >>> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures; >>> my $parent_id = $cds->location->seq_id; >>> >>> HTH, >>> Marc >>> >>> Marc Logghe >>> Senior Bioinformatician >>> Ablynx nv >>>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of Wen Huang >>>> Sent: Monday, June 09, 2008 5:28 AM >>>> To: bioperl-l at lists.open-bio.org >>>> Subject: [Bioperl-l] EMBL format field >>>> >>>> Hi all, >>>> >>>> I have a EMBL file that I want to extract one of the line >>>> >>>> ###file### >>>> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. >>>> XX >>>> PA AB000170.1 >>>> XX >>>> DE Sus scrofa (pig) endopeptidase 24.16 type M1 >>>> XX >>>> OS Sus scrofa (pig) >>>> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; >>>> Euteleostomi; >>>> Mammalia; >>>> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus. >>>> OX NCBI_TaxID=9823; >>>> ......... >>>> >>>> I want the accession number in the line that starts with PA, >>>> AB000170 >>>> in this example. 
>>>>
>>>> Can anybody kindly help and tell me which module and method I should use?
>>>> I tried various things like $seq_obj->primary_id, display_id,
>>>> get_secondary_id, etc.; they did not work...
>>>>
>>>> Thanks a lot!
>>>>
>>>> Wen
>>
>> --
>> ===========================================================
>> : Hilmar Lapp  -:- Durham, NC -:- hlapp at gmx dot net :
>> ===========================================================

From jason at bioperl.org Tue Jun 10 19:57:42 2008
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 10 Jun 2008 16:57:42 -0700
Subject: [Bioperl-l] EMBL format field
In-Reply-To: <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com>
References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com>
Message-ID: <468D955B-533C-439A-96EA-B7CC6392BFE3@bioperl.org>

PA is a field that we don't currently parse; this should be filed as a bug on Bugzilla.
Would you be able to do this?

-jason
On Jun 9, 2008, at 7:07 AM, Wen Huang wrote:

> Hilmar,
>
> I tried that; it did not work. Marc's way does work.
>
> Thanks,
> Wen

From cjfields at uiuc.edu Tue Jun 10 20:19:55 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Jun 2008 19:19:55 -0500
Subject: [Bioperl-l] EMBL format field
In-Reply-To: <468D955B-533C-439A-96EA-B7CC6392BFE3@bioperl.org>
References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com> <468D955B-533C-439A-96EA-B7CC6392BFE3@bioperl.org>
Message-ID: <2542BB6E-AC43-4D5A-8788-2330E59A6E6F@uiuc.edu>

PA is an odd field; it isn't described in the EMBL user manual:

http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html

but it appears in mRNA files, so I'm guessing it stands for the (p)rotein (a)ccession. I don't think this should be stored as a primary/secondary accession, but maybe as a DBLink annotation?

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign

From jason at bioperl.org Tue Jun 10 20:36:20 2008
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 10 Jun 2008 17:36:20 -0700
Subject: [Bioperl-l] EMBL format field
In-Reply-To: <2542BB6E-AC43-4D5A-8788-2330E59A6E6F@uiuc.edu>
References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com> <468D955B-533C-439A-96EA-B7CC6392BFE3@bioperl.org> <2542BB6E-AC43-4D5A-8788-2330E59A6E6F@uiuc.edu>
Message-ID: <9A1F4E7F-A231-410C-8890-14ECDC7D0258@bioperl.org>

I agree if it isn't the accession # it shouldn't be stored there. I guess it is a DBlink, but it is going to be hacky to round-trip this as you'll have to have a special case for records that are mRNAs...

-jason

From whuang.ustc at gmail.com Tue Jun 10 20:51:51 2008
From: whuang.ustc at gmail.com (Wen Huang)
Date: Tue, 10 Jun 2008 19:51:51 -0500
Subject: [Bioperl-l] EMBL format field
In-Reply-To: <9A1F4E7F-A231-410C-8890-14ECDC7D0258@bioperl.org>
References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com> <468D955B-533C-439A-96EA-B7CC6392BFE3@bioperl.org> <2542BB6E-AC43-4D5A-8788-2330E59A6E6F@uiuc.edu> <9A1F4E7F-A231-410C-8890-14ECDC7D0258@bioperl.org>
Message-ID:

Hi Everybody,

Thank you for your thoughtful discussion and help. I have found another way to get around it (with grep and awk), but it is not so Perl-ish.
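The grep/awk workaround can also be written in plain Perl. Below is a minimal sketch that reads the EMBL text directly instead of going through Bio::SeqIO (which, per this thread, does not currently keep the PA line). The sub name pa_pairs is hypothetical, and the sketch assumes the layout of the record posted at the start of the thread: an ID line, an optional "PA   <accession>" line, and a "//" record terminator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: scan EMBL flat-file text from a filehandle and
# return a list of [ID, PA] pairs, one per PA line encountered.
# Assumes line codes in columns 1-2 followed by whitespace, and "//"
# ending each record; this is plain text parsing, not BioPerl API.
sub pa_pairs {
    my ($fh) = @_;
    my (@pairs, $id);
    while (my $line = <$fh>) {
        if ($line =~ /^ID\s+([^;\s]+)/) {
            $id = $1;                    # e.g. "BAA19060"
        }
        elsif ($line =~ /^PA\s+(\S+)/) {
            push @pairs, [ $id, $1 ];    # e.g. "AB000170.1"
        }
        elsif ($line =~ m{^//}) {
            undef $id;                   # end of the current record
        }
    }
    return @pairs;
}

# Demonstration on an abbreviated copy of the record from the thread.
my $embl = <<'EMBL';
ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
XX
PA   AB000170.1
XX
DE   Sus scrofa (pig) endopeptidase 24.16 type M1
//
EMBL

open my $fh, '<', \$embl or die "cannot open in-memory file: $!";
for my $pair (pa_pairs($fh)) {
    my ($id, $pa) = @$pair;
    (my $acc = $pa) =~ s/\.\d+$//;      # drop the ".1" version suffix
    print "$id\t$pa\t$acc\n";           # prints: BAA19060, AB000170.1, AB000170 (tab-separated)
}
close $fh;
```

To run it on a real EMBL-CDS file, pass a handle from open(my $fh, '<', $path) instead of the in-memory string; the version-stripping step yields the bare AB000170 that was asked for.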
I don't think I know how to submit a bug report to Bugzilla, but I do think that it is not a good idea to include the parent ID in a PA line, or even in the file... The file I got is from the EMBL-CDS databank; I wanted to get the mRNA from which the entries are derived. I guess it is better to include it as a DBLink, as Jason pointed out.

Thanks,
Wen

From hlapp at gmx.net Tue Jun 10 21:35:50 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 10 Jun 2008 21:35:50 -0400
Subject: [Bioperl-l] EMBL format field
In-Reply-To: <9A1F4E7F-A231-410C-8890-14ECDC7D0258@bioperl.org>
References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com> <468D955B-533C-439A-96EA-B7CC6392BFE3@bioperl.org> <2542BB6E-AC43-4D5A-8788-2330E59A6E6F@uiuc.edu> <9A1F4E7F-A231-410C-8890-14ECDC7D0258@bioperl.org>
Message-ID: <283146E5-0041-42CF-B881-90624D5DE393@gmx.net>

On Jun 10, 2008, at 8:36 PM, Jason Stajich wrote:
> I agree if it isn't the accession # it shouldn't be stored there.
> I guess it is a DBlink, but it is going to be hacky to round-trip
> this as you'll have to have a special case for records that are
> mRNAs...

I think I agree with that; I didn't realize it is the accession of the (translated) protein. It would be ideal to convert this into a DBLink annotation indeed, but that's an opinion and an interpretation of the file (even if a very useful one). As such I believe it should be a matter for a SeqProcessor.

Hmm, except that at that point the information has already been lost, so there's actually nothing the SeqProcessor could massage.

So what if the line were simply stored as a Bio::Annotation::SimpleValue with 'PA' as the key and the accession # as the value? That wouldn't be an interpretation, and yet it would make the value available to a SeqProcessor for converting into a DBLink.

-hilmar

--
===========================================================
: Hilmar Lapp  -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From bill at genenformics.com Tue Jun 10 21:43:55 2008
From: bill at genenformics.com (bill at genenformics.com)
Date: Tue, 10 Jun 2008 18:43:55 -0700 (PDT)
Subject: [Bioperl-l] EMBL format field
In-Reply-To: <283146E5-0041-42CF-B881-90624D5DE393@gmx.net>
References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com> <468D955B-533C-439A-96EA-B7CC6392BFE3@bioperl.org> <2542BB6E-AC43-4D5A-8788-2330E59A6E6F@uiuc.edu> <9A1F4E7F-A231-410C-8890-14ECDC7D0258@bioperl.org> <283146E5-0041-42CF-B881-90624D5DE393@gmx.net>
Message-ID: <61099.98.218.171.90.1213148635.squirrel@webmail.dreamhost.com>

This can be accomplished using IdConvert if the protein accession/GI is known:

$> ./IdConvert.exe BAA19060
#Input    Nuc_GI   Nuc_Acc     Pro_GI   Pro_Acc     Desc
BAA19060  1783121  AB000170.1  1783123  BAA19061.1  endopeptidase 24.16 type M3 [Sus scrofa]

Download IdConvert from http://www.genenformics.com/download.html for free.

Bill at genenformics.com
It would be ideal to convert this into a DBLink > annotation indeed, but that's an opinion and an interpretation of the > file (even if a very useful one). As such I believe it should be the > matter of a SeqProcessor. > > Hmm - except that at that point the information has been lost already > so there's actually nothing that the SeqProcessor could massage. > > So what if the line would simply be a B::Annotation::SimpleValue with > 'PA' as key and the accession# as value? That wouldn't be an > interpretation, and yet would make the value available to a > SeqProcessor for converting into a DBLink. > > -hilmar > >> >> -jason >> On Jun 10, 2008, at 5:19 PM, Chris Fields wrote: >> >>> PA is an odd field; it isn't described in the EMBL user manual: >>> >>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html >>> >>> but appears in mRNA files, so I'm guessing it stands for the (p) >>> rotein (a)ccession. I don't think this should be stored as >>> primary/secondary accession, but maybe as a DBLink annootation? >>> >>> chris >>> >>> On Jun 10, 2008, at 6:57 PM, Jason Stajich wrote: >>> >>>> PA is a field that we don't currently parse, something that >>>> should be filed as a bug on bugzilla. >>>> Would you be able to do this? >>>> >>>> -jason >>>> On Jun 9, 2008, at 7:07 AM, Wen Huang wrote: >>>> >>>>> Hilmar, >>>>> >>>>> I tried that, it did not work. Marc's way can work. >>>>> >>>>> Thanks, >>>>> Wen >>>>> >>>>> On Jun 9, 2008, at 7:30 AM, Hilmar Lapp wrote: >>>>> >>>>>> If this is the case with the latest version of BioPerl it >>>>>> should be filed as a bug report for the embl parser. The ID >>>>>> ought to be reported in $seq->get_secondary_accessions() (which >>>>>> returns an array). If it doesn't, it sounds like a bug to me. >>>>>> >>>>>> -hilmar >>>>>> >>>>>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote: >>>>>>> Hi Wen, >>>>>>> A dump of that sequence object (Data::Dumper is your friend !) 
>>>>>>> reveals >>>>>>> that the PA EMBL field is not saved into the object. However, >>>>>>> you will >>>>>>> find the string 'AB000170.1' in the embedded CDS feature, more >>>>>>> precisely >>>>>>> the seqid of the location object. I don't know whether that is >>>>>>> always >>>>>>> the case, but it is in your particular example. >>>>>>> So, to get your hands on that value you have to do: >>>>>>> >>>>>>> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq- >>>>>>> >get_SeqFeatures; >>>>>>> my $parent_id = $cds->location->seq_id; >>>>>>> >>>>>>> HTH, >>>>>>> Marc >>>>>>> >>>>>>> Marc Logghe >>>>>>> Senior Bioinformatician >>>>>>> Ablynx nv >>>>>>>> -----Original Message----- >>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>>> bounces at lists.open-bio.org] On Behalf Of Wen Huang >>>>>>>> Sent: Monday, June 09, 2008 5:28 AM >>>>>>>> To: bioperl-l at lists.open-bio.org >>>>>>>> Subject: [Bioperl-l] EMBL format field >>>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> I have a EMBL file that I want to extract one of the line >>>>>>>> >>>>>>>> ###file### >>>>>>>> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. >>>>>>>> XX >>>>>>>> PA AB000170.1 >>>>>>>> XX >>>>>>>> DE Sus scrofa (pig) endopeptidase 24.16 type M1 >>>>>>>> XX >>>>>>>> OS Sus scrofa (pig) >>>>>>>> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; >>>>>>>> Euteleostomi; >>>>>>>> Mammalia; >>>>>>>> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; >>>>>>>> Suidae; Sus. >>>>>>>> OX NCBI_TaxID=9823; >>>>>>>> ......... >>>>>>>> >>>>>>>> I want the accession number in the line that starts with PA, >>>>>>>> AB000170 >>>>>>>> in this example. >>>>>>>> >>>>>>>> Can anybody kindly help, tell me which module and method I >>>>>>>> should use? >>>>>>>> I tried various things like $seq_obj -> primary_id, display_id, >>>>>>>> get_secondary_id, etc.. they did not work... >>>>>>>> >>>>>>>> Thanks a lot! 
>>>>>>>> >>>>>>>> Wen >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> -- >>>>>> =========================================================== >>>>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>>>>> =========================================================== >>>>>> >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. 
Marie-Claude Hofmann >>> College of Veterinary Medicine >>> University of Illinois Urbana-Champaign >>> >>> >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Tue Jun 10 22:09:13 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 10 Jun 2008 22:09:13 -0400 Subject: [Bioperl-l] EMBL format field In-Reply-To: <61099.98.218.171.90.1213148635.squirrel@webmail.dreamhost.com> References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com> <468D955B-533C-439A-96EA-B7CC6392BFE3@bioperl.org> <2542BB6E-AC43-4D5A-8788-2330E59A6E6F@uiuc.edu> <9A1F4E7F-A231-410C-8890-14ECDC7D0258@bioperl.org> <283146E5-0041-42CF-B881-90624D5DE393@gmx.net> <61099.98.218.171.90.1213148635.squirrel@webmail.dreamhost.com> Message-ID: Bill, this mailing list is about BioPerl. There are many programs and websites out there that convert between IDs; that wasn't the question. We welcome your participation in helping to solve Bioperl-related problems, and sometimes the easiest solution is to use other, cross-platform open-source tools. For peddling commercial products, no matter how useful they are and how little the cost, please use other forums.
-hilmar On Jun 10, 2008, at 9:43 PM, bill at genenformics.com wrote: > This can be accomplished using IdConvert if protein accession/gi is > known: > > $> ./IdConvert.exe BAA19060 > #Input Nuc_GI Nuc_Acc Pro_GI Pro_Acc Desc > BAA19060 1783121 AB000170.1 1783123 BAA19061.1 > endopeptidase 24.16 type M3 [Sus scrofa] > > Download IdConvert from http://www.genenformics.com/download.html > for free. > > Bill at genenformics.com > > >> >> On Jun 10, 2008, at 8:36 PM, Jason Stajich wrote: >>> I agree if it isn't the accession # it shouldn't be stored there. >>> I guess it is a DBlink, but it is going to be hacky to round-trip >>> this as you'll have to have a special case for records that are >>> mRNAs... >> >> I think I agree with that - didn't realize it is the accession of the >> (translated) protein. It would be ideal to convert this into a DBLink >> annotation indeed, but that's an opinion and an interpretation of the >> file (even if a very useful one). As such I believe it should be the >> matter of a SeqProcessor. >> >> Hmm - except that at that point the information has been lost already >> so there's actually nothing that the SeqProcessor could massage. >> >> So what if the line would simply be a B::Annotation::SimpleValue with >> 'PA' as key and the accession# as value? That wouldn't be an >> interpretation, and yet would make the value available to a >> SeqProcessor for converting into a DBLink. >> >> -hilmar >> >>> >>> -jason >>> On Jun 10, 2008, at 5:19 PM, Chris Fields wrote: >>> >>>> PA is an odd field; it isn't described in the EMBL user manual: >>>> >>>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html >>>> >>>> but appears in mRNA files, so I'm guessing it stands for the >>>> (p)rotein (a)ccession. I don't think this should be stored as >>>> primary/secondary accession, but maybe as a DBLink annotation?
>>>> >>>> chris >>>> >>>> On Jun 10, 2008, at 6:57 PM, Jason Stajich wrote: >>>> >>>>> PA is a field that we don't currently parse, something that >>>>> should be filed as a bug on bugzilla. >>>>> Would you be able to do this? >>>>> >>>>> -jason >>>>> On Jun 9, 2008, at 7:07 AM, Wen Huang wrote: >>>>> >>>>>> Hilmar, >>>>>> >>>>>> I tried that, it did not work. Marc's way can work. >>>>>> >>>>>> Thanks, >>>>>> Wen >>>>>> >>>>>> On Jun 9, 2008, at 7:30 AM, Hilmar Lapp wrote: >>>>>> >>>>>>> If this is the case with the latest version of BioPerl it >>>>>>> should be filed as a bug report for the embl parser. The ID >>>>>>> ought to be reported in $seq->get_secondary_accessions() (which >>>>>>> returns an array). If it doesn't, it sounds like a bug to me. >>>>>>> >>>>>>> -hilmar >>>>>>> >>>>>>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote: >>>>>>>> Hi Wen, >>>>>>>> A dump of that sequence object (Data::Dumper is your friend !) >>>>>>>> reveals >>>>>>>> that the PA EMBL field is not saved into the object. However, >>>>>>>> you will >>>>>>>> find the string 'AB000170.1' in the embedded CDS feature, more >>>>>>>> precisely >>>>>>>> the seqid of the location object. I don't know whether that is >>>>>>>> always >>>>>>>> the case, but it is in your particular example. 
>>>>>>>> So, to get your hands on that value you have to do: >>>>>>>> >>>>>>>> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures; >>>>>>>> my $parent_id = $cds->location->seq_id; >>>>>>>> >>>>>>>> HTH, >>>>>>>> Marc >>>>>>>> >>>>>>>> Marc Logghe >>>>>>>> Senior Bioinformatician >>>>>>>> Ablynx nv >>>>>>>>> -----Original Message----- >>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Wen Huang >>>>>>>>> Sent: Monday, June 09, 2008 5:28 AM >>>>>>>>> To: bioperl-l at lists.open-bio.org >>>>>>>>> Subject: [Bioperl-l] EMBL format field >>>>>>>>> >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I have an EMBL file from which I want to extract one of the lines: >>>>>>>>> >>>>>>>>> ###file### >>>>>>>>> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. >>>>>>>>> XX >>>>>>>>> PA AB000170.1 >>>>>>>>> XX >>>>>>>>> DE Sus scrofa (pig) endopeptidase 24.16 type M1 >>>>>>>>> XX >>>>>>>>> OS Sus scrofa (pig) >>>>>>>>> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; >>>>>>>>> Euteleostomi; >>>>>>>>> Mammalia; >>>>>>>>> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; >>>>>>>>> Suidae; Sus. >>>>>>>>> OX NCBI_TaxID=9823; >>>>>>>>> ......... >>>>>>>>> >>>>>>>>> I want the accession number in the line that starts with PA, >>>>>>>>> AB000170 >>>>>>>>> in this example. >>>>>>>>> >>>>>>>>> Can anybody kindly help and tell me which module and method I >>>>>>>>> should use? >>>>>>>>> I tried various things like $seq_obj -> primary_id, >>>>>>>>> display_id, >>>>>>>>> get_secondary_id, etc., but they did not work... >>>>>>>>> >>>>>>>>> Thanks a lot!
>>>>>>>>> >>>>>>>>> Wen >>>>>>>>> _______________________________________________ >>>>>>>>> Bioperl-l mailing list >>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>>> -- >>>>>>> =========================================================== >>>>>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>>>>>> =========================================================== >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> Christopher Fields >>>> Postdoctoral Researcher >>>> Lab of Dr. 
Marie-Claude Hofmann >>>> College of Veterinary Medicine >>>> University of Illinois Urbana-Champaign >>>> >>>> >>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From bill at genenformics.com Tue Jun 10 22:33:45 2008 From: bill at genenformics.com (bill at genenformics.com) Date: Tue, 10 Jun 2008 19:33:45 -0700 (PDT) Subject: [Bioperl-l] EMBL format field In-Reply-To: References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com> <468D955B-533C-439A-96EA-B7CC6392BFE3@bioperl.org> <2542BB6E-AC43-4D5A-8788-2330E59A6E6F@uiuc.edu> <9A1F4E7F-A231-410C-8890-14ECDC7D0258@bioperl.org> <283146E5-0041-42CF-B881-90624D5DE393@gmx.net> <61099.98.218.171.90.1213148635.squirrel@webmail.dreamhost.com> Message-ID: <61609.98.218.171.90.1213151625.squirrel@webmail.dreamhost.com> Hi, Hilmar, Thank you for your advice. I am a BioPerl user and I step in only when there is no efficient/effective BioPerl method to solve specific problems. Please forgive us for providing free solutions. Bill at genenformics.com > Bill, > > this mailing list is about BioPerl. There are many programs and web- > sites out there that convert between IDs, that wasn't the question. 
> > We welcome your participation in helping to solve Bioperl-related > problems, and sometimes the easiest solution is to use other, cross- > platform open-source tools. > > For peddling commercial products, no matter how useful they are and > how little the cost, please use other forums. > > -hilmar From cjfields at uiuc.edu Tue Jun 10 22:59:43 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 10 Jun 2008 21:59:43 -0500 Subject: [Bioperl-l] EMBL format field In-Reply-To: <61609.98.218.171.90.1213151625.squirrel@webmail.dreamhost.com> References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com> <468D955B-533C-439A-96EA-B7CC6392BFE3@bioperl.org> <2542BB6E-AC43-4D5A-8788-2330E59A6E6F@uiuc.edu> <9A1F4E7F-A231-410C-8890-14ECDC7D0258@bioperl.org> <283146E5-0041-42CF-B881-90624D5DE393@gmx.net> <61099.98.218.171.90.1213148635.squirrel@webmail.dreamhost.com> <61609.98.218.171.90.1213151625.squirrel@webmail.dreamhost.com> Message-ID: <3804E643-9BBC-4B28-B1DE-D75AA2C9FE74@uiuc.edu> Bill, It's okay to offer suggestions to problems, particularly if no one answers, but I have to agree with Hilmar in this case.
The specific problem: your 'solution' is tied to commercial software (albeit free) that appears to be closed-source and has questionable licensing. I couldn't find documentation on your website addressing either issue. Therefore, I couldn't recommend using it unless those two issues were clarified, preferably by making it open-source. chris On Jun 10, 2008, at 9:33 PM, bill at genenformics.com wrote: > Hi, Hilmar, > > Thank you for your advice. > > I am a BioPerl user and I step in only when there is no > efficient/effective BioPerl method to solve specific problems. > > Please forgive us for providing free solutions. > > Bill at genenformics.com Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue Jun 10 23:00:04 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 10 Jun 2008 22:00:04 -0500 Subject: [Bioperl-l] A lot of POD fixes in bioperl-live and bioperl run In-Reply-To: <200806110122.10982.heikki@sanbi.ac.za> References: <200806110122.10982.heikki@sanbi.ac.za> Message-ID: <2BCA7C8B-CDAC-49A8-9809-F9127DB05BEC@uiuc.edu> Thanks for the work on this, Heikki! chris On Jun 10, 2008, at 6:22 PM, Heikki Lehvaslaiho wrote: > I have recently done a lot of fixes in the inline Plain Old Documentation > (POD) texts in bioperl-live and bioperl-run.
The last ones (hopefully) > were > committed a few minutes ago. This has resulted in quite large updates > from SVN. > > I wanted to apologize for the inconvenience and to explain the reasons for > these small > and pedantic fixes. > > In contrast to Perl, POD is sensitive to white space. This makes it > relatively