From rmb32 at cornell.edu Sun Aug 1 15:17:14 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Sun, 01 Aug 2010 12:17:14 -0700
Subject: [Bioperl-l] GMOD Evo Hackathon Open Call for Participation
Message-ID: <4C55C83A.3060700@cornell.edu>

We are seeking participants for the GMOD Tools for Evolutionary Biology Hackathon, held November 8-12, 2010 at the US National Evolutionary Synthesis Center (NESCent) in Durham, NC.

This hackathon targets three critical gaps in the capabilities of the GMOD toolbox that currently limit its utility for evolutionary research:

1. Visualization of comparative genomics data
2. Visualization of phylogenetic data and trees
3. Support for population diversity and phenotype data

If you are interested in these areas and have relevant expertise, you are strongly encouraged to apply. Relevant areas of expertise include more than just software development: if you are a GMOD power user, visualization guru, domain expert (comparative, phylogenetics, population, ...), or documentation wizard, then your skills are needed!

How To Apply:

Fill out the online application form at http://bit.ly/gmodevohack. Applications are due August 25.

About GMOD:

GMOD is an intercompatible suite of open-source software components for storing, managing, analyzing, and visualizing genome-scale data. GMOD includes many widely-used software components: GBrowse and JBrowse, both genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a generic and modular database schema; CMap, a comparative map viewer; as well as many other components including Apollo, MAKER, BioMart, InterMine, and Galaxy. We hope to extend the functionality of existing GMOD components, and integrate new components as well.

About Hackathons:

A hackathon is an intense event at which a group of programmers with different backgrounds and skills collaborate hands-on and face-to-face to develop working code that is of utility to the community as a whole.
The mix of people will include domain experts and computer-savvy end-users.

More details about the event, its motivation, organization, procedures, and attendees, as well as URLs to the hackathon and related websites are included below.

Sincerely,

The GMOD EvoHack Organizing Committee (and project affiliations as relevant):

Nicole Washington, Chair (LBNL, modENCODE, Phenote)
Robert Buels (SGN, Chado NatDiv)
Scott Cain (OICR, GMOD)
Dave Clements (NESCent, GMOD)
Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv)
Sheldon McKay (University of Arizona, iPlant, GBrowse_syn)

-----------------------------

About the GMOD Evo Hackathon

Overview

We are organizing a hackathon to fill critical gaps in the capabilities of the Generic Model Organism Database (GMOD) toolbox that currently limit its utility for evolutionary research. Specifically, we will focus on tools for 1) viewing comparative genomics data; 2) visualizing phylogenomic data; and 3) supporting population diversity data and phenotype annotation.

The event will be hosted at NESCent and bring together a group of about 20+ software developers, end-user representatives, and documentation experts who would otherwise not meet. The participants will include key developers of GMOD components that currently lack features critical for emerging evolutionary biology research, developers of informatics tools in evolutionary research that lack GMOD integration, and informatics-savvy biologists who can represent end-user requirements.

The event will provide a unique opportunity to infuse the GMOD developer community with a heightened awareness of unmet needs in evolutionary biology that GMOD components have the potential to fill, and for tool developers in evolutionary biology to better understand how best to extend or integrate with already existing GMOD components.

Before the Event

Discussion of ideas and sometimes even design actually starts well before the hackathon, on mailing lists, wiki pages, and conference calls set up among accepted attendees. This advance work lays the foundation for participants to be productive from the very first day. This also means that participants should be willing to contribute some time in advance of the hackathon itself to participate in this preparatory discussion.

During the Event

Typically, hackathon participants use the morning of the first day of the event to organize themselves into working groups of between 3 and 6 people, each with a focused implementation objective. Ideas and objectives are discussed, and attendees coalesce around the projects in which they have the most experience or interest.

Deliverables / Event Results

The meeting's attendance, working groups, and outcomes will be fully logged and documented on the GMOD wiki (http://gmod.org). Each working group during the event will typically have its own wiki page, linked from the main EvoHack page, where it documents its minutes and design notes, and provides links to the code and documentation it produces.

Also, since GMOD and NESCent are both committed to open source principles, all code and documentation produced by participants during the hackathon must be published under an OSI-approved open source license. As contributions to existing GMOD tools, all hackathon products will most likely satisfy this requirement automatically.

NESCent

This event is sponsored by the US National Evolutionary Synthesis Center (NESCent, http://www.nescent.org) through its Informatics Whitepapers program (http://www.nescent.org/informatics/whitepapers.php). NESCent promotes the synthesis of information, concepts and knowledge to address significant, emerging, or novel questions in evolutionary science and its applications.
NESCent achieves this by supporting research and education across disciplinary, institutional, geographic, and demographic boundaries (see http://www.nescent.org/science/proposals.php).

Links

Main GMOD EvoHack page, and full proposal: http://gmod.org/wiki/GMOD_Evo_Hackathon
NESCent: http://www.nescent.org/
GMOD: http://gmod.org
Similar past NESCent events, see: http://hackathon.nescent.org/
GMOD hackathon application: http://bit.ly/gmodevohack

--
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/GMOD_Europe_2010
http://gmod.org/wiki/Help_Desk_Feedback

From maj at fortinbras.us Sun Aug 1 19:19:16 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 1 Aug 2010 19:19:16 -0400
Subject: [Bioperl-l] SOAP Eutilities
In-Reply-To:
References:
Message-ID: <627BEC8B2E624A69A0B11EEBC8C93B71@NewLife>

Turns out that module lives in bioperl-run; try

git clone git://github.com/bioperl/bioperl-run.git

MAJ

----- Original Message -----
From: "Robson de Souza"
To:
Sent: Saturday, July 31, 2010 4:56 PM
Subject: [Bioperl-l] SOAP Eutilities

> Hi,
>
> Bio::DB::SoapEUtilities, referred in the HOWTO on EUtilities, seems to
> have disappeared from the Git repository.
> A simple
>
> git clone git://github.com/bioperl/bioperl-live.git
>
> does not download it. Any ideas why?
> Robson
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From David.Messina at sbc.su.se Mon Aug 2 09:58:10 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 2 Aug 2010 15:58:10 +0200
Subject: [Bioperl-l] phyloxml and element order
In-Reply-To:
References:
Message-ID:

Hi Fred,

Thanks for letting us know about this - definitely sounds like a bug.

Would you please submit this to our bug tracker?

http://bugzilla.open-bio.org

(You can just copy and paste your previous email.)

Dave

On Jul 30, 2010, at 06:59, Frédéric Romagné
wrote:

> Hi,
>
> I'm using bioperl to create phyloxml trees. After a few attempts, I got my
> tree with all the elements/attributes I want, but when I write the tree,
> elements are not written in the order specified in the XSD Schema.
>
> For example, I got:
>
> Loxosceles intermedia
> Araneomorphae Sicariidae
> 969
> HAAERADSRKPIWDIAHMVNDLELVD
> Araneomorphae Sicariidae
>
> The program forester complains that should be written before the
> element.
>
> According to
> http://phyloxml.wordpress.com/2009/11/25/order-of-elements-in-phyloxml this
> is what bioperl is supposed to do.
>
> All my elements/attributes are set before writing the tree using the
> 'add_Annotation', 'add_tag_value' and 'sequence' methods from a
> Bio::Tree::AnnotatableNode object, so I think the error comes from the
> write_tree method.
>
> Any help would be appreciated.
>
> Thank you,
> Fred

From shalabh.sharma7 at gmail.com Mon Aug 2 15:44:35 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Aug 2010 15:44:35 -0400
Subject: [Bioperl-l] clustalw to maf format
Message-ID:

Hi,
I am trying to convert clustalw to maf format. I am trying to use AlignIO for that, but it's not working.

It's giving me the following error:

EXCEPTION Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented by package Bio::AlignIO::maf.
This is not your fault - author of Bio::AlignIO::maf should be blamed!

STACK Bio::Root::RootI::throw_not_implemented /Library/Perl/5.8.8/Bio/Root/RootI.pm:707
STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/maf.pm:176
STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492
STACK toplevel msf2mafy.pl:11

Is there any other way I can convert clustalw to maf?

I would really appreciate it if anyone can help me out.
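
Since the error says that write_aln is not implemented for maf, one workaround might be to format the MAF text by hand. What follows is only a minimal sketch, not part of BioPerl: maf_block is a hypothetical helper, and it assumes each aligned row is a complete source sequence on the + strand (so start is 0 and size equals srcSize), which may not hold for your data. Whether RNAz accepts this output has not been checked.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): format one alignment as a MAF block.
# Input: a list of [name, gapped_sequence] pairs, all of the same length.
# Assumption: every row is a whole source sequence on the + strand,
# so start = 0 and size = srcSize = number of non-gap characters.
sub maf_block {
    my (@rows) = @_;
    my $width = length $rows[0][1];
    my $out   = "a score=0\n";
    for my $row (@rows) {
        my ($name, $text) = @$row;
        die "row '$name' has a different alignment length\n"
            unless length($text) == $width;
        # Count residues by deleting the '-' gap characters from a copy.
        (my $ungapped = $text) =~ tr/-//d;
        my $size = length $ungapped;
        $out .= sprintf "s %s %d %d + %d %s\n", $name, 0, $size, $size, $text;
    }
    return $out . "\n";    # a blank line ends the block
}

print "##maf version=1\n";
print maf_block(['seqA', 'ACGT-ACGT'], ['seqB', 'ACGTTAC-T']);
```

The rows could be filled from a ClustalW file by reading it with Bio::AlignIO (format 'clustalw'), calling next_aln, and taking display_id and seq from each sequence returned by each_seq.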

Thanks
Shalabh

From Russell.Smithies at agresearch.co.nz Mon Aug 2 16:25:26 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 3 Aug 2010 08:25:26 +1200
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To:
References:
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>

This might work if you only have a few:
http://www.ibi.vu.nl/programs/convertalignwww/

--Russell

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From shalabh.sharma7 at gmail.com Mon Aug 2 16:53:31 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Aug 2010 16:53:31 -0400
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
Message-ID:

Hi Russell,
Thanks for the reply, but i have around 400 alignments and some huge ones :(

Thanks
Shalabh

On Mon, Aug 2, 2010 at 4:25 PM, Smithies, Russell <Russell.Smithies at agresearch.co.nz> wrote:

> This might work if you only have a few:
> http://www.ibi.vu.nl/programs/convertalignwww/
>
> --Russell

From biopython at maubp.freeserve.co.uk Mon Aug 2 17:24:09 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 2 Aug 2010 22:24:09 +0100
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To:
References:
Message-ID:

On Mon, Aug 2, 2010 at 8:44 PM, shalabh sharma wrote:
> Hi,
> I am trying to convert clustalw to maf format.
> I am trying to use AlignIO for that but its not working.

Could you tell us why you have to use maf format? I'm curious because all of the phylogenetics tools I've had to work with personally will take some other format which is more widely supported (e.g. FASTA, PFAM, ClustalW, PHYLIP, ...).

Peter

From bernd.web at gmail.com Mon Aug 2 17:25:52 2010
From: bernd.web at gmail.com (Bernd Web)
Date: Mon, 2 Aug 2010 23:25:52 +0200
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To:
References: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
Message-ID:

Hi Shalabh,

This ConvertAlign does not write maf either, it only reads it (i made it). I found some other converters on the web but they do not export to maf format either...
http://biotechvana.uv.es/servers/afc/main.php
http://www.hiv.lanl.gov/content/sequence/FORMAT_CONVERSION/form.html

Galaxy has a MAF to Fasta converter:
http://main.g2.bx.psu.edu/root?tool_id=MAF_To_Fasta1

Regards,
Bernd

On Mon, Aug 2, 2010 at 10:53 PM, shalabh sharma wrote:
> Hi Russell,
> Thanks for the reply, but i have around 400 alignments and some
> huge ones :(

From cjfields at illinois.edu Mon Aug 2 17:31:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 2 Aug 2010 16:31:20 -0500
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To:
References: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
Message-ID: <6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu>

No other format will work?
The main reason you see unimplemented methods like this is there is no active interest in working with this format beyond getting the information stored within them into objects and other commonly-used formats.

chris

On Aug 2, 2010, at 3:53 PM, shalabh sharma wrote:

> Hi Russell,
> Thanks for the reply, but i have around 400 alignments and some
> huge ones :(

From shalabh.sharma7 at gmail.com Mon Aug 2 18:30:41 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Aug 2010 18:30:41 -0400
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu>
References: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz> <6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu>
Message-ID:

Hi All,
Thanks for the replies. Actually i am working on a pipeline involving RNAz. I had impression that there must be a converter available as their webserver can take xmfa or maf format but standalone is only accepting maf format. I think i will use a program that can output as xmfa and write to those people if they can provide me with the converter.

Thanks
Shalabh

On Mon, Aug 2, 2010 at 5:31 PM, Chris Fields wrote:

> No other format will work? The main reason you see unimplemented methods
> like this is there is no active interest in working with this format beyond
> getting the information stored within them into objects and other
> commonly-used formats.
>
> chris

From chiragmatkarbioinfo at gmail.com Tue Aug 3 03:47:37 2010
From: chiragmatkarbioinfo at gmail.com (chirag matkar)
Date: Tue, 3 Aug 2010 13:17:37 +0530
Subject: [Bioperl-l] Pubmed Parsing
Message-ID:

Hello all,
I have a list of Pubmed Ids. I want to parse articles to find specific SNP related information. Can i work it out using a Script?

--
Regards,
Chirag Matkar

From genehack at genehack.org Tue Aug 3 05:03:35 2010
From: genehack at genehack.org (John Anderson)
Date: Tue, 3 Aug 2010 05:03:35 -0400
Subject: [Bioperl-l] Pubmed Parsing
In-Reply-To:
References:
Message-ID: <5E557C44-224B-4460-9C2C-E375555B8BE6@genehack.org>

On Aug 3, 2010, at 3:47 AM, chirag matkar wrote:

> I have a list of Pubmed Ids.
> I want to parse articles to find specific SNP related information.
> Can i work it out using a Script?

Can you provide a more specific example of what you'd like to do?
For example, something along the lines of, "for PMID 1234, get ... about SNP 5678" (where '...' is replaced with whatever it is you're trying to get). Even describing how you would obtain this information using the website yourself will be helpful.

thanks,
john.

From gowthaman.ramasamy at seattlebiomed.org Tue Aug 3 01:29:10 2010
From: gowthaman.ramasamy at seattlebiomed.org (Gowthaman Ramasamy)
Date: Mon, 2 Aug 2010 22:29:10 -0700
Subject: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam
In-Reply-To:
Message-ID:

Hi List,
I am trying to derive the consensus from a pileup via Bio::DB::Sam. With the following script I can extract the reference base and the read bases at each position, but I cannot find a method that derives a consensus similar to the values produced by "samtools pileup -c -f xxxxxx.fasta yyyyyyy.bam".

The script I use now retrieves the reference base and the query bases for each position. How do I improve it to get the consensus?

Thanks very much in advance,
Gowthaman

use strict;
use warnings;
use Bio::DB::Sam;

# Open the BAM file together with its reference FASTA.
my $bam = Bio::DB::Sam->new(-bam   => 'something.bam',
                            -fasta => 'something.fasta');

# Callback that pileup() runs once per covered reference position.
my $cb = sub {
    my ($seqid, $pos, $pileups) = @_;
    # Reference base at this position.
    my $refBase = $bam->segment($seqid, $pos, $pos)->dna;
    print "\n$pos\t$refBase=>";
    # Print the base that each overlapping read has at this position.
    for my $pileup (@$pileups) {
        my $al    = $pileup->alignment;
        my $qBase = substr($al->qseq, $pileup->qpos, 1);
        print "$qBase,";
    }
};

$bam->pileup('Lin.chr10i', $cb);

From scott at scottcain.net Tue Aug 3 06:32:59 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 3 Aug 2010 06:32:59 -0400
Subject: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam
In-Reply-To:
References:
Message-ID:

Hi Gowthaman,

I don't see a method to extract the consensus. You are welcome to submit a patch :-)

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                         scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)        216-392-3087
Ontario Institute for Cancer Research

From lincoln.stein at gmail.com Tue Aug 3 12:57:52 2010
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Tue, 3 Aug 2010 12:57:52 -0400
Subject: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam
In-Reply-To:
References:
Message-ID:

Samtools is running MAQ on the pileup. You could either implement MAQ in perl, or come up with your own consensus caller.
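
As one illustration of "your own consensus caller", a plain majority vote over the bases in a pileup column could look roughly like the sketch below. This is not the MAQ model that samtools uses: it ignores base and mapping qualities, indels, and heterozygous sites. The name majority_base is hypothetical, and returning 'N' for ties or empty columns is an arbitrary choice made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: majority-rule consensus for one pileup column.
# Takes the read bases observed at a position and returns the most
# frequent one (upper-cased), or 'N' if there is no coverage or the
# two highest counts are tied.
sub majority_base {
    my (@bases) = @_;
    my %count;
    $count{ uc $_ }++ for @bases;
    # Highest count first; ties are ordered alphabetically for stability.
    my @ranked = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    return 'N' unless @ranked;
    return 'N' if @ranked > 1 && $count{ $ranked[0] } == $count{ $ranked[1] };
    return $ranked[0];
}

print majority_base(qw(A A G a)), "\n";    # A is seen three times, G once
print majority_base(qw(A G)), "\n";        # tie, so N
```

Inside a Bio::DB::Sam pileup callback, the bases extracted with substr from each read's qseq could be pushed onto an array and passed to this function once per position.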

Lincoln

--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa

From biopython at maubp.freeserve.co.uk Tue Aug 3 13:06:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 3 Aug 2010 18:06:46 +0100
Subject: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam
In-Reply-To:
References:
Message-ID:

On Tue, Aug 3, 2010 at 5:57 PM, Lincoln Stein wrote:
> Samtools is running MAQ on the pileup. You could either implement MAQ in
> perl, or come up with your own consensus caller.
>
> Lincoln

See also:
http://seqanswers.com/forums/showthread.php?t=6241

From gowthaman.ramasamy at seattlebiomed.org Tue Aug 3 13:28:36 2010
From: gowthaman.ramasamy at seattlebiomed.org (Gowthaman Ramasamy)
Date: Tue, 3 Aug 2010 10:28:36 -0700
Subject: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam
In-Reply-To:
References: ,
Message-ID: <89080953C3D300419AACB6E63A7EEFBA5C47613B34@mail02.sbri.org>

Hi Lincoln,
That's a good lead. I will try to use MAQ in perl rather than using my simple majority rule.
-gowtham

From stefan.kirov at bms.com Tue Aug 3 16:22:35 2010
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 03 Aug 2010 16:22:35 -0400
Subject: [Bioperl-l] nmica parser
Message-ID: <4C587A8B.8090603@bms.com>

Has anyone written an nmica parser? If not, I will perhaps do that. It should be straightforward: the output is XML.

Stefan

From fs5 at sanger.ac.uk Wed Aug 4 04:45:39 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Wed, 04 Aug 2010 09:45:39 +0100
Subject: [Bioperl-l] Pubmed Parsing
In-Reply-To:
References:
Message-ID: <1280911539.3499.46.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi Chirag,
have a look at this earlier post:
http://bioperl.org/pipermail/bioperl-l/2009-April/029690.html

However, you won't be able to retrieve all full texts, and it is quite a task to parse natural language and get useful information about a gene, protein, SNP etc. out of a manuscript.

Frank

On Tue, 2010-08-03 at 13:17 +0530, chirag matkar wrote:
> Hello all,
> I have a list of Pubmed Ids.
> I want to parse articles to find specific SNP related information. > Can i work it out using a Script? > > > > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From David.Messina at sbc.su.se Thu Aug 5 08:16:17 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 5 Aug 2010 14:16:17 +0200 Subject: [Bioperl-l] call for a TreeIO volunteer Message-ID: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se> Hi everybody, We've got a couple of small open bugs related to the Bio::TreeIO modules, and we could really use someone to take a look at them. Ideally, that someone would have familiarity with TreeIO already.* It'd help us to get the next release (1.6.2) out the door. The bugs in question are: - TreeIO::newick writes root node branch length incorrectly http://bugzilla.open-bio.org/show_bug.cgi?id=3039 - Bio::TreeIO::nhx cannot parse empty [&&NHX] + round-trip failure http://bugzilla.open-bio.org/show_bug.cgi?id=3007 Thanks, Dave on behalf of the core developers * Even if you don't, though, if you've been looking for an opportunity to contribute to BioPerl, and this sounds like something you'd like to work on, by all means raise your hand. From clements at nescent.org Thu Aug 5 13:15:41 2010 From: clements at nescent.org (Dave Clements) Date: Thu, 5 Aug 2010 10:15:41 -0700 Subject: [Bioperl-l] GMOD Europe 2010, 13-16 Sept, Cambridge, UK In-Reply-To: References: Message-ID: GMOD Europe 2010 ================ 13-16 September 2010 Cambridge, UK http://gmod.org/wiki/GMOD_Europe_2010 We are pleased to announce GMOD Europe 2010, four days of GMOD events being held 13-16 September 2010, at the University of Cambridge. 
GMOD Europe 2010 includes:

1) GMOD Community Meeting, Monday & Tuesday: Project updates, developer and user presentations and best practices, project direction.
2) GMOD Satellite Meetings, Wednesday: Special interest groups where GMOD community members meet to discuss specific topics of interest.
3) InterMine Workshop, Wednesday: A one day workshop on installing, configuring and using the InterMine biological data warehouse system.
4) BioMart Workshop, Thursday: A one day workshop on using the BioMart biological data warehouse system, including accessing data through APIs.

Registration is now open for these events. There is a £50 registration fee for the GMOD Meeting to cover catered lunches and other expenses. Registration for all other events is free, but required, as space is limited.

These events are open to all: GMOD users, developers, prospective users, biologists, and computer scientists. See http://gmod.org/wiki/January_2010_GMOD_Meeting for an idea of what goes on at GMOD meetings.

GMOD is a collection of interoperable open source software components for managing, visualizing and annotating biological data. GMOD incorporates many widely used tools, including GBrowse and JBrowse for genome browsing, InterMine and BioMart for data mining, Galaxy and Ergatis for workflow, Chado for data management, GBrowse_syn and CMap for comparative genomics, plus many other tools (Apollo, MAKER, Pathway Tools, Textpresso, ...). GMOD is also an active community of researchers and developers addressing common challenges in exploiting their data.

If you are struggling to fully exploit your data then please consider attending GMOD Europe 2010.

Please let us know if you have any questions, and we hope to see you in Cambridge.
Thanks, Scott Cain and Dave Clements -- http://gmod.org/wiki/GMOD_News http://gmod.org/wiki/GMOD_Evo_Hackathon http://gmod.org/wiki/GMOD_Europe_2010 http://gmod.org/wiki/Help_Desk_Feedback From abhishek.vit at gmail.com Thu Aug 5 18:15:56 2010 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Thu, 5 Aug 2010 18:15:56 -0400 Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl Message-ID: Hi All Just wondering if there is any Picard wrapper/s available in Bioperl. Thanks! -Abhi ----------------------------- Abhishek Pratap Bioinformatics Software Engineer II Genomics Resource Center Institute for Genome Sciences School of Medicine, Univ of Maryland 801, W. Baltimore Street, Baltimore, MD 21209 Ph: (+1)-410-706-2296 www.igs.umaryland.edu/ From Russell.Smithies at agresearch.co.nz Thu Aug 5 18:37:46 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 6 Aug 2010 10:37:46 +1200 Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl In-Reply-To: References: Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F02262E96@exchsth.agresearch.co.nz> Might be part of the "Enterprise" package. If not, some developer should "make it so". :-) --Russell (I hate Fridays) > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Abhishek Pratap > Sent: Friday, 6 August 2010 10:16 a.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl > > Hi All > > Just wondering if there is any Picard wrapper/s available in Bioperl. > > > Thanks! > -Abhi > > ----------------------------- > Abhishek Pratap > Bioinformatics Software Engineer II > Genomics Resource Center > Institute for Genome Sciences > School of Medicine, Univ of Maryland > 801, W. 
Baltimore Street, Baltimore, MD 21209 > Ph: (+1)-410-706-2296 > www.igs.umaryland.edu/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at illinois.edu Thu Aug 5 19:10:16 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 5 Aug 2010 18:10:16 -0500 Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl In-Reply-To: References: Message-ID: <26E3E5B6-47CF-4744-9687-199C218B5571@illinois.edu> Picard uses samtools, which has a perl API: http://search.cpan.org/dist/Bio-SamTools/ which uses BioPerl. Ah, the circle of life... chris On Aug 5, 2010, at 5:15 PM, Abhishek Pratap wrote: > Hi All > > Just wondering if there is any Picard wrapper/s available in Bioperl. > > > Thanks! > -Abhi > > ----------------------------- > Abhishek Pratap > Bioinformatics Software Engineer II > Genomics Resource Center > Institute for Genome Sciences > School of Medicine, Univ of Maryland > 801, W. 
Baltimore Street, Baltimore, MD 21209
> Ph: (+1)-410-706-2296
> www.igs.umaryland.edu/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From dan.kortschak at adelaide.edu.au Thu Aug 5 21:06:45 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Fri, 06 Aug 2010 10:36:45 +0930
Subject: [Bioperl-l] MUMmer parser work
Message-ID: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>

Hello Everyone,

I've just noticed the absence of a MUMmer parser and thought that it might be a worthwhile contribution to bioperl-run (I won't be able to start on this for a while, but given Mark's excellent work on CommandExts, it shouldn't take too long to get up when I do have time). Has anyone made any effort in this direction that I would be stepping on, or if they have left it, that I could pick up to shorten the work time?

cheers
Dan

From cjfields at illinois.edu Thu Aug 5 23:13:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 5 Aug 2010 22:13:51 -0500
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>

Dan,

Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in:

http://bugzilla.open-bio.org/show_bug.cgi?id=2701

It currently lacks significant tests, so feel free to chip in there as needed.

chris

On Aug 5, 2010, at 8:06 PM, Dan Kortschak wrote:

> Hello Everyone,
>
> I've just noticed the absence of a MUMmer parser and thought that it
> might be a worthwhile contribution to bioperl-run (I won't be able to
> start on this for a while, but given Mark's excellent work on
> CommandExts, it shouldn't take too long to get up when I do have time).
Has > anyone made any effort in this direction that I would be stepping on, or > if they have left it, that I could pick up to shorten the work time? > > cheers > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From greg at ebi.ac.uk Fri Aug 6 05:47:21 2010 From: greg at ebi.ac.uk (Gregory Jordan) Date: Fri, 6 Aug 2010 10:47:21 +0100 Subject: [Bioperl-l] call for a TreeIO volunteer In-Reply-To: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se> References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se> Message-ID: I can help out with these. I'm pretty sure I've previously fought with (and perhaps even come up with a fix for) bug 3039, and I can take a look at 3007 too. Now lemme just see if I can get up and running with the Bioperl test suite. I'll give a shout if I run into any problems. Cheers, Greg On 5 August 2010 13:16, Dave Messina wrote: > Hi everybody, > > We've got a couple of small open bugs related to the Bio::TreeIO modules, > and we could really use someone to take a look at them. Ideally, that > someone would have familiarity with TreeIO already.* > > It'd help us to get the next release (1.6.2) out the door. > > The bugs in question are: > - TreeIO::newick writes root node branch length incorrectly > http://bugzilla.open-bio.org/show_bug.cgi?id=3039 > > - Bio::TreeIO::nhx cannot parse empty [&&NHX] + round-trip failure > http://bugzilla.open-bio.org/show_bug.cgi?id=3007 > > > Thanks, > Dave > on behalf of the core developers > > > * Even if you don't, though, if you've been looking for an opportunity to > contribute to BioPerl, and this sounds like something you'd like to work on, > by all means raise your hand. 
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From jun.yin at ucd.ie Fri Aug 6 06:52:14 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 06 Aug 2010 11:52:14 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
Message-ID: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>

Hi, all,

I am the Google Summer of Code student working on refactoring the Bio::Align subsystem. I recently implemented several packages for retrieving online alignment sequences.

The aim of the packages is to provide convenient methods for BioPerl users to retrieve online alignment sequences. After retrieval, the alignment sequences are converted into a Bio::SimpleAlign object, which is easy to manipulate and write to local disk. The packages currently support the Pfam, Rfam, Prosite and Entrez Protein Clusters databases.

Here is the structure of the packages:

Packages
Bio::DB::Align (interface, and calling other packages)
Bio::DB::Align::Pfam (retrieving alignment from Pfam)
Bio::DB::Align::Rfam (retrieving alignment from Rfam)
Bio::DB::Align::Prosite (retrieving alignment from Prosite)
Bio::DB::Align::ProtClustDB (retrieving alignment from Entrez Protein Clusters Database)

Usually four methods are provided for each package:

Methods
get_Aln_by_id (retrieving alignment by id and returns Bio::SimpleAlign object)
get_Aln_by_acc (retrieving alignment by accession and returns Bio::SimpleAlign object) (Rfam and Prosite only support this method)
id2acc (id to accession conversion)
acc2id (accession to id conversion)

These packages depend on LWP::UserAgent, HTTP::Request and Bio::DB::GenericWebAgent. Bio::DB::Align::ProtClustDB also depends on Bio::DB::EUtilities.
Calling the packages can be:

    my $dbobj=Bio::DB::Align->new(-db=>"rfam");

Or,

    my $dbobj= Bio::DB::Align::Pfam->new();

    my $aln=$dbobj->get_Aln_by_acc("RF0001");
    my $aln2=$dbobj->get_Aln_by_acc(-accession=>"RF0001",-alignment=>"full");
    print $aln->length();
    foreach my $seq ($aln->each_Seq) {
        #do something
    }

I have done some tests on these packages. And, I will write them into standard tests later.

Any suggestions on these packages are welcome.

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

From David.Messina at sbc.su.se Fri Aug 6 08:59:19 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 6 Aug 2010 14:59:19 +0200
Subject: [Bioperl-l] call for a TreeIO volunteer
In-Reply-To:
References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>
Message-ID: <6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>

> I can help out with these. I'm pretty sure I've previously fought with (and perhaps even come up with a fix for) bug 3039, and I can take a look at 3007 too.

Awesome, thanks Greg!

> Now lemme just see if I can get up and running with the Bioperl test suite. I'll give a shout if I run into any problems.

Please do.

Dave

From David.Messina at sbc.su.se Fri Aug 6 09:06:47 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 6 Aug 2010 15:06:47 +0200
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
Message-ID:

Sounds great, Jun!

Did you happen to test your code on very large alignments? I know there's one in Pfam that's something like 100,000 sequences. An rRNA, I believe.
Dave

From jun.yin at ucd.ie Fri Aug 6 09:11:41 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 06 Aug 2010 14:11:41 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To:
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
Message-ID: <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>

Hi, Dave,

Thx for reminding me this. I will definitely try it.

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

-----Original Message-----
From: Dave Messina [mailto:David.Messina at sbc.su.se]
Sent: Friday, August 06, 2010 2:07 PM
To: Jun Yin
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences

Sounds great, Jun!

Did you happen to test your code on very large alignments? I know there's one in Pfam that's something like 100,000 sequences. An rRNA, I believe.

Dave

__________ Information from ESET Smart Security, version of virus signature database 5346 (20100806) __________

The message was checked by ESET Smart Security.

http://www.eset.com

From cjfields at illinois.edu Fri Aug 6 09:19:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 6 Aug 2010 08:19:54 -0500
Subject: [Bioperl-l] call for a TreeIO volunteer
In-Reply-To: <6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>
References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se> <6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>
Message-ID: <8CB3DE9A-4C5C-42A3-94B4-8818D7143951@illinois.edu>

On Aug 6, 2010, at 7:59 AM, Dave Messina wrote:

>
>> I can help out with these. I'm pretty sure I've previously fought with (and perhaps even come up with a fix for) bug 3039, and I can take a look at 3007 too.
>
> Awesome, thanks Greg!
> > >> Now lemme just see if I can get up and running with the Bioperl test suite. I'll give a shout if I run into any problems. > > Please do. > > > > Dave Agreed, and thanks for helping out! chris From dianabowley at gmail.com Fri Aug 6 18:33:57 2010 From: dianabowley at gmail.com (DRBowley) Date: Fri, 6 Aug 2010 15:33:57 -0700 (PDT) Subject: [Bioperl-l] BioPerl install issues Message-ID: I'm new to both perl and bioperl and I'm having issues installing bioperl. I'm trying to install on a Mac OS 10.6.4, and I've already installed perl (5.10.0). I tried installing using the recommended approach for Mac - via Fink... "fink install bioperl-pm5100" Looking back over the terminal window text it looks like the problem is: "This package requires Module::Build v0.2805 or greater to install itself." I tried doing "fink selfupdate" and that did not fix the problem. Any suggestions? Thanks! Diana From Kevin.M.Brown at asu.edu Fri Aug 6 18:50:45 2010 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Fri, 6 Aug 2010 15:50:45 -0700 Subject: [Bioperl-l] BioPerl install issues In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B406E44A05@EX02.asurite.ad.asu.edu> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPE RL_THE_EASY_WAY_USING_Build.PL Not sure why you had to install perl since it should have been part of the stock OSX install (or at least it was last time I logged onto a mac). Not sure why the Fink method has so many issues, but might try the above which works for linux or bsd. -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of DRBowley Sent: Friday, August 06, 2010 3:34 PM To: bioperl-l at bioperl.org Subject: [Bioperl-l] BioPerl install issues I'm new to both perl and bioperl and I'm having issues installing bioperl. I'm trying to install on a Mac OS 10.6.4, and I've already installed perl (5.10.0). 
I tried installing using the recommended approach for Mac - via Fink... "fink install bioperl-pm5100" Looking back over the terminal window text it looks like the problem is: "This package requires Module::Build v0.2805 or greater to install itself." I tried doing "fink selfupdate" and that did not fix the problem. Any suggestions? Thanks! Diana _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From skastu01 at students.poly.edu Fri Aug 6 20:03:50 2010 From: skastu01 at students.poly.edu (Lakshmi Kastury) Date: Sat, 7 Aug 2010 00:03:50 +0000 Subject: [Bioperl-l] BioPerl install issues Message-ID: Hi - I went through several failed attempts on MACOS Snow Leopard, and fink was a dead end. Eventually I succeeded to install on Windows Vista using CPAN. I am not sure if this method will work with MACOS: 1. Opened command prompt. 2. Typed command: >perl -MCPAN -e "install Bundle::BioPerl" 3. Answered yes to the series of questions, which prompts install of several bundles and a compiler. The instructions were in a link from: http://bioperl.org/Core/Latest/INSTALL All the best, Lakshmi > Date: Fri, 6 Aug 2010 15:33:57 -0700 > From: dianabowley at gmail.com > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] BioPerl install issues > > I'm new to both perl and bioperl and I'm having issues installing > bioperl. I'm trying to install on a Mac OS 10.6.4, and I've already > installed perl (5.10.0). I tried installing using the recommended > approach for Mac - via Fink... > "fink install bioperl-pm5100" > > Looking back over the terminal window text it looks like the problem > is: > "This package requires Module::Build v0.2805 or greater to install > itself." > > I tried doing "fink selfupdate" and that did not fix the problem. > > Any suggestions? > > Thanks! 
> Diana > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Sat Aug 7 02:47:40 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 7 Aug 2010 08:47:40 +0200 Subject: [Bioperl-l] BioPerl install issues In-Reply-To: References: Message-ID: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se> On Aug 7, 2010, at 02:03 , Lakshmi Kastury wrote: > I am not sure if this method will work with MACOS: It will. CPAN is cross-platform and is the best way to install BioPerl. Dave From cjfields at illinois.edu Sat Aug 7 09:58:56 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 7 Aug 2010 08:58:56 -0500 Subject: [Bioperl-l] BioPerl install issues In-Reply-To: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se> References: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se> Message-ID: It should work fine. Even installing from trunk right now works w/o failing tests. chris On Aug 7, 2010, at 1:47 AM, Dave Messina wrote: > > On Aug 7, 2010, at 02:03 , Lakshmi Kastury wrote: > >> I am not sure if this method will work with MACOS: > > It will. CPAN is cross-platform and is the best way to install BioPerl. > > > Dave > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From greg at ebi.ac.uk Sat Aug 7 17:14:58 2010 From: greg at ebi.ac.uk (Gregory Jordan) Date: Sat, 7 Aug 2010 22:14:58 +0100 Subject: [Bioperl-l] Packages retrieving online alignment sequences In-Reply-To: <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie> References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie> <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie> Message-ID: Maybe I'm just a bit naive here, but what is the expected difference between accession and ID and why do we need a separate method for each? 
Seems to me that one could just have a single method, get_Aln, which determines under the hood whether the query string is an accession or ID. It would be nice if the SimpleAlign object had its Annotation filled with some extra metadata (such as accession, ID, database version number, URI, etc.). One other thing: have you thought about adding an Ensembl adaptor? Or maybe something similar already exists in BioPerl...? Sure Ensembl provides their own Perl API, but for someone who doesn't want to go through the hassle of installing it from CVS (pardon my french, but wtf!?! Who still uses CVS) and learning a whole new API, it might be convenient to have a simple BioPerl module for quickly grabbing gene family alignments from the public Ensembl MySQL databases. I'd be willing to help write the necessary SQL queries for this. greg On 6 August 2010 14:11, Jun Yin wrote: > Hi, Dave, > > Thx for reminding me this. I will definitely try it. > > Cheers, > Jun Yin > Ph.D. student in U.C.D. > > Bioinformatics Laboratory > Conway Institute > University College Dublin > > > -----Original Message----- > From: Dave Messina [mailto:David.Messina at sbc.su.se] > Sent: Friday, August 06, 2010 2:07 PM > To: Jun Yin > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences > > Sounds great, Jun! > > Did you happen to test your code on very large alignments? I know there's > one in Pfam that's something like 100,000 sequences. An rRNA, I believe. > > > Dave > > > __________ Information from ESET Smart Security, version of virus signature > database 5346 (20100806) __________ > > The message was checked by ESET Smart Security. > > http://www.eset.com > > > > > __________ Information from ESET Smart Security, version of virus signature > database 5346 (20100806) __________ > > The message was checked by ESET Smart Security. 
> > http://www.eset.com > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Sat Aug 7 18:07:39 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 7 Aug 2010 17:07:39 -0500 Subject: [Bioperl-l] Packages retrieving online alignment sequences In-Reply-To: References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie> <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie> Message-ID: <21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu> On Aug 7, 2010, at 4:14 PM, Gregory Jordan wrote: > Maybe I'm just a bit naive here, but what is the expected difference between > accession and ID and why do we need a separate method for each? Depends on the remote service, but in many cases there is a difference. With NCBI eutils you can have either an accession and the unique identifier (UID, or GI for nuc/protein seqs). efetch can use both, but only the UID is guaranteed to retrieve a single sequence all the time; the accession can (very rarely) map to more than one sequence. The other eutils services require either a string (esearch) or a UID, but do not allow an accession. > Seems to me > that one could just have a single method, get_Aln, which determines under > the hood whether the query string is an accession or ID. A simpler method could be introduced, but I can see that being potentially brittle in the long run. A naked alphanumeric string doesn't reveal much about what it is at face value w/o knowing database/service-specific behavior. And then we're reliant on that behavior not changing, which we can't guarantee (this has bitten us in the past). What would one do if NCBI (for instance) allowed accessions derived completely of digits, or conversely a unique ID with mixed alphanumerics? 
Using methods specific for ID/acc at least guarantees a behavior on the backend w/o guessing, and if there is no danger of overlap (a service accepts either/or) one could simply be an alias of the other. > It would be nice if the SimpleAlign object had its Annotation filled with > some extra metadata (such as accession, ID, database version number, URI, > etc.). According to the deobfuscator SimpleAlign does have accession() and id(). The others could be simple attributes, and can be added as simple getter/setters, or as annotation via Bio::Annotation (this is the way Stockholm annotation is currently handled). Something to think about. > One other thing: have you thought about adding an Ensembl adaptor? Or maybe > something similar already exists in BioPerl...? That's a good idea, though it might make more sense if this was done when mem-efficient (possibly DB-dependent) AlignI modules are present within bioperl, which is part of the GSoC (see below). For instance, have a Bio::Align::AlignI with a backend ensembl DB adaptor that works lazily. If using the Ensembl Perl API, a few possible roadblocks/problems might pop up. Ensembl currently requires bioperl (v1.2.3, but it works with the latest as well, at least when I've used it). If using the ensembl perl API we would just need to ensure we aren't conflicting with ensembl code that pulls in bioperl classes expecting a v1.2.3 API when we only support the latest. I don't foresee this being an issue, though (there is precedent for this, see Sendu's Ensembl module Bio::Tools::Run::Ensembl in bioperl-run). > Sure Ensembl provides their own Perl API, but for someone who doesn't want > to go through the hassle of installing it from CVS (pardon my french, but > wtf!?! Who still uses CVS) and learning a whole new API, it might be > convenient to have a simple BioPerl module for quickly grabbing gene family > alignments from the public Ensembl MySQL databases. 
I'd be willing to help
> write the necessary SQL queries for this.
>
> greg

The GSoC project on alignment subsystem refactoring will be finishing up this month, so I'm sure Jun can discuss ideas for initial DB-dependent implementations. The more input and coders implementing the better, IMO.

As for writing up an adaptor to ensembl outside of its API, overall I don't think it's a bad idea, but if it's possible maybe start without reinventing things, then move to direct SQL. Unless it's easier to use SQL.

chris

> On 6 August 2010 14:11, Jun Yin wrote:
>
>> Hi, Dave,
>>
>> Thx for reminding me this. I will definitely try it.
>>
>> Cheers,
>> Jun Yin
>> Ph.D. student in U.C.D.
>>
>> Bioinformatics Laboratory
>> Conway Institute
>> University College Dublin
>>
>>
>> -----Original Message-----
>> From: Dave Messina [mailto:David.Messina at sbc.su.se]
>> Sent: Friday, August 06, 2010 2:07 PM
>> To: Jun Yin
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences
>>
>> Sounds great, Jun!
>>
>> Did you happen to test your code on very large alignments? I know there's
>> one in Pfam that's something like 100,000 sequences. An rRNA, I believe.
>>
>>
>> Dave
>>
>>
>> __________ Information from ESET Smart Security, version of virus signature
>> database 5346 (20100806) __________
>>
>> The message was checked by ESET Smart Security.
>>
>> http://www.eset.com
>>
>>
>>
>>
>> __________ Information from ESET Smart Security, version of virus signature
>> database 5346 (20100806) __________
>>
>> The message was checked by ESET Smart Security.
>> >> http://www.eset.com >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hartzell at alerce.com Sat Aug 7 17:45:04 2010 From: hartzell at alerce.com (George Hartzell) Date: Sat, 7 Aug 2010 14:45:04 -0700 Subject: [Bioperl-l] BioPerl install issues In-Reply-To: References: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se> Message-ID: <19549.54240.499140.501136@gargle.gargle.HOWL> Chris Fields writes: > It should work fine. Even installing from trunk right now works > w/o failing tests. As a slight aside, if you're looking to build a current perl binary for your mac (e.g. 5.12.1) you should take a look at perlbrew (http://search.cpan.org/dist/App-perlbrew/). The three steps at the top of the installation section of the README are all you need to get going. Even a manager can do it. If you're using bash on the mac via terminal you'll probably want to put the one-liner they prescribe into your .bash_profile instead of your .bashrc, but everything else just flows right along. Once you have that in place you have a nicely isolated system into which you can install things to your hearts content without worrying about PERL5LIB and local::lib and the rest. g. From cjfields at illinois.edu Sat Aug 7 21:19:54 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 7 Aug 2010 20:19:54 -0500 Subject: [Bioperl-l] BioPerl install issues In-Reply-To: <19549.54240.499140.501136@gargle.gargle.HOWL> References: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se> <19549.54240.499140.501136@gargle.gargle.HOWL> Message-ID: On Aug 7, 2010, at 4:45 PM, George Hartzell wrote: > Chris Fields writes: >> It should work fine. Even installing from trunk right now works >> w/o failing tests. 
> > As a slight aside, if you're looking to build a current perl binary > for your mac (e.g. 5.12.1) you should take a look at perlbrew > (http://search.cpan.org/dist/App-perlbrew/). The three steps at the > top of the installation section of the README are all you need to get > going. Even a manager can do it. > > If you're using bash on the mac via terminal you'll probably want to > put the one-liner they prescribe into your .bash_profile instead of > your .bashrc, but everything else just flows right along. > > Once you have that in place you have a nicely isolated system into > which you can install things to your hearts content without worrying > about PERL5LIB and local::lib and the rest. > > g. Have to second using perlbrew, started using it for my local Ubuntu installation (don't have it running on my macbook yet, but it's in the plans). chris From greg at ebi.ac.uk Sun Aug 8 02:12:41 2010 From: greg at ebi.ac.uk (Gregory Jordan) Date: Sun, 8 Aug 2010 07:12:41 +0100 Subject: [Bioperl-l] Packages retrieving online alignment sequences In-Reply-To: <21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu> References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie> <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie> <21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu> Message-ID: On 7 August 2010 23:07, Chris Fields wrote: > > A simpler method could be introduced, but I can see that being potentially > brittle in the long run. A naked alphanumeric string doesn't reveal much > about what it is at face value w/o knowing database/service-specific > behavior. And then we're reliant on that behavior not changing, which we > can't guarantee (this has bitten us in the past). What would one do if NCBI > (for instance) allowed accessions derived completely of digits, or > conversely a unique ID with mixed alphanumerics? 
> > Using methods specific for ID/acc at least guarantees a behavior on the > backend w/o guessing, and if there is no danger of overlap (a service > accepts either/or) one could simply be an alias of the other. > Thanks for the clarification on IDs vs accessions. As long as the behavior and distinction are well-documented, I'm sure it won't make too much of a difference. My main concern was just that having two similar methods -- with no clearly laid out distinction between the two and one of them only supported by half of the implementing subclasses -- might confuse potential users. As a point of reference: both Rfam and Pfam allow either an ID or an accession in their front-page search interface (http://www.pfam.org / http://www.rfam.org/). In fact, they seem to entirely hide the distinction between ID and Accession from the end user; nowhere on the Rfam page for an individual result is it clear which string is the accession and which is the ID (http://rfam.sanger.ac.uk/family/snoZ107_R87). Thus, a potential user of the Rfam module wouldn't know whether to call the get_by_ID or get_by_Accession method, even after looking at the Rfam page for his / her desired alignment! As you can probably tell, I'm all in favor of a unified search whenever feasible / possible. :-) > As for writing up an adaptor to ensembl outside of it's API, overall I > don't think it's a bad idea, but if it's possible maybe start without > reinventing things, then move to direct SQL. Unless it's easier to use SQL. > > For fetching Ensembl's gene family alignments, using the SQL will be easiest. They don't tend to get unreasonably large in terms of memory -- I think the biggest tend to be ~700 sequences with a few thousand alignment columns or so -- and it's a simple table join or two to get both the tree and alignment from the database. For genomic alignments, I agree that a more memory-efficient and/or lazy backend would be necessary. 
And it's pretty much impossible to get those things out of the Ensembl tables without using their API. --greg From dan.kortschak at adelaide.edu.au Sun Aug 8 20:53:43 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Mon, 09 Aug 2010 10:23:43 +0930 Subject: [Bioperl-l] MUMmer parser work In-Reply-To: <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu> References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au> <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu> Message-ID: <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au> Hi Chris, Is that set of files planned to be included in the git repository on bioperl-live? I don't want to push something that is being organised by someone else. cheers Dan On Thu, 2010-08-05 at 22:13 -0500, Chris Fields wrote: > Dan, > > Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2701 > > It currently lacks significant tests, so feel free to chip in there as needed. > > chris From genehack at genehack.org Sun Aug 8 21:42:27 2010 From: genehack at genehack.org (John SJ Anderson) Date: Sun, 8 Aug 2010 21:42:27 -0400 Subject: [Bioperl-l] MUMmer parser work In-Reply-To: <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au> References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au> <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu> <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org> I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. j. On Aug 8, 2010, at 20:53 , Dan Kortschak wrote: > Hi Chris, > > Is that set of files planned to be included in the git repository on > bioperl-live? I don't want to push something that is being organised by > someone else. 
> > cheers > Dan > > On Thu, 2010-08-05 at 22:13 -0500, Chris Fields wrote: >> Dan, >> >> Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in: >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2701 >> >> It currently lacks significant tests, so feel free to chip in there as needed. >> >> chris > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From dan.kortschak at adelaide.edu.au Sun Aug 8 22:03:52 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Mon, 09 Aug 2010 11:33:52 +0930 Subject: [Bioperl-l] MUMmer parser work In-Reply-To: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org> References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au> <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu> <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au> <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org> Message-ID: <1281319432.2414.49.camel@zoidberg.mbs.adelaide.edu.au> Excellent. Thanks for that. Dan On Sun, 2010-08-08 at 21:42 -0400, John SJ Anderson wrote: > I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. > > j. From cjfields at illinois.edu Mon Aug 9 22:40:07 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Aug 2010 21:40:07 -0500 Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio Message-ID: Any objections to moving the Bio directory to lib/Bio in bioperl-live? It's a more standard location for code in most distributions; I have a branch (topic/cjfields_standard_lib) that has this working, though it's possible that it needs more work. 
chris From genehack at genehack.org Tue Aug 10 04:30:44 2010 From: genehack at genehack.org (John SJ Anderson) Date: Tue, 10 Aug 2010 04:30:44 -0400 Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio In-Reply-To: References: Message-ID: On Aug 9, 2010, at 22:40 , Chris Fields wrote: > Any objections to moving the Bio directory to lib/Bio in bioperl-live? +1 on this idea. j. From genehack at genehack.org Tue Aug 10 07:21:51 2010 From: genehack at genehack.org (John Anderson) Date: Tue, 10 Aug 2010 07:21:51 -0400 Subject: [Bioperl-l] MUMmer parser work In-Reply-To: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org> References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au> <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu> <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au> <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org> Message-ID: <7A4F93AB-1BF7-4775-BC0E-38E7B431ECC6@genehack.org> On Aug 8, 2010, at 9:42 PM, John SJ Anderson wrote: > I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. Okay, the files have been added to topic/bug-2701 -- see . Please note, these are just the files from the bug report, slotted into the appropriate spots. I haven't reviewed the code or done anything about the non-BioPerl-y tests or the general lack of test coverage. I hope to do something about that in the coming week, but if somebody beats me to it, that would be okay too. j. From maj at fortinbras.us Tue Aug 10 19:52:05 2010 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Tue, 10 Aug 2010 19:52:05 -0400
Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio
In-Reply-To:
References:
Message-ID: <1C55239986494A8D82BDC21A85B324E9@NewLife>

+1

----- Original Message -----
From: "Chris Fields"
To: "BioPerl List"
Sent: Monday, August 09, 2010 10:40 PM
Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio

> Any objections to moving the Bio directory to lib/Bio in bioperl-live? It's a
> more standard location for code in most distributions; I have a branch
> (topic/cjfields_standard_lib) that has this working, though it's possible that
> it needs more work.
>
> chris

From fayroz_farouk at yahoo.com Sun Aug 8 04:24:31 2010
From: fayroz_farouk at yahoo.com (fayroz)
Date: Sun, 8 Aug 2010 01:24:31 -0700 (PDT)
Subject: [Bioperl-l] using HMMER
Message-ID: <603590.1072.qm@web112620.mail.gq1.yahoo.com>

I need your help. I am a new Perl user and want to use BioPerl modules to run the HMMER program (hmmsearch). I have "model.hmm" and a FASTA file, and I want to see which of the sequences are similar to the model. I wrote this code, but there is a problem:

#!/usr/local/bin/perl W
use Bio::AlignIO;
use Bio::SearchIO;
use Bio::SeqIO;
use Bio::Tools::Run::Hmmer;

# run hmmsearch (similar for hmmpfam)
my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'h6_avian.hmm', -informat => 'fasta');
my $seq = Bio::SeqIO->new('-file' => "one_seq.fa", '-format' => 'Fasta');

# Pass the factory a Bio::Seq object or a file name, returns a Bio::SearchIO
my $searchio = $factory->hmmsearch($seq);

while (my $result = $searchio->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print join("\t", (
                $result->query_name,
                $hsp->query->start,
                $hsp->query->end,
                $hit->name,
                $hsp->hit->start,
                $hsp->hit->end,
                $hsp->score,
                $hsp->evalue,
                $hsp->seq_str,
            )), "\n";
        }
    }
}

exceptions:
MSG: Unknown kind of input 'Bio::SeqIO::fasta=HASH(0x329a504)'
STACK Bio::Tools::Run::Hmmer::_setinput D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:381
STACK Bio::Tools::Run::Hmmer::hmmsearch D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:352
STACK toplevel test_bioperl.pl:12

thank you
fayroz

From douglas.hoen at gmail.com Tue Aug 10 21:54:53 2010
From: douglas.hoen at gmail.com (Douglas Hoen)
Date: Tue, 10 Aug 2010 21:54:53 -0400
Subject: [Bioperl-l] Bio::SeqFeature::SimilarityPair->from_searchResult()?
Message-ID: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com>

Hi,

I was wondering why the Synopsis in the docs for Bio::SeqFeature::SimilarityPair has the following:

$sim_pair = Bio::SeqFeature::SimilarityPair->from_searchResult($blastHit);

There doesn't actually seem to be a from_searchResult method. Am I missing something?

Thanks,
-- Doug

From zhaoy at mail.cbi.pku.edu.cn Wed Aug 11 04:17:42 2010
From: zhaoy at mail.cbi.pku.edu.cn (zhaoy at mail.cbi.pku.edu.cn)
Date: Wed, 11 Aug 2010 16:17:42 +0800 (CST)
Subject: [Bioperl-l] About extracting sequence from genewise format result
Message-ID: <53663.162.105.250.100.1281514662.squirrel@mail.cbi.pku.edu.cn>

Dear authors:

Hello! Recently I have been trying to parse genewise-format results to extract the nucleotide sequence using the "hit_string" method in the "SearchIO" module; however, the result is empty. Worse, the loop does not seem to work, because I always get the last result. I'm confused.
My perl code is shown below: #!/usr/bin/perl -w use strict; use warnings; use Bio::SearchIO; my $in = new Bio::SearchIO(-format => 'wise', -wisetype => 'genewise', -file => 'test'); while( my $result = $in->next_result ) { while (my $hit = $result->next_hit) { while (my $hsp = $hit->next_hsp){ print "Query=", $result->query_name, "\n", "Length=", $hsp->length('total'),"\n", "hit_string:", $hsp->hit_string, "\n"; } } } And one of the genewise format results is shown below: genewise $Name: wise2-4-0alpha $ (unreleased release) This program is freely distributed under a GPL. See source directory Copyright (c) GRL limited: portions of the code are from separate copyright Query protein: Cpa_s110_24 Comp Matrix: BLOSUM62.bla Gap open: 12 Gap extension: 2 Start/End global Target Sequence Bdi_chr3:38292015..38292302 Strand: forward Start/End (protein) global Gene Parameter file: gene.stat Splice site model: GT/AG only Codon Table: codon.table Subs error: 1e-06 Indel error: 1e-06 Null model syn Algorithm 623 genewise output Score 37.97 bits over entire alignment Scores as bits over a synchronous coding model Warning: The bits scores is not probablistically correct for single seqs See WWW help for more info Cpa_s110_24 1 MGNCQAVDAATLAIQHPS-GKVDRLYWPVSASEVMRTNPGHYVALLI-- MGNCQA DAA + IQHP+ GKV+RLYWP +A++VMR NPGHYVAL++ MGNCQAADAAAVVIQHPAEGKVERLYWPATAADVMRKNPGHYVALVVVH Bdi_chr3:382920 1 agatcggggggggacccgggaggccttcgaggggacaacgctggcgggc tgagaccaccctttaaccagatagtagcccccattgaacgaatctttta gctcgggtggcggcgcgcgggcgcccggccgcccgcgcccccccccccc Cpa_s110_24 47 ----STTLCPSNSNASNAESVRVTRIKLLRPTDTLVLGQVYRLITTQEV P+ + A + R+T++KLL+P DTL++GQVYRLIT+Q VSGGAGETDPAVAGGGAAAAARITKVKLLKPRDTLLIGQVYRLITSQ-- Bdi_chr3:382920 148 gtgggggagcgggggggggggaaaagaccaccgaccagcgtccaatc tcggcgacacctcgggcccccgtcatattacgactttgatagttcca cctcctgtcccacaaaattccgccgcgccgcgctgcccgccccccca Cpa_s110_24 92 MKGLWAKKCAKMKKYQEADHKDGLKPETIPGRRSGPERDTQVAKHERHR ------------------------------------------------- Bdi_chr3:382920 289 
Cpa_s110_24 141 SRVAASTNQAGLKSRTWQPSLKSISEAAS
                -----------------------------
Bdi_chr3:382920 289
//
Gene 1
Gene 1 288
  Exon 1 288 phase 0
     Supporting 1 54 1 18
     Supporting 58 141 19 46
     Supporting 160 288 47 89
//
......

Part of the output of this code is shown below:

Query=Aly_481360
Length=0
hit_string:
Query=Aly_481360
Length=0
hit_string:
......

What's wrong with my code and how can I get the correct result? I'm looking forward to your reply. Thanks very much!

Best regards,
Zackaly

From roy.chaudhuri at gmail.com Wed Aug 11 10:32:39 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 11 Aug 2010 15:32:39 +0100
Subject: [Bioperl-l] using HMMER
In-Reply-To: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
Message-ID: <4C62B487.9090103@gmail.com>

Hi Fayroz,

Your $seq variable contains a Bio::SeqIO object (a biological filehandle), not a Bio::Seq (sequence object).

You need to change that line to:

my $seqio = Bio::SeqIO->new(-file=>'one_seq.fa', -format=>'fasta');
my $seq=$seqio->next_seq;

If you have multiple sequences in the file, then you will need to loop over them:

while (my $seq=$seqio->next_seq) {
    # Code to run Hmmer goes here
}

Also, I don't think you need to specify -informat for your Bio::Tools::Run::Hmmer object, since you're passing it a sequence object, not a filename.

Hope this helps.
Roy.
On 08/08/2010 09:24, fayroz wrote:
> i need your help, i am a new perl user and want to use bioperl modules to run
> HMMER program ( HMMsearch) i have" model.hmm" and a "fasta file" to see which of
> them are similar with the model
> i write this code but there is a problems
>
> #!/usr/local/bin/perl W
> use Bio::AlignIO;
> use Bio::SearchIO;
> use Bio::SeqIO ;
> use Bio::Tools::Run::Hmmer;
>
> # run hmmsearch (similar for hmmpfam)
> my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'h6_avian.hmm',-informat =>
> 'fasta');
> my $seq = Bio::SeqIO->new('-file'=> "one_seq.fa", '-format'=>'Fasta');
>
> # Pass the factory a Bio::Seq object or a file name, returns a Bio::SearchIO
> my $searchio = $factory->hmmsearch($seq);
>
> while (my $result = $searchio->next_result){
>     while(my $hit = $result->next_hit){
>         while (my $hsp = $hit->next_hsp){
>             print join("\t", ( $result->query_name,
>                 $hsp->query->start,
>                 $hsp->query->end,
>                 $hit->name,
>                 $hsp->hit->start,
>                 $hsp->hit->end,
>                 $hsp->score,
>                 $hsp->evalue,
>                 $hsp->seq_str,
>             )), "\n";
>         }
>     }
> }
>
> exceptions:
> MSG: Unknown kind of input 'Bio::SeqIO::fasta=HASH(0x329a504)'
> STACK Bio::Tools::Run::Hmmer::_setinput
> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:381
> STACK Bio::Tools::Run::Hmmer::hmmsearch
> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:352
> STACK toplevel test_bioperl.pl:12
> thank you
>
> fayroz

From cjfields at illinois.edu Wed Aug 11 11:07:36 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 10:07:36 -0500
Subject: [Bioperl-l] using HMMER
In-Reply-To: <4C62B487.9090103@gmail.com>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com> <4C62B487.9090103@gmail.com>
Message-ID: <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>

You might also want to check whether you are using HMMER2 or HMMER3. I'm not sure the wrapper works with HMMER3.

chris

On Aug 11, 2010, at 9:32 AM, Roy Chaudhuri wrote:

> Hi Fayroz,
>
> Your $seq variable contains a Bio::SeqIO object (a biological filehandle), not a Bio::Seq (sequence object).
>
> You need to change that line to:
> my $seqio = Bio::SeqIO->new(-file=>'one_seq.fa', -format=>'fasta');
> my $seq=$seqio->next_seq;
>
> If you have multiple sequences in the file, then you will need to loop over them:
> while (my $seq=$seqio->next_seq) {
>     # Code to run Hmmer goes here
> }
>
> Also, I don't think you need to specify -informat for your Bio::Tools::Run::Hmmer object, since you're passing it a sequence object, not a filename.
>
> Hope this helps.
> Roy.

From douglas.hoen at gmail.com Wed Aug 11 15:13:49 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 12:13:49 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
Message-ID: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>

Hi,

I am trying to store in a SeqFeature::Store database the results of searches of translated DNA. The DB contains the original DNA sequences. For instance, I have done HMMER searches of 6-frame translations of the sequences stored in the DB. I want to store these results "at" their (equivalent) DNA positions, which I can calculate. Preferably, I would like to directly store the SeqFeature::Similarity objects that I get from parsing these searches. But they are of course located on different coordinate systems than the DNA, so I guess I can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct DNA position and then store the Similarity objects as sub-SeqFeatures.

I could just set each Similarity's position to the (calculated) DNA coordinates, or alternatively make a new SeqFeature and copy in the attributes I want. But is there a more elegant solution?

Thanks,
-- Doug

From douglas.hoen at gmail.com Wed Aug 11 16:11:26 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 13:11:26 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
In-Reply-To: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
Message-ID:

One possible answer to my own question: Use Bio::SeqFeature::PositionProxy's? Would this work?

On Aug 11, 3:13 pm, Doug wrote:
> Hi,
>
> I am trying to store in a SeqFeature::Store database the results of
> searches of translated DNA. The DB contains the original DNA
> sequences. For instance, I have done HMMER searches of 6-frame
> translations of the sequences stored in the DB. I want to store these
> results "at" their (equivalent) DNA positions, which I can calculate.
> Preferably, I would like to directly store the SeqFeature::Similarity
> objects that I get from parsing these searches. But they are of course
> located on different coordinate systems than the DNA, so I guess I
> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
> DNA position and then store the Similarity's as sub-SeqFeatures.
>
> I could just set the Similarity's position to the (calculated) DNA
> coordinates, or alternately make a new SeqFeature and copy in the
> attributes I want. But is there a more elegant solution?
>
> Thanks,
> -- Doug

From scott at scottcain.net Wed Aug 11 16:16:22 2010
From: scott at scottcain.net (Scott Cain)
Date: Wed, 11 Aug 2010 16:16:22 -0400
Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
In-Reply-To:
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
Message-ID:

Hi Doug,

I don't know if any of the things you've thought of would work; I've never tried it. My inclination would be to express your data in GFF3 and use the standard loader.

Scott

On Wed, Aug 11, 2010 at 4:11 PM, Doug wrote:
> One possible answer to my own question: Use
> Bio::SeqFeature::PositionProxy's? Would this work?

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                           scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)          216-392-3087
Ontario Institute for Cancer Research

From douglas.hoen at gmail.com Wed Aug 11 16:38:54 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 13:38:54 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
In-Reply-To:
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
Message-ID: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>

Hi Scott,

Good idea. Would you happen to know of an existing HMMER3 to GFF3 converter?

Thanks for your advice,
-- Doug

On Aug 11, 4:16 pm, Scott Cain wrote:
> Hi Doug,
>
> I don't know if any of the things you've thought of would work; I've
> never tried it. My inclination would be to express your data in GFF3
> and use the standard loader.
>
> Scott

From douglas.hoen at gmail.com Wed Aug 11 16:53:35 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 13:53:35 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
In-Reply-To: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
Message-ID:

One more note: I did try using PositionProxy but it failed. It doesn't implement seq_id() and so can't be stored in the DB:

------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::SeqFeatureI::seq_id" is not implemented by package Bio::SeqFeature::PositionProxy.
This is not your fault - author of Bio::SeqFeature::PositionProxy should be blamed!
...

On Aug 11, 4:38 pm, Doug wrote:
> Hi Scott,
>
> Good idea. Would you happen to know of an existing HMMER3 to GFF3
> converter?
>
> Thanks for your advice,
> -- Doug

From cjfields at illinois.edu Wed Aug 11 16:45:00 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 15:45:00 -0500
Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
In-Reply-To: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
Message-ID: <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>

HMMER3 is parsed by Bio::SearchIO now in bioperl-live, and I think there is a generic SearchIO->GFF3 script floating around the intertubes somewheres...

chris

On Aug 11, 2010, at 3:38 PM, Doug wrote:
> Hi Scott,
>
> Good idea. Would you happen to know of an existing HMMER3 to GFF3
> converter?
>
> Thanks for your advice,
> -- Doug

From scott at scottcain.net Wed Aug 11 17:05:25 2010
From: scott at scottcain.net (Scott Cain)
Date: Wed, 11 Aug 2010 17:05:25 -0400
Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
In-Reply-To: <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
Message-ID:

Um, yeah, it's in bioperl: bp_search2gff.pl.
Scott

On Wed, Aug 11, 2010 at 4:45 PM, Chris Fields wrote:
> HMMER3 is parsed by Bio::SearchIO now in bioperl-live, and I think there is a generic SearchIO->GFF3 script floating around the intertubes somewheres...
>
> chris
>
> On Aug 11, 2010, at 3:38 PM, Doug wrote:
>
>> Hi Scott,
>>
>> Good idea. Would you happen to know of an existing HMMER3 to GFF3
>> converter?
>>
>> Thanks for your advice,
>> -- Doug
>>
>> On Aug 11, 4:16 pm, Scott Cain wrote:
>>> Hi Doug,
>>>
>>> I don't know if any of the things you've thought of would work; I've
>>> never tried it. My inclination would be to express your data in GFF3
>>> and use the standard loader.
>>>
>>> Scott
>>>
>>> On Wed, Aug 11, 2010 at 4:11 PM, Doug wrote:
>>>> One possible answer to my own question: Use
>>>> Bio::SeqFeature::PositionProxy's? Would this work?
>>>
>>>> On Aug 11, 3:13 pm, Doug wrote:
>>>>> Hi,
>>>
>>>>> I am trying to store in a SeqFeature::Store database the results of
>>>>> searches of translated DNA. The DB contains the original DNA
>>>>> sequences. For instance, I have done HMMER searches of 6-frame
>>>>> translations of the sequences stored in the DB. I want to store these
>>>>> results "at" their (equivalent) DNA positions, which I can calculate.
>>>>> Preferably, I would like to directly store the SeqFeature::Similarity
>>>>> objects that I get from parsing these searches. But they are of course
>>>>> located on different coordinate systems than the DNA, so I guess I
>>>>> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
>>>>> DNA position and then store the Similarity's as sub-SeqFeatures.
>>>
>>>>> I could just set the Similarity's position to the (calculated) DNA
>>>>> coordinates, or alternately make a new SeqFeature and copy in the
>>>>> attributes I want. But is there a more elegant solution?
>>>
>>>>> Thanks,
>>>>> -- Doug
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioper... at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioper... at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> --
>>> ------------------------------------------------------------------------
>>> Scott Cain, Ph. D.                                   scott at scottcain dot net
>>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>>> Ontario Institute for Cancer Research
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioper... at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu Wed Aug 11 17:07:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 16:07:20 -0500
Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
In-Reply-To:
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
Message-ID:

For some reason I thought there was a more up-to-date one somewhere.
Ah well, can't keep track of all the code in bioperl :> chris On Aug 11, 2010, at 4:05 PM, Scott Cain wrote: > Um, yeah, it's in bioperl: bp_search2gff.pl. > > Scott > > > On Wed, Aug 11, 2010 at 4:45 PM, Chris Fields wrote: >> HMMER3 is parsed by Bio::SearchIO now in bioperl-live, and I think there is a generic SearchIO->GFF3 script floating around the intertubes somewheres... >> >> chris >> >> On Aug 11, 2010, at 3:38 PM, Doug wrote: >> >>> Hi Scott, >>> >>> Good idea. Would you happen to know of an existing HMMER3 to GFF3 >>> converter? >>> >>> Thanks for your advice, >>> -- Doug >>> >>> On Aug 11, 4:16 pm, Scott Cain wrote: >>>> Hi Doug, >>>> >>>> I don't know if any of the things you've thought of would work; I've >>>> never tried it. My inclination would be to express your data in GFF3 >>>> and use the standard loader. >>>> >>>> Scott >>>> >>>> >>>> >>>> >>>> >>>> On Wed, Aug 11, 2010 at 4:11 PM, Doug wrote: >>>>> One possible answer to my own question: Use >>>>> Bio::SeqFeature::PositionProxy's? Would this work? >>>> >>>>> On Aug 11, 3:13 pm, Doug wrote: >>>>>> Hi, >>>> >>>>>> I am trying to store in a SeqFeature::Store database the results of >>>>>> searches of translated DNA. The DB contains the original DNA >>>>>> sequences. For instance, I have done HMMER searches of 6-frame >>>>>> translations of the sequences stored in the DB. I want to store these >>>>>> results "at" their (equivalent) DNA positions, which I can calculate. >>>>>> Preferably, I would like to directly store the SeqFeature::Similarity >>>>>> objects that I get from parsing these searches. But they are of course >>>>>> located on different coordinate systems than the DNA, so I guess I >>>>>> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct >>>>>> DNA position and then store the Similarity's as sub-SeqFeatures. 
>>>> >>>>>> I could just set the Similarity's position to the (calculated) DNA >>>>>> coordinates, or alternately make a new SeqFeature and copy in the >>>>>> attributes I want. But is there a more elegant solution? >>>> >>>>>> Thanks, >>>>>> -- Doug >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioper... at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> -- >>>> ------------------------------------------------------------------------ >>>> Scott Cain, Ph. D. scott at scottcain dot net >>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>> Ontario Institute for Cancer Research >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research From douglas.hoen at gmail.com Wed Aug 11 17:11:20 2010 From: douglas.hoen at gmail.com (Douglas Hoen) Date: Wed, 11 Aug 2010 17:11:20 -0400 Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA? 
In-Reply-To: References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu> Message-ID: Great, thanks so much for the info. On 2010-08-11, at 5:05 PM, Scott Cain wrote: > Um, yeah, it's in bioperl: bp_search2gff.pl. > > Scott > > > On Wed, Aug 11, 2010 at 4:45 PM, Chris Fields wrote: >> HMMER3 is parsed by Bio::SearchIO now in bioperl-live, and I think there is a generic SearchIO->GFF3 script floating around the intertubes somewheres... >> >> chris >> >> On Aug 11, 2010, at 3:38 PM, Doug wrote: >> >>> Hi Scott, >>> >>> Good idea. Would you happen to know of an existing HMMER3 to GFF3 >>> converter? >>> >>> Thanks for your advice, >>> -- Doug >>> >>> On Aug 11, 4:16 pm, Scott Cain wrote: >>>> Hi Doug, >>>> >>>> I don't know if any of the things you've thought of would work; I've >>>> never tried it. My inclination would be to express your data in GFF3 >>>> and use the standard loader. >>>> >>>> Scott >>>> >>>> >>>> >>>> >>>> >>>> On Wed, Aug 11, 2010 at 4:11 PM, Doug wrote: >>>>> One possible answer to my own question: Use >>>>> Bio::SeqFeature::PositionProxy's? Would this work? >>>> >>>>> On Aug 11, 3:13 pm, Doug wrote: >>>>>> Hi, >>>> >>>>>> I am trying to store in a SeqFeature::Store database the results of >>>>>> searches of translated DNA. The DB contains the original DNA >>>>>> sequences. For instance, I have done HMMER searches of 6-frame >>>>>> translations of the sequences stored in the DB. I want to store these >>>>>> results "at" their (equivalent) DNA positions, which I can calculate. >>>>>> Preferably, I would like to directly store the SeqFeature::Similarity >>>>>> objects that I get from parsing these searches. But they are of course >>>>>> located on different coordinate systems than the DNA, so I guess I >>>>>> can't (or shouldn't) create a SeqFeature (e.g. 
Generic) at the correct >>>>>> DNA position and then store the Similarity's as sub-SeqFeatures. >>>> >>>>>> I could just set the Similarity's position to the (calculated) DNA >>>>>> coordinates, or alternately make a new SeqFeature and copy in the >>>>>> attributes I want. But is there a more elegant solution? >>>> >>>>>> Thanks, >>>>>> -- Doug >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioper... at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> -- >>>> ------------------------------------------------------------------------ >>>> Scott Cain, Ph. D. scott at scottcain dot net >>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>> Ontario Institute for Cancer Research >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/) 216-392-3087
> Ontario Institute for Cancer Research

From Russell.Smithies at agresearch.co.nz Wed Aug 11 17:31:32 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 12 Aug 2010 09:31:32 +1200
Subject: [Bioperl-l] AlignIO and Gbrowse_syn
In-Reply-To:
References:
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>

I know there was some brief discussion about .maf format a few weeks ago, but I've had an enquiry (as below) from a colleague.
If GBrowse_syn is using .maf format, does AlignIO need more work?
Any comments?

--Russell


I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output). GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce clustalw, I've tried converting LASTZ maf output to clustalw (and other alignment formats) using AlignIO; however, I run into the following issues:

* Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification).
* The coordinate system for reverse-strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse-complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-reverse-complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result, all reverse-match coordinates in the resulting clustalw output file are incorrect.
* AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them.

I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned.

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From cjfields at illinois.edu Wed Aug 11 18:02:38 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 17:02:38 -0500
Subject: [Bioperl-l] AlignIO and Gbrowse_syn
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
Message-ID:

Russell,

We have had very few requests to support .maf until recently, which is why there has been little done with it. We welcome any help to improve it.

chris

On Aug 11, 2010, at 4:31 PM, Smithies, Russell wrote:

> I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague.
> If GBrowse_syn is using .maf format, does AlignIO need more work?
> Any comments?
>
> --Russell
>
>
> I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output).
GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database) . Since LASTZ doesn't produce clustalw, I've tried parsing LASTZ maf output to clustalw (and other alignment formats) using AlignIO, however I run into the following issues: > *Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification) > *The coordinate system for reverse strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-rev complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result all reverse match coordinates in the resulting clustalw output file are incorrect. > *AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them > > I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned. > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. 
If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From douglas.hoen at gmail.com Thu Aug 12 01:59:37 2010 From: douglas.hoen at gmail.com (Doug Hoen) Date: Wed, 11 Aug 2010 22:59:37 -0700 (PDT) Subject: [Bioperl-l] HMMER3 to GFF3 Message-ID: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com> Hi, I am trying to convert HMMER3 (hmmscan) output files into GFF3 files. Based on previous advice (see the thread, "How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?"), I have installed bioperl-live for its new HMMER3 parsing capabilities (in SearchIO) and am trying to use bp_search2gff.pl to do the file conversion. The hmmscan was done on translated chromosome sequences with conserved domain models. I want to get the GFF 'start' and 'end' columns to be based on these coordinates, not those of the models. To do this (with my files), it seems I need to use the option "--type hit". However, this changes the "Target" sequence name from the model name to chromosome name, and the model name does not appear anywhere in the output (see below). Could someone please confirm whether the results are incorrect and, if so, perhaps suggest a fix? It may well be that this problem is due to the unusual way I am using hmmscan, rather than a problem with HMMER3 parsing...? Many thanks, -- Doug ======================================================== Here's what it looks like if I do *not* use the "--type hit" option. (RVT_2 is a conserved domain name. I need this in the output.) 
COMMAND:
------------------
bp_search2gff.pl -i ../chr1-tesigsv2.hmmscan -o chr1-tesigsv2-hmmscan-original-locations-v2.gff3 --format hmmer3 --source HMMER3 --version 3 --component

OUTPUT:
------------------
==> chr1-tesigsv2-hmmscan-original-locations-v2.gff3 <==
##gff-version 3
Chr1_1 chromosome Component 1 10142557 . . 1 sequence=Chr1_1
Chr1_1 HMMER3 similarity 1 245 307.3 . 0 Target=Sequence:RVT_2 1898330 1898579
Chr1_1 HMMER3 similarity 1 244 329.5 . 0 Target=Sequence:RVT_2 2573551 2573796
Chr1_1 HMMER3 similarity 1 245 308.8 . 0 Target=Sequence:RVT_2 3159685 3159930
Chr1_1 HMMER3 similarity 1 102 108.2 . 0 Target=Sequence:RVT_2 3438684 3438791
Chr1_1 HMMER3 similarity 2 245 277.2 . 0 Target=Sequence:RVT_2 3566642 3566891
Chr1_1 HMMER3 similarity 13 213 251.4 . 0 Target=Sequence:RVT_2 4251160 4251373
Chr1_1 HMMER3 similarity 1 244 310.6 . 0 Target=Sequence:RVT_2 4252791 4253036
Chr1_1 HMMER3 similarity 6 99 94.2 . 0 Target=Sequence:RVT_2 4271555 4271653

========================================================

And here's what it looks like if I *do* use the "--type hit" option. The
coordinates look good but the model name has disappeared (and the
Target=Sequence seems wrong).

COMMAND:
------------------
bp_search2gff.pl -i ../chr1-tesigsv2.hmmscan -o chr1-tesigsv2-hmmscan-original-locations-v3.gff3 --format hmmer3 --type hit --source HMMER3 --version 3 --component

OUTPUT:
------------------
==> chr1-tesigsv2-hmmscan-original-locations-v3.gff3 <==
##gff-version 3
RVT_2 HMMER3 similarity 1898330 1898579 307.3 . 0 Target=Sequence:Chr1_1 1 245
RVT_2 HMMER3 similarity 2573551 2573796 329.5 . 0 Target=Sequence:Chr1_1 1 244
RVT_2 HMMER3 similarity 3159685 3159930 308.8 . 0 Target=Sequence:Chr1_1 1 245
RVT_2 HMMER3 similarity 3438684 3438791 108.2 . 0 Target=Sequence:Chr1_1 1 102
RVT_2 HMMER3 similarity 3566642 3566891 277.2 . 0 Target=Sequence:Chr1_1 2 245
RVT_2 HMMER3 similarity 4251160 4251373 251.4 . 0 Target=Sequence:Chr1_1 13 213
RVT_2 HMMER3 similarity 4252791 4253036 310.6 . 0 Target=Sequence:Chr1_1 1 244
RVT_2 HMMER3 similarity 4271555 4271653 94.2 . 0 Target=Sequence:Chr1_1 6 99
RVT_2 HMMER3 similarity 4481232 4481477 281.5 . 0 Target=Sequence:Chr1_1 2 245

========================================================

And here's what the input HMMER3 result file looks like:

==> ../chr1-tesigsv2.hmmscan <==
# hmmscan :: search sequence(s) against a profile database
# HMMER 3.0rc1 (February 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query sequence file:             [...]/whole_chromosomes/translated/chr1.pep
# target HMM database:             [...]/signatures/Pfam-A.hmm
# output directed to file:         chr1-tesigsv2.hmmscan
# model-specific thresholding:     TC cutoffs
# Max sensitivity mode:            on [all heuristic filters off]
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       Chr1_1  [L=10142557]
Description: CHROMOSOME dumped from ADB: Jun/20/09 14:53; last updated: 2009-02-02
Scores for complete sequence (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Model           Description
    ------- ------ -----    ------- ------ -----   ---- --  --------        -----------
          0 3971.3  17.7   2.6e-101  329.5   0.6   19.4 17  RVT_2           Reverse transcriptase (RNA-dependent DNA pol
          0 3040.7  23.0     1e-206  678.6   0.1   12.2 10  ATHILA          ATHILA ORF-1 family
          0 1681.9  79.1    1.9e-46  149.9   0.4   28.0 21  RVT_1           Reverse transcriptase (RNA-dependent DNA pol
          0 1446.9  27.4    3.6e-95  309.1   0.2    7.6  5  Transposase_21  Transposase family tnp2
          0 1168.4  50.3    1.4e-29   94.4   0.3   21.5 18  rve             Integrase core domain
   9.1e-300  960.0  69.0    3.1e-20   64.0   0.0   28.8 20  Retrotrans_gag  Retrotransposon gag protein
   1.5e-180  577.0  31.6    1.6e-29   93.1   1.5    9.5  8  Transposase_23  TNP1/EN/SPM transposase
   4.4e-143  456.9  82.8    4.8e-18   56.4   0.1   12.9 11  MuDR            MuDR family transposase
   3.8e-116  371.4  19.6    1.2e-18   58.9   0.0   13.7  7  MULE            MULE transposase domain
   7.1e-106  344.1   5.6    2.7e-97  316.0   0.0    3.6  1  Plant_tran      Plant transposon protein
    9.2e-85  275.4  22.9    5.4e-60  194.4   0.3    6.4  3  Peptidase_C48   Ulp1 protease family, C-terminal catalytic d
    1.8e-77  249.8  24.8    4.4e-28   89.8   0.1   10.8  3  Transposase_24  Plant transposase (Ptta/En/Spm family)
    2.8e-47  150.1   1.2    5.5e-23   72.3   0.2    3.7  2  hATC            hAT family dimerisation domain
    5.7e-28   89.4   3.6    4.7e-13   41.1   0.0    6.5  1  RVP_2           Retroviral aspartyl protease
      1e-16   53.3   0.0    4.4e-07   22.1   0.0    6.8  1  RnaseH          RNase H
    1.5e-08   25.3   2.4    0.00016   12.1   0.0    4.9  0  Transposase_mut Transposase, Mutator family

Domain annotation for each model (and alignments):
>> RVT_2  Reverse transcriptase (RNA-dependent DNA polymerase)
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  307.3   0.0   5.3e-95   1.5e-94       1     245 [. 1898330 1898578 .. 1898330 1898579 .. 0.99
   2 !  329.5   0.6  8.9e-102  2.6e-101       1     244 [. 2573551 2573794 .. 2573551 2573796 .. 0.99
   3 !  308.8   0.0   1.8e-95   5.2e-95       1     245 [. 3159685 3159929 .. 3159685 3159930 .. 0.99
   4 !  108.2   0.1   3.4e-34   9.7e-34       1     102 [. 3438684 3438785 .. 3438684 3438791 .. 0.96
   5 !  277.2   0.0   8.1e-86   2.3e-85       2     245 .. 3566643 3566890 .. 3566642 3566891 .. 0.99
   6 !  251.4   0.0   6.2e-78   1.8e-77      13     213 .. 4251164 4251364 .. 4251160 4251373 .. 0.97
   7 !  310.6   0.0   5.1e-96   1.5e-95       1     244 [. 4252791 4253034 .. 4252791 4253036 .. 0.99
   8 !   94.2   0.1   6.1e-30   1.8e-29       6      99 .. 4271560 4271653 .. 4271555 4271653 .. 0.97
   9 !  281.5   0.9   3.9e-87   1.1e-86       2     245 .. 4481233 4481476 .. 4481232 4481477 .. 0.98
  10 !  248.2   0.0   5.9e-77   1.7e-76       1     190 [. 4521040 4521233 .. 4521040 4521237 .. 0.97
  11 !  314.6   0.1   3.2e-97   9.2e-97       1     244 [. 4652456 4652702 .. 4652456 4652704 .. 0.98
  12 !   40.7   0.0   1.3e-13   3.7e-13       2      92 .. 5219607 5219697 .. 5219606 5219701 .. 0.90
  13 ! 221.0 0.0 1.2e-68 3.4e-68 2 245 ..
5241015 5241258 .. 5241014 5241259 .. 0.95 14 ! 81.2 0.0 5.6e-26 1.6e-25 2 115 .. 5501957 5502070 .. 5501956 5502080 .. 0.92 15 ! 272.4 0.0 2.3e-84 6.7e-84 30 245 .. 6483057 6483271 .. 6483050 6483272 .. 0.98 16 ! 178.5 0.0 1.2e-55 3.3e-55 81 244 .. 7250563 7250726 .. 7250552 7250728 .. 0.96 17 ! 313.7 0.0 5.9e-97 1.7e-96 2 245 .. 7707124 7707367 .. 7707123 7707368 .. 0.99 Alignments for each domain: == domain 1 score: 307.3 bits; conditional E-value: 5.3e-95 RVT_2 1 nktwelvelpkgkkviglkWvfklKlnedgeierykARlVakGftqkegidyeetfspvvklesirlllalaaekkleleqlDvktaFLngelee 95 n tw +++lp gkk++g+kWv+k+Kln+dg++erykARlVakG+tq+eg+dy +tfspv+kl++++ll+a+aa+k+++l+qlD+++aFLng+l+e Chr1_1 1898330 NGTWVVCSLPVGKKAVGCKWVYKIKLNADGSLERYKARLVAKGYTQTEGLDYVDTFSPVAKLTTVKLLIAVAAAKGWSLSQLDISNAFLNGSLDE 1898424 68********************************************************************************************* PP RVT_2 96 evYvkqpeGfedkkk....enkvckLkkslYgLkqapraWyeklsevllklgfkkseadkclfvkkkeeeliivllYVDDlliagsskelieelk 186 e+Y++ p+G++ ++ +n vc+LkkslYgLkqa+r+Wy k+se l++lgf+ +s+ d++lf++k++++ ++vl+YVDD++ia+s +++ e l Chr1_1 1898425 EIYMTLPPGYSPRQGdsfpPNAVCRLKKSLYGLKQASRQWYLKFSESLKALGFTQSSGDHTLFTRKSKNSYMAVLVYVDDIIIASSCDRETELLR 1898519 ***********998889999*************************************************************************** PP RVT_2 187 eeLkkefemkdlgelkyfLgleierkeegillsqekyvkkllkkfkmedakpvstplea 245 ++L+++ +++dlg+l+yfLglei+r+++gi+++q+ky+ +ll+++++ +k++s +p+e+ Chr1_1 1898520 DALQRSSKLRDLGTLRYFLGLEIARNTDGISICQRKYTLELLAETGLLGCKSSSVPMEP 1898578 *********************************************************97 PP == domain 2 score: 329.5 bits; conditional E-value: 8.9e-102 RVT_2 1 nktwelvelpkgkkviglkWvfklKlnedgeierykARlVakGftqkegidyeetfspvvklesirlllalaaekkleleqlDvktaFLngelee 95 n+twel++lp+g+k+ig+kWv+k K+n++ge+erykARlVakG++q++gidy+e +f+pv++le++rl+++laa++k++++q+D k aFLng++ee Chr1_1 2573551 NDTWELTSLPNGHKAIGVKWVYKAKKNSKGEVERYKARLVAKGYSQRAGIDYDEVFAPVARLETVRLIISLAAQNKWKIHQMDFKLAFLNGDFEE 2573645 
79********************************************************************************************* PP RVT_2 96 evYvkqpeGfedkkkenkvckLkkslYgLkqapraWyeklsevllklgfkkseadkclfvkkkeeeliivllYVDDlliagsskelieelkeeLk 190 evY++qp+G+ +k++e+kv++Lkk+lYgLkqapraW++++++++++++f k+ + +++l++k ++e+++i +lYVDDl+++g++ ++ ee+k+e++ Chr1_1 2573646 EVYIEQPQGYIVKGEEDKVLRLKKALYGLKQAPRAWNTRIDKYFKEKDFIKCPYEHALYIKIQKEDILIACLYVDDLIFTGNNPSMFEEFKKEMT 2573740 *********************************************************************************************** PP RVT_2 191 kefemkdlgelkyfLgleierkeegillsqekyvkkllkkfkmedakpvstple 244 kefem+d+g ++y+Lg+e+++++++i+++qe y+k++lkkfkm+d++pv tp +e Chr1_1 2573741 KEFEMTDIGLMSYYLGIEVKQEDNRIFITQEGYAKEVLKKFKMDDSNPVCTPME 2573794 ****************************************************97 PP From kai.blin at biotech.uni-tuebingen.de Thu Aug 12 08:16:45 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Thu, 12 Aug 2010 14:16:45 +0200 Subject: [Bioperl-l] HMMER3 to GFF3 In-Reply-To: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com> References: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com> Message-ID: <20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de> On Wed, 11 Aug 2010 22:59:37 -0700 (PDT) Doug Hoen wrote: Hi Doug, > Could someone please confirm whether the results are incorrect and, if > so, perhaps suggest a fix? It may well be that this problem is due to > the unusual way I am using hmmscan, rather than a problem with HMMER3 > parsing...? Can you please attach your hmmer input file? Along the way something inserted line breaks, making it unreadable. It might well be possible that the HMMer3 parser still handles a little different from the HMMer2 parser, I haven't tried that script. Cheers, Kai -- Dipl.-Inform. 
Kai Blin                       kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From kai.blin at biotech.uni-tuebingen.de Thu Aug 12 08:09:00 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 12 Aug 2010 14:09:00 +0200
Subject: [Bioperl-l] using HMMER
In-Reply-To: <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
 <4C62B487.9090103@gmail.com>
 <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
Message-ID: <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>

On Wed, 11 Aug 2010 10:07:36 -0500
Chris Fields wrote:

> might also want to check whether you are using hmmer2 vs hmmer3. not sure if the wrapper works for hmmer3.

It might if you initialize it using
my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 'hmmer3');

at least for the programs that still exist with the same name in
hmmer3. It won't support hmmer3 using the default options, though.

If I have some spare time, I'll look into this, no promises on the
timeframe, though.

Cheers,
Kai

--
Dipl.-Inform.
Kai Blin                       kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From cjfields at illinois.edu Thu Aug 12 11:28:50 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 12 Aug 2010 10:28:50 -0500
Subject: [Bioperl-l] using HMMER
In-Reply-To: <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
 <4C62B487.9090103@gmail.com>
 <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
 <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>
Message-ID: <8129B813-5B15-4DDC-AB0D-5D95EFFCE78D@illinois.edu>

On Aug 12, 2010, at 7:09 AM, Kai Blin wrote:

> On Wed, 11 Aug 2010 10:07:36 -0500
> Chris Fields wrote:
>
>> might also want to check whether you are using hmmer2 vs hmmer3. not sure if the wrapper works for hmmer3.
>
> It might if you initialize it using
> my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 'hmmer3');
>
> at least for the programs that still exist with the same name in
> hmmer3. It won't support hmmer3 using the default options, though.
>
> If I have some spare time, I'll look into this, no promises on the
> timeframe, though.
>
> Cheers,
> Kai
>
> --
> Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
> D-72076 Tübingen                        Fax :   ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

Would be nice to convert this over (at some point) to use Mark's CommandExts.
I'm thinking of doing this with Infernal, so if I get that running it wouldn't be terribly difficult to get hmmer3 working as well. chris From cjfields at illinois.edu Thu Aug 12 12:14:44 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 12 Aug 2010 11:14:44 -0500 Subject: [Bioperl-l] using HMMER In-Reply-To: <857996.8184.qm@web112610.mail.gq1.yahoo.com> References: <603590.1072.qm@web112620.mail.gq1.yahoo.com> <4C62B487.9090103@gmail.com> <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu> <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de> <8129B813-5B15-4DDC-AB0D-5D95EFFCE78D@illinois.edu> <857996.8184.qm@web112610.mail.gq1.yahoo.com> Message-ID: <43FD0A31-DB95-4AE9-B678-937EE6346BC2@illinois.edu> Fayroz, Please keep responses on-list. It seems you need to update your local bioperl, as 'hmmer3' is a recent addition, after 1.6.1. It will be in 1.6.2 if I can get the time to make a release :> chris On Aug 12, 2010, at 10:58 AM, fayroz wrote: > dear chris, > from HMMER documentation i found this statement > "The HMMER programs must either be in your path, or you must set the environment > variable HMMERDIR to point to their location." > is it will solve the problem? > how can i do it please ? i work under windows7 platform > > > when i appled this line with hmmer3 > my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => > 'hmmer3'); > > this output apper: > > Bio::SearchIO: hmmer3 cannot be found > > and when try with hmmer2 the same output apper: > > Exception > ------------- EXCEPTION ------------- > MSG: Failed to load module Bio::SearchIO::hmmer3. Can't locate > Bio\SearchIO\hmmer3.pm in @INC (@INC contains: D:\Perl\bin\ D:/Perl/site/lib > D:/Perl/lib .) at D:/Perl/site/lib/Bio/Root/Root.pm line 439, line 1. 
> STACK Bio::Root::Root::_load_module D:/Perl/site/lib/Bio/Root/Root.pm:441 > STACK (eval) D:/Perl/site/lib/Bio/SearchIO.pm:446 > STACK Bio::SearchIO::_load_format_module D:/Perl/site/lib/Bio/SearchIO.pm:445 > STACK Bio::SearchIO::new D:/Perl/site/lib/Bio/SearchIO.pm:189 > STACK Bio::Tools::Run::Hmmer::_run D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:431 > STACK Bio::Tools::Run::Hmmer::hmmsearch > D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:353 > STACK toplevel C:\Users\Khaled\AppData\Local\Temp\dzprltmp.pl:13 > ------------------------------------- > For more information about the SearchIO system please see the SearchIO docs. > This includes ways of checking for formats at compile time, not run time > '--informat' is not recognized as an internal or external command, > operable program or batch file. > Can't call method "next_result" on an undefined value at > C:\Users\Khaled\AppData\Local\Temp\dzprltmp.pl line 15, line 1. > > > > ----- Original Message ---- > From: Chris Fields > To: Kai Blin > Cc: fayroz ; bioperl-l at bioperl.org > Sent: Thu, August 12, 2010 6:28:50 PM > Subject: Re: [Bioperl-l] using HMMER > > On Aug 12, 2010, at 7:09 AM, Kai Blin wrote: > >> On Wed, 11 Aug 2010 10:07:36 -0500 >> Chris Fields wrote: >> >>> might also want to check whether you are using hmmer2 vs hmmer3. not sure if >>> the wrapper works for hmmer3. >> >> It might if you initialize it using >> my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => >> 'hmmer3'); >> >> at least for the programs that still exist with the same name in >> hmmer3. It won't support hmmer3 using the default options, though. >> >> If I have some spare time, I'll look into this, no promises on the >> timeframe, though. >> >> Cheers, >> Kai >> >> -- >> Dipl.-Inform. 
Kai Blin        kai.blin at biotech.uni-tuebingen.de
>> Institute for Microbiology and Infection Medicine
>> Division of Microbiology/Biotechnology
>> Eberhard-Karls-University of Tübingen
>> Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
>> D-72076 Tübingen                        Fax :   ++49 7071 29-5979
>> Deutschland
>> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
>
> Would be nice to convert this over (at some point) to use Mark's CommandExts.
> I'm thinking of doing this with Infernal, so if I get that running it wouldn't
> be terribly difficult to get hmmer3 working as well.
>
> chris
>
>
>

From jason at bioperl.org Thu Aug 12 14:37:11 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 12 Aug 2010 11:37:11 -0700
Subject: [Bioperl-l] Other: Script for editing alignments?
In-Reply-To: <20100812061811.4D92468539@evol.biology.mcmaster.ca>
References: <20100812061811.4D92468539@evol.biology.mcmaster.ca>
Message-ID: <4C643F57.3040408@bioperl.org>

Hi Si -

This is pretty straightforward with Bioperl. Here's one solution:

#!/usr/bin/perl -w
use strict;
use Bio::AlignIO;

my $in = Bio::AlignIO->new(-format => 'fasta', -file => shift @ARGV);
my $out = Bio::AlignIO->new(-format => 'fasta');
while( my $aln = $in->next_aln ) {
    for my $seq ( $aln->each_seq ) {
        my $str = $seq->seq;
        if( $str =~ /^(-+)/ ) {
            my $rep = length($1);
            # replace from the 5' end
            substr($str,0,$rep,'N'x$rep);
        }
        if( $str =~ /(-+)$/ ) {
            my $rep = length($1);
            # replace from the 3' end
            substr($str,-1 * $rep,length($str),'N'x$rep);
        }
        $seq->seq($str);
    }
    # don't print the /start-end info in the FASTA ID
    $aln->set_displayname_flat(1);
    $out->write_aln($aln);
}

-jason

evoldir at evol.biology.mcmaster.ca wrote, On 8/11/10 11:18 PM:
> Dear All
>
> Alignment programs like MUSCLE and Clustal often output alignments with
> "-" symbols indicating indels (real events) within sequence alignments,
> but also "-" symbols at the 5' and 3' ends of sequences.
The latter > however, are not real evolutionary events and really should be Ns > (missing data), depending on the sort of analytical framework you use. > > If there is sufficient heterogeneity and signal within the 5' and 3' > ends of sequences, the "-"s can be manually edited in a text editor to > Ns with no problem, if the alignment is small. If it is large (e.g. 2000 > seqs), or there are lots of alignments, it becomes a lengthy task. > > I'm investigating such alignments presently and so was wondering if > anyone had a clever way of implementing sed, or had a Perl script that > would perform such a task. Simply put, it would require replacing the 5' > and 3' "-" below only with Ns and leaving the within sequence "-"s > alone. The sequences naturally may span more than one line. > > >Taxon 1 > -----ATGCTG--TGACTG----TGACT--- > >Taxon 2 > ---GTATGTTG--TGACTGCT--TGACCGTC > > to > > >Taxon 1 > NNNNNATGCTG--TGACTG----TGACTNNN > >Taxon 2 > NNNGTATGTTG--TGACTGCT--TGACCGTC > > It's a simple task, but I haven't seen any scripts out there to do the job. > > If there are any scripters out there who can help, or if someone knows > of an application that would help, it would be great to hear from you. > > With best wishes and thanks > > Si Creer > > From genehack at genehack.org Thu Aug 12 20:32:07 2010 From: genehack at genehack.org (John SJ Anderson) Date: Thu, 12 Aug 2010 20:32:07 -0400 Subject: [Bioperl-l] Bio::SeqFeature::SimilarityPair->from_searchResult()? In-Reply-To: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com> References: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com> Message-ID: On Aug 10, 2010, at 21:54 , Douglas Hoen wrote: > I was wondering why the Synopsis in the docs for Bio::SeqFeature::SimilarityPair has the following: > $sim_pair = Bio::SeqFeature::SimilarityPair->from_searchResult($blastHit); > > There doesn't actually seem to be a from_searchResult method. Am I missing something? 
No, it looks like that method got removed back in 2002 as a part of moving to Bio::SearchIO (which was removed still later...):

Unfortunately, the commit didn't update the documentation. From the tiny little bit I've looked at the code, it looks like you should just be calling the 'new()' method instead (note that it takes a set of arguments, not just a BLAST hit object).

Hope this helps -- if you should happen to have the tuits, a patch to update the documentation to reflect the current interface would be awesome...

chrs,
john.

From david.breimann at gmail.com Fri Aug 13 09:01:10 2010
From: david.breimann at gmail.com (David Breimann)
Date: Fri, 13 Aug 2010 16:01:10 +0300
Subject: [Bioperl-l] Problem executing bp_genbank2gff3.pl from another perl script
Message-ID:

Hi,

I am trying to run bp_genbank2gff3.pl from another perl script that gets a genbank as its argument.

This does not work (no output files are generated):

my $command = "bp_genbank2gff3.pl -y -o /tmp $ARGV[0]";
open( my $command_out, "-|", $command );
close $command_out;

but this does

open( my $command_out, "-|", $command );
sleep 3; # why do I need to sleep?
close $command_out;

Why? I thought that close is supposed to block until the command is done:
Closing any piped filehandle causes the parent process to wait for the child to finish...
(see http://perldoc.perl.org/functions/open.html).

Thanks
Dave

From jun.yin at ucd.ie Fri Aug 13 09:36:34 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 13 Aug 2010 14:36:34 +0100
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
Message-ID: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>

Hi, all,

I am the Google Summer of Code student working on Bio::Align subsystem refactoring. The code (Bio::SimpleAlign) I re-implemented now has passed nearly all the tests, except a few tests on seq/start-end testing.

But here comes a problem. This may be an old issue, that the Bio::LocatableSeq end assignment and checking are inconsistent.
The current end checking method is based on: $end=$seq->_ungapped_len+$seq->start-1 However, this checking may not fit the real world case. The inconsistency usually happens when a few columns of the sequence are removed. For example: my $a = Bio::LocatableSeq->new( -id => 'a', -strand => 1, -seq => '-tcgatc-atcgatcg', -start => 30, -end => 43 ); If we remove the 1st, 8th and the last columns $a->seq() will be 'tcgatcatcgatc' $a->_ungapped_len==12 Actually, in the real world, the first residue will still be 30 (the old $seq->start), and the last residue is the residue before the 43 (the old $seq->end), thus 42. But if you call a validation, the calculation is $a->_ungapped_len+$a->start-1=12+30-1=41 So the reassignment of the $seq->end will not pass the validation. So unless you save the information to a new sequence object, the original position information will be lost anyway. But in some cases, we have to change the sequence in its original sequence object .. What is your suggestion on this issue? A. pass the test and lose the information #convenient in coding but the start-end annotation is not right any more B. keep the information and forget the test #the object will still remember where the last residue was in the original sequence. But is it really meaningful at all? Because all the other residues may come from nowhere C. Neither of above #any other suggestions? Cheers, Jun Yin Ph.D. student in U.C.D. Bioinformatics Laboratory Conway Institute University College Dublin From jessica.sun at gmail.com Fri Aug 13 11:06:46 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Fri, 13 Aug 2010 11:06:46 -0400 Subject: [Bioperl-l] Add sequence feature Message-ID: Does anyone knows how to open a genbank file, add new feature and then save a new genbank file with new feature added in bioperl ? 
thx -- Jessica Jingping Sun From jessica.sun at gmail.com Fri Aug 13 11:27:10 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Fri, 13 Aug 2010 11:27:10 -0400 Subject: [Bioperl-l] Add sequence feature In-Reply-To: <4C6562E0.7090008@gmail.com> References: <4C6562E0.7090008@gmail.com> Message-ID: unfortunately. I want to add the feature to the sequence object I got from the Genbank file, I do not mind to save a new genbank file but these new genbank file contains the original genbank format and info I got plus the new feature tags I need to added to. Any quick solution to this? thx Jessica On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri wrote: > Hi Jessica. > > You need to use Bio::SeqIO to read in the GenBank file to a BioPerl > sequence object, and to write your new GenBank file: > http://www.bioperl.org/wiki/HOWTO:SeqIO > > To add a new feature follow the instructions here: > > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences > > (except that you are adding the feature to the sequence object you got from > the Genbank file, not a new Bio::Seq object). > > Cheers. > Roy. > > > On 13/08/2010 16:06, Jessica Sun wrote: > >> Does anyone knows how to open a genbank file, add new feature and then >> save >> a new genbank >> file with new feature added in bioperl ? >> >> thx >> >> > -- Jessica Jingping Sun From roy.chaudhuri at gmail.com Fri Aug 13 11:21:04 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Fri, 13 Aug 2010 16:21:04 +0100 Subject: [Bioperl-l] Add sequence feature In-Reply-To: References: Message-ID: <4C6562E0.7090008@gmail.com> Hi Jessica. 
You need to use Bio::SeqIO to read in the GenBank file to a BioPerl sequence object, and to write your new GenBank file: http://www.bioperl.org/wiki/HOWTO:SeqIO To add a new feature follow the instructions here: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences (except that you are adding the feature to the sequence object you got from the Genbank file, not a new Bio::Seq object). Cheers. Roy. On 13/08/2010 16:06, Jessica Sun wrote: > Does anyone knows how to open a genbank file, add new feature and then save > a new genbank > file with new feature added in bioperl ? > > thx > From roy.chaudhuri at gmail.com Fri Aug 13 11:37:20 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Fri, 13 Aug 2010 16:37:20 +0100 Subject: [Bioperl-l] Add sequence feature In-Reply-To: References: <4C6562E0.7090008@gmail.com> Message-ID: <4C6566B0.60706@gmail.com> I'm not sure I understand, do you mean that you want to load just the sequence from the GenBank file (ignoring the existing annotation), then add your own features? There are instructions on how to do that here: http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder On 13/08/2010 16:27, Jessica Sun wrote: > unfortunately. I want to add the feature to the sequence object I got > from the Genbank file, I do not mind to save a new genbank file but > these new genbank file contains the original genbank format and info I > got plus the new feature tags I need to added to. Any quick solution to > this? > > thx > > Jessica > > > > On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri > wrote: > > Hi Jessica. 
> > You need to use Bio::SeqIO to read in the GenBank file to a BioPerl > sequence object, and to write your new GenBank file: > http://www.bioperl.org/wiki/HOWTO:SeqIO > > To add a new feature follow the instructions here: > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences > > (except that you are adding the feature to the sequence object you > got from the Genbank file, not a new Bio::Seq object). > > Cheers. > Roy. > > > On 13/08/2010 16:06, Jessica Sun wrote: > > Does anyone knows how to open a genbank file, add new feature > and then save > a new genbank > file with new feature added in bioperl ? > > thx > > > > > > -- > Jessica Jingping Sun From roy.chaudhuri at gmail.com Fri Aug 13 11:57:27 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Fri, 13 Aug 2010 16:57:27 +0100 Subject: [Bioperl-l] Add sequence feature In-Reply-To: References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> Message-ID: <4C656B67.5020402@gmail.com> Please remember to copy replies to the mailing list. You can loop over the features in your Bio::Seq object: for my $feat ($seq->get_SeqFeatures) { # do something } And once you have found the feature you want to modify, you can add a tag using something like: $feat->add_tag_value('note',"this is a note"); When you're finished you can write out the modified sequence object to a new GenBank file. On 13/08/2010 16:40, Jessica Sun wrote: > no i want to load the genbank file with existing features and I need to > add some new feature tags to the existing ones and then save to a new > update genbank file for local usage. I just not quite good on how to > easily merge the two steps you recommended into one in a neat way. > > thx > > > On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri > wrote: > > I'm not sure I understand, do you mean that you want to load just > the sequence from the GenBank file (ignoring the existing > annotation), then add your own features? 
There are instructions on > how to do that here: > http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder > > > On 13/08/2010 16:27, Jessica Sun wrote: > > unfortunately. I want to add the feature to the sequence object > I got > from the Genbank file, I do not mind to save a new genbank file but > these new genbank file contains the original genbank format and > info I > got plus the new feature tags I need to added to. Any quick > solution to > this? > > thx > > Jessica > > > > On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri > > >> wrote: > > Hi Jessica. > > You need to use Bio::SeqIO to read in the GenBank file to a > BioPerl > sequence object, and to write your new GenBank file: > http://www.bioperl.org/wiki/HOWTO:SeqIO > > To add a new feature follow the instructions here: > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences > > (except that you are adding the feature to the sequence > object you > got from the Genbank file, not a new Bio::Seq object). > > Cheers. > Roy. > > > On 13/08/2010 16:06, Jessica Sun wrote: > > Does anyone knows how to open a genbank file, add new > feature > and then save > a new genbank > file with new feature added in bioperl ? > > thx > > > > > > -- > Jessica Jingping Sun > > > > > > -- > Jessica Jingping Sun From jessica.sun at gmail.com Fri Aug 13 13:06:32 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Fri, 13 Aug 2010 13:06:32 -0400 Subject: [Bioperl-l] Add sequence feature In-Reply-To: <4C656B67.5020402@gmail.com> References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> <4C656B67.5020402@gmail.com> Message-ID: Thanks. I somehow get these error messages. --------------------- WARNING --------------------- MSG: Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module. Attempting to dump, but may fail! 
--------------------------------------------------- Can't locate object method "seq" via package "Bio::SeqIO::genbank" at /Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, line 447. by doing this, my $feat = new Bio::SeqFeature::Generic(-start =>20, -end => $40, -primary_tag => 'newfeature' ); $feat->add_tag_value("note","this is notes"); $f->add_SeqFeature($feat); ## f is original feature pointer $io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" ); $io->write_seq($seqio_object); On Fri, Aug 13, 2010 at 11:57 AM, Roy Chaudhuri wrote: > Please remember to copy replies to the mailing list. > > You can loop over the features in your Bio::Seq object: > for my $feat ($seq->get_SeqFeatures) { # do something } > > And once you have found the feature you want to modify, you can add a tag > using something like: > $feat->add_tag_value('note',"this is a note"); > > When you're finished you can write out the modified sequence object to a > new GenBank file. > > > On 13/08/2010 16:40, Jessica Sun wrote: > >> no i want to load the genbank file with existing features and I need to >> add some new feature tags to the existing ones and then save to a new >> update genbank file for local usage. I just not quite good on how to >> easily merge the two steps you recommended into one in a neat way. >> >> thx >> >> >> On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri > > wrote: >> >> I'm not sure I understand, do you mean that you want to load just >> the sequence from the GenBank file (ignoring the existing >> annotation), then add your own features? There are instructions on >> how to do that here: >> http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder >> >> >> On 13/08/2010 16:27, Jessica Sun wrote: >> >> unfortunately. 
I want to add the feature to the sequence object >> I got >> from the Genbank file, I do not mind to save a new genbank file but >> these new genbank file contains the original genbank format and >> info I >> got plus the new feature tags I need to added to. Any quick >> solution to >> this? >> >> thx >> >> Jessica >> >> >> >> On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri >> >> > >> wrote: >> >> Hi Jessica. >> >> You need to use Bio::SeqIO to read in the GenBank file to a >> BioPerl >> sequence object, and to write your new GenBank file: >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> >> To add a new feature follow the instructions here: >> >> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences >> >> (except that you are adding the feature to the sequence >> object you >> got from the Genbank file, not a new Bio::Seq object). >> >> Cheers. >> Roy. >> >> >> On 13/08/2010 16:06, Jessica Sun wrote: >> >> Does anyone knows how to open a genbank file, add new >> feature >> and then save >> a new genbank >> file with new feature added in bioperl ? >> >> thx >> >> >> >> >> >> -- >> Jessica Jingping Sun >> >> >> >> >> >> -- >> Jessica Jingping Sun >> > > -- Jessica Jingping Sun From drummike at gmail.com Fri Aug 13 13:41:55 2010 From: drummike at gmail.com (Mike Williams) Date: Fri, 13 Aug 2010 13:41:55 -0400 Subject: [Bioperl-l] Add sequence feature In-Reply-To: References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> <4C656B67.5020402@gmail.com> Message-ID: On Fri, Aug 13, 2010 at 1:06 PM, Jessica Sun wrote: > Thanks. I somehow get these error messages. > by doing this, > > my $feat = new Bio::SeqFeature::Generic(-start =>20, > -end => $40, > -primary_tag => 'newfeature' ); > $feat->add_tag_value("note","this is > notes"); > That $40 looks fishy. Try deleting the dollar sign. You did mean just 40, right? 
Mike From MEC at stowers.org Fri Aug 13 13:37:50 2010 From: MEC at stowers.org (Cook, Malcolm) Date: Fri, 13 Aug 2010 12:37:50 -0500 Subject: [Bioperl-l] Add sequence feature In-Reply-To: References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> <4C656B67.5020402@gmail.com> Message-ID: Jessica, Show more code! In particular, where did $f get set? --Malcolm -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun Sent: Friday, August 13, 2010 12:07 PM To: Roy Chaudhuri Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Add sequence feature Thanks. I somehow get these error messages. --------------------- WARNING --------------------- MSG: Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module. Attempting to dump, but may fail! --------------------------------------------------- Can't locate object method "seq" via package "Bio::SeqIO::genbank" at /Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, line 447. by doing this, my $feat = new Bio::SeqFeature::Generic(-start =>20, -end => $40, -primary_tag => 'newfeature' ); $feat->add_tag_value("note","this is notes"); $f->add_SeqFeature($feat); ## f is original feature pointer $io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" ); $io->write_seq($seqio_object); On Fri, Aug 13, 2010 at 11:57 AM, Roy Chaudhuri wrote: > Please remember to copy replies to the mailing list. > > You can loop over the features in your Bio::Seq object: > for my $feat ($seq->get_SeqFeatures) { # do something } > > And once you have found the feature you want to modify, you can add a > tag using something like: > $feat->add_tag_value('note',"this is a note"); > > When you're finished you can write out the modified sequence object to > a new GenBank file. 
> > > On 13/08/2010 16:40, Jessica Sun wrote: > >> no i want to load the genbank file with existing features and I need >> to add some new feature tags to the existing ones and then save to a >> new update genbank file for local usage. I just not quite good on how >> to easily merge the two steps you recommended into one in a neat way. >> >> thx >> >> >> On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri >> > wrote: >> >> I'm not sure I understand, do you mean that you want to load just >> the sequence from the GenBank file (ignoring the existing >> annotation), then add your own features? There are instructions on >> how to do that here: >> >> http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder >> >> >> On 13/08/2010 16:27, Jessica Sun wrote: >> >> unfortunately. I want to add the feature to the sequence object >> I got >> from the Genbank file, I do not mind to save a new genbank file but >> these new genbank file contains the original genbank format and >> info I >> got plus the new feature tags I need to added to. Any quick >> solution to >> this? >> >> thx >> >> Jessica >> >> >> >> On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri >> >> > >> wrote: >> >> Hi Jessica. >> >> You need to use Bio::SeqIO to read in the GenBank file to a >> BioPerl >> sequence object, and to write your new GenBank file: >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> >> To add a new feature follow the instructions here: >> >> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Ow >> n_Sequences >> >> (except that you are adding the feature to the sequence >> object you >> got from the Genbank file, not a new Bio::Seq object). >> >> Cheers. >> Roy. >> >> >> On 13/08/2010 16:06, Jessica Sun wrote: >> >> Does anyone knows how to open a genbank file, add new >> feature >> and then save >> a new genbank >> file with new feature added in bioperl ? 
>>
>> thx
>>
>>
>>
>>
>>
>> --
>> Jessica Jingping Sun
>>
>>
>>
>>
>>
>> --
>> Jessica Jingping Sun
>>
>
>

--
Jessica Jingping Sun
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From Kevin.M.Brown at asu.edu Fri Aug 13 13:53:50 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 13 Aug 2010 10:53:50 -0700
Subject: [Bioperl-l] Add sequence feature
In-Reply-To:
References: <4C6562E0.7090008@gmail.com><4C6566B0.60706@gmail.com><4C656B67.5020402@gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu>

If I'm reading your sample code correctly, then you are mistakenly
trying to output the input SeqIO object and not the actual Bio::Seq
object that was read in by SeqIO.

my $seqio = Bio::SeqIO->new;
my $seq = $seqio->next_seq;

#manipulate $seq

my $out = Bio::SeqIO->new;
$out->write_seq($seq);

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun
Sent: Friday, August 13, 2010 10:07 AM
To: Roy Chaudhuri
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Add sequence feature

Thanks. I somehow get these error messages.

--------------------- WARNING ---------------------
MSG: Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module.
Attempting to dump, but may fail!
---------------------------------------------------
Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
/Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, line 447.
by doing this, my $feat = new Bio::SeqFeature::Generic(-start =>20, -end => $40, -primary_tag => 'newfeature' ); $feat->add_tag_value("note","this is notes"); $f->add_SeqFeature($feat); ## f is original feature pointer $io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" ); $io->write_seq($seqio_object); On Fri, Aug 13, 2010 at 11:57 AM, Roy Chaudhuri wrote: > Please remember to copy replies to the mailing list. > > You can loop over the features in your Bio::Seq object: > for my $feat ($seq->get_SeqFeatures) { # do something } > > And once you have found the feature you want to modify, you can add a tag > using something like: > $feat->add_tag_value('note',"this is a note"); > > When you're finished you can write out the modified sequence object to a > new GenBank file. > > > On 13/08/2010 16:40, Jessica Sun wrote: > >> no i want to load the genbank file with existing features and I need to >> add some new feature tags to the existing ones and then save to a new >> update genbank file for local usage. I just not quite good on how to >> easily merge the two steps you recommended into one in a neat way. >> >> thx >> >> >> On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri > > wrote: >> >> I'm not sure I understand, do you mean that you want to load just >> the sequence from the GenBank file (ignoring the existing >> annotation), then add your own features? There are instructions on >> how to do that here: >> http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder >> >> >> On 13/08/2010 16:27, Jessica Sun wrote: >> >> unfortunately. I want to add the feature to the sequence object >> I got >> from the Genbank file, I do not mind to save a new genbank file but >> these new genbank file contains the original genbank format and >> info I >> got plus the new feature tags I need to added to. Any quick >> solution to >> this? >> >> thx >> >> Jessica >> >> >> >> On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri >> >> > >> wrote: >> >> Hi Jessica. 
>>
>> You need to use Bio::SeqIO to read in the GenBank file to a BioPerl
>> sequence object, and to write your new GenBank file:
>> http://www.bioperl.org/wiki/HOWTO:SeqIO
>>
>> To add a new feature follow the instructions here:
>> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences
>>
>> (except that you are adding the feature to the sequence object you
>> got from the Genbank file, not a new Bio::Seq object).
>>
>> Cheers.
>> Roy.
>>
>> On 13/08/2010 16:06, Jessica Sun wrote:
>>
>> Does anyone knows how to open a genbank file, add new feature and then save
>> a new genbank file with new feature added in bioperl ?
>>
>> thx
>>
>> --
>> Jessica Jingping Sun
>
--
Jessica Jingping Sun
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jessica.sun at gmail.com Fri Aug 13 15:16:51 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 13 Aug 2010 15:16:51 -0400
Subject: [Bioperl-l] Fwd: Add sequence feature
In-Reply-To:
References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> <4C656B67.5020402@gmail.com> <1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu>
Message-ID:

---------- Forwarded message ----------
From: Jessica Sun
Date: Fri, Aug 13, 2010 at 3:16 PM
Subject: Re: [Bioperl-l] Add sequence feature
To: Kevin Brown

Yes, I changed that, but somehow it still did not take the added features in.

On Fri, Aug 13, 2010 at 1:53 PM, Kevin Brown wrote:

> If I'm reading your sample code correctly, then you are mistakenly
> trying to output the input SeqIO object and not the actual Bio::Seq
> object that was read in by SeqIO.

--
Jessica Jingping Sun

From MEC at stowers.org Fri Aug 13 15:56:09 2010
From: MEC at stowers.org (Cook, Malcolm)
Date: Fri, 13 Aug 2010 14:56:09 -0500
Subject: [Bioperl-l] Fwd: Add sequence feature
In-Reply-To:
References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> <4C656B67.5020402@gmail.com> <1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu>
Message-ID:

If you show all of your code, we might not have to guess at what the problem is.

Malcolm Cook
Stowers Institute for Medical Research - Bioinformatics
Kansas City, Missouri USA
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Mon Aug 16 14:02:15 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 16 Aug 2010 13:02:15 -0500
Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping
Message-ID: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>

All,

This is in reference to a bug report I filed a while back. In the test script below, two features with the same start/end are compared. If the features have the same seq_id(), overlap succeeds. If the seq_id is changed (e.g. the feature is on another chromosome), the overlap still succeeds.

The question is: is this a bug? My vote would be 'yes', but there have been various arguments to say it's not.
chris

(maybe I'll make this a regular thing on the list, just to hash out some of the edge cases I run into periodically)

=========================================

#!/usr/bin/perl -w

use strict;
use warnings;
use Test::More;
use Bio::SeqFeature::Generic;

my ( $feat1, $feat2 );

$feat1 = Bio::SeqFeature::Generic->new(
    -start  => 40,
    -end    => 80,
    -strand => 1,
    -seq_id => 'ABC123',
);

is $feat1->start,  40,       'start of feature location';
is $feat1->end,    80,       'end of feature location';
is $feat1->seq_id, 'ABC123', 'seq_id';

$feat2 = Bio::SeqFeature::Generic->new(
    -start  => 40,
    -end    => 80,
    -strand => 1,
    -seq_id => 'ABC123',
);

is $feat2->start,  40,       'start of feature location';
is $feat2->end,    80,       'end of feature location';
is $feat2->seq_id, 'ABC123', 'seq_id';

# Generic features with same Seq ID should overlap
ok( $feat2->overlaps($feat1), 'feat2 overlaps feat1' );

# Generic features with different Seq IDs shouldn't overlap
is( $feat2->seq_id('XYZ678'), 'XYZ678', 'change seq_id' );

# this currently fails
# (the negation applies to the overlaps() call only; the test name is the second argument to ok)
ok( !$feat2->overlaps($feat1), 'feat2 doesn\'t overlap feat1' );

done_testing();

From David.Messina at sbc.su.se Mon Aug 16 14:51:54 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 16 Aug 2010 20:51:54 +0200
Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping
In-Reply-To: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
Message-ID:

> The question is: is this a bug?

Hmm, tricky.

Genomic start and end positions with differing IDs shouldn't overlap, but can't SeqFeatures apply to proteins and other molecules where one would want to compare positions without regard to ID?

Dave

From cjfields at illinois.edu Mon Aug 16 21:39:00 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 16 Aug 2010 20:39:00 -0500
Subject: [Bioperl-l] Bug?
Features with similar ranges, different IDs are considered overlapping
In-Reply-To:
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
Message-ID:

On Aug 16, 2010, at 1:51 PM, Dave Messina wrote:

>> The question is: is this a bug?
>
> Hmm, tricky.
>
> Genomic start and end positions with differing IDs shouldn't overlap, but can't SeqFeatures apply to proteins and other molecules where one would want to compare positions without regard to ID?
>
> Dave

Good point; it's probably the context in which the methods are used that matters. So, maybe just a documentation clarification?

chris

From David.Messina at sbc.su.se Tue Aug 17 05:06:05 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 17 Aug 2010 11:06:05 +0200
Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping
In-Reply-To:
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
Message-ID: <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>

> Good point; it's probably the context the methods are used that matters. So, maybe just a document clarification?

That's always good, but it really doesn't solve the issue you're describing.

I mean, who would expect to get overlaps for features on different chromosomes?

To me, that's a clear violation of reasonable user expectations. You shouldn't have to read the docs about something like that.

So what's the solution for these duelling use cases? I haven't thought about it much, but a first approximation might be to add a -genomic boolean flag that, when true, would do the right thing and check the ID when doing overlaps or other positional comparisons.

(Maybe -genomic is too obscure. Maybe it should be -same_id_for_overlaps or something like that.)

And maybe having to know to set a flag is effectively the same thing as having to read the docs to understand SeqFeature's overlap behavior.

What do the rest of you out there think?
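
For illustration only, the flag idea in this thread could be sketched without BioPerl roughly as follows. The features_overlap helper and its ignore_ids option are hypothetical names, not an existing Bio::SeqFeature or Bio::RangeI API, and checking the IDs by default is just one of the choices being debated here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT part of BioPerl: decides whether two features
# overlap, optionally ignoring their sequence IDs.
# Each feature is a plain hash with seq_id, start and end (1-based, inclusive).
sub features_overlap {
    my ( $f1, $f2, %opt ) = @_;

    # Assumed default ("genomic") behaviour: features on different
    # sequences never overlap. The ignore_ids option name is made up.
    unless ( $opt{ignore_ids} ) {
        return 0 if $f1->{seq_id} ne $f2->{seq_id};
    }

    # Two closed intervals overlap when each one starts before the other ends.
    return ( $f1->{start} <= $f2->{end} && $f2->{start} <= $f1->{end} ) ? 1 : 0;
}

my $feat1 = { seq_id => 'ABC123', start => 40, end => 80 };
my $feat2 = { seq_id => 'XYZ678', start => 40, end => 80 };

print "default:    ", features_overlap( $feat1, $feat2 ), "\n";
print "ignore_ids: ", features_overlap( $feat1, $feat2, ignore_ids => 1 ), "\n";
```

With the two example features above, the default call reports no overlap and the ignore_ids call reports an overlap, which is the split between the genomic and the protein-space use cases described in the thread.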
Dave

From scott at scottcain.net Tue Aug 17 08:45:27 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 17 Aug 2010 08:45:27 -0400
Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping
In-Reply-To: <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
Message-ID:

Hi Dave and Chris,

It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison, and if somebody is doing the protein space comparison and not getting the expected results, they'll probably read the docs to find out why.

Scott

--
Scott Cain, Ph. D. scott at scottcain dot net
Ontario Institute for Cancer Research
http://gmod.org/
216 392 3087
Sent from my iPhone.
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From david.breimann at gmail.com Tue Aug 17 09:44:08 2010
From: david.breimann at gmail.com (David Breimann)
Date: Tue, 17 Aug 2010 16:44:08 +0300
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
Message-ID:

Hello,

The following GenBank file has a gene that runs over the "end" of the chromosome and into its "beginning", and the script generates an error.

ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk

NC_005707 Unflattening error:
Details:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: PROBLEM, SEVERITY==2
Ranges not in correct order. Strange ensembl genbank entry? Range: [207497,208369] [1,687]
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473
STACK: Bio::SeqFeature::Tools::Unflattener::problem /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952
STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842
STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713
STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532
STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023
STACK: /usr/local/bin/bp_genbank2gff3.pl:506
-----------------------------------------------------------

Best,
Dave

From cjfields at illinois.edu Tue Aug 17 09:51:02 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 08:51:02 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To:
References:
Message-ID: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>

I think Chris Mungall has a branch set up for this in bioperl:

http://github.com/bioperl/bioperl-live/tree/circular

Is that correct? Should we merge that code into the master branch?

chris
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From David.Messina at sbc.su.se Tue Aug 17 09:52:11 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 17 Aug 2010 15:52:11 +0200
Subject: [Bioperl-l] Bug?
Features with similar ranges, different IDs are considered overlapping
In-Reply-To:
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
Message-ID:

> It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison

Yep, agreed. And such a flag should be named for the non-default behavior, then, like:

-ignore_IDs_for_overlaps

Dave

From douglas.hoen at gmail.com Thu Aug 12 10:24:27 2010
From: douglas.hoen at gmail.com (Douglas Hoen)
Date: Thu, 12 Aug 2010 10:24:27 -0400
Subject: [Bioperl-l] HMMER3 to GFF3
In-Reply-To: <20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de>
References: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com> <20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de>
Message-ID:

Hi Kai,

Here it is.

Thanks,
--
Doug

-------------- next part --------------
A non-text attachment was scrubbed...
Name: chr1-tesigsv2.hmmscan
Type: application/octet-stream
Size: 676132 bytes
Desc: not available
URL:
-------------- next part --------------

On 2010-08-12, at 8:16 AM, Kai Blin wrote:

> On Wed, 11 Aug 2010 22:59:37 -0700 (PDT)
> Doug Hoen wrote:
>
> Hi Doug,
>
>> Could someone please confirm whether the results are incorrect and, if
>> so, perhaps suggest a fix? It may well be that this problem is due to
>> the unusual way I am using hmmscan, rather than a problem with HMMER3
>> parsing...?
>
> Can you please attach your hmmer input file? Along the way something
> inserted line breaks, making it unreadable.
>
> It might well be possible that the HMMer3 parser still handles a little
> different from the HMMer2 parser, I haven't tried that script.
>
> Cheers,
> Kai
>
> --
> Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28 Phone : ++49 7071 29-78841
> D-72076 Tübingen Fax : ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From CJMungall at lbl.gov Tue Aug 17 11:53:15 2010
From: CJMungall at lbl.gov (Chris Mungall)
Date: Tue, 17 Aug 2010 08:53:15 -0700
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
References: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
Message-ID:

You can merge this in. It should allow David to proceed.

I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that it essentially preserves the genbank coordinates even when the origin is crossed:

http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf

However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length.

From cjfields at illinois.edu Tue Aug 17 15:24:23 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 14:24:23 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To:
References: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
Message-ID: <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>

On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote:

> You can merge this in. It should allow David to proceed.

Will do. I'll go ahead and delete the remote branch as well.

> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that essentially preserves the genbank coordinates even when the origin is crossed:
>
> http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf
>
> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length

Yes, that is a problem that needs to be addressed. Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174.

chris
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From sheldon.mckay at gmail.com Tue Aug 17 16:42:50 2010
From: sheldon.mckay at gmail.com (Sheldon McKay)
Date: Tue, 17 Aug 2010 16:42:50 -0400
Subject: [Bioperl-l] AlignIO and Gbrowse_syn
In-Reply-To:
References: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
Message-ID:

The GBrowse_syn dev team is pretty small (n=1) right now, so any patches would be welcome.

Sheldon

On Wed, Aug 11, 2010 at 6:02 PM, Chris Fields wrote:

> Russell,
>
> We have had very few requests to support .maf until recently, which is why there has been little done with it. We welcome any help to improve it.
>
> chris
>
> On Aug 11, 2010, at 4:31 PM, Smithies, Russell wrote:
>
>> I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague.
>> If GBrowse_syn is using .maf format, does AlignIO need more work?
>> Any comments?
>>
>> --Russell
>>
>>
>> I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output). GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce clustalw, I've tried parsing LASTZ maf output to clustalw (and other alignment formats) using AlignIO, however I run into the following issues:
>> * Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification)
>> * The coordinate system for reverse strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-rev complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result all reverse match coordinates in the resulting clustalw output file are incorrect.
>> * AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them
>>
>> I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned.
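
The reverse-strand coordinate issue described above can be sketched in plain Perl. This is only an illustration, assuming the LASTZ output follows the usual MAF convention for "s" lines (zero-based start, and for "-" rows a start counted on the reverse-complemented source of length srcSize). The maf_to_forward_coords function is a hypothetical helper, not part of Bio::AlignIO, and the example numbers are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT part of BioPerl: convert the start/size fields of
# a MAF "s" line into 1-based, inclusive coordinates on the forward strand
# of the source sequence.
sub maf_to_forward_coords {
    my ( $start, $size, $strand, $src_size ) = @_;

    if ( $strand eq '-' ) {
        # The block was counted from the end of the original sequence,
        # so flip it around using the full source length.
        my $fwd_start = $src_size - ( $start + $size ) + 1;
        my $fwd_end   = $src_size - $start;
        return ( $fwd_start, $fwd_end );
    }

    # Forward strand: only shift from zero-based to one-based.
    return ( $start + 1, $start + $size );
}

# Made-up example: a 10-base block at MAF start 0 on a 1000-base source.
for my $strand ( '+', '-' ) {
    my ( $s, $e ) = maf_to_forward_coords( 0, 10, $strand, 1000 );
    print "strand $strand: $s..$e\n";
}
```

For the made-up block, the "+" row maps to 1..10 and the "-" row maps to 991..1000, i.e. the first bases of the reverse-complemented sequence are the last bases of the original one.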
From hxu.hong at gmail.com Tue Aug 17 16:50:43 2010 From: hxu.hong at gmail.com (Hong Xu) Date: Tue, 17 Aug 2010 16:50:43 -0400 Subject: [Bioperl-l] Bio::Tools::Primer3 question Message-ID: Hello all, I'm working to parse the Primer3 release 2.2.2-beta result. I made the necessary changes to make Bio::Tools::Primer3 work with the new output tags of Primer3 release 2.2.2. But when I tried to get the primer Tm, I found that Bio::Tools::Primer3 gave different Tm from Primer3 result file. Then I learned that the Tm was calculated by Bio::SeqFeature::Primer module, not from parsing Primer3 result. If I want to get data from parsing Primer3 result, should I write my own Primer3 parser instead of Bio::Tools::Primer3?
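For illustration only, reading the Tm values exactly as Primer3 reported them could look roughly like the sketch below. It assumes Boulder-IO style output (TAG=VALUE lines, record closed by a lone "=") and Primer3 2.x style tag names such as PRIMER_LEFT_0_TM; the helper parse_boulder_record and the sample values are invented here and are not part of Bio::Tools::Primer3.

```perl
use strict;
use warnings;

# Read one Boulder-IO record (TAG=VALUE lines, terminated by a lone "=")
# into a hash reference. parse_boulder_record is a hypothetical helper.
sub parse_boulder_record {
    my ($text) = @_;
    my %rec;
    for my $line (split /\n/, $text) {
        last if $line eq '=';                      # end-of-record marker
        my ($tag, $value) = split /=/, $line, 2;   # split on the first "=" only
        next unless defined $value;                # skip blank or malformed lines
        $rec{$tag} = $value;
    }
    return \%rec;
}

# Made-up sample output; tag names are assumed to follow the 2.x convention.
my $sample = join "\n",
    'PRIMER_LEFT_0_SEQUENCE=ACGTACGTACGTACGTACGT',
    'PRIMER_LEFT_0_TM=59.960',
    'PRIMER_RIGHT_0_TM=60.012',
    '=';

my $rec = parse_boulder_record($sample);
for my $side (qw(LEFT RIGHT)) {
    my $tm = $rec->{"PRIMER_${side}_0_TM"};
    print "$side Tm as reported by Primer3: $tm\n" if defined $tm;
}
```

Values are kept as the strings found in the file, so nothing is recalculated or rounded on the way through.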
thanks a lot, Hong From cjfields at illinois.edu Tue Aug 17 17:14:02 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 17 Aug 2010 16:14:02 -0500 Subject: [Bioperl-l] Bio::Tools::Primer3 question In-Reply-To: References: Message-ID: Already ahead of you there, unfortunately. I wrote a complete reimplementation of both the Primer3 parser and the Primer3 wrapper that handles both v1 and v2 of primer3_core. Lack of tuits lately have prevented me from getting tests written up, so for the time being it's sitting in bioperl-dev: http://github.com/bioperl/bioperl-dev They are Bio::Tools::Primer3Redux (parser) and Bio::Tools::Run::Primer3Redux (wrapper). I rewrote those b/c I found the original modules not adequate enough in many ways for my purposes then (the newer version uses simple features or feature pairs instead of the primer features, for the same reasons you mention re: Tm). You're more than welcome to hack on the code a bit. I'm planning on pulling it out into my own github repo for separate submission to CPAN. chris On Aug 17, 2010, at 3:50 PM, Hong Xu wrote: > Hello all, > > I'm working to parse the Primer3 release 2.2.2-beta result. I made the > necessary changes to make Bio::Tools::Primer3 work with the new output > tags of Primer3 release 2.2.2. But when I tried to get the primer Tm, > I found that Bio::Tools::Primer3 gave different Tm from Primer3 result > file. Then I learned that the Tm was calculated by > Bio::SeqFeature::Primer module, not from parsing Primer3 result. If I > want to get data from parsing Primer3 result, should I write my own > Primer3 parser instead of Bio::Tools::Primer3? 
> > thanks a lot, > Hong > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Aug 17 23:42:59 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 17 Aug 2010 22:42:59 -0500 Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes In-Reply-To: <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu> References: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu> <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu> Message-ID: Chris, David, The branch is now merged back to trunk. David, let us know if this helps. chris (f) On Aug 17, 2010, at 2:24 PM, Chris Fields wrote: > On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote: > >> You can merge this in. It should allow David to proceed. > > Will do. I'll go ahead and delete the remote branch as well. > >> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that essentially preserves the genbank coordinates even when the origin is crossed: >> >> http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf >> >> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length > > Yes, that is a problem that needs to be addressed. Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174. > > chris > >> On Aug 17, 2010, at 6:51 AM, Chris Fields wrote: >> >>> I think Chris Mungall has a branch set up for this in bioperl: >>> >>> http://github.com/bioperl/bioperl-live/tree/circular >>> >>> Is that correct? Should we merge that code into the master branch? 
>>> >>> chris >>> >>> On Aug 17, 2010, at 8:44 AM, David Breimann wrote: >>> >>>> Hello, >>>> >>>> The following genbank has a gene that runs over the 'end" of the >>>> chromosome and into its "beginning", and the script generates an >>>> error. >>>> >>>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk >>>> >>>> NC_005707 Unflattening error: >>>> Details: >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>> MSG: PROBLEM, SEVERITY==2 >>>> Ranges not in correct order. Strange ensembl genbank entry? Range: >>>> [207497,208369] [1,687] >>>> STACK: Error::throw >>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473 >>>> STACK: Bio::SeqFeature::Tools::Unflattener::problem >>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952 >>>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent >>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842 >>>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS >>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713 >>>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq >>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532 >>>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023 >>>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506 >>>> ----------------------------------------------------------- >>>> >>>> Best, >>>> Dave >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From 
cjfields at illinois.edu Wed Aug 18 00:48:55 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 17 Aug 2010 23:48:55 -0500 Subject: [Bioperl-l] Bio::Tools::Primer3 question In-Reply-To: References: Message-ID: Hong, The latest code, along with working tests, is present here: http://github.com/cjfields/Bio-Tools-Primer3Redux It needs a few more tests but the initial wrapper tests work fine for primer3 v2.2.1 on both Mac and Linux. Will try using this to CPAN after a bit more cleanup. chris On Aug 17, 2010, at 4:14 PM, Chris Fields wrote: > Already ahead of you there, unfortunately. I wrote a complete reimplementation of both the Primer3 parser and the Primer3 wrapper that handles both v1 and v2 of primer3_core. Lack of tuits lately have prevented me from getting tests written up, so for the time being it's sitting in bioperl-dev: > > http://github.com/bioperl/bioperl-dev > > They are Bio::Tools::Primer3Redux (parser) and Bio::Tools::Run::Primer3Redux (wrapper). > > I rewrote those b/c I found the original modules not adequate enough in many ways for my purposes then (the newer version uses simple features or feature pairs instead of the primer features, for the same reasons you mention re: Tm). You're more than welcome to hack on the code a bit. I'm planning on pulling it out into my own github repo for separate submission to CPAN. > > chris > > On Aug 17, 2010, at 3:50 PM, Hong Xu wrote: > >> Hello all, >> >> I'm working to parse the Primer3 release 2.2.2-beta result. I made the >> necessary changes to make Bio::Tools::Primer3 work with the new output >> tags of Primer3 release 2.2.2. But when I tried to get the primer Tm, >> I found that Bio::Tools::Primer3 gave different Tm from Primer3 result >> file. Then I learned that the Tm was calculated by >> Bio::SeqFeature::Primer module, not from parsing Primer3 result. If I >> want to get data from parsing Primer3 result, should I write my own >> Primer3 parser instead of Bio::Tools::Primer3? 
>> >> thanks a lot, >> Hong
From david.breimann at gmail.com Wed Aug 18 02:46:58 2010 From: david.breimann at gmail.com (David Breimann) Date: Wed, 18 Aug 2010 09:46:58 +0300 Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes In-Reply-To: References: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu> <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu> Message-ID: Dear Chris's, I tested the updated version on multiple genomes that previously returned errors (for future reference: NC_005707, NC_006578, NC_007103, NC_007104, NC_007106, NC_007107, NC_008573, NC_008762, NC_008763, NC_008785, NC_009457, NC_012040). The script now ends normally on all of them. However, as you mentioned, the resulting GFF3 file does not comply with GFF3 specifications for circular genomes. This in turn causes some unexpected results in other applications. Best, Dave On Wed, Aug 18, 2010 at 6:42 AM, Chris Fields wrote: > Chris, David, > > The branch is now merged back to trunk. David, let us know if this helps. > > chris (f) > > On Aug 17, 2010, at 2:24 PM, Chris Fields wrote: > >> On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote: >> >>> You can merge this in. It should allow David to proceed. >> >> Will do. I'll go ahead and delete the remote branch as well. >> >>> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that essentially preserves the genbank coordinates even when the origin is crossed: >>> >>>
http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf >>> >>> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length >> >> Yes, that is a problem that needs to be addressed. Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174. >> >> chris >> >>> On Aug 17, 2010, at 6:51 AM, Chris Fields wrote: >>> >>>> I think Chris Mungall has a branch set up for this in bioperl: >>>> >>>> http://github.com/bioperl/bioperl-live/tree/circular >>>> >>>> Is that correct? Should we merge that code into the master branch? >>>> >>>> chris >>>> >>>> On Aug 17, 2010, at 8:44 AM, David Breimann wrote: >>>> >>>>> Hello, >>>>> >>>>> The following genbank has a gene that runs over the 'end" of the >>>>> chromosome and into its "beginning", and the script generates an >>>>> error. >>>>> >>>>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk >>>>> >>>>> NC_005707 Unflattening error: >>>>> Details: >>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>> MSG: PROBLEM, SEVERITY==2 >>>>> Ranges not in correct order. Strange ensembl genbank entry?
Range: >>>>> [207497,208369] [1,687] >>>>> STACK: Error::throw >>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473 >>>>> STACK: Bio::SeqFeature::Tools::Unflattener::problem >>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952 >>>>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent >>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842 >>>>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS >>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713 >>>>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq >>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532 >>>>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023 >>>>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506 >>>>> ----------------------------------------------------------- >>>>> >>>>> Best, >>>>> Dave >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From G.Gallone at sms.ed.ac.uk Wed Aug 18 10:57:01 2010 From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone) Date: Wed, 18 Aug 2010 15:57:01 +0100 Subject: [Bioperl-l] [RFC] Interolog::Walk Message-ID: <4C6BF4BD.5010200@sms.ed.ac.uk> Hello BioPerl community - I've written a new module called Interolog::Walk that I'm planning to put on CPAN. I would be grateful if you might take a look at the brief description I attached and tell me what you think. 
I'll be more than happy to post further details should the module be of some interest for someone. Also, I am not totally sure about having the correct name for it. This is my first module and it would be great if you could advise on naming it appropriately. Hopefully the following description will give an idea on what it does.
===================
NAME
Interolog::Walk - Retrieve, score and visualize putative Protein-Protein Interactions through the orthology-walk method
DESCRIPTION
A common activity in computational biology is to mine protein-protein interactions from publicly available databases in order to build Protein-Protein Interaction (PPI) datasets. In many instances, however, the number of experimentally obtained annotated PPIs is very small and it would be helpful to enrich the experimental dataset with high-quality, computationally-inferred PPIs. Such computationally obtained datasets can extend, support or enrich experimental PPI datasets, and are of crucial importance in high-throughput gene prioritization studies, i.e. to drive hypotheses and restrict the dimensionality of many gene functional discovery problems. This Perl Module, Interolog::Walk, is aimed at building putative PPI datasets on the basis of a number of comparative biology paradigms: the module implements a collection of computational biology algorithms based on the concept of "orthology projection". If interacting proteins A and B in organism X have orthologs A' and B' in organism Y, under certain conditions one can assume that the interaction will be conserved in organism Y, i.e. the A-B interaction can be "projected through the orthologies" to obtain a putative A'-B' interaction. The pair of interactions (A-B) and (A'-B') are named "Interologs" (see for instance [1] and [2]).
Interolog::Walk collects, analyses and collates gene orthology data provided by the Ensembl Consortium (www.ensembl.org) as well as PPI data provided by EBI IntAct (http://www.ebi.ac.uk/intact/).
It provides the user with the possibility of rating the quality and reliability of the putative interactions collected, by means of confidence scores, and optionally outputs network representations of the datasets, compatible with the biological network representation standard, Cytoscape.
USAGE
In order to carry out an interolog walk we start with a set of gene identifiers in one organism of interest. We query those ids against a number of comparative biology databases to retrieve a list of orthologues for each gene id of interest, in one or more species. In the following step we rely on PPI databases to retrieve the list of available interactors for the protein ids obtained. The output at this stage consists of a list of interactors of the orthologues of the initial gene set, plus several fields of ancillary data. In the last step of the process we project the interactions - again using orthology data - back to the original species of interest. The output of the process is a list of PUTATIVE INTERACTORS of the initial gene set, plus several fields of ancillary data.
====================
Given the scope and the focus of the project, I would imagine that viable alternatives for the namespace might be
Bio::Orthology::InterologWalk
Bio::InterologMap
or maybe
Interolog::Map
Orthology::Map
Orthology::InterologMap
There are no similar projects as far as I could see so I shouldn't run the risk of overlapping namespaces. Still I would love to know your informed opinion about it. best, Giuseppe
REFERENCES
[1] Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, Vidal M, Gerstein M. Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. Genome Research 2004 Jun;14(6):1107-18.
[2] Wiles AM, Doderer M, Ruan J, Gu T-T, Ravi D, Blackman BA, Bishop AJR. "Building and Analyzing Protein Interactome Networks by Cross-species Comparisons."
BMC Systems Biology 2010, 4:36 - PMID: 20353594
From David.Messina at sbc.su.se Wed Aug 18 12:52:58 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 18 Aug 2010 18:52:58 +0200 Subject: [Bioperl-l] [RFC] Interolog::Walk In-Reply-To: <4C6BF4BD.5010200@sms.ed.ac.uk> References: <4C6BF4BD.5010200@sms.ed.ac.uk> Message-ID: <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se> Hi Giuseppe, Sounds really interesting, thanks for posting this. > Bio::Orthology::InterologWalk I vote for this name, or in any case something with Bio:: as the top-level namespace since it's a biology-related package. I like that you're providing a lot of background and information about the project in the documentation. However, the USAGE section should give information about how to use the module, with example code. You can look at other modules on CPAN (or in BioPerl) to see the conventions for writing documentation. Also, from what you wrote, it sounds like this might be a pipeline or a script rather than a module per se, or perhaps a script and a set of modules. It would be helpful to clarify in your documentation (if you haven't already) how exactly things are organized (and of course example code will help with that, too). Hope that's helpful, and let us know when you've got it up on CPAN so we can try it out! Dave
From cjfields at illinois.edu Wed Aug 18 14:24:16 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 18 Aug 2010 13:24:16 -0500 Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes In-Reply-To: References: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu> <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu> Message-ID: Okay, will file this as a bug. Thanks!
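For the bug report, the coordinate adjustment under discussion might be sketched as follows. This is a rough illustration based on one reading of the GFF3 convention for circular landmarks (start is kept, and the landmark length is added to the end of a feature that crosses the origin); it handles forward-strand joins only. The helper name gff3_circular_range is hypothetical and is not code from bp_genbank2gff3.pl, and the sequence length of 208369 is an assumption taken from the first range in the error trace.

```perl
use strict;
use warnings;

# Collapse an origin-spanning GenBank join into one GFF3-style range.
# Assumption: start is left alone and the landmark length is added to the
# end of any segment that has wrapped past the origin, so start <= end.
# gff3_circular_range is a hypothetical helper for this note.
sub gff3_circular_range {
    my ($seq_len, @segments) = @_;    # segments: [start, end] in join order
    my ($start, $end) = @{ $segments[0] };
    for my $seg (@segments[1 .. $#segments]) {
        my ($s, $e) = @$seg;
        # A segment starting before the first segment's start has wrapped.
        $e += $seq_len if $s < $start;
        $end = $e if $e > $end;
    }
    return ($start, $end);
}

# The ranges from the error message: join(207497..208369,1..687).
my ($start, $end) = gff3_circular_range(208369, [207497, 208369], [1, 687]);
print "$start\t$end\n";    # 207497  209056
```

A consumer of the GFF3 would also need the landmark itself to be flagged as circular for the oversized end coordinate to make sense.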
chris On Aug 18, 2010, at 1:46 AM, David Breimann wrote: > Dear Chris's, > > I tested the updated version on multiple genomes that previously > returned errors (for future reference: NC_005707, NC_006578, > NC_007103, NC_007104, NC_007106, NC_007107, NC_008573, NC_008762, > NC_008763, NC_008785, NC_009457, NC_012040). The script now ends > normally on all of them. However, as you mentioned, the result GFF3 > file does not comply with GFF3 specifications for circular genomes. > This in turn causes some unexpected results in other applications. > > Best, > Dave > > On Wed, Aug 18, 2010 at 6:42 AM, Chris Fields wrote: >> Chris, David, >> >> The branch is now merged back to trunk. David, let us know if this helps. >> >> chris (f) >> >> On Aug 17, 2010, at 2:24 PM, Chris Fields wrote: >> >>> On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote: >>> >>>> You can merge this in. It should allow David to proceed. >>> >>> Will do. I'll go ahead and delete the remote branch as well. >>> >>>> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that essentially preserves the genbank coordinates even when the origin is crossed: >>>> >>>> http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf >>>> >>>> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length >>> >>> Yes, that is a problem that needs to be addressed. Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174. >>> >>> chris >>> >>>> On Aug 17, 2010, at 6:51 AM, Chris Fields wrote: >>>> >>>>> I think Chris Mungall has a branch set up for this in bioperl: >>>>> >>>>> http://github.com/bioperl/bioperl-live/tree/circular >>>>> >>>>> Is that correct? Should we merge that code into the master branch? 
>>>>> >>>>> chris >>>>> >>>>> On Aug 17, 2010, at 8:44 AM, David Breimann wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> The following genbank has a gene that runs over the 'end" of the >>>>>> chromosome and into its "beginning", and the script generates an >>>>>> error. >>>>>> >>>>>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk >>>>>> >>>>>> NC_005707 Unflattening error: >>>>>> Details: >>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>>> MSG: PROBLEM, SEVERITY==2 >>>>>> Ranges not in correct order. Strange ensembl genbank entry? Range: >>>>>> [207497,208369] [1,687] >>>>>> STACK: Error::throw >>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473 >>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::problem >>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952 >>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent >>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842 >>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS >>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713 >>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq >>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532 >>>>>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023 >>>>>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506 >>>>>> ----------------------------------------------------------- >>>>>> >>>>>> Best, >>>>>> Dave >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing 
list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >>
From cdavis at bcm.tmc.edu Wed Aug 18 15:19:53 2010 From: cdavis at bcm.tmc.edu (Caleb Davis) Date: Wed, 18 Aug 2010 14:19:53 -0500 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq question Message-ID: <4C6C3259.4060304@bcm.tmc.edu> Hello, thank you for bioperl! I am getting discrepancies between the online bl2seq (www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi) and bioperl's implementation, and I'm not sure why. I'm seeing a desired behavior through the web interface but can't replicate it locally. Specifically, online bl2seq aligns across a 1 bp insertion in the subject whereas the local bl2seq just reports a shorter alignment. Any ideas? Thanks again, --Caleb The desired parameter differences from default are -F F -W 7 (turn complexity filter off, word size = 7). Below I present the online and local results given the following input sequences:
Below I present the online and local results given the following input sequences: >consensus GAGGATCCAGAATTCTC >FVFTF6N01A86BR AACCCAATGTAAGGAAGCTAAGAACCTTGAAAAGAGGATACCAGAATTCTC Here are the parameters and result I'm getting online: Blast4-request ::= { body queue-search { program "blastn", service "plain", queries bioseq-set { seq-set { seq { id { local id 26297 }, descr { title "consensus", user { type str "CFastaReader", data { { label str "DefLine", data str ">consensus" } } } }, inst { repr raw, mol na, length 17, seq-data ncbi2na '8A3520F740'H } } } }, subject sequences { { id { local id 26299 }, descr { title "FVFTF6N01A86BR", user { type str "CFastaReader", data { { label str "DefLine", data str ">FVFTF6N01A86BR" } } } }, inst { repr raw, mol na, length 51, seq-data ncbi2na '0543B0A09C205F80228C520F74'H } } }, algorithm-options { { name "EvalueThreshold", value cutoff e-value { 1, 10, 1 } }, { name "UngappedMode", value boolean FALSE }, { name "PercentIdentity", value real { 0, 10, 0 } }, { name "HitlistSize", value integer 100 }, { name "EffectiveSearchSpace", value big-integer 0 }, { name "DbLength", value big-integer 0 }, { name "WindowSize", value integer 0 }, { name "DustFiltering", value boolean FALSE }, { name "RepeatFiltering", value boolean FALSE }, { name "MaskAtHash", value boolean TRUE }, { name "MismatchPenalty", value integer -3 }, { name "MatchReward", value integer 2 }, { name "GapOpeningCost", value integer 5 }, { name "GapExtensionCost", value integer 2 }, { name "StrandOption", value strand-type both-strands }, { name "WordSize", value integer 7 } }, format-options { { name "Web_JobTitle", value string "consensus" }, { name "Web_BlastSpecialPage", value string "blast2seq" } } } } >lcl|30439 FVFTF6N01A86BR Length=51 Sort alignments for this subject sequence by: E value Score Percent identity Query start position Subject start position Score = 24.7 bits (26), Expect = 2e-05 Identities = 17/18 (94%), Gaps = 1/18 (5%) Strand=Plus/Plus Query 
1 GAGGAT-CCAGAATTCTC 17 |||||| ||||||||||| Sbjct 34 GAGGATACCAGAATTCTC 51 Here's the output from a local search (I changed the expect to 5.0 just to prove to myself that some parameters are getting through OK): my @params = (-program => 'blastn', -outfile => 'bl2seq.out', -FILTER => 'F', -WORDSIZE => 7, -expect => 5.0); my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); my $bl2seq_report = $factory->bl2seq($cons_seqobj, $single_seqobj); #consensus vs. FVFTF6N01A86BR print Dumper $bl2seq_report->next_result; $VAR1 = bless( { '_inclusion_threshold' => undef, '_queryacc' => 'adapter_consensus', '_iteration_index' => 0, '_iteration_count' => 1, '_hits' => [], '_hitindex' => 0, '_querylength' => '17', '_querydesc' => '', '_iterations' => [ bless( { '_oldhits_not_below_threshold' => [], '_newhits_unclassified' => [], '_number' => 1, '_oldhits_newly_below_threshold' => [], '_hit_factory' => bless( { 'interface' => 'Bio::Search::Hit::HitI', 'type' => 'Bio::Search::Hit::BlastHit', '_loaded_types' => { 'Bio::Search::Hit::BlastHit' => 1 }, '_root_verbose' => 0 }, 'Bio::Factory::ObjectFactory' ), '_newhits_below_threshold' => [ { '-algorithm' => 'BLASTN', '-description' => '', '-length' => '51', '-query_len' => '17', '-hsp_factory' => bless( { 'interface' => 'Bio::Search::HSP::HSPI', 'type' => 'Bio::Search::HSP::GenericHSP', '_loaded_types' => { 'Bio::Search::HSP::GenericHSP' => 1 }, '_root_verbose' => 0 }, 'Bio::Factory::ObjectFactory' ), '-name' => 'FVFTF6N01A86BR', '-rank' => 1, '-hsps' => [ { '-query_start' => '7', '-algorithm' => 'BLASTN', '-hit_seq' => 'ccagaattctc', '-hit_length' => '51', '-query_length' => '17', '-query_desc' => '', '-query_frame' => 0, '-rank' => 1, '-hit_desc' => '', '-query_end' => '17', '-hit_name' => 'FVFTF6N01A86BR', '-identical' => '11', '-query_name' => 'adapter_consensus', '-evalue' => '1e-04', '-score' => '11', '-conserved' => '11', '-hit_frame' => 0, '-hsp_length' => '11', '-query_seq' => 'ccagaattctc', '-hit_start' => '41', 
'-homology_seq' => '|||||||||||', '-hit_end' => '51', '-bits' => '22.3' }, { '-query_start' => '9', '-algorithm' => 'BLASTN', '-hit_seq' => 'agaattct', '-hit_length' => '51', '-query_length' => '17', '-query_desc' => '', '-query_frame' => 0, '-rank' => 2, '-hit_desc' => '', '-query_end' => '16', '-hit_name' => 'FVFTF6N01A86BR', '-identical' => '8', '-query_name' => 'adapter_consensus', '-evalue' => '0.007', '-score' => '8', '-conserved' => '8', '-hit_frame' => 0, '-hsp_length' => '8', '-query_seq' => 'agaattct', '-hit_start' => '50', '-homology_seq' => '||||||||', '-hit_end' => '43', '-bits' => '16.4' } ], '-accession' => 'FVFTF6N01A86BR', '-significance' => '1e-04' } ], '_root_verbose' => 0, '_newhits_not_below_threshold' => [], '_oldhits_below_threshold' => [] }, 'Bio::Search::Iteration::GenericIteration' ) ], '_hit_factory' => $VAR1->{'_iterations'}[0]{'_hit_factory'}, '_statistics' => bless( { 'stats' => { 'S1' => '4', 'S1_bits' => '8.4', 'kappa_gapped' => '0.711', 'X3_bits' => '99.1', 'X1' => '4', 'lambda_gapped' => '1.37', 'X2' => '15', 'S2' => '4', 'seqs_better_than_cutoff' => '1', 'Hits_to_DB' => '5', 'num_extensions' => '2', 'num_successful_extensions' => '2', 'X1_bits' => '7.9', 'X3' => '50', 'dbentries' => '1', 'entropy_gapped' => '1.31', 'X2_bits' => '29.7', 'S2_bits' => '8.4' } }, 'Bio::Search::GenericStatistics' ), '_algorithm' => 'BLASTN', '_parameters' => bless( { 'params' => { 'gapext' => '2', 'matrix' => 'blastn matrix:1 -3', 'expect' => '5.0', 'allowgaps' => 'yes', 'gapopen' => '5' } }, 'Bio::Tools::Run::GenericParameters' ), '_root_verbose' => 0, '_queryname' => 'adapter_consensus' }, 'Bio::Search::Result::BlastResult' ); From David.Messina at sbc.su.se Wed Aug 18 18:32:37 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 19 Aug 2010 00:32:37 +0200 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq question In-Reply-To: <4C6C3259.4060304@bcm.tmc.edu> References: <4C6C3259.4060304@bcm.tmc.edu> Message-ID: Hi Caleb, 
The first thing I would do is take BioPerl out of the equation and test your local bl2seq on the command line. If you get the same output locally as on the web version, then there is a problem with BioPerl. If you're still seeing a discrepancy between the web and your local run, then this isn't a problem with BioPerl. Just to be clear, BioPerl doesn't "implement" any of the BLAST programs; it is simply a wrapper around the programs that you download from NCBI. That doesn't mean BioPerl isn't at fault, of course, just that it's important to isolate the problem carefully. The most common reasons for these discrepancies are:
- different version numbers of BLAST 2.2.21? 2.2.22? Is it the same on the web as locally?
- similarly, different implementations of BLAST NCBI's old BLAST suite is now deprecated and replaced with BLAST+. All of the online BLAST web queries are Blast+ now; are you running BLAST+ locally? (there's also a separate BioPerl wrapper for BLAST+ called Bio::Tools::Run::BlastPlus)
- hidden "default" parameters Even though you're only changing a handful of parameters, the defaults (particularly on the web version) may be different than what you expect. In your case, it looks like on the web version, match score is 2 and mismatch is -3. However, in the local version I believe match score is 1 and a mismatch is -3. See this line in the params block near the end of your post: 'matrix' => 'blastn matrix:1 -3',
Dave
From sidd.basu at gmail.com Wed Aug 18 20:28:32 2010 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Wed, 18 Aug 2010 19:28:32 -0500 Subject: [Bioperl-l] Re: [RFC] Interolog::Walk In-Reply-To: <4C6BF4BD.5010200@sms.ed.ac.uk> References: <4C6BF4BD.5010200@sms.ed.ac.uk> Message-ID: <20100819002830.GA366@Macintosh-235.local> Hi, On Wed, 18 Aug 2010, Giuseppe Gallone wrote: > Hello BioPerl community - I've written a new module called Interolog::Walk > that I'm planning to put on CPAN.
I would be grateful if you might take a > look at the brief description I attached and tell me what you think. I'll > be more than happy to post further details should the module be of some > interest for someone. > > Also, I am not totally sure about having the correct name for it. This is > my first module and It would be great if you could advise on naming it > appropriately. Hopefully the following description will give an idea on > what it does. > > =================== > > > NAME > Interolog::Walk - Retrieve, score and visualize putative > Protein-Protein Interactions through the orthology-walk method > > DESCRIPTION > A common activity in computational biology is to mine protein-protein > interactions from publicly available databases in order to build > Protein-Protein Interaction (PPI) datasets. > In many instances, however, the number of experimentally obtained annotated > PPIs is very scarce and it would be helpful to enrich the experimental > dataset with high-quality, computationally-inferred PPIs. Such > computationally-obtained dataset can extend, support or enrich experimental > PPI datasets, and are of crucial importance in high-throughput gene > prioritization studies, i.e. to drive hypotheses and restrict the > dimensionality of many gene functional discovery problems. > This Perl Module, Interolog::Walk, is aimed at building putative PPI > datasets on the basis of a number of comparative biology paradigms: the > module implements a collection of computational biology algorithms based on > the concept of "orthology projection". If interacting proteins A and B in > organism X have orthologs A' and B' in organism Y, under certain conditions > one can assume that the interaction will be conserved in organism Y, i.e. > the A-B interaction can be "projected through the orthologies" to obtain a > putative A'-B' interaction. The pair of interactions (A-B) and (A'-B') are > named "Interologs" (see for instance [1] and [2]). 
> > Interolog::Walk collects, analyses and collates gene orthology data > provided by the Ensembl Consortium (www.ensembl.org) as well as PPI data > provided by EBI Intact (http://www.ebi.ac.uk/intact/). It provides the user > with the possibility of rating the quality and reliability of the putative > interactions collected, by means of confidence scores, and optionally > outputs network representations of the datasets, compatible with the > biological network representation standard, Cytoscape. Sounds interesting. I am currently playing around with a perl based webapp for displaying interactome using cytoscapeweb. Depending how your design pans out, would be happy to use your module as a backend analysis layer. And on a related note, you might want to have a look at bioperl-network and if there is any overlap might be worth contributing. -siddhartha > > USAGE > In order to carry out an interolog walk we start with a set of gene > identifiers in one organism of interest. We query those ids against a > number of comparative biology databases to retrieve a list of orthologues > for each gene id of interest, in one or more species. > In the following step we rely on PPI databases to retrieve the list of > available interactors for the protein ids obtained. The output at this > stage consists of a list of interactors of the orthologues of the initial > gene set, plus several fields of ancillary data. > In the last step of the process we project the interactions - again using > orthology data - back to the original species of interest. The output of > the process is a list of PUTATIVE INTERACTORS of the initial gene set, plus > several fields of ancillary data. 
> > ==================== > > Given the scope and the focus of the project, I would imagine that viable > alternatives for the namespace might be > > Bio::Orthology::InterologWalk > Bio::InterologMap > > or maybe > Interolog::Map > Orthology::Map > Orthology::InterologMap > > There are no similar projects as far as I could see so I shouldn't run the > risk of overlapping namespaces. Still I would love to know your informed > opinion about it. > > best, > Giuseppe > > > > REFERENCES > [1] Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, > Vidal M, Gerstein M. Annotation transfer between genomes: protein-protein > interologs and protein-DNA regulogs. Genome Research 2004 > Jun;14(6):1107-18. > > [2]Wiles AM, Doderer M, Ruan J, Gu T-T, Ravi D, Blackman BA, Bishop AJR. > "Building and Analyzing Protein Interactome Networks by Cross-species > Comparisons." BMC Systems Biology 2010, 4:36 - PMID: 20353594 > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From dan.kortschak at adelaide.edu.au Wed Aug 18 22:15:03 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Thu, 19 Aug 2010 11:45:03 +0930 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query Message-ID: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> Hi Everyone, I'm wanting to set up a persistent data store for some of my work and am in the process of choosing parts for my system. From my brief look around I think I'd like to use BioSQL (next best choice being Chado - but BioPerl bindings in bioperl-db for BioSQL being the decider here), but have noticed comments some time back that bioperl-db and PostgreSQL 8.3 (my prefered engine - though MySQL is possible, but makes the whole system messier) don't play well together. 
What is the status of the casting expectation conflict between bioperl-db and Pg8.3? The scripts are run with safe data, so placeholders aren't strictly crucial (though speed may be an issue?) and `$dbh->{pg_server_prepare} = 0;' seems like it could be an option. Can anybody provide any advice on this issue? thanks Dan Kortschak From cjfields at illinois.edu Wed Aug 18 23:29:36 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 18 Aug 2010 22:29:36 -0500 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq question In-Reply-To: References: <4C6C3259.4060304@bcm.tmc.edu> Message-ID: <194D43EC-A44C-450A-B57B-EC379DBCB935@illinois.edu> Wouldn't surprise me too much if the parameters are not set the same; IIRC the main BLAST URL API and the online NCBI Web-BLAST have different default settings. chris On Aug 18, 2010, at 5:32 PM, Dave Messina wrote: > Hi Caleb, > > The first thing I would do is take BioPerl out of the equation and test your local bl2seq on the command line. If you get the same output locally as on the web version, then there is a problem with BioPerl. If you're still seeing a discrepancy between the web and your local run, then this isn't a problem with BioPerl. > > Just to be clear, BioPerl doesn't "implement" any of the BLAST programs; it is simply a wrapper around the programs that you download from NCBI. That doesn't mean BioPerl isn't at fault, of course, just that it's important to isolate the problem carefully. > > The most common reasons for these discrepancies are: > > - different version numbers of BLAST > > 2.2.21? 2.2.22? Is it the same on the web as locally? > > - similarly, different implementations of BLAST > > NCBI's old BLAST suite is now deprecated and replaced with BLAST+. All of the online BLAST web queries are Blast+ now ? are you running BLAST+ locally? 
(there's also a separate BioPerl wrapper for BLAST+ called Bio::Tools::Run::BlastPlus) > > - hidden "default" parameters > > Even though you're only changing a handful of parameters, the defaults (particularly on the web version) may be different than what you expect. > > In your case, it looks like on the web version, match score is 2 and mismatch is -3. However, in the local version I believe match score is 1 and a mismatch is -3. > > See this line in the params block near the end of your post: > > 'matrix' => 'blastn matrix:1 -3', > > > > Dave > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at drycafe.net Thu Aug 19 01:48:19 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 19 Aug 2010 01:48:19 -0400 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> Hi Dan, the casting isn't an issue anymore, I think. (And even if it were, there is actually a small script that brings back the casts that were built into 8.2.) Have you found an example where it still is? -hilmar On Aug 18, 2010, at 10:15 PM, Dan Kortschak wrote: > Hi Everyone, > > I'm wanting to set up a persistent data store for some of my work > and am > in the process of choosing parts for my system. From my brief look > around I think I'd like to use BioSQL (next best choice being Chado - > but BioPerl bindings in bioperl-db for BioSQL being the decider here), > but have noticed comments some time back that bioperl-db and > PostgreSQL > 8.3 (my prefered engine - though MySQL is possible, but makes the > whole > system messier) don't play well together. > > What is the status of the casting expectation conflict between > bioperl-db and Pg8.3? 
The scripts are run with safe data, so > placeholders aren't strictly crucial (though speed may be an issue?) > and > `$dbh->{pg_server_prepare} = 0;' seems like it could be an option. > > Can anybody provide any advice on this issue? > > thanks > Dan Kortschak > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From dan.kortschak at adelaide.edu.au Thu Aug 19 01:54:03 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Thu, 19 Aug 2010 15:24:03 +0930 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> Message-ID: <1282197243.14127.27.camel@zoidberg.mbs.adelaide.edu.au> Hi Hilmar, No, I haven't found any problems, just hoping to avoid them by prior research. thanks Dan On Thu, 2010-08-19 at 01:48 -0400, Hilmar Lapp wrote: > Hi Dan, > > the casting isn't an issue anymore, I think. (And even if it were, > there is actually a small script that brings back the casts that > were > built into 8.2.) Have you found an example where it still is? > > -hilmar From biopython at maubp.freeserve.co.uk Thu Aug 19 06:01:03 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 19 Aug 2010 11:01:03 +0100 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> Message-ID: On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp wrote: > Hi Dan, > > the casting isn't an issue anymore, I think. 
(And even if it were, there is > actually a small script that brings back the casts that were built into > 8.2.) Have you found an example where it still is? > > -hilmar Hi Hilmar, Do the bioperl-db bindings for BioSQL on PostgreSQL still require those extra rules in the schema? http://bugzilla.open-bio.org/show_bug.cgi?id=2839 Peter From G.Gallone at sms.ed.ac.uk Thu Aug 19 06:45:36 2010 From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone) Date: Thu, 19 Aug 2010 11:45:36 +0100 Subject: [Bioperl-l] [RFC] Interolog::Walk In-Reply-To: <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se> References: <4C6BF4BD.5010200@sms.ed.ac.uk> <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se> Message-ID: <4C6D0B50.4050902@sms.ed.ac.uk> Hi Dave, thank you very much for your helpful comments. Regarding the module name: I will follow your advice and avoid proposing a new root during the module registration. As for the second level, I haven't been able to find anything related to homology/orthology, therefore I'm not sure whether I should go for Bio::Orthology::InterologMap or Bio::Homology::InterologMap, the first one being maybe a bit more specific. I might also expand further as in Bio::Orthology::Interolog::Map, just in case somebody else finds other interesting applications for the Interolog concept and would like to "plug in" their own contribution. Would this make any sense? I also appreciate your comments on the documentation. The one I provided is actually not the full POD I was planning to include, but rather an extract. What I have at the moment is a description, for each method, in the following form: ===================================== remove_duplicate_rows Usage : $RC = InterologMap::remove_duplicate_rows(input_handle => $dbh, output_handle => $out_data, header => 'standard', ); Purpose : This is used to clean up a TSV data file of duplicate entries. Occasionally, Intact can return duplicate entries. This routine will make sure no such duplicates are kept. 
A new datafile is built. The number of unique data rows is updated. Returns : success/error Argument : database handle to input file, filehandle to outputfile, header type. Header type is one of the following: - "standard": when the routine is used to clean up an interolog walk file (the header will be longer) - "direct": when the routine is used to clean up a file of real db interaction (the header is shorter) - no field provided: default is standard Throws : - Comment : Sample See Also : ======================================= On top of that, there is a DESCRIPTION, USAGE, and SYNOPSIS. The synopsis has some code with an example of typical usage of the module. Please take a look at this (attached below) and tell me what you think. You mention that the description contains a lot of background information. Would you recommend reducing it, or placing it elsewhere? I was considering to write a little tutorial in latex as soon as possible anyway, to provide a "centralised" source of information to familiarise with the module. Does this respect the CPAN regulations? As for your question on the structure of the module: you are indeed right, the idea when running the "orthology walk" is to create a pipeline of subroutines: there's a core set of subroutines meant to work in strict sequentiality. Each of these subroutines expects, as input, the output of the previous one. The input/output dataset is currently in the form of a TSV text file, which I process with the help of the DBI module (to be more specific, I use DBD::CSV). 
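The duplicate-row cleanup described above for remove_duplicate_rows can be illustrated with a small core-Perl sketch. This is only an approximation of the idea: it is not the module's DBD::CSV-based implementation, the sub name unique_rows and the sample rows are made up for illustration, and it treats two rows as duplicates only when they are identical as strings.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Interolog::Walk): drop exact duplicate
# data rows from tab-separated lines, keeping the header line and the
# first copy of each row. Returns an array ref of cleaned lines and the
# number of unique data rows.
sub unique_rows {
    my @lines  = @_;
    my $header = shift @lines;
    my ( %seen, @unique );
    for my $row (@lines) {
        next if $seen{$row}++;    # skip rows that were already kept
        push @unique, $row;
    }
    return ( [ $header, @unique ], scalar @unique );
}

# Made-up sample data, not real IntAct output.
my @tsv = (
    "id_a\tid_b\tscore",
    "geneA\tgeneB\t0.8",
    "geneC\tgeneD\t0.5",
    "geneA\tgeneB\t0.8",
);

my ( $clean, $count ) = unique_rows(@tsv);
print "$_\n" for @$clean;
print "unique data rows: $count\n";
```

With real data one would probably compare only the identifying columns rather than whole rows, since ancillary fields may differ between otherwise identical interactions.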
While there's a certain flexibility regarding how to use the module, one core idea remains: in order to get the set of putative interactors, the user would have to call at least three basic routines: (A) ================= 1)get_forward_orthologies(): this queries the initial gene list against one or more Ensembl dbs (using the Ensembl Perl API) and retrieves their orthologues, plus a number of ancillary data fields (mainly conservation data, e.g. dN/dS ratio, distance from ancestor, orthology type, etc.) 2)get_interactions(): this queries the orthology list built in the previous stage against a PSICQUIC-enabled PPI db using REST (at the moment I only query the EBI Intact DB, but it should be easy to expand this and query all PSICQUIC compatible PPI dbs transparently). This step will "fatten" the dataset built in (1) with the interactors of those orthologues, plus ancillary data (including lots of parameters describing the quality, nature, origin of the annotated interaction) 3)get_backward_orthologies(): this queries the interactor list built in the previous stage against one or more Ensembl dbs to find orthologues *back* in the original species. It also adds a number of supplementary data fields just like in (1). ================== At the end of this procedure the user will have a TSV file where each row contains a binary putative interaction plus (currently) 37 supplementary data fields. One can then scan these results to check for duplicates, to compute counts, to see if we have discovered new gene ids that were not present in the original dataset (hopefully we have :) ). 
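The last check mentioned above (finding gene ids that were not in the original dataset) could look roughly like the following core-Perl sketch. The sub name unseen_ids, the column positions and the sample rows are assumptions made for illustration; the real output has many more fields and the module's own extract_unseen_ids() may work differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Interolog::Walk): return, sorted, the
# ids found in the given columns of the result rows that were not part of
# the initial gene list.
sub unseen_ids {
    my ( $initial, $rows, @cols ) = @_;
    my %known = map { $_ => 1 } @$initial;
    my %new;
    for my $row (@$rows) {
        my @fields = split /\t/, $row;
        for my $col (@cols) {
            my $id = $fields[$col];
            next unless defined $id && length $id;
            $new{$id} = 1 unless $known{$id};
        }
    }
    my @ids = sort keys %new;
    return @ids;
}

# Made-up sample data: columns 0 and 1 are assumed to hold the two
# interactor ids of each putative interaction.
my @initial = qw(geneA geneB);
my @rows    = ( "geneA\tgeneX\t0.7", "geneB\tgeneA\t0.4", "geneB\tgeneY\t0.9" );

my @novel = unseen_ids( \@initial, \@rows, 0, 1 );
print join( ", ", @novel ), "\n";    # prints: geneX, geneY
```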
Most importantly, one can then further process these results to do one or more of the following: (B) compute a global confidence score to assess the reliability of each binary putative interaction (C) extract the binary putative PPIs from the dataset and save them in a format compatible with Cytoscape: this helps give the result a visual quality: one can then apply network analysis tools to discover motifs, clusters, etc. The format I use is currently .SIF + attributes, as detailed in http://cytoscape.wodaklab.org/wiki/Cytoscape_User_Manual/Network_Formats (D) given the same initial gene list, one can also build a dataset of REAL, experimentally-obtained PPIs (without mapping through orthologies in other species). One can then compare this dataset with the Putative dataset to see if/where the two overlap, what's the intersection or the differences, etc. In order to suggest ways of using the module I have written 4 sample scripts and I will include them in the module. Each script utilises the module and uses/reuses subroutines in a pipeline fashion, and does the following: 1)doInterologWalk.pl: runs the basic pipeline in (A) 2)doScores.pl: computes and adds confidence scores as explained in (B) 3)doNetworks.pl: computes SIF network + attributes as in (C) 4)getRealInteractions.pl: runs a pipeline to obtain real PPIs from the initial gene set, as in (D). Hope I didn't make this too confusing. I would love to hear back from you and from anybody else that would like to provide feedback. Cheers Giuseppe On 18/08/10 17:52, Dave Messina wrote: > Hi Giuseppe, > > Sounds really interesting - thanks for posting this. > >> Bio::Orthology::InterologWalk > > I vote for this name, or in any case something with Bio:: as the top-level namespace since it's a biology-related package. > > I like that you're providing a lot of background and information about the project in the documentation. 
However, the USAGE section should give information about how to use the module, with example code. You can look at other modules on CPAN (or in BioPerl) to see the conventions for writing documentation. > > Also, from what you wrote, it sounds like this might be a pipeline or a script rather than a module per se, or perhaps a script and a set of modules. It would be helpful to clarify in your documentation (if you haven't already) how exactly things are organized (and of course example code will help with that, too). > > > Hope that's helpful, and let us know when you've got it up on CPAN so we can try it out! > > > Dave > > NAME Interolog::Walk - Retrieve, score and visualize putative Protein-Protein Interactions through the orthology-walk method SYNOPSIS use Interolog::Walk; First, obtain Intact Interactions for the dataset (see example in "getDirectInteractions.pl"): #get a registry from Ensembl my $registry = InterologMap::setup_ensembl_adaptor(connect_to_db => $ensembl_db, source_species => $sourceorg, verbose => 1 ); #query actual interactions $RC = InterologMap::Direct::get_direct_interactions(registry => $registry, source_species => $sourceorg, input_path => $in_path, output_path => $out_path, url => $url, ); do some postprocessing (see "do_counts()" and "extract_unseen_ids()" ) and then do the actual interolog walk on the dataset with the following sequence of three methods. 
get orthologues of starting set: $RC = InterologMap::get_forward_orthologies(registry => $registry, ensembl_db => $ensembl_db, input_path => $in_path, output_path => $out_path, source_org => $sourceorg, dest_org => $destorg, ); add interactors of orthologues found by "get_forward_orthologies()": $RC = InterologMap::get_interactions(input_path => $in_path, output_path => $out_path, url => $url, url_global => $url_global, ); add orthologues of interactors found by "get_interactions()": $RC = InterologMap::get_backward_orthologies(registry => $registry, ensembl_db => $ensembl_db, input_path => $in_path, output_path => $out_path, error_path => $err_path, source_org => $sourceorg, ); do some postprocessing (see "remove_duplicate_rows()", "do_counts()", "extract_unseen_ids()") and then optionally compute a composite score for the putative interactions obtained: $RC = InterologMap::Scores::compute_scores(input_path => $in_path, score_path => $score_path, output_path => $out_path, term_graph => $onto_graph, M_IT_SCORE => $M_IT, M_DM_SCORE => $M_DM, M_ME_DM_SCORE => $M_MDM, M_ME_TAXA_SCORE => $M_MTAXA ); get some networks and network attributes which you can then visualise with cytoscape $RC = InterologMap::Networks::do_network(registry => $registry, db => $ensembl_db, input_path => $in_path, output_path => $out_path, source_org => $sourceorg, orthology_type => $orthtype, ); $RC = InterologMap::Networks::do_attributes(registry => $registry, input_path => $in_path, output_path => $out_path, source_org => $sourceorg, label_type => 'external name' ); *The synopsis above only lists the major methods and parameters.* DESCRIPTION A common activity in computational biology is to mine protein-protein interactions from publicly available databases to build *Protein-Protein Interaction* (PPI) datasets. 
In many instances, however, the number of experimentally obtained annotated PPIs is very scarce and it would be helpful to enrich the experimental dataset with high-quality, computationally-inferred PPIs. Such computationally-obtained dataset can extend, support or enrich experimental PPI datasets, and are of crucial importance in high-throughput gene prioritization studies, i.e. to drive hypotheses and restrict the dimensionality of functional discovery problems. This Perl Module, Interolog::Walk, is aimed at building putative PPI datasets on the basis of a number of comparative biology paradigms: the module implements a collection of computational biology algorithms based on the concept of "orthology projection". If interacting proteins A and B in organism X have orthologs A' and B' in organism Y, under certain conditions one can assume that the interaction will be conserved in organism Y, i.e. the A-B interaction can be "projected through the orthologies" to obtain a putative A'-B' interaction. The pair of interactions (A-B) and (A'-B') are named "Interologs". Interolog::Walk collects, analyses and collates gene orthology data provided by the Ensembl Consortium as well as PPI data provided by EBI Intact. It provides the user with the possibility of rating the quality and reliability of the putative interactions collected, by means of confidence scores, and optionally outputs network representations of the datasets, compatible with the biological network representation standard, Cytoscape. BASIC USAGE Rationale behind "Interolog::Walk". \EBI Intact API/ .--------------. | .-------------. (2) | A(e.g. mouse)|<------------------------>| B(mouse) | (3) `--------------' `-------------' ^ | /Ensembl\ | | \ Ensembl / / Compara \ | | \Compara/ / Api \ | | \ Api / | | .--------------. .-------------. (1) | A'(e.g. fly) |. . . . . . . . . . . . . 
| B'(fly) | (4) `--------------' [SCORED]PUTATIVE PPI `-------------' (Output of Interolog::Walk) In order to carry out an interolog walk we start with a set of gene identifiers in one organism of interest (1). We query those ids against a number of comparative biology databases to retrieve a list of orthologues for the gene id of interest, in one or more species (2). In the next step we rely instead on PPI databases to retrieve the list of available interactors for the protein ids obtained in (2). The output at this stage consists of a list of interactors of the orthologues of the initial gene set, plus several fields of ancillary data (whose importance will be explained later) (3). In the last step of this process we will need to project the interactions in (3) - again using orthology data - back to the original species of interest. The output of the process is a list of PUTATIVE INTERACTORS of the initial gene set, plus several fields of ancillary data. "Interolog::Walk" provides three main functions to carry out the basic walk, "get_forward_orthologies()", "get_interactions()" and "get_backward_orthologies()". These functions must be called strictly sequentially in your script, as the process, analyse and attach data to the output in a pipeline-like fashion, i.e. processing the output of the preceding function. get_forward_orthologies get_interactions get_backward_orthologies SCORING THE PUTATIVE INTERACTIONS BUILDING PUTATIVE INTERACTION NETWORKS BUGS Please report any you find SUPPORT TODO AUTHOR Giuseppe Gallone CPAN ID: GGALLONE University of Edinburgh COPYRIGHT The Interolog::Walk module is Copyright (c) 2010 Giuseppe Gallone All rights reserved. You may distribute under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl 5.10.0 README file. SEE ALSO -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. 
From G.Gallone at sms.ed.ac.uk Thu Aug 19 08:42:28 2010 From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone) Date: Thu, 19 Aug 2010 13:42:28 +0100 Subject: [Bioperl-l] [RFC] Interolog::Walk In-Reply-To: <20100819002830.GA366@Macintosh-235.local> References: <4C6BF4BD.5010200@sms.ed.ac.uk> <20100819002830.GA366@Macintosh-235.local> Message-ID: <4C6D26B4.5090702@sms.ed.ac.uk> Dear Siddhartha, glad to hear this might be helpful. As for the bioperl-network package, thank you for mentioning it. I had a quick look at its documentation and it looks like a much deeper and more complex effort than what I have in my package. I've actually been using the Graph package, on which it seems to be based, a lot and found it very helpful. I'm not sure if the network routines in my module overlap with it though: all I do in my package is parse the dataset, extracting only what is needed to build a Cytoscape SIF file and optionally some Cytoscape NOA attribute files, as required by the Cytoscape specification in http://cytoscape.wodaklab.org/wiki/Cytoscape_User_Manual/Network_Formats. Instead, it looks like bioperl-network actually builds some kind of internal representation of the network for further manipulation in Perl, if I understand it correctly? Kind regards Giuseppe On 19/08/10 01:28, Siddhartha Basu wrote: > Sounds interesting. I am currently playing around with a perl based webapp for displaying interactome > using cytoscapeweb. Depending how your design pans out, would be happy to > use your module as a backend analysis layer. And on a related note, you > might want to have a look at bioperl-network and if there is any overlap > might be worth contributing. > > -siddhartha > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. 
From xupeng86 at gmail.com Thu Aug 19 04:02:48 2010 From: xupeng86 at gmail.com (xupeng) Date: Thu, 19 Aug 2010 16:02:48 +0800 Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database? Message-ID: <201008191602.49068.xupeng86@gmail.com> I've downloaded the biosql-1.0.1.tar.gz. It works well. But I can't find the 'load_seqdatabase.pl' when I try to import the Genbank files into biosql databsase. Can anyone give me a copy of that file? many thanks ! From sunhanifk at gmail.com Thu Aug 19 10:25:38 2010 From: sunhanifk at gmail.com (han sun) Date: Thu, 19 Aug 2010 22:25:38 +0800 Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl 5.12.1? Message-ID: Hello everyone, I have used perl for several months,and I now want to feel the power of bioperl. But it seems that the installing is more difficult than I thought. I typed the commands. install-shell rep add bioperl http://bioperl.org/DIST rep add uwinnipeg http://cpan.uwinnipeg.ca/PPMPackages/12xx/ rep add trouchelle http://trouchelle.com/ppm12/ install BioPerl However,the installing failed, ppm install failed: Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core Can't find any package that provides PostScript::TextBlock for Bundle-BioPerl-Core Can't find any package that provides Ace:: for Bundle-BioPerl-Core Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core Can't find any package that provides Convert::Binary::C for Bundle-BioPerl-Core Can't find any package that provides XML::Twig for Bundle-BioPerl-Core Can't find any package that provides DB_File:: for Bundle-BioPerl-Core Can't find any package that provides IPC::Run for GraphViz Can't find any package that provides XML-XPathEngine for XML-DOM-XPath Can't find any package that provides List-MoreUtils for Moose Can't find any package that provides List-MoreUtils for Class-MOP then I tried install http://www.bribes.org/perl/ppm/GD.ppd and tried the 
installation again, but it still didn't help. Do you know what the problem is? Please help me, thanks very much. From cjfields1 at gmail.com Thu Aug 19 10:33:26 2010 From: cjfields1 at gmail.com (Christopher Fields) Date: Thu, 19 Aug 2010 09:33:26 -0500 Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl 5.12.1? In-Reply-To: References: Message-ID: <78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com> Try using ActivePerl 5.10 instead of v5.12. It's very possible the PPM won't work for v5.12 yet. chris On Aug 19, 2010, at 9:25 AM, han sun wrote: > Hello everyone, > > I have used perl for several months,and I now want to feel the power of > bioperl. > But it seems that the installing is more difficult than I thought. > > I typed the commands. > > > > install-shell > > > rep add bioperl http://bioperl.org/DIST > > > rep add uwinnipeg > http://cpan.uwinnipeg.ca/PPMPackages/12xx/ > > > rep add trouchelle http://trouchelle.com/ppm12/ > > install BioPerl > > However,the installing failed, > > ppm install failed: > Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core > Can't find any package that provides PostScript::TextBlock for > Bundle-BioPerl-Core > Can't find any package that provides Ace:: for Bundle-BioPerl-Core > Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core > Can't find any package that provides Convert::Binary::C for > Bundle-BioPerl-Core > Can't find any package that provides XML::Twig for Bundle-BioPerl-Core > Can't find any package that provides DB_File:: for Bundle-BioPerl-Core > Can't find any package that provides IPC::Run for GraphViz > Can't find any package that provides XML-XPathEngine for XML-DOM-XPath > Can't find any package that provides List-MoreUtils for Moose > Can't find any package that provides List-MoreUtils for Class-MOP > > > then I tried > > install http://www.bribes.org/perl/ppm/GD.ppd > > and tried the installation again,but it still 
didn't help. > > * > * > * > * > * > * > > > *Do you konw what's wrong with the problem?* > * > * > * > * > *Please help me,thanks very much.* > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at drycafe.net Thu Aug 19 10:53:22 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 19 Aug 2010 10:53:22 -0400 Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database? In-Reply-To: <201008191602.49068.xupeng86@gmail.com> References: <201008191602.49068.xupeng86@gmail.com> Message-ID: <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net> The file comes with Bioperl-db, not BioSQL. That is so because it depends on BioPerl and on Bioperl-db, and so you will need to have both installed. -hilmar On Aug 19, 2010, at 4:02 AM, xupeng wrote: > I've downloaded the biosql-1.0.1.tar.gz. It works well. But I > can't find the 'load_seqdatabase.pl' when I try to import the > Genbank files into biosql databsase. > Can anyone give me a copy of that file? > many thanks ! > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From hlapp at drycafe.net Thu Aug 19 10:58:46 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 19 Aug 2010 10:58:46 -0400 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> Message-ID: <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> Yes, unfortunately they do. 
The feature for obviating them (namely nested transactions) is there in Pg 8.2+, but Bioperl-db doesn't use them yet ... I have to learn more about Class::DBIx first to decide whether it's better to first implement nested transactions in the home- grown ORM that Bioperl-db in essence is, or whether it's better to reimplement everything in Class::DBIx instead. There are new datatypes in Bioperl, and relations in BioSQL that could hold them, and so I need to decide what's the way forward. -hilmar On Aug 19, 2010, at 6:01 AM, Peter wrote: > On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp > wrote: >> Hi Dan, >> >> the casting isn't an issue anymore, I think. (And even if it were, >> there is >> actually a small script that brings back the casts that were built >> into >> 8.2.) Have you found an example where it still is? >> >> -hilmar > > Hi Hilmar, > > Do the bioperl-db bindings for BioSQL on PostgreSQL still require > those > extra rules in the schema? > http://bugzilla.open-bio.org/show_bug.cgi?id=2839 > > Peter -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From mmuratet at hudsonalpha.org Thu Aug 19 11:00:52 2010 From: mmuratet at hudsonalpha.org (Michael Muratet) Date: Thu, 19 Aug 2010 10:00:52 -0500 Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database? In-Reply-To: <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net> References: <201008191602.49068.xupeng86@gmail.com> <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net> Message-ID: On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote: > The file comes with Bioperl-db, not BioSQL. That is so because it > depends on BioPerl and on Bioperl-db, and so you will need to have > both installed. Is load_seqdatabase.pl still the best method? I vaguely remember a post that said that load_seqdatabase was deprecated, but I can't find it in the archives. 
Mike > > -hilmar > > On Aug 19, 2010, at 4:02 AM, xupeng wrote: > >> I've downloaded the biosql-1.0.1.tar.gz. It works well. But I >> can't find the 'load_seqdatabase.pl' when I try to import the >> Genbank files into biosql databsase. >> Can anyone give me a copy of that file? >> many thanks ! >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Michael Muratet, Ph.D. Senior Scientist HudsonAlpha Institute for Biotechnology mmuratet at hudsonalpha.org (256) 327-0473 (p) (256) 327-0966 (f) Room 4005 601 Genome Way Huntsville, Alabama 35806 From hlapp at drycafe.net Thu Aug 19 11:29:31 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 19 Aug 2010 11:29:31 -0400 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> Message-ID: <5F77404A-086D-4D0C-B3A5-F5119FCF878A@drycafe.net> On Aug 19, 2010, at 11:09 AM, Chris Fields wrote: > DBIx::Class Did I have this in the wrong order :-) More coffee, please. 
-- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From hlapp at drycafe.net Thu Aug 19 11:30:26 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 19 Aug 2010 11:30:26 -0400 Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database? In-Reply-To: References: <201008191602.49068.xupeng86@gmail.com> <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net> Message-ID: It's not deprecated. Unless I'm again mixing up something? -hilmar On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote: > > On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote: > >> The file comes with Bioperl-db, not BioSQL. That is so because it >> depends on BioPerl and on Bioperl-db, and so you will need to have >> both installed. > > Is load_seqdatabase.pl still the best method? I vaguely remember a > post that said that load_seqdatabase was deprecated, but I can't > find it in the archives. > > Mike > >> >> -hilmar >> >> On Aug 19, 2010, at 4:02 AM, xupeng wrote: >> >>> I've downloaded the biosql-1.0.1.tar.gz. It works well. But I >>> can't find the 'load_seqdatabase.pl' when I try to import the >>> Genbank files into biosql databsase. >>> Can anyone give me a copy of that file? >>> many thanks ! >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >> =========================================================== >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Michael Muratet, Ph.D. 
> Senior Scientist > HudsonAlpha Institute for Biotechnology > mmuratet at hudsonalpha.org > (256) 327-0473 (p) > (256) 327-0966 (f) > > Room 4005 > 601 Genome Way > Huntsville, Alabama 35806 > > > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From cjfields at illinois.edu Thu Aug 19 11:09:13 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Aug 2010 10:09:13 -0500 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> Message-ID: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> I think it's worth exploring having a DBIx::Class-based middle-ware approach similar to what Rob Buels has done for Chado. That would be fairly easy to get started using DBIx::Class::Schema::Loader. After that it would require optimization and tweaking, which is potentially more complex than Rob's setup as Chado is very Pg-specific, but maybe Rob can elaborate... chris On Aug 19, 2010, at 9:58 AM, Hilmar Lapp wrote: > Yes, unfortunately they do. The feature for obviating them (namely nested transactions) is there in Pg 8.2+, but Bioperl-db doesn't use them yet ... I have to learn more about Class::DBIx first to decide whether it's better to first implement nested transactions in the home-grown ORM that Bioperl-db in essence is, or whether it's better to reimplement everything in Class::DBIx instead. > > There are new datatypes in Bioperl, and relations in BioSQL that could hold them, and so I need to decide what's the way forward. 
> > -hilmar > > On Aug 19, 2010, at 6:01 AM, Peter wrote: > >> On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp wrote: >>> Hi Dan, >>> >>> the casting isn't an issue anymore, I think. (And even if it were, there is >>> actually a small script that brings back the casts that were built into >>> 8.2.) Have you found an example where it still is? >>> >>> -hilmar >> >> Hi Hilmar, >> >> Do the bioperl-db bindings for BioSQL on PostgreSQL still require those >> extra rules in the schema? >> http://bugzilla.open-bio.org/show_bug.cgi?id=2839 >> >> Peter > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 19 11:37:39 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Aug 2010 10:37:39 -0500 Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database? In-Reply-To: References: <201008191602.49068.xupeng86@gmail.com> <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net> Message-ID: <68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu> I don't recall this either. So, can't blame it on lack of coffee :) chris On Aug 19, 2010, at 10:30 AM, Hilmar Lapp wrote: > It's not deprecated. Unless I'm again mixing up something? > > -hilmar > > On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote: > >> >> On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote: >> >>> The file comes with Bioperl-db, not BioSQL. That is so because it depends on BioPerl and on Bioperl-db, and so you will need to have both installed. >> >> Is load_seqdatabase.pl still the best method? I vaguely remember a post that said that load_seqdatabase was deprecated, but I can't find it in the archives. 
>> >> Mike >> >>> >>> -hilmar >>> >>> On Aug 19, 2010, at 4:02 AM, xupeng wrote: >>> >>>> I've downloaded the biosql-1.0.1.tar.gz. It works well. But I >>>> can't find the 'load_seqdatabase.pl' when I try to import the >>>> Genbank files into biosql databsase. >>>> Can anyone give me a copy of that file? >>>> many thanks ! >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >>> =========================================================== >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Michael Muratet, Ph.D. >> Senior Scientist >> HudsonAlpha Institute for Biotechnology >> mmuratet at hudsonalpha.org >> (256) 327-0473 (p) >> (256) 327-0966 (f) >> >> Room 4005 >> 601 Genome Way >> Huntsville, Alabama 35806 >> >> >> >> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From mmuratet at hudsonalpha.org Thu Aug 19 11:40:02 2010 From: mmuratet at hudsonalpha.org (Michael Muratet) Date: Thu, 19 Aug 2010 10:40:02 -0500 Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database? 
In-Reply-To: <68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu> References: <201008191602.49068.xupeng86@gmail.com> <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net> <68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu> Message-ID: On Aug 19, 2010, at 10:37 AM, Chris Fields wrote: > I don't recall this either. So, can't blame it on lack of coffee :) Thanks. I'll keep using it! Mike > > chris > > On Aug 19, 2010, at 10:30 AM, Hilmar Lapp wrote: > >> It's not deprecated. Unless I'm again mixing up something? >> >> -hilmar >> >> On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote: >> >>> >>> On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote: >>> >>>> The file comes with Bioperl-db, not BioSQL. That is so because it >>>> depends on BioPerl and on Bioperl-db, and so you will need to >>>> have both installed. >>> >>> Is load_seqdatabase.pl still the best method? I vaguely remember a >>> post that said that load_seqdatabase was deprecated, but I can't >>> find it in the archives. >>> >>> Mike >>> >>>> >>>> -hilmar >>>> >>>> On Aug 19, 2010, at 4:02 AM, xupeng wrote: >>>> >>>>> I've downloaded the biosql-1.0.1.tar.gz. It works well. But I >>>>> can't find the 'load_seqdatabase.pl' when I try to import the >>>>> Genbank files into biosql databsase. >>>>> Can anyone give me a copy of that file? >>>>> many thanks ! >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> -- >>>> =========================================================== >>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >>>> =========================================================== >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> Michael Muratet, Ph.D. 
>>> Senior Scientist >>> HudsonAlpha Institute for Biotechnology >>> mmuratet at hudsonalpha.org >>> (256) 327-0473 (p) >>> (256) 327-0966 (f) >>> >>> Room 4005 >>> 601 Genome Way >>> Huntsville, Alabama 35806 >>> >>> >>> >>> >>> >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >> =========================================================== >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > Michael Muratet, Ph.D. Senior Scientist HudsonAlpha Institute for Biotechnology mmuratet at hudsonalpha.org (256) 327-0473 (p) (256) 327-0966 (f) Room 4005 601 Genome Way Huntsville, Alabama 35806 From cjfields at illinois.edu Thu Aug 19 11:55:54 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Aug 2010 10:55:54 -0500 Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping In-Reply-To: References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se> Message-ID: <5611499B-FA63-4A52-8279-99B554418374@illinois.edu> On Aug 17, 2010, at 8:52 AM, Dave Messina wrote: >> It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison > > Yep, agreed. > > And such a flag should be named for the non-default behavior, then, like: -ignore_IDs_for_overlaps > > Dave Probably would just be -ignore_ids as this behavior would have to be consistent across the various Bio::RangeI methods (overlaps, contains, etc). The params are case-insensitive IIRC, so the _IDs would just be lc(). RangeI doesn't define a seq_id(), though, so we either use can() in RangeI (which is dirtier IMO) or define this in the appropriate class, probably LocationI or SeqFeatureI. 
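A rough sketch of the default-versus-flag behaviour discussed in this thread, written in plain Perl rather than against Bio::RangeI. The helper name ranges_overlap and the hash layout (seq_id, start, end) are assumptions made for illustration only; they are not BioPerl API. The -ignore_ids option name is the one proposed above.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Bio::RangeI: ranges are plain hashes
# with seq_id, start and end keys (1-based, inclusive coordinates).
sub ranges_overlap {
    my ( $r1, $r2, %opt ) = @_;

    # Default (genomic) behaviour: ranges on different sequences never overlap.
    unless ( $opt{-ignore_ids} ) {
        my ( $id1, $id2 ) = ( $r1->{seq_id}, $r2->{seq_id} );
        return 0 if defined $id1 && defined $id2 && $id1 ne $id2;
    }

    # Standard interval test on inclusive coordinates.
    return ( $r1->{start} <= $r2->{end} && $r2->{start} <= $r1->{end} ) ? 1 : 0;
}

my $gene_a = { seq_id => 'chr1', start => 100, end => 200 };
my $gene_b = { seq_id => 'chr2', start => 150, end => 250 };

print ranges_overlap( $gene_a, $gene_b ), "\n";                      # prints 0 (IDs differ)
print ranges_overlap( $gene_a, $gene_b, -ignore_ids => 1 ), "\n";    # prints 1 (coordinates only)
```

Keeping the ID check as the default and naming the flag for the non-default case follows the preference expressed in the thread; where the real check should live (LocationI, SeqFeatureI, or elsewhere) is left open here.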
chris From cjfields at illinois.edu Thu Aug 19 11:56:11 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Aug 2010 10:56:11 -0500 Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping In-Reply-To: References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se> Message-ID: <7CF700A0-C7A0-4BD2-9757-50B693B3B614@illinois.edu> Makes sense. chris On Aug 17, 2010, at 7:45 AM, Scott Cain wrote: > Hi Dave and Chris, > > It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison and if somebody is doing the protein space comparison and not getting the the expected results, they'll probably read the docs to find out why. > > Scott > > -- > Scott Cain, Ph. D. > scott at scottcain dot net > Ontario Institute for Cancer Research > http://gmod.org/ > 216 392 3087 > > Snet from my iPhone. > > On Aug 17, 2010, at 5:06 AM, Dave Messina wrote: > >>> Good point; it's probably the context the methods are used that matters. So, maybe just a document clarification? >> >> That's always good, but it really doesn't solve the issue you're describing. >> >> I mean, who would expect to get overlaps for features on different chromosomes? >> >> To me, that's a clear violation of reasonable user expectations. You shouldn't have to read the docs about something like that. >> >> So what's the solution for these duelling use cases? I haven't thought about it much, but a first approximation might be to add a -genomic boolean flag that, when true, would do the right thing and check the ID when doing overlaps or other positional comparisons. >> >> (Maybe -genomic is too obscure. Maybe it should be -same_id_for_overlaps or something like that.) >> >> And maybe having to know to set a flag is effectively the same thing as having to read the docs to understand SeqFeature's overlap behavior. 
>> >> What do the rest of you out there think? >> >> >> Dave >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Thu Aug 19 12:54:23 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 19 Aug 2010 18:54:23 +0200 Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping References: <83299B71-0F73-440D-A9C5-DC1DA2AFF605@davemessina.com> Message-ID: <1EFB951F-AEE1-4B2A-9E29-114E40B25D21@sbc.su.se> [Ccing list for real this time] On Aug 19, 2010, at 17:55, Chris Fields wrote: > Probably would just be -ignore_ids You're right, that's the way to go. > define this in the appropriate class, probably LocationI or Yep, that's cleaner. Thanks! Dave From cjfields1 at gmail.com Thu Aug 19 13:20:32 2010 From: cjfields1 at gmail.com (Christopher Fields) Date: Thu, 19 Aug 2010 12:20:32 -0500 Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl 5.12.1? In-Reply-To: References: <78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com> Message-ID: <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com> cc'ing list. Looks like the BioPerl PPM is possibly broken for perl 5.12. Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling... chris On Aug 19, 2010, at 11:29 AM, han sun wrote: > v5.10 works,thanks. > > 2010/8/19 Christopher Fields > Try using ActivePerl 5.10 instead of v5.12. It's very possible the PPM won't work for v5.12 yet. > > chris > > On Aug 19, 2010, at 9:25 AM, han sun wrote: > > > Hello everyone, > > > > I have used perl for several months,and I now want to feel the power of > > bioperl. > > But it seems that the installing is more difficult than I thought. 
> > > > I typed the commands. > > > > > > > > install-shell > > > > > > rep add bioperl http://bioperl.org/DIST > > > > > > rep add uwinnipeg > > http://cpan.uwinnipeg.ca/PPMPackages/12xx/ > > > > > > rep add trouchelle http://trouchelle.com/ppm12/ > > > > install BioPerl > > > > However,the installing failed, > > > > ppm install failed: > > Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core > > Can't find any package that provides PostScript::TextBlock for > > Bundle-BioPerl-Core > > Can't find any package that provides Ace:: for Bundle-BioPerl-Core > > Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core > > Can't find any package that provides Convert::Binary::C for > > Bundle-BioPerl-Core > > Can't find any package that provides XML::Twig for Bundle-BioPerl-Core > > Can't find any package that provides DB_File:: for Bundle-BioPerl-Core > > Can't find any package that provides IPC::Run for GraphViz > > Can't find any package that provides XML-XPathEngine for XML-DOM-XPath > > Can't find any package that provides List-MoreUtils for Moose > > Can't find any package that provides List-MoreUtils for Class-MOP > > > > > > then I tried > > > > install http://www.bribes.org/perl/ppm/GD.ppd > > > > and tried the installation again,but it still didn't help. 
> > > > * > > * > > * > > * > > * > > * > > > > > > *Do you konw what's wrong with the problem?* > > * > > * > > * > > * > > *Please help me,thanks very much.* > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rmb32 at cornell.edu Thu Aug 19 13:09:45 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 19 Aug 2010 10:09:45 -0700 Subject: [Bioperl-l] reminder: Aug 25 deadline for GMOD Hackathon application Message-ID: <4C6D6559.3080809@cornell.edu> Hi all, This is your one-week reminder: the deadline for open applications to the GMOD Evo hackathon is Wednesday, August 25th. Rob ======================================== We are seeking participants for the GMOD Tools for Evolutionary Biology Hackathon, held November 8-12, 2010 at the US National Evolutionary Synthesis Center (NESCent) in Durham, NC. This hackathon targets three critical gaps in the capabilities of the GMOD toolbox that currently limit its utility for evolutionary research: 1. Visualization of comparative genomics data 2. Visualization of phylogenetic data and trees 3. Support for population diversity and phenotype data If you are interested in these areas and have relevant expertise, you are strongly encouraged to apply. Relevant areas of expertise include more than just software development: if you are a GMOD power user, visualization guru, domain expert (comparative, phylogenetics, population, ...), or documentation wizard, then your skills are needed! How To Apply: Fill out the online application form at http://bit.ly/gmodevohack. Applications are due August 25. About GMOD: GMOD is an intercompatible suite of open-source software components for storing, managing, analyzing, and visualizing genome-scale data. 
GMOD includes many widely-used software components: GBrowse and JBrowse, both genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a generic and modular database schema; CMap, a comparative map viewer; as well as many other components including Apollo, MAKER, BioMart, InterMine, and Galaxy. We hope to extend the functionality of existing GMOD components, and integrate new components as well. About Hackathons: A hackathon is an intense event at which a group of programmers with different backgrounds and skills collaborate hands-on and face-to-face to develop working code that is of utility to the community as a whole. The mix of people will include domain experts and computer-savvy end-users. More details about the event, its motivation, organization, procedures, and attendees, as well as URLs to the hackathon and related websites are included below. Sincerely, The GMOD EvoHack Organizing Committee (and project affiliations as relevant): Nicole Washington, Chair (LBNL, modENCODE, Phenote) Robert Buels (SGN, Chado NatDiv) Scott Cain (OICR, GMOD) Dave Clements (NESCent, GMOD) Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv) Sheldon McKay (University of Arizona, iPlant, GBrowse_syn) ----------------------------- About the GMOD Evo Hackathon Overview We are organizing a hackathon to fill critical gaps in the capabilities of the Generic Model Organism Database (GMOD) toolbox that currently limit its utility for evolutionary research. Specifically, we will focus on tools for 1) viewing comparative genomics data; 2) visualizing phylogenomic data; and 3) supporting population diversity data and phenotype annotation. The event will be hosted at NESCent and bring together a group of about 20+ software developers, end-user representatives, and documentation experts who would otherwise not meet. 
The participants will include key developers of GMOD components that currently lack features critical for emerging evolutionary biology research, developers of informatics tools in evolutionary research that lack GMOD integration, and informatics-savvy biologists who can represent end-user requirements. The event will provide a unique opportunity to infuse the GMOD developer community with a heightened awareness of unmet needs in evolutionary biology that GMOD components have the potential to fill, and for tool developers in evolutionary biology to better understand how best to extend or integrate with already existing GMOD components. Before the Event Discussion of ideas and sometimes even design actually starts well before the hackathon, on mailing lists, wiki pages, and conference calls set up among accepted attendees. This advance work lays the foundation for participants to be productive from the very first day. This also means that participants should be willing to contribute some time in advance of the hackathon itself to participate in this preparatory discussion. During the Event Typically, hackathon participants use the morning of the first day of the event to organize themselves into working groups of between 3 and 6 people, each with a focused implementation objective. Ideas and objectives are discussed, and attendees coalesce around the projects in which they have the most experience or interest. Deliverables / Event Results The meeting's attendance, working groups, and outcomes will be fully logged and documented on the GMOD wiki (http://gmod.org). Each working group during the event will typically have its own wiki page, linked from the main EvoHack page, where it documents its minutes and design notes, and provides links to the code and documentation it produces. 
Also, since GMOD and NESCent are both committed to open source principles, all code and documentation produced by participants during the hackathon must be published under an OSI-approved open source license. As contributions to existing GMOD tools, all hackathon products will most likely satisfy this requirement automatically. NESCent This event is sponsored by the US National Evolutionary Synthesis Center (NESCent, http://www.nescent.org) through its Informatics Whitepapers program (http://www.nescent.org/informatics/whitepapers.php). NESCent promotes the synthesis of information, concepts and knowledge to address significant, emerging, or novel questions in evolutionary science and its applications. NESCent achieves this by supporting research and education across disciplinary, institutional, geographic, and demographic boundaries (see http://www.nescent.org/science/proposals.php). Links Main GMOD EvoHack page, and full proposal: http://gmod.org/wiki/GMOD_Evo_Hackathon NESCent: http://www.nescent.org/ GMOD: http://gmod.org Similar past NESCent events, see: http://hackathon.nescent.org/ GMOD hackathon application: http://bit.ly/gmodevohack -- http://gmod.org/wiki/GMOD_News http://gmod.org/wiki/GMOD_Europe_2010 http://gmod.org/wiki/Help_Desk_Feedback From David.Messina at sbc.su.se Thu Aug 19 14:55:50 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 19 Aug 2010 20:55:50 +0200 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq question In-Reply-To: <4C6D7123.9080908@bcm.tmc.edu> References: <4C6C3259.4060304@bcm.tmc.edu> <4C6D7123.9080908@bcm.tmc.edu> Message-ID: <4E977318-05AC-4D8E-9A39-8C07A2419198@sbc.su.se> Glad I could help, Caleb. Dave On Aug 19, 2010, at 20:00, Caleb Davis wrote: > Hi Dave, > > Thank you so much for your detailed response! Fixing the reward parameter replicated the online result for me. All of the other factors you brought up will help me track down any future problems. Thanks again. 
> > --Caleb > From rmb32 at cornell.edu Thu Aug 19 18:19:11 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 19 Aug 2010 15:19:11 -0700 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> Message-ID: <4C6DADDF.1000103@cornell.edu> Chris Fields wrote: > I think it's worth exploring having a DBIx::Class-based middle-ware approach similar to what Rob Buels has done for Chado. That would be fairly easy to get started using DBIx::Class::Schema::Loader. > > After that it would require optimization and tweaking, which is potentially more complex than Rob's setup as Chado is very Pg-specific, but maybe Rob can elaborate... Elaborating on how Bio::Chado::Schema is developed: The vast majority of the code and POD in BCS is autogenerated by DBIx::Class::Schema::Loader. DBICSL gives you a baseline set of DBIx::Class classes that covers all the tables, views, columns, unique constraints, and foreign key relationships. Beyond that, you have to add on yourself. In BCS, we have mostly done things like: * make better-named aliases for some of the autogenerated relationships (though DBICSL does a surprisingly good job of naming relationships automatically most of the time) * add a tiny bit of bioperl compatibility (this needs a lot more work by somebody, volunteers needed!) * add convenience methods for using some of the Chado property tables * use DBIx::Class::Tree::NestedSet to add some powerful ways of traversing phylogenetic tree relationships Regarding DB backend specificity, BCS isn't Pg-specific at all, because DBIx::Class itself goes to great lengths to be compatible (and performant!) with just about every relational database out there. 
In fact, the BCS test suite deploys a Chado schema into a temporary SQLite database using DBIC::Schema's deploy() method, and runs all of its tests on that. Very handy. Chado's Pg-specific server-side functions can of course be called through BCS if they are present, but it's perfectly possible to use Chado without any of the server-side functions, and mostly the way I use it. Rob From David.Messina at sbc.su.se Fri Aug 20 05:19:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 20 Aug 2010 11:19:14 +0200 Subject: [Bioperl-l] Git for the lazy Message-ID: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se> Hi everyone, If you're like me and still getting up to speed with Git, you might find this helpful: http://www.spheredev.org/wiki/Git_for_the_lazy Dave From bgs500 at york.ac.uk Fri Aug 20 09:07:50 2010 From: bgs500 at york.ac.uk (Ben Saville) Date: Fri, 20 Aug 2010 14:07:50 +0100 Subject: [Bioperl-l] Problem Parsing BLAST output Message-ID: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk> Hi Everyone, I'm very much new to the world of sequence data analysis (and this mailing list!), and have reached a roadblock. I have BLASTed some contigs against a series of databases that I created. From this I would like to parse through the data and separate it before extracting the information of interest at a later point. I would like to separate the data by query ID. 
I found the following Bioperl script;

    #!/usr/bin/perl

    use Bio::Search::Result::BlastResult;
    use Bio::SearchIO;

    my $report = Bio::SearchIO->new( -file=>'All_BCM_results.bls', -format => blast);
    my $result = $report->next_result;
    my %hits_by_query;
    while (my $hit = $result->next_hit) {
      push @{$hits_by_query{$hit->name}}, $hit;
    }

    foreach my $qid ( keys %hits_by_query ) {
      my $result = Bio::Search::Result::BlastResult->new();
      $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
      my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' );
      $blio->write_result($result);
    }

running this script resulted in the following error;

    BlastResult::new(): Not adding iterations.

    ------------- EXCEPTION: Bio::Root::NoSuchThing -------------
    MSG: No such iteration number: 0. Valid range=1-0
    VALUE: The number zero (0)
    STACK: Error::throw
    STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.8/Bio/Root/Root.pm:368
    STACK: Bio::Search::Result::BlastResult::iteration /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:328
    STACK: Bio::Search::Result::BlastResult::add_hit /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:258
    STACK: /Users/bsaville/Desktop/Parsing_BLAST_by_query.pl:15
    -------------------------------------------------------------

So I added

    my $result = Bio::Search::Result::BlastResult->new(1);

The 1 to the line shown above, as it told me this was within the valid range. This produced the following error;

    ------------- EXCEPTION: Bio::Root::Exception -------------
    MSG: Must define arrayref of Iterations when initializing a Bio::Search::Result::BlastResult
    STACK: Error::throw
    STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.8/Bio/Root/Root.pm:368
    STACK: Bio::Search::Result::BlastResult::new /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:128
    STACK: /Users/bsaville/Desktop/Parsing_BLAST_by_query.pl:14
    -----------------------------------------------------------

I know that it is my inexperience that is causing this problem, but I really can't figure this out.
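For reference, a minimal sketch of one way to group hits by query with Bio::SearchIO. It relies on each call to next_result returning the result for one query, so looping over next_result already separates the report by query; the script above calls next_result only once and then keys the hash on the hit name rather than the query name. The sub name hits_by_query, the tab-separated output, and the ".hits.tsv" file names are assumptions for illustration, not an established BioPerl recipe, and the sketch does not try to re-emit BLAST-formatted reports.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect [hit name, significance] pairs per query name.
# $report is expected to behave like a Bio::SearchIO parser:
# next_result() yields one result object per query in the report.
sub hits_by_query {
    my ($report) = @_;
    my %hits;
    while ( my $result = $report->next_result ) {
        my $query = $result->query_name;
        while ( my $hit = $result->next_hit ) {
            my $sig = $hit->significance;
            push @{ $hits{$query} }, [ $hit->name, defined $sig ? $sig : '' ];
        }
    }
    return \%hits;
}

# Only runs when a BLAST report is given on the command line.
if (@ARGV) {
    require Bio::SearchIO;
    my $report = Bio::SearchIO->new( -file => $ARGV[0], -format => 'blast' );
    my $hits   = hits_by_query($report);

    for my $query ( sort keys %$hits ) {
        # Hypothetical naming scheme: one tab-separated file per query.
        ( my $fname = $query ) =~ s/[^\w.\-]/_/g;
        open my $out, '>', "$fname.hits.tsv"
            or die "Cannot write $fname.hits.tsv: $!";
        print {$out} join( "\t", @$_ ), "\n" for @{ $hits->{$query} };
        close $out;
    }
}
```

Run as "perl split_hits.pl All_BCM_results.bls" (the script name is made up). Queries without any hits produce no file in this sketch; HSP-level details could be added inside the inner loop via $hit->next_hsp.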
Regards
Ben Saville

From David.Messina at sbc.su.se Fri Aug 20 09:48:28 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 20 Aug 2010 15:48:28 +0200
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
Message-ID: <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>

Hi Ben,

I would not use the script you posted; I don't think it does what you want.

If you haven't already, you should take a look at

the beginners' HOWTO
http://www.bioperl.org/wiki/HOWTO:Beginners

the SearchIO HOWTO
http://www.bioperl.org/wiki/HOWTO:SearchIO

and the example scripts included with BioPerl:
http://www.bioperl.org/wiki/Scripts

Incidentally, it's a lot of fiddly data processing to parse blast reports for many contigs against multiple databases and then go back and collate the results by query. I'm not sure exactly what you want to do once you've separated by query; if you provide some more information, we could suggest ways to best get you where you want to go.

I will mention, though, that BLAST has the ability to search multiple separate databases in one go and collate the results for you. So that's something to consider.

Dave

From bernd.web at gmail.com Fri Aug 20 11:17:05 2010
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 20 Aug 2010 17:17:05 +0200
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
In-Reply-To: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
References: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
Message-ID:

Hi Yin,

I am not quite sure if the following is also related to your gapped length issue but I found I had to adapt the calculation of ungapped_len in Bio::LocatableSeq. If my slices did not contain any letters or a new gap char I used, SimpleAlign could not find the sequences when outputting the alignment.
This was due to a difference in length calculation: SimpleAlign: uses \W: $slice_seq =~ s/\W//g; Bio::LocatableSeq::ungapped_len uses "$string =~ s/[\.\-]+//g;" I had to include '~' (for my local sequences) in the ungapped_len; otherwise i would run into the end issues with SimpleAlign. Kind regards, Bernd On Fri, Aug 13, 2010 at 3:36 PM, Jun Yin wrote: > Hi, all, > > > > I am the google summer of code student working on Bio::Align subsystem > refactoring. The code (Bio::SimpleAlign) I re-implemented now has passed > nearly all the test, except a few tests on seq/start-end testing. But here > comes a problem. This may be an old issue, that the Bio::LocatableSeq end > assignment and checking are inconsistent. > > > > The current end checking method is based on: > > $end=$seq->_ungapped_len+$seq->start-1 > > However, this checking may not fit the real world case. > > > > The inconsistency usually happens when a few columns of the sequence are > removed. > > > > For example: > > my $a = Bio::LocatableSeq->new( > > ? ?-id ? ?=> 'a', > > ? ?-strand => 1, > > ? ?-seq ? => '-tcgatc-atcgatcg', > > ? ?-start => 30, > > ? ?-end ? => 43 > > ); > > > > If we remove the 1st, 8th and the last columns > > > > $a->seq() will be 'tcgatcatcgatc' > > $a->_ungapped_len==12 > > > > Actually, in the real world, the first residue will still be 30 (the old > $seq->start), and the last residue is the residue before the 43 (the old > $seq->end), thus 42. > > > > But if you call a validation, the calculation is > $a->_ungapped_len+$a->start-1=12+30-1=41 > > So the reassignment of the $seq->end will not pass the validation. > > > > So unless you save the information to a new sequence object, the original > position information will be lost anyway. But in some cases, we have to > change the sequence in its original sequence object .. > > > > What is your suggestion on this issue? > > A. pass the test and lose the information ? ? 
?#convenient in coding but the > start-end annotation is not right any more > > B. keep the information and forget the test ? #the object will still > remember where the last residue was in the original sequence. But is it > really meaningful at all? Because all the other residues may come from > nowhere > > C. Neither of above #any other suggestions? > > > > Cheers, > > Jun Yin > > Ph.D. student in U.C.D. > > > > Bioinformatics Laboratory > > Conway Institute > > University College Dublin > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From sidd.basu at gmail.com Fri Aug 20 11:59:59 2010 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Fri, 20 Aug 2010 10:59:59 -0500 Subject: [Bioperl-l] Re: bioperl-db and postgres8.3 - status query In-Reply-To: <4C6DADDF.1000103@cornell.edu> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> <4C6DADDF.1000103@cornell.edu> Message-ID: <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu> Hi, On Thu, 19 Aug 2010, Robert Buels wrote: > Chris Fields wrote: > > I think it's worth exploring having a DBIx::Class-based middle-ware > > approach similar to what Rob Buels has done for Chado. That would be > > fairly easy to get started using DBIx::Class::Schema::Loader. > > After that it would require optimization and tweaking, which is > > potentially more complex than Rob's setup as Chado is very Pg-specific, > > but maybe Rob can elaborate... > > Elaborating on how Bio::Chado::Schema is developed: > > The vast majority of the code and POD in BCS is autogenerated by > DBIx::Class::Schema::Loader. 
DBICSL gives you a baseline set of > DBIx::Class classes that covers all the tables, views, columns, unique > constraints, and foreign key relationships. > > Beyond that, you have to add on yourself. In BCS, we have mostly done > things like: > > * make better-named aliases for some of the autogenerated > relationships (though DBICSL does a surprisingly good job of naming > relationships automatically most of the time) > * add a tiny bit of bioperl compatibility (this needs a lot more work > by somebody, volunteers needed!) > * add convenience methods for using some of the Chado property tables > * use DBIx::Class::Tree::NestedSet to add some powerful ways of > traversing phylogenetic tree relationships > > Regarding DB backend specificity, BCS isn't Pg-specific at all, because > DBIx::Class itself goes to great lengths to be compatible (and performant!) > with just about every relational database out there. I would vouch for that at least as far as chado in oracle is concerned. So, far BCS works out flawlessly with our oracle chado instance at dictybase. Quite a chunk of BCS based code is also active in couple of our Mojo based webapps. The part which i still couldn't use directly is the 'synonym' table as it clashes with oracle specific reserved keywords. However, overall it seems to quite cross-RDMS compatible and highly recommended. -siddhartha >In fact, the BCS test > suite deploys a Chado schema into a temporary SQLite database using > DBIC::Schema's deploy() method, and runs all of its tests on that. Very > handy. > > Chado's Pg-specific server-side functions can of course be called through > BCS if they are present, but it's perfectly possible to use Chado without > any of the server-side functions, and mostly the way I use it. 
> > Rob > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jun.yin at ucd.ie Fri Aug 20 12:17:33 2010 From: jun.yin at ucd.ie (Jun Yin) Date: Fri, 20 Aug 2010 17:17:33 +0100 Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency In-Reply-To: References: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie> Message-ID: <000b01cb4083$31f98280$95ec8780$%yin@ucd.ie> Hi, Bernd, Thx for your input. Yes, this is one of the old bugs in Bio::SimpleAlign. $aln->slice just simply $slice_seq =~ s/\W//g to calculate the ungapped length. But in $seq->_ungapped_len, this method use $string =~ s{[$GAP_SYMBOLS$FRAMESHIFT_SYMBOLS]+}{}g; Which is '\-\.=~\\\/ ' to calculate the ungapped length. To solve this problem, first, now I use $nonres = join("",$self->gap_char, $self->match_char,$self->missing_char); Which is '-\.&' to remove the non-residue chars in the alignment sequence (though if you use '=','~','\','/' will also cause problems). Secondly, I have merged slice, remove_columns and remove_gaps, using the same internal function. Thus it is easier to debug. These changes will be merged into main BioPerl branch after next version. But anyway, the confict is still there, because the non residue chars are defined as: In Bio::SimpleAlign, $aln->gap_char, $aln->missing_char, $aln->match_char In Bio::LocatableSeq $GAP_SYMBOLS = '\-\.=~'; $FRAMESHIFT_SYMBOLS = '\\\/'; so try to use '-' or '.' for your gap char at the moment, otherwise you may encounter end warnings in calculation. And, if you want to keep gap only sequences, you can call the method as: $aln2 = $aln->slice(20,30,1) The last parameter is to keep gap only sequence. Cheers, Jun Yin Ph.D.?student in U.C.D. 
Bioinformatics Laboratory Conway Institute University College Dublin -----Original Message----- From: Bernd Web [mailto:bernd.web at gmail.com] Sent: Friday, August 20, 2010 4:17 PM To: Jun Yin Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Bio::LocatableSeq end checking inconsistency Hi Yin, I am not quite sure if the following is also related to your gapped length issue but I found I had to adapt the calculation of ungapped_len in Bio::LocatableSeq. If my slices did not contain any letters or a new gap char I used, SimpleAlign could not find the sequences when outputting the alignment. This was due to a difference in length calculation: SimpleAlign: uses \W: $slice_seq =~ s/\W//g; Bio::LocatableSeq::ungapped_len uses "$string =~ s/[\.\-]+//g;" I had to include '~' (for my local sequences) in the ungapped_len; otherwise i would run into the end issues with SimpleAlign. Kind regards, Bernd On Fri, Aug 13, 2010 at 3:36 PM, Jun Yin wrote: > Hi, all, > > > > I am the google summer of code student working on Bio::Align subsystem > refactoring. The code (Bio::SimpleAlign) I re-implemented now has passed > nearly all the test, except a few tests on seq/start-end testing. But here > comes a problem. This may be an old issue, that the Bio::LocatableSeq end > assignment and checking are inconsistent. > > > > The current end checking method is based on: > > $end=$seq->_ungapped_len+$seq->start-1 > > However, this checking may not fit the real world case. > > > > The inconsistency usually happens when a few columns of the sequence are > removed. > > > > For example: > > my $a = Bio::LocatableSeq->new( > > ? ?-id ? ?=> 'a', > > ? ?-strand => 1, > > ? ?-seq ? => '-tcgatc-atcgatcg', > > ? ?-start => 30, > > ? ?-end ? 
=> 43 > > ); > > > > If we remove the 1st, 8th and the last columns > > > > $a->seq() will be 'tcgatcatcgatc' > > $a->_ungapped_len==12 > > > > Actually, in the real world, the first residue will still be 30 (the old > $seq->start), and the last residue is the residue before the 43 (the old > $seq->end), thus 42. > > > > But if you call a validation, the calculation is > $a->_ungapped_len+$a->start-1=12+30-1=41 > > So the reassignment of the $seq->end will not pass the validation. > > > > So unless you save the information to a new sequence object, the original > position information will be lost anyway. But in some cases, we have to > change the sequence in its original sequence object .. > > > > What is your suggestion on this issue? > > A. pass the test and lose the information ? ? ?#convenient in coding but the > start-end annotation is not right any more > > B. keep the information and forget the test ? #the object will still > remember where the last residue was in the original sequence. But is it > really meaningful at all? Because all the other residues may come from > nowhere > > C. Neither of above #any other suggestions? > > > > Cheers, > > Jun Yin > > Ph.D. student in U.C.D. > > > > Bioinformatics Laboratory > > Conway Institute > > University College Dublin > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > __________ Information from ESET Smart Security, version of virus signature database 5377 (20100818) __________ The message was checked by ESET Smart Security. http://www.eset.com __________ Information from ESET Smart Security, version of virus signature database 5377 (20100818) __________ The message was checked by ESET Smart Security. 
http://www.eset.com From cjfields at illinois.edu Fri Aug 20 12:23:07 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Aug 2010 11:23:07 -0500 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> <4C6DADDF.1000103@cornell.edu> <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu> Message-ID: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu> On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote: > Hi, > > On Thu, 19 Aug 2010, Robert Buels wrote: > > > Chris Fields wrote: > > > I think it's worth exploring having a DBIx::Class-based middle-ware > > > approach similar to what Rob Buels has done for Chado. That would be > > > fairly easy to get started using DBIx::Class::Schema::Loader. > > > After that it would require optimization and tweaking, which is > > > potentially more complex than Rob's setup as Chado is very Pg-specific, > > > but maybe Rob can elaborate... > > > > Elaborating on how Bio::Chado::Schema is developed: > > > > The vast majority of the code and POD in BCS is autogenerated by > > DBIx::Class::Schema::Loader. DBICSL gives you a baseline set of > > DBIx::Class classes that covers all the tables, views, columns, unique > > constraints, and foreign key relationships. > > > > Beyond that, you have to add on yourself. In BCS, we have mostly done > > things like: > > > > * make better-named aliases for some of the autogenerated > > relationships (though DBICSL does a surprisingly good job of naming > > relationships automatically most of the time) > > * add a tiny bit of bioperl compatibility (this needs a lot more work > > by somebody, volunteers needed!) 
> > * add convenience methods for using some of the Chado property tables > > * use DBIx::Class::Tree::NestedSet to add some powerful ways of > > traversing phylogenetic tree relationships > > > > Regarding DB backend specificity, BCS isn't Pg-specific at all, because > > DBIx::Class itself goes to great lengths to be compatible (and performant!) > > with just about every relational database out there. > I would vouch for that at least as far as chado in oracle is concerned. > So, far BCS works out flawlessly with our oracle chado instance at > dictybase. Quite a chunk of BCS based code is also active in couple of > our Mojo based webapps. The part which i still couldn't use directly is > the 'synonym' table as it clashes with oracle specific reserved keywords. > However, overall it seems to quite cross-RDMS compatible and highly > recommended. > > -siddhartha Just to point out, I didn't say BCS is Pg-specific, but that Chado is (that was the DBMS it was designed for). Maybe that should be amended to 'was' now :) I recall seeing a page on this somewhere on the GMOD website along the lines of "MySQL has problems so we chose Pg", and that Chado support would focus on Pg. I'm guessing that's no longer the case? Or is only the server-side stuff Pg-specific. > >In fact, the BCS test > > suite deploys a Chado schema into a temporary SQLite database using > > DBIC::Schema's deploy() method, and runs all of its tests on that. Very > > handy. > > > > Chado's Pg-specific server-side functions can of course be called through > > BCS if they are present, but it's perfectly possible to use Chado without > > any of the server-side functions, and mostly the way I use it. > > > > Rob I think this opens up the possibility of starting a DBIx::Class-based middleware solution. Hilmar, did you want to take that on? 
chris From sidd.basu at gmail.com Fri Aug 20 13:39:44 2010 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Fri, 20 Aug 2010 12:39:44 -0500 Subject: [Bioperl-l] Re: bioperl-db and postgres8.3 - status query In-Reply-To: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> <4C6DADDF.1000103@cornell.edu> <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu> <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu> Message-ID: <20100820173942.GC400@vpn-165-124-164-118.vpn.northwestern.edu> On Fri, 20 Aug 2010, Chris Fields wrote: > On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote: > > Hi, > > > > On Thu, 19 Aug 2010, Robert Buels wrote: > > > > > Chris Fields wrote: > > > > I think it's worth exploring having a DBIx::Class-based middle-ware > > > > approach similar to what Rob Buels has done for Chado. That would be > > > > fairly easy to get started using DBIx::Class::Schema::Loader. > > > > After that it would require optimization and tweaking, which is > > > > potentially more complex than Rob's setup as Chado is very Pg-specific, > > > > but maybe Rob can elaborate... > > > > > > Elaborating on how Bio::Chado::Schema is developed: > > > > > > The vast majority of the code and POD in BCS is autogenerated by > > > DBIx::Class::Schema::Loader. DBICSL gives you a baseline set of > > > DBIx::Class classes that covers all the tables, views, columns, unique > > > constraints, and foreign key relationships. > > > > > > Beyond that, you have to add on yourself. 
In BCS, we have mostly done > > > things like: > > > > > > * make better-named aliases for some of the autogenerated > > > relationships (though DBICSL does a surprisingly good job of naming > > > relationships automatically most of the time) > > > * add a tiny bit of bioperl compatibility (this needs a lot more work > > > by somebody, volunteers needed!) > > > * add convenience methods for using some of the Chado property tables > > > * use DBIx::Class::Tree::NestedSet to add some powerful ways of > > > traversing phylogenetic tree relationships > > > > > > Regarding DB backend specificity, BCS isn't Pg-specific at all, because > > > DBIx::Class itself goes to great lengths to be compatible (and performant!) > > > with just about every relational database out there. > > I would vouch for that at least as far as chado in oracle is concerned. > > So, far BCS works out flawlessly with our oracle chado instance at > > dictybase. Quite a chunk of BCS based code is also active in couple of > > our Mojo based webapps. The part which i still couldn't use directly is > > the 'synonym' table as it clashes with oracle specific reserved keywords. > > However, overall it seems to quite cross-RDMS compatible and highly > > recommended. > > > > -siddhartha > > Just to point out, I didn't say BCS is Pg-specific, but that Chado is > (that was the DBMS it was designed for). Maybe that should be amended > to 'was' now :) > > I recall seeing a page on this somewhere on the GMOD website along the > lines of "MySQL has problems so we chose Pg", and that Chado support > would focus on Pg. As far as i understand GMOD stongly recommends and the popular backend for chado is Pg. However, my point was if anybody wants to use or tryout chado schema on a different backend or have an existing setup, tools like DBIx::Class or particularly BCS makes it quite easier to do so. The code developed on top also become quite robust and portable. -siddhartha >I'm guessing that's no longer the case? 
Or is only > the server-side stuff Pg-specific. > > > >In fact, the BCS test > > > suite deploys a Chado schema into a temporary SQLite database using > > > DBIC::Schema's deploy() method, and runs all of its tests on that. Very > > > handy. > > > > > > Chado's Pg-specific server-side functions can of course be called through > > > BCS if they are present, but it's perfectly possible to use Chado without > > > any of the server-side functions, and mostly the way I use it. > > > > > > Rob > > I think this opens up the possibility of starting a DBIx::Class-based > middleware solution. Hilmar, did you want to take that on? > > chris > > From buiduyminh at gmail.com Fri Aug 20 17:29:00 2010 From: buiduyminh at gmail.com (Minh Bui) Date: Fri, 20 Aug 2010 17:29:00 -0400 Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help. Message-ID: Hi,, I am trying to load my GFF file to mysql database but I got this error when I ran the bp_seqfeature_load.pl ( bioperl 1.6.1 on MAC) [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC contains: /sw/lib/perl5 /sw/lib/perl5/darwin /System/Library/Perl/5.8.6/darwin-thread-multi-2level /System/Library/Perl/5.8.6 /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6 /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level /Network/Library/Perl/5.8.6 /Network/Library/Perl /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44) line 3. Perhaps the DBD::mysql perl module hasn't been fully installed, or perhaps the capitalisation of 'mysql' isn't right. Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge. at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212 I am using MAC OSX version 10.4.10 and MAMP? Isnt it the "/Library/Perl/5.8.6" already in @INC? What am I missing? I have been googling this error for a few hours. 
I also install Bioperl and reinstall DBD::mysql using CPAN. It still doesnt work.. Here is my $PERL5LIB: /sw/lib/perl5:/sw/lib/perl5/darwin/ I really need help on this. Thank you, From awitney at sgul.ac.uk Sat Aug 21 06:39:10 2010 From: awitney at sgul.ac.uk (Adam Witney) Date: Sat, 21 Aug 2010 11:39:10 +0100 Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help. In-Reply-To: References: Message-ID: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk> On 20 Aug 2010, at 22:29, Minh Bui wrote: > Hi,, > I am trying to load my GFF file to mysql database but I got this error > when I ran the bp_seqfeature_load.pl ( bioperl 1.6.1 on MAC) > > [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl > install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC > contains: /sw/lib/perl5 /sw/lib/perl5/darwin > /System/Library/Perl/5.8.6/darwin-thread-multi-2level > /System/Library/Perl/5.8.6 > /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6 > /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level > /Network/Library/Perl/5.8.6 /Network/Library/Perl > /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level > /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44) > line 3. > Perhaps the DBD::mysql perl module hasn't been fully installed, > or perhaps the capitalisation of 'mysql' isn't right. > Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge. > at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212 > > I am using MAC OSX version 10.4.10 and MAMP? Isnt it the > "/Library/Perl/5.8.6" already in @INC? What am I missing? > I have been googling this error for a few hours. I also install > Bioperl and reinstall DBD::mysql using CPAN. It still doesnt work.. > > Here is my $PERL5LIB: /sw/lib/perl5:/sw/lib/perl5/darwin/ Where did DBD:mysql get installed? can you verify that DBD/mysql.pm is actually in one of those directories listed above? 
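
One way to answer that question is to ask Perl directly. The sketch below is only an illustration: find_module_in_inc is a made-up helper name (not part of DBI or BioPerl), and it assumes you run it with the same perl binary that runs bp_seqfeature_load.pl. If the CPAN shell was started under a different perl (for example a Fink perl under /sw rather than the system perl), DBD::mysql may have been installed somewhere this perl never looks.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return every directory in @INC that contains
# the file for $module (e.g. 'DBD::mysql' -> 'DBD/mysql.pm').
sub find_module_in_inc {
    my ($module) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    return grep {
        my $path = File::Spec->catfile($_, @parts);
        -f $path;
    } grep { !ref $_ } @INC;    # @INC may also hold code refs; skip those
}

my $module = 'DBD::mysql';
my @found  = find_module_in_inc($module);
if (@found) {
    print "$module found in: $_\n" for @found;
}
else {
    # $^X is the path of the perl binary running this script
    print "$module is not in any \@INC directory of $^X\n";
}
```

If the module is reported as missing, comparing the output of "which perl" with the perl used by the cpan command is a reasonable next step.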
From i.hatethispart at ymail.com Sat Aug 21 10:07:28 2010 From: i.hatethispart at ymail.com (keiko) Date: Sat, 21 Aug 2010 07:07:28 -0700 (PDT) Subject: [Bioperl-l] clustalw.exe In-Reply-To: <3612399.post@talk.nabble.com> References: <3612399.post@talk.nabble.com> Message-ID: <29499435.post@talk.nabble.com> Katrin wrote: > > hello, I am a new Perl/Bioperl-User and first I must excuse me for my > really bad english, but I hope everybody will understand me. I have the > following problem: In my Perl-skript is the following system call: > $y=exec("C:\\Programme\\xampp-win32-1.5.1\\xampp\\perl\\clustalw.exe > C:\\Programme\\xampp-win32-1.5.1\\xampp\\htdocs\\gene\\clustal.fasta"); If > I call this Script with the Shell (cmd.exe) everything works correctly. > But if I call this script with PHP I get the following error message: > Error: unknown option > /C:\Programme\xampp-win32-1.5.1\xampp\htdocs\gene\clustal.fasta. I tried > also system and qx. And I tested the environment variables: I wrote a > bat-file with the definition of all environment-variables and the system > call, but this did not work, too. The same problem is in php. The > PHP-Scipt is called from html and I worked under WindowsXP with xampp. I > hope, somebody can help me. greetings Katrin > Hi. I also have a problem with this one. I want to call clustalw using php. Can I ask what you included in your bat-file and where did you download your clustal? thanks a lot! -- View this message in context: http://old.nabble.com/clustalw.exe-tp3612399p29499435.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
From jason at bioperl.org Sun Aug 22 14:29:30 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 22 Aug 2010 11:29:30 -0700
Subject: [Bioperl-l] Enquiry on Bio::DB::Taxonomy
In-Reply-To:
References:
Message-ID: <4C716C8A.3010000@bioperl.org>

Hi Amali -

This is how I'd print out the full classification by using the Tree methods (with probably a different way of initializing the $db object to your flatfiles location).

#!/usr/bin/perl -w
use strict;
use Bio::DB::Taxonomy;
use Bio::Tree::Tree;   # loaded explicitly, since Bio::Tree::Tree->new is called below

my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                -nodesfile => 'taxonomy/nodes.dmp',
                                -namesfile => 'taxonomy/names.dmp');
my $taxonid = $db->get_taxonid('Homo sapiens');
my $taxon   = $db->get_taxon(-taxonid => $taxonid);
my $tree    = Bio::Tree::Tree->new(-node => $taxon);
my @taxa    = $tree->get_nodes;
print join(",", map { $_->scientific_name } @taxa), "\n";

-jason

Amali Thrimawithana wrote, On 8/18/10 3:56 PM:
> Dear Dr Stajich,
>
> I am a Masters student at Auckland university and my research is on
> identifying yeast species present in wine by the use of 454 sequencing. In
> order to carry out this research, a pipeline is being built in which at the
> final step each representative OTU need to be classified at different
> taxonomic levels (ie: at Phylum, family, class, genus and species) by using
> the results from BLAST. To identify the sequences at each taxonomic level, I
> have been trying out the Bio::DB::Taxonomy module in bioperl. Using this
> module, I am able to get the genus and species level by splitting the
> scientific name returned by the Bio::taxon object. But unfortunately I am
> uncertain on how to get the information for the other levels of the rank. I
> have tried several commands including "my @class = $node->classification;",
> but it does not work. Hence, could you please let me know how I might be
> able to get the higher levels of taxonomy such as class and phylum using
> bioperl?
> > Look forward to hearing from you soon > > Thanking You > > Amali > From cjfields at illinois.edu Sun Aug 22 15:56:58 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 22 Aug 2010 14:56:58 -0500 Subject: [Bioperl-l] clustalw.exe In-Reply-To: <29499435.post@talk.nabble.com> References: <3612399.post@talk.nabble.com> <29499435.post@talk.nabble.com> Message-ID: On Aug 21, 2010, at 9:07 AM, keiko wrote: > Katrin wrote: >> >> hello, I am a new Perl/Bioperl-User and first I must excuse me for my >> really bad english, but I hope everybody will understand me. I have the >> following problem: In my Perl-skript is the following system call: >> $y=exec("C:\\Programme\\xampp-win32-1.5.1\\xampp\\perl\\clustalw.exe >> C:\\Programme\\xampp-win32-1.5.1\\xampp\\htdocs\\gene\\clustal.fasta"); If >> I call this Script with the Shell (cmd.exe) everything works correctly. >> But if I call this script with PHP I get the following error message: >> Error: unknown option >> /C:\Programme\xampp-win32-1.5.1\xampp\htdocs\gene\clustal.fasta. I tried >> also system and qx. And I tested the environment variables: I wrote a >> bat-file with the definition of all environment-variables and the system >> call, but this did not work, too. The same problem is in php. The >> PHP-Scipt is called from html and I worked under WindowsXP with xampp. I >> hope, somebody can help me. greetings Katrin >> > > Hi. I also have a problem with this one. I want to call clustalw using php. > Can I ask what you included in your bat-file and where did you download your > clustal? thanks a lot! Not sure, but what does this have to do with BioPerl? 
chris From jason at bioperl.org Mon Aug 23 11:56:47 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 23 Aug 2010 08:56:47 -0700 Subject: [Bioperl-l] a problem when using the Bioperl modules In-Reply-To: References: Message-ID: <4C729A3F.7080304@bioperl.org> Wei - Please ask your questions on the bioperl mailing list, I cannot answer questions directly for all requests. Your problem has been answered by me on the list before so I urge you to use the list archives as a starting point. The line lengths of the fasta file sequence aren't the same length. you need to run this bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW mv NEW ORIGINAL or with sreformat sreformat fasta ORIGINAL > NEW mv NEW ORIGINAL Guifeng Wei wrote, On 8/23/10 4:57 AM: > Dear professor Stajich, > So sorry to interrupt you. i came across a problem when i use the > Bio::DB::Fasta modules of BioPerl. The aim i want to arrive at is to > extract the subsequences accoording to the *.bed files which are the > C.elegans genomic sequnece annotation. The code i programed is in the > attached file. > The genomic sequences file contains sequences from 6 chromosomes of > C.elegans. > when i run this program in the command line, the following error > warnings was coming. > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Each line of the fasta entry must be the same length except the last. > Line above #301451 ' > ..' is 22 != 51 chars. 
> STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::DB::Fasta::calculate_offsets > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770 > STACK: Bio::DB::Fasta::index_file > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:680 > STACK: Bio::DB::Fasta::new > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:491 > STACK: bed_to_fasta.pl:14 > ----------------------------------------------------------- > indexing was interrupted, so unlinking > /home/wgf/WORM_DATA/elegans.WS190.dna.fa.index at > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053. > > and therefore i write to you in hope that you can help me solve this > problem,as well as, give me some suggestion about how to learn Bioperl > well. > thank you very very much. > yours sincerely > Wei Guifeng From jason.stajich at ucr.edu Mon Aug 23 11:58:07 2010 From: jason.stajich at ucr.edu (Jason Stajich) Date: Mon, 23 Aug 2010 08:58:07 -0700 Subject: [Bioperl-l] a problem when using the Bioperl modules In-Reply-To: References: Message-ID: <4C729A8F.1070506@ucr.edu> You haven't defined this variable $db - you need to not skip the part that initializes the Bio::DB::Fasta object that you had previous asked about. Please send all your future queries to the mailing list. Guifeng Wei wrote, On 8/23/10 8:14 AM: > Dear professor, > after that, i revised my scripts, which is that i divide the genomic > sequences into 7 single file, every file contains the sequence from a > chromosome. > however, when i try to run the scripts, the following error was coming. > Can't call method "seq" on an undefined value at bed_to_fasta.pl > line 29, line 1. 
> while(){ > chomp $_; > my @bed=split(/\s+/, $_ ); > #print length($db->seq('chrI')); > my $chr_id=$bed[0]; > my $start=$bed[1]; > my $end=$bed[2]; > my $seq_name=$bed[3]; > my $strand=$bed[5]; > my $segment = $db ->seq($chr_id,$start=>$end); > print ">",$seq_name,"_",$chr_id,":",$start=>$end; > print "$segment\n"; > } > the blue line is . > why? -- Jason E. Stajich, PhD Assistant Professor Department of Plant Pathology & Microbiology University of California Riverside, CA 92521 jason.stajich at ucr.edu office: 951.827.2363 http://lab.stajich.org/ http://twitter.com/stajichlab http://fungalgenomes.org/blog/ http://plantpathology.ucr.edu/ http://genomics.ucr.edu/ http://cepceb.ucr.edu/ From guifengwei at gmail.com Mon Aug 23 22:44:57 2010 From: guifengwei at gmail.com (Guifeng Wei) Date: Tue, 24 Aug 2010 10:44:57 +0800 Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta Message-ID: Hi, i came across a problem when i use the Bio::DB::Fasta modules of BioPerl. The aim i want to arrive at is to extract the subsequences accoording to the *.bed files which are the C.elegans genomic sequnece annotation. when i tried to run the scripts i wrote, the error message was coming, as follows: Can't call method "seq" on an undefined value at bed_to_fasta.pl line 28, line 1. so, ask for favor to slove this problem. Here is my perl scripts. #!/usr/bin/perl -w # Purpose: extract sequences from genomic sequences use strict; use Bio::DB::Fasta; open(IN,$ARGV[0]) || die "sorry, the program cannot open the .bed file, plea check it. \n"; my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' ); # The dir ...../elegans190.dna/ includes 6 files:chrI,chrII,chrIII,chrIV,chrV,chrX, #each stands for the sequences from the coressponding chromosome. 
while(<IN>){
chomp $_;
my @bed=split(/\s+/, $_ );

my $chr_id=$bed[0];
my $start=$bed[1];
my $end=$bed[2];
my $seq_name=$bed[3];
my $strand=$bed[5];

my $segment = $db->seq( $chr_id, $start=>$end );

print ">",$seq_name,"_",$chr_id,":",$start=>$end;
print "$segment\n";

}

close(IN);

From florent.angly at gmail.com Tue Aug 24 01:06:21 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 24 Aug 2010 15:06:21 +1000
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To:
References:
Message-ID: <4C73534D.6080607@gmail.com>

Hi Guifeng,

From the Bio::DB::Fasta documentation:
> $db = Bio::DB::Fasta->new($fasta_path [,%options])
> Create a new Bio::DB::Fasta object from the Fasta file or files
> indicated by $fasta_path. Indexing will be performed automatically
> if needed. If successful, new() will return the database accessor
> object. Otherwise it will return undef.

Hence, after you create the database object $db, you should check that it was successful, e.g.:
> my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' );
> if (not defined $db) {
>     die "There was a problem creating the database\n";
> }

A problem creating the database would explain the message you get.

If the extension of the FASTA files in the directory path that you gave as input is not fa, fasta, fast, FA, FASTA, FAST or dna, then you should use the -glob option when constructing your database object. From the documentation:
> -glob    Glob expression to use      *.{fa,fasta,fast,FA,FASTA,FAST,dna}
>          for searching for Fasta
>          files in directories.

Florent

On 24/08/10 12:44, Guifeng Wei wrote:
> Hi,
>
> i came across a problem when i use the Bio::DB::Fasta modules of
> BioPerl. The aim i want to arrive at is to extract the subsequences
> according to the *.bed files which are the C.elegans genomic sequence
> annotation.
> > when i tried to run the scripts i wrote, the error message was coming, as > follows: > > Can't call method "seq" on an undefined value at bed_to_fasta.pl line 28, > line 1. > > so, ask for favor to slove this problem. > Here is my perl scripts. > > #!/usr/bin/perl -w > # Purpose: extract sequences from genomic sequences > use strict; > use Bio::DB::Fasta; > open(IN,$ARGV[0]) || die "sorry, the program cannot open the .bed file, plea > check it. \n"; > my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' ); > # The dir ...../elegans190.dna/ includes 6 > files:chrI,chrII,chrIII,chrIV,chrV,chrX, > #each stands for the sequences from the coressponding chromosome. > > while(){ > chomp $_; > my @bed=split(/\s+/, $_ ); > > my $chr_id=$bed[0]; > my $start=$bed[1]; > my $end=$bed[2]; > my $seq_name=$bed[3]; > my $strand=$bed[5]; > > my $segment = $db->seq( $chr_id, $start=>$end ); > > print ">",$seq_name,"_",$chr_id,":",$start=>$end; > print "$segment\n"; > > } > > close(IN); > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From guifengwei at gmail.com Tue Aug 24 07:28:16 2010 From: guifengwei at gmail.com (Guifeng Wei) Date: Tue, 24 Aug 2010 19:28:16 +0800 Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta In-Reply-To: References: Message-ID: Hi, i have revised my scripts according to the previous email from Florent. However, there were still some errors which frustrated me so much. The errors are as follows: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Each line of the fasta entry must be the same length except the last. Line above #301451 ' ..' is 22 != 51 chars. 
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::DB::Fasta::calculate_offsets /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
STACK: Bio::DB::Fasta::index_dir /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
STACK: Bio::DB::Fasta::new /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
STACK: bed2fasta.pl:13
-----------------------------------------------------------
indexing was interrupted, so unlinking /home/wgf/elegans190.dna//directory.index at /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053

But in the directory /home/wgf/elegans190.dna/ , it includes 6 files, each contains the complete sequences from one single chromosome, the format is fasta. The extension of the FASTA files is .fa. Every single file starts with ">chromosoemeXXX" followed by the thousands of sequences.

and therefore, it warns me that "Each line of the fasta entry must be the same length except the last". and "indexing was interrupted, so unlinking /home/wgf/elegans190.dna//directory".

i was much confused about this. so for help.

Wei Guifeng

From biopython at maubp.freeserve.co.uk Tue Aug 24 09:28:33 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 24 Aug 2010 14:28:33 +0100
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To:
References:
Message-ID:

On Tue, Aug 24, 2010 at 12:28 PM, Guifeng Wei wrote:
> Hi,
>
> i have revised my scripts according to the previous email from Florent.
> However, there were still some errors which frustrated me so much.
>
> The errors are as follows:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
>    Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::DB::Fasta::calculate_offsets > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770 > STACK: Bio::DB::Fasta::index_dir > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593 > STACK: Bio::DB::Fasta::new > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488 > STACK: bed2fasta.pl:13 > ----------------------------------------------------------- > indexing was interrupted, so unlinking > /home/wgf/elegans190.dna//directory.index at > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053 > But in the directory /home/wgf/elegans190.dna/ , it concludes 6 files, > each contains the complete sequences from one single chromosome, the format > is fasta. The extension of the FASTA files is .fa. Every single file is > started as ">chromosoemeXXX" followed by the thousands of sequences. > > and therefore, it warn me that "Each line of the fasta entry must be the > same length except the last". and "indexing was interrupted, so unlinking > /home/wgf/elegans190.dna//directory". > > i was much confused about this. so for help. > > Wei Guifeng Hi Wei, It sounds like there is inconsistent line wrapping in your FASTA file. This is often not a problem at all, but the DB indexing system (and indeed other indexing tools like the samtools fasta index) requires all the entries have the same wrapping. e.g. This is a valid FASTA file but would not be suitable for indexing: >Test ACGTACGT ACGTACGT ACGTACGT ACGT ACGT T Ignoring the final line (special case - here length one) that uses a mixture of line lengths, 8 and 4. If you had used this it should be fine: >Test ACGTACGT ACGTACGT ACGTACGT ACGTACGT T All the lines are now wrapped at length 8 (and the final line is less than or equal to length 8). 
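
To find where the wrapping goes wrong before trying to index again, a short Perl sketch along the following lines might help. It is only an illustration of the rule described above, not the check that Bio::DB::Fasta performs internally; the subroutine name check_fasta_wrapping and the way the file name is passed on the command line are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): scan a FASTA filehandle and
# return one [id, line_number, length, expected_length] entry for every
# sequence line that breaks the "same length except the last" rule.
sub check_fasta_wrapping {
    my ($fh) = @_;
    my @problems;
    my ($id, $width, $short);    # $short = [line_number, length] of a pending short line

    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;    # strip Unix or DOS line endings
        if ($line =~ /^>(\S*)/) {
            # New record: the wrap width is taken from its first sequence line
            ($id, $width, $short) = ($1, undef, undef);
            next;
        }
        next unless defined $id;    # ignore anything before the first header
        my $len = length $line;
        if (!defined $width) {
            $width = $len;
            next;
        }
        # A short (or blank) line is only acceptable as the last line of a
        # record, so seeing another line after it makes it a problem.
        if ($short) {
            push @problems, [$id, @$short, $width];
            undef $short;
        }
        if ($len > $width) {
            push @problems, [$id, $., $len, $width];
        }
        elsif ($len < $width) {
            $short = [$., $len];
        }
    }
    return @problems;
}

# Assumed usage: perl check_wrap.pl chrI.fa   (both names are placeholders)
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    for my $p (check_fasta_wrapping($fh)) {
        printf "%s: line %d is %d chars, expected %d\n", @$p;
    }
    close $fh;
}
```

Run on the first example above, this should flag the two ACGT lines (lines 5 and 6 of the file) and nothing in the second example. A short line followed by a blank line at the end of a record is flagged as well, since the short line is then no longer the last one.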
Of course, in a real file wrapping at 60 or 80 characters is more common ;)

Peter

From cjfields at illinois.edu Tue Aug 24 09:38:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 08:38:45 -0500
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To:
References:
Message-ID: <995BCF30-99B2-46C2-A4E8-681F9E2A0BB5@illinois.edu>

Guifeng,

Did you follow Jason's advice yesterday about converting the FASTA over to a more consistent length? Or checking the database itself? These are both things reiterated by Florent and Peter.

From Jason's last response:

-------------------------
Wei -

Please ask your questions on the bioperl mailing list, I cannot answer questions directly for all requests. Your problem has been answered by me on the list before so I urge you to use the list archives as a starting point.

The line lengths of the fasta file sequence aren't the same length.

you need to run this
bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW
mv NEW ORIGINAL

or with sreformat
sreformat fasta ORIGINAL > NEW
mv NEW ORIGINAL
-------------------------

chris

On Aug 24, 2010, at 6:28 AM, Guifeng Wei wrote:
> Hi,
>
> i have revised my scripts according to the previous email from Florent.
> However, there were still some errors which frustrated me so much.
>
> The errors are as follows:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
> Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::DB::Fasta::calculate_offsets > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770 > STACK: Bio::DB::Fasta::index_dir > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593 > STACK: Bio::DB::Fasta::new > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488 > STACK: bed2fasta.pl:13 > ----------------------------------------------------------- > indexing was interrupted, so unlinking > /home/wgf/elegans190.dna//directory.index at > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053 > But in the directory /home/wgf/elegans190.dna/ , it concludes 6 files, > each contains the complete sequences from one single chromosome, the format > is fasta. The extension of the FASTA files is .fa. Every single file is > started as ">chromosoemeXXX" followed by the thousands of sequences. > > and therefore, it warn me that "Each line of the fasta entry must be the > same length except the last". and "indexing was interrupted, so unlinking > /home/wgf/elegans190.dna//directory". > > i was much confused about this. so for help. 
> > Wei Guifeng > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Tue Aug 24 11:01:47 2010 From: scott at scottcain.net (Scott Cain) Date: Tue, 24 Aug 2010 11:01:47 -0400 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> <4C6DADDF.1000103@cornell.edu> <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu> <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu> Message-ID: Hi Chris, GMOD still only supports Chado with Postgres (for example, the GFF loader assumes a Postgres database), but when I reengineered the GFF loader a few years ago, I tried to do it with subclassing the loader in mind so that it could be subclassed to work with other RDMS. Scott On Fri, Aug 20, 2010 at 12:23 PM, Chris Fields wrote: > On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote: >> Hi, >> >> On Thu, 19 Aug 2010, Robert Buels wrote: >> >> > Chris Fields wrote: >> > > I think it's worth exploring having a DBIx::Class-based middle-ware >> > > approach similar to what Rob Buels has done for Chado. ?That would be >> > > fairly easy to get started using DBIx::Class::Schema::Loader. >> > > After that it would require optimization and tweaking, which is >> > > potentially more complex than Rob's setup as Chado is very Pg-specific, >> > > but maybe Rob can elaborate... >> > >> > Elaborating on how Bio::Chado::Schema is developed: >> > >> > The vast majority of the code and POD in BCS is autogenerated by >> > DBIx::Class::Schema::Loader. 
DBICSL gives you a baseline set of
>> > DBIx::Class classes that covers all the tables, views, columns, unique
>> > constraints, and foreign key relationships.
>> >
>> > Beyond that, you have to add on yourself. In BCS, we have mostly done
>> > things like:
>> >
>> >   * make better-named aliases for some of the autogenerated
>> >     relationships (though DBICSL does a surprisingly good job of naming
>> >     relationships automatically most of the time)
>> >   * add a tiny bit of bioperl compatibility (this needs a lot more work
>> >     by somebody, volunteers needed!)
>> >   * add convenience methods for using some of the Chado property tables
>> >   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
>> >     traversing phylogenetic tree relationships
>> >
>> > Regarding DB backend specificity, BCS isn't Pg-specific at all, because
>> > DBIx::Class itself goes to great lengths to be compatible (and performant!)
>> > with just about every relational database out there.
>> I would vouch for that at least as far as chado in oracle is concerned.
>> So far, BCS works out flawlessly with our oracle chado instance at
>> dictybase. Quite a chunk of BCS based code is also active in a couple of
>> our Mojo based webapps. The part which i still couldn't use directly is
>> the 'synonym' table as it clashes with oracle specific reserved keywords.
>> However, overall it seems to be quite cross-RDMS compatible and highly
>> recommended.
>>
>> -siddhartha
>
> Just to point out, I didn't say BCS is Pg-specific, but that Chado is
> (that was the DBMS it was designed for). Maybe that should be amended
> to 'was' now :)
>
> I recall seeing a page on this somewhere on the GMOD website along the
> lines of "MySQL has problems so we chose Pg", and that Chado support
> would focus on Pg. I'm guessing that's no longer the case? Or is only
> the server-side stuff Pg-specific.
> >> >In fact, the BCS test >> > suite deploys a Chado schema into a temporary SQLite database using >> > DBIC::Schema's deploy() method, and runs all of its tests on that. ?Very >> > handy. >> > >> > Chado's Pg-specific server-side functions can of course be called through >> > BCS if they are present, but it's perfectly possible to use Chado without >> > any of the server-side functions, and mostly the way I use it. >> > >> > Rob > > I think this opens up the possibility of starting a DBIx::Class-based > middleware solution. ?Hilmar, did you want to take that on? > > chris > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain dot net GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 216-392-3087 Ontario Institute for Cancer Research From bgs500 at york.ac.uk Tue Aug 24 11:35:53 2010 From: bgs500 at york.ac.uk (Ben Saville) Date: Tue, 24 Aug 2010 16:35:53 +0100 Subject: [Bioperl-l] Problem Parsing BLAST output In-Reply-To: <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se> References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk> <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se> Message-ID: <34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk> Sorry for the Delay in replying, 454 data analysis is very time consuming. please see http://seqanswers.com/forums/showthread.php?t=6484 For a discussion about this problem, and how we solved the issue. Thanks for the reply though, much appreciated! Regards Ben Saville On 20 Aug 2010, at 14:48, Dave Messina wrote: > Hi Ben, > > I would not use the script you posted ? I don't think it does what > you want. 
> > If you haven't already, you should take a look at the beginners' HOWTO > > http://www.bioperl.org/wiki/HOWTO:Beginners > > > the SearchIO HOWTO > > http://www.bioperl.org/wiki/HOWTO:SearchIO > > > and the example scripts included with BioPerl: > > http://www.bioperl.org/wiki/Scripts > > > > Incidentally, it's a lot of fiddly data processing to parse blast > reports for many contigs against multiple databases and then go back > and collate the results by query. I'm not sure exactly what you want > to do once you've separated by query ? if you provide some more > information, we could suggest ways to best get you where you want to > go. > > I will mention, though, that BLAST has the ability to search > multiple separate databases in one go and collate the results for > you. So that's something to consider. > > > > Dave > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Aug 24 11:54:20 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 24 Aug 2010 10:54:20 -0500 Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta In-Reply-To: References: <995BCF30-99B2-46C2-A4E8-681F9E2A0BB5@illinois.edu> Message-ID: Please keep all responses on-list. Regarding sreformat: http://tinyurl.com/28q75rr Judging by the stack traces below, you are also running off a UNIX-like system. To concatenate files, use 'cat'. So, for all files ending with .fa: cat *.fa >> all.fa chris On Aug 24, 2010, at 8:54 AM, Guifeng Wei wrote: > Hello Fields, > > i have checked the fasta files. i suddenly find that the last line is blank line, and the last second is less than common. > > i am not able to run the command line as Jason's advice because i have no knowledge about "sreformat". > > i also want to ask a more question. i want megre the several single chromosome sequence file into one, OK? > > thank you very much. 
> > Wei Guifeng > 2010/8/24 Chris Fields > Guifeng, > > Did you follow Jason's advice yesterday about converting the FASTA over to a more consistent length? Or checking the database itself? These are both things reiterated by Florent and Peter. > > From Jason's last response: > > ------------------------- > Wei - > > Please ask your questions on the bioperl mailing list, I cannot answer questions directly for all requests. > Your problem has been answered by me on the list before so I urge you to use the list archives as a starting point. > > The line lengths of the fasta file sequence aren't the same length. > > you need to run this > bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW > mv NEW ORIGINAL > > or with sreformat > sreformat fasta ORIGINAL > NEW > mv NEW ORIGINAL > ------------------------- > > chris > > > On Aug 24, 2010, at 6:28 AM, Guifeng Wei wrote: > > > Hi, > > > > i have revised my scripts according to the previous email from Florent. > > However, there were still some errors which frustrated me so much. > > > > The errors are as follows: > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: Each line of the fasta entry must be the same length except the last. > > Line above #301451 ' > > ..' is 22 != 51 chars. 
> > STACK: Error::throw > > STACK: Bio::Root::Root::throw > > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 > > STACK: Bio::DB::Fasta::calculate_offsets > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770 > > STACK: Bio::DB::Fasta::index_dir > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593 > > STACK: Bio::DB::Fasta::new > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488 > > STACK: bed2fasta.pl:13 > > ----------------------------------------------------------- > > indexing was interrupted, so unlinking > > /home/wgf/elegans190.dna//directory.index at > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053 > > But in the directory /home/wgf/elegans190.dna/ , it concludes 6 files, > > each contains the complete sequences from one single chromosome, the format > > is fasta. The extension of the FASTA files is .fa. Every single file is > > started as ">chromosoemeXXX" followed by the thousands of sequences. > > > > and therefore, it warn me that "Each line of the fasta entry must be the > > same length except the last". and "indexing was interrupted, so unlinking > > /home/wgf/elegans190.dna//directory". > > > > i was much confused about this. so for help. > > > > Wei Guifeng > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- > ?????? 
Wei Guifeng
>
>
>

From cjfields at illinois.edu Tue Aug 24 12:14:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 11:14:51 -0500
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk> <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se> <34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk>
Message-ID: <69C47A74-09C7-4024-9303-A3893658A2A8@illinois.edu>

Just in case anyone needs it, there is a way to index these as well (both BLAST and the two tabular BLAST versions) for fast lookups of specific reports, if needed. See Bio::Index::Blast and Bio::Index::BlastTable in BioPerl.

Caveat: I believe there is a bug with BLAST+ text output indexing (it chops the header off subsequent reports). I haven't investigated it enough, though, but I'll try looking into it today.

chris

On Aug 24, 2010, at 10:35 AM, Ben Saville wrote:
> Sorry for the Delay in replying, 454 data analysis is very time consuming.
>
> please see http://seqanswers.com/forums/showthread.php?t=6484
> For a discussion about this problem, and how we solved the issue.
>
> Thanks for the reply though, much appreciated!
>
> Regards
> Ben Saville
>
>
>
>
>
> On 20 Aug 2010, at 14:48, Dave Messina wrote:
>
>> Hi Ben,
>>
>> I would not use the script you posted - I don't think it does what you want.
>>
>> If you haven't already, you should take a look at the beginners' HOWTO
>>
>> http://www.bioperl.org/wiki/HOWTO:Beginners
>>
>>
>> the SearchIO HOWTO
>>
>> http://www.bioperl.org/wiki/HOWTO:SearchIO
>>
>>
>> and the example scripts included with BioPerl:
>>
>> http://www.bioperl.org/wiki/Scripts
>>
>>
>>
>> Incidentally, it's a lot of fiddly data processing to parse blast reports for many contigs against multiple databases and then go back and collate the results by query. I'm not sure exactly what you want to do once you've separated by query -
if you provide some more information, we could suggest ways to best get you where you want to go. >> >> I will mention, though, that BLAST has the ability to search multiple separate databases in one go and collate the results for you. So that's something to consider. >> >> >> >> Dave >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Aug 24 12:17:17 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 24 Aug 2010 11:17:17 -0500 Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release announcement References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov> Message-ID: FYI, Very interesting additions to BLAST+ (archive format). chris Begin forwarded message: > From: mcginnis > Date: August 24, 2010 10:46:50 AM CDT > To: NLM/NCBI List blast-announce > Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement > > A new version of the stand-alone applications is available. 
> > Users are encouraged to use the BLAST+ applications available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ > > This release includes a number of bug fixes as well as new features for the BLAST+ applications: > > * Introduce BLAST Archive format to permit reformatting of stand-alone BLAST searches with the blast_formatter(see BLAST+ user manual) > * Added the blast_formatter application (see BLAST+ user manual) > * Added support for translated subject soft masking in the BLAST databases > * Added support for the BLAST Trace-back operations (btop) output format > * Added command line options to blastdbcmd for listing available BLAST databases > * Improved performance of formatting of remote BLAST searches > * Use a consistent exit code for out of memory conditions > * Fixed bug in indexed megablast with multiple space-separated BLAST databases > * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb > * Fixed Windows installer for 64-bit installations > > BLAST+ applications, as well as the legacy C applications (e.g. blastall), may be downloaded from http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download From David.Messina at sbc.su.se Tue Aug 24 13:00:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 24 Aug 2010 19:00:14 +0200 Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release announcement In-Reply-To: References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov> Message-ID: <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se> Here's a link to the manual: ftp://ftp.ncbi.nlm.nih.gov//blast/executables/blast%2B/2.2.24/user_manual.pdf (Is it on the NCBI website somewhere? Strange to have only a downloadable PDF.) The section on the new archive format is on page 27. It seems like a nice idea to have the flexibility, but I wonder about the time cost of using this format. 
One of the big gains from using tab-delimited output is that BLAST doesn't have to do all the post-processing to generate the alignment views. By doing the archive format, which if I understand it correctly is ASN.1, you're always paying the full price in time (and space, for that matter). Dave On Aug 24, 2010, at 18:17 , Chris Fields wrote: > FYI, > > Very interesting additions to BLAST+ (archive format). > > chris > > Begin forwarded message: > >> From: mcginnis >> Date: August 24, 2010 10:46:50 AM CDT >> To: NLM/NCBI List blast-announce >> Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement >> >> A new version of the stand-alone applications is available. >> >> Users are encouraged to use the BLAST+ applications available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ >> >> This release includes a number of bug fixes as well as new features for the BLAST+ applications: >> >> * Introduce BLAST Archive format to permit reformatting of stand-alone BLAST searches with the blast_formatter(see BLAST+ user manual) >> * Added the blast_formatter application (see BLAST+ user manual) >> * Added support for translated subject soft masking in the BLAST databases >> * Added support for the BLAST Trace-back operations (btop) output format >> * Added command line options to blastdbcmd for listing available BLAST databases >> * Improved performance of formatting of remote BLAST searches >> * Use a consistent exit code for out of memory conditions >> * Fixed bug in indexed megablast with multiple space-separated BLAST databases >> * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb >> * Fixed Windows installer for 64-bit installations >> >> BLAST+ applications, as well as the legacy C applications (e.g. 
blastall), may be downloaded from http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Aug 24 13:26:49 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 24 Aug 2010 12:26:49 -0500 Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release announcement In-Reply-To: <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se> References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov> <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se> Message-ID: It's probably more applicable from the viewpoint of a cluster admin who would want to add the flexibility of having a single archive and allowing any format (as opposed to re-running the analysis). I'm just wondering if there is anything to glean there for possible alignment archiving purposes (ala SAM/BAM), but if it's ASN.1, likely not. chris On Aug 24, 2010, at 12:00 PM, Dave Messina wrote: > Here's a link to the manual: > ftp://ftp.ncbi.nlm.nih.gov//blast/executables/blast%2B/2.2.24/user_manual.pdf > > (Is it on the NCBI website somewhere? Strange to have only a downloadable PDF.) The section on the new archive format is on page 27. > > It seems like a nice idea to have the flexibility, but I wonder about the time cost of using this format. > > One of the big gains from using tab-delimited output is that BLAST doesn't have to do all the post-processing to generate the alignment views. By doing the archive format, which if I understand it correctly is ASN.1, you're always paying the full price in time (and space, for that matter). > > > > Dave > > > > > On Aug 24, 2010, at 18:17 , Chris Fields wrote: > >> FYI, >> >> Very interesting additions to BLAST+ (archive format). 
>> >> chris >> >> Begin forwarded message: >> >>> From: mcginnis >>> Date: August 24, 2010 10:46:50 AM CDT >>> To: NLM/NCBI List blast-announce >>> Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement >>> >>> A new version of the stand-alone applications is available. >>> >>> Users are encouraged to use the BLAST+ applications available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ >>> >>> This release includes a number of bug fixes as well as new features for the BLAST+ applications: >>> >>> * Introduce BLAST Archive format to permit reformatting of stand-alone BLAST searches with the blast_formatter(see BLAST+ user manual) >>> * Added the blast_formatter application (see BLAST+ user manual) >>> * Added support for translated subject soft masking in the BLAST databases >>> * Added support for the BLAST Trace-back operations (btop) output format >>> * Added command line options to blastdbcmd for listing available BLAST databases >>> * Improved performance of formatting of remote BLAST searches >>> * Use a consistent exit code for out of memory conditions >>> * Fixed bug in indexed megablast with multiple space-separated BLAST databases >>> * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb >>> * Fixed Windows installer for 64-bit installations >>> >>> BLAST+ applications, as well as the legacy C applications (e.g. 
blastall), may be downloaded from http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Tue Aug 24 14:45:29 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 24 Aug 2010 20:45:29 +0200 Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release announcement In-Reply-To: References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov> <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se> Message-ID: <00C04DF9-F3C2-4574-B1E4-A3BF28EE953F@sbc.su.se> > It's probably more applicable from the viewpoint of a cluster admin who would want to add the flexibility of having a single archive and allowing any format (as opposed to re-running the analysis). Good point. > I'm just wondering if there is anything to glean there for possible alignment archiving purposes (ala SAM/BAM), but if it's ASN.1, likely not. To be honest, I didn't look that closely at it. It may be worth considering nevertheless. Dave From buiduyminh at gmail.com Tue Aug 24 14:56:43 2010 From: buiduyminh at gmail.com (Minh Bui) Date: Tue, 24 Aug 2010 14:56:43 -0400 Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help. In-Reply-To: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk> References: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk> Message-ID: How can I know where DBD:mysql PATH on my MAC? I am very new to MAC sorry. 
I just checked, and mysql.pm is in
/Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm

On 8/21/10, Adam Witney wrote:
>
> On 20 Aug 2010, at 22:29, Minh Bui wrote:
>
> > Hi,
> > I am trying to load my GFF file into a MySQL database, but I got this error
> > when I ran bp_seqfeature_load.pl (BioPerl 1.6.1 on Mac):
> >
> > [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
> > install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
> > contains: /sw/lib/perl5 /sw/lib/perl5/darwin
> > /System/Library/Perl/5.8.6/darwin-thread-multi-2level
> > /System/Library/Perl/5.8.6
> > /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
> > /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
> > /Network/Library/Perl/5.8.6 /Network/Library/Perl
> > /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
> > /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
> > line 3.
> > Perhaps the DBD::mysql perl module hasn't been fully installed,
> > or perhaps the capitalisation of 'mysql' isn't right.
> > Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
> > at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212
> >
> > I am using Mac OS X version 10.4.10 and MAMP. Isn't
> > "/Library/Perl/5.8.6" already in @INC? What am I missing?
> > I have been googling this error for a few hours. I also installed
> > BioPerl and reinstalled DBD::mysql using CPAN. It still doesn't work.
> >
> > Here is my $PERL5LIB: /sw/lib/perl5:/sw/lib/perl5/darwin/
>
> Where did DBD::mysql get installed? Can you verify that DBD/mysql.pm is
> actually in one of the directories listed above?

From scott at scottcain.net Tue Aug 24 15:04:04 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 24 Aug 2010 15:04:04 -0400
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
In-Reply-To:
References: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>
Message-ID:

Hi Minh,

The file you found is not DBD::mysql, though; it is
Bio::DB::SeqFeature::Store::DBI::mysql, which was installed along with
BioPerl. How did you find that file? The same method would presumably turn
up DBD::mysql if it existed. I would use a command like this:

  locate mysql.pm

which would locate every file named mysql.pm on your computer. I would
expect it to be in /Library/Perl/5.8.6/darwin-thread-multi-2level/DBD/ if it
was installed in a "normal" way (that is, not involving macports or fink or
MAMP).

Scott

On Tue, Aug 24, 2010 at 2:56 PM, Minh Bui wrote:
> How can I know where DBD:mysql PATH on my MAC? I am very new to MAC sorry.
>
> I just check and mysql.pm is in
> /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm
>
> On 8/21/10, Adam Witney wrote:
>>
>> On 20 Aug 2010, at 22:29, Minh Bui wrote:
>>
>> > Hi,,
>> > I am trying to load my GFF file to mysql database but I got this error
>> > when I ran the bp_seqfeature_load.pl ( bioperl 1.6.1 on MAC)
>> >
>> > [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
>> > install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
>> > contains: /sw/lib/perl5 /sw/lib/perl5/darwin
>> > /System/Library/Perl/5.8.6/darwin-thread-multi-2level
>> > /System/Library/Perl/5.8.6
>> > /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
>> > /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
>> > /Network/Library/Perl/5.8.6 /Network/Library/Perl
>> > /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
>> > /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
>> > line 3.
>> > Perhaps the DBD::mysql perl module hasn't been fully installed,
>> > or perhaps the capitalisation of 'mysql' isn't right.
>> > Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
>> > at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212
>> >
>> > I am using MAC OSX version 10.4.10 and MAMP? Isnt it the
>> > "/Library/Perl/5.8.6" already in @INC? What am I missing?
>> > I have been googling this error for a few hours. I also install
>> > Bioperl and reinstall DBD::mysql using CPAN. It still doesnt work..
>> >
>> > Here is my $PERL5LIB: /sw/lib/perl5:/sw/lib/perl5/darwin/
>>
>> Where did DBD:mysql get installed? can you verify that DBD/mysql.pm is
>> actually in one of those directories listed above?

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From jason at bioperl.org Wed Aug 25 00:33:45 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 24 Aug 2010 21:33:45 -0700
Subject: [Bioperl-l] Enquiry on gi_taxid_nucl.dmp.gz
In-Reply-To:
References:
Message-ID: <4C749D29.3040003@bioperl.org>

Hi - please keep questions on the list.

I think one of your problems is that your first use of $gi2taxidfile is
wrong. When you call tie, you want to specify a DB file in which to store the
index, so call it "/tmp/gi2taxid.idx" or something like that. In my code here
http://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/classify_hits_kingdom.PLS
you will see that on line 97 we construct the name of the index file from the
folder, plus 'idx', plus the name gi2taxid.

Also, it would be safer for the split to match on whitespace and to take only
the first two columns from the file. Doing this would eliminate the need for
the chomp on the line above.

    my ($gi, $taxid) = split(/\s+/, $_);

instead of

    chomp;
    my ($gi, $taxid) = split(" ", $_,2);

There may be other problems, but these should be fixed first -- and please
send queries to the mailing list rather than to me directly so that others
can answer questions.

-jason

Amali Thrimawithana wrote, On 8/24/10 8:13 PM:
> Dear Jason
>
> Thank you very much for the information. I manage to get the information on
> different taxonomic levels with the help of one of your example code
> "local_taxonomydb_query". However I am having trouble with creating a local
> index file of the gi_taxid_nucl.dmp so that I am able to get the taxonomic
> id given the GI number of NCBI. At the moment I am using the tie() function
> with DB_file and then storing the detail into a hash. However when I try to
> retrieve a taxonomic ID given the GI number, it is not returning any thing
> but an error. Below is part of the code (borrowed from the example code
> classify kingdom), can you please let me know where I am going wrong?
> ...
> my $dbh2 = tie(%taxid4gi, 'DB_File', $gi2taxidfile);
>
> if( ! $done ) {
>     my $fh;
>     open(GI2TAXID, "$gi2taxidfile") or die $!; #here passing the unzipped
> gi_taxid_nucl.dmp
>     my $i=0;
>     while (<GI2TAXID>) {
>         chomp;
>         my ($gi, $taxid) = split(" ", $_, 2);
>         $taxid4gi{$gi} = $taxid
>             if exists $taxid4gi{$gi};
>         $i++;
>         unless( $DEBUG && $i % 100000 ) {
>             warn "$i\n";
>         }
>     }
>     $dbh2->sync;
> }
> my $gi2='183397240';
> my $taxd2=$taxid4gi{$gi2};
> print $taxd2, " \n";
>
> Any help would be much appreciated
>
> Thanking you
> Amali
>
> On 23 August 2010 06:29, Jason Stajich wrote:
>
>> Hi Amali -
>>
>> This is how I'd print out the full classification by using the Tree methods
>> (with probably a different way of initializing the $db object to your
>> flatfiles location).
>> >> #!/usr/bin/perl -w >> use strict; >> use Bio::DB::Taxonomy; >> >> my $db= Bio::DB::Taxonomy->new(-source => 'flatfile', >> -nodesfile => 'taxonomy/nodes.dmp', >> -namesfile => 'taxonomy/names.dmp'); >> >> my $taxonid = $db->get_taxonid('Homo sapiens'); >> my $taxon = $db->get_taxon(-taxonid => $taxonid); >> my $tree = Bio::Tree::Tree->new(-node => $taxon); >> my @taxa = $tree->get_nodes; >> print join(",", map { $_->scientific_name } @taxa), "\n"; >> >> -jason >> >> Amali Thrimawithana wrote, On 8/18/10 3:56 PM: >> >> Dear Dr Stajich, >> >>> I am a Masters student at Auckland university and my research is on >>> identifying yeast species present in wine by the use of 454 sequencing. In >>> order to carry out this research, a pipeline is being built in which at >>> the >>> final step each representative OTU need to be classified at different >>> taxonomic levels (ie: at Phylum, family, class, genus and species) by >>> using >>> the results from BLAST. To identify the sequences at each taxonomic level, >>> I >>> have been trying out the Bio::DB::Taxonomy module in bioperl. Using this >>> module, I am able to get the genus and species level by splitting the >>> scientific name returned by the Bio::taxon object. But unfortunately I am >>> uncertain on how to get the information for the other levels of the rank. >>> I >>> have tried several commands including "my @class = >>> $node->classification;", >>> but it does not work. Hence, could you please let me know how I might be >>> able to get the higher levels of taxonomy such as class and phylum using >>> bioperl? 
>>>
>>> Look forward to hearing from you soon
>>>
>>> Thanking You
>>>
>>> Amali

From roy.chaudhuri at gmail.com Wed Aug 25 07:12:15 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 25 Aug 2010 12:12:15 +0100
Subject: [Bioperl-l] Enquiry on gi_taxid_nucl.dmp.gz
In-Reply-To: <4C749D29.3040003@bioperl.org>
References: <4C749D29.3040003@bioperl.org>
Message-ID: <4C74FA8F.3080506@gmail.com>

> Also it would be safer for the split to be whitespace matching and that
> you want the the two first columns from the file. Doing this would
> eliminate the need for the chomp on the line above.
>
> my ($gi, $taxid) = split(/\s+/, $_);
>
> instead of
>
> chomp;
> my ($gi, $taxid) = split(" ", $_,2);

Sorry to be pedantic, but according to perldoc -f split:

"As a special case, specifying a PATTERN of space (' ') will split on
white space just as "split" with no arguments does"

The only difference between patterns of " " and /\s+/ is that the latter
will return an initial null field if there is leading white space, which
may or may not be what you want.

$ perl -e 'print join("-", split(" ", " 1\t2 3")), "\n"'
1-2-3
$ perl -e 'print join("-", split(/\s+/, " 1\t2 3")), "\n"'
-1-2-3

Cheers.
Roy.

From kanmaninradha at gmail.com Thu Aug 26 04:29:08 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 01:29:08 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
Message-ID:

Hi All,
I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
module. I could get everything else but not the DNA seq.

Can anyone help me to find this out, Please. I appreciate your help very much.
thanks,
Kanmani

#!/usr/bin/perl

use strict;
use warnings;
use Bio::Tools::GFF;

my $file = shift;

my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
$gffio->features_attached_to_seqs(1);

while (my $feat = $gffio->next_feature()){
    my $start = $feat->start;
    my $end= $feat->end;
    my $size = $end-$start+1;
    my $strand = $feat->strand;
    my $seqid = $feat->seq_id;
    my $score = $feat->score;
    my $frame = $feat->frame;
    my $source = $feat->source_tag;
    my $type = $feat->primary_tag;
    my $gffstr = $gffio->gff_string($feat);
    my @alltags = $feat->all_tags();
    my @ID_tag_value = $feat->each_tag_value("ID");

    my $seq = $feat->seq();
    print "$seq\n";

    if($type eq "gene"){ #
        print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
    }
}

From David.Messina at sbc.su.se Thu Aug 26 04:53:48 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 26 Aug 2010 10:53:48 +0200
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To:
References:
Message-ID: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>

Admittedly I'm not up on the latest uses of GFF, but as far as I know, GFF is
an annotation format only; it does not contain the actual sequence.

Have you looked in your GFF file to see if there are nucleotides in there?

Dave

On Aug 26, 2010, at 10:29, kanmani radha wrote:
> Hi All,
> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
> module. I could get everything else but not the DNA seq.

From biopython at maubp.freeserve.co.uk Thu Aug 26 05:02:53 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Aug 2010 10:02:53 +0100
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
References: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
Message-ID:

On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina wrote:
>
> Admittedly i'm not up on the latest uses of GFF, but as far as I know, GFF
> is an annotation format only;
it does not contain the actual sequence. > > Have you looked in your GFF file to see if there are nucleotides in there? > > Dave Actually a GFF file can optionally include a FASTA format sequence at the end of the file, although it seems to be more common to just supply separate GFF and FASTA files and cross reference by ID. Peter From David.Messina at sbc.su.se Thu Aug 26 05:08:20 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 26 Aug 2010 11:08:20 +0200 Subject: [Bioperl-l] getting DNA sequence for exon features from GFF In-Reply-To: References: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se> Message-ID: Aha, great, thanks for clarifying, Peter. And if I bothered to look at the Bio::Tools::GFF documentation before answering :), I would have seen this: http://doc.bioperl.org/bioperl-live/Bio/Tools/GFF.html#General which describes how you can use $gffio->get_seqs() and related methods to pull out the sequence data. Dave On Aug 26, 2010, at 11:02, Peter wrote: > On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina wrote: >> >> Admittedly i'm not up on the latest uses of GFF, but as far as I know, GFF >> is an annotation format only ? it does not contain the actual sequence. >> >> Have you looked in your GFF file to see if there are nucleotides in there? >> >> Dave > > Actually a GFF file can optionally include a FASTA format sequence > at the end of the file, although it seems to be more common to just > supply separate GFF and FASTA files and cross reference by ID. > > Peter From David.Messina at sbc.su.se Thu Aug 26 05:18:25 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 26 Aug 2010 11:18:25 +0200 Subject: [Bioperl-l] getting DNA sequence for exon features from GFF In-Reply-To: References: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se> Message-ID: <984552CF-01F3-4D29-932F-DD030CCC1448@sbc.su.se> So, just to finish the thought: Kanmani, Apologies for my sloppy and uninformed answer. 
The following is only slightly less sloppy and uninformed, but may actually
answer your question.

I think you need to call $gffio->get_seqs(), probably as

    my @seq_objects = $gffio->get_seqs();

and then loop through those, something like:

    foreach my $seq_object (@seq_objects) {
        my $seq = $seq_object->seq();    # the raw sequence string

        # features hang off the sequence object, not the string
        foreach my $feat ($seq_object->get_SeqFeatures) {
            # do your feature processing here
        }
    }

Note that I haven't tested the above code.

Dave

From fs5 at sanger.ac.uk Thu Aug 26 05:19:44 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 26 Aug 2010 10:19:44 +0100
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To:
References:
Message-ID: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi Kanmani,

While GFF files may contain DNA sequence data, most of them don't, so
you will have to use the location information you get from the GFF
annotation file in conjunction with, e.g., a local FASTA database of the
genomic sequence you are working with or an online resource.

Frank

On Thu, 2010-08-26 at 01:29 -0700, kanmani radha wrote:
> Hi All,
> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
> module. I could get everything else but not the DNA seq.
>
> Can anyone help me to find this out, Please. I appreciate your help very
> much.
> thanks, > Kanmani > > #!/usr/bin/perl > > use strict; > use warnings; > use Bio::Tools::GFF; > > my $file = shift; > > my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3); > $gffio->features_attached_to_seqs(1); > > while (my $feat = $gffio->next_feature()){ > my $start = $feat->start; > my $end= $feat->end; > my $size = $end-$start+1; > my $strand = $feat->strand; > my $seqid = $feat->seq_id; > my $score = $feat->score; > my $frame = $feat->frame; > my $source = $feat->source_tag; > my $type = $feat->primary_tag; > my $gffstr = $gffio->gff_string($feat); > my @alltags = $feat->all_tags(); > my @ID_tag_value = $feat->each_tag_value("ID"); > > my $seq = $feat->seq(); > print "$seq\n"; > > if($type eq "gene"){ # > print "@ID_tag_value\t$size\t$type\t$start\t$end\n"; > } > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From cjfields at illinois.edu Thu Aug 26 10:20:48 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 26 Aug 2010 09:20:48 -0500 Subject: [Bioperl-l] getting DNA sequence for exon features from GFF In-Reply-To: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk> References: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk> Message-ID: <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu> Kammani, If you are using BioPerl, the best option currently available is to load a database with all relevant information (GFF and FASTA), then use that database for querying. 
The most commonly-used ones now are Bio::DB::SeqFeature::Store and Bio::DB::GFF; the former is very GFF3-centric, but I believe it can handle GFF/GTF, and it has various database adaptors (MySQL, Pg, BDB, SQLite). chris On Aug 26, 2010, at 4:19 AM, Frank Schwach wrote: > Hi Kammani, > > While GFF files may contain DNA sequence data, most of them don't, so > you will have to use the location information you get from the GFF > annotation file in conjunction with, e.g., a local FASTA database of the > genomic sequence you are working with or an online resource. > > > Frank > > > > On Thu, 2010-08-26 at 01:29 -0700, kanmani radha wrote: >> Hi All, >> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF >> module. I could get everything else but not the DNA seq. >> >> Can anyone help me to find this out, Please. I appreciate your help very >> much. >> thanks, >> Kanmani >> >> #!/usr/bin/perl >> >> use strict; >> use warnings; >> use Bio::Tools::GFF; >> >> my $file = shift; >> >> my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3); >> $gffio->features_attached_to_seqs(1); >> >> while (my $feat = $gffio->next_feature()){ >> my $start = $feat->start; >> my $end= $feat->end; >> my $size = $end-$start+1; >> my $strand = $feat->strand; >> my $seqid = $feat->seq_id; >> my $score = $feat->score; >> my $frame = $feat->frame; >> my $source = $feat->source_tag; >> my $type = $feat->primary_tag; >> my $gffstr = $gffio->gff_string($feat); >> my @alltags = $feat->all_tags(); >> my @ID_tag_value = $feat->each_tag_value("ID"); >> >> my $seq = $feat->seq(); >> print "$seq\n"; >> >> if($type eq "gene"){ # >> print "@ID_tag_value\t$size\t$type\t$start\t$end\n"; >> } >> } >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in 
England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 26 10:31:59 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 26 Aug 2010 09:31:59 -0500 Subject: [Bioperl-l] getting DNA sequence for exon features from GFF In-Reply-To: References: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se> Message-ID: On Aug 26, 2010, at 4:02 AM, Peter wrote: > On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina wrote: >> >> Admittedly i'm not up on the latest uses of GFF, but as far as I know, GFF >> is an annotation format only ? it does not contain the actual sequence. >> >> Have you looked in your GFF file to see if there are nucleotides in there? >> >> Dave > > Actually a GFF file can optionally include a FASTA format sequence > at the end of the file, although it seems to be more common to just > supply separate GFF and FASTA files and cross reference by ID. > > Peter IIRC, optionally including FASTA sequence is specified only in the GFF3 spec; use of FASTA isn't explicitly mentioned in earlier versions. We only support it with earlier GFF due to convergence of the various GFF parsers. The original GFF spec proposed allowing sequence, but it's in the form of meta information and I have never seen it used in practice (as you mention, the FASTA is normally loaded separately). 
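
For anyone following along, here is a minimal sketch of the layout being discussed: GFF3 feature lines followed by a ##FASTA section. It uses plain Perl rather than any BioPerl parser, the sample records (chr1, exon1, exon2) are made up for illustration, and it assumes single-interval features with 1-based inclusive coordinates. It is not meant to show how Bio::Tools::GFF handles this internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up GFF3 text: two exon lines, then an embedded ##FASTA section.
my $gff3 = join "\n",
    '##gff-version 3',
    join("\t", qw(chr1 example exon 3 8 . + . ID=exon1)),
    join("\t", qw(chr1 example exon 5 10 . - . ID=exon2)),
    '##FASTA',
    '>chr1',
    'AACCGGTTAACC',
    'GGTT';

my (@features, %seq, $in_fasta, $id);
for my $line (split /\n/, $gff3) {
    # Everything after the ##FASTA directive is ordinary FASTA.
    if ($line =~ /^##FASTA/) { $in_fasta = 1; next; }
    if ($in_fasta) {
        if ($line =~ /^>(\S+)/) { $id = $1; $seq{$id} = ''; }
        elsif (defined $id)     { $seq{$id} .= $line; }
        next;
    }
    # Skip directives, comments and blank lines in the feature section.
    next if $line =~ /^#/ || $line !~ /\S/;
    my ($seqid, undef, $type, $start, $end, undef, $strand, undef, $attr)
        = split /\t/, $line;
    push @features, { seqid => $seqid, type => $type, start => $start,
                      end => $end, strand => $strand, attr => $attr };
}

my @rows;
for my $f (@features) {
    my $chrom = $seq{ $f->{seqid} };
    next unless defined $chrom;    # no sequence supplied for this seqid

    # GFF coordinates are 1-based and inclusive; substr is 0-based.
    my $dna = substr($chrom, $f->{start} - 1, $f->{end} - $f->{start} + 1);

    # Minus-strand features are reported as the reverse complement.
    if ($f->{strand} eq '-') {
        $dna = reverse $dna;
        $dna =~ tr/ACGTacgt/TGCAtgca/;
    }
    push @rows, join("\t", $f->{attr}, $f->{type}, $dna);
}
print "$_\n" for @rows;
```

With separate GFF and FASTA files the same coordinate arithmetic applies once the sequence has been loaded from the FASTA file; for anything beyond a small test case one of the database-backed modules is likely the better fit.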
chris From kanmaninradha at gmail.com Thu Aug 26 12:22:14 2010 From: kanmaninradha at gmail.com (kanmani radha) Date: Thu, 26 Aug 2010 09:22:14 -0700 Subject: [Bioperl-l] getting DNA sequence for exon features from GFF In-Reply-To: <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu> References: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk> <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu> Message-ID: Hi Everyone, Thanks very much for this clarification. Thanks a ton for every one who spared their time to educate me. I see your points. Please correct me if I am wrong. I understand that, Its better to use use Bio::DB::SeqFeature or Bio::DB::GFF to load the fasta sequences (from a separate multifasta) file and then Bio::Tools::GFF to parse the feature info from a gff file . Then query the created database for the relevent GFF coordinates.... I will implement this. Thanks once again. Kanmani On Thu, Aug 26, 2010 at 7:20 AM, Chris Fields wrote: > Kammani, > > If you are using BioPerl, the best option currently available is to load a > database with all relevant information (GFF and FASTA), then use that > database for querying. The most commonly-used ones now are > Bio::DB::SeqFeature::Store and Bio::DB::GFF; the former is very > GFF3-centric, but I believe it can handle GFF/GTF, and it has various > database adaptors (MySQL, Pg, BDB, SQLite). > > chris > > On Aug 26, 2010, at 4:19 AM, Frank Schwach wrote: > > > Hi Kammani, > > > > While GFF files may contain DNA sequence data, most of them don't, so > > you will have to use the location information you get from the GFF > > annotation file in conjunction with, e.g., a local FASTA database of the > > genomic sequence you are working with or an online resource. > > > > > > Frank > > > > > > > > On Thu, 2010-08-26 at 01:29 -0700, kanmani radha wrote: > >> Hi All, > >> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF > >> module. I could get everything else but not the DNA seq. 
> >> > >> Can anyone help me to find this out, Please. I appreciate your help very > >> much. > >> thanks, > >> Kanmani > >> > >> #!/usr/bin/perl > >> > >> use strict; > >> use warnings; > >> use Bio::Tools::GFF; > >> > >> my $file = shift; > >> > >> my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3); > >> $gffio->features_attached_to_seqs(1); > >> > >> while (my $feat = $gffio->next_feature()){ > >> my $start = $feat->start; > >> my $end= $feat->end; > >> my $size = $end-$start+1; > >> my $strand = $feat->strand; > >> my $seqid = $feat->seq_id; > >> my $score = $feat->score; > >> my $frame = $feat->frame; > >> my $source = $feat->source_tag; > >> my $type = $feat->primary_tag; > >> my $gffstr = $gffio->gff_string($feat); > >> my @alltags = $feat->all_tags(); > >> my @ID_tag_value = $feat->each_tag_value("ID"); > >> > >> my $seq = $feat->seq(); > >> print "$seq\n"; > >> > >> if($type eq "gene"){ # > >> print "@ID_tag_value\t$size\t$type\t$start\t$end\n"; > >> } > >> } > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > -- > > The Wellcome Trust Sanger Institute is operated by Genome Research > > Limited, a charity registered in England with number 1021457 and a > > company registered in England with number 2742969, whose registered > > office is 215 Euston Road, London, NW1 2BE. 
> > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Thu Aug 26 13:08:56 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 26 Aug 2010 12:08:56 -0500 Subject: [Bioperl-l] getting DNA sequence for exon features from GFF In-Reply-To: References: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk> <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu> Message-ID: On Aug 26, 2010, at 11:22 AM, kanmani radha wrote: > Hi Everyone, > > Thanks very much for this clarification. Thanks a ton for every one who > spared their time to educate me. > > I see your points. Please correct me if I am wrong. > > I understand that, Its better to use use Bio::DB::SeqFeature or Bio::DB::GFF > to load the fasta sequences (from a separate multifasta) file and > then Bio::Tools::GFF to parse the feature info from a gff file . Then query > the created database for the relevent GFF coordinates.... > > I will implement this. > > Thanks once again. > Kanmani Yes, in general. I forgot to mention that you can have an in-memory database as well, but it's only suggested if you have a few thousand or so features and small sequences (I think bacterial chromosomes will work). chris From Havard.Aanes at nvh.no Wed Aug 25 11:47:12 2010 From: Havard.Aanes at nvh.no (=?iso-8859-1?Q?Aanes_H=E5vard?=) Date: Wed, 25 Aug 2010 17:47:12 +0200 Subject: [Bioperl-l] bpfetch.pl Message-ID: <897520BC3AAE754FA4E34E2FD26490A8021C61597B8D@A-EXMB1.veths.no> Hi, I am trying do obtain a set of mRNA sequences from a database, made by the bpindex script. I thought this should be a trivial task, but it appears not to be. 
I get the sequences if I do them one by one, like this:

perl scripts/index/bpfetch.pl -dir ./ zebrafish:NM_201192 zebrafish:NM_212708

But I need hundreds of sequences, so my plan was to put the RefSeq IDs in a
file and use that as an argument (or whatever it is called in perl). That
does not work:

haavaaan at login2 ~/download/src/bioperl-1.2.3 $ perl scripts/index/bpfetch.pl -dir ./ zebrafish:./some_seqs

You are running bpindex.pl without installing bioperl.
You have done it from bioperl/scripts, and so we can find
the necessary information but it is much better to install bioperl

Please read the README in the bioperl distribution
Sequence %id in Database zebrafish is not present

Any suggestions on how to do this? Alternative approaches are also
appreciated. I have no experience in perl, just started using linux, and for
the moment there is no time to learn perl, so I would really be grateful for
any help to solve this specific task.

Best regards

Håvard Aanes (M.Sc.)
Ph.D. student
Section for biochemistry and physiology
The Norwegian School of Veterinary Science
Telephone: +47 22597358

From kanmaninradha at gmail.com Thu Aug 26 04:23:28 2010
From: kanmaninradha at gmail.com (kanmani)
Date: Thu, 26 Aug 2010 01:23:28 -0700 (PDT)
Subject: [Bioperl-l] Bio::Tools:GFF to get DNA sequences...
Message-ID: <9b7381d7-3596-4e60-a2ac-6c8c135d457d@s24g2000pri.googlegroups.com>

Hi

I am trying to get the DNA sequences for each exon feature. I have the
following script. Everything works except getting sequences. Can someone
correct me? Thanks.
#!/usr/bin/perl use strict; use warnings; use Bio::Tools::GFF; my $file = shift; my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3); $gffio->features_attached_to_seqs(1); while (my $feat = $gffio->next_feature()){ my $start = $feat->start; my $end= $feat->end; my $size = $end-$start+1; my $strand = $feat->strand; my $seqid = $feat->seq_id; my $score = $feat->score; my $frame = $feat->frame; my $source = $feat->source_tag; my $type = $feat->primary_tag; my $gffstr = $gffio->gff_string($feat); my @alltags = $feat->all_tags(); my @ID_tag_value = $feat->each_tag_value("ID"); my $seq = $feat->seq(); print "$seq\n"; if($type eq "gene"){ print "@ID_tag_value\t$size\t$type\t$start\t$end\n"; } } From kanmaninradha at gmail.com Thu Aug 26 17:24:40 2010 From: kanmaninradha at gmail.com (kanmani radha) Date: Thu, 26 Aug 2010 14:24:40 -0700 Subject: [Bioperl-l] getting DNA sequence for exon features from GFF In-Reply-To: References: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk> <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu> Message-ID: Hi Chris and others, For a brief amount time i could get away using Bio::DB::Fasta to index fasta files and Bio::Tools::GFF to iterate thru GFF features. But, i hit the wall again. Looks like sequential access of GFF featuers is not sufficient, I want to have a random access to it. I see the only way to do that is by using Bio::DB::GFF as suggested by Chris. Here is my question. Is there any tutorial to configure Bioperl or this module in particular to work with MySQL/postgres. I will really appreciate it. And thanks for all your help. Kanmani On Thu, Aug 26, 2010 at 10:08 AM, Chris Fields wrote: > On Aug 26, 2010, at 11:22 AM, kanmani radha wrote: > > > Hi Everyone, > > > > Thanks very much for this clarification. Thanks a ton for every one who > > spared their time to educate me. > > > > I see your points. Please correct me if I am wrong. 
> > > > I understand that, Its better to use use Bio::DB::SeqFeature or > Bio::DB::GFF > > to load the fasta sequences (from a separate multifasta) file and > > then Bio::Tools::GFF to parse the feature info from a gff file . Then > query > > the created database for the relevent GFF coordinates.... > > > > I will implement this. > > > > Thanks once again. > > Kanmani > > Yes, in general. I forgot to mention that you can have an in-memory > database as well, but it's only suggested if you have a few thousand or so > features and small sequences (I think bacterial chromosomes will work). > > chris From kanmaninradha at gmail.com Thu Aug 26 18:04:20 2010 From: kanmaninradha at gmail.com (kanmani radha) Date: Thu, 26 Aug 2010 15:04:20 -0700 Subject: [Bioperl-l] getting DNA sequence for exon features from GFF In-Reply-To: References: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk> <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu> Message-ID: HI, I made some progress since then.... - Installing Bio::DB::DBI::mysql needed Biosql. - Downloaded and installed biosql follow the instruction as given in their INSTALL file - Created biosql db in my mysql server - loaded schema using script from biosql - installed DBI - Now, I have problem with DBD::mysql. That reminds me couple years back i had to struggle installing this driver on another machine. I thought i ask around this time. It fails with a bunch of error messages.....the first of it being.... dbdimp.h:22:49 error: mysql.h no such filer or directory But, My mysql installation has header file in "/usr/include/mysql3/mysql/mysql.h". Can anyone suggest how to move forward from that..... thanks, Kanmani On Thu, Aug 26, 2010 at 2:24 PM, kanmani radha wrote: > Hi Chris and others, > > For a brief amount time i could get away using Bio::DB::Fasta to index > fasta files and Bio::Tools::GFF to iterate thru GFF features. But, i hit the > wall again. 
Looks like sequential access of GFF featuers is not sufficient, > I want to have a random access to it. I see the only way to do that is by > using Bio::DB::GFF as suggested by Chris. > > Here is my question. Is there any tutorial to configure Bioperl or this > module in particular to work with MySQL/postgres. I will really appreciate > it. > > And thanks for all your help. > Kanmani > > > On Thu, Aug 26, 2010 at 10:08 AM, Chris Fields wrote: > >> On Aug 26, 2010, at 11:22 AM, kanmani radha wrote: >> >> > Hi Everyone, >> > >> > Thanks very much for this clarification. Thanks a ton for every one who >> > spared their time to educate me. >> > >> > I see your points. Please correct me if I am wrong. >> > >> > I understand that, Its better to use use Bio::DB::SeqFeature or >> Bio::DB::GFF >> > to load the fasta sequences (from a separate multifasta) file and >> > then Bio::Tools::GFF to parse the feature info from a gff file . Then >> query >> > the created database for the relevent GFF coordinates.... >> > >> > I will implement this. >> > >> > Thanks once again. >> > Kanmani >> >> Yes, in general. I forgot to mention that you can have an in-memory >> database as well, but it's only suggested if you have a few thousand or so >> features and small sequences (I think bacterial chromosomes will work). >> >> chris > > > From rafalucas.unicamp at gmail.com Thu Aug 26 18:11:07 2010 From: rafalucas.unicamp at gmail.com (Rafael Lucas) Date: Thu, 26 Aug 2010 19:11:07 -0300 Subject: [Bioperl-l] Help in algorithm Bio::Structure::IO::pdb Message-ID: Hi folks, How are you? I'm from Brazil and I was making an algorithm that Cryptographyc a data and then print the result in a pdb file. So I have a .fasta file and want to pass this file to .pdb file, if I use a program, like PyMol, it will take so much time, so I wanna use the Bio::Structure::IO::pdb to accelerate this process, could you help me in this problem? 
Thank you, Rafael Lucas Faculdade de Tecnologia em Analise e Desenvolvimento de Sistemas FT - UNICAMP +55 (19)9614-0533 From J.Christopher.Ellis at duke.edu Thu Aug 26 22:06:30 2010 From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis) Date: Thu, 26 Aug 2010 22:06:30 -0400 Subject: [Bioperl-l] standaloneblastplus blastn crash Message-ID: <55861.1282874790@duke.edu> When I run the standaloneblastplus I get the following error... ------------- EXCEPTION ------------- MSG: C:Program FilesNCBIblast-2.2.24+binblastn.exe call crashed: There was a problem running C:Program FilesNCBIblast-2.2.24+binblastn.exe :? at C:/Perl64/lib/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1001. STACK Bio::Tools::Run::WrapperBase::_run C:/Perl64/lib/Bio/Tools/Run/WrapperBase/CommandExts.pm:1006 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus.pm:1303 STACK Bio::Tools::Run::StandAloneBlastPlus::run C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:270 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus.pm:1301 STACK toplevel localBlast.pl:9 ------------------------------------- I have a sneaky suspicion that it is an easy fix but for the life of me I can not figure it out! 
:) Thanks in advance, Chris From indraniel at gmail.com Thu Aug 26 21:57:54 2010 From: indraniel at gmail.com (Indraniel) Date: Fri, 27 Aug 2010 01:57:54 +0000 (UTC) Subject: [Bioperl-l] How to convert SFF into Fastq References: Message-ID: A fourth option is the following tool, sff2fastq (written in C), described here: http://indraniel.wordpress.com/2010/04/23/sff2fastq/ and http://github.com/indraniel/sff2fastq Indraniel From David.Messina at sbc.su.se Fri Aug 27 03:41:21 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 27 Aug 2010 09:41:21 +0200 Subject: [Bioperl-l] [RFC] Interolog::Walk In-Reply-To: <4C6D0B50.4050902@sms.ed.ac.uk> References: <4C6BF4BD.5010200@sms.ed.ac.uk> <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se> <4C6D0B50.4050902@sms.ed.ac.uk> Message-ID: Hi Giuseppe, On Aug 19, 2010, at 12:45, Giuseppe Gallone wrote: > Bio::Orthology::InterologMap > Bio::Orthology::Interolog::Map, > just in case somebody else finds other interesting applications for the Interolog concept and would like to "plug in" their own contribution. Would this make any sense? Absolutely. I think either of the above is a good option, and I agree that the second is a little more flexible. Your POD looks great! Way better than most. Having seen the whole thing now, I think your description is fine as is. And if you have another tutorial and example scripts on top of it, that would really be terrific, above and beyond what most people would expect. So, time to unleash it on the world! 
:) Dave From David.Messina at sbc.su.se Fri Aug 27 03:58:12 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 27 Aug 2010 09:58:12 +0200 Subject: [Bioperl-l] standaloneblastplus blastn crash In-Reply-To: <55861.1282874790@duke.edu> References: <55861.1282874790@duke.edu> Message-ID: <9275A540-AE42-47B0-BA73-A906964C451B@sbc.su.se> Hi Chris, If you look at the error message, it says what the problem is: it's trying to call the blastn executable with no spaces in the path name. > MSG: C:Program FilesNCBIblast-2.2.24+binblastn.exe call crashed: There > was a problem running C:Program FilesNCBIblast-2.2.24+binblastn.exe Now, that could be a problem is BioPerl or it could be a problem in your code. It's hard to diagnose where the problem lies without your code, so please post your code. Dave From G.Gallone at sms.ed.ac.uk Fri Aug 27 07:07:57 2010 From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone) Date: Fri, 27 Aug 2010 12:07:57 +0100 Subject: [Bioperl-l] [RFC] Interolog::Walk In-Reply-To: References: <4C6BF4BD.5010200@sms.ed.ac.uk> <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se> <4C6D0B50.4050902@sms.ed.ac.uk> Message-ID: <4C779C8D.1090007@sms.ed.ac.uk> Hi Dave, thank you very much for your feedback :) . I will register the namespace right now. I think I will use 'homology' as the second level name though, because I plan to extend the module to work with paralogues as well. As for the category, which one of the following you reckon it will fit a Bio:: package better http://www.cpan.org/modules/by-category/ Regards Giuseppe On 27/08/10 08:41, Dave Messina wrote: > Hi Giuseppe, > > > On Aug 19, 2010, at 12:45, Giuseppe Gallone wrote: >> Bio::Orthology::InterologMap >> Bio::Orthology::Interolog::Map, > >> just in case somebody else finds other interesting applications for the Interolog concept and would like to "plug in" their own contribution. Would this make any sense? > > Absolutely. 
I think either of the above is a good option, and I agree that the second is a little more flexible. > > Your POD looks great! Way better than most. Having seen the whole thing now, I think your description is fine as is. And if you have another tutorial and example scripts on top of it, that would really be terrific, above and beyond what most people would expect. > > So, time to unleash it on the world! :) > > > Dave > > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From David.Messina at sbc.su.se Fri Aug 27 07:25:06 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 27 Aug 2010 13:25:06 +0200 Subject: [Bioperl-l] [RFC] Interolog::Walk In-Reply-To: <4C779C8D.1090007@sms.ed.ac.uk> References: <4C6BF4BD.5010200@sms.ed.ac.uk> <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se> <4C6D0B50.4050902@sms.ed.ac.uk> <4C779C8D.1090007@sms.ed.ac.uk> Message-ID: <80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se> Hi Giuseppe, > I think I will use 'homology' as the second level name though, because I plan to extend the module to work with paralogues as well. Sounds good. > As for the category, which one of the following you reckon it will fit a Bio:: package better > > http://www.cpan.org/modules/by-category/ Bio:: is in 23 - miscellaneous modules, so probably keeping with that makes sense. I don't know much about that stuff, though. Chris F. or other CPAN cognoscenti care to comment? 
Dave From cjfields at illinois.edu Fri Aug 27 09:26:51 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 27 Aug 2010 08:26:51 -0500 Subject: [Bioperl-l] [RFC] Interolog::Walk In-Reply-To: <80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se> References: <4C6BF4BD.5010200@sms.ed.ac.uk> <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se> <4C6D0B50.4050902@sms.ed.ac.uk> <4C779C8D.1090007@sms.ed.ac.uk> <80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se> Message-ID: <88BB7813-E892-4BEC-9C49-5FD22325BBF7@illinois.edu> On Aug 27, 2010, at 6:25 AM, Dave Messina wrote: > Hi Giuseppe, > > >> I think I will use 'homology' as the second level name though, because I plan to extend the module to work with paralogues as well. > > Sounds good. > > >> As for the category, which one of the following you reckon it will fit a Bio:: package better >> >> http://www.cpan.org/modules/by-category/ > > > Bio:: is in 23 - miscellaneous modules, so probably keeping with that makes sense. > > I don't know much about that stuff, though. Chris F. or other CPAN cognoscenti care to comment? > > > Dave That's probably the best spot, as we cover a fairly broad range (mainly due to core monolithic structure). Though it's terribly non-descript, sort of the junk drawer of CPAN. chris From adamkennedybackup at gmail.com Sun Aug 29 07:35:50 2010 From: adamkennedybackup at gmail.com (Adam Kennedy) Date: Sun, 29 Aug 2010 21:35:50 +1000 Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl 5.12.1? In-Reply-To: <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com> References: <78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com> <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com> Message-ID: http://strawberryperl.com/download/professional/strawberry-perl-professional-5.10.1.3-alpha-2.msi You get BioPerl installed out the box. Adam K On 20 August 2010 03:20, Christopher Fields wrote: > cc'ing list. Looks like the BioPerl PPM is possibly broken for perl 5.12.
Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling... > > chris > > On Aug 19, 2010, at 11:29 AM, han sun wrote: > >> v5.10 works,thanks. >> >> 2010/8/19 Christopher Fields >> Try using ActivePerl 5.10 instead of v5.12. It's very possible the PPM won't work for v5.12 yet. >> >> chris >> >> On Aug 19, 2010, at 9:25 AM, han sun wrote: >> >> > Hello everyone, >> > >> > I have used perl for several months,and I now want to feel the power of >> > bioperl. >> > But it seems that the installing is more difficult than I thought. >> > >> > I typed the commands. >> > >> > >> > >> > install-shell >> > >> > >> > rep add bioperl http://bioperl.org/DIST >> > >> > >> > rep add uwinnipeg >> > http://cpan.uwinnipeg.ca/PPMPackages/12xx/ >> > >> > >> > rep add trouchelle http://trouchelle.com/ppm12/ >> > >> > install BioPerl >> > >> > However,the installing failed, >> > >> > ppm install failed: >> > Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core >> > Can't find any package that provides PostScript::TextBlock for >> > Bundle-BioPerl-Core >> > Can't find any package that provides Ace:: for Bundle-BioPerl-Core >> > Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core >> > Can't find any package that provides Convert::Binary::C for >> > Bundle-BioPerl-Core >> > Can't find any package that provides XML::Twig for Bundle-BioPerl-Core >> > Can't find any package that provides DB_File:: for Bundle-BioPerl-Core >> > Can't find any package that provides IPC::Run for GraphViz >> > Can't find any package that provides XML-XPathEngine for XML-DOM-XPath >> > Can't find any package that provides List-MoreUtils for Moose >> > Can't find any package that provides List-MoreUtils for Class-MOP >> > >> > >> > then I tried >> > >> > install http://www.bribes.org/perl/ppm/GD.ppd >> > >> > and tried the installation again,but it still didn't help.
>> > >> > * >> > * >> > * >> > * >> > * >> > * >> > >> > >> > *Do you konw what's wrong with the problem?* >> > * >> > * >> > * >> > * >> > *Please help me,thanks very much.* >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields1 at gmail.com Sun Aug 29 11:58:50 2010 From: cjfields1 at gmail.com (Christopher Fields) Date: Sun, 29 Aug 2010 10:58:50 -0500 Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl 5.12.1? In-Reply-To: References: <78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com> <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com> Message-ID: Yes, and I am thinking of pointing more and more users that direction instead. Can't say maintaining PPM packages with ever-fluctuating specs is easy when I don't work with Windows anymore. chris On Aug 29, 2010, at 6:35 AM, Adam Kennedy wrote: > http://strawberryperl.com/download/professional/strawberry-perl-professional-5.10.1.3-alpha-2.msi > > You get BioPerl installed out the box. > > Adam K > > On 20 August 2010 03:20, Christopher Fields wrote: >> cc'ing list. Looks like the BioPerl PPM is possibly broken for perl 5.12. Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling... >> >> chris >> >> On Aug 19, 2010, at 11:29 AM, han sun wrote: >> >>> v5.10 works,thanks. >>> >>> 2010/8/19 Christopher Fields >>> Try using ActivePerl 5.10 instead of v5.12. It's very possible the PPM won't work for v5.12 yet. >>> >>> chris >>> >>> On Aug 19, 2010, at 9:25 AM, han sun wrote: >>> >>>> Hello everyone, >>>> >>>> I have used perl for several months,and I now want to feel the power of >>>> bioperl. >>>> But it seems that the installing is more difficult than I thought. 
>>>> >>>> I typed the commands. >>>> >>>> >>>> >>>> install-shell >>>> >>>> >>>> rep add bioperl http://bioperl.org/DIST >>>> >>>> >>>> rep add uwinnipeg >>>> http://cpan.uwinnipeg.ca/PPMPackages/12xx/ >>>> >>>> >>>> rep add trouchelle http://trouchelle.com/ppm12/ >>>> >>>> install BioPerl >>>> >>>> However,the installing failed, >>>> >>>> ppm install failed: >>>> Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core >>>> Can't find any package that provides PostScript::TextBlock for >>>> Bundle-BioPerl-Core >>>> Can't find any package that provides Ace:: for Bundle-BioPerl-Core >>>> Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core >>>> Can't find any package that provides Convert::Binary::C for >>>> Bundle-BioPerl-Core >>>> Can't find any package that provides XML::Twig for Bundle-BioPerl-Core >>>> Can't find any package that provides DB_File:: for Bundle-BioPerl-Core >>>> Can't find any package that provides IPC::Run for GraphViz >>>> Can't find any package that provides XML-XPathEngine for XML-DOM-XPath >>>> Can't find any package that provides List-MoreUtils for Moose >>>> Can't find any package that provides List-MoreUtils for Class-MOP >>>> >>>> >>>> then I tried >>>> >>>> install http://www.bribes.org/perl/ppm/GD.ppd >>>> >>>> and tried the installation again,but it still didn't help. 
>>>> >>>> * >>>> * >>>> * >>>> * >>>> * >>>> * >>>> >>>> >>>> *Do you konw what's wrong with the problem?* >>>> * >>>> * >>>> * >>>> * >>>> *Please help me,thanks very much.* >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From odclerck at gmail.com Fri Aug 27 03:44:14 2010 From: odclerck at gmail.com (odclerck) Date: Fri, 27 Aug 2010 00:44:14 -0700 (PDT) Subject: [Bioperl-l] fasta header replace Message-ID: <29550202.post@talk.nabble.com> Hi, Was wondering if someone had an easy script available that converts the headers of a fasta sequences to a value stored in a separate text file. Macrogen produces files with sequences that look more or less like this: >100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1 1012, 1000 bases, 0 checksum. I can filter out the position on the plate e.g. "A1" easily but would like to replace this with the name of the strain stored in a different text file, e.g. "A1_D1222". Realize this sounds pretty basic to most of you, but I'm pretty new at scripting. Olivier -- View this message in context: http://old.nabble.com/fasta-header-replace-tp29550202p29550202.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From J.Christopher.Ellis at duke.edu Mon Aug 30 08:55:04 2010 From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis) Date: Mon, 30 Aug 2010 08:55:04 -0400 Subject: [Bioperl-l] Taxonomy DB problem Message-ID: <51468.1283172904@duke.edu> Hi All, I am trying to extract the entire taxonomy of an organism including the classifications. 
Some thing like... Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error. My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db? Thanks for all your help in advance! Chris From rafalucas.unicamp at gmail.com Mon Aug 30 09:24:11 2010 From: rafalucas.unicamp at gmail.com (Rafael Lucas) Date: Mon, 30 Aug 2010 10:24:11 -0300 Subject: [Bioperl-l] help in algorithm Bio::Structure::IO::pdb Message-ID: Hi folks, How are you? I'm from Brazil and I was making an algorithm that Cryptographyc a data and then print the result in a pdb file. So I have a .fasta file and want to pass this file to .pdb file, if I use a program, like PyMol, it will take so much time, so I wanna use the Bio::Structure::IO::pdb to accelerate this process, could you help me in this problem? Thank you, Rafael Lucas Faculdade de Tecnologia em Analise e Desenvolvimento de Sistemas FT - UNICAMP +55 (19)9614-0533 From cjfields at illinois.edu Mon Aug 30 09:36:41 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 30 Aug 2010 08:36:41 -0500 Subject: [Bioperl-l] Taxonomy DB problem In-Reply-To: <51468.1283172904@duke.edu> References: <51468.1283172904@duke.edu> Message-ID: Chris, Regarding a fix for that script, we would have to see your modified script and the error. However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy. chris On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote: > Hi All, > > I am trying to extract the entire taxonomy of an organism including the > classifications. Some thing like...
> > Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia > > I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error. > > My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db? > > Thanks for all your help in advance! > > Chris > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From fs5 at sanger.ac.uk Mon Aug 30 11:11:06 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Mon, 30 Aug 2010 16:11:06 +0100 Subject: [Bioperl-l] fasta header replace In-Reply-To: <29550202.post@talk.nabble.com> References: <29550202.post@talk.nabble.com> Message-ID: <4C7BCA0A.70503@sanger.ac.uk> Hi Olivier, Do you know how to read a file and build a hash from the contents? This is what you will need to do, e.g. if your file is

A1 Strain_A
A2 Strain_A
A3 Strain_B

then you can do something like:

# open the mapping file for reading
open (INFILE, '<', $infile_path) or die;
my %well2strain;
while (<INFILE>) {
    # first column is the well, second column is the strain name
    my ($well, $strain) = ($_ =~ /^([A-Z]\d+)\s+(\w+)/);
    next unless defined $well;   # skip lines that do not match
    $well2strain{$well} = $strain;
}

You can then use the values of the hash to set the sequence ID as you parse the FASTA file. The BioPerl SeqIO howto gives details about how to read and write the FASTA file (http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples). You can change the id of a sequence object with $some_seq_object->id( 'my new ID'); See http://doc.bioperl.org/releases/bioperl-1.0/Bio/Seq.html for details. Hope that helps to get you started.
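Putting those pieces together, a minimal sketch in plain Perl (no BioPerl required) could look like the one below. It is only an illustration under some assumptions that are not confirmed in the thread: the mapping file has two whitespace-separated columns (well, strain), the well is the underscore-delimited field directly before the final "primer.ab1" field of the header, and the whole header is replaced by the strain name. The sub names read_well_map and rename_header are made up for this example, and in-memory strings stand in for the real files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a well -> strain lookup from lines such as "A1 D1222".
# (Assumed file layout: two whitespace-separated columns.)
sub read_well_map {
    my ($fh) = @_;
    my %well2strain;
    while (my $line = <$fh>) {
        # Skip anything that does not look like "<well> <strain>".
        next unless $line =~ /^([A-Z]\d+)\s+(\S+)/;
        $well2strain{$1} = $2;
    }
    return \%well2strain;
}

# Rewrite one line of a FASTA file. Sequence lines pass through untouched;
# header lines are replaced by ">strain" when the well is in the lookup.
sub rename_header {
    my ($line, $well2strain) = @_;
    return $line unless $line =~ /^>/;
    # Assumption: the well is the field right before "<primer>.ab1",
    # e.g. "_A1_psbAF.ab1". This avoids matching "_I01_" earlier on.
    if ($line =~ /_([A-Z]\d+)_[^_\s]+\.ab1/) {
        my $well = $1;
        return '>' . $well2strain->{$well} . "\n"
            if exists $well2strain->{$well};
    }
    return $line;    # unknown well: keep the original header
}

# Demo data held in strings; replace with real file names as needed.
my $map_text   = "A1 D1222\nA2 D1340\n";
my $fasta_text = ">100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1 1012, "
               . "1000 bases, 0 checksum.\nACGTACGT\n";

open(my $map_fh, '<', \$map_text) or die "cannot read map: $!";
my $map = read_well_map($map_fh);
close $map_fh;

open(my $fa_fh, '<', \$fasta_text) or die "cannot read fasta: $!";
while (my $line = <$fa_fh>) {
    print rename_header($line, $map);
}
close $fa_fh;
```

To run it on real data, open the two files by name instead of the in-memory strings and redirect the output to a new FASTA file. Headers whose well is missing from the mapping are left as they are, so nothing is silently lost.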
Frank odclerck wrote: > Hi, > Was wondering if someone had an easy script available that converts the > headers of a fasta sequences to a value stored in a separate text file. > > Macrogen produces files with sequences that look more or less like this: > >> 100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1 1012, 1000 bases, 0 checksum. >> > > I can filter out the position on the plate e.g. "A1" easily but would like > to replace this with the name of the strain stored in a different text file, > e.g. "A1_D1222". > > Realize this sounds pretty basic to most of you, but I'm pretty new at > scripting. > Olivier > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From jessica.sun at gmail.com Mon Aug 30 11:51:39 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Mon, 30 Aug 2010 11:51:39 -0400 Subject: [Bioperl-l] Git for the lazy In-Reply-To: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se> References: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se> Message-ID: I want to add sequence features with tags and tag values, I want to have them in my order, however somehow it seems it is in default alphabetically orders of the tags, does any one knows how to fix? thanks a lot in advance. From G.Gallone at sms.ed.ac.uk Tue Aug 31 07:52:57 2010 From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone) Date: Tue, 31 Aug 2010 12:52:57 +0100 Subject: [Bioperl-l] New CPAN Release - Bio::Homology::InterologWalk - A Perl Module to retrieve putative PPIs through Interologs Message-ID: <4C7CED19.80802@sms.ed.ac.uk> Dear Bioperl users, I would like to announce the release of Bio::Homology::InterologWalk, a module that retrieves, scores and visualizes putative Protein-Protein Interactions through the orthology-walk method. 
The project is available from the following link http://search.cpan.org/~ggallone/ and a description of the idea behind it is here http://search.cpan.org/~ggallone/Bio-Homology-InterologWalk-0.02/lib/Bio/Homology/InterologWalk.pm#DESCRIPTION The project is in a very early stage (currently ver. 0.02 alpha) and has currently been tested only on Linux environments. It has not been tested on Macs, but it should work fine, and I would appreciate any feedback from Mac users who try it. *Any* form of feedback will be extremely appreciated (bug, typos, syntactical errors, verbal abuse etc :) ). Best, Giuseppe -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From cjfields at illinois.edu Tue Aug 31 11:01:59 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 31 Aug 2010 10:01:59 -0500 Subject: [Bioperl-l] Taxonomy DB problem In-Reply-To: <56973.1283255847@duke.edu> References: <56973.1283255847@duke.edu> Message-ID: <7167CA86-857E-4E16-A3D6-BA45045CF892@illinois.edu> Yes, I see that one. It may be the ID hash that is being returned is empty. I'll look into it. -c On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote: > Hi Chris, > > The error is... > > "Use of uninitialized value $id in join or string at C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363." > > The script from http://bioperl.org/wiki/Species_names_from_accession_numbers is as follows.... 
> use Bio::DB::EUtilities;
>
> my (%taxa, @taxa);
> my (%names, %idmap);
>
> # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
> # (probably)
>
> my @ids = qw(1621261 89318838 68536103 20807972 730439);
>
> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>                                        -db => 'taxonomy',
>                                        -dbfrom => 'protein',
>                                        -correspondence => 1,
>                                        -id => \@ids);
>
> # iterate through the LinkSet objects
> while (my $ds = $factory->next_LinkSet) {
>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> }
>
> @taxa = @taxa{@ids};
>
> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>                                     -db => 'taxonomy',
>                                     -id => \@taxa );
>
> while (local $_ = $factory->next_DocSum) {
>     $names{($_->get_contents_by_name('TaxId'))[0]} = ($_->get_contents_by_name('ScientificName'))[0];
> }
>
> foreach (@ids) {
>     $idmap{$_} = $names{$taxa{$_}};
> }
>
> # %idmap is
> # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> # 68536103 => 'Corynebacterium jeikeium K411'
> # 730439 => 'Bacillus caldolyticus'
> # 89318838 => undef (this record has been removed from the db)
>
> 1;
>
> Thanks,
>
> Chris
>
> On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent: > Chris, > > Regarding a fix for that script, we would have to see your modified script and the error. However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy. > > chris > > On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote: > > > Hi All, > > > > I am trying to extract the entire taxonomy of an organism including the > > classifications. Some thing like...
> > > > Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia > > > > I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error. > > > > My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db? > > > > Thanks for all your help in advance! > > > > Chris > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From J.Christopher.Ellis at duke.edu Tue Aug 31 07:57:27 2010 From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis) Date: Tue, 31 Aug 2010 07:57:27 -0400 Subject: [Bioperl-l] Taxonomy DB problem Message-ID: <56973.1283255847@duke.edu> Hi Chris, The error is... "Use of uninitialized value $id in join or string at C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363." The script from http://bioperl.org/wiki/Species_names_from_accession_numbers is as follows....

use Bio::DB::EUtilities;

my (%taxa, @taxa);
my (%names, %idmap);

# these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
# (probably)

my @ids = qw(1621261 89318838 68536103 20807972 730439);

my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
                                       -db => 'taxonomy',
                                       -dbfrom => 'protein',
                                       -correspondence => 1,
                                       -id => \@ids);

# iterate through the LinkSet objects
while (my $ds = $factory->next_LinkSet) {
    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
}

@taxa = @taxa{@ids};

$factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
                                    -db => 'taxonomy',
                                    -id => \@taxa );

while (local $_ = $factory->next_DocSum) {
    $names{($_->get_contents_by_name('TaxId'))[0]} = ($_->get_contents_by_name('ScientificName'))[0];
}

foreach (@ids) {
    $idmap{$_} = $names{$taxa{$_}};
}

# %idmap is
# 1621261 => 'Mycobacterium tuberculosis H37Rv'
# 20807972 => 'Thermoanaerobacter tengcongensis MB4'
# 68536103 => 'Corynebacterium jeikeium K411'
# 730439 => 'Bacillus caldolyticus'
# 89318838 => undef (this record has been removed from the db)

1;

Thanks, Chris On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent: Chris, Regarding a fix for that script, we would have to see your modified script and the error. However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy. chris On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote: > Hi All, > > I am trying to extract the entire taxonomy of an organism including the > classifications. Some thing like... > > Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia > > I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error. > > My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db? > > Thanks for all your help in advance!
> > Chris > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Sun Aug 1 15:17:14 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Sun, 01 Aug 2010 12:17:14 -0700 Subject: [Bioperl-l] GMOD Evo Hackathon Open Call for Participation Message-ID: <4C55C83A.3060700@cornell.edu> We are seeking participants for the GMOD Tools for Evolutionary Biology Hackathon, held November 8-12, 2010 at the US National Evolutionary Synthesis Center (NESCent) in Durham, NC. This hackathon targets three critical gaps in the capabilities of the GMOD toolbox that currently limit its utility for evolutionary research: 1. Visualization of comparative genomics data 2. Visualization of phylogenetic data and trees 3. Support for population diversity and phenotype data If you are interested in these areas and have relevant expertise, you are strongly encouraged to apply. Relevant areas of expertise include more than just software development: if you are a GMOD power user, visualization guru, domain expert (comparative, phylogenetics, population, ...), or documentation wizard, then your skills are needed! How To Apply: Fill out the online application form at http://bit.ly/gmodevohack. Applications are due August 25. About GMOD: GMOD is an intercompatible suite of open-source software components for storing, managing, analyzing, and visualizing genome-scale data. GMOD includes many widely-used software components: GBrowse and JBrowse, both genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a generic and modular database schema; CMap, a comparative map viewer; as well as many other components including Apollo, MAKER, BioMart, InterMine, and Galaxy. We hope to extend the functionality of existing GMOD components, and integrate new components as well. 
About Hackathons: A hackathon is an intense event at which a group of programmers with different backgrounds and skills collaborate hands-on and face-to-face to develop working code that is of utility to the community as a whole. The mix of people will include domain experts and computer-savvy end-users. More details about the event, its motivation, organization, procedures, and attendees, as well as URLs to the hackathon and related websites are included below. Sincerely, The GMOD EvoHack Organizing Committee (and project affiliations as relevant): Nicole Washington, Chair (LBNL, modENCODE, Phenote) Robert Buels (SGN, Chado NatDiv) Scott Cain (OICR, GMOD) Dave Clements (NESCent, GMOD) Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv) Sheldon McKay (University of Arizona, iPlant, GBrowse_syn) ----------------------------- About the GMOD Evo Hackathon Overview We are organizing a hackathon to fill critical gaps in the capabilities of the Generic Model Organism Database (GMOD) toolbox that currently limit its utility for evolutionary research. Specifically, we will focus on tools for 1) viewing comparative genomics data; 2) visualizing phylogenomic data; and 3) supporting population diversity data and phenotype annotation. The event will be hosted at NESCent and bring together a group of about 20+ software developers, end-user representatives, and documentation experts who would otherwise not meet. The participants will include key developers of GMOD components that currently lack features critical for emerging evolutionary biology research, developers of informatics tools in evolutionary research that lack GMOD integration, and informatics-savvy biologists who can represent end-user requirements. 
The event will provide a unique opportunity to infuse the GMOD developer community with a heightened awareness of unmet needs in evolutionary biology that GMOD components have the potential to fill, and for tool developers in evolutionary biology to better understand how best to extend or integrate with already existing GMOD components. Before the Event Discussion of ideas and sometimes even design actually starts well before the hackathon, on mailing lists, wiki pages, and conference calls set up among accepted attendees. This advance work lays the foundation for participants to be productive from the very first day. This also means that participants should be willing to contribute some time in advance of the hackathon itself to participate in this preparatory discussion. During the Event Typically, hackathon participants use the morning of the first day of the event to organize themselves into working groups of between 3 and 6 people, each with a focused implementation objective. Ideas and objectives are discussed, and attendees coalesce around the projects in which they have the most experience or interest. Deliverables / Event Results The meeting's attendance, working groups, and outcomes will be fully logged and documented on the GMOD wiki (http://gmod.org). Each working group during the event will typically have its own wiki page, linked from the main EvoHack page, where it documents its minutes and design notes, and provides links to the code and documentation it produces. Also, since GMOD and NESCent are both committed to open source principles, all code and documentation produced by participants during the hackathon must be published under an OSI-approved open source license. As contributions to existing GMOD tools, all hackathon products will most likely satisfy this requirement automatically. 
NESCent This event is sponsored by the US National Evolutionary Synthesis Center (NESCent, http://www.nescent.org) through its Informatics Whitepapers program (http://www.nescent.org/informatics/whitepapers.php). NESCent promotes the synthesis of information, concepts and knowledge to address significant, emerging, or novel questions in evolutionary science and its applications. NESCent achieves this by supporting research and education across disciplinary, institutional, geographic, and demographic boundaries (see http://www.nescent.org/science/proposals.php). Links Main GMOD EvoHack page, and full proposal: http://gmod.org/wiki/GMOD_Evo_Hackathon NESCent: http://www.nescent.org/ GMOD: http://gmod.org Similar past NESCent events, see: http://hackathon.nescent.org/ GMOD hackathon application: http://bit.ly/gmodevohack -- http://gmod.org/wiki/GMOD_News http://gmod.org/wiki/GMOD_Europe_2010 http://gmod.org/wiki/Help_Desk_Feedback From maj at fortinbras.us Sun Aug 1 19:19:16 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 1 Aug 2010 19:19:16 -0400 Subject: [Bioperl-l] SOAP Eutilities In-Reply-To: References: Message-ID: <627BEC8B2E624A69A0B11EEBC8C93B71@NewLife> Turns out that module lives in bioperl-run; try git clone git://github.com/bioperl/bioperl-run.git MAJ ----- Original Message ----- From: "Robson de Souza" To: Sent: Saturday, July 31, 2010 4:56 PM Subject: [Bioperl-l] SOAP Eutilities > Hi, > > Bio::DB::SoapEUtilities, referred in the HOWTO on EUtilities, seems to > have disappeared from the Git repository. > A simple > > git clone git://github.com/bioperl/bioperl-live.git > > does not download it. Any ideas why? 
> Robson > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From David.Messina at sbc.su.se Mon Aug 2 09:58:10 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 2 Aug 2010 15:58:10 +0200 Subject: [Bioperl-l] phyloxml and element order In-Reply-To: References: Message-ID: Hi Fred, Thanks for letting us know about this - definitely sounds like a bug. Would you please submit this to our bug tracker? http://bugzilla.open-bio.org (You can just copy and paste your previous email.) Dave On Jul 30, 2010, at 06:59, Frédéric Romagné wrote: > Hi, > > I'm using bioperl to create phyloxml trees, after few tentatives, i got my > tree with all the element/attributes i want but when I write the tree, > element are not written following the order specified in the XSD Schema. > > For example, i got : > > > > Loxosceles intermedia > > Araneomorphae Sicariidae > > > 969 > HAAERADSRKPIWDIAHMVNDLELVD > > > > Araneomorphae Sicariidae > > > > The program forester complains that should be written before the > element. > > According to > http://phyloxml.wordpress.com/2009/11/25/order-of-elements-in-phyloxml this > is what bioperl is supposed to do. > > All my element/attributes are set before writing the tree using > 'add_Annotation', 'add_tag_value' and 'sequence' methods from a > Bio::Tree::AnnotatableNode object, so i think the error comes from the > write_tree method. > > Any help would be appreciated. > > Thank you, > Fred > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From shalabh.sharma7 at gmail.com Mon Aug 2 15:44:35 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Mon, 2 Aug 2010 15:44:35 -0400 Subject: [Bioperl-l] clustalw to maf format Message-ID: Hi, I am trying to convert clustalw to maf format.
I am trying to use AlignIO for that but its not working. Its giving me the following error: EXCEPTION Bio::Root::NotImplemented ------------- MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented by package Bio::AlignIO::maf. This is not your fault - author of Bio::AlignIO::maf should be blamed! STACK Bio::Root::RootI::throw_not_implemented /Library/Perl/5.8.8/Bio/Root/RootI.pm:707 STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/ maf.pm:176 STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492 STACK toplevel msf2mafy.pl:11 Is there any other way i can convert clustalw to maf? I would really appreciate if anyone can help me out. Thanks Shalabh From Russell.Smithies at agresearch.co.nz Mon Aug 2 16:25:26 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 3 Aug 2010 08:25:26 +1200 Subject: [Bioperl-l] clustalw to maf format In-Reply-To: References: Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz> This might work if you only have a few: http://www.ibi.vu.nl/programs/convertalignwww/ --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of shalabh sharma > Sent: Tuesday, 3 August 2010 7:45 a.m. > To: bioperl-l > Subject: [Bioperl-l] clustalw to maf format > > Hi, > I am trying to convert clustalw to maf format. > I am trying to use AlignIO for that but its not working. > > Its giving me the following error: > > EXCEPTION Bio::Root::NotImplemented ------------- > MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented by > package Bio::AlignIO::maf. > This is not your fault - author of Bio::AlignIO::maf should be blamed! 
> > STACK Bio::Root::RootI::throw_not_implemented > /Library/Perl/5.8.8/Bio/Root/RootI.pm:707 > STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/ > maf.pm:176 > STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492 > STACK toplevel msf2mafy.pl:11 > > > Is there any other way i can convert clustalw to maf? > > I would really appreciate if anyone can help me out. > > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From shalabh.sharma7 at gmail.com Mon Aug 2 16:53:31 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Mon, 2 Aug 2010 16:53:31 -0400 Subject: [Bioperl-l] clustalw to maf format In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz> Message-ID: Hi Russell, Thanks for the reply, but i have around 400 alignments and some huge ones :( Thanks Shalabh On Mon, Aug 2, 2010 at 4:25 PM, Smithies, Russell < Russell.Smithies at agresearch.co.nz> wrote: > This might work if you only have a few: > http://www.ibi.vu.nl/programs/convertalignwww/ > > --Russell > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of shalabh sharma > > Sent: Tuesday, 3 August 2010 7:45 a.m. > > To: bioperl-l > > Subject: [Bioperl-l] clustalw to maf format > > > > Hi, > > I am trying to convert clustalw to maf format. > > I am trying to use AlignIO for that but its not working. > > > > Its giving me the following error: > > > > EXCEPTION Bio::Root::NotImplemented ------------- > > MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented by > > package Bio::AlignIO::maf. > > This is not your fault - author of Bio::AlignIO::maf should be blamed! > > > > STACK Bio::Root::RootI::throw_not_implemented > > /Library/Perl/5.8.8/Bio/Root/RootI.pm:707 > > STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/ > > maf.pm:176 > > STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492 > > STACK toplevel msf2mafy.pl:11 > > > > > > Is there any other way i can convert clustalw to maf? > > > > I would really appreciate if anyone can help me out. 
> > > > Thanks > > Shalabh > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > From biopython at maubp.freeserve.co.uk Mon Aug 2 17:24:09 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 2 Aug 2010 22:24:09 +0100 Subject: [Bioperl-l] clustalw to maf format In-Reply-To: References: Message-ID: On Mon, Aug 2, 2010 at 8:44 PM, shalabh sharma wrote: > Hi, > I am trying to convert clustalw to maf format. > I am trying to use AlignIO for that but its not working. Could you tell us why you have to use maf format? I'm curious because all of the phylogenetics tools I've had to work with personally will take some other format which is more widely supported (e.g. FASTA, PFAM, ClustalW, PHYLIP, ...). Peter From bernd.web at gmail.com Mon Aug 2 17:25:52 2010 From: bernd.web at gmail.com (Bernd Web) Date: Mon, 2 Aug 2010 23:25:52 +0200 Subject: [Bioperl-l] clustalw to maf format In-Reply-To: References: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz> Message-ID: Hi Shalabh, This ConvertAlign does not write maf either, it only reads it (i made it). I found some other converters on the web but they do not export to maf format either...
http://biotechvana.uv.es/servers/afc/main.php http://www.hiv.lanl.gov/content/sequence/FORMAT_CONVERSION/form.html Galaxy has a MAF to Fasta converter: http://main.g2.bx.psu.edu/root?tool_id=MAF_To_Fasta1 Regards, Bernd On Mon, Aug 2, 2010 at 10:53 PM, shalabh sharma wrote: > Hi Russell, > Thanks for the reply, but i have around 400 alignments and some > huge ones :( > > Thanks > Shalabh > > > On Mon, Aug 2, 2010 at 4:25 PM, Smithies, Russell < > Russell.Smithies at agresearch.co.nz> wrote: > >> This might work if you only have a few: >> http://www.ibi.vu.nl/programs/convertalignwww/ >> >> --Russell >> >> >> > -----Original Message----- >> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> > bounces at lists.open-bio.org] On Behalf Of shalabh sharma >> > Sent: Tuesday, 3 August 2010 7:45 a.m. >> > To: bioperl-l >> > Subject: [Bioperl-l] clustalw to maf format >> > >> > Hi, >> > I am trying to convert clustalw to maf format. >> > I am trying to use AlignIO for that but its not working. >> > >> > Its giving me the following error: >> > >> > EXCEPTION Bio::Root::NotImplemented ------------- >> > MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented by >> > package Bio::AlignIO::maf. >> > This is not your fault - author of Bio::AlignIO::maf should be blamed! >> > >> > STACK Bio::Root::RootI::throw_not_implemented >> > /Library/Perl/5.8.8/Bio/Root/RootI.pm:707 >> > STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/ >> > maf.pm:176 >> > STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492 >> > STACK toplevel msf2mafy.pl:11 >> > >> > >> > Is there any other way i can convert clustalw to maf? >> > >> > I would really appreciate if anyone can help me out.
>> > >> > Thanks >> > Shalabh >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> ======================================================================= >> Attention: The information contained in this message and/or attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or privileged >> material. Any review, retransmission, dissemination or other use of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by AgResearch >> Limited. If you have received this message in error, please notify the >> sender immediately. >> ======================================================================= >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Aug 2 17:31:20 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 2 Aug 2010 16:31:20 -0500 Subject: [Bioperl-l] clustalw to maf format In-Reply-To: References: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz> Message-ID: <6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu> No other format will work? The main reason you see unimplemented methods like this is there is no active interest in working with this format beyond getting the information stored within them into objects and other commonly-used formats. 
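Since write_aln is not implemented for MAF, one workaround is to print the blocks by hand. The following is only a minimal sketch, not part of BioPerl: maf_block is a hypothetical helper, and it assumes every row starts at position 0 on the forward strand and that the source size equals the ungapped length, which holds only when each row is a complete source sequence.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): format aligned rows as one
# MAF alignment block. Each row is [ $name, $gapped_sequence ].
# Assumptions: start is 0, strand is '+', and srcSize equals the
# ungapped length of the row.
sub maf_block {
    my (@rows) = @_;
    my $out = "a score=0\n";
    for my $row (@rows) {
        my ($name, $seq) = @$row;
        (my $ungapped = $seq) =~ s/[-.]//g;   # drop gap characters
        my $size = length $ungapped;
        (my $text = $seq) =~ tr/./-/;         # write gaps as '-'
        $out .= join(' ', 's', $name, 0, $size, '+', $size, $text) . "\n";
    }
    return $out . "\n";                       # a blank line ends the block
}

print "##maf version=1\n";
print maf_block([ seq1 => 'ACG-T' ], [ seq2 => 'AC-GT' ]);
```

With a Bio::SimpleAlign object read through AlignIO, the rows could presumably be built from each_seq, using display_id and seq on every sequence; real coordinates and source lengths would still have to be supplied for anything that depends on them.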
chris On Aug 2, 2010, at 3:53 PM, shalabh sharma wrote: > Hi Russell, > Thanks for the reply, but i have around 400 alignments and some > huge ones :( > > Thanks > Shalabh > > > On Mon, Aug 2, 2010 at 4:25 PM, Smithies, Russell < > Russell.Smithies at agresearch.co.nz> wrote: > >> This might work if you only have a few: >> http://www.ibi.vu.nl/programs/convertalignwww/ >> >> --Russell >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma >>> Sent: Tuesday, 3 August 2010 7:45 a.m. >>> To: bioperl-l >>> Subject: [Bioperl-l] clustalw to maf format >>> >>> Hi, >>> I am trying to convert clustalw to maf format. >>> I am trying to use AlignIO for that but its not working. >>> >>> Its giving me the following error: >>> >>> EXCEPTION Bio::Root::NotImplemented ------------- >>> MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented by >>> package Bio::AlignIO::maf. >>> This is not your fault - author of Bio::AlignIO::maf should be blamed! >>> >>> STACK Bio::Root::RootI::throw_not_implemented >>> /Library/Perl/5.8.8/Bio/Root/RootI.pm:707 >>> STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/ >>> maf.pm:176 >>> STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492 >>> STACK toplevel msf2mafy.pl:11 >>> >>> >>> Is there any other way i can convert clustalw to maf? >>> >>> I would really appreciate if anyone can help me out. 
>>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> ======================================================================= >> Attention: The information contained in this message and/or attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or privileged >> material. Any review, retransmission, dissemination or other use of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by AgResearch >> Limited. If you have received this message in error, please notify the >> sender immediately. >> ======================================================================= >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From shalabh.sharma7 at gmail.com Mon Aug 2 18:30:41 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Mon, 2 Aug 2010 18:30:41 -0400 Subject: [Bioperl-l] clustalw to maf format In-Reply-To: <6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu> References: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz> <6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu> Message-ID: Hi All, Thanks for the replies. Actually i am working on a pipeline involving RNAz. I had impression that there must be a converter available as their webserver can take xmfa or maf format but standalone is only accepting maf format. I think i will use a program that can output as xmfa and write to those people if they can provide me with the converter. Thanks Shalabh On Mon, Aug 2, 2010 at 5:31 PM, Chris Fields wrote: > No other format will work? 
The main reason you see unimplemented methods > like this is there is no active interest in working with this format beyond > getting the information stored within them into objects and other > commonly-used formats. > > chris > > On Aug 2, 2010, at 3:53 PM, shalabh sharma wrote: > > > Hi Russell, > > Thanks for the reply, but i have around 400 alignments and > some > > huge ones :( > > > > Thanks > > Shalabh > > > > > > On Mon, Aug 2, 2010 at 4:25 PM, Smithies, Russell < > > Russell.Smithies at agresearch.co.nz> wrote: > > > >> This might work if you only have a few: > >> http://www.ibi.vu.nl/programs/convertalignwww/ > >> > >> --Russell > >> > >> > >>> -----Original Message----- > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma > >>> Sent: Tuesday, 3 August 2010 7:45 a.m. > >>> To: bioperl-l > >>> Subject: [Bioperl-l] clustalw to maf format > >>> > >>> Hi, > >>> I am trying to convert clustalw to maf format. > >>> I am trying to use AlignIO for that but its not working. > >>> > >>> Its giving me the following error: > >>> > >>> EXCEPTION Bio::Root::NotImplemented ------------- > >>> MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented > by > >>> package Bio::AlignIO::maf. > >>> This is not your fault - author of Bio::AlignIO::maf should be blamed! > >>> > >>> STACK Bio::Root::RootI::throw_not_implemented > >>> /Library/Perl/5.8.8/Bio/Root/RootI.pm:707 > >>> STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/ > >>> maf.pm:176 > >>> STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492 > >>> STACK toplevel msf2mafy.pl:11 > >>> > >>> > >>> Is there any other way i can convert clustalw to maf? > >>> > >>> I would really appreciate if anyone can help me out. 
> >>> > >>> Thanks > >>> Shalabh > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> ======================================================================= > >> Attention: The information contained in this message and/or attachments > >> from AgResearch Limited is intended only for the persons or entities > >> to which it is addressed and may contain confidential and/or privileged > >> material. Any review, retransmission, dissemination or other use of, or > >> taking of any action in reliance upon, this information by persons or > >> entities other than the intended recipients is prohibited by AgResearch > >> Limited. If you have received this message in error, please notify the > >> sender immediately. > >> ======================================================================= > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From chiragmatkarbioinfo at gmail.com Tue Aug 3 03:47:37 2010 From: chiragmatkarbioinfo at gmail.com (chirag matkar) Date: Tue, 3 Aug 2010 13:17:37 +0530 Subject: [Bioperl-l] Pubmed Parsing Message-ID: Hello all, I have a list of Pubmed Ids. I want to parse articles to find specific SNP related information. Can i work it out using a Script? -- Regards, Chirag Matkar From genehack at genehack.org Tue Aug 3 05:03:35 2010 From: genehack at genehack.org (John Anderson) Date: Tue, 3 Aug 2010 05:03:35 -0400 Subject: [Bioperl-l] Pubmed Parsing In-Reply-To: References: Message-ID: <5E557C44-224B-4460-9C2C-E375555B8BE6@genehack.org> On Aug 3, 2010, at 3:47 AM, chirag matkar wrote: > I have a list of Pubmed Ids. > I want to parse articles to find specific SNP related information. > Can i work it out using a Script? Can you provide a more specific example of what you'd like to do? 
For example, something along the lines of, "for PMID 1234, get ... about SNP 5678" (where '...' is replaced with whatever it is you're trying to get). Even describing how you would obtain this information using the website yourself will be helpful. thanks, john. From gowthaman.ramasamy at seattlebiomed.org Tue Aug 3 01:29:10 2010 From: gowthaman.ramasamy at seattlebiomed.org (Gowthaman Ramasamy) Date: Mon, 2 Aug 2010 22:29:10 -0700 Subject: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam In-Reply-To: Message-ID: Hi List, I am trying to find out the consensus using pileup via Bio::DB::Sam. Using the following script I could parse out the ref_base and different bases from reads at that position. Though, I am not able to find a method to derive consensus. Similar to the values produced by "samtools pileup -c -f xxxxxx.fasta yyyyyyy.bam". The script I use now retrives ref base, query bases for each position. How do I improve it to get the consensus? Thanks very much in advance, Gowthaman use Bio::DB::Sam; my $bam = Bio::DB::Sam->new(-bam => 'something.bam', -fasta => 'something.fasta' ); my $cb = sub { my ($seqid, $pos, $pileups) = @_; my $refBase = $bam->segment($seqid, $pos, $pos)->dna; print "\n$pos\t$refBase=>"; for my $pileup (@$pileups){ my $al = $pileup->alignment; my $qBase = substr($al->qseq, $pileup->qpos, 1); print "$qBase,"; } }; $bam->pileup('Lin.chr10i', $cb); From scott at scottcain.net Tue Aug 3 06:32:59 2010 From: scott at scottcain.net (Scott Cain) Date: Tue, 3 Aug 2010 06:32:59 -0400 Subject: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam In-Reply-To: References: Message-ID: Hi Gowthaman, I don't see a method to extract the consensus. You are welcome to submit a patch :-) Scott On Tue, Aug 3, 2010 at 1:29 AM, Gowthaman Ramasamy wrote: > Hi List, > I am trying to find out the consensus using pileup via Bio::DB::Sam. 
Using the following script I could parse out the ref_base and different bases from reads at that position. Though, I am not able to find a method to derive consensus. Similar to the values produced by "samtools pileup -c -f xxxxxx.fasta yyyyyyy.bam". > > The script I use now retrives ref base, query bases for each position. How do I improve it to get the consensus? > > Thanks very much in advance, > Gowthaman > > > use Bio::DB::Sam; > > my $bam = Bio::DB::Sam->new(-bam => 'something.bam', > -fasta => 'something.fasta' > ); > > my $cb = sub { > my ($seqid, $pos, $pileups) = @_; > my $refBase = $bam->segment($seqid, $pos, $pos)->dna; > print "\n$pos\t$refBase=>"; > for my $pileup (@$pileups){ > my $al = $pileup->alignment; > my $qBase = substr($al->qseq, $pileup->qpos, 1); > print "$qBase,"; > } > }; > > $bam->pileup('Lin.chr10i', $cb); > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From lincoln.stein at gmail.com Tue Aug 3 12:57:52 2010 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Tue, 3 Aug 2010 12:57:52 -0400 Subject: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam In-Reply-To: References: Message-ID: Samtools is running MAQ on the pileup. You could either implement MAQ in perl, or come up with your own consensus caller.
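As an illustration of the "own consensus caller" route, here is a minimal majority-rule sketch. It is an assumption-laden example, not the MAQ model and not a Bio::DB::Sam method: majority_base is a hypothetical helper that ignores base qualities and indels and simply reports the most frequent base in a column, or 'N' when no base exceeds a chosen fraction.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::DB::Sam): return the most frequent
# base among the read bases seen at one position, or 'N' when no base
# exceeds $min_fraction of the column (default 0.5).
sub majority_base {
    my ($bases, $min_fraction) = @_;
    $min_fraction = 0.5 unless defined $min_fraction;

    # Count A/C/G/T case-insensitively; anything else is ignored.
    my %count;
    $count{ uc $_ }++ for grep { /^[ACGTacgt]$/ } @$bases;

    my $total = 0;
    $total += $_ for values %count;
    return 'N' unless $total;

    # Highest count first; ties are broken alphabetically so the result
    # is deterministic.
    my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    return $count{$best} / $total > $min_fraction ? $best : 'N';
}

print majority_base([qw(A A a G)]), "\n";   # prints A
print majority_base([qw(A C G T)]), "\n";   # prints N
```

In the pileup callback from the original script, the $qBase values could be pushed onto an array instead of printed, and that array passed to majority_base once per position.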
Lincoln On Tue, Aug 3, 2010 at 1:29 AM, Gowthaman Ramasamy < gowthaman.ramasamy at seattlebiomed.org> wrote: > Hi List, > I am trying to find out the consensus using pileup via Bio::DB::Sam. Using > the following script I could parse out the ref_base and different bases from > reads at that position. Though, I am not able to find a method to derive > consensus. Similar to the values produced by "samtools pileup -c -f > xxxxxx.fasta yyyyyyy.bam". > > The script I use now retrives ref base, query bases for each position. How > do I improve it to get the consensus? > > Thanks very much in advance, > Gowthaman > > > use Bio::DB::Sam; > > my $bam = Bio::DB::Sam->new(-bam => 'something.bam', > -fasta => 'something.fasta' > ); > > my $cb = sub { > my ($seqid, $pos, $pileups) = @_; > my $refBase = $bam->segment($seqid, $pos, > $pos)->dna; > print "\n$pos\t$refBase=>"; > for my $pileup (@$pileups){ > my $al = $pileup->alignment; > my $qBase = substr($al->qseq, $pileup->qpos, > 1); > print "$qBase,"; > } > }; > > $bam->pileup('Lin.chr10i', $cb); > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From biopython at maubp.freeserve.co.uk Tue Aug 3 13:06:46 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 3 Aug 2010 18:06:46 +0100 Subject: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam In-Reply-To: References: Message-ID: On Tue, Aug 3, 2010 at 5:57 PM, Lincoln Stein wrote: > Samtools is running MAQ on the pileup. You could either implement MAQ in > perl, or come up with your own consensus caller. 
> > Lincoln See also: http://seqanswers.com/forums/showthread.php?t=6241 From gowthaman.ramasamy at seattlebiomed.org Tue Aug 3 13:28:36 2010 From: gowthaman.ramasamy at seattlebiomed.org (Gowthaman Ramasamy) Date: Tue, 3 Aug 2010 10:28:36 -0700 Subject: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam In-Reply-To: References: , Message-ID: <89080953C3D300419AACB6E63A7EEFBA5C47613B34@mail02.sbri.org> Hi Lincoln, Thats a good lead. I will try to use MAQ in perl rather than using my simple majority rule. -gowtham ________________________________________ From: Lincoln Stein [lincoln.stein at gmail.com] Sent: Tuesday, August 03, 2010 9:57 AM To: Gowthaman Ramasamy Cc: bioperl-l Subject: Re: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam Samtools is running MAQ on the pileup. You could either implement MAQ in perl, or come up with your own consensus caller. Lincoln On Tue, Aug 3, 2010 at 1:29 AM, Gowthaman Ramasamy > wrote: Hi List, I am trying to find out the consensus using pileup via Bio::DB::Sam. Using the following script I could parse out the ref_base and different bases from reads at that position. Though, I am not able to find a method to derive consensus. Similar to the values produced by "samtools pileup -c -f xxxxxx.fasta yyyyyyy.bam". The script I use now retrives ref base, query bases for each position. How do I improve it to get the consensus? 
Thanks very much in advance, Gowthaman use Bio::DB::Sam; my $bam = Bio::DB::Sam->new(-bam => 'something.bam', -fasta => 'something.fasta' ); my $cb = sub { my ($seqid, $pos, $pileups) = @_; my $refBase = $bam->segment($seqid, $pos, $pos)->dna; print "\n$pos\t$refBase=>"; for my $pileup (@$pileups){ my $al = $pileup->alignment; my $qBase = substr($al->qseq, $pileup->qpos, 1); print "$qBase,"; } }; $bam->pileup('Lin.chr10i', $cb); _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa > From stefan.kirov at bms.com Tue Aug 3 16:22:35 2010 From: stefan.kirov at bms.com (Stefan Kirov) Date: Tue, 03 Aug 2010 16:22:35 -0400 Subject: [Bioperl-l] nmica parser Message-ID: <4C587A8B.8090603@bms.com> Has anyone written nmica parser? If not I will perhaps do that. It should be straightforward- the output is XML. Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: stefan_kirov.vcf Type: text/x-vcard Size: 207 bytes Desc: not available URL: From fs5 at sanger.ac.uk Wed Aug 4 04:45:39 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Wed, 04 Aug 2010 09:45:39 +0100 Subject: [Bioperl-l] Pubmed Parsing In-Reply-To: References: Message-ID: <1280911539.3499.46.camel@deskpro15336.dynamic.sanger.ac.uk> Hi Chiraq, have a look at this earlier post: http://bioperl.org/pipermail/bioperl-l/2009-April/029690.html However, you won't be able to retrieve all full texts and it is quite a task to parse natural language and get useful information about a gene, protein, SNP etc out of a manuscript. Frank On Tue, 2010-08-03 at 13:17 +0530, chirag matkar wrote: > Hello all, > I have a list of Pubmed Ids. 
> I want to parse articles to find specific SNP related information. > Can i work it out using a Script? > > > > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From David.Messina at sbc.su.se Thu Aug 5 08:16:17 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 5 Aug 2010 14:16:17 +0200 Subject: [Bioperl-l] call for a TreeIO volunteer Message-ID: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se> Hi everybody, We've got a couple of small open bugs related to the Bio::TreeIO modules, and we could really use someone to take a look at them. Ideally, that someone would have familiarity with TreeIO already.* It'd help us to get the next release (1.6.2) out the door. The bugs in question are: - TreeIO::newick writes root node branch length incorrectly http://bugzilla.open-bio.org/show_bug.cgi?id=3039 - Bio::TreeIO::nhx cannot parse empty [&&NHX] + round-trip failure http://bugzilla.open-bio.org/show_bug.cgi?id=3007 Thanks, Dave on behalf of the core developers * Even if you don't, though, if you've been looking for an opportunity to contribute to BioPerl, and this sounds like something you'd like to work on, by all means raise your hand. From clements at nescent.org Thu Aug 5 13:15:41 2010 From: clements at nescent.org (Dave Clements) Date: Thu, 5 Aug 2010 10:15:41 -0700 Subject: [Bioperl-l] GMOD Europe 2010, 13-16 Sept, Cambridge, UK In-Reply-To: References: Message-ID: GMOD Europe 2010 ================ 13-16 September 2010 Cambridge, UK http://gmod.org/wiki/GMOD_Europe_2010 We are pleased to announce GMOD Europe 2010, four days of GMOD events being held 13-16 September 2010, at the University of Cambridge. 
GMOD Europe 2010 includes:

1) GMOD Community Meeting, Monday & Tuesday: Project updates, developer and user presentations and best practices, project direction.
2) GMOD Satellite Meetings, Wednesday: Special interest groups where GMOD community members meet to discuss specific topics of interest.
3) InterMine Workshop, Wednesday: A one day workshop on installing, configuring and using the InterMine biological data warehouse system.
4) BioMart Workshop, Thursday: A one day workshop on using the BioMart biological data warehouse system, including accessing data through APIs.

Registration is now open for these events. There is a £50 registration fee for the GMOD Meeting to cover catered lunches and other expenses. Registration for all other events is free, but required, as space is limited.

These events are open to all: GMOD users, developers, prospective users, biologists, and computer scientists. See http://gmod.org/wiki/January_2010_GMOD_Meeting for an idea of what goes on at GMOD meetings.

GMOD is a collection of interoperable open source software components for managing, visualizing and annotating biological data. GMOD incorporates many widely used tools, including GBrowse and JBrowse for genome browsing, InterMine and BioMart for data mining, Galaxy and Ergatis for workflow, Chado for data management, GBrowse_syn and CMap for comparative genomics, plus many other tools (Apollo, MAKER, Pathway Tools, Textpresso, ...). GMOD is also an active community of researchers and developers addressing common challenges in exploiting their data.

If you are struggling to fully exploit your data then please consider attending GMOD Europe 2010.

Please let us know if you have any questions, and we hope to see you in Cambridge.
Thanks, Scott Cain and Dave Clements -- http://gmod.org/wiki/GMOD_News http://gmod.org/wiki/GMOD_Evo_Hackathon http://gmod.org/wiki/GMOD_Europe_2010 http://gmod.org/wiki/Help_Desk_Feedback From abhishek.vit at gmail.com Thu Aug 5 18:15:56 2010 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Thu, 5 Aug 2010 18:15:56 -0400 Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl Message-ID: Hi All Just wondering if there is any Picard wrapper/s available in Bioperl. Thanks! -Abhi ----------------------------- Abhishek Pratap Bioinformatics Software Engineer II Genomics Resource Center Institute for Genome Sciences School of Medicine, Univ of Maryland 801, W. Baltimore Street, Baltimore, MD 21209 Ph: (+1)-410-706-2296 www.igs.umaryland.edu/ From Russell.Smithies at agresearch.co.nz Thu Aug 5 18:37:46 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 6 Aug 2010 10:37:46 +1200 Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl In-Reply-To: References: Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F02262E96@exchsth.agresearch.co.nz> Might be part of the "Enterprise" package. If not, some developer should "make it so". :-) --Russell (I hate Fridays) > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Abhishek Pratap > Sent: Friday, 6 August 2010 10:16 a.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl > > Hi All > > Just wondering if there is any Picard wrapper/s available in Bioperl. > > > Thanks! > -Abhi > > ----------------------------- > Abhishek Pratap > Bioinformatics Software Engineer II > Genomics Resource Center > Institute for Genome Sciences > School of Medicine, Univ of Maryland > 801, W. 
Baltimore Street, Baltimore, MD 21209 > Ph: (+1)-410-706-2296 > www.igs.umaryland.edu/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at illinois.edu Thu Aug 5 19:10:16 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 5 Aug 2010 18:10:16 -0500 Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl In-Reply-To: References: Message-ID: <26E3E5B6-47CF-4744-9687-199C218B5571@illinois.edu> Picard uses samtools, which has a perl API: http://search.cpan.org/dist/Bio-SamTools/ which uses BioPerl. Ah, the circle of life... chris On Aug 5, 2010, at 5:15 PM, Abhishek Pratap wrote: > Hi All > > Just wondering if there is any Picard wrapper/s available in Bioperl. > > > Thanks! > -Abhi > > ----------------------------- > Abhishek Pratap > Bioinformatics Software Engineer II > Genomics Resource Center > Institute for Genome Sciences > School of Medicine, Univ of Maryland > 801, W. 
Baltimore Street, Baltimore, MD 21209
> Ph: (+1)-410-706-2296
> www.igs.umaryland.edu/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From dan.kortschak at adelaide.edu.au Thu Aug 5 21:06:45 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Fri, 06 Aug 2010 10:36:45 +0930
Subject: [Bioperl-l] MUMmer parser work
Message-ID: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>

Hello Everyone,

I've just noticed the absence of a MUMmer parser and thought that it might be a worthwhile contribution to bioperl-run (I won't be able to start on this for a while, but given Mark's excellent work on CommandExts, it shouldn't take too long to get up when I do have time).
Has > anyone made any effort in this direction that I would be stepping on, or > if they have left it, that I could pick up to shorten the work time? > > cheers > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From greg at ebi.ac.uk Fri Aug 6 05:47:21 2010 From: greg at ebi.ac.uk (Gregory Jordan) Date: Fri, 6 Aug 2010 10:47:21 +0100 Subject: [Bioperl-l] call for a TreeIO volunteer In-Reply-To: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se> References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se> Message-ID: I can help out with these. I'm pretty sure I've previously fought with (and perhaps even come up with a fix for) bug 3039, and I can take a look at 3007 too. Now lemme just see if I can get up and running with the Bioperl test suite. I'll give a shout if I run into any problems. Cheers, Greg On 5 August 2010 13:16, Dave Messina wrote: > Hi everybody, > > We've got a couple of small open bugs related to the Bio::TreeIO modules, > and we could really use someone to take a look at them. Ideally, that > someone would have familiarity with TreeIO already.* > > It'd help us to get the next release (1.6.2) out the door. > > The bugs in question are: > - TreeIO::newick writes root node branch length incorrectly > http://bugzilla.open-bio.org/show_bug.cgi?id=3039 > > - Bio::TreeIO::nhx cannot parse empty [&&NHX] + round-trip failure > http://bugzilla.open-bio.org/show_bug.cgi?id=3007 > > > Thanks, > Dave > on behalf of the core developers > > > * Even if you don't, though, if you've been looking for an opportunity to > contribute to BioPerl, and this sounds like something you'd like to work on, > by all means raise your hand. 
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From jun.yin at ucd.ie Fri Aug 6 06:52:14 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 06 Aug 2010 11:52:14 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
Message-ID: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>

Hi, all,

I am the Google Summer of Code student working on refactoring the Bio::Align subsystem. I recently implemented several packages for retrieving online alignment sequences.

The aim of the packages is to provide convenient methods for BioPerl users to retrieve online alignment sequences. The alignment sequences are converted into a Bio::SimpleAlign object after retrieval, which is easy to manipulate and write to local disk. Currently the packages support the Pfam, Rfam, Prosite and Entrez Protein Clusters databases.

Here is the structure of the packages:

Packages
  Bio::DB::Align               (interface, and calling other packages)
  Bio::DB::Align::Pfam         (retrieving alignment from Pfam)
  Bio::DB::Align::Rfam         (retrieving alignment from Rfam)
  Bio::DB::Align::Prosite      (retrieving alignment from Prosite)
  Bio::DB::Align::ProtClustDB  (retrieving alignment from Entrez Protein Clusters Database)

Usually four methods are provided for each package:

Methods
  get_Aln_by_id   (retrieving alignment by id and returns Bio::SimpleAlign object)
  get_Aln_by_acc  (retrieving alignment by accession and returns Bio::SimpleAlign object) (Rfam and Prosite only support this method)
  id2acc          (id to accession conversion)
  acc2id          (accession to id conversion)

These packages depend on LWP::UserAgent, HTTP::Request and Bio::DB::GenericWebAgent. Bio::DB::Align::ProtClustDB also depends on Bio::DB::EUtilities.
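The layout described above, with Bio::DB::Align acting as a front end that hands off to a database-specific package, could be sketched roughly as follows. This is not the code from the GSoC branch: the My::Align package names and the stub method bodies are hypothetical stand-ins, so that the example runs without network access or a BioPerl installation.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the database-specific packages.
package My::Align::Pfam;
sub new            { my ($class, %args) = @_; return bless {%args}, $class }
sub get_Aln_by_acc { my ($self, $acc) = @_; return "Pfam alignment for $acc" }

package My::Align::Rfam;
sub new            { my ($class, %args) = @_; return bless {%args}, $class }
sub get_Aln_by_acc { my ($self, $acc) = @_; return "Rfam alignment for $acc" }

# Front end: maps the -db argument to a backend class, in the spirit of
# Bio::DB::Align->new(-db => "rfam").
package My::Align;
my %BACKEND = (pfam => 'My::Align::Pfam', rfam => 'My::Align::Rfam');

sub new {
    my ($class, %args) = @_;
    my $db = lc($args{-db} // '');
    my $backend = $BACKEND{$db}
        or die "Unknown database '$db'; expected one of: "
             . join(', ', sort keys %BACKEND) . "\n";
    return $backend->new(%args);    # caller receives the backend object
}

package main;

my $db = My::Align->new(-db => 'rfam');
print ref($db), "\n";                          # prints My::Align::Rfam
print $db->get_Aln_by_acc('RF00001'), "\n";    # prints Rfam alignment for RF00001
```

Keeping the mapping in one hash means a new database only needs a new package and one extra entry, and an unknown -db value fails early with a readable message.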
Calling the packages can be:

my $dbobj=Bio::DB::Align->new(-db=>"rfam");

Or,

my $dbobj= Bio::DB::Align::Pfam->new();
my $aln=$dbobj->get_Aln_by_acc("RF0001");
my $aln2=$dbobj->get_Aln_by_acc(-accession=>"RF0001",-alignment=>"full");
print $aln->length();
foreach my $seq ($aln->each_Seq) {
    #do something
}

I have done some tests on these packages. And, I will write them into standard tests later.

Any suggestions on these packages are welcome.

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

From David.Messina at sbc.su.se Fri Aug 6 08:59:19 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 6 Aug 2010 14:59:19 +0200
Subject: [Bioperl-l] call for a TreeIO volunteer
In-Reply-To:
References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>
Message-ID: <6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>

> I can help out with these. I'm pretty sure I've previously fought with (and perhaps even come up with a fix for) bug 3039, and I can take a look at 3007 too.

Awesome - thanks Greg!

> Now lemme just see if I can get up and running with the Bioperl test suite. I'll give a shout if I run into any problems.

Please do.

Dave

From David.Messina at sbc.su.se Fri Aug 6 09:06:47 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 6 Aug 2010 15:06:47 +0200
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
Message-ID:

Sounds great, Jun!

Did you happen to test your code on very large alignments? I know there's one in Pfam that's something like 100,000 sequences. An rRNA, I believe.
Dave

From jun.yin at ucd.ie Fri Aug 6 09:11:41 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 06 Aug 2010 14:11:41 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To:
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
Message-ID: <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>

Hi, Dave,

Thanks for reminding me of this. I will definitely try it.

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

-----Original Message-----
From: Dave Messina [mailto:David.Messina at sbc.su.se]
Sent: Friday, August 06, 2010 2:07 PM
To: Jun Yin
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences

Sounds great, Jun!

Did you happen to test your code on very large alignments? I know there's one in Pfam that's something like 100,000 sequences. An rRNA, I believe.

Dave

From cjfields at illinois.edu Fri Aug 6 09:19:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 6 Aug 2010 08:19:54 -0500
Subject: [Bioperl-l] call for a TreeIO volunteer
In-Reply-To: <6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>
References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se> <6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>
Message-ID: <8CB3DE9A-4C5C-42A3-94B4-8818D7143951@illinois.edu>

On Aug 6, 2010, at 7:59 AM, Dave Messina wrote:

>
>> I can help out with these. I'm pretty sure I've previously fought with (and perhaps even come up with a fix for) bug 3039, and I can take a look at 3007 too.
>
> Awesome - thanks Greg!
> > >> Now lemme just see if I can get up and running with the Bioperl test suite. I'll give a shout if I run into any problems. > > Please do. > > > > Dave Agreed, and thanks for helping out! chris From dianabowley at gmail.com Fri Aug 6 18:33:57 2010 From: dianabowley at gmail.com (DRBowley) Date: Fri, 6 Aug 2010 15:33:57 -0700 (PDT) Subject: [Bioperl-l] BioPerl install issues Message-ID: I'm new to both perl and bioperl and I'm having issues installing bioperl. I'm trying to install on a Mac OS 10.6.4, and I've already installed perl (5.10.0). I tried installing using the recommended approach for Mac - via Fink... "fink install bioperl-pm5100" Looking back over the terminal window text it looks like the problem is: "This package requires Module::Build v0.2805 or greater to install itself." I tried doing "fink selfupdate" and that did not fix the problem. Any suggestions? Thanks! Diana From Kevin.M.Brown at asu.edu Fri Aug 6 18:50:45 2010 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Fri, 6 Aug 2010 15:50:45 -0700 Subject: [Bioperl-l] BioPerl install issues In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B406E44A05@EX02.asurite.ad.asu.edu> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPE RL_THE_EASY_WAY_USING_Build.PL Not sure why you had to install perl since it should have been part of the stock OSX install (or at least it was last time I logged onto a mac). Not sure why the Fink method has so many issues, but might try the above which works for linux or bsd. -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of DRBowley Sent: Friday, August 06, 2010 3:34 PM To: bioperl-l at bioperl.org Subject: [Bioperl-l] BioPerl install issues I'm new to both perl and bioperl and I'm having issues installing bioperl. I'm trying to install on a Mac OS 10.6.4, and I've already installed perl (5.10.0). 
I tried installing using the recommended approach for Mac - via Fink... "fink install bioperl-pm5100" Looking back over the terminal window text it looks like the problem is: "This package requires Module::Build v0.2805 or greater to install itself." I tried doing "fink selfupdate" and that did not fix the problem. Any suggestions? Thanks! Diana _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From skastu01 at students.poly.edu Fri Aug 6 20:03:50 2010 From: skastu01 at students.poly.edu (Lakshmi Kastury) Date: Sat, 7 Aug 2010 00:03:50 +0000 Subject: [Bioperl-l] BioPerl install issues Message-ID: Hi - I went through several failed attempts on MACOS Snow Leopard, and fink was a dead end. Eventually I succeeded to install on Windows Vista using CPAN. I am not sure if this method will work with MACOS: 1. Opened command prompt. 2. Typed command: >perl -MCPAN -e "install Bundle::BioPerl" 3. Answered yes to the series of questions, which prompts install of several bundles and a compiler. The instructions were in a link from: http://bioperl.org/Core/Latest/INSTALL All the best, Lakshmi > Date: Fri, 6 Aug 2010 15:33:57 -0700 > From: dianabowley at gmail.com > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] BioPerl install issues > > I'm new to both perl and bioperl and I'm having issues installing > bioperl. I'm trying to install on a Mac OS 10.6.4, and I've already > installed perl (5.10.0). I tried installing using the recommended > approach for Mac - via Fink... > "fink install bioperl-pm5100" > > Looking back over the terminal window text it looks like the problem > is: > "This package requires Module::Build v0.2805 or greater to install > itself." > > I tried doing "fink selfupdate" and that did not fix the problem. > > Any suggestions? > > Thanks! 
> Diana > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Sat Aug 7 02:47:40 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 7 Aug 2010 08:47:40 +0200 Subject: [Bioperl-l] BioPerl install issues In-Reply-To: References: Message-ID: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se> On Aug 7, 2010, at 02:03 , Lakshmi Kastury wrote: > I am not sure if this method will work with MACOS: It will. CPAN is cross-platform and is the best way to install BioPerl. Dave From cjfields at illinois.edu Sat Aug 7 09:58:56 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 7 Aug 2010 08:58:56 -0500 Subject: [Bioperl-l] BioPerl install issues In-Reply-To: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se> References: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se> Message-ID: It should work fine. Even installing from trunk right now works w/o failing tests. chris On Aug 7, 2010, at 1:47 AM, Dave Messina wrote: > > On Aug 7, 2010, at 02:03 , Lakshmi Kastury wrote: > >> I am not sure if this method will work with MACOS: > > It will. CPAN is cross-platform and is the best way to install BioPerl. > > > Dave > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From greg at ebi.ac.uk Sat Aug 7 17:14:58 2010 From: greg at ebi.ac.uk (Gregory Jordan) Date: Sat, 7 Aug 2010 22:14:58 +0100 Subject: [Bioperl-l] Packages retrieving online alignment sequences In-Reply-To: <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie> References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie> <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie> Message-ID: Maybe I'm just a bit naive here, but what is the expected difference between accession and ID and why do we need a separate method for each? 
Seems to me that one could just have a single method, get_Aln, which determines under the hood whether the query string is an accession or ID. It would be nice if the SimpleAlign object had its Annotation filled with some extra metadata (such as accession, ID, database version number, URI, etc.). One other thing: have you thought about adding an Ensembl adaptor? Or maybe something similar already exists in BioPerl...? Sure Ensembl provides their own Perl API, but for someone who doesn't want to go through the hassle of installing it from CVS (pardon my french, but wtf!?! Who still uses CVS) and learning a whole new API, it might be convenient to have a simple BioPerl module for quickly grabbing gene family alignments from the public Ensembl MySQL databases. I'd be willing to help write the necessary SQL queries for this. greg On 6 August 2010 14:11, Jun Yin wrote: > Hi, Dave, > > Thx for reminding me this. I will definitely try it. > > Cheers, > Jun Yin > Ph.D. student in U.C.D. > > Bioinformatics Laboratory > Conway Institute > University College Dublin > > > -----Original Message----- > From: Dave Messina [mailto:David.Messina at sbc.su.se] > Sent: Friday, August 06, 2010 2:07 PM > To: Jun Yin > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences > > Sounds great, Jun! > > Did you happen to test your code on very large alignments? I know there's > one in Pfam that's something like 100,000 sequences. An rRNA, I believe. > > > Dave > > > __________ Information from ESET Smart Security, version of virus signature > database 5346 (20100806) __________ > > The message was checked by ESET Smart Security. > > http://www.eset.com > > > > > __________ Information from ESET Smart Security, version of virus signature > database 5346 (20100806) __________ > > The message was checked by ESET Smart Security. 
> > http://www.eset.com > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Sat Aug 7 18:07:39 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 7 Aug 2010 17:07:39 -0500 Subject: [Bioperl-l] Packages retrieving online alignment sequences In-Reply-To: References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie> <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie> Message-ID: <21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu> On Aug 7, 2010, at 4:14 PM, Gregory Jordan wrote: > Maybe I'm just a bit naive here, but what is the expected difference between > accession and ID and why do we need a separate method for each? Depends on the remote service, but in many cases there is a difference. With NCBI eutils you can have either an accession and the unique identifier (UID, or GI for nuc/protein seqs). efetch can use both, but only the UID is guaranteed to retrieve a single sequence all the time; the accession can (very rarely) map to more than one sequence. The other eutils services require either a string (esearch) or a UID, but do not allow an accession. > Seems to me > that one could just have a single method, get_Aln, which determines under > the hood whether the query string is an accession or ID. A simpler method could be introduced, but I can see that being potentially brittle in the long run. A naked alphanumeric string doesn't reveal much about what it is at face value w/o knowing database/service-specific behavior. And then we're reliant on that behavior not changing, which we can't guarantee (this has bitten us in the past). What would one do if NCBI (for instance) allowed accessions derived completely of digits, or conversely a unique ID with mixed alphanumerics? 
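The brittleness described here can be made concrete with a small sketch. Everything in it is hypothetical: guess_identifier_type stands in for whatever heuristic a single get_Aln method would need, and My::AlignDB and My::EitherDB are stub classes, not real BioPerl modules or any service's actual rules.

```perl
use strict;
use warnings;

# Hypothetical heuristic of the kind a single get_Aln() would need:
# treat all-digit strings as UIDs and anything else as an accession.
sub guess_identifier_type {
    my ($query) = @_;
    return $query =~ /^\d+$/ ? 'id' : 'accession';
}

# Works only while the service keeps its current conventions; an
# all-digit accession would be silently treated as an id.
print guess_identifier_type('12345'),   "\n";    # prints id
print guess_identifier_type('PF00001'), "\n";    # prints accession

# Explicit methods avoid the guess: the caller states what the string is.
package My::AlignDB;
sub new            { return bless {}, shift }
sub get_Aln_by_id  { my ($self, $id)  = @_; return "lookup by id: $id" }
sub get_Aln_by_acc { my ($self, $acc) = @_; return "lookup by accession: $acc" }

# Where a service accepts either form, one method can alias the other.
package My::EitherDB;
our @ISA = ('My::AlignDB');
{
    no warnings 'once';
    *get_Aln_by_acc = \&My::AlignDB::get_Aln_by_id;
}

package main;

print My::AlignDB->new->get_Aln_by_acc('12345'), "\n";   # prints lookup by accession: 12345
```

With the explicit methods, a change in a database's naming scheme affects only callers that pass the wrong kind of string, rather than every lookup that goes through the heuristic.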
Using methods specific for ID/acc at least guarantees a behavior on the backend w/o guessing, and if there is no danger of overlap (a service accepts either/or) one could simply be an alias of the other. > It would be nice if the SimpleAlign object had its Annotation filled with > some extra metadata (such as accession, ID, database version number, URI, > etc.). According to the deobfuscator SimpleAlign does have accession() and id(). The others could be simple attributes, and can be added as simple getter/setters, or as annotation via Bio::Annotation (this is the way Stockholm annotation is currently handled). Something to think about. > One other thing: have you thought about adding an Ensembl adaptor? Or maybe > something similar already exists in BioPerl...? That's a good idea, though it might make more sense if this was done when mem-efficient (possibly DB-dependent) AlignI modules are present within bioperl, which is part of the GSoC (see below). For instance, have a Bio::Align::AlignI with a backend ensembl DB adaptor that works lazily. If using the Ensembl Perl API, a few possible roadblocks/problems might pop up. Ensembl currently requires bioperl (v1.2.3, but it works with the latest as well, at least when I've used it). If using the ensembl perl API we would just need to ensure we aren't conflicting with ensembl code that pulls in bioperl classes expecting a v1.2.3 API when we only support the latest. I don't foresee this being an issue, though (there is precedent for this, see Sendu's Ensembl module Bio::Tools::Run::Ensembl in bioperl-run). > Sure Ensembl provides their own Perl API, but for someone who doesn't want > to go through the hassle of installing it from CVS (pardon my french, but > wtf!?! Who still uses CVS) and learning a whole new API, it might be > convenient to have a simple BioPerl module for quickly grabbing gene family > alignments from the public Ensembl MySQL databases. 
I'd be willing to help > write the necessary SQL queries for this. > > greg The GSoC project on alignment subsystem refactoring will be finishing up this month, so I'm sure Jun can discuss ideas for initial DB-dependent implementations. The more input and coders implementing the better, IMO. As for writing up an adaptor to ensembl outside of its API, overall I don't think it's a bad idea, but if it's possible maybe start without reinventing things, then move to direct SQL. Unless it's easier to use SQL. chris > On 6 August 2010 14:11, Jun Yin wrote: > >> Hi, Dave, >> >> Thx for reminding me this. I will definitely try it. >> >> Cheers, >> Jun Yin >> Ph.D. student in U.C.D. >> >> Bioinformatics Laboratory >> Conway Institute >> University College Dublin >> >> >> -----Original Message----- >> From: Dave Messina [mailto:David.Messina at sbc.su.se] >> Sent: Friday, August 06, 2010 2:07 PM >> To: Jun Yin >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences >> >> Sounds great, Jun! >> >> Did you happen to test your code on very large alignments? I know there's >> one in Pfam that's something like 100,000 sequences. An rRNA, I believe. >> >> >> Dave >>
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hartzell at alerce.com Sat Aug 7 17:45:04 2010 From: hartzell at alerce.com (George Hartzell) Date: Sat, 7 Aug 2010 14:45:04 -0700 Subject: [Bioperl-l] BioPerl install issues In-Reply-To: References: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se> Message-ID: <19549.54240.499140.501136@gargle.gargle.HOWL> Chris Fields writes: > It should work fine. Even installing from trunk right now works > w/o failing tests. As a slight aside, if you're looking to build a current perl binary for your mac (e.g. 5.12.1) you should take a look at perlbrew (http://search.cpan.org/dist/App-perlbrew/). The three steps at the top of the installation section of the README are all you need to get going. Even a manager can do it. If you're using bash on the mac via terminal you'll probably want to put the one-liner they prescribe into your .bash_profile instead of your .bashrc, but everything else just flows right along. Once you have that in place you have a nicely isolated system into which you can install things to your heart's content without worrying about PERL5LIB and local::lib and the rest. g. From cjfields at illinois.edu Sat Aug 7 21:19:54 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 7 Aug 2010 20:19:54 -0500 Subject: [Bioperl-l] BioPerl install issues In-Reply-To: <19549.54240.499140.501136@gargle.gargle.HOWL> References: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se> <19549.54240.499140.501136@gargle.gargle.HOWL> Message-ID: On Aug 7, 2010, at 4:45 PM, George Hartzell wrote: > Chris Fields writes: >> It should work fine. Even installing from trunk right now works >> w/o failing tests.
> > As a slight aside, if you're looking to build a current perl binary > for your mac (e.g. 5.12.1) you should take a look at perlbrew > (http://search.cpan.org/dist/App-perlbrew/). The three steps at the > top of the installation section of the README are all you need to get > going. Even a manager can do it. > > If you're using bash on the mac via terminal you'll probably want to > put the one-liner they prescribe into your .bash_profile instead of > your .bashrc, but everything else just flows right along. > > Once you have that in place you have a nicely isolated system into > which you can install things to your hearts content without worrying > about PERL5LIB and local::lib and the rest. > > g. Have to second using perlbrew, started using it for my local Ubuntu installation (don't have it running on my macbook yet, but it's in the plans). chris From greg at ebi.ac.uk Sun Aug 8 02:12:41 2010 From: greg at ebi.ac.uk (Gregory Jordan) Date: Sun, 8 Aug 2010 07:12:41 +0100 Subject: [Bioperl-l] Packages retrieving online alignment sequences In-Reply-To: <21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu> References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie> <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie> <21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu> Message-ID: On 7 August 2010 23:07, Chris Fields wrote: > > A simpler method could be introduced, but I can see that being potentially > brittle in the long run. A naked alphanumeric string doesn't reveal much > about what it is at face value w/o knowing database/service-specific > behavior. And then we're reliant on that behavior not changing, which we > can't guarantee (this has bitten us in the past). What would one do if NCBI > (for instance) allowed accessions derived completely of digits, or > conversely a unique ID with mixed alphanumerics? 
> > Using methods specific for ID/acc at least guarantees a behavior on the > backend w/o guessing, and if there is no danger of overlap (a service > accepts either/or) one could simply be an alias of the other. > Thanks for the clarification on IDs vs accessions. As long as the behavior and distinction are well-documented, I'm sure it won't make too much of a difference. My main concern was just that having two similar methods -- with no clearly laid out distinction between the two and one of them only supported by half of the implementing subclasses -- might confuse potential users. As a point of reference: both Rfam and Pfam allow either an ID or an accession in their front-page search interface (http://www.pfam.org / http://www.rfam.org/). In fact, they seem to entirely hide the distinction between ID and Accession from the end user; nowhere on the Rfam page for an individual result is it clear which string is the accession and which is the ID (http://rfam.sanger.ac.uk/family/snoZ107_R87). Thus, a potential user of the Rfam module wouldn't know whether to call the get_by_ID or get_by_Accession method, even after looking at the Rfam page for his / her desired alignment! As you can probably tell, I'm all in favor of a unified search whenever feasible / possible. :-) > As for writing up an adaptor to ensembl outside of it's API, overall I > don't think it's a bad idea, but if it's possible maybe start without > reinventing things, then move to direct SQL. Unless it's easier to use SQL. > > For fetching Ensembl's gene family alignments, using the SQL will be easiest. They don't tend to get unreasonably large in terms of memory -- I think the biggest tend to be ~700 sequences with a few thousand alignment columns or so -- and it's a simple table join or two to get both the tree and alignment from the database. For genomic alignments, I agree that a more memory-efficient and/or lazy backend would be necessary. 
And it's pretty much impossible to get those things out of the Ensembl tables without using their API. --greg From dan.kortschak at adelaide.edu.au Sun Aug 8 20:53:43 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Mon, 09 Aug 2010 10:23:43 +0930 Subject: [Bioperl-l] MUMmer parser work In-Reply-To: <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu> References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au> <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu> Message-ID: <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au> Hi Chris, Is that set of files planned to be included in the git repository on bioperl-live? I don't want to push something that is being organised by someone else. cheers Dan On Thu, 2010-08-05 at 22:13 -0500, Chris Fields wrote: > Dan, > > Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2701 > > It currently lacks significant tests, so feel free to chip in there as needed. > > chris From genehack at genehack.org Sun Aug 8 21:42:27 2010 From: genehack at genehack.org (John SJ Anderson) Date: Sun, 8 Aug 2010 21:42:27 -0400 Subject: [Bioperl-l] MUMmer parser work In-Reply-To: <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au> References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au> <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu> <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org> I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. j. On Aug 8, 2010, at 20:53 , Dan Kortschak wrote: > Hi Chris, > > Is that set of files planned to be included in the git repository on > bioperl-live? I don't want to push something that is being organised by > someone else. 
> > cheers > Dan > > On Thu, 2010-08-05 at 22:13 -0500, Chris Fields wrote: >> Dan, >> >> Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in: >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2701 >> >> It currently lacks significant tests, so feel free to chip in there as needed. >> >> chris > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From dan.kortschak at adelaide.edu.au Sun Aug 8 22:03:52 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Mon, 09 Aug 2010 11:33:52 +0930 Subject: [Bioperl-l] MUMmer parser work In-Reply-To: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org> References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au> <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu> <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au> <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org> Message-ID: <1281319432.2414.49.camel@zoidberg.mbs.adelaide.edu.au> Excellent. Thanks for that. Dan On Sun, 2010-08-08 at 21:42 -0400, John SJ Anderson wrote: > I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. > > j. From cjfields at illinois.edu Mon Aug 9 22:40:07 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Aug 2010 21:40:07 -0500 Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio Message-ID: Any objections to moving the Bio directory to lib/Bio in bioperl-live? It's a more standard location for code in most distributions; I have a branch (topic/cjfields_standard_lib) that has this working, though it's possible that it needs more work. 
chris From genehack at genehack.org Tue Aug 10 04:30:44 2010 From: genehack at genehack.org (John SJ Anderson) Date: Tue, 10 Aug 2010 04:30:44 -0400 Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio In-Reply-To: References: Message-ID: On Aug 9, 2010, at 22:40 , Chris Fields wrote: > Any objections to moving the Bio directory to lib/Bio in bioperl-live? +1 on this idea. j. From genehack at genehack.org Tue Aug 10 07:21:51 2010 From: genehack at genehack.org (John Anderson) Date: Tue, 10 Aug 2010 07:21:51 -0400 Subject: [Bioperl-l] MUMmer parser work In-Reply-To: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org> References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au> <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu> <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au> <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org> Message-ID: <7A4F93AB-1BF7-4775-BC0E-38E7B431ECC6@genehack.org> On Aug 8, 2010, at 9:42 PM, John SJ Anderson wrote: > I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. Okay, the files have been added to topic/bug-2701 -- see . Please note, these are just the files from the bug report, slotted into the appropriate spots. I haven't reviewed the code or done anything about the non-BioPerl-y tests or the general lack of test coverage. I hope to do something about that in the coming week, but if somebody beats me to it, that would be okay too. j. From maj at fortinbras.us Tue Aug 10 19:52:05 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Tue, 10 Aug 2010 19:52:05 -0400 Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio In-Reply-To: References: Message-ID: <1C55239986494A8D82BDC21A85B324E9@NewLife> +1 ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Monday, August 09, 2010 10:40 PM Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio > Any objections to moving the Bio directory to lib/Bio in bioperl-live? It's a > more standard location for code in most distributions; I have a branch > (topic/cjfields_standard_lib) that has this working, though it's possible that > it needs more work. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From fayroz_farouk at yahoo.com Sun Aug 8 04:24:31 2010 From: fayroz_farouk at yahoo.com (fayroz) Date: Sun, 8 Aug 2010 01:24:31 -0700 (PDT) Subject: [Bioperl-l] using HMMER Message-ID: <603590.1072.qm@web112620.mail.gq1.yahoo.com> I need your help. I am a new Perl user and want to use BioPerl modules to run the HMMER program (hmmsearch). I have "model.hmm" and a "fasta file", and want to see which of the sequences are similar to the model. I wrote this code, but there is a problem: #!/usr/local/bin/perl -w use Bio::AlignIO; use Bio::SearchIO; use Bio::SeqIO ; use Bio::Tools::Run::Hmmer; # run hmmsearch (similar for hmmpfam) my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'h6_avian.hmm',-informat => 'fasta'); my $seq = Bio::SeqIO->new('-file'=> "one_seq.fa", '-format'=>'Fasta'); # Pass the factory a Bio::Seq object or a file name, returns a Bio::SearchIO my $searchio = $factory->hmmsearch($seq); while (my $result = $searchio->next_result){ while(my $hit = $result->next_hit){ while (my $hsp = $hit->next_hsp){ print join("\t", ( $result->query_name, $hsp->query->start, $hsp->query->end, $hit->name, $hsp->hit->start, $hsp->hit->end, $hsp->score, $hsp->evalue, $hsp->seq_str, )), "\n"; } } } exceptions: MSG: Unknown kind of
input 'Bio::SeqIO::fasta=HASH(0x329a504)' STACK Bio::Tools::Run::Hmmer::_setinput D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:381 STACK Bio::Tools::Run::Hmmer::hmmsearch D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:352 STACK toplevel test_bioperl.pl:12 thank you fayroz From douglas.hoen at gmail.com Tue Aug 10 21:54:53 2010 From: douglas.hoen at gmail.com (Douglas Hoen) Date: Tue, 10 Aug 2010 21:54:53 -0400 Subject: [Bioperl-l] Bio::SeqFeature::SimilarityPair->from_searchResult()? Message-ID: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com> Hi, I was wondering why the Synopsis in the docs for Bio::SeqFeature::SimilarityPair has the following: $sim_pair = Bio::SeqFeature::SimilarityPair->from_searchResult($blastHit); There doesn't actually seem to be a from_searchResult method. Am I missing something? Thanks, -- Doug From zhaoy at mail.cbi.pku.edu.cn Wed Aug 11 04:17:42 2010 From: zhaoy at mail.cbi.pku.edu.cn (zhaoy at mail.cbi.pku.edu.cn) Date: Wed, 11 Aug 2010 16:17:42 +0800 (CST) Subject: [Bioperl-l] About extracting sequence from genewise format result Message-ID: <53663.162.105.250.100.1281514662.squirrel@mail.cbi.pku.edu.cn> Dear authors: Hello! I have recently been trying to parse genewise output to extract the nuclear sequence using the "hit_string" method of the "SearchIO" module; however, the result is empty. Worse, the loop does not seem to work, because I always get only the last result. I'm confused.
My perl code is shown below: #!/usr/bin/perl -w use strict; use warnings; use Bio::SearchIO; my $in = new Bio::SearchIO(-format => 'wise', -wisetype => 'genewise', -file => 'test'); while( my $result = $in->next_result ) { while (my $hit = $result->next_hit) { while (my $hsp = $hit->next_hsp){ print "Query=", $result->query_name, "\n", "Length=", $hsp->length('total'),"\n", "hit_string:", $hsp->hit_string, "\n"; } } } And one of the genewise format results is shown below: genewise $Name: wise2-4-0alpha $ (unreleased release) This program is freely distributed under a GPL. See source directory Copyright (c) GRL limited: portions of the code are from separate copyright Query protein: Cpa_s110_24 Comp Matrix: BLOSUM62.bla Gap open: 12 Gap extension: 2 Start/End global Target Sequence Bdi_chr3:38292015..38292302 Strand: forward Start/End (protein) global Gene Parameter file: gene.stat Splice site model: GT/AG only Codon Table: codon.table Subs error: 1e-06 Indel error: 1e-06 Null model syn Algorithm 623 genewise output Score 37.97 bits over entire alignment Scores as bits over a synchronous coding model Warning: The bits scores is not probablistically correct for single seqs See WWW help for more info Cpa_s110_24 1 MGNCQAVDAATLAIQHPS-GKVDRLYWPVSASEVMRTNPGHYVALLI-- MGNCQA DAA + IQHP+ GKV+RLYWP +A++VMR NPGHYVAL++ MGNCQAADAAAVVIQHPAEGKVERLYWPATAADVMRKNPGHYVALVVVH Bdi_chr3:382920 1 agatcggggggggacccgggaggccttcgaggggacaacgctggcgggc tgagaccaccctttaaccagatagtagcccccattgaacgaatctttta gctcgggtggcggcgcgcgggcgcccggccgcccgcgcccccccccccc Cpa_s110_24 47 ----STTLCPSNSNASNAESVRVTRIKLLRPTDTLVLGQVYRLITTQEV P+ + A + R+T++KLL+P DTL++GQVYRLIT+Q VSGGAGETDPAVAGGGAAAAARITKVKLLKPRDTLLIGQVYRLITSQ-- Bdi_chr3:382920 148 gtgggggagcgggggggggggaaaagaccaccgaccagcgtccaatc tcggcgacacctcgggcccccgtcatattacgactttgatagttcca cctcctgtcccacaaaattccgccgcgccgcgctgcccgccccccca Cpa_s110_24 92 MKGLWAKKCAKMKKYQEADHKDGLKPETIPGRRSGPERDTQVAKHERHR ------------------------------------------------- Bdi_chr3:382920 289 
Cpa_s110_24 141 SRVAASTNQAGLKSRTWQPSLKSISEAAS ----------------------------- Bdi_chr3:382920 289 // Gene 1 Gene 1 288 Exon 1 288 phase 0 Supporting 1 54 1 18 Supporting 58 141 19 46 Supporting 160 288 47 89 // ...... The part of output of this code is shown below: Query=Aly_481360 Length=0 hit_string: Query=Aly_481360 Length=0 hit_string: ...... What's wrong with my code and how can I get the correct result? I'm looking forward to your reply. Thanks very much! Best regards, Zackaly From roy.chaudhuri at gmail.com Wed Aug 11 10:32:39 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Wed, 11 Aug 2010 15:32:39 +0100 Subject: [Bioperl-l] using HMMER In-Reply-To: <603590.1072.qm@web112620.mail.gq1.yahoo.com> References: <603590.1072.qm@web112620.mail.gq1.yahoo.com> Message-ID: <4C62B487.9090103@gmail.com> Hi Fayroz, Your $seq variable contains a Bio::SeqIO object (a biological filehandle), not a Bio::Seq (sequence object). You need to change that line to: my $seqio = Bio::SeqIO->new(-file=>'one_seq.fa', -format=>'fasta'); my $seq=$seqio->next_seq; If you have multiple sequences in the file, then you will need to loop over them: while (my $seq=$seqio->next_seq) { # Code to run Hmmer goes here } Also, I don't think you need to specify -informat for your Bio::Tools::Run::Hmmer object, since you're passing it a sequence object, not a filename. Hope this helps. Roy. 
On 08/08/2010 09:24, fayroz wrote: > i need your help, i am a new perl user and want to use bioperl modules to run > HMMER program ( HMMsearch) i have" model.hmm" and a "fasta file" to see which of > them are similar with the model > i write this code but there is a problems > > #!/usr/local/bin/perl W > use Bio::AlignIO; > use Bio::SearchIO; > use Bio::SeqIO ; > use Bio::Tools::Run::Hmmer; > > # run hmmsearch (similar for hmmpfam) > my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'h6_avian.hmm',-informat => > 'fasta'); > my $seq = Bio::SeqIO->new('-file'=> "one_seq.fa", '-format'=>'Fasta'); > > # Pass the factory a Bio::Seq object or a file name, returns a Bio::SearchIO > my $searchio = $factory->hmmsearch($seq); > > while (my $result = $searchio->next_result){ > while(my $hit = $result->next_hit){ > while (my $hsp = $hit->next_hsp){ > print join("\t", ( $result->query_name, > $hsp->query->start, > $hsp->query->end, > $hit->name, > $hsp->hit->start, > $hsp->hit->end, > $hsp->score, > $hsp->evalue, > $hsp->seq_str, > )), "\n"; > } > } > } > > > exceptions: > MSG: Unknown kind of input 'Bio::SeqIO::fasta=HASH(0x329a504)' > STACK Bio::Tools::Run::Hmmer::_setinput > D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:381 > STACK Bio::Tools::Run::Hmmer::hmmsearch > D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:352 > STACK toplevel test_bioperl.pl:12 > thank you > > fayroz > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Aug 11 11:07:36 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 11 Aug 2010 10:07:36 -0500 Subject: [Bioperl-l] using HMMER In-Reply-To: <4C62B487.9090103@gmail.com> References: <603590.1072.qm@web112620.mail.gq1.yahoo.com> <4C62B487.9090103@gmail.com> Message-ID: <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu> might also want to check whether you are using hmmer2 vs hmmer3. 
not sure if the wrapper works for hmmer3. chris On Aug 11, 2010, at 9:32 AM, Roy Chaudhuri wrote: > Hi Fayroz, > > Your $seq variable contains a Bio::SeqIO object (a biological filehandle), not a Bio::Seq (sequence object). > > You need to change that line to: > my $seqio = Bio::SeqIO->new(-file=>'one_seq.fa', -format=>'fasta'); > my $seq=$seqio->next_seq; > > If you have multiple sequences in the file, then you will need to loop over them: > while (my $seq=$seqio->next_seq) { > # Code to run Hmmer goes here > } > > Also, I don't think you need to specify -informat for your Bio::Tools::Run::Hmmer object, since you're passing it a sequence object, not a filename. > > Hope this helps. > Roy. > > On 08/08/2010 09:24, fayroz wrote: >> i need your help, i am a new perl user and want to use bioperl modules to run >> HMMER program ( HMMsearch) i have" model.hmm" and a "fasta file" to see which of >> them are similar with the model >> i write this code but there is a problems >> >> #!/usr/local/bin/perl W >> use Bio::AlignIO; >> use Bio::SearchIO; >> use Bio::SeqIO ; >> use Bio::Tools::Run::Hmmer; >> >> # run hmmsearch (similar for hmmpfam) >> my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'h6_avian.hmm',-informat => >> 'fasta'); >> my $seq = Bio::SeqIO->new('-file'=> "one_seq.fa", '-format'=>'Fasta'); >> >> # Pass the factory a Bio::Seq object or a file name, returns a Bio::SearchIO >> my $searchio = $factory->hmmsearch($seq); >> >> while (my $result = $searchio->next_result){ >> while(my $hit = $result->next_hit){ >> while (my $hsp = $hit->next_hsp){ >> print join("\t", ( $result->query_name, >> $hsp->query->start, >> $hsp->query->end, >> $hit->name, >> $hsp->hit->start, >> $hsp->hit->end, >> $hsp->score, >> $hsp->evalue, >> $hsp->seq_str, >> )), "\n"; >> } >> } >> } >> >> >> exceptions: >> MSG: Unknown kind of input 'Bio::SeqIO::fasta=HASH(0x329a504)' >> STACK Bio::Tools::Run::Hmmer::_setinput >> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:381 >> STACK 
Bio::Tools::Run::Hmmer::hmmsearch >> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:352 >> STACK toplevel test_bioperl.pl:12 >> thank you >> >> fayroz >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From douglas.hoen at gmail.com Wed Aug 11 15:13:49 2010 From: douglas.hoen at gmail.com (Doug) Date: Wed, 11 Aug 2010 12:13:49 -0700 (PDT) Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA? Message-ID: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> Hi, I am trying to store in a SeqFeature::Store database the results of searches of translated DNA. The DB contains the original DNA sequences. For instance, I have done HMMER searches of 6-frame translations of the sequences stored in the DB. I want to store these results "at" their (equivalent) DNA positions, which I can calculate. Preferably, I would like to directly store the SeqFeature::Similarity objects that I get from parsing these searches. But they are of course located on different coordinate systems than the DNA, so I guess I can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct DNA position and then store the Similarity's as sub-SeqFeatures. I could just set the Similarity's position to the (calculated) DNA coordinates, or alternately make a new SeqFeature and copy in the attributes I want. But is there a more elegant solution? Thanks, -- Doug From douglas.hoen at gmail.com Wed Aug 11 16:11:26 2010 From: douglas.hoen at gmail.com (Doug) Date: Wed, 11 Aug 2010 13:11:26 -0700 (PDT) Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA? 
In-Reply-To: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> Message-ID: One possible answer to my own question: Use Bio::SeqFeature::PositionProxy's? Would this work? On Aug 11, 3:13 pm, Doug wrote: > Hi, > > I am trying to store in a SeqFeature::Store database the results of > searches of translated DNA. The DB contains the original DNA > sequences. For instance, I have done HMMER searches of 6-frame > translations of the sequences stored in the DB. I want to store these > results "at" their (equivalent) DNA positions, which I can calculate. > Preferably, I would like to directly store the SeqFeature::Similarity > objects that I get from parsing these searches. But they are of course > located on different coordinate systems than the DNA, so I guess I > can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct > DNA position and then store the Similarity's as sub-SeqFeatures. > > I could just set the Similarity's position to the (calculated) DNA > coordinates, or alternately make a new SeqFeature and copy in the > attributes I want. But is there a more elegant solution? > > Thanks, > -- Doug > _______________________________________________ > Bioperl-l mailing list > Bioper... at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Wed Aug 11 16:16:22 2010 From: scott at scottcain.net (Scott Cain) Date: Wed, 11 Aug 2010 16:16:22 -0400 Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA? In-Reply-To: References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> Message-ID: Hi Doug, I don't know if any of the things you've thought of would work; I've never tried it. My inclination would be to express your data in GFF3 and use the standard loader.
Scott On Wed, Aug 11, 2010 at 4:11 PM, Doug wrote: > One possible answer to my own question: Use > Bio::SeqFeature::PositionProxy's? Would this work? > > On Aug 11, 3:13 pm, Doug wrote: >> Hi, >> >> I am trying to store in a SeqFeature::Store database the results of >> searches of translated DNA. The DB contains the original DNA >> sequences. For instance, I have done HMMER searches of 6-frame >> translations of the sequences stored in the DB. I want to store these >> results "at" their (equivalent) DNA positions, which I can calculate. >> Preferably, I would like to directly store the SeqFeature::Similarity >> objects that I get from parsing these searches. But they are of course >> located on different coordinate systems than the DNA, so I guess I >> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct >> DNA position and then store the Similarity's as sub-SeqFeatures. >> >> I could just set the Similarity's position to the (calculated) DNA >> coordinates, or alternately make a new SeqFeature and copy in the >> attributes I want. But is there a more elegant solution? >> >> Thanks, >> -- Doug >> _______________________________________________ >> Bioperl-l mailing list >> Bioper... at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From douglas.hoen at gmail.com Wed Aug 11 16:38:54 2010 From: douglas.hoen at gmail.com (Doug) Date: Wed, 11 Aug 2010 13:38:54 -0700 (PDT) Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
In-Reply-To: References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> Message-ID: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> Hi Scott, Good idea. Would you happen to know of an existing HMMER3 to GFF3 converter? Thanks for your advice, -- Doug On Aug 11, 4:16 pm, Scott Cain wrote: > Hi Doug, > > I don't know if any of the things you've thought of would work; I've > never tried it. My inclination would be to express your data in GFF3 > and use the standard loader. > > Scott > > > > > > On Wed, Aug 11, 2010 at 4:11 PM, Doug wrote: > > One possible answer to my own question: Use > > Bio::SeqFeature::PositionProxy's? Would this work? > > > On Aug 11, 3:13 pm, Doug wrote: > >> Hi, > > >> I am trying to store in a SeqFeature::Store database the results of > >> searches of translated DNA. The DB contains the original DNA > >> sequences. For instance, I have done HMMER searches of 6-frame > >> translations of the sequences stored in the DB. I want to store these > >> results "at" their (equivalent) DNA positions, which I can calculate. > >> Preferably, I would like to directly store the SeqFeature::Similarity > >> objects that I get from parsing these searches. But they are of course > >> located on different coordinate systems than the DNA, so I guess I > >> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct > >> DNA position and then store the Similarity's as sub-SeqFeatures. > > >> I could just set the Similarity's position to the (calculated) DNA > >> coordinates, or alternately make a new SeqFeature and copy in the > >> attributes I want. But is there a more elegant solution? > > >> Thanks, > >> -- Doug > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioper... at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > > Bioperl-l mailing list > > Bioper...
at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > Bioperl-l mailing list > Bioper... at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From douglas.hoen at gmail.com Wed Aug 11 16:53:35 2010 From: douglas.hoen at gmail.com (Doug) Date: Wed, 11 Aug 2010 13:53:35 -0700 (PDT) Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA? In-Reply-To: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> Message-ID: One more note: I did try using PositionProxy but it failed. It doesn't implement seq_id() and so can't be stored in the DB: ------------- EXCEPTION: Bio::Root::NotImplemented ------------- MSG: Abstract method "Bio::SeqFeatureI::seq_id" is not implemented by package Bio::SeqFeature::PositionProxy. This is not your fault - author of Bio::SeqFeature::PositionProxy should be blamed! ... On Aug 11, 4:38 pm, Doug wrote: > Hi Scott, > > Good idea. Would you happen to know of an existing HMMER3 to GFF3 > converter? > > Thanks for your advice, > -- Doug > > On Aug 11, 4:16 pm, Scott Cain wrote: > > > > > > > Hi Doug, > > > I don't know if any of the things you've thought of would work; I've > > never tried it. My inclination would be to express your data in GFF3 > > and use the standard loader. > > > Scott > > > On Wed, Aug 11, 2010 at 4:11 PM, Doug wrote: > > > One possible answer to my own question: Use > > > Bio::SeqFeature::PositionProxy's? Would this work?
> > > > On Aug 11, 3:13 pm, Doug wrote: > > >> Hi, > > > >> I am trying to store in a SeqFeature::Store database the results of > > >> searches of translated DNA. The DB contains the original DNA > > >> sequences. For instance, I have done HMMER searches of 6-frame > > >> translations of the sequences stored in the DB. I want to store these > > >> results "at" their (equivalent) DNA positions, which I can calculate. > > >> Preferably, I would like to directly store the SeqFeature::Similarity > > >> objects that I get from parsing these searches. But they are of course > > >> located on different coordinate systems than the DNA, so I guess I > > >> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct > > >> DNA position and then store the Similarity's as sub-SeqFeatures. > > > >> I could just set the Similarity's position to the (calculated) DNA > > >> coordinates, or alternately make a new SeqFeature and copy in the > > >> attributes I want. But is there a more elegant solution? > > > >> Thanks, > > >> -- Doug > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioper... at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioper... at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. scott at scottcain dot net > > GMOD Coordinator (http://gmod.org/) 216-392-3087 > > Ontario Institute for Cancer Research > > > _______________________________________________ > > Bioperl-l mailing list > > Bioper... at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioper...
at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Aug 11 16:45:00 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 11 Aug 2010 15:45:00 -0500 Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA? In-Reply-To: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> Message-ID: <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu> HMMER3 is parsed by Bio::SearchIO now in bioperl-live, and I think there is a generic SearchIO->GFF3 script floating around the intertubes somewheres... chris On Aug 11, 2010, at 3:38 PM, Doug wrote: > Hi Scott, > > Good idea. Would you happen to know of an existing HMMER3 to GFF3 > converter? > > Thanks for your advice, > -- Doug > > On Aug 11, 4:16 pm, Scott Cain wrote: >> Hi Doug, >> >> I don't know if any of the things you've thought of would work; I've >> never tried it. My inclination would be to express your data in GFF3 >> and use the standard loader. >> >> Scott >> >> >> >> >> >> On Wed, Aug 11, 2010 at 4:11 PM, Doug wrote: >>> One possible answer to my own question: Use >>> Bio::SeqFeature::PositionProxy's? Would this work? >> >>> On Aug 11, 3:13 pm, Doug wrote: >>>> Hi, >> >>>> I am trying to store in a SeqFeature::Store database the results of >>>> searches of translated DNA. The DB contains the original DNA >>>> sequences. For instance, I have done HMMER searches of 6-frame >>>> translations of the sequences stored in the DB. I want to store these >>>> results "at" their (equivalent) DNA positions, which I can calculate. >>>> Preferably, I would like to directly store the SeqFeature::Similarity >>>> objects that I get from parsing these searches. 
But they are of course >>>> located on different coordinate systems than the DNA, so I guess I >>>> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct >>>> DNA position and then store the Similarity's as sub-SeqFeatures. >> >>>> I could just set the Similarity's position to the (calculated) DNA >>>> coordinates, or alternately make a new SeqFeature and copy in the >>>> attributes I want. But is there a more elegant solution? >> >>>> Thanks, >>>> -- Doug >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l >> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioper... at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. scott at scottcain dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Wed Aug 11 17:05:25 2010 From: scott at scottcain.net (Scott Cain) Date: Wed, 11 Aug 2010 17:05:25 -0400 Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA? In-Reply-To: <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu> References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu> Message-ID: Um, yeah, it's in bioperl: bp_search2gff.pl. 
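A side note on the DNA positions Doug says he can calculate: the arithmetic is short enough to sketch. The Python below is only an illustration under stated assumptions, not anything from BioPerl. It assumes 1-based inclusive coordinates, forward frames numbered 1 to 3 by their offset from the first base, and reverse frames -1 to -3 numbered by their offset from the last base (tools differ on reverse-frame numbering, so check yours). The function name and the chromosome length in the usage lines are hypothetical; the length is simply 3 x the L=10142557 residues reported for Chr1_1 later in this thread.

```python
def protein_to_dna(aa_start, aa_end, frame, dna_length):
    """Map a 1-based, inclusive protein interval on a translated frame
    back to 1-based, inclusive DNA coordinates on the forward strand.

    Returns (dna_start, dna_end, strand) with strand +1 or -1.
    Frame numbering is an assumption: 1..3 forward, -1..-3 reverse.
    """
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of 1, 2, 3, -1, -2, -3")
    if not 1 <= aa_start <= aa_end:
        raise ValueError("need 1 <= aa_start <= aa_end")

    # Position of the interval on the strand that was translated.
    offset = abs(frame) - 1
    start = offset + 3 * (aa_start - 1) + 1
    end = offset + 3 * aa_end
    if end > dna_length:
        raise ValueError("interval runs past the end of the DNA sequence")

    if frame > 0:
        return start, end, 1
    # Reverse frames: flip the interval back onto forward-strand coordinates.
    return dna_length - end + 1, dna_length - start + 1, -1


if __name__ == "__main__":
    chrom_length = 30427671  # hypothetical: 3 * 10142557
    # First RVT_2 hit from the thread, assuming Chr1_1 is forward frame 1.
    print(protein_to_dna(1898330, 1898579, 1, chrom_length))
```

With the start, end and strand in hand, the hit can be written as an ordinary GFF3 feature on the DNA sequence and loaded with the standard loader, as Scott suggests.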
Scott

On Wed, Aug 11, 2010 at 4:45 PM, Chris Fields wrote:
> HMMER3 is parsed by Bio::SearchIO now in bioperl-live, and I think there is a generic SearchIO->GFF3 script floating around the intertubes somewheres...
>
> chris

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                  216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu Wed Aug 11 17:07:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 16:07:20 -0500
Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
In-Reply-To:
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
Message-ID:

For some reason I thought there was a more up-to-date one somewhere.
Ah well, can't keep track of all the code in bioperl :>

chris

On Aug 11, 2010, at 4:05 PM, Scott Cain wrote:

> Um, yeah, it's in bioperl: bp_search2gff.pl.
>
> Scott

From douglas.hoen at gmail.com Wed Aug 11 17:11:20 2010
From: douglas.hoen at gmail.com (Douglas Hoen)
Date: Wed, 11 Aug 2010 17:11:20 -0400
Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
In-Reply-To:
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
Message-ID:

Great, thanks so much for the info.

On 2010-08-11, at 5:05 PM, Scott Cain wrote:

> Um, yeah, it's in bioperl: bp_search2gff.pl.
>
> Scott

From Russell.Smithies at agresearch.co.nz Wed Aug 11 17:31:32 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 12 Aug 2010 09:31:32 +1200
Subject: [Bioperl-l] AlignIO and Gbrowse_syn
In-Reply-To:
References:
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>

I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague.
If GBrowse_syn is using .maf format, does AlignIO need more work?
Any comments?

--Russell


I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output). GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce clustalw, I've tried parsing LASTZ maf output to clustalw (and other alignment formats) using AlignIO; however, I run into the following issues:
*Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification)
*The coordinate system for reverse strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-rev complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result all reverse match coordinates in the resulting clustalw output file are incorrect.
*AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them

I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned.
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From cjfields at illinois.edu Wed Aug 11 18:02:38 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 17:02:38 -0500
Subject: [Bioperl-l] AlignIO and Gbrowse_syn
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
Message-ID:

Russell,

We have had very few requests to support .maf until recently, which is why there has been little done with it. We welcome any help to improve it.

chris

On Aug 11, 2010, at 4:31 PM, Smithies, Russell wrote:

> I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague.
> If GBrowse_syn is using .maf format, does AlignIO need more work?
> Any comments?
>
> --Russell

From douglas.hoen at gmail.com Thu Aug 12 01:59:37 2010
From: douglas.hoen at gmail.com (Doug Hoen)
Date: Wed, 11 Aug 2010 22:59:37 -0700 (PDT)
Subject: [Bioperl-l] HMMER3 to GFF3
Message-ID: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com>

Hi,

I am trying to convert HMMER3 (hmmscan) output files into GFF3 files.
Based on previous advice (see the thread, "How to store results of
searches of translated DNA in SeqFeature::Store database of the
original DNA?"), I have installed bioperl-live for its new HMMER3
parsing capabilities (in SearchIO) and am trying to use
bp_search2gff.pl to do the file conversion.

The hmmscan was done on translated chromosome sequences with conserved
domain models. I want to get the GFF 'start' and 'end' columns to be
based on these coordinates, not those of the models. To do this (with
my files), it seems I need to use the option "--type hit". However,
this changes the "Target" sequence name from the model name to
chromosome name, and the model name does not appear anywhere in the
output (see below).

Could someone please confirm whether the results are incorrect and, if
so, perhaps suggest a fix? It may well be that this problem is due to
the unusual way I am using hmmscan, rather than a problem with HMMER3
parsing...?

Many thanks,
-- Doug

========================================================
Here's what it looks like if I do *not* use the "--type hit" option.
(RVT_2 is a conserved domain name. I need this in the output.)
COMMAND:
------------------
bp_search2gff.pl -i ../chr1-tesigsv2.hmmscan -o chr1-tesigsv2-hmmscan-original-locations-v2.gff3 --format hmmer3 --source HMMER3 --version 3 --component

OUTPUT:
------------------
==> chr1-tesigsv2-hmmscan-original-locations-v2.gff3 <==
##gff-version 3
Chr1_1 chromosome Component 1 10142557 . . 1 sequence=Chr1_1
Chr1_1 HMMER3 similarity 1 245 307.3 . 0 Target=Sequence:RVT_2 1898330 1898579
Chr1_1 HMMER3 similarity 1 244 329.5 . 0 Target=Sequence:RVT_2 2573551 2573796
Chr1_1 HMMER3 similarity 1 245 308.8 . 0 Target=Sequence:RVT_2 3159685 3159930
Chr1_1 HMMER3 similarity 1 102 108.2 . 0 Target=Sequence:RVT_2 3438684 3438791
Chr1_1 HMMER3 similarity 2 245 277.2 . 0 Target=Sequence:RVT_2 3566642 3566891
Chr1_1 HMMER3 similarity 13 213 251.4 . 0 Target=Sequence:RVT_2 4251160 4251373
Chr1_1 HMMER3 similarity 1 244 310.6 . 0 Target=Sequence:RVT_2 4252791 4253036
Chr1_1 HMMER3 similarity 6 99 94.2 . 0 Target=Sequence:RVT_2 4271555 4271653

========================================================
And here's what it looks like if I *do* use the "--type hit" option.
The coordinates look good but the model name has disappeared (and the
Target=Sequence seems wrong).

COMMAND:
------------------
bp_search2gff.pl -i ../chr1-tesigsv2.hmmscan -o chr1-tesigsv2-hmmscan-original-locations-v3.gff3 --format hmmer3 --type hit --source HMMER3 --version 3 --component

OUTPUT:
------------------
==> chr1-tesigsv2-hmmscan-original-locations-v3.gff3 <==
##gff-version 3
RVT_2 HMMER3 similarity 1898330 1898579 307.3 . 0 Target=Sequence:Chr1_1 1 245
RVT_2 HMMER3 similarity 2573551 2573796 329.5 . 0 Target=Sequence:Chr1_1 1 244
RVT_2 HMMER3 similarity 3159685 3159930 308.8 . 0 Target=Sequence:Chr1_1 1 245
RVT_2 HMMER3 similarity 3438684 3438791 108.2 . 0 Target=Sequence:Chr1_1 1 102
RVT_2 HMMER3 similarity 3566642 3566891 277.2 . 0 Target=Sequence:Chr1_1 2 245
RVT_2 HMMER3 similarity 4251160 4251373 251.4 . 0 Target=Sequence:Chr1_1 13 213
RVT_2 HMMER3 similarity 4252791 4253036 310.6 . 0 Target=Sequence:Chr1_1 1 244
RVT_2 HMMER3 similarity 4271555 4271653 94.2 . 0 Target=Sequence:Chr1_1 6 99
RVT_2 HMMER3 similarity 4481232 4481477 281.5 . 0 Target=Sequence:Chr1_1 2 245

========================================================
And here's what the input HMMER3 result file looks like:

==> ../chr1-tesigsv2.hmmscan <==
# hmmscan :: search sequence(s) against a profile database
# HMMER 3.0rc1 (February 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query sequence file:             [...]/whole_chromosomes/translated/chr1.pep
# target HMM database:             [...]/signatures/Pfam-A.hmm
# output directed to file:         chr1-tesigsv2.hmmscan
# model-specific thresholding:     TC cutoffs
# Max sensitivity mode:            on [all heuristic filters off]
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       Chr1_1  [L=10142557]
Description: CHROMOSOME dumped from ADB: Jun/20/09 14:53; last updated: 2009-02-02
Scores for complete sequence (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Model            Description
    ------- ------ -----    ------- ------ -----   ---- --  --------         -----------
          0 3971.3  17.7   2.6e-101  329.5   0.6   19.4 17  RVT_2            Reverse transcriptase (RNA-dependent DNA pol
          0 3040.7  23.0     1e-206  678.6   0.1   12.2 10  ATHILA           ATHILA ORF-1 family
          0 1681.9  79.1    1.9e-46  149.9   0.4   28.0 21  RVT_1            Reverse transcriptase (RNA-dependent DNA pol
          0 1446.9  27.4    3.6e-95  309.1   0.2    7.6  5  Transposase_21   Transposase family tnp2
          0 1168.4  50.3    1.4e-29   94.4   0.3   21.5 18  rve              Integrase core domain
   9.1e-300  960.0  69.0    3.1e-20   64.0   0.0   28.8 20  Retrotrans_gag   Retrotransposon gag protein
   1.5e-180  577.0  31.6    1.6e-29   93.1   1.5    9.5  8  Transposase_23   TNP1/EN/SPM transposase
   4.4e-143  456.9  82.8    4.8e-18   56.4   0.1   12.9 11  MuDR             MuDR family transposase
   3.8e-116  371.4  19.6    1.2e-18   58.9   0.0   13.7  7  MULE             MULE transposase domain
   7.1e-106  344.1   5.6    2.7e-97  316.0   0.0    3.6  1  Plant_tran       Plant transposon protein
    9.2e-85  275.4  22.9    5.4e-60  194.4   0.3    6.4  3  Peptidase_C48    Ulp1 protease family, C-terminal catalytic d
    1.8e-77  249.8  24.8    4.4e-28   89.8   0.1   10.8  3  Transposase_24   Plant transposase (Ptta/En/Spm family)
    2.8e-47  150.1   1.2    5.5e-23   72.3   0.2    3.7  2  hATC             hAT family dimerisation domain
    5.7e-28   89.4   3.6    4.7e-13   41.1   0.0    6.5  1  RVP_2            Retroviral aspartyl protease
      1e-16   53.3   0.0    4.4e-07   22.1   0.0    6.8  1  RnaseH           RNase H
    1.5e-08   25.3   2.4    0.00016   12.1   0.0    4.9  0  Transposase_mut  Transposase, Mutator family


Domain annotation for each model (and alignments):
>> RVT_2  Reverse transcriptase (RNA-dependent DNA polymerase)
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to    acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  307.3   0.0   5.3e-95   1.5e-94       1     245 [. 1898330 1898578 .. 1898330 1898579 .. 0.99
   2 !  329.5   0.6  8.9e-102  2.6e-101       1     244 [. 2573551 2573794 .. 2573551 2573796 .. 0.99
   3 !  308.8   0.0   1.8e-95   5.2e-95       1     245 [. 3159685 3159929 .. 3159685 3159930 .. 0.99
   4 !  108.2   0.1   3.4e-34   9.7e-34       1     102 [. 3438684 3438785 .. 3438684 3438791 .. 0.96
   5 !  277.2   0.0   8.1e-86   2.3e-85       2     245 .. 3566643 3566890 .. 3566642 3566891 .. 0.99
   6 !  251.4   0.0   6.2e-78   1.8e-77      13     213 .. 4251164 4251364 .. 4251160 4251373 .. 0.97
   7 !  310.6   0.0   5.1e-96   1.5e-95       1     244 [. 4252791 4253034 .. 4252791 4253036 .. 0.99
   8 !   94.2   0.1   6.1e-30   1.8e-29       6      99 .. 4271560 4271653 .. 4271555 4271653 .. 0.97
   9 !  281.5   0.9   3.9e-87   1.1e-86       2     245 .. 4481233 4481476 .. 4481232 4481477 .. 0.98
  10 !  248.2   0.0   5.9e-77   1.7e-76       1     190 [. 4521040 4521233 .. 4521040 4521237 .. 0.97
  11 !  314.6   0.1   3.2e-97   9.2e-97       1     244 [. 4652456 4652702 .. 4652456 4652704 .. 0.98
  12 !   40.7   0.0   1.3e-13   3.7e-13       2      92 .. 5219607 5219697 .. 5219606 5219701 .. 0.90
  13 !  221.0   0.0   1.2e-68   3.4e-68       2     245 .. 5241015 5241258 .. 5241014 5241259 .. 0.95
  14 !   81.2   0.0   5.6e-26   1.6e-25       2     115 .. 5501957 5502070 .. 5501956 5502080 .. 0.92
  15 !  272.4   0.0   2.3e-84   6.7e-84      30     245 .. 6483057 6483271 .. 6483050 6483272 .. 0.98
  16 !  178.5   0.0   1.2e-55   3.3e-55      81     244 .. 7250563 7250726 .. 7250552 7250728 .. 0.96
  17 !  313.7   0.0   5.9e-97   1.7e-96       2     245 .. 7707124 7707367 .. 7707123 7707368 .. 0.99

  Alignments for each domain:
  == domain 1    score: 307.3 bits;  conditional E-value: 5.3e-95
  RVT_2 1 nktwelvelpkgkkviglkWvfklKlnedgeierykARlVakGftqkegidyeetfspvvklesirlllalaaekkleleqlDvktaFLngelee 95
  n tw +++lp gkk++g+kWv+k+Kln+dg++erykARlVakG+tq+eg+dy +tfspv+kl++++ll+a+aa+k+++l+qlD+++aFLng+l+e
  Chr1_1 1898330 NGTWVVCSLPVGKKAVGCKWVYKIKLNADGSLERYKARLVAKGYTQTEGLDYVDTFSPVAKLTTVKLLIAVAAAKGWSLSQLDISNAFLNGSLDE 1898424
  68********************************************************************************************* PP

  RVT_2 96 evYvkqpeGfedkkk....enkvckLkkslYgLkqapraWyeklsevllklgfkkseadkclfvkkkeeeliivllYVDDlliagsskelieelk 186
  e+Y++ p+G++ ++ +n vc+LkkslYgLkqa+r+Wy k+se l++lgf+ +s+ d++lf++k++++ ++vl+YVDD++ia+s +++ e l
  Chr1_1 1898425 EIYMTLPPGYSPRQGdsfpPNAVCRLKKSLYGLKQASRQWYLKFSESLKALGFTQSSGDHTLFTRKSKNSYMAVLVYVDDIIIASSCDRETELLR 1898519
  ***********998889999*************************************************************************** PP

  RVT_2 187 eeLkkefemkdlgelkyfLgleierkeegillsqekyvkkllkkfkmedakpvstplea 245
  ++L+++ +++dlg+l+yfLglei+r+++gi+++q+ky+ +ll+++++ +k++s +p+e+
  Chr1_1 1898520 DALQRSSKLRDLGTLRYFLGLEIARNTDGISICQRKYTLELLAETGLLGCKSSSVPMEP 1898578
  *********************************************************97 PP

  == domain 2    score: 329.5 bits;  conditional E-value: 8.9e-102
  RVT_2 1 nktwelvelpkgkkviglkWvfklKlnedgeierykARlVakGftqkegidyeetfspvvklesirlllalaaekkleleqlDvktaFLngelee 95
  n+twel++lp+g+k+ig+kWv+k K+n++ge+erykARlVakG++q++gidy+e +f+pv++le++rl+++laa++k++++q+D k aFLng++ee
  Chr1_1 2573551 NDTWELTSLPNGHKAIGVKWVYKAKKNSKGEVERYKARLVAKGYSQRAGIDYDEVFAPVARLETVRLIISLAAQNKWKIHQMDFKLAFLNGDFEE 2573645
  79********************************************************************************************* PP

  RVT_2 96 evYvkqpeGfedkkkenkvckLkkslYgLkqapraWyeklsevllklgfkkseadkclfvkkkeeeliivllYVDDlliagsskelieelkeeLk 190
  evY++qp+G+ +k++e+kv++Lkk+lYgLkqapraW++++++++++++f k+ + +++l++k ++e+++i +lYVDDl+++g++ ++ ee+k+e++
  Chr1_1 2573646 EVYIEQPQGYIVKGEEDKVLRLKKALYGLKQAPRAWNTRIDKYFKEKDFIKCPYEHALYIKIQKEDILIACLYVDDLIFTGNNPSMFEEFKKEMT 2573740
  *********************************************************************************************** PP

  RVT_2 191 kefemkdlgelkyfLgleierkeegillsqekyvkkllkkfkmedakpvstple 244
  kefem+d+g ++y+Lg+e+++++++i+++qe y+k++lkkfkm+d++pv tp +e
  Chr1_1 2573741 KEFEMTDIGLMSYYLGIEVKQEDNRIFITQEGYAKEVLKKFKMDDSNPVCTPME 2573794
  ****************************************************97 PP

From kai.blin at biotech.uni-tuebingen.de Thu Aug 12 08:16:45 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 12 Aug 2010 14:16:45 +0200
Subject: [Bioperl-l] HMMER3 to GFF3
In-Reply-To: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com>
References: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com>
Message-ID: <20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de>

On Wed, 11 Aug 2010 22:59:37 -0700 (PDT)
Doug Hoen wrote:

Hi Doug,

> Could someone please confirm whether the results are incorrect and, if
> so, perhaps suggest a fix? It may well be that this problem is due to
> the unusual way I am using hmmscan, rather than a problem with HMMER3
> parsing...?

Can you please attach your hmmer input file? Along the way something
inserted line breaks, making it unreadable.

It might well be possible that the HMMer3 parser still handles a little
different from the HMMer2 parser, I haven't tried that script.

Cheers,
Kai

--
Dipl.-Inform.
Kai Blin                      kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From kai.blin at biotech.uni-tuebingen.de Thu Aug 12 08:09:00 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 12 Aug 2010 14:09:00 +0200
Subject: [Bioperl-l] using HMMER
In-Reply-To: <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com> <4C62B487.9090103@gmail.com> <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
Message-ID: <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>

On Wed, 11 Aug 2010 10:07:36 -0500
Chris Fields wrote:

> might also want to check whether you are using hmmer2 vs hmmer3. not sure if the wrapper works for hmmer3.

It might if you initialize it using
my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 'hmmer3');

at least for the programs that still exist with the same name in
hmmer3. It won't support hmmer3 using the default options, though.

If I have some spare time, I'll look into this, no promises on the
timeframe, though.

Cheers,
Kai

--
Dipl.-Inform.
Kai Blin                      kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From cjfields at illinois.edu Thu Aug 12 11:28:50 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 12 Aug 2010 10:28:50 -0500
Subject: [Bioperl-l] using HMMER
In-Reply-To: <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com> <4C62B487.9090103@gmail.com> <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu> <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>
Message-ID: <8129B813-5B15-4DDC-AB0D-5D95EFFCE78D@illinois.edu>

On Aug 12, 2010, at 7:09 AM, Kai Blin wrote:

> On Wed, 11 Aug 2010 10:07:36 -0500
> Chris Fields wrote:
>
>> might also want to check whether you are using hmmer2 vs hmmer3. not sure if the wrapper works for hmmer3.
>
> It might if you initialize it using
> my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 'hmmer3');
>
> at least for the programs that still exist with the same name in
> hmmer3. It won't support hmmer3 using the default options, though.
>
> If I have some spare time, I'll look into this, no promises on the
> timeframe, though.
>
> Cheers,
> Kai
>
> --
> Dipl.-Inform. Kai Blin                      kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
> D-72076 Tübingen                        Fax :   ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

Would be nice to convert this over (at some point) to use Mark's CommandExts.
I'm thinking of doing this with Infernal, so if I get that running it wouldn't be terribly difficult to get hmmer3 working as well. chris From cjfields at illinois.edu Thu Aug 12 12:14:44 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 12 Aug 2010 11:14:44 -0500 Subject: [Bioperl-l] using HMMER In-Reply-To: <857996.8184.qm@web112610.mail.gq1.yahoo.com> References: <603590.1072.qm@web112620.mail.gq1.yahoo.com> <4C62B487.9090103@gmail.com> <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu> <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de> <8129B813-5B15-4DDC-AB0D-5D95EFFCE78D@illinois.edu> <857996.8184.qm@web112610.mail.gq1.yahoo.com> Message-ID: <43FD0A31-DB95-4AE9-B678-937EE6346BC2@illinois.edu> Fayroz, Please keep responses on-list. It seems you need to update your local bioperl, as 'hmmer3' is a recent addition, after 1.6.1. It will be in 1.6.2 if I can get the time to make a release :> chris On Aug 12, 2010, at 10:58 AM, fayroz wrote: > dear chris, > from HMMER documentation i found this statement > "The HMMER programs must either be in your path, or you must set the environment > variable HMMERDIR to point to their location." > is it will solve the problem? > how can i do it please ? i work under windows7 platform > > > when i appled this line with hmmer3 > my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => > 'hmmer3'); > > this output apper: > > Bio::SearchIO: hmmer3 cannot be found > > and when try with hmmer2 the same output apper: > > Exception > ------------- EXCEPTION ------------- > MSG: Failed to load module Bio::SearchIO::hmmer3. Can't locate > Bio\SearchIO\hmmer3.pm in @INC (@INC contains: D:\Perl\bin\ D:/Perl/site/lib > D:/Perl/lib .) at D:/Perl/site/lib/Bio/Root/Root.pm line 439, line 1. 
> STACK Bio::Root::Root::_load_module D:/Perl/site/lib/Bio/Root/Root.pm:441 > STACK (eval) D:/Perl/site/lib/Bio/SearchIO.pm:446 > STACK Bio::SearchIO::_load_format_module D:/Perl/site/lib/Bio/SearchIO.pm:445 > STACK Bio::SearchIO::new D:/Perl/site/lib/Bio/SearchIO.pm:189 > STACK Bio::Tools::Run::Hmmer::_run D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:431 > STACK Bio::Tools::Run::Hmmer::hmmsearch > D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:353 > STACK toplevel C:\Users\Khaled\AppData\Local\Temp\dzprltmp.pl:13 > ------------------------------------- > For more information about the SearchIO system please see the SearchIO docs. > This includes ways of checking for formats at compile time, not run time > '--informat' is not recognized as an internal or external command, > operable program or batch file. > Can't call method "next_result" on an undefined value at > C:\Users\Khaled\AppData\Local\Temp\dzprltmp.pl line 15, line 1. > > > > ----- Original Message ---- > From: Chris Fields > To: Kai Blin > Cc: fayroz ; bioperl-l at bioperl.org > Sent: Thu, August 12, 2010 6:28:50 PM > Subject: Re: [Bioperl-l] using HMMER > > On Aug 12, 2010, at 7:09 AM, Kai Blin wrote: > >> On Wed, 11 Aug 2010 10:07:36 -0500 >> Chris Fields wrote: >> >>> might also want to check whether you are using hmmer2 vs hmmer3. not sure if >>> the wrapper works for hmmer3. >> >> It might if you initialize it using >> my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => >> 'hmmer3'); >> >> at least for the programs that still exist with the same name in >> hmmer3. It won't support hmmer3 using the default options, though. >> >> If I have some spare time, I'll look into this, no promises on the >> timeframe, though. >> >> Cheers, >> Kai >> >> -- >> Dipl.-Inform. 
Kai Blin                      kai.blin at biotech.uni-tuebingen.de
>> Institute for Microbiology and Infection Medicine
>> Division of Microbiology/Biotechnology
>> Eberhard-Karls-University of Tübingen
>> Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
>> D-72076 Tübingen                        Fax :   ++49 7071 29-5979
>> Deutschland
>> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
>
> Would be nice to convert this over (at some point) to use Mark's CommandExts.
> I'm thinking of doing this with Infernal, so if I get that running it wouldn't
> be terribly difficult to get hmmer3 working as well.
>
> chris
>
>

From jason at bioperl.org Thu Aug 12 14:37:11 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 12 Aug 2010 11:37:11 -0700
Subject: [Bioperl-l] Other: Script for editing alignments?
In-Reply-To: <20100812061811.4D92468539@evol.biology.mcmaster.ca>
References: <20100812061811.4D92468539@evol.biology.mcmaster.ca>
Message-ID: <4C643F57.3040408@bioperl.org>

Hi Si -

This is pretty straightforward with Bioperl. Here's one solution:

#!/usr/bin/perl -w
use strict;
use Bio::AlignIO;
my $in = Bio::AlignIO->new(-format => 'fasta', -file => shift @ARGV);
my $out = Bio::AlignIO->new(-format => 'fasta');
while( my $aln = $in->next_aln ) {
  for my $seq ( $aln->each_seq ) {
    my $str = $seq->seq;
    if( $str =~ /^(-+)/ ) {
      my $rep = length($1);
      # replace from the 5' end
      substr($str,0,$rep,'N'x$rep);
    }
    if( $str =~ /(-+)$/ ) {
      my $rep = length($1);
      # replace from the 3' end
      substr($str,-1 * $rep,length($str),'N'x$rep);
    }
    $seq->seq($str);
  }
  # don't print the /start-end info in the FASTA ID
  $aln->set_displayname_flat(1);
  $out->write_aln($aln);
}

-jason

evoldir at evol.biology.mcmaster.ca wrote, On 8/11/10 11:18 PM:
> Dear All
>
> Alignment programs like MUSCLE and Clustal often output alignments with
> "-" symbols indicating indels (real events) within sequence alignments,
> but also "-" symbols at the 5' and 3' ends of sequences.
The latter
> however, are not real evolutionary events and really should be Ns
> (missing data), depending on the sort of analytical framework you use.
>
> If there is sufficient heterogeneity and signal within the 5' and 3'
> ends of sequences, the "-"s can be manually edited in a text editor to
> Ns with no problem, if the alignment is small. If it is large (e.g. 2000
> seqs), or there are lots of alignments, it becomes a lengthy task.
>
> I'm investigating such alignments presently and so was wondering if
> anyone had a clever way of implementing sed, or had a Perl script that
> would perform such a task. Simply put, it would require replacing the 5'
> and 3' "-" below only with Ns and leaving the within sequence "-"s
> alone. The sequences naturally may span more than one line.
>
> >Taxon 1
> -----ATGCTG--TGACTG----TGACT---
> >Taxon 2
> ---GTATGTTG--TGACTGCT--TGACCGTC
>
> to
>
> >Taxon 1
> NNNNNATGCTG--TGACTG----TGACTNNN
> >Taxon 2
> NNNGTATGTTG--TGACTGCT--TGACCGTC
>
> It's a simple task, but I haven't seen any scripts out there to do the job.
>
> If there are any scripters out there who can help, or if someone knows
> of an application that would help, it would be great to hear from you.
>
> With best wishes and thanks
>
> Si Creer
>
>

From genehack at genehack.org Thu Aug 12 20:32:07 2010
From: genehack at genehack.org (John SJ Anderson)
Date: Thu, 12 Aug 2010 20:32:07 -0400
Subject: [Bioperl-l] Bio::SeqFeature::SimilarityPair->from_searchResult()?
In-Reply-To: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com>
References: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com>
Message-ID:

On Aug 10, 2010, at 21:54 , Douglas Hoen wrote:

> I was wondering why the Synopsis in the docs for Bio::SeqFeature::SimilarityPair has the following:
> $sim_pair = Bio::SeqFeature::SimilarityPair->from_searchResult($blastHit);
>
> There doesn't actually seem to be a from_searchResult method. Am I missing something?

No, it looks like that method got removed back in 2002 as a part of moving to Bio::SearchIO (which was removed still later...):

Unfortunately, the commit didn't update the documentation. From the tiny little bit I've looked at the code, it looks like you should just be calling the 'new()' method instead (note that it takes a set of arguments, not just a BLAST hit object).

Hope this helps -- if you should happen to have the tuits, a patch to update the documentation to reflect the current interface would be awesome...

chrs,
john.

From david.breimann at gmail.com Fri Aug 13 09:01:10 2010
From: david.breimann at gmail.com (David Breimann)
Date: Fri, 13 Aug 2010 16:01:10 +0300
Subject: [Bioperl-l] Problem executing bp_genbank2gff3.pl from another perl script
Message-ID:

Hi,

I am trying to run bp_genbank2gff3.pl from another perl script that gets a genbank file as its argument.

This does not work (no output files are generated):

my $command = "bp_genbank2gff3.pl -y -o /tmp $ARGV[0]";
open( my $command_out, "-|", $command );
close $command_out;

but this does:

open( my $command_out, "-|", $command );
sleep 3; # why do I need to sleep?
close $command_out;

Why? I thought that close is supposed to block until the command is done:

Closing any piped filehandle causes the parent process to wait for the child to finish...
(see http://perldoc.perl.org/functions/open.html).

Thanks
Dave

From jun.yin at ucd.ie Fri Aug 13 09:36:34 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 13 Aug 2010 14:36:34 +0100
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
Message-ID: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>

Hi, all,

I am the Google Summer of Code student working on Bio::Align subsystem refactoring. The code (Bio::SimpleAlign) I re-implemented has now passed nearly all the tests, except a few tests on seq/start-end testing.

But here comes a problem. This may be an old issue: the Bio::LocatableSeq end assignment and checking are inconsistent.
The current end checking method is based on:

$end=$seq->_ungapped_len+$seq->start-1

However, this checking may not fit the real world case. The inconsistency usually happens when a few columns of the sequence are removed. For example:

my $a = Bio::LocatableSeq->new(
    -id => 'a',
    -strand => 1,
    -seq => '-tcgatc-atcgatcg',
    -start => 30,
    -end => 43
);

If we remove the 1st, 8th and the last columns
$a->seq() will be 'tcgatcatcgatc'
$a->_ungapped_len==12

Actually, in the real world, the first residue will still be 30 (the old $seq->start), and the last residue is the residue before the 43 (the old $seq->end), thus 42.

But if you call a validation, the calculation is
$a->_ungapped_len+$a->start-1=12+30-1=41

So the reassignment of the $seq->end will not pass the validation.

So unless you save the information to a new sequence object, the original position information will be lost anyway. But in some cases, we have to change the sequence in its original sequence object ..

What is your suggestion on this issue?

A. pass the test and lose the information
   #convenient in coding but the start-end annotation is not right any more
B. keep the information and forget the test
   #the object will still remember where the last residue was in the original sequence. But is it really meaningful at all? Because all the other residues may come from nowhere
C. Neither of above
   #any other suggestions?

Cheers,
Jun Yin
Ph.D. student in U.C.D.
Bioinformatics Laboratory
Conway Institute
University College Dublin

From jessica.sun at gmail.com Fri Aug 13 11:06:46 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 13 Aug 2010 11:06:46 -0400
Subject: [Bioperl-l] Add sequence feature
Message-ID:

Does anyone knows how to open a genbank file, add new feature and then save a new genbank file with new feature added in bioperl ?
thx -- Jessica Jingping Sun From jessica.sun at gmail.com Fri Aug 13 11:27:10 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Fri, 13 Aug 2010 11:27:10 -0400 Subject: [Bioperl-l] Add sequence feature In-Reply-To: <4C6562E0.7090008@gmail.com> References: <4C6562E0.7090008@gmail.com> Message-ID: unfortunately. I want to add the feature to the sequence object I got from the Genbank file, I do not mind to save a new genbank file but these new genbank file contains the original genbank format and info I got plus the new feature tags I need to added to. Any quick solution to this? thx Jessica On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri wrote: > Hi Jessica. > > You need to use Bio::SeqIO to read in the GenBank file to a BioPerl > sequence object, and to write your new GenBank file: > http://www.bioperl.org/wiki/HOWTO:SeqIO > > To add a new feature follow the instructions here: > > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences > > (except that you are adding the feature to the sequence object you got from > the Genbank file, not a new Bio::Seq object). > > Cheers. > Roy. > > > On 13/08/2010 16:06, Jessica Sun wrote: > >> Does anyone knows how to open a genbank file, add new feature and then >> save >> a new genbank >> file with new feature added in bioperl ? >> >> thx >> >> > -- Jessica Jingping Sun From roy.chaudhuri at gmail.com Fri Aug 13 11:21:04 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Fri, 13 Aug 2010 16:21:04 +0100 Subject: [Bioperl-l] Add sequence feature In-Reply-To: References: Message-ID: <4C6562E0.7090008@gmail.com> Hi Jessica. 
You need to use Bio::SeqIO to read in the GenBank file to a BioPerl sequence object, and to write your new GenBank file: http://www.bioperl.org/wiki/HOWTO:SeqIO To add a new feature follow the instructions here: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences (except that you are adding the feature to the sequence object you got from the Genbank file, not a new Bio::Seq object). Cheers. Roy. On 13/08/2010 16:06, Jessica Sun wrote: > Does anyone knows how to open a genbank file, add new feature and then save > a new genbank > file with new feature added in bioperl ? > > thx > From roy.chaudhuri at gmail.com Fri Aug 13 11:37:20 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Fri, 13 Aug 2010 16:37:20 +0100 Subject: [Bioperl-l] Add sequence feature In-Reply-To: References: <4C6562E0.7090008@gmail.com> Message-ID: <4C6566B0.60706@gmail.com> I'm not sure I understand, do you mean that you want to load just the sequence from the GenBank file (ignoring the existing annotation), then add your own features? There are instructions on how to do that here: http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder On 13/08/2010 16:27, Jessica Sun wrote: > unfortunately. I want to add the feature to the sequence object I got > from the Genbank file, I do not mind to save a new genbank file but > these new genbank file contains the original genbank format and info I > got plus the new feature tags I need to added to. Any quick solution to > this? > > thx > > Jessica > > > > On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri > wrote: > > Hi Jessica. 
> > You need to use Bio::SeqIO to read in the GenBank file to a BioPerl > sequence object, and to write your new GenBank file: > http://www.bioperl.org/wiki/HOWTO:SeqIO > > To add a new feature follow the instructions here: > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences > > (except that you are adding the feature to the sequence object you > got from the Genbank file, not a new Bio::Seq object). > > Cheers. > Roy. > > > On 13/08/2010 16:06, Jessica Sun wrote: > > Does anyone knows how to open a genbank file, add new feature > and then save > a new genbank > file with new feature added in bioperl ? > > thx > > > > > > -- > Jessica Jingping Sun From roy.chaudhuri at gmail.com Fri Aug 13 11:57:27 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Fri, 13 Aug 2010 16:57:27 +0100 Subject: [Bioperl-l] Add sequence feature In-Reply-To: References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> Message-ID: <4C656B67.5020402@gmail.com> Please remember to copy replies to the mailing list. You can loop over the features in your Bio::Seq object: for my $feat ($seq->get_SeqFeatures) { # do something } And once you have found the feature you want to modify, you can add a tag using something like: $feat->add_tag_value('note',"this is a note"); When you're finished you can write out the modified sequence object to a new GenBank file. On 13/08/2010 16:40, Jessica Sun wrote: > no i want to load the genbank file with existing features and I need to > add some new feature tags to the existing ones and then save to a new > update genbank file for local usage. I just not quite good on how to > easily merge the two steps you recommended into one in a neat way. > > thx > > > On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri > wrote: > > I'm not sure I understand, do you mean that you want to load just > the sequence from the GenBank file (ignoring the existing > annotation), then add your own features? 
There are instructions on > how to do that here: > http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder > > > On 13/08/2010 16:27, Jessica Sun wrote: > > unfortunately. I want to add the feature to the sequence object > I got > from the Genbank file, I do not mind to save a new genbank file but > these new genbank file contains the original genbank format and > info I > got plus the new feature tags I need to added to. Any quick > solution to > this? > > thx > > Jessica > > > > On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri > > >> wrote: > > Hi Jessica. > > You need to use Bio::SeqIO to read in the GenBank file to a > BioPerl > sequence object, and to write your new GenBank file: > http://www.bioperl.org/wiki/HOWTO:SeqIO > > To add a new feature follow the instructions here: > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences > > (except that you are adding the feature to the sequence > object you > got from the Genbank file, not a new Bio::Seq object). > > Cheers. > Roy. > > > On 13/08/2010 16:06, Jessica Sun wrote: > > Does anyone knows how to open a genbank file, add new > feature > and then save > a new genbank > file with new feature added in bioperl ? > > thx > > > > > > -- > Jessica Jingping Sun > > > > > > -- > Jessica Jingping Sun From jessica.sun at gmail.com Fri Aug 13 13:06:32 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Fri, 13 Aug 2010 13:06:32 -0400 Subject: [Bioperl-l] Add sequence feature In-Reply-To: <4C656B67.5020402@gmail.com> References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> <4C656B67.5020402@gmail.com> Message-ID: Thanks. I somehow get these error messages. --------------------- WARNING --------------------- MSG: Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module. Attempting to dump, but may fail! 
--------------------------------------------------- Can't locate object method "seq" via package "Bio::SeqIO::genbank" at /Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, line 447. by doing this, my $feat = new Bio::SeqFeature::Generic(-start =>20, -end => $40, -primary_tag => 'newfeature' ); $feat->add_tag_value("note","this is notes"); $f->add_SeqFeature($feat); ## f is original feature pointer $io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" ); $io->write_seq($seqio_object); On Fri, Aug 13, 2010 at 11:57 AM, Roy Chaudhuri wrote: > Please remember to copy replies to the mailing list. > > You can loop over the features in your Bio::Seq object: > for my $feat ($seq->get_SeqFeatures) { # do something } > > And once you have found the feature you want to modify, you can add a tag > using something like: > $feat->add_tag_value('note',"this is a note"); > > When you're finished you can write out the modified sequence object to a > new GenBank file. > > > On 13/08/2010 16:40, Jessica Sun wrote: > >> no i want to load the genbank file with existing features and I need to >> add some new feature tags to the existing ones and then save to a new >> update genbank file for local usage. I just not quite good on how to >> easily merge the two steps you recommended into one in a neat way. >> >> thx >> >> >> On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri > > wrote: >> >> I'm not sure I understand, do you mean that you want to load just >> the sequence from the GenBank file (ignoring the existing >> annotation), then add your own features? There are instructions on >> how to do that here: >> http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder >> >> >> On 13/08/2010 16:27, Jessica Sun wrote: >> >> unfortunately. 
I want to add the feature to the sequence object >> I got >> from the Genbank file, I do not mind to save a new genbank file but >> these new genbank file contains the original genbank format and >> info I >> got plus the new feature tags I need to added to. Any quick >> solution to >> this? >> >> thx >> >> Jessica >> >> >> >> On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri >> >> > >> wrote: >> >> Hi Jessica. >> >> You need to use Bio::SeqIO to read in the GenBank file to a >> BioPerl >> sequence object, and to write your new GenBank file: >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> >> To add a new feature follow the instructions here: >> >> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences >> >> (except that you are adding the feature to the sequence >> object you >> got from the Genbank file, not a new Bio::Seq object). >> >> Cheers. >> Roy. >> >> >> On 13/08/2010 16:06, Jessica Sun wrote: >> >> Does anyone knows how to open a genbank file, add new >> feature >> and then save >> a new genbank >> file with new feature added in bioperl ? >> >> thx >> >> >> >> >> >> -- >> Jessica Jingping Sun >> >> >> >> >> >> -- >> Jessica Jingping Sun >> > > -- Jessica Jingping Sun From drummike at gmail.com Fri Aug 13 13:41:55 2010 From: drummike at gmail.com (Mike Williams) Date: Fri, 13 Aug 2010 13:41:55 -0400 Subject: [Bioperl-l] Add sequence feature In-Reply-To: References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> <4C656B67.5020402@gmail.com> Message-ID: On Fri, Aug 13, 2010 at 1:06 PM, Jessica Sun wrote: > Thanks. I somehow get these error messages. > by doing this, > > my $feat = new Bio::SeqFeature::Generic(-start =>20, > -end => $40, > -primary_tag => 'newfeature' ); > $feat->add_tag_value("note","this is > notes"); > That $40 looks fishy. Try deleting the dollar sign. You did mean just 40, right? 
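
A minimal sketch of why the stray dollar sign matters, using only core Perl. The hash names below are made up for illustration and are not part of BioPerl. Perl reads $40 as the 40th regex capture variable, which is undefined unless a match with that many groups has just succeeded, so the constructor would receive an undefined -end rather than the number 40:

```perl
use strict;
use warnings;

# Hypothetical argument lists mirroring the quoted constructor call.
# $40 is a numbered match variable, not the literal number 40.
my %typo_args  = ( -start => 20, -end => $40 );
my %fixed_args = ( -start => 20, -end => 40 );

# Report what each argument list actually carries for -end.
printf "typo:  -end is %s\n",
    defined $typo_args{-end}  ? $typo_args{-end}  : 'undef';
printf "fixed: -end is %s\n",
    defined $fixed_args{-end} ? $fixed_args{-end} : 'undef';
```

Neither strict nor warnings complains at compile time here, because numbered match variables are always legal globals, which is probably why the typo is easy to miss.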
Mike From MEC at stowers.org Fri Aug 13 13:37:50 2010 From: MEC at stowers.org (Cook, Malcolm) Date: Fri, 13 Aug 2010 12:37:50 -0500 Subject: [Bioperl-l] Add sequence feature In-Reply-To: References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> <4C656B67.5020402@gmail.com> Message-ID: Jessica, Show more code! In particular, where did $f get set? --Malcolm -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun Sent: Friday, August 13, 2010 12:07 PM To: Roy Chaudhuri Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Add sequence feature Thanks. I somehow get these error messages. --------------------- WARNING --------------------- MSG: Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module. Attempting to dump, but may fail! --------------------------------------------------- Can't locate object method "seq" via package "Bio::SeqIO::genbank" at /Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, line 447. by doing this, my $feat = new Bio::SeqFeature::Generic(-start =>20, -end => $40, -primary_tag => 'newfeature' ); $feat->add_tag_value("note","this is notes"); $f->add_SeqFeature($feat); ## f is original feature pointer $io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" ); $io->write_seq($seqio_object); On Fri, Aug 13, 2010 at 11:57 AM, Roy Chaudhuri wrote: > Please remember to copy replies to the mailing list. > > You can loop over the features in your Bio::Seq object: > for my $feat ($seq->get_SeqFeatures) { # do something } > > And once you have found the feature you want to modify, you can add a > tag using something like: > $feat->add_tag_value('note',"this is a note"); > > When you're finished you can write out the modified sequence object to > a new GenBank file. 
> > > On 13/08/2010 16:40, Jessica Sun wrote: > >> no i want to load the genbank file with existing features and I need >> to add some new feature tags to the existing ones and then save to a >> new update genbank file for local usage. I just not quite good on how >> to easily merge the two steps you recommended into one in a neat way. >> >> thx >> >> >> On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri >> > wrote: >> >> I'm not sure I understand, do you mean that you want to load just >> the sequence from the GenBank file (ignoring the existing >> annotation), then add your own features? There are instructions on >> how to do that here: >> >> http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder >> >> >> On 13/08/2010 16:27, Jessica Sun wrote: >> >> unfortunately. I want to add the feature to the sequence object >> I got >> from the Genbank file, I do not mind to save a new genbank file but >> these new genbank file contains the original genbank format and >> info I >> got plus the new feature tags I need to added to. Any quick >> solution to >> this? >> >> thx >> >> Jessica >> >> >> >> On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri >> >> > >> wrote: >> >> Hi Jessica. >> >> You need to use Bio::SeqIO to read in the GenBank file to a >> BioPerl >> sequence object, and to write your new GenBank file: >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> >> To add a new feature follow the instructions here: >> >> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Ow >> n_Sequences >> >> (except that you are adding the feature to the sequence >> object you >> got from the Genbank file, not a new Bio::Seq object). >> >> Cheers. >> Roy. >> >> >> On 13/08/2010 16:06, Jessica Sun wrote: >> >> Does anyone knows how to open a genbank file, add new >> feature >> and then save >> a new genbank >> file with new feature added in bioperl ? 
>>
>> thx
>>
>> --
>> Jessica Jingping Sun
>>
>> --
>> Jessica Jingping Sun
>>
>
>

--
Jessica Jingping Sun
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From Kevin.M.Brown at asu.edu Fri Aug 13 13:53:50 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 13 Aug 2010 10:53:50 -0700
Subject: [Bioperl-l] Add sequence feature
In-Reply-To:
References: <4C6562E0.7090008@gmail.com><4C6566B0.60706@gmail.com><4C656B67.5020402@gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu>

If I'm reading your sample code correctly, then you are mistakenly trying to output the input SeqIO object and not the actual Bio::Seq object that was read in by SeqIO.

my $seqio = Bio::SeqIO->new;
my $seq = $seqio->next_seq;

#manipulate $seq

my $out = Bio::SeqIO->new;
$out->write_seq($seq);

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun
Sent: Friday, August 13, 2010 10:07 AM
To: Roy Chaudhuri
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Add sequence feature

Thanks. I somehow get these error messages.

--------------------- WARNING ---------------------
MSG: Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module.
Attempting to dump, but may fail!
---------------------------------------------------
Can't locate object method "seq" via package "Bio::SeqIO::genbank" at /Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, line 447.
by doing this, my $feat = new Bio::SeqFeature::Generic(-start =>20, -end => $40, -primary_tag => 'newfeature' ); $feat->add_tag_value("note","this is notes"); $f->add_SeqFeature($feat); ## f is original feature pointer $io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" ); $io->write_seq($seqio_object); On Fri, Aug 13, 2010 at 11:57 AM, Roy Chaudhuri wrote: > Please remember to copy replies to the mailing list. > > You can loop over the features in your Bio::Seq object: > for my $feat ($seq->get_SeqFeatures) { # do something } > > And once you have found the feature you want to modify, you can add a tag > using something like: > $feat->add_tag_value('note',"this is a note"); > > When you're finished you can write out the modified sequence object to a > new GenBank file. > > > On 13/08/2010 16:40, Jessica Sun wrote: > >> no i want to load the genbank file with existing features and I need to >> add some new feature tags to the existing ones and then save to a new >> update genbank file for local usage. I just not quite good on how to >> easily merge the two steps you recommended into one in a neat way. >> >> thx >> >> >> On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri > > wrote: >> >> I'm not sure I understand, do you mean that you want to load just >> the sequence from the GenBank file (ignoring the existing >> annotation), then add your own features? There are instructions on >> how to do that here: >> http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder >> >> >> On 13/08/2010 16:27, Jessica Sun wrote: >> >> unfortunately. I want to add the feature to the sequence object >> I got >> from the Genbank file, I do not mind to save a new genbank file but >> these new genbank file contains the original genbank format and >> info I >> got plus the new feature tags I need to added to. Any quick >> solution to >> this? >> >> thx >> >> Jessica >> >> >> >> On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri >> >> > >> wrote: >> >> Hi Jessica. 
>> >> You need to use Bio::SeqIO to read in the GenBank file to a >> BioPerl >> sequence object, and to write your new GenBank file: >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> >> To add a new feature follow the instructions here: >> >> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences >> >> (except that you are adding the feature to the sequence >> object you >> got from the Genbank file, not a new Bio::Seq object). >> >> Cheers. >> Roy. >> >> >> On 13/08/2010 16:06, Jessica Sun wrote: >> >> Does anyone knows how to open a genbank file, add new >> feature >> and then save >> a new genbank >> file with new feature added in bioperl ? >> >> thx >> >> >> >> >> >> -- >> Jessica Jingping Sun >> >> >> >> >> >> -- >> Jessica Jingping Sun >> > > -- Jessica Jingping Sun
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jessica.sun at gmail.com Fri Aug 13 15:16:51 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 13 Aug 2010 15:16:51 -0400
Subject: [Bioperl-l] Fwd: Add sequence feature
In-Reply-To:
References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> <4C656B67.5020402@gmail.com> <1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu>
Message-ID:

---------- Forwarded message ----------
From: Jessica Sun
Date: Fri, Aug 13, 2010 at 3:16 PM
Subject: Re: [Bioperl-l] Add sequence feature
To: Kevin Brown

yes, I change that, somehow it still did not take the added features in.

On Fri, Aug 13, 2010 at 1:53 PM, Kevin Brown wrote:
> If I'm reading your sample code correctly, then you are mistakenly
> trying to output the input SeqIO object and not the actual Bio::Seq
> object that was read in by SeqIO.

--
Jessica Jingping Sun

From MEC at stowers.org Fri Aug 13 15:56:09 2010
From: MEC at stowers.org (Cook, Malcolm)
Date: Fri, 13 Aug 2010 14:56:09 -0500
Subject: [Bioperl-l] Fwd: Add sequence feature
In-Reply-To:
References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> <4C656B67.5020402@gmail.com> <1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu>
Message-ID:

if you want to show all your code we might not have to guess at what the problem is.....

Malcolm Cook
Stowers Institute for Medical Research - Bioinformatics
Kansas City, Missouri USA

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun
Sent: Friday, August 13, 2010 2:17 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Fwd: Add sequence feature

yes, I change that, somehow it still did not take the added features in.

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Mon Aug 16 14:02:15 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 16 Aug 2010 13:02:15 -0500
Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping
Message-ID: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>

All,

This is in reference to a bug report I filed a while back. In the below test script, two features with the same start/end are compared. If the features have the same seq_id(), overlap succeeds. If the seq_id is changed (e.g. is on another chromosome, for instance), the overlap still succeeds.

The question is: is this a bug? My vote would be 'yes', but there have been various arguments to say it's not.
chris

(maybe I'll make this a regular thing on the list, just to hash out some of the edge cases I run into periodically)

=========================================

#!/usr/bin/perl -w

use strict;
use warnings;
use Test::More;
use Bio::SeqFeature::Generic;

my ( $feat1, $feat2 );

$feat1 = Bio::SeqFeature::Generic->new(
    -start  => 40,
    -end    => 80,
    -strand => 1,
    -seq_id => 'ABC123',
);

is $feat1->start, 40, 'start of feature location';
is $feat1->end, 80, 'end of feature location';
is $feat1->seq_id, 'ABC123', 'seq_id';

$feat2 = Bio::SeqFeature::Generic->new(
    -start  => 40,
    -end    => 80,
    -strand => 1,
    -seq_id => 'ABC123',
);

is $feat2->start, 40, 'start of feature location';
is $feat2->end, 80, 'end of feature location';
is $feat2->seq_id, 'ABC123', 'seq_id';

# Generic features with same Seq ID should overlap
ok( $feat2->overlaps($feat1), 'feat2 overlaps feat1' );

# Generic features with different Seq IDs shouldn't overlap
is( $feat2->seq_id('XYZ678'), 'XYZ678', 'change seq_id' );

# this currently fails
# (the test name is the second argument to ok(), outside the negation)
ok( !$feat2->overlaps($feat1), 'feat2 doesn\'t overlap feat1' );

done_testing();

From David.Messina at sbc.su.se Mon Aug 16 14:51:54 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 16 Aug 2010 20:51:54 +0200
Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping
In-Reply-To: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
Message-ID:

> The question is: is this a bug?

Hmm, tricky.

Genomic start and end positions with differing IDs shouldn't overlap, but can't SeqFeatures apply to proteins and other molecules where one would want to compare positions without regard to ID?

Dave

From cjfields at illinois.edu Mon Aug 16 21:39:00 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 16 Aug 2010 20:39:00 -0500
Subject: [Bioperl-l] Bug?
Features with similar ranges, different IDs are considered overlapping In-Reply-To: References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> Message-ID: On Aug 16, 2010, at 1:51 PM, Dave Messina wrote: >> The question is: is this a bug? > > Hmm, tricky. > > Genomic start and end positions with differing IDs shouldn't overlap, but can't SeqFeatures apply to proteins and other molecules where one would want to compare positions without regard to ID? > > Dave Good point; it's probably the context the methods are used that matters. So, maybe just a document clarification? chris From David.Messina at sbc.su.se Tue Aug 17 05:06:05 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 17 Aug 2010 11:06:05 +0200 Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping In-Reply-To: References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> Message-ID: <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se> > Good point; it's probably the context the methods are used that matters. So, maybe just a document clarification? That's always good, but it really doesn't solve the issue you're describing. I mean, who would expect to get overlaps for features on different chromosomes? To me, that's a clear violation of reasonable user expectations. You shouldn't have to read the docs about something like that. So what's the solution for these duelling use cases? I haven't thought about it much, but a first approximation might be to add a -genomic boolean flag that, when true, would do the right thing and check the ID when doing overlaps or other positional comparisons. (Maybe -genomic is too obscure. Maybe it should be -same_id_for_overlaps or something like that.) And maybe having to know to set a flag is effectively the same thing as having to read the docs to understand SeqFeature's overlap behavior. What do the rest of you out there think? 
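To make the flag idea concrete, here is a minimal sketch in plain Perl. It is only an illustration under stated assumptions: features are modelled as hash references rather than real Bio::SeqFeature objects, and both the helper name overlaps_on_same_seq and the ignore_ids option are hypothetical, not existing BioPerl API.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): a range overlap check that
# also compares sequence IDs unless the caller explicitly opts out.
sub overlaps_on_same_seq {
    my ( $f1, $f2, %opt ) = @_;

    # Features on different sequences never overlap, unless IDs are ignored.
    return 0 if !$opt{ignore_ids} && $f1->{seq_id} ne $f2->{seq_id};

    # Closed intervals overlap when each starts no later than the other ends.
    return ( $f1->{start} <= $f2->{end} && $f2->{start} <= $f1->{end} ) ? 1 : 0;
}

# Stand-ins for the two features in the test script, after the seq_id change.
my $feat1 = { seq_id => 'ABC123', start => 40, end => 80 };
my $feat2 = { seq_id => 'XYZ678', start => 40, end => 80 };

print overlaps_on_same_seq( $feat1, $feat2 ), "\n";                     # prints 0
print overlaps_on_same_seq( $feat1, $feat2, ignore_ids => 1 ), "\n";    # prints 1
```

With this shape the ID check is the default and comparing positions without regard to ID is the explicit opt-in, so a protein-space user has to ask for it rather than a genomic user having to know about it.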
Dave

From scott at scottcain.net Tue Aug 17 08:45:27 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 17 Aug 2010 08:45:27 -0400
Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping
In-Reply-To: <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
Message-ID:

Hi Dave and Chris,

It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison, and if somebody is doing the protein space comparison and not getting the expected results, they'll probably read the docs to find out why.

Scott

--
Scott Cain, Ph. D. scott at scottcain dot net
Ontario Institute for Cancer Research
http://gmod.org/
216 392 3087
Sent from my iPhone.

On Aug 17, 2010, at 5:06 AM, Dave Messina wrote:

>> Good point; it's probably the context the methods are used that matters. So, maybe just a document clarification?
>
> That's always good, but it really doesn't solve the issue you're describing.
>
> I mean, who would expect to get overlaps for features on different chromosomes?
>
> To me, that's a clear violation of reasonable user expectations. You shouldn't have to read the docs about something like that.
>
> So what's the solution for these duelling use cases? I haven't thought about it much, but a first approximation might be to add a -genomic boolean flag that, when true, would do the right thing and check the ID when doing overlaps or other positional comparisons.
>
> (Maybe -genomic is too obscure. Maybe it should be -same_id_for_overlaps or something like that.)
>
> And maybe having to know to set a flag is effectively the same thing as having to read the docs to understand SeqFeature's overlap behavior.
>
> What do the rest of you out there think?
>
>
> Dave
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From david.breimann at gmail.com Tue Aug 17 09:44:08 2010
From: david.breimann at gmail.com (David Breimann)
Date: Tue, 17 Aug 2010 16:44:08 +0300
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
Message-ID:

Hello,

The following GenBank file has a gene that runs over the "end" of the chromosome and into its "beginning", and the script generates an error.

ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk

NC_005707 Unflattening error:
Details:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: PROBLEM, SEVERITY==2
Ranges not in correct order. Strange ensembl genbank entry? Range: [207497,208369] [1,687]
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473
STACK: Bio::SeqFeature::Tools::Unflattener::problem /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952
STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842
STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713
STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532
STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023
STACK: /usr/local/bin/bp_genbank2gff3.pl:506
-----------------------------------------------------------

Best,
Dave

From cjfields at illinois.edu Tue Aug 17 09:51:02 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 08:51:02 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To:
References:
Message-ID: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>

I think Chris Mungall has a branch set up for this
in bioperl: http://github.com/bioperl/bioperl-live/tree/circular Is that correct? Should we merge that code into the master branch? chris On Aug 17, 2010, at 8:44 AM, David Breimann wrote: > Hello, > > The following genbank has a gene that runs over the 'end" of the > chromosome and into its "beginning", and the script generates an > error. > > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk > > NC_005707 Unflattening error: > Details: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: PROBLEM, SEVERITY==2 > Ranges not in correct order. Strange ensembl genbank entry? Range: > [207497,208369] [1,687] > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473 > STACK: Bio::SeqFeature::Tools::Unflattener::problem > /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952 > STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent > /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842 > STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS > /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713 > STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq > /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532 > STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023 > STACK: /usr/local/bin/bp_genbank2gff3.pl:506 > ----------------------------------------------------------- > > Best, > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Tue Aug 17 09:52:11 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 17 Aug 2010 15:52:11 +0200 Subject: [Bioperl-l] Bug? 
Features with similar ranges, different IDs are considered overlapping In-Reply-To: References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se> Message-ID: > It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison Yep, agreed. And such a flag should be named for the non-default behavior, then, like: -ignore_IDs_for_overlaps Dave From douglas.hoen at gmail.com Thu Aug 12 10:24:27 2010 From: douglas.hoen at gmail.com (Douglas Hoen) Date: Thu, 12 Aug 2010 10:24:27 -0400 Subject: [Bioperl-l] HMMER3 to GFF3 In-Reply-To: <20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de> References: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com> <20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de> Message-ID: Hi Kai, Here it is. Thanks, -- Doug -------------- next part -------------- A non-text attachment was scrubbed... Name: chr1-tesigsv2.hmmscan Type: application/octet-stream Size: 676132 bytes Desc: not available URL: -------------- next part -------------- On 2010-08-12, at 8:16 AM, Kai Blin wrote: > On Wed, 11 Aug 2010 22:59:37 -0700 (PDT) > Doug Hoen wrote: > > Hi Doug, > >> Could someone please confirm whether the results are incorrect and, if >> so, perhaps suggest a fix? It may well be that this problem is due to >> the unusual way I am using hmmscan, rather than a problem with HMMER3 >> parsing...? > > Can you please attach your hmmer input file? Along the way something > inserted line breaks, making it unreadable. > > It might well be possible that the HMMer3 parser still handles a little > different from the HMMer2 parser, I haven't tried that script. > > Cheers, > Kai > > -- > Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28 Phone : ++49 7071 29-78841
> D-72076 Tübingen Fax : ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From CJMungall at lbl.gov Tue Aug 17 11:53:15 2010
From: CJMungall at lbl.gov (Chris Mungall)
Date: Tue, 17 Aug 2010 08:53:15 -0700
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
References: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
Message-ID:

You can merge this in. It should allow David to proceed.

I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that it essentially preserves the GenBank coordinates even when the origin is crossed:

http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf

However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length.

On Aug 17, 2010, at 6:51 AM, Chris Fields wrote:

> I think Chris Mungall has a branch set up for this in bioperl:
>
> http://github.com/bioperl/bioperl-live/tree/circular
>
> Is that correct? Should we merge that code into the master branch?
>
> chris
>
> On Aug 17, 2010, at 8:44 AM, David Breimann wrote:
>
>> Hello,
>>
>> The following genbank has a gene that runs over the 'end" of the
>> chromosome and into its "beginning", and the script generates an
>> error.
>>
>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk
>>
>> NC_005707 Unflattening error:
>> Details:
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: PROBLEM, SEVERITY==2
>> Ranges not in correct order. Strange ensembl genbank entry?
Range: >> [207497,208369] [1,687] >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/ >> Root.pm:473 >> STACK: Bio::SeqFeature::Tools::Unflattener::problem >> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952 >> STACK: >> Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent >> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842 >> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS >> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713 >> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq >> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532 >> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023 >> STACK: /usr/local/bin/bp_genbank2gff3.pl:506 >> ----------------------------------------------------------- >> >> Best, >> Dave >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Tue Aug 17 15:24:23 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 17 Aug 2010 14:24:23 -0500 Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes In-Reply-To: References: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu> Message-ID: <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu> On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote: > You can merge this in. It should allow David to proceed. Will do. I'll go ahead and delete the remote branch as well. > I haven't kept up on synchrony between bioperl and GFF on circular genomes. 
The above fix is conservative in that essentially preserves the genbank coordinates even when the origin is crossed: > > http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf > > However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length Yes, that is a problem that needs to be addressed. Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174. chris > On Aug 17, 2010, at 6:51 AM, Chris Fields wrote: > >> I think Chris Mungall has a branch set up for this in bioperl: >> >> http://github.com/bioperl/bioperl-live/tree/circular >> >> Is that correct? Should we merge that code into the master branch? >> >> chris >> >> On Aug 17, 2010, at 8:44 AM, David Breimann wrote: >> >>> Hello, >>> >>> The following genbank has a gene that runs over the 'end" of the >>> chromosome and into its "beginning", and the script generates an >>> error. >>> >>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk >>> >>> NC_005707 Unflattening error: >>> Details: >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: PROBLEM, SEVERITY==2 >>> Ranges not in correct order. Strange ensembl genbank entry? 
Range: >>> [207497,208369] [1,687] >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473 >>> STACK: Bio::SeqFeature::Tools::Unflattener::problem >>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952 >>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent >>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842 >>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS >>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713 >>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq >>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532 >>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023 >>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506 >>> ----------------------------------------------------------- >>> >>> Best, >>> Dave >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sheldon.mckay at gmail.com Tue Aug 17 16:42:50 2010 From: sheldon.mckay at gmail.com (Sheldon McKay) Date: Tue, 17 Aug 2010 16:42:50 -0400 Subject: [Bioperl-l] AlignIO and Gbrowse_syn In-Reply-To: References: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz> Message-ID: The growse_syn dev team is pretty small (n=1) right now, so any patches would be welcome. Sheldon On Wed, Aug 11, 2010 at 6:02 PM, Chris Fields wrote: > Russell, > > We have had very few requests to support .maf until recently, which is why there has been little done with it. ?We welcome any help to improve it. 
>
> chris
>
> On Aug 11, 2010, at 4:31 PM, Smithies, Russell wrote:
>
>> I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague.
>> If GBrowse_syn is using .maf format, does AlignIO need more work?
>> Any comments?
>>
>> --Russell
>>
>>
>> I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output). GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce clustalw, I've tried parsing LASTZ maf output to clustalw (and other alignment formats) using AlignIO, however I run into the following issues:
>> * Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification)
>> * The coordinate system for reverse strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-rev complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result all reverse match coordinates in the resulting clustalw output file are incorrect.
>> * AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them
>>
>> I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned.
>> ======================================================================= >> Attention: The information contained in this message and/or attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or privileged >> material. Any review, retransmission, dissemination or other use of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by AgResearch >> Limited. If you have received this message in error, please notify the >> sender immediately. >> ======================================================================= >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hxu.hong at gmail.com Tue Aug 17 16:50:43 2010 From: hxu.hong at gmail.com (Hong Xu) Date: Tue, 17 Aug 2010 16:50:43 -0400 Subject: [Bioperl-l] Bio::Tools::Primer3 question Message-ID: Hello all, I'm working to parse the Primer3 release 2.2.2-beta result. I made the necessary changes to make Bio::Tools::Primer3 work with the new output tags of Primer3 release 2.2.2. But when I tried to get the primer Tm, I found that Bio::Tools::Primer3 gave different Tm from Primer3 result file. Then I learned that the Tm was calculated by Bio::SeqFeature::Primer module, not from parsing Primer3 result. If I want to get data from parsing Primer3 result, should I write my own Primer3 parser instead of Bio::Tools::Primer3? 
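For anyone who only needs the Tm values that primer3_core itself reported, a minimal sketch of reading them straight from the Boulder-IO output follows. It assumes v2-style tags such as PRIMER_LEFT_0_TM and PRIMER_RIGHT_0_TM and a lone "=" as record terminator; the function names and the example record are invented here, and this is not how Bio::Tools::Primer3 works internally.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Read Boulder-IO records (TAG=VALUE lines, a lone "=" ends a record)
# from a list of lines and return one hash reference per record.
sub parse_boulder_records {
    my (@lines) = @_;
    my (@records, %current);
    for my $line (@lines) {
        chomp $line;
        if ($line eq '=') {
            push @records, {%current} if %current;
            %current = ();
            next;
        }
        my ($tag, $value) = split /=/, $line, 2;
        next unless defined $value;    # skip lines without "="
        $current{$tag} = $value;
    }
    push @records, {%current} if %current;    # tolerate a missing final "="
    return @records;
}

# Collect the Tm values as written by primer3_core, keyed by primer pair
# index and side. Assumption: tags look like PRIMER_LEFT_<n>_TM.
sub reported_tms {
    my ($record) = @_;
    my %tm;
    for my $tag (keys %$record) {
        if ($tag =~ /^PRIMER_(LEFT|RIGHT)_(\d+)_TM$/) {
            $tm{$2}{ lc $1 } = $record->{$tag};
        }
    }
    return \%tm;
}

# Made-up record for illustration only.
my @example = (
    "SEQUENCE_ID=example\n",
    "PRIMER_LEFT_0_TM=59.960\n",
    "PRIMER_RIGHT_0_TM=60.103\n",
    "=\n",
);
for my $record (parse_boulder_records(@example)) {
    my $tm = reported_tms($record);
    for my $i (sort { $a <=> $b } keys %$tm) {
        printf "pair %d: left Tm %s, right Tm %s\n",
            $i, $tm->{$i}{left}, $tm->{$i}{right};
    }
}
```

The values are kept as strings exactly as they appear in the file, so nothing is recomputed along the way.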
thanks a lot, Hong From cjfields at illinois.edu Tue Aug 17 17:14:02 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 17 Aug 2010 16:14:02 -0500 Subject: [Bioperl-l] Bio::Tools::Primer3 question In-Reply-To: References: Message-ID: Already ahead of you there, unfortunately. I wrote a complete reimplementation of both the Primer3 parser and the Primer3 wrapper that handles both v1 and v2 of primer3_core. Lack of tuits lately have prevented me from getting tests written up, so for the time being it's sitting in bioperl-dev: http://github.com/bioperl/bioperl-dev They are Bio::Tools::Primer3Redux (parser) and Bio::Tools::Run::Primer3Redux (wrapper). I rewrote those b/c I found the original modules not adequate enough in many ways for my purposes then (the newer version uses simple features or feature pairs instead of the primer features, for the same reasons you mention re: Tm). You're more than welcome to hack on the code a bit. I'm planning on pulling it out into my own github repo for separate submission to CPAN. chris On Aug 17, 2010, at 3:50 PM, Hong Xu wrote: > Hello all, > > I'm working to parse the Primer3 release 2.2.2-beta result. I made the > necessary changes to make Bio::Tools::Primer3 work with the new output > tags of Primer3 release 2.2.2. But when I tried to get the primer Tm, > I found that Bio::Tools::Primer3 gave different Tm from Primer3 result > file. Then I learned that the Tm was calculated by > Bio::SeqFeature::Primer module, not from parsing Primer3 result. If I > want to get data from parsing Primer3 result, should I write my own > Primer3 parser instead of Bio::Tools::Primer3? 
> > thanks a lot, > Hong > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Aug 17 23:42:59 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 17 Aug 2010 22:42:59 -0500 Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes In-Reply-To: <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu> References: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu> <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu> Message-ID: Chris, David, The branch is now merged back to trunk. David, let us know if this helps. chris (f) On Aug 17, 2010, at 2:24 PM, Chris Fields wrote: > On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote: > >> You can merge this in. It should allow David to proceed. > > Will do. I'll go ahead and delete the remote branch as well. > >> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that essentially preserves the genbank coordinates even when the origin is crossed: >> >> http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf >> >> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length > > Yes, that is a problem that needs to be addressed. Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174. > > chris > >> On Aug 17, 2010, at 6:51 AM, Chris Fields wrote: >> >>> I think Chris Mungall has a branch set up for this in bioperl: >>> >>> http://github.com/bioperl/bioperl-live/tree/circular >>> >>> Is that correct? Should we merge that code into the master branch? 
>>> >>> chris >>> >>> On Aug 17, 2010, at 8:44 AM, David Breimann wrote: >>> >>>> Hello, >>>> >>>> The following genbank has a gene that runs over the 'end" of the >>>> chromosome and into its "beginning", and the script generates an >>>> error. >>>> >>>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk >>>> >>>> NC_005707 Unflattening error: >>>> Details: >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>> MSG: PROBLEM, SEVERITY==2 >>>> Ranges not in correct order. Strange ensembl genbank entry? Range: >>>> [207497,208369] [1,687] >>>> STACK: Error::throw >>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473 >>>> STACK: Bio::SeqFeature::Tools::Unflattener::problem >>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952 >>>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent >>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842 >>>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS >>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713 >>>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq >>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532 >>>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023 >>>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506 >>>> ----------------------------------------------------------- >>>> >>>> Best, >>>> Dave >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From 
cjfields at illinois.edu Wed Aug 18 00:48:55 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 17 Aug 2010 23:48:55 -0500 Subject: [Bioperl-l] Bio::Tools::Primer3 question In-Reply-To: References: Message-ID: Hong, The latest code, along with working tests, is present here: http://github.com/cjfields/Bio-Tools-Primer3Redux It needs a few more tests but the initial wrapper tests work fine for primer3 v2.2.1 on both Mac and Linux. Will try using this to CPAN after a bit more cleanup. chris On Aug 17, 2010, at 4:14 PM, Chris Fields wrote: > Already ahead of you there, unfortunately. I wrote a complete reimplementation of both the Primer3 parser and the Primer3 wrapper that handles both v1 and v2 of primer3_core. Lack of tuits lately have prevented me from getting tests written up, so for the time being it's sitting in bioperl-dev: > > http://github.com/bioperl/bioperl-dev > > They are Bio::Tools::Primer3Redux (parser) and Bio::Tools::Run::Primer3Redux (wrapper). > > I rewrote those b/c I found the original modules not adequate enough in many ways for my purposes then (the newer version uses simple features or feature pairs instead of the primer features, for the same reasons you mention re: Tm). You're more than welcome to hack on the code a bit. I'm planning on pulling it out into my own github repo for separate submission to CPAN. > > chris > > On Aug 17, 2010, at 3:50 PM, Hong Xu wrote: > >> Hello all, >> >> I'm working to parse the Primer3 release 2.2.2-beta result. I made the >> necessary changes to make Bio::Tools::Primer3 work with the new output >> tags of Primer3 release 2.2.2. But when I tried to get the primer Tm, >> I found that Bio::Tools::Primer3 gave different Tm from Primer3 result >> file. Then I learned that the Tm was calculated by >> Bio::SeqFeature::Primer module, not from parsing Primer3 result. If I >> want to get data from parsing Primer3 result, should I write my own >> Primer3 parser instead of Bio::Tools::Primer3? 
>> >> thanks a lot, >> Hong >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From david.breimann at gmail.com Wed Aug 18 02:46:58 2010 From: david.breimann at gmail.com (David Breimann) Date: Wed, 18 Aug 2010 09:46:58 +0300 Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes In-Reply-To: References: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu> <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu> Message-ID: Dear Chris's, I tested the updated version on multiple genomes that previously returned errors (for future reference: NC_005707, NC_006578, NC_007103, NC_007104, NC_007106, NC_007107, NC_008573, NC_008762, NC_008763, NC_008785, NC_009457, NC_012040). The script now ends normally on all of them. However, as you mentioned, the result GFF3 file does not comply with GFF3 specifications for circular genomes. This in turn causes some unexpected results in other applications. Best, Dave On Wed, Aug 18, 2010 at 6:42 AM, Chris Fields wrote: > Chris, David, > > The branch is now merged back to trunk. David, let us know if this helps. > > chris (f) > > On Aug 17, 2010, at 2:24 PM, Chris Fields wrote: > >> On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote: >> >>> You can merge this in. It should allow David to proceed. >> >> Will do. I'll go ahead and delete the remote branch as well. >> >>> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that essentially preserves the genbank coordinates even when the origin is crossed: >>> >>>
http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf >>> >>> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length >> >> Yes, that is a problem that needs to be addressed. Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174. >> >> chris >> >>> On Aug 17, 2010, at 6:51 AM, Chris Fields wrote: >>> >>>> I think Chris Mungall has a branch set up for this in bioperl: >>>> >>>> http://github.com/bioperl/bioperl-live/tree/circular >>>> >>>> Is that correct? Should we merge that code into the master branch? >>>> >>>> chris >>>> >>>> On Aug 17, 2010, at 8:44 AM, David Breimann wrote: >>>> >>>>> Hello, >>>>> >>>>> The following genbank has a gene that runs over the 'end" of the >>>>> chromosome and into its "beginning", and the script generates an >>>>> error. >>>>> >>>>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk >>>>> >>>>> NC_005707 Unflattening error: >>>>> Details: >>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>> MSG: PROBLEM, SEVERITY==2 >>>>> Ranges not in correct order. Strange ensembl genbank entry?
Range: >>>>> [207497,208369] [1,687] >>>>> STACK: Error::throw >>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473 >>>>> STACK: Bio::SeqFeature::Tools::Unflattener::problem >>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952 >>>>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent >>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842 >>>>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS >>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713 >>>>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq >>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532 >>>>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023 >>>>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506 >>>>> ----------------------------------------------------------- >>>>> >>>>> Best, >>>>> Dave >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From G.Gallone at sms.ed.ac.uk Wed Aug 18 10:57:01 2010 From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone) Date: Wed, 18 Aug 2010 15:57:01 +0100 Subject: [Bioperl-l] [RFC] Interolog::Walk Message-ID: <4C6BF4BD.5010200@sms.ed.ac.uk> Hello BioPerl community - I've written a new module called Interolog::Walk that I'm planning to put on CPAN. I would be grateful if you might take a look at the brief description I attached and tell me what you think. 
I'll be more than happy to post further details should the module be of some interest to someone. Also, I am not totally sure that I have the right name for it. This is my first module and it would be great if you could advise on naming it appropriately. Hopefully the following description will give an idea of what it does. =================== NAME Interolog::Walk - Retrieve, score and visualize putative Protein-Protein Interactions through the orthology-walk method DESCRIPTION A common activity in computational biology is to mine protein-protein interactions from publicly available databases in order to build Protein-Protein Interaction (PPI) datasets. In many instances, however, the number of experimentally obtained annotated PPIs is very small and it would be helpful to enrich the experimental dataset with high-quality, computationally-inferred PPIs. Such computationally-obtained datasets can extend, support or enrich experimental PPI datasets, and are of crucial importance in high-throughput gene prioritization studies, i.e. to drive hypotheses and restrict the dimensionality of many gene functional discovery problems. This Perl Module, Interolog::Walk, is aimed at building putative PPI datasets on the basis of a number of comparative biology paradigms: the module implements a collection of computational biology algorithms based on the concept of "orthology projection". If interacting proteins A and B in organism X have orthologs A' and B' in organism Y, under certain conditions one can assume that the interaction will be conserved in organism Y, i.e. the A-B interaction can be "projected through the orthologies" to obtain a putative A'-B' interaction. The pair of interactions (A-B) and (A'-B') are named "Interologs" (see for instance [1] and [2]). Interolog::Walk collects, analyses and collates gene orthology data provided by the Ensembl Consortium (www.ensembl.org) as well as PPI data provided by EBI IntAct (http://www.ebi.ac.uk/intact/).
It provides the user with the possibility of rating the quality and reliability of the putative interactions collected, by means of confidence scores, and optionally outputs network representations of the datasets, compatible with the biological network representation standard, Cytoscape. USAGE In order to carry out an interolog walk we start with a set of gene identifiers in one organism of interest. We query those ids against a number of comparative biology databases to retrieve a list of orthologues for each gene id of interest, in one or more species. In the following step we rely on PPI databases to retrieve the list of available interactors for the protein ids obtained. The output at this stage consists of a list of interactors of the orthologues of the initial gene set, plus several fields of ancillary data. In the last step of the process we project the interactions - again using orthology data - back to the original species of interest. The output of the process is a list of PUTATIVE INTERACTORS of the initial gene set, plus several fields of ancillary data. ==================== Given the scope and the focus of the project, I would imagine that viable alternatives for the namespace might be Bio::Orthology::InterologWalk Bio::InterologMap or maybe Interolog::Map Orthology::Map Orthology::InterologMap There are no similar projects as far as I could see so I shouldn't run the risk of overlapping namespaces. Still I would love to know your informed opinion about it. best, Giuseppe REFERENCES [1] Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, Vidal M, Gerstein M. Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. Genome Research 2004 Jun;14(6):1107-18. [2]Wiles AM, Doderer M, Ruan J, Gu T-T, Ravi D, Blackman BA, Bishop AJR. "Building and Analyzing Protein Interactome Networks by Cross-species Comparisons." 
BMC Systems Biology 2010, 4:36 - PMID: 20353594 -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From David.Messina at sbc.su.se Wed Aug 18 12:52:58 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 18 Aug 2010 18:52:58 +0200 Subject: [Bioperl-l] [RFC] Interolog::Walk In-Reply-To: <4C6BF4BD.5010200@sms.ed.ac.uk> References: <4C6BF4BD.5010200@sms.ed.ac.uk> Message-ID: <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se> Hi Giuseppe, Sounds really interesting - thanks for posting this. > Bio::Orthology::InterologWalk I vote for this name, or in any case something with Bio:: as the top-level namespace since it's a biology-related package. I like that you're providing a lot of background and information about the project in the documentation. However, the USAGE section should give information about how to use the module, with example code. You can look at other modules on CPAN (or in BioPerl) to see the conventions for writing documentation. Also, from what you wrote, it sounds like this might be a pipeline or a script rather than a module per se, or perhaps a script and a set of modules. It would be helpful to clarify in your documentation (if you haven't already) how exactly things are organized (and of course example code will help with that, too). Hope that's helpful, and let us know when you've got it up on CPAN so we can try it out! Dave From cjfields at illinois.edu Wed Aug 18 14:24:16 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 18 Aug 2010 13:24:16 -0500 Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes In-Reply-To: References: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu> <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu> Message-ID: Okay, will file this as a bug. Thanks!
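For the bug report, the coordinate convention discussed in this thread (a feature crossing the origin keeps its start and has its end extended by the sequence length) can be sketched in a few lines. This is only an illustration under stated assumptions: forward-strand features, sub-ranges listed in join order, and a sequence length of 208369 inferred from the first reported range rather than checked against the record. The function name is made up and is not part of bp_genbank2gff3.pl.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Collapse the sub-ranges of a feature (1-based [start, end] pairs, in
# join order, forward strand) into one start/end pair. When a sub-range
# starts before the previous one ended, the feature is taken to have
# wrapped around the origin and the sequence length is added to it.
sub origin_spanning_coords {
    my ($seq_len, @ranges) = @_;
    my ($start, $end) = @{ $ranges[0] };
    my $offset = 0;
    for my $i (1 .. $#ranges) {
        my ($sub_start, $sub_end) = @{ $ranges[$i] };
        $offset += $seq_len if $sub_start + $offset < $end;
        $end = $sub_end + $offset;
    }
    return ($start, $end);
}

# The ranges from the error message, with an assumed length of 208369.
my ($start, $end) = origin_spanning_coords(208369, [207497, 208369], [1, 687]);
print "start=$start end=$end\n";
```

With that convention the end coordinate exceeds the sequence length, so downstream tools need to know the landmark is circular for the line to make sense.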
chris On Aug 18, 2010, at 1:46 AM, David Breimann wrote: > Dear Chris's, > > I tested the updated version on multiple genomes that previously > returned errors (for future reference: NC_005707, NC_006578, > NC_007103, NC_007104, NC_007106, NC_007107, NC_008573, NC_008762, > NC_008763, NC_008785, NC_009457, NC_012040). The script now ends > normally on all of them. However, as you mentioned, the result GFF3 > file does not comply with GFF3 specifications for circular genomes. > This in turn causes some unexpected results in other applications. > > Best, > Dave > > On Wed, Aug 18, 2010 at 6:42 AM, Chris Fields wrote: >> Chris, David, >> >> The branch is now merged back to trunk. David, let us know if this helps. >> >> chris (f) >> >> On Aug 17, 2010, at 2:24 PM, Chris Fields wrote: >> >>> On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote: >>> >>>> You can merge this in. It should allow David to proceed. >>> >>> Will do. I'll go ahead and delete the remote branch as well. >>> >>>> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that essentially preserves the genbank coordinates even when the origin is crossed: >>>> >>>> http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf >>>> >>>> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length >>> >>> Yes, that is a problem that needs to be addressed. Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174. >>> >>> chris >>> >>>> On Aug 17, 2010, at 6:51 AM, Chris Fields wrote: >>>> >>>>> I think Chris Mungall has a branch set up for this in bioperl: >>>>> >>>>> http://github.com/bioperl/bioperl-live/tree/circular >>>>> >>>>> Is that correct? Should we merge that code into the master branch? 
>>>>> >>>>> chris >>>>> >>>>> On Aug 17, 2010, at 8:44 AM, David Breimann wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> The following genbank has a gene that runs over the 'end" of the >>>>>> chromosome and into its "beginning", and the script generates an >>>>>> error. >>>>>> >>>>>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk >>>>>> >>>>>> NC_005707 Unflattening error: >>>>>> Details: >>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>>> MSG: PROBLEM, SEVERITY==2 >>>>>> Ranges not in correct order. Strange ensembl genbank entry? Range: >>>>>> [207497,208369] [1,687] >>>>>> STACK: Error::throw >>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473 >>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::problem >>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952 >>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent >>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842 >>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS >>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713 >>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq >>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532 >>>>>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023 >>>>>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506 >>>>>> ----------------------------------------------------------- >>>>>> >>>>>> Best, >>>>>> Dave >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing 
list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cdavis at bcm.tmc.edu Wed Aug 18 15:19:53 2010 From: cdavis at bcm.tmc.edu (Caleb Davis) Date: Wed, 18 Aug 2010 14:19:53 -0500 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq question Message-ID: <4C6C3259.4060304@bcm.tmc.edu> Hello, thank you for bioperl! I am getting discrepancies between the online bl2seq (www.ncbi.nlm.nih.gov/blast/*bl2seq*/wblast2.cgi) and bioperl's implementation, and I'm not sure why. I'm seeing a desired behavior through the web interface but can't replicate it locally. Specifically, online bl2seq aligns across a 1 bp insertion in the subject whereas the local bl2seq just reports a shorter alignment. Any ideas? Thanks again, --Caleb The desired parameter differences from default are -F F -W 7 (turn complexity filter off, word size = 7). 
Below I present the online and local results given the following input sequences: >consensus GAGGATCCAGAATTCTC >FVFTF6N01A86BR AACCCAATGTAAGGAAGCTAAGAACCTTGAAAAGAGGATACCAGAATTCTC Here are the parameters and result I'm getting online: Blast4-request ::= { body queue-search { program "blastn", service "plain", queries bioseq-set { seq-set { seq { id { local id 26297 }, descr { title "consensus", user { type str "CFastaReader", data { { label str "DefLine", data str ">consensus" } } } }, inst { repr raw, mol na, length 17, seq-data ncbi2na '8A3520F740'H } } } }, subject sequences { { id { local id 26299 }, descr { title "FVFTF6N01A86BR", user { type str "CFastaReader", data { { label str "DefLine", data str ">FVFTF6N01A86BR" } } } }, inst { repr raw, mol na, length 51, seq-data ncbi2na '0543B0A09C205F80228C520F74'H } } }, algorithm-options { { name "EvalueThreshold", value cutoff e-value { 1, 10, 1 } }, { name "UngappedMode", value boolean FALSE }, { name "PercentIdentity", value real { 0, 10, 0 } }, { name "HitlistSize", value integer 100 }, { name "EffectiveSearchSpace", value big-integer 0 }, { name "DbLength", value big-integer 0 }, { name "WindowSize", value integer 0 }, { name "DustFiltering", value boolean FALSE }, { name "RepeatFiltering", value boolean FALSE }, { name "MaskAtHash", value boolean TRUE }, { name "MismatchPenalty", value integer -3 }, { name "MatchReward", value integer 2 }, { name "GapOpeningCost", value integer 5 }, { name "GapExtensionCost", value integer 2 }, { name "StrandOption", value strand-type both-strands }, { name "WordSize", value integer 7 } }, format-options { { name "Web_JobTitle", value string "consensus" }, { name "Web_BlastSpecialPage", value string "blast2seq" } } } } >lcl|30439 FVFTF6N01A86BR Length=51 Sort alignments for this subject sequence by: E value Score Percent identity Query start position Subject start position Score = 24.7 bits (26), Expect = 2e-05 Identities = 17/18 (94%), Gaps = 1/18 (5%) Strand=Plus/Plus Query 
1 GAGGAT-CCAGAATTCTC 17 |||||| ||||||||||| Sbjct 34 GAGGATACCAGAATTCTC 51 Here's the output from a local search (I changed the expect to 5.0 just to prove to myself that some parameters are getting through OK): my @params = (-program => 'blastn', -outfile => 'bl2seq.out', -FILTER => 'F', -WORDSIZE => 7, -expect => 5.0); my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); my $bl2seq_report = $factory->bl2seq($cons_seqobj, $single_seqobj); #consensus vs. FVFTF6N01A86BR print Dumper $bl2seq_report->next_result; $VAR1 = bless( { '_inclusion_threshold' => undef, '_queryacc' => 'adapter_consensus', '_iteration_index' => 0, '_iteration_count' => 1, '_hits' => [], '_hitindex' => 0, '_querylength' => '17', '_querydesc' => '', '_iterations' => [ bless( { '_oldhits_not_below_threshold' => [], '_newhits_unclassified' => [], '_number' => 1, '_oldhits_newly_below_threshold' => [], '_hit_factory' => bless( { 'interface' => 'Bio::Search::Hit::HitI', 'type' => 'Bio::Search::Hit::BlastHit', '_loaded_types' => { 'Bio::Search::Hit::BlastHit' => 1 }, '_root_verbose' => 0 }, 'Bio::Factory::ObjectFactory' ), '_newhits_below_threshold' => [ { '-algorithm' => 'BLASTN', '-description' => '', '-length' => '51', '-query_len' => '17', '-hsp_factory' => bless( { 'interface' => 'Bio::Search::HSP::HSPI', 'type' => 'Bio::Search::HSP::GenericHSP', '_loaded_types' => { 'Bio::Search::HSP::GenericHSP' => 1 }, '_root_verbose' => 0 }, 'Bio::Factory::ObjectFactory' ), '-name' => 'FVFTF6N01A86BR', '-rank' => 1, '-hsps' => [ { '-query_start' => '7', '-algorithm' => 'BLASTN', '-hit_seq' => 'ccagaattctc', '-hit_length' => '51', '-query_length' => '17', '-query_desc' => '', '-query_frame' => 0, '-rank' => 1, '-hit_desc' => '', '-query_end' => '17', '-hit_name' => 'FVFTF6N01A86BR', '-identical' => '11', '-query_name' => 'adapter_consensus', '-evalue' => '1e-04', '-score' => '11', '-conserved' => '11', '-hit_frame' => 0, '-hsp_length' => '11', '-query_seq' => 'ccagaattctc', '-hit_start' => '41', 
'-homology_seq' => '|||||||||||', '-hit_end' => '51', '-bits' => '22.3' }, { '-query_start' => '9', '-algorithm' => 'BLASTN', '-hit_seq' => 'agaattct', '-hit_length' => '51', '-query_length' => '17', '-query_desc' => '', '-query_frame' => 0, '-rank' => 2, '-hit_desc' => '', '-query_end' => '16', '-hit_name' => 'FVFTF6N01A86BR', '-identical' => '8', '-query_name' => 'adapter_consensus', '-evalue' => '0.007', '-score' => '8', '-conserved' => '8', '-hit_frame' => 0, '-hsp_length' => '8', '-query_seq' => 'agaattct', '-hit_start' => '50', '-homology_seq' => '||||||||', '-hit_end' => '43', '-bits' => '16.4' } ], '-accession' => 'FVFTF6N01A86BR', '-significance' => '1e-04' } ], '_root_verbose' => 0, '_newhits_not_below_threshold' => [], '_oldhits_below_threshold' => [] }, 'Bio::Search::Iteration::GenericIteration' ) ], '_hit_factory' => $VAR1->{'_iterations'}[0]{'_hit_factory'}, '_statistics' => bless( { 'stats' => { 'S1' => '4', 'S1_bits' => '8.4', 'kappa_gapped' => '0.711', 'X3_bits' => '99.1', 'X1' => '4', 'lambda_gapped' => '1.37', 'X2' => '15', 'S2' => '4', 'seqs_better_than_cutoff' => '1', 'Hits_to_DB' => '5', 'num_extensions' => '2', 'num_successful_extensions' => '2', 'X1_bits' => '7.9', 'X3' => '50', 'dbentries' => '1', 'entropy_gapped' => '1.31', 'X2_bits' => '29.7', 'S2_bits' => '8.4' } }, 'Bio::Search::GenericStatistics' ), '_algorithm' => 'BLASTN', '_parameters' => bless( { 'params' => { 'gapext' => '2', 'matrix' => 'blastn matrix:1 -3', 'expect' => '5.0', 'allowgaps' => 'yes', 'gapopen' => '5' } }, 'Bio::Tools::Run::GenericParameters' ), '_root_verbose' => 0, '_queryname' => 'adapter_consensus' }, 'Bio::Search::Result::BlastResult' ); From David.Messina at sbc.su.se Wed Aug 18 18:32:37 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 19 Aug 2010 00:32:37 +0200 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq question In-Reply-To: <4C6C3259.4060304@bcm.tmc.edu> References: <4C6C3259.4060304@bcm.tmc.edu> Message-ID: Hi Caleb, 
The first thing I would do is take BioPerl out of the equation and test your local bl2seq on the command line. If you get the same output locally as on the web version, then there is a problem with BioPerl. If you're still seeing a discrepancy between the web and your local run, then this isn't a problem with BioPerl. Just to be clear, BioPerl doesn't "implement" any of the BLAST programs; it is simply a wrapper around the programs that you download from NCBI. That doesn't mean BioPerl isn't at fault, of course, just that it's important to isolate the problem carefully. The most common reasons for these discrepancies are: - different version numbers of BLAST: 2.2.21? 2.2.22? Is it the same on the web as locally? - similarly, different implementations of BLAST: NCBI's old BLAST suite is now deprecated and replaced with BLAST+. All of the online BLAST web queries are BLAST+ now - are you running BLAST+ locally? (there's also a separate BioPerl wrapper for BLAST+ called Bio::Tools::Run::BlastPlus) - hidden "default" parameters: Even though you're only changing a handful of parameters, the defaults (particularly on the web version) may be different than what you expect. In your case, it looks like on the web version, match score is 2 and mismatch is -3. However, in the local version I believe match score is 1 and a mismatch is -3. See this line in the params block near the end of your post: 'matrix' => 'blastn matrix:1 -3', Dave From sidd.basu at gmail.com Wed Aug 18 20:28:32 2010 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Wed, 18 Aug 2010 19:28:32 -0500 Subject: [Bioperl-l] Re: [RFC] Interolog::Walk In-Reply-To: <4C6BF4BD.5010200@sms.ed.ac.uk> References: <4C6BF4BD.5010200@sms.ed.ac.uk> Message-ID: <20100819002830.GA366@Macintosh-235.local> Hi, On Wed, 18 Aug 2010, Giuseppe Gallone wrote: > Hello BioPerl community - I've written a new module called Interolog::Walk > that I'm planning to put on CPAN.
I would be grateful if you might take a > look at the brief description I attached and tell me what you think. I'll > be more than happy to post further details should the module be of some > interest for someone. > > Also, I am not totally sure about having the correct name for it. This is > my first module and It would be great if you could advise on naming it > appropriately. Hopefully the following description will give an idea on > what it does. > > =================== > > > NAME > Interolog::Walk - Retrieve, score and visualize putative > Protein-Protein Interactions through the orthology-walk method > > DESCRIPTION > A common activity in computational biology is to mine protein-protein > interactions from publicly available databases in order to build > Protein-Protein Interaction (PPI) datasets. > In many instances, however, the number of experimentally obtained annotated > PPIs is very scarce and it would be helpful to enrich the experimental > dataset with high-quality, computationally-inferred PPIs. Such > computationally-obtained dataset can extend, support or enrich experimental > PPI datasets, and are of crucial importance in high-throughput gene > prioritization studies, i.e. to drive hypotheses and restrict the > dimensionality of many gene functional discovery problems. > This Perl Module, Interolog::Walk, is aimed at building putative PPI > datasets on the basis of a number of comparative biology paradigms: the > module implements a collection of computational biology algorithms based on > the concept of "orthology projection". If interacting proteins A and B in > organism X have orthologs A' and B' in organism Y, under certain conditions > one can assume that the interaction will be conserved in organism Y, i.e. > the A-B interaction can be "projected through the orthologies" to obtain a > putative A'-B' interaction. The pair of interactions (A-B) and (A'-B') are > named "Interologs" (see for instance [1] and [2]). 
> > Interolog::Walk collects, analyses and collates gene orthology data > provided by the Ensembl Consortium (www.ensembl.org) as well as PPI data > provided by EBI Intact (http://www.ebi.ac.uk/intact/). It provides the user > with the possibility of rating the quality and reliability of the putative > interactions collected, by means of confidence scores, and optionally > outputs network representations of the datasets, compatible with the > biological network representation standard, Cytoscape. Sounds interesting. I am currently playing around with a perl based webapp for displaying interactome using cytoscapeweb. Depending how your design pans out, would be happy to use your module as a backend analysis layer. And on a related note, you might want to have a look at bioperl-network and if there is any overlap might be worth contributing. -siddhartha > > USAGE > In order to carry out an interolog walk we start with a set of gene > identifiers in one organism of interest. We query those ids against a > number of comparative biology databases to retrieve a list of orthologues > for each gene id of interest, in one or more species. > In the following step we rely on PPI databases to retrieve the list of > available interactors for the protein ids obtained. The output at this > stage consists of a list of interactors of the orthologues of the initial > gene set, plus several fields of ancillary data. > In the last step of the process we project the interactions - again using > orthology data - back to the original species of interest. The output of > the process is a list of PUTATIVE INTERACTORS of the initial gene set, plus > several fields of ancillary data. 
> > ==================== > > Given the scope and the focus of the project, I would imagine that viable > alternatives for the namespace might be > > Bio::Orthology::InterologWalk > Bio::InterologMap > > or maybe > Interolog::Map > Orthology::Map > Orthology::InterologMap > > There are no similar projects as far as I could see so I shouldn't run the > risk of overlapping namespaces. Still I would love to know your informed > opinion about it. > > best, > Giuseppe > > > > REFERENCES > [1] Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, > Vidal M, Gerstein M. Annotation transfer between genomes: protein-protein > interologs and protein-DNA regulogs. Genome Research 2004 > Jun;14(6):1107-18. > > [2]Wiles AM, Doderer M, Ruan J, Gu T-T, Ravi D, Blackman BA, Bishop AJR. > "Building and Analyzing Protein Interactome Networks by Cross-species > Comparisons." BMC Systems Biology 2010, 4:36 - PMID: 20353594 > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From dan.kortschak at adelaide.edu.au Wed Aug 18 22:15:03 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Thu, 19 Aug 2010 11:45:03 +0930 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query Message-ID: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> Hi Everyone, I'm wanting to set up a persistent data store for some of my work and am in the process of choosing parts for my system. From my brief look around I think I'd like to use BioSQL (next best choice being Chado - but BioPerl bindings in bioperl-db for BioSQL being the decider here), but have noticed comments some time back that bioperl-db and PostgreSQL 8.3 (my prefered engine - though MySQL is possible, but makes the whole system messier) don't play well together. 
What is the status of the casting expectation conflict between bioperl-db and Pg8.3? The scripts are run with safe data, so placeholders aren't strictly crucial (though speed may be an issue?) and `$dbh->{pg_server_prepare} = 0;' seems like it could be an option. Can anybody provide any advice on this issue? thanks Dan Kortschak From cjfields at illinois.edu Wed Aug 18 23:29:36 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 18 Aug 2010 22:29:36 -0500 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq question In-Reply-To: References: <4C6C3259.4060304@bcm.tmc.edu> Message-ID: <194D43EC-A44C-450A-B57B-EC379DBCB935@illinois.edu> Wouldn't surprise me too much if the parameters are not set the same; IIRC the main BLAST URL API and the online NCBI Web-BLAST have different default settings. chris On Aug 18, 2010, at 5:32 PM, Dave Messina wrote: > Hi Caleb, > > The first thing I would do is take BioPerl out of the equation and test your local bl2seq on the command line. If you get the same output locally as on the web version, then there is a problem with BioPerl. If you're still seeing a discrepancy between the web and your local run, then this isn't a problem with BioPerl. > > Just to be clear, BioPerl doesn't "implement" any of the BLAST programs; it is simply a wrapper around the programs that you download from NCBI. That doesn't mean BioPerl isn't at fault, of course, just that it's important to isolate the problem carefully. > > The most common reasons for these discrepancies are: > > - different version numbers of BLAST > > 2.2.21? 2.2.22? Is it the same on the web as locally? > > - similarly, different implementations of BLAST > > NCBI's old BLAST suite is now deprecated and replaced with BLAST+. All of the online BLAST web queries are Blast+ now ? are you running BLAST+ locally? 
(there's also a separate BioPerl wrapper for BLAST+ called Bio::Tools::Run::BlastPlus) > > - hidden "default" parameters > > Even though you're only changing a handful of parameters, the defaults (particularly on the web version) may be different than what you expect. > > In your case, it looks like on the web version, match score is 2 and mismatch is -3. However, in the local version I believe match score is 1 and a mismatch is -3. > > See this line in the params block near the end of your post: > > 'matrix' => 'blastn matrix:1 -3', > > > > Dave > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at drycafe.net Thu Aug 19 01:48:19 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 19 Aug 2010 01:48:19 -0400 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> Hi Dan, the casting isn't an issue anymore, I think. (And even if it were, there is actually a small script that brings back the casts that were built into 8.2.) Have you found an example where it still is? -hilmar On Aug 18, 2010, at 10:15 PM, Dan Kortschak wrote: > Hi Everyone, > > I'm wanting to set up a persistent data store for some of my work > and am > in the process of choosing parts for my system. From my brief look > around I think I'd like to use BioSQL (next best choice being Chado - > but BioPerl bindings in bioperl-db for BioSQL being the decider here), > but have noticed comments some time back that bioperl-db and > PostgreSQL > 8.3 (my prefered engine - though MySQL is possible, but makes the > whole > system messier) don't play well together. > > What is the status of the casting expectation conflict between > bioperl-db and Pg8.3? 
The scripts are run with safe data, so > placeholders aren't strictly crucial (though speed may be an issue?) > and > `$dbh->{pg_server_prepare} = 0;' seems like it could be an option. > > Can anybody provide any advice on this issue? > > thanks > Dan Kortschak > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From dan.kortschak at adelaide.edu.au Thu Aug 19 01:54:03 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Thu, 19 Aug 2010 15:24:03 +0930 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> Message-ID: <1282197243.14127.27.camel@zoidberg.mbs.adelaide.edu.au> Hi Hilmar, No, I haven't found any problems, just hoping to avoid them by prior research. thanks Dan On Thu, 2010-08-19 at 01:48 -0400, Hilmar Lapp wrote: > Hi Dan, > > the casting isn't an issue anymore, I think. (And even if it were, > there is actually a small script that brings back the casts that > were > built into 8.2.) Have you found an example where it still is? > > -hilmar From biopython at maubp.freeserve.co.uk Thu Aug 19 06:01:03 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 19 Aug 2010 11:01:03 +0100 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> Message-ID: On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp wrote: > Hi Dan, > > the casting isn't an issue anymore, I think. 
(And even if it were, there is
> actually a small script that brings back the casts that were built into
> 8.2.) Have you found an example where it still is?
>
>         -hilmar

Hi Hilmar,

Do the bioperl-db bindings for BioSQL on PostgreSQL still require those
extra rules in the schema?
http://bugzilla.open-bio.org/show_bug.cgi?id=2839

Peter

From G.Gallone at sms.ed.ac.uk  Thu Aug 19 06:45:36 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Thu, 19 Aug 2010 11:45:36 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
References: <4C6BF4BD.5010200@sms.ed.ac.uk> <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
Message-ID: <4C6D0B50.4050902@sms.ed.ac.uk>

Hi Dave,

thank you very much for your helpful comments.

Regarding the module name: I will follow your advice and avoid proposing a new root during the module registration. As for the second level, I haven't been able to find anything related to homology/orthology, therefore I'm not sure whether I should go for

Bio::Orthology::InterologMap
or
Bio::Homology::InterologMap

The first one being maybe a bit more specific. I might also expand further as in Bio::Orthology::Interolog::Map, just in case somebody else finds other interesting applications for the Interolog concept and would like to "plug in" their own contribution. Would this make any sense?

I also appreciate your comments on the documentation. The one I provided is actually not the full pod I was planning to include, but rather an extract. What I have at the moment is a description, for each method, in the following form:

=====================================
remove_duplicate_rows
 Usage     : $RC = InterologMap::remove_duplicate_rows(input_handle  => $dbh,
                                                       output_handle => $out_data,
                                                       header        => 'standard',
                                                       );
 Purpose   : This is used to clean up a TSV data file of duplicate entries.
             Occasionally, Intact can return duplicate entries. This routine
             will make sure no such duplicates are kept.
             A new datafile is built. The number of unique data rows is updated.
 Returns   : success/error
 Argument  : database handle to input file, filehandle to output file, header type.
             Header type is one of the following:
             - "standard": when the routine is used to clean up an interolog
               walk file (the header will be longer)
             - "direct": when the routine is used to clean up a file of real
               db interactions (the header is shorter)
             - no field provided: default is standard
 Throws    : -
 Comment   : Sample
 See Also  :
=======================================

On top of that, there is a DESCRIPTION, USAGE, and SYNOPSIS. The synopsis has some code with an example of typical usage of the module. Please take a look at this (attached below) and tell me what you think.

You mention that the description contains a lot of background information. Would you recommend reducing it, or placing it elsewhere? I was considering writing a little tutorial in LaTeX as soon as possible anyway, to provide a "centralised" source of information for getting familiar with the module. Does this respect the CPAN regulations?

As for your question on the structure of the module: you are indeed right, the idea when running the "orthology walk" is to create a pipeline of subroutines: there's a core set of subroutines meant to be called strictly in sequence. Each of these subroutines expects, as input, the output of the previous one. The input/output dataset is currently in the form of a TSV text file, which I process with the help of the DBI module (to be more specific, I use DBD::CSV).
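
To make the duplicate clean-up step described above concrete, here is a minimal sketch in plain Perl. It is not the module's remove_duplicate_rows(): the helper name unique_tsv_rows, the column names and the toy rows are assumptions made for illustration, and it tracks rows in a hash instead of going through DBI and DBD::CSV.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module under discussion):
# copy the header line, then keep only the first copy of each data row.
sub unique_tsv_rows {
    my ($in_fh, $out_fh) = @_;
    my $header = <$in_fh>;
    return 0 unless defined $header;
    print {$out_fh} $header;

    my %seen;
    my $kept = 0;
    while (my $row = <$in_fh>) {
        chomp $row;
        next if $row eq '';       # skip blank lines
        next if $seen{$row}++;    # this exact row was already written
        print {$out_fh} "$row\n";
        $kept++;
    }
    return $kept;                 # number of unique data rows
}

# Toy input: one header plus three rows, one of them repeated.
my @lines = (
    "id_a\tid_b\tsource",
    "geneA\tgeneB\ttoy_db",
    "geneC\tgeneD\ttoy_db",
    "geneA\tgeneB\ttoy_db",
);
my $input  = join("\n", @lines) . "\n";
my $output = '';

# In-memory filehandles keep the example self-contained.
open my $in,  '<', \$input  or die "cannot read input: $!";
open my $out, '>', \$output or die "cannot write output: $!";
my $unique = unique_tsv_rows($in, $out);
close $out;
close $in;

print "$unique unique rows\n";
```

Comparing whole rows only catches exact duplicates; if two rows describe the same interaction with different ancillary fields, a key built from the identifier columns would be needed instead.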
While there's a certain flexibility regarding how to use the module, one core idea remains: in order to get the set of putative interactors, the user would have to call at least three basic routines:

(A)
=================
1) get_forward_orthologies(): this queries the initial gene list against one or more Ensembl dbs (using the Ensembl Perl API) and retrieves their orthologues, plus a number of ancillary data fields (mainly conservation data, e.g. dn/ds ratio, distance from ancestor, orthology type, etc.)

2) get_interactions(): this queries the orthology list built in the previous stage against a PSICQUIC-enabled PPI db using REST (at the moment I only query the EBI Intact DB, but it should be easy to expand this and query all PSICQUIC-compatible PPI dbs transparently). This step will "fatten" the dataset built in (1) with the interactors of those orthologues, plus ancillary data (including lots of parameters describing the quality, nature and origin of the annotated interaction)

3) get_backward_orthologies(): this queries the interactor list built in the previous stage against one or more Ensembl dbs to find orthologues *back* in the original species. It also adds supplementary information just like in (1).
==================

At the end of this procedure the user will have a TSV file where each row contains a binary putative interaction plus (currently) 37 supplementary data fields. One can then scan these results to check for duplicates, to compute counts, and to see if we have discovered new gene ids that were not present in the original dataset (hopefully we have :) ).
Most importantly, one can then further process these results to do one or more of the following:

(B) compute a global confidence score to assess the reliability of each binary putative interaction

(C) extract the binary putative PPIs from the dataset and save them in a format compatible with Cytoscape: this gives the result a visual form, and one can then apply network analysis tools to discover motifs, clusters, etc. The format I use is currently .SIF + attributes, as detailed in
http://cytoscape.wodaklab.org/wiki/Cytoscape_User_Manual/Network_Formats

(D) given the same initial gene list, one can also build a dataset of REAL, experimentally-obtained PPIs (without mapping through orthologies in other species). One can then compare this dataset with the putative dataset to see if/where the two overlap, what the intersection or the differences are, etc.

In order to suggest ways of using the module I have written 4 sample scripts and I will include them in the module. Each script utilises the module and uses/reuses subroutines in a pipeline fashion, and does the following:

1) doInterologWalk.pl: runs the basic pipeline in (A)
2) doScores.pl: computes and adds confidence scores as explained in (B)
3) doNetworks.pl: computes SIF network + attributes as in (C)
4) getRealInteractions.pl: runs a pipeline to obtain real PPIs from the initial gene set, as in (D).

Hope I didn't make this too confusing. I would love to hear back from you and from anybody else that would like to provide feedback.

Cheers
Giuseppe

On 18/08/10 17:52, Dave Messina wrote:
> Hi Giuseppe,
>
> Sounds really interesting, thanks for posting this.
>
>> Bio::Orthology::InterologWalk
>
> I vote for this name, or in any case something with Bio:: as the top-level namespace since it's a biology-related package.
>
> I like that you're providing a lot of background and information about the project in the documentation.
However, the USAGE section should give information about how to use the module, with example code. You can look at other modules on CPAN (or in BioPerl) to see the conventions for writing documentation.
>
> Also, from what you wrote, it sounds like this might be a pipeline or a script rather than a module per se, or perhaps a script and a set of modules. It would be helpful to clarify in your documentation (if you haven't already) how exactly things are organized (and of course example code will help with that, too).
>
>
> Hope that's helpful, and let us know when you've got it up on CPAN so we can try it out!
>
>
> Dave
>
>

NAME
    Interolog::Walk - Retrieve, score and visualize putative
    Protein-Protein Interactions through the orthology-walk method

SYNOPSIS
      use Interolog::Walk;

    First, obtain Intact Interactions for the dataset (see example in
    "getDirectInteractions.pl"):

      #get a registry from Ensembl
      my $registry = InterologMap::setup_ensembl_adaptor(connect_to_db  => $ensembl_db,
                                                         source_species => $sourceorg,
                                                         verbose        => 1
                                                         );

      #query actual interactions
      $RC = InterologMap::Direct::get_direct_interactions(registry       => $registry,
                                                          source_species => $sourceorg,
                                                          input_path     => $in_path,
                                                          output_path    => $out_path,
                                                          url            => $url,
                                                          );

    do some postprocessing (see "do_counts()" and "extract_unseen_ids()")
    and then do the actual interolog walk on the dataset with the following
    sequence of three methods.
    get orthologues of starting set:

      $RC = InterologMap::get_forward_orthologies(registry    => $registry,
                                                  ensembl_db  => $ensembl_db,
                                                  input_path  => $in_path,
                                                  output_path => $out_path,
                                                  source_org  => $sourceorg,
                                                  dest_org    => $destorg,
                                                  );

    add interactors of orthologues found by "get_forward_orthologies()":

      $RC = InterologMap::get_interactions(input_path  => $in_path,
                                           output_path => $out_path,
                                           url         => $url,
                                           url_global  => $url_global,
                                           );

    add orthologues of interactors found by "get_interactions()":

      $RC = InterologMap::get_backward_orthologies(registry    => $registry,
                                                   ensembl_db  => $ensembl_db,
                                                   input_path  => $in_path,
                                                   output_path => $out_path,
                                                   error_path  => $err_path,
                                                   source_org  => $sourceorg,
                                                   );

    do some postprocessing (see "remove_duplicate_rows()", "do_counts()",
    "extract_unseen_ids()") and then optionally compute a composite score
    for the putative interactions obtained:

      $RC = InterologMap::Scores::compute_scores(input_path       => $in_path,
                                                 score_path       => $score_path,
                                                 output_path      => $out_path,
                                                 term_graph       => $onto_graph,
                                                 M_IT_SCORE       => $M_IT,
                                                 M_DM_SCORE       => $M_DM,
                                                 M_ME_DM_SCORE    => $M_MDM,
                                                 M_ME_TAXA_SCORE  => $M_MTAXA
                                                 );

    get some networks and network attributes which you can then visualise
    with cytoscape

      $RC = InterologMap::Networks::do_network(registry       => $registry,
                                               db             => $ensembl_db,
                                               input_path     => $in_path,
                                               output_path    => $out_path,
                                               source_org     => $sourceorg,
                                               orthology_type => $orthtype,
                                               );

      $RC = InterologMap::Networks::do_attributes(registry    => $registry,
                                                  input_path  => $in_path,
                                                  output_path => $out_path,
                                                  source_org  => $sourceorg,
                                                  label_type  => 'external name'
                                                  );

    *The synopsis above only lists the major methods and parameters.*

DESCRIPTION
    A common activity in computational biology is to mine protein-protein
    interactions from publicly available databases to build
    *Protein-Protein Interaction* (PPI) datasets.
In many instances, however, the number of experimentally obtained annotated PPIs is very scarce and it would be helpful to enrich the experimental dataset with high-quality, computationally-inferred PPIs. Such computationally-obtained dataset can extend, support or enrich experimental PPI datasets, and are of crucial importance in high-throughput gene prioritization studies, i.e. to drive hypotheses and restrict the dimensionality of functional discovery problems. This Perl Module, Interolog::Walk, is aimed at building putative PPI datasets on the basis of a number of comparative biology paradigms: the module implements a collection of computational biology algorithms based on the concept of "orthology projection". If interacting proteins A and B in organism X have orthologs A' and B' in organism Y, under certain conditions one can assume that the interaction will be conserved in organism Y, i.e. the A-B interaction can be "projected through the orthologies" to obtain a putative A'-B' interaction. The pair of interactions (A-B) and (A'-B') are named "Interologs". Interolog::Walk collects, analyses and collates gene orthology data provided by the Ensembl Consortium as well as PPI data provided by EBI Intact. It provides the user with the possibility of rating the quality and reliability of the putative interactions collected, by means of confidence scores, and optionally outputs network representations of the datasets, compatible with the biological network representation standard, Cytoscape. BASIC USAGE Rationale behind "Interolog::Walk". \EBI Intact API/ .--------------. | .-------------. (2) | A(e.g. mouse)|<------------------------>| B(mouse) | (3) `--------------' `-------------' ^ | /Ensembl\ | | \ Ensembl / / Compara \ | | \Compara/ / Api \ | | \ Api / | | .--------------. .-------------. (1) | A'(e.g. fly) |. . . . . . . . . . . . . 
| B'(fly) | (4) `--------------' [SCORED]PUTATIVE PPI `-------------' (Output of Interolog::Walk) In order to carry out an interolog walk we start with a set of gene identifiers in one organism of interest (1). We query those ids against a number of comparative biology databases to retrieve a list of orthologues for the gene id of interest, in one or more species (2). In the next step we rely instead on PPI databases to retrieve the list of available interactors for the protein ids obtained in (2). The output at this stage consists of a list of interactors of the orthologues of the initial gene set, plus several fields of ancillary data (whose importance will be explained later) (3). In the last step of this process we will need to project the interactions in (3) - again using orthology data - back to the original species of interest. The output of the process is a list of PUTATIVE INTERACTORS of the initial gene set, plus several fields of ancillary data. "Interolog::Walk" provides three main functions to carry out the basic walk, "get_forward_orthologies()", "get_interactions()" and "get_backward_orthologies()". These functions must be called strictly sequentially in your script, as the process, analyse and attach data to the output in a pipeline-like fashion, i.e. processing the output of the preceding function. get_forward_orthologies get_interactions get_backward_orthologies SCORING THE PUTATIVE INTERACTIONS BUILDING PUTATIVE INTERACTION NETWORKS BUGS Please report any you find SUPPORT TODO AUTHOR Giuseppe Gallone CPAN ID: GGALLONE University of Edinburgh COPYRIGHT The Interolog::Walk module is Copyright (c) 2010 Giuseppe Gallone All rights reserved. You may distribute under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl 5.10.0 README file. SEE ALSO -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. 
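
The three-step walk described in this message (forward orthologies, interactors, backward orthologies) can be sketched on toy data as follows. This is only an illustration of the projection idea under stated assumptions: the gene names, the two lookup tables and the function putative_interactions are hypothetical, nothing is fetched from Ensembl or Intact, and no confidence scores or ancillary fields are produced.

```perl
use strict;
use warnings;

# Toy data only: none of these identifiers come from Ensembl or Intact.
my %orthologs_of = (              # species X gene => orthologues in species Y
    xA => ['yA'],
    xB => ['yB'],
    xC => ['yC1', 'yC2'],
);
my @known_ppi_in_y = ( ['yA', 'yB'], ['yB', 'yC2'] );   # known interactions in Y

# Lookup for step 2: interactors of each Y gene (interactions are undirected).
my %interactors_of;
for my $pair (@known_ppi_in_y) {
    my ($p, $q) = @$pair;
    push @{ $interactors_of{$p} }, $q;
    push @{ $interactors_of{$q} }, $p;
}

# Lookup for step 3: invert the orthology map to walk back from Y to X.
my %back_orthologs_of;
for my $x (keys %orthologs_of) {
    for my $y (@{ $orthologs_of{$x} }) {
        push @{ $back_orthologs_of{$y} }, $x;
    }
}

# Hypothetical function: returns putative X-X interactions as "gene1-gene2".
sub putative_interactions {
    my @genes = @_;
    my %found;
    for my $gene (@genes) {
        for my $orth (@{ $orthologs_of{$gene} || [] }) {                  # (1) forward
            for my $partner (@{ $interactors_of{$orth} || [] }) {         # (2) interactors
                for my $back (@{ $back_orthologs_of{$partner} || [] }) {  # (3) backward
                    my $key = join('-', sort($gene, $back));
                    $found{$key} = 1;   # hash keys collapse duplicates
                }
            }
        }
    }
    return sort keys %found;
}

my @putative = putative_interactions('xA', 'xC');
print "$_\n" for @putative;
```

With this toy input the walk yields xA-xB and xB-xC. Note that xB appears in the result although it was not in the starting set, which is the "unseen ids" effect mentioned earlier in the message.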
From G.Gallone at sms.ed.ac.uk  Thu Aug 19 08:42:28 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Thu, 19 Aug 2010 13:42:28 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <20100819002830.GA366@Macintosh-235.local>
References: <4C6BF4BD.5010200@sms.ed.ac.uk> <20100819002830.GA366@Macintosh-235.local>
Message-ID: <4C6D26B4.5090702@sms.ed.ac.uk>

Dear Siddhartha,

glad to hear this might be helpful.

As for the bioperl-network package, thank you for mentioning it. I had a quick look at its documentation and it looks like a much deeper and more complex effort than what I have in my package. I've actually been using the Graph package, on which it seems to be based, a lot and found it very helpful.
I'm not sure if the network routines in my module overlap with it though: all I do in my package is parse the dataset, extracting only what is needed to build a Cytoscape SIF file and optionally some Cytoscape NOA attribute files, as required by the Cytoscape specification in

http://cytoscape.wodaklab.org/wiki/Cytoscape_User_Manual/Network_Formats

Instead, it looks like bioperl-network actually builds some kind of internal representation of the network for further manipulation in Perl, if I understand it correctly?

Kind regards
Giuseppe

On 19/08/10 01:28, Siddhartha Basu wrote:

> Sounds interesting. I am currently playing around with a perl based webapp for displaying interactome
> using cytoscapeweb. Depending how your design pans out, would be happy to
> use your module as a backend analysis layer. And on a related note, you
> might want to have a look at bioperl-network and if there is any overlap
> might be worth contributing.
>
> -siddhartha
>

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
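
For readers unfamiliar with the SIF format mentioned above, the following is a minimal sketch of turning binary interactions into SIF lines (source node, interaction type, target node). The helper pairs_to_sif and the gene names are hypothetical and are not the module's network routines; 'pp' is used here as the interaction type on the assumption that the edges are protein-protein interactions.

```perl
use strict;
use warnings;

# Hypothetical helper: one tab-separated SIF line per distinct interaction.
sub pairs_to_sif {
    my ($pairs, $edge_type) = @_;
    $edge_type = 'pp' unless defined $edge_type;   # assumed default edge label
    my %seen;
    my @lines;
    for my $pair (@$pairs) {
        my ($first, $second) = sort @$pair;        # treat A-B and B-A as one edge
        next if $seen{"$first\t$second"}++;
        push @lines, join("\t", $first, $edge_type, $second);
    }
    return @lines;
}

# Toy input: the first two pairs describe the same undirected interaction.
my @sif = pairs_to_sif([
    ['geneB', 'geneA'],
    ['geneA', 'geneB'],
    ['geneA', 'geneC'],
]);
print "$_\n" for @sif;
```

Sorting each pair before writing it is a design choice for undirected interactions; for directed edge types the original order would have to be kept.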
From xupeng86 at gmail.com  Thu Aug 19 04:02:48 2010
From: xupeng86 at gmail.com (xupeng)
Date: Thu, 19 Aug 2010 16:02:48 +0800
Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database?
Message-ID: <201008191602.49068.xupeng86@gmail.com>

I've downloaded biosql-1.0.1.tar.gz. It works well. But I can't find 'load_seqdatabase.pl' when I try to import GenBank files into the BioSQL database.
Can anyone give me a copy of that file?
Many thanks!

From sunhanifk at gmail.com  Thu Aug 19 10:25:38 2010
From: sunhanifk at gmail.com (han sun)
Date: Thu, 19 Aug 2010 22:25:38 +0800
Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl 5.12.1?
Message-ID:

Hello everyone,

I have used Perl for several months, and I now want to feel the power of
BioPerl.
But it seems that installing it is more difficult than I thought.

I typed the commands:

install-shell
rep add bioperl http://bioperl.org/DIST
rep add uwinnipeg http://cpan.uwinnipeg.ca/PPMPackages/12xx/
rep add trouchelle http://trouchelle.com/ppm12/
install BioPerl

However, the installation failed:

ppm install failed:
Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
Can't find any package that provides PostScript::TextBlock for Bundle-BioPerl-Core
Can't find any package that provides Ace:: for Bundle-BioPerl-Core
Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
Can't find any package that provides Convert::Binary::C for Bundle-BioPerl-Core
Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
Can't find any package that provides IPC::Run for GraphViz
Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
Can't find any package that provides List-MoreUtils for Moose
Can't find any package that provides List-MoreUtils for Class-MOP

then I tried

install http://www.bribes.org/perl/ppm/GD.ppd

and tried the
installation again, but it still didn't help.

Do you know what's wrong?

Please help me, thanks very much.

From cjfields1 at gmail.com  Thu Aug 19 10:33:26 2010
From: cjfields1 at gmail.com (Christopher Fields)
Date: Thu, 19 Aug 2010 09:33:26 -0500
Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl 5.12.1?
In-Reply-To:
References:
Message-ID: <78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com>

Try using ActivePerl 5.10 instead of v5.12. It's very possible the PPM won't work for v5.12 yet.

chris

On Aug 19, 2010, at 9:25 AM, han sun wrote:

> Hello everyone,
>
> I have used perl for several months,and I now want to feel the power of
> bioperl.
> But it seems that the installing is more difficult than I thought.
>
> I typed the commands.
>
> install-shell
>
> rep add bioperl http://bioperl.org/DIST
>
> rep add uwinnipeg
> http://cpan.uwinnipeg.ca/PPMPackages/12xx/
>
> rep add trouchelle http://trouchelle.com/ppm12/
>
> install BioPerl
>
> However,the installing failed,
>
> ppm install failed:
> Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
> Can't find any package that provides PostScript::TextBlock for
> Bundle-BioPerl-Core
> Can't find any package that provides Ace:: for Bundle-BioPerl-Core
> Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
> Can't find any package that provides Convert::Binary::C for
> Bundle-BioPerl-Core
> Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
> Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
> Can't find any package that provides IPC::Run for GraphViz
> Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
> Can't find any package that provides List-MoreUtils for Moose
> Can't find any package that provides List-MoreUtils for Class-MOP
>
>
> then I tried
>
> install http://www.bribes.org/perl/ppm/GD.ppd
>
> and tried the installation again,but it still
didn't help. > > * > * > * > * > * > * > > > *Do you konw what's wrong with the problem?* > * > * > * > * > *Please help me,thanks very much.* > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at drycafe.net Thu Aug 19 10:53:22 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 19 Aug 2010 10:53:22 -0400 Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database? In-Reply-To: <201008191602.49068.xupeng86@gmail.com> References: <201008191602.49068.xupeng86@gmail.com> Message-ID: <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net> The file comes with Bioperl-db, not BioSQL. That is so because it depends on BioPerl and on Bioperl-db, and so you will need to have both installed. -hilmar On Aug 19, 2010, at 4:02 AM, xupeng wrote: > I've downloaded the biosql-1.0.1.tar.gz. It works well. But I > can't find the 'load_seqdatabase.pl' when I try to import the > Genbank files into biosql databsase. > Can anyone give me a copy of that file? > many thanks ! > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From hlapp at drycafe.net Thu Aug 19 10:58:46 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 19 Aug 2010 10:58:46 -0400 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> Message-ID: <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> Yes, unfortunately they do. 
The feature for obviating them (namely nested transactions) is there in Pg 8.2+, but Bioperl-db doesn't use them yet ... I have to learn more about Class::DBIx first to decide whether it's better to first implement nested transactions in the home- grown ORM that Bioperl-db in essence is, or whether it's better to reimplement everything in Class::DBIx instead. There are new datatypes in Bioperl, and relations in BioSQL that could hold them, and so I need to decide what's the way forward. -hilmar On Aug 19, 2010, at 6:01 AM, Peter wrote: > On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp > wrote: >> Hi Dan, >> >> the casting isn't an issue anymore, I think. (And even if it were, >> there is >> actually a small script that brings back the casts that were built >> into >> 8.2.) Have you found an example where it still is? >> >> -hilmar > > Hi Hilmar, > > Do the bioperl-db bindings for BioSQL on PostgreSQL still require > those > extra rules in the schema? > http://bugzilla.open-bio.org/show_bug.cgi?id=2839 > > Peter -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From mmuratet at hudsonalpha.org Thu Aug 19 11:00:52 2010 From: mmuratet at hudsonalpha.org (Michael Muratet) Date: Thu, 19 Aug 2010 10:00:52 -0500 Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database? In-Reply-To: <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net> References: <201008191602.49068.xupeng86@gmail.com> <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net> Message-ID: On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote: > The file comes with Bioperl-db, not BioSQL. That is so because it > depends on BioPerl and on Bioperl-db, and so you will need to have > both installed. Is load_seqdatabase.pl still the best method? I vaguely remember a post that said that load_seqdatabase was deprecated, but I can't find it in the archives. 
Mike > > -hilmar > > On Aug 19, 2010, at 4:02 AM, xupeng wrote: > >> I've downloaded the biosql-1.0.1.tar.gz. It works well. But I >> can't find the 'load_seqdatabase.pl' when I try to import the >> Genbank files into biosql databsase. >> Can anyone give me a copy of that file? >> many thanks ! >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Michael Muratet, Ph.D. Senior Scientist HudsonAlpha Institute for Biotechnology mmuratet at hudsonalpha.org (256) 327-0473 (p) (256) 327-0966 (f) Room 4005 601 Genome Way Huntsville, Alabama 35806 From hlapp at drycafe.net Thu Aug 19 11:29:31 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 19 Aug 2010 11:29:31 -0400 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> Message-ID: <5F77404A-086D-4D0C-B3A5-F5119FCF878A@drycafe.net> On Aug 19, 2010, at 11:09 AM, Chris Fields wrote: > DBIx::Class Did I have this in the wrong order :-) More coffee, please. 
-- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From hlapp at drycafe.net Thu Aug 19 11:30:26 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 19 Aug 2010 11:30:26 -0400 Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database? In-Reply-To: References: <201008191602.49068.xupeng86@gmail.com> <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net> Message-ID: It's not deprecated. Unless I'm again mixing up something? -hilmar On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote: > > On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote: > >> The file comes with Bioperl-db, not BioSQL. That is so because it >> depends on BioPerl and on Bioperl-db, and so you will need to have >> both installed. > > Is load_seqdatabase.pl still the best method? I vaguely remember a > post that said that load_seqdatabase was deprecated, but I can't > find it in the archives. > > Mike > >> >> -hilmar >> >> On Aug 19, 2010, at 4:02 AM, xupeng wrote: >> >>> I've downloaded the biosql-1.0.1.tar.gz. It works well. But I >>> can't find the 'load_seqdatabase.pl' when I try to import the >>> Genbank files into biosql databsase. >>> Can anyone give me a copy of that file? >>> many thanks ! >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >> =========================================================== >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Michael Muratet, Ph.D. 
> Senior Scientist > HudsonAlpha Institute for Biotechnology > mmuratet at hudsonalpha.org > (256) 327-0473 (p) > (256) 327-0966 (f) > > Room 4005 > 601 Genome Way > Huntsville, Alabama 35806 > > > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From cjfields at illinois.edu Thu Aug 19 11:09:13 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Aug 2010 10:09:13 -0500 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> Message-ID: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> I think it's worth exploring having a DBIx::Class-based middle-ware approach similar to what Rob Buels has done for Chado. That would be fairly easy to get started using DBIx::Class::Schema::Loader. After that it would require optimization and tweaking, which is potentially more complex than Rob's setup as Chado is very Pg-specific, but maybe Rob can elaborate... chris On Aug 19, 2010, at 9:58 AM, Hilmar Lapp wrote: > Yes, unfortunately they do. The feature for obviating them (namely nested transactions) is there in Pg 8.2+, but Bioperl-db doesn't use them yet ... I have to learn more about Class::DBIx first to decide whether it's better to first implement nested transactions in the home-grown ORM that Bioperl-db in essence is, or whether it's better to reimplement everything in Class::DBIx instead. > > There are new datatypes in Bioperl, and relations in BioSQL that could hold them, and so I need to decide what's the way forward. 
> > -hilmar > > On Aug 19, 2010, at 6:01 AM, Peter wrote: > >> On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp wrote: >>> Hi Dan, >>> >>> the casting isn't an issue anymore, I think. (And even if it were, there is >>> actually a small script that brings back the casts that were built into >>> 8.2.) Have you found an example where it still is? >>> >>> -hilmar >> >> Hi Hilmar, >> >> Do the bioperl-db bindings for BioSQL on PostgreSQL still require those >> extra rules in the schema? >> http://bugzilla.open-bio.org/show_bug.cgi?id=2839 >> >> Peter > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 19 11:37:39 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Aug 2010 10:37:39 -0500 Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database? In-Reply-To: References: <201008191602.49068.xupeng86@gmail.com> <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net> Message-ID: <68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu> I don't recall this either. So, can't blame it on lack of coffee :) chris On Aug 19, 2010, at 10:30 AM, Hilmar Lapp wrote: > It's not deprecated. Unless I'm again mixing up something? > > -hilmar > > On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote: > >> >> On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote: >> >>> The file comes with Bioperl-db, not BioSQL. That is so because it depends on BioPerl and on Bioperl-db, and so you will need to have both installed. >> >> Is load_seqdatabase.pl still the best method? I vaguely remember a post that said that load_seqdatabase was deprecated, but I can't find it in the archives. 
>> >> Mike >> >>> >>> -hilmar >>> >>> On Aug 19, 2010, at 4:02 AM, xupeng wrote: >>> >>>> I've downloaded the biosql-1.0.1.tar.gz. It works well. But I >>>> can't find the 'load_seqdatabase.pl' when I try to import the >>>> Genbank files into biosql databsase. >>>> Can anyone give me a copy of that file? >>>> many thanks ! >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >>> =========================================================== >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Michael Muratet, Ph.D. >> Senior Scientist >> HudsonAlpha Institute for Biotechnology >> mmuratet at hudsonalpha.org >> (256) 327-0473 (p) >> (256) 327-0966 (f) >> >> Room 4005 >> 601 Genome Way >> Huntsville, Alabama 35806 >> >> >> >> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From mmuratet at hudsonalpha.org Thu Aug 19 11:40:02 2010 From: mmuratet at hudsonalpha.org (Michael Muratet) Date: Thu, 19 Aug 2010 10:40:02 -0500 Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database? 
In-Reply-To: <68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu> References: <201008191602.49068.xupeng86@gmail.com> <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net> <68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu> Message-ID: On Aug 19, 2010, at 10:37 AM, Chris Fields wrote: > I don't recall this either. So, can't blame it on lack of coffee :) Thanks. I'll keep using it! Mike > > chris > > On Aug 19, 2010, at 10:30 AM, Hilmar Lapp wrote: > >> It's not deprecated. Unless I'm again mixing up something? >> >> -hilmar >> >> On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote: >> >>> >>> On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote: >>> >>>> The file comes with Bioperl-db, not BioSQL. That is so because it >>>> depends on BioPerl and on Bioperl-db, and so you will need to >>>> have both installed. >>> >>> Is load_seqdatabase.pl still the best method? I vaguely remember a >>> post that said that load_seqdatabase was deprecated, but I can't >>> find it in the archives. >>> >>> Mike >>> >>>> >>>> -hilmar >>>> >>>> On Aug 19, 2010, at 4:02 AM, xupeng wrote: >>>> >>>>> I've downloaded the biosql-1.0.1.tar.gz. It works well. But I >>>>> can't find the 'load_seqdatabase.pl' when I try to import the >>>>> Genbank files into biosql databsase. >>>>> Can anyone give me a copy of that file? >>>>> many thanks ! >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> -- >>>> =========================================================== >>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >>>> =========================================================== >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> Michael Muratet, Ph.D. 
>>> Senior Scientist >>> HudsonAlpha Institute for Biotechnology >>> mmuratet at hudsonalpha.org >>> (256) 327-0473 (p) >>> (256) 327-0966 (f) >>> >>> Room 4005 >>> 601 Genome Way >>> Huntsville, Alabama 35806 >>> >>> >>> >>> >>> >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >> =========================================================== >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > Michael Muratet, Ph.D. Senior Scientist HudsonAlpha Institute for Biotechnology mmuratet at hudsonalpha.org (256) 327-0473 (p) (256) 327-0966 (f) Room 4005 601 Genome Way Huntsville, Alabama 35806 From cjfields at illinois.edu Thu Aug 19 11:55:54 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Aug 2010 10:55:54 -0500 Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping In-Reply-To: References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se> Message-ID: <5611499B-FA63-4A52-8279-99B554418374@illinois.edu> On Aug 17, 2010, at 8:52 AM, Dave Messina wrote: >> It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison > > Yep, agreed. > > And such a flag should be named for the non-default behavior, then, like: -ignore_IDs_for_overlaps > > Dave Probably would just be -ignore_ids as this behavior would have to be consistent across the various Bio::RangeI methods (overlaps, contains, etc). The params are case-insensitive IIRC, so the _IDs would just be lc(). RangeI doesn't define a seq_id(), though, so we either use can() in RangeI (which is dirtier IMO) or define this in the appropriate class, probably LocationI or SeqFeatureI. 
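[Archive note: a minimal sketch of the behaviour being discussed, not BioPerl's actual implementation. Ranges are plain hashrefs here, and both the helper name ranges_overlap and the ignore_ids option are hypothetical stand-ins for whatever the final -ignore_ids parameter would look like. The assumption is that the default compares seq_id first (the "genomic" case) and that the flag restores the coordinate-only comparison.]

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT part of BioPerl: each range is a hashref with
# seq_id, start and end keys (1-based, inclusive coordinates).
sub ranges_overlap {
    my ($r1, $r2, %opt) = @_;

    # Assumed default ("genomic") behaviour: ranges on different
    # sequences never overlap. ignore_ids => 1 skips this check.
    unless ($opt{ignore_ids}) {
        my ($id1, $id2) =
            map { defined $_->{seq_id} ? $_->{seq_id} : '' } ($r1, $r2);
        return 0 if $id1 ne $id2;
    }

    # Two closed intervals overlap when neither one ends before
    # the other starts.
    return ($r1->{start} <= $r2->{end} && $r2->{start} <= $r1->{end}) ? 1 : 0;
}

my $f1 = { seq_id => 'chr1', start => 100, end => 200 };
my $f2 = { seq_id => 'chr2', start => 150, end => 250 };
my $f3 = { seq_id => 'chr1', start => 150, end => 250 };

printf "chr1 vs chr2, default:    %d\n", ranges_overlap($f1, $f2);
printf "chr1 vs chr2, ignore_ids: %d\n", ranges_overlap($f1, $f2, ignore_ids => 1);
printf "chr1 vs chr1, default:    %d\n", ranges_overlap($f1, $f3);
```

With the sample features this prints 0, 1 and 1. Keeping the ID check in one place, ahead of the coordinate test, is what would let the same option behave consistently across overlaps, contains and the other positional comparisons.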
chris

From cjfields at illinois.edu Thu Aug 19 11:56:11 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 19 Aug 2010 10:56:11 -0500
Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping
In-Reply-To:
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
Message-ID: <7CF700A0-C7A0-4BD2-9757-50B693B3B614@illinois.edu>

Makes sense.

chris

On Aug 17, 2010, at 7:45 AM, Scott Cain wrote:

> Hi Dave and Chris,
>
> It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison and if somebody is doing the protein space comparison and not getting the expected results, they'll probably read the docs to find out why.
>
> Scott
>
> --
> Scott Cain, Ph. D.
> scott at scottcain dot net
> Ontario Institute for Cancer Research
> http://gmod.org/
> 216 392 3087
>
> Sent from my iPhone.
>
> On Aug 17, 2010, at 5:06 AM, Dave Messina wrote:
>
>>> Good point; it's probably the context the methods are used that matters. So, maybe just a document clarification?
>>
>> That's always good, but it really doesn't solve the issue you're describing.
>>
>> I mean, who would expect to get overlaps for features on different chromosomes?
>>
>> To me, that's a clear violation of reasonable user expectations. You shouldn't have to read the docs about something like that.
>>
>> So what's the solution for these duelling use cases? I haven't thought about it much, but a first approximation might be to add a -genomic boolean flag that, when true, would do the right thing and check the ID when doing overlaps or other positional comparisons.
>>
>> (Maybe -genomic is too obscure. Maybe it should be -same_id_for_overlaps or something like that.)
>>
>> And maybe having to know to set a flag is effectively the same thing as having to read the docs to understand SeqFeature's overlap behavior.
>> >> What do the rest of you out there think? >> >> >> Dave >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Thu Aug 19 12:54:23 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 19 Aug 2010 18:54:23 +0200 Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping References: <83299B71-0F73-440D-A9C5-DC1DA2AFF605@davemessina.com> Message-ID: <1EFB951F-AEE1-4B2A-9E29-114E40B25D21@sbc.su.se> [Ccing list for real this time] On Aug 19, 2010, at 17:55, Chris Fields wrote: > Probably would just be -ignore_ids You're right, that's the way to go. > define this in the appropriate class, probably LocationI or Yep, that's cleaner. Thanks! Dave From cjfields1 at gmail.com Thu Aug 19 13:20:32 2010 From: cjfields1 at gmail.com (Christopher Fields) Date: Thu, 19 Aug 2010 12:20:32 -0500 Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl 5.12.1? In-Reply-To: References: <78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com> Message-ID: <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com> cc'ing list. Looks like the BioPerl PPM is possibly broken for perl 5.12. Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling... chris On Aug 19, 2010, at 11:29 AM, han sun wrote: > v5.10 works,thanks. > > 2010/8/19 Christopher Fields > Try using ActivePerl 5.10 instead of v5.12. It's very possible the PPM won't work for v5.12 yet. > > chris > > On Aug 19, 2010, at 9:25 AM, han sun wrote: > > > Hello everyone, > > > > I have used perl for several months,and I now want to feel the power of > > bioperl. > > But it seems that the installing is more difficult than I thought. 
> >
> > I typed the commands.
> >
> > install-shell
> > rep add bioperl http://bioperl.org/DIST
> > rep add uwinnipeg http://cpan.uwinnipeg.ca/PPMPackages/12xx/
> > rep add trouchelle http://trouchelle.com/ppm12/
> > install BioPerl
> >
> > However, the installing failed,
> >
> > ppm install failed:
> > Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
> > Can't find any package that provides PostScript::TextBlock for Bundle-BioPerl-Core
> > Can't find any package that provides Ace:: for Bundle-BioPerl-Core
> > Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
> > Can't find any package that provides Convert::Binary::C for Bundle-BioPerl-Core
> > Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
> > Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
> > Can't find any package that provides IPC::Run for GraphViz
> > Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
> > Can't find any package that provides List-MoreUtils for Moose
> > Can't find any package that provides List-MoreUtils for Class-MOP
> >
> > then I tried
> >
> > install http://www.bribes.org/perl/ppm/GD.ppd
> >
> > and tried the installation again, but it still didn't help.
> >
> > *Do you know what's wrong with the problem?*
> > *Please help me, thanks very much.*
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From rmb32 at cornell.edu Thu Aug 19 13:09:45 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 19 Aug 2010 10:09:45 -0700
Subject: [Bioperl-l] reminder: Aug 25 deadline for GMOD Hackathon application
Message-ID: <4C6D6559.3080809@cornell.edu>

Hi all,

This is your one-week reminder: the deadline for open applications to the GMOD Evo hackathon is Wednesday, August 25th.

Rob

========================================

We are seeking participants for the GMOD Tools for Evolutionary Biology Hackathon, held November 8-12, 2010 at the US National Evolutionary Synthesis Center (NESCent) in Durham, NC. This hackathon targets three critical gaps in the capabilities of the GMOD toolbox that currently limit its utility for evolutionary research:

1. Visualization of comparative genomics data
2. Visualization of phylogenetic data and trees
3. Support for population diversity and phenotype data

If you are interested in these areas and have relevant expertise, you are strongly encouraged to apply. Relevant areas of expertise include more than just software development: if you are a GMOD power user, visualization guru, domain expert (comparative, phylogenetics, population, ...), or documentation wizard, then your skills are needed!

How To Apply: Fill out the online application form at http://bit.ly/gmodevohack. Applications are due August 25.

About GMOD: GMOD is an intercompatible suite of open-source software components for storing, managing, analyzing, and visualizing genome-scale data.
GMOD includes many widely-used software components: GBrowse and JBrowse, both genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a generic and modular database schema; CMap, a comparative map viewer; as well as many other components including Apollo, MAKER, BioMart, InterMine, and Galaxy. We hope to extend the functionality of existing GMOD components, and integrate new components as well. About Hackathons: A hackathon is an intense event at which a group of programmers with different backgrounds and skills collaborate hands-on and face-to-face to develop working code that is of utility to the community as a whole. The mix of people will include domain experts and computer-savvy end-users. More details about the event, its motivation, organization, procedures, and attendees, as well as URLs to the hackathon and related websites are included below. Sincerely, The GMOD EvoHack Organizing Committee (and project affiliations as relevant): Nicole Washington, Chair (LBNL, modENCODE, Phenote) Robert Buels (SGN, Chado NatDiv) Scott Cain (OICR, GMOD) Dave Clements (NESCent, GMOD) Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv) Sheldon McKay (University of Arizona, iPlant, GBrowse_syn) ----------------------------- About the GMOD Evo Hackathon Overview We are organizing a hackathon to fill critical gaps in the capabilities of the Generic Model Organism Database (GMOD) toolbox that currently limit its utility for evolutionary research. Specifically, we will focus on tools for 1) viewing comparative genomics data; 2) visualizing phylogenomic data; and 3) supporting population diversity data and phenotype annotation. The event will be hosted at NESCent and bring together a group of about 20+ software developers, end-user representatives, and documentation experts who would otherwise not meet. 
The participants will include key developers of GMOD components that currently lack features critical for emerging evolutionary biology research, developers of informatics tools in evolutionary research that lack GMOD integration, and informatics-savvy biologists who can represent end-user requirements. The event will provide a unique opportunity to infuse the GMOD developer community with a heightened awareness of unmet needs in evolutionary biology that GMOD components have the potential to fill, and for tool developers in evolutionary biology to better understand how best to extend or integrate with already existing GMOD components. Before the Event Discussion of ideas and sometimes even design actually starts well before the hackathon, on mailing lists, wiki pages, and conference calls set up among accepted attendees. This advance work lays the foundation for participants to be productive from the very first day. This also means that participants should be willing to contribute some time in advance of the hackathon itself to participate in this preparatory discussion. During the Event Typically, hackathon participants use the morning of the first day of the event to organize themselves into working groups of between 3 and 6 people, each with a focused implementation objective. Ideas and objectives are discussed, and attendees coalesce around the projects in which they have the most experience or interest. Deliverables / Event Results The meeting's attendance, working groups, and outcomes will be fully logged and documented on the GMOD wiki (http://gmod.org). Each working group during the event will typically have its own wiki page, linked from the main EvoHack page, where it documents its minutes and design notes, and provides links to the code and documentation it produces. 
Also, since GMOD and NESCent are both committed to open source principles, all code and documentation produced by participants during the hackathon must be published under an OSI-approved open source license. As contributions to existing GMOD tools, all hackathon products will most likely satisfy this requirement automatically. NESCent This event is sponsored by the US National Evolutionary Synthesis Center (NESCent, http://www.nescent.org) through its Informatics Whitepapers program (http://www.nescent.org/informatics/whitepapers.php). NESCent promotes the synthesis of information, concepts and knowledge to address significant, emerging, or novel questions in evolutionary science and its applications. NESCent achieves this by supporting research and education across disciplinary, institutional, geographic, and demographic boundaries (see http://www.nescent.org/science/proposals.php). Links Main GMOD EvoHack page, and full proposal: http://gmod.org/wiki/GMOD_Evo_Hackathon NESCent: http://www.nescent.org/ GMOD: http://gmod.org Similar past NESCent events, see: http://hackathon.nescent.org/ GMOD hackathon application: http://bit.ly/gmodevohack -- http://gmod.org/wiki/GMOD_News http://gmod.org/wiki/GMOD_Europe_2010 http://gmod.org/wiki/Help_Desk_Feedback From David.Messina at sbc.su.se Thu Aug 19 14:55:50 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 19 Aug 2010 20:55:50 +0200 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq question In-Reply-To: <4C6D7123.9080908@bcm.tmc.edu> References: <4C6C3259.4060304@bcm.tmc.edu> <4C6D7123.9080908@bcm.tmc.edu> Message-ID: <4E977318-05AC-4D8E-9A39-8C07A2419198@sbc.su.se> Glad I could help, Caleb. Dave On Aug 19, 2010, at 20:00, Caleb Davis wrote: > Hi Dave, > > Thank you so much for your detailed response! Fixing the reward parameter replicated the online result for me. All of the other factors you brought up will help me track down any future problems. Thanks again. 
> > --Caleb > From rmb32 at cornell.edu Thu Aug 19 18:19:11 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 19 Aug 2010 15:19:11 -0700 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> Message-ID: <4C6DADDF.1000103@cornell.edu> Chris Fields wrote: > I think it's worth exploring having a DBIx::Class-based middle-ware approach similar to what Rob Buels has done for Chado. That would be fairly easy to get started using DBIx::Class::Schema::Loader. > > After that it would require optimization and tweaking, which is potentially more complex than Rob's setup as Chado is very Pg-specific, but maybe Rob can elaborate... Elaborating on how Bio::Chado::Schema is developed: The vast majority of the code and POD in BCS is autogenerated by DBIx::Class::Schema::Loader. DBICSL gives you a baseline set of DBIx::Class classes that covers all the tables, views, columns, unique constraints, and foreign key relationships. Beyond that, you have to add on yourself. In BCS, we have mostly done things like: * make better-named aliases for some of the autogenerated relationships (though DBICSL does a surprisingly good job of naming relationships automatically most of the time) * add a tiny bit of bioperl compatibility (this needs a lot more work by somebody, volunteers needed!) * add convenience methods for using some of the Chado property tables * use DBIx::Class::Tree::NestedSet to add some powerful ways of traversing phylogenetic tree relationships Regarding DB backend specificity, BCS isn't Pg-specific at all, because DBIx::Class itself goes to great lengths to be compatible (and performant!) with just about every relational database out there. 
In fact, the BCS test suite deploys a Chado schema into a temporary SQLite database using DBIC::Schema's deploy() method, and runs all of its tests on that. Very handy. Chado's Pg-specific server-side functions can of course be called through BCS if they are present, but it's perfectly possible to use Chado without any of the server-side functions, and mostly the way I use it. Rob From David.Messina at sbc.su.se Fri Aug 20 05:19:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 20 Aug 2010 11:19:14 +0200 Subject: [Bioperl-l] Git for the lazy Message-ID: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se> Hi everyone, If you're like me and still getting up to speed with Git, you might find this helpful: http://www.spheredev.org/wiki/Git_for_the_lazy Dave From bgs500 at york.ac.uk Fri Aug 20 09:07:50 2010 From: bgs500 at york.ac.uk (Ben Saville) Date: Fri, 20 Aug 2010 14:07:50 +0100 Subject: [Bioperl-l] Problem Parsing BLAST output Message-ID: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk> Hi Everyone, I'm very much new to the world of sequence data analysis (and this mailing list!), and have reached a roadblock. I have BLASTed some contigs against a series of databases that I created. From this I would like to parse through the data and separate it before extracting the information of interest at a later point. I would like to separate the data by query ID. 
I found the following Bioperl script;

#!/usr/bin/perl

use Bio::Search::Result::BlastResult;
use Bio::SearchIO;

my $report = Bio::SearchIO->new( -file=>'All_BCM_results.bls', -format => blast);
my $result = $report->next_result;
my %hits_by_query;
while (my $hit = $result->next_hit) {
    push @{$hits_by_query{$hit->name}}, $hit;
}

foreach my $qid ( keys %hits_by_query ) {
    my $result = Bio::Search::Result::BlastResult->new();
    $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
    my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' );
    $blio->write_result($result);
}

running this script resulted in the following error;

BlastResult::new(): Not adding iterations.

------------- EXCEPTION: Bio::Root::NoSuchThing -------------
MSG: No such iteration number: 0. Valid range=1-0
VALUE: The number zero (0)
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Search::Result::BlastResult::iteration /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:328
STACK: Bio::Search::Result::BlastResult::add_hit /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:258
STACK: /Users/bsaville/Desktop/Parsing_BLAST_by_query.pl:15
-------------------------------------------------------------

So I added

my $result = Bio::Search::Result::BlastResult->new(1);

The 1 to the line shown above, as it told me this was within the valid range. This produced the following error;

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Must define arrayref of Iterations when initializing a Bio::Search::Result::BlastResult
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Search::Result::BlastResult::new /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:128
STACK: /Users/bsaville/Desktop/Parsing_BLAST_by_query.pl:14
-----------------------------------------------------------

I know that it is my inexperience that is causing this problem, but I really can't figure this out.
Regards

Ben Saville

From David.Messina at sbc.su.se Fri Aug 20 09:48:28 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 20 Aug 2010 15:48:28 +0200
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
Message-ID: <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>

Hi Ben,

I would not use the script you posted; I don't think it does what you want.

If you haven't already, you should take a look at

the beginners' HOWTO
http://www.bioperl.org/wiki/HOWTO:Beginners

the SearchIO HOWTO
http://www.bioperl.org/wiki/HOWTO:SearchIO

and the example scripts included with BioPerl:
http://www.bioperl.org/wiki/Scripts

Incidentally, it's a lot of fiddly data processing to parse blast reports for many contigs against multiple databases and then go back and collate the results by query. I'm not sure exactly what you want to do once you've separated by query; if you provide some more information, we could suggest ways to best get you where you want to go.

I will mention, though, that BLAST has the ability to search multiple separate databases in one go and collate the results for you. So that's something to consider.

Dave

From bernd.web at gmail.com Fri Aug 20 11:17:05 2010
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 20 Aug 2010 17:17:05 +0200
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
In-Reply-To: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
References: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
Message-ID:

Hi Yin,

I am not quite sure if the following is also related to your gapped length issue but I found I had to adapt the calculation of ungapped_len in Bio::LocatableSeq. If my slices did not contain any letters or a new gap char I used, SimpleAlign could not find the sequences when outputting the alignment.
This was due to a difference in length calculation:
SimpleAlign: uses \W:
    $slice_seq =~ s/\W//g;
Bio::LocatableSeq::ungapped_len uses
    "$string =~ s/[\.\-]+//g;"

I had to include '~' (for my local sequences) in the ungapped_len; otherwise i would run into the end issues with SimpleAlign.

Kind regards,
Bernd

On Fri, Aug 13, 2010 at 3:36 PM, Jun Yin wrote:
> Hi, all,
>
> I am the google summer of code student working on Bio::Align subsystem
> refactoring. The code (Bio::SimpleAlign) I re-implemented now has passed
> nearly all the test, except a few tests on seq/start-end testing. But here
> comes a problem. This may be an old issue, that the Bio::LocatableSeq end
> assignment and checking are inconsistent.
>
> The current end checking method is based on:
>
> $end=$seq->_ungapped_len+$seq->start-1
>
> However, this checking may not fit the real world case.
>
> The inconsistency usually happens when a few columns of the sequence are
> removed.
>
> For example:
>
> my $a = Bio::LocatableSeq->new(
>     -id     => 'a',
>     -strand => 1,
>     -seq    => '-tcgatc-atcgatcg',
>     -start  => 30,
>     -end    => 43
> );
>
> If we remove the 1st, 8th and the last columns
>
> $a->seq() will be 'tcgatcatcgatc'
>
> $a->_ungapped_len==12
>
> Actually, in the real world, the first residue will still be 30 (the old
> $seq->start), and the last residue is the residue before the 43 (the old
> $seq->end), thus 42.
>
> But if you call a validation, the calculation is
> $a->_ungapped_len+$a->start-1=12+30-1=41
>
> So the reassignment of the $seq->end will not pass the validation.
>
> So unless you save the information to a new sequence object, the original
> position information will be lost anyway. But in some cases, we have to
> change the sequence in its original sequence object ..
>
> What is your suggestion on this issue?
>
> A. pass the test and lose the information     #convenient in coding but the
> start-end annotation is not right any more
>
> B. keep the information and forget the test   #the object will still
> remember where the last residue was in the original sequence. But is it
> really meaningful at all? Because all the other residues may come from
> nowhere
>
> C. Neither of above #any other suggestions?
>
> Cheers,
>
> Jun Yin
>
> Ph.D. student in U.C.D.
>
> Bioinformatics Laboratory
>
> Conway Institute
>
> University College Dublin
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From sidd.basu at gmail.com Fri Aug 20 11:59:59 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Fri, 20 Aug 2010 10:59:59 -0500
Subject: [Bioperl-l] Re: bioperl-db and postgres8.3 - status query
In-Reply-To: <4C6DADDF.1000103@cornell.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> <4C6DADDF.1000103@cornell.edu>
Message-ID: <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>

Hi,

On Thu, 19 Aug 2010, Robert Buels wrote:

> Chris Fields wrote:
> > I think it's worth exploring having a DBIx::Class-based middle-ware
> > approach similar to what Rob Buels has done for Chado. That would be
> > fairly easy to get started using DBIx::Class::Schema::Loader.
> > After that it would require optimization and tweaking, which is
> > potentially more complex than Rob's setup as Chado is very Pg-specific,
> > but maybe Rob can elaborate...
>
> Elaborating on how Bio::Chado::Schema is developed:
>
> The vast majority of the code and POD in BCS is autogenerated by
> DBIx::Class::Schema::Loader.
DBICSL gives you a baseline set of
> DBIx::Class classes that covers all the tables, views, columns, unique
> constraints, and foreign key relationships.
>
> Beyond that, you have to add on yourself. In BCS, we have mostly done
> things like:
>
> * make better-named aliases for some of the autogenerated
> relationships (though DBICSL does a surprisingly good job of naming
> relationships automatically most of the time)
> * add a tiny bit of bioperl compatibility (this needs a lot more work
> by somebody, volunteers needed!)
> * add convenience methods for using some of the Chado property tables
> * use DBIx::Class::Tree::NestedSet to add some powerful ways of
> traversing phylogenetic tree relationships
>
> Regarding DB backend specificity, BCS isn't Pg-specific at all, because
> DBIx::Class itself goes to great lengths to be compatible (and performant!)
> with just about every relational database out there.

I would vouch for that, at least as far as Chado in Oracle is concerned. So far, BCS works flawlessly with our Oracle Chado instance at dictybase. Quite a chunk of BCS-based code is also active in a couple of our Mojo-based webapps. The part which I still couldn't use directly is the 'synonym' table, as it clashes with Oracle-specific reserved keywords. However, overall it seems to be quite cross-RDBMS compatible and is highly recommended.

-siddhartha

>In fact, the BCS test
> suite deploys a Chado schema into a temporary SQLite database using
> DBIC::Schema's deploy() method, and runs all of its tests on that. Very
> handy.
>
> Chado's Pg-specific server-side functions can of course be called through
> BCS if they are present, but it's perfectly possible to use Chado without
> any of the server-side functions, and mostly the way I use it.
>
> Rob
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jun.yin at ucd.ie Fri Aug 20 12:17:33 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 20 Aug 2010 17:17:33 +0100
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
In-Reply-To:
References: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
Message-ID: <000b01cb4083$31f98280$95ec8780$%yin@ucd.ie>

Hi, Bernd,

Thanks for your input. Yes, this is one of the old bugs in Bio::SimpleAlign. $aln->slice simply uses

    $slice_seq =~ s/\W//g

to calculate the ungapped length. But $seq->_ungapped_len uses

    $string =~ s{[$GAP_SYMBOLS$FRAMESHIFT_SYMBOLS]+}{}g;

which is '\-\.=~\\\/ ', to calculate the ungapped length.

To solve this problem, first, I now use

    $nonres = join("",$self->gap_char, $self->match_char,$self->missing_char);

which is '-\.&', to remove the non-residue chars in the alignment sequence (though using '=', '~', '\' or '/' will also cause problems).

Secondly, I have merged slice, remove_columns and remove_gaps so that they use the same internal function. Thus it is easier to debug. These changes will be merged into the main BioPerl branch after the next version.

But anyway, the conflict is still there, because the non-residue chars are defined as:

In Bio::SimpleAlign,
    $aln->gap_char, $aln->missing_char, $aln->match_char
In Bio::LocatableSeq
    $GAP_SYMBOLS = '\-\.=~';
    $FRAMESHIFT_SYMBOLS = '\\\/';

So try to use '-' or '.' for your gap char at the moment, otherwise you may encounter end warnings in the calculation.

And, if you want to keep gap-only sequences, you can call the method as:

    $aln2 = $aln->slice(20,30,1)

The last parameter tells slice to keep gap-only sequences.

Cheers,
Jun Yin
Ph.D. student in U.C.D.
Bioinformatics Laboratory
Conway Institute
University College Dublin

From cjfields at illinois.edu Fri Aug 20 12:23:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 20 Aug 2010 11:23:07 -0500
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> <4C6DADDF.1000103@cornell.edu> <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
Message-ID: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>

On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote:
> Hi,
>
> On Thu, 19 Aug 2010, Robert Buels wrote:
>
> > Chris Fields wrote:
> > > I think it's worth exploring having a DBIx::Class-based middle-ware
> > > approach similar to what Rob Buels has done for Chado. That would be
> > > fairly easy to get started using DBIx::Class::Schema::Loader.
> > > After that it would require optimization and tweaking, which is
> > > potentially more complex than Rob's setup as Chado is very Pg-specific,
> > > but maybe Rob can elaborate...
> >
> > Elaborating on how Bio::Chado::Schema is developed:
> >
> > The vast majority of the code and POD in BCS is autogenerated by
> > DBIx::Class::Schema::Loader. DBICSL gives you a baseline set of
> > DBIx::Class classes that covers all the tables, views, columns, unique
> > constraints, and foreign key relationships.
> >
> > Beyond that, you have to add on yourself. In BCS, we have mostly done
> > things like:
> >
> > * make better-named aliases for some of the autogenerated
> > relationships (though DBICSL does a surprisingly good job of naming
> > relationships automatically most of the time)
> > * add a tiny bit of bioperl compatibility (this needs a lot more work
> > by somebody, volunteers needed!)
> > * add convenience methods for using some of the Chado property tables > > * use DBIx::Class::Tree::NestedSet to add some powerful ways of > > traversing phylogenetic tree relationships > > > > Regarding DB backend specificity, BCS isn't Pg-specific at all, because > > DBIx::Class itself goes to great lengths to be compatible (and performant!) > > with just about every relational database out there. > I would vouch for that at least as far as chado in oracle is concerned. > So, far BCS works out flawlessly with our oracle chado instance at > dictybase. Quite a chunk of BCS based code is also active in couple of > our Mojo based webapps. The part which i still couldn't use directly is > the 'synonym' table as it clashes with oracle specific reserved keywords. > However, overall it seems to quite cross-RDMS compatible and highly > recommended. > > -siddhartha Just to point out, I didn't say BCS is Pg-specific, but that Chado is (that was the DBMS it was designed for). Maybe that should be amended to 'was' now :) I recall seeing a page on this somewhere on the GMOD website along the lines of "MySQL has problems so we chose Pg", and that Chado support would focus on Pg. I'm guessing that's no longer the case? Or is only the server-side stuff Pg-specific. > >In fact, the BCS test > > suite deploys a Chado schema into a temporary SQLite database using > > DBIC::Schema's deploy() method, and runs all of its tests on that. Very > > handy. > > > > Chado's Pg-specific server-side functions can of course be called through > > BCS if they are present, but it's perfectly possible to use Chado without > > any of the server-side functions, and mostly the way I use it. > > > > Rob I think this opens up the possibility of starting a DBIx::Class-based middleware solution. Hilmar, did you want to take that on? 
chris From sidd.basu at gmail.com Fri Aug 20 13:39:44 2010 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Fri, 20 Aug 2010 12:39:44 -0500 Subject: [Bioperl-l] Re: bioperl-db and postgres8.3 - status query In-Reply-To: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> <4C6DADDF.1000103@cornell.edu> <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu> <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu> Message-ID: <20100820173942.GC400@vpn-165-124-164-118.vpn.northwestern.edu> On Fri, 20 Aug 2010, Chris Fields wrote: > On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote: > > Hi, > > > > On Thu, 19 Aug 2010, Robert Buels wrote: > > > > > Chris Fields wrote: > > > > I think it's worth exploring having a DBIx::Class-based middle-ware > > > > approach similar to what Rob Buels has done for Chado. That would be > > > > fairly easy to get started using DBIx::Class::Schema::Loader. > > > > After that it would require optimization and tweaking, which is > > > > potentially more complex than Rob's setup as Chado is very Pg-specific, > > > > but maybe Rob can elaborate... > > > > > > Elaborating on how Bio::Chado::Schema is developed: > > > > > > The vast majority of the code and POD in BCS is autogenerated by > > > DBIx::Class::Schema::Loader. DBICSL gives you a baseline set of > > > DBIx::Class classes that covers all the tables, views, columns, unique > > > constraints, and foreign key relationships. > > > > > > Beyond that, you have to add on yourself. 
In BCS, we have mostly done > > > things like: > > > > > > * make better-named aliases for some of the autogenerated > > > relationships (though DBICSL does a surprisingly good job of naming > > > relationships automatically most of the time) > > > * add a tiny bit of bioperl compatibility (this needs a lot more work > > > by somebody, volunteers needed!) > > > * add convenience methods for using some of the Chado property tables > > > * use DBIx::Class::Tree::NestedSet to add some powerful ways of > > > traversing phylogenetic tree relationships > > > > > > Regarding DB backend specificity, BCS isn't Pg-specific at all, because > > > DBIx::Class itself goes to great lengths to be compatible (and performant!) > > > with just about every relational database out there. > > I would vouch for that at least as far as chado in oracle is concerned. > > So, far BCS works out flawlessly with our oracle chado instance at > > dictybase. Quite a chunk of BCS based code is also active in couple of > > our Mojo based webapps. The part which i still couldn't use directly is > > the 'synonym' table as it clashes with oracle specific reserved keywords. > > However, overall it seems to quite cross-RDMS compatible and highly > > recommended. > > > > -siddhartha > > Just to point out, I didn't say BCS is Pg-specific, but that Chado is > (that was the DBMS it was designed for). Maybe that should be amended > to 'was' now :) > > I recall seeing a page on this somewhere on the GMOD website along the > lines of "MySQL has problems so we chose Pg", and that Chado support > would focus on Pg. As far as i understand GMOD stongly recommends and the popular backend for chado is Pg. However, my point was if anybody wants to use or tryout chado schema on a different backend or have an existing setup, tools like DBIx::Class or particularly BCS makes it quite easier to do so. The code developed on top also become quite robust and portable. -siddhartha >I'm guessing that's no longer the case? 
Or is only
> the server-side stuff Pg-specific.
>
> > >In fact, the BCS test
> > > suite deploys a Chado schema into a temporary SQLite database using
> > > DBIC::Schema's deploy() method, and runs all of its tests on that. Very
> > > handy.
> > >
> > > Chado's Pg-specific server-side functions can of course be called through
> > > BCS if they are present, but it's perfectly possible to use Chado without
> > > any of the server-side functions, and mostly the way I use it.
> > >
> > > Rob
>
> I think this opens up the possibility of starting a DBIx::Class-based
> middleware solution. Hilmar, did you want to take that on?
>
> chris
>
>

From buiduyminh at gmail.com Fri Aug 20 17:29:00 2010
From: buiduyminh at gmail.com (Minh Bui)
Date: Fri, 20 Aug 2010 17:29:00 -0400
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
Message-ID:

Hi,

I am trying to load my GFF file into a MySQL database, but I got this error when I ran bp_seqfeature_load.pl (BioPerl 1.6.1 on a Mac):

[BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC contains: /sw/lib/perl5 /sw/lib/perl5/darwin /System/Library/Perl/5.8.6/darwin-thread-multi-2level /System/Library/Perl/5.8.6 /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6 /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level /Network/Library/Perl/5.8.6 /Network/Library/Perl /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44) line 3.
Perhaps the DBD::mysql perl module hasn't been fully installed,
or perhaps the capitalisation of 'mysql' isn't right.
Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
 at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212

I am using Mac OS X version 10.4.10 and MAMP. Isn't "/Library/Perl/5.8.6" already in @INC? What am I missing?
I have been googling this error for a few hours. I also installed BioPerl and reinstalled DBD::mysql using CPAN. It still doesn't work.

Here is my $PERL5LIB: /sw/lib/perl5:/sw/lib/perl5/darwin/

I really need help on this.

Thank you,

From awitney at sgul.ac.uk Sat Aug 21 06:39:10 2010
From: awitney at sgul.ac.uk (Adam Witney)
Date: Sat, 21 Aug 2010 11:39:10 +0100
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
In-Reply-To:
References:
Message-ID: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>

On 20 Aug 2010, at 22:29, Minh Bui wrote:

> Hi,,
> I am trying to load my GFF file to mysql database but I got this error
> when I ran the bp_seqfeature_load.pl ( bioperl 1.6.1 on MAC)
>
> [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
> install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
> contains: /sw/lib/perl5 /sw/lib/perl5/darwin
> /System/Library/Perl/5.8.6/darwin-thread-multi-2level
> /System/Library/Perl/5.8.6
> /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
> /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
> /Network/Library/Perl/5.8.6 /Network/Library/Perl
> /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
> /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
> line 3.
> Perhaps the DBD::mysql perl module hasn't been fully installed,
> or perhaps the capitalisation of 'mysql' isn't right.
> Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
> at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212
>
> I am using MAC OSX version 10.4.10 and MAMP? Isnt it the
> "/Library/Perl/5.8.6" already in @INC? What am I missing?
> I have been googling this error for a few hours. I also install
> Bioperl and reinstall DBD::mysql using CPAN. It still doesnt work..
>
> Here is my $PERL5LIB: /sw/lib/perl5:/sw/lib/perl5/darwin/

Where did DBD::mysql get installed? Can you verify that DBD/mysql.pm is actually in one of those directories listed above?
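(Editorial sketch, not part of the original thread.) One way to do that check is to ask Perl itself which @INC directories contain DBD/mysql.pm. The helper name find_module_in is hypothetical and introduced here only for illustration. Run the script with the same perl binary that runs bp_seqfeature_load.pl: if CPAN installed the driver for a different perl than the one on the path (only a possibility here, given that both Fink and system directories appear in @INC), the module will exist on disk but not in any directory that this perl searches.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Return every directory in the given list that contains the module
# file (for example 'DBD/mysql.pm'). Code references in @INC are skipped.
# (Hypothetical helper, not part of any library.)
sub find_module_in {
    my ($module_file, @dirs) = @_;
    return grep { !ref $_ && -f File::Spec->catfile($_, $module_file) } @dirs;
}

my $module_file = 'DBD/mysql.pm';
my @found = find_module_in($module_file, @INC);

if (@found) {
    print "$module_file found in: $_\n" for @found;
}
else {
    # $^X is the path of the perl binary running this script.
    print "$module_file is not in any \@INC directory searched by ", $^X, "\n";
}
```

If nothing is found, comparing the output of "which perl" with the perl that the CPAN shell used is a reasonable next step.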
From i.hatethispart at ymail.com Sat Aug 21 10:07:28 2010 From: i.hatethispart at ymail.com (keiko) Date: Sat, 21 Aug 2010 07:07:28 -0700 (PDT) Subject: [Bioperl-l] clustalw.exe In-Reply-To: <3612399.post@talk.nabble.com> References: <3612399.post@talk.nabble.com> Message-ID: <29499435.post@talk.nabble.com> Katrin wrote: > > hello, I am a new Perl/Bioperl-User and first I must excuse me for my > really bad english, but I hope everybody will understand me. I have the > following problem: In my Perl-skript is the following system call: > $y=exec("C:\\Programme\\xampp-win32-1.5.1\\xampp\\perl\\clustalw.exe > C:\\Programme\\xampp-win32-1.5.1\\xampp\\htdocs\\gene\\clustal.fasta"); If > I call this Script with the Shell (cmd.exe) everything works correctly. > But if I call this script with PHP I get the following error message: > Error: unknown option > /C:\Programme\xampp-win32-1.5.1\xampp\htdocs\gene\clustal.fasta. I tried > also system and qx. And I tested the environment variables: I wrote a > bat-file with the definition of all environment-variables and the system > call, but this did not work, too. The same problem is in php. The > PHP-Scipt is called from html and I worked under WindowsXP with xampp. I > hope, somebody can help me. greetings Katrin > Hi. I also have a problem with this one. I want to call clustalw using php. Can I ask what you included in your bat-file and where did you download your clustal? thanks a lot! -- View this message in context: http://old.nabble.com/clustalw.exe-tp3612399p29499435.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
From jason at bioperl.org Sun Aug 22 14:29:30 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 22 Aug 2010 11:29:30 -0700
Subject: [Bioperl-l] Enquiry on Bio::DB::Taxonomy
In-Reply-To:
References:
Message-ID: <4C716C8A.3010000@bioperl.org>

Hi Amali -

This is how I'd print out the full classification by using the Tree methods (with probably a different way of initializing the $db object to your flatfiles location).

#!/usr/bin/perl -w
use strict;
use Bio::DB::Taxonomy;
use Bio::Tree::Tree;

# Build the taxonomy database from local NCBI taxonomy dump files.
my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                -nodesfile => 'taxonomy/nodes.dmp',
                                -namesfile => 'taxonomy/names.dmp');
my $taxonid = $db->get_taxonid('Homo sapiens');
my $taxon   = $db->get_taxon(-taxonid => $taxonid);

# A tree built from a single taxon holds that taxon and its ancestors.
my $tree = Bio::Tree::Tree->new(-node => $taxon);
my @taxa = $tree->get_nodes;
print join(",", map { $_->scientific_name } @taxa), "\n";

-jason

Amali Thrimawithana wrote, On 8/18/10 3:56 PM:
> Dear Dr Stajich,
>
> I am a Masters student at Auckland university and my research is on
> identifying yeast species present in wine by the use of 454 sequencing. In
> order to carry out this research, a pipeline is being built in which at the
> final step each representative OTU need to be classified at different
> taxonomic levels (ie: at Phylum, family, class, genus and species) by using
> the results from BLAST. To identify the sequences at each taxonomic level, I
> have been trying out the Bio::DB::Taxonomy module in bioperl. Using this
> module, I am able to get the genus and species level by splitting the
> scientific name returned by the Bio::taxon object. But unfortunately I am
> uncertain on how to get the information for the other levels of the rank. I
> have tried several commands including "my @class = $node->classification;",
> but it does not work. Hence, could you please let me know how I might be
> able to get the higher levels of taxonomy such as class and phylum using
> bioperl?
> > Look forward to hearing from you soon > > Thanking You > > Amali > From cjfields at illinois.edu Sun Aug 22 15:56:58 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 22 Aug 2010 14:56:58 -0500 Subject: [Bioperl-l] clustalw.exe In-Reply-To: <29499435.post@talk.nabble.com> References: <3612399.post@talk.nabble.com> <29499435.post@talk.nabble.com> Message-ID: On Aug 21, 2010, at 9:07 AM, keiko wrote: > Katrin wrote: >> >> hello, I am a new Perl/Bioperl-User and first I must excuse me for my >> really bad english, but I hope everybody will understand me. I have the >> following problem: In my Perl-skript is the following system call: >> $y=exec("C:\\Programme\\xampp-win32-1.5.1\\xampp\\perl\\clustalw.exe >> C:\\Programme\\xampp-win32-1.5.1\\xampp\\htdocs\\gene\\clustal.fasta"); If >> I call this Script with the Shell (cmd.exe) everything works correctly. >> But if I call this script with PHP I get the following error message: >> Error: unknown option >> /C:\Programme\xampp-win32-1.5.1\xampp\htdocs\gene\clustal.fasta. I tried >> also system and qx. And I tested the environment variables: I wrote a >> bat-file with the definition of all environment-variables and the system >> call, but this did not work, too. The same problem is in php. The >> PHP-Scipt is called from html and I worked under WindowsXP with xampp. I >> hope, somebody can help me. greetings Katrin >> > > Hi. I also have a problem with this one. I want to call clustalw using php. > Can I ask what you included in your bat-file and where did you download your > clustal? thanks a lot! Not sure, but what does this have to do with BioPerl? 
chris From jason at bioperl.org Mon Aug 23 11:56:47 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 23 Aug 2010 08:56:47 -0700 Subject: [Bioperl-l] a problem when using the Bioperl modules In-Reply-To: References: Message-ID: <4C729A3F.7080304@bioperl.org> Wei - Please ask your questions on the bioperl mailing list, I cannot answer questions directly for all requests. Your problem has been answered by me on the list before so I urge you to use the list archives as a starting point. The line lengths of the fasta file sequence aren't the same length. you need to run this bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW mv NEW ORIGINAL or with sreformat sreformat fasta ORIGINAL > NEW mv NEW ORIGINAL Guifeng Wei wrote, On 8/23/10 4:57 AM: > Dear professor Stajich, > So sorry to interrupt you. i came across a problem when i use the > Bio::DB::Fasta modules of BioPerl. The aim i want to arrive at is to > extract the subsequences accoording to the *.bed files which are the > C.elegans genomic sequnece annotation. The code i programed is in the > attached file. > The genomic sequences file contains sequences from 6 chromosomes of > C.elegans. > when i run this program in the command line, the following error > warnings was coming. > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Each line of the fasta entry must be the same length except the last. > Line above #301451 ' > ..' is 22 != 51 chars. 
> STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::DB::Fasta::calculate_offsets > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770 > STACK: Bio::DB::Fasta::index_file > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:680 > STACK: Bio::DB::Fasta::new > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:491 > STACK: bed_to_fasta.pl:14 > ----------------------------------------------------------- > indexing was interrupted, so unlinking > /home/wgf/WORM_DATA/elegans.WS190.dna.fa.index at > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053. > > and therefore i write to you in hope that you can help me solve this > problem,as well as, give me some suggestion about how to learn Bioperl > well. > thank you very very much. > yours sincerely > Wei Guifeng From jason.stajich at ucr.edu Mon Aug 23 11:58:07 2010 From: jason.stajich at ucr.edu (Jason Stajich) Date: Mon, 23 Aug 2010 08:58:07 -0700 Subject: [Bioperl-l] a problem when using the Bioperl modules In-Reply-To: References: Message-ID: <4C729A8F.1070506@ucr.edu> You haven't defined this variable $db - you need to not skip the part that initializes the Bio::DB::Fasta object that you had previous asked about. Please send all your future queries to the mailing list. Guifeng Wei wrote, On 8/23/10 8:14 AM: > Dear professor, > after that, i revised my scripts, which is that i divide the genomic > sequences into 7 single file, every file contains the sequence from a > chromosome. > however, when i try to run the scripts, the following error was coming. > Can't call method "seq" on an undefined value at bed_to_fasta.pl > line 29, line 1. 
> while(<IN>){
> chomp $_;
> my @bed=split(/\s+/, $_ );
> #print length($db->seq('chrI'));
> my $chr_id=$bed[0];
> my $start=$bed[1];
> my $end=$bed[2];
> my $seq_name=$bed[3];
> my $strand=$bed[5];
> my $segment = $db ->seq($chr_id,$start=>$end);
> print ">",$seq_name,"_",$chr_id,":",$start=>$end;
> print "$segment\n";
> }
> the blue line is .
> why?

--
Jason E. Stajich, PhD
Assistant Professor
Department of Plant Pathology & Microbiology
University of California
Riverside, CA 92521

jason.stajich at ucr.edu
office: 951.827.2363
http://lab.stajich.org/
http://twitter.com/stajichlab
http://fungalgenomes.org/blog/
http://plantpathology.ucr.edu/
http://genomics.ucr.edu/
http://cepceb.ucr.edu/

From guifengwei at gmail.com Mon Aug 23 22:44:57 2010
From: guifengwei at gmail.com (Guifeng Wei)
Date: Tue, 24 Aug 2010 10:44:57 +0800
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
Message-ID:

Hi,

I came across a problem when using the Bio::DB::Fasta module of BioPerl. My aim is to extract subsequences according to the *.bed files, which are the C.elegans genomic sequence annotation.

When I tried to run the script I wrote, I got the following error message:

Can't call method "seq" on an undefined value at bed_to_fasta.pl line 28, line 1.

So I am asking for help to solve this problem.
Here is my perl script.

#!/usr/bin/perl -w
# Purpose: extract sequences from genomic sequences
use strict;
use Bio::DB::Fasta;
open(IN,$ARGV[0]) || die "sorry, the program cannot open the .bed file, plea check it. \n";
my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' );
# The dir ...../elegans190.dna/ includes 6 files:chrI,chrII,chrIII,chrIV,chrV,chrX,
#each stands for the sequences from the coressponding chromosome.

while(<IN>){
    chomp $_;
    my @bed=split(/\s+/, $_ );

    my $chr_id=$bed[0];
    my $start=$bed[1];
    my $end=$bed[2];
    my $seq_name=$bed[3];
    my $strand=$bed[5];

    my $segment = $db->seq( $chr_id, $start=>$end );

    print ">",$seq_name,"_",$chr_id,":",$start=>$end;
    print "$segment\n";

}

close(IN);

From florent.angly at gmail.com Tue Aug 24 01:06:21 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 24 Aug 2010 15:06:21 +1000
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To:
References:
Message-ID: <4C73534D.6080607@gmail.com>

Hi Guifeng,

From the Bio::DB::Fasta documentation:
> $db = Bio::DB::Fasta->new($fasta_path [,%options])
>     Create a new Bio::DB::Fasta object from the Fasta file or files
>     indicated by $fasta_path. Indexing will be performed automatically
>     if needed. If successful, new() will return the database accessor
>     object. Otherwise it will return undef.

Hence, after you create the database object $db, you should check that it was successful, e.g.:
> my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' );
> if (not defined $db) {
>     die "There was a problem creating the database\n";
> }
A problem creating the database would explain the message you get.

If the extension of the FASTA files in the directory path that you gave as input is not fa, fasta, fast, FA, FASTA, FAST or dna, then you should use the -glob option when constructing your database object. From the documentation:
> -glob   Glob expression to use for searching for Fasta files
>         in directories.
>         Default: *.{fa,fasta,fast,FA,FASTA,FAST,dna}

Florent

On 24/08/10 12:44, Guifeng Wei wrote:
> Hi,
>
> i came across a problem when i use the Bio::DB::Fasta modules of
> BioPerl. The aim i want to arrive at is to extract the subsequences
> accoording to the *.bed files which are the C.elegans genomic sequnece
> annotation.
> > when i tried to run the scripts i wrote, the error message was coming, as > follows: > > Can't call method "seq" on an undefined value at bed_to_fasta.pl line 28, > line 1. > > so, ask for favor to slove this problem. > Here is my perl scripts. > > #!/usr/bin/perl -w > # Purpose: extract sequences from genomic sequences > use strict; > use Bio::DB::Fasta; > open(IN,$ARGV[0]) || die "sorry, the program cannot open the .bed file, plea > check it. \n"; > my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' ); > # The dir ...../elegans190.dna/ includes 6 > files:chrI,chrII,chrIII,chrIV,chrV,chrX, > #each stands for the sequences from the coressponding chromosome. > > while(){ > chomp $_; > my @bed=split(/\s+/, $_ ); > > my $chr_id=$bed[0]; > my $start=$bed[1]; > my $end=$bed[2]; > my $seq_name=$bed[3]; > my $strand=$bed[5]; > > my $segment = $db->seq( $chr_id, $start=>$end ); > > print ">",$seq_name,"_",$chr_id,":",$start=>$end; > print "$segment\n"; > > } > > close(IN); > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From guifengwei at gmail.com Tue Aug 24 07:28:16 2010 From: guifengwei at gmail.com (Guifeng Wei) Date: Tue, 24 Aug 2010 19:28:16 +0800 Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta In-Reply-To: References: Message-ID: Hi, i have revised my scripts according to the previous email from Florent. However, there were still some errors which frustrated me so much. The errors are as follows: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Each line of the fasta entry must be the same length except the last. Line above #301451 ' ..' is 22 != 51 chars. 
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::DB::Fasta::calculate_offsets /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
STACK: Bio::DB::Fasta::index_dir /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
STACK: Bio::DB::Fasta::new /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
STACK: bed2fasta.pl:13
-----------------------------------------------------------
indexing was interrupted, so unlinking /home/wgf/elegans190.dna//directory.index at /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053

The directory /home/wgf/elegans190.dna/ contains 6 files, each holding the complete sequence of one chromosome in FASTA format. The extension of the FASTA files is .fa. Every file starts with ">chromosoemeXXX" followed by thousands of sequence lines.

So it warns me that "Each line of the fasta entry must be the same length except the last" and "indexing was interrupted, so unlinking /home/wgf/elegans190.dna//directory".

I am very confused by this, so I am asking for help.

Wei Guifeng

From biopython at maubp.freeserve.co.uk Tue Aug 24 09:28:33 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 24 Aug 2010 14:28:33 +0100
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To:
References:
Message-ID:

On Tue, Aug 24, 2010 at 12:28 PM, Guifeng Wei wrote:
> Hi,
>
> i have revised my scripts according to the previous email from Florent.
> However, there were still some errors which frustrated me so much.
>
> The errors are as follows:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
>    Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::Fasta::calculate_offsets
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> STACK: Bio::DB::Fasta::index_dir
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
> STACK: Bio::DB::Fasta::new
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
> STACK: bed2fasta.pl:13
> -----------------------------------------------------------
> indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory.index at
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053
> But in the directory /home/wgf/elegans190.dna/ , it concludes 6 files,
> each contains the complete sequences from one single chromosome, the format
> is fasta. The extension of the FASTA files is .fa. Every single file is
> started as ">chromosoemeXXX" followed by the thousands of sequences.
>
> and therefore, it warn me that "Each line of the fasta entry must be the
> same length except the last". and "indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory".
>
> i was much confused about this. so for help.
>
> Wei Guifeng

Hi Wei,

It sounds like there is inconsistent line wrapping in your FASTA file.
This is often not a problem at all, but the DB indexing system (and
indeed other indexing tools like the samtools fasta index) requires
that the lines within each entry all use the same wrapping.

e.g. This is a valid FASTA file but would not be suitable for indexing:

>Test
ACGTACGT
ACGTACGT
ACGTACGT
ACGT
ACGT
T

Ignoring the final line (a special case, here of length one), that uses
a mixture of line lengths, 8 and 4. If you had used this, it should be fine:

>Test
ACGTACGT
ACGTACGT
ACGTACGT
ACGTACGT
T

All the lines are now wrapped at length 8 (and the final line is less
than or equal to length 8).
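
The wrapping rule described above can be checked before trying to index a file. What follows is only a minimal sketch in plain Perl, not the check that Bio::DB::Fasta itself performs: the sub name check_fasta_wrapping and the script name check_wrap.pl are made up for illustration, and the sketch assumes that the first sequence line of each entry sets the expected width. It reports any sequence line longer than that width, and any shorter line (including a blank one) that is not the last line of its entry.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): scan FASTA text from a
# filehandle and return one message per badly wrapped sequence line.
# Assumption: the first sequence line of each entry defines its width.
sub check_fasta_wrapping {
    my ($fh) = @_;
    my @problems;
    my ($id, $width, $short_at);    # current entry, its width, pending short line

    while (my $line = <$fh>) {
        my $lineno = $.;            # line number within the current file
        $line =~ s/\r?\n\z//;       # strip a Unix or DOS line ending

        if ($line =~ /^>(\S*)/) {   # header line: start a new entry
            ($id, $width, $short_at) = ($1, undef, undef);
            next;
        }
        next unless defined $id;    # ignore anything before the first header

        # A short line is only allowed as the last line of an entry, so a
        # pending short line becomes a problem once another line follows it.
        if (defined $short_at) {
            push @problems,
                "$id: line $short_at is shorter than $width but is not the last line";
            undef $short_at;
        }

        my $len = length $line;
        if (!defined $width) {
            $width = $len;          # first sequence line sets the width
        }
        elsif ($len > $width) {
            push @problems, "$id: line $lineno has $len characters, expected $width";
        }
        elsif ($len < $width) {
            $short_at = $lineno;    # fine only if nothing else follows
        }
    }
    return @problems;
}

# Usage: perl check_wrap.pl file1.fa [file2.fa ...]
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    print "$file: $_\n" for check_fasta_wrapping($fh);
    close $fh;
}
```

Run on the first example above, this sketch flags the two 4-character lines; on the second example it reports nothing. A short line followed by a trailing blank line is also flagged, since the short line is then no longer the last line of its entry.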

Of course, in a real file wrapping at 60 or 80 characters is more
common ;)

Peter

From cjfields at illinois.edu Tue Aug 24 09:38:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 08:38:45 -0500
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To:
References:
Message-ID: <995BCF30-99B2-46C2-A4E8-681F9E2A0BB5@illinois.edu>

Guifeng,

Did you follow Jason's advice yesterday about converting the FASTA file to a consistent line length? Or checking the database itself? These are both things reiterated by Florent and Peter.

From Jason's last response:

-------------------------
Wei -

Please ask your questions on the bioperl mailing list, I cannot answer questions directly for all requests.
Your problem has been answered by me on the list before so I urge you to use the list archives as a starting point.

The line lengths of the fasta file sequence aren't the same length.

you need to run this
bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW
mv NEW ORIGINAL

or with sreformat
sreformat fasta ORIGINAL > NEW
mv NEW ORIGINAL
-------------------------

chris

On Aug 24, 2010, at 6:28 AM, Guifeng Wei wrote:

> Hi,
>
> i have revised my scripts according to the previous email from Florent.
> However, there were still some errors which frustrated me so much.
>
> The errors are as follows:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
> Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::DB::Fasta::calculate_offsets > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770 > STACK: Bio::DB::Fasta::index_dir > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593 > STACK: Bio::DB::Fasta::new > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488 > STACK: bed2fasta.pl:13 > ----------------------------------------------------------- > indexing was interrupted, so unlinking > /home/wgf/elegans190.dna//directory.index at > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053 > But in the directory /home/wgf/elegans190.dna/ , it concludes 6 files, > each contains the complete sequences from one single chromosome, the format > is fasta. The extension of the FASTA files is .fa. Every single file is > started as ">chromosoemeXXX" followed by the thousands of sequences. > > and therefore, it warn me that "Each line of the fasta entry must be the > same length except the last". and "indexing was interrupted, so unlinking > /home/wgf/elegans190.dna//directory". > > i was much confused about this. so for help. 
> > Wei Guifeng > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Tue Aug 24 11:01:47 2010 From: scott at scottcain.net (Scott Cain) Date: Tue, 24 Aug 2010 11:01:47 -0400 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> <4C6DADDF.1000103@cornell.edu> <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu> <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu> Message-ID: Hi Chris, GMOD still only supports Chado with Postgres (for example, the GFF loader assumes a Postgres database), but when I reengineered the GFF loader a few years ago, I tried to do it with subclassing the loader in mind so that it could be subclassed to work with other RDMS. Scott On Fri, Aug 20, 2010 at 12:23 PM, Chris Fields wrote: > On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote: >> Hi, >> >> On Thu, 19 Aug 2010, Robert Buels wrote: >> >> > Chris Fields wrote: >> > > I think it's worth exploring having a DBIx::Class-based middle-ware >> > > approach similar to what Rob Buels has done for Chado. ?That would be >> > > fairly easy to get started using DBIx::Class::Schema::Loader. >> > > After that it would require optimization and tweaking, which is >> > > potentially more complex than Rob's setup as Chado is very Pg-specific, >> > > but maybe Rob can elaborate... >> > >> > Elaborating on how Bio::Chado::Schema is developed: >> > >> > The vast majority of the code and POD in BCS is autogenerated by >> > DBIx::Class::Schema::Loader. 
DBICSL gives you a baseline set of
>> > DBIx::Class classes that covers all the tables, views, columns, unique
>> > constraints, and foreign key relationships.
>> >
>> > Beyond that, you have to add on yourself.  In BCS, we have mostly done
>> > things like:
>> >
>> >   * make better-named aliases for some of the autogenerated
>> >     relationships (though DBICSL does a surprisingly good job of naming
>> >     relationships automatically most of the time)
>> >   * add a tiny bit of bioperl compatibility (this needs a lot more work
>> >     by somebody, volunteers needed!)
>> >   * add convenience methods for using some of the Chado property tables
>> >   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
>> >     traversing phylogenetic tree relationships
>> >
>> > Regarding DB backend specificity, BCS isn't Pg-specific at all, because
>> > DBIx::Class itself goes to great lengths to be compatible (and performant!)
>> > with just about every relational database out there.
>> I would vouch for that at least as far as chado in oracle is concerned.
>> So, far BCS works out flawlessly with our oracle chado instance at
>> dictybase. Quite a chunk of BCS based code is also active in couple of
>> our Mojo based webapps. The part which i still couldn't use directly is
>> the 'synonym' table as it clashes with oracle specific reserved keywords.
>> However, overall it seems to quite cross-RDMS compatible and highly
>> recommended.
>>
>> -siddhartha
>
> Just to point out, I didn't say BCS is Pg-specific, but that Chado is
> (that was the DBMS it was designed for).  Maybe that should be amended
> to 'was' now :)
>
> I recall seeing a page on this somewhere on the GMOD website along the
> lines of "MySQL has problems so we chose Pg", and that Chado support
> would focus on Pg.  I'm guessing that's no longer the case?  Or is only
> the server-side stuff Pg-specific.
>
>> >In fact, the BCS test
>> > suite deploys a Chado schema into a temporary SQLite database using
>> > DBIC::Schema's deploy() method, and runs all of its tests on that.  Very
>> > handy.
>> >
>> > Chado's Pg-specific server-side functions can of course be called through
>> > BCS if they are present, but it's perfectly possible to use Chado without
>> > any of the server-side functions, and mostly the way I use it.
>> >
>> > Rob
>
> I think this opens up the possibility of starting a DBIx::Class-based
> middleware solution.  Hilmar, did you want to take that on?
>
> chris
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                           scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)          216-392-3087
Ontario Institute for Cancer Research

From bgs500 at york.ac.uk Tue Aug 24 11:35:53 2010
From: bgs500 at york.ac.uk (Ben Saville)
Date: Tue, 24 Aug 2010 16:35:53 +0100
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk> <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>
Message-ID: <34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk>

Sorry for the delay in replying; 454 data analysis is very time consuming.

Please see http://seqanswers.com/forums/showthread.php?t=6484
for a discussion about this problem and how we solved the issue.

Thanks for the reply though, much appreciated!

Regards
Ben Saville

On 20 Aug 2010, at 14:48, Dave Messina wrote:

> Hi Ben,
>
> I would not use the script you posted - I don't think it does what
> > If you haven't already, you should take a look at the beginners' HOWTO > > http://www.bioperl.org/wiki/HOWTO:Beginners > > > the SearchIO HOWTO > > http://www.bioperl.org/wiki/HOWTO:SearchIO > > > and the example scripts included with BioPerl: > > http://www.bioperl.org/wiki/Scripts > > > > Incidentally, it's a lot of fiddly data processing to parse blast > reports for many contigs against multiple databases and then go back > and collate the results by query. I'm not sure exactly what you want > to do once you've separated by query ? if you provide some more > information, we could suggest ways to best get you where you want to > go. > > I will mention, though, that BLAST has the ability to search > multiple separate databases in one go and collate the results for > you. So that's something to consider. > > > > Dave > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Aug 24 11:54:20 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 24 Aug 2010 10:54:20 -0500 Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta In-Reply-To: References: <995BCF30-99B2-46C2-A4E8-681F9E2A0BB5@illinois.edu> Message-ID: Please keep all responses on-list. Regarding sreformat: http://tinyurl.com/28q75rr Judging by the stack traces below, you are also running off a UNIX-like system. To concatenate files, use 'cat'. So, for all files ending with .fa: cat *.fa >> all.fa chris On Aug 24, 2010, at 8:54 AM, Guifeng Wei wrote: > Hello Fields, > > i have checked the fasta files. i suddenly find that the last line is blank line, and the last second is less than common. > > i am not able to run the command line as Jason's advice because i have no knowledge about "sreformat". > > i also want to ask a more question. i want megre the several single chromosome sequence file into one, OK? > > thank you very much. 
> > Wei Guifeng > 2010/8/24 Chris Fields > Guifeng, > > Did you follow Jason's advice yesterday about converting the FASTA over to a more consistent length? Or checking the database itself? These are both things reiterated by Florent and Peter. > > From Jason's last response: > > ------------------------- > Wei - > > Please ask your questions on the bioperl mailing list, I cannot answer questions directly for all requests. > Your problem has been answered by me on the list before so I urge you to use the list archives as a starting point. > > The line lengths of the fasta file sequence aren't the same length. > > you need to run this > bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW > mv NEW ORIGINAL > > or with sreformat > sreformat fasta ORIGINAL > NEW > mv NEW ORIGINAL > ------------------------- > > chris > > > On Aug 24, 2010, at 6:28 AM, Guifeng Wei wrote: > > > Hi, > > > > i have revised my scripts according to the previous email from Florent. > > However, there were still some errors which frustrated me so much. > > > > The errors are as follows: > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: Each line of the fasta entry must be the same length except the last. > > Line above #301451 ' > > ..' is 22 != 51 chars. 
> > STACK: Error::throw > > STACK: Bio::Root::Root::throw > > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 > > STACK: Bio::DB::Fasta::calculate_offsets > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770 > > STACK: Bio::DB::Fasta::index_dir > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593 > > STACK: Bio::DB::Fasta::new > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488 > > STACK: bed2fasta.pl:13 > > ----------------------------------------------------------- > > indexing was interrupted, so unlinking > > /home/wgf/elegans190.dna//directory.index at > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053 > > But in the directory /home/wgf/elegans190.dna/ , it concludes 6 files, > > each contains the complete sequences from one single chromosome, the format > > is fasta. The extension of the FASTA files is .fa. Every single file is > > started as ">chromosoemeXXX" followed by the thousands of sequences. > > > > and therefore, it warn me that "Each line of the fasta entry must be the > > same length except the last". and "indexing was interrupted, so unlinking > > /home/wgf/elegans190.dna//directory". > > > > i was much confused about this. so for help. > > > > Wei Guifeng > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- > ?????? 
Wei Guifeng > > > From cjfields at illinois.edu Tue Aug 24 12:14:51 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 24 Aug 2010 11:14:51 -0500 Subject: [Bioperl-l] Problem Parsing BLAST output In-Reply-To: <34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk> References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk> <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se> <34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk> Message-ID: <69C47A74-09C7-4024-9303-A3893658A2A8@illinois.edu> Just in case anyone needs it, there is a way to index these as well (both BLAST and the two tabular BLAST versions) for fast lookups of specific reports, if needed. See Bio::Index::Blast and Bio::Index::BlastTable in BioPerl. Caveat: I believe there is a bug with BLAST+ text output indexing (it chops the header off subsequent reports). I haven't investigated it enough, though, but I'll try looking into it today. chris On Aug 24, 2010, at 10:35 AM, Ben Saville wrote: > Sorry for the Delay in replying, 454 data analysis is very time consuming. > > please see http://seqanswers.com/forums/showthread.php?t=6484 > For a discussion about this problem, and how we solved the issue. > > Thanks for the reply though, much appreciated! > > Regards > Ben Saville > > > > > > On 20 Aug 2010, at 14:48, Dave Messina wrote: > >> Hi Ben, >> >> I would not use the script you posted ? I don't think it does what you want. >> >> If you haven't already, you should take a look at the beginners' HOWTO >> >> http://www.bioperl.org/wiki/HOWTO:Beginners >> >> >> the SearchIO HOWTO >> >> http://www.bioperl.org/wiki/HOWTO:SearchIO >> >> >> and the example scripts included with BioPerl: >> >> http://www.bioperl.org/wiki/Scripts >> >> >> >> Incidentally, it's a lot of fiddly data processing to parse blast reports for many contigs against multiple databases and then go back and collate the results by query. I'm not sure exactly what you want to do once you've separated by query ? 
if you provide some more information, we could suggest ways to best get you where you want to go. >> >> I will mention, though, that BLAST has the ability to search multiple separate databases in one go and collate the results for you. So that's something to consider. >> >> >> >> Dave >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Aug 24 12:17:17 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 24 Aug 2010 11:17:17 -0500 Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release announcement References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov> Message-ID: FYI, Very interesting additions to BLAST+ (archive format). chris Begin forwarded message: > From: mcginnis > Date: August 24, 2010 10:46:50 AM CDT > To: NLM/NCBI List blast-announce > Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement > > A new version of the stand-alone applications is available. 
> > Users are encouraged to use the BLAST+ applications available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ > > This release includes a number of bug fixes as well as new features for the BLAST+ applications: > > * Introduce BLAST Archive format to permit reformatting of stand-alone BLAST searches with the blast_formatter(see BLAST+ user manual) > * Added the blast_formatter application (see BLAST+ user manual) > * Added support for translated subject soft masking in the BLAST databases > * Added support for the BLAST Trace-back operations (btop) output format > * Added command line options to blastdbcmd for listing available BLAST databases > * Improved performance of formatting of remote BLAST searches > * Use a consistent exit code for out of memory conditions > * Fixed bug in indexed megablast with multiple space-separated BLAST databases > * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb > * Fixed Windows installer for 64-bit installations > > BLAST+ applications, as well as the legacy C applications (e.g. blastall), may be downloaded from http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download From David.Messina at sbc.su.se Tue Aug 24 13:00:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 24 Aug 2010 19:00:14 +0200 Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release announcement In-Reply-To: References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov> Message-ID: <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se> Here's a link to the manual: ftp://ftp.ncbi.nlm.nih.gov//blast/executables/blast%2B/2.2.24/user_manual.pdf (Is it on the NCBI website somewhere? Strange to have only a downloadable PDF.) The section on the new archive format is on page 27. It seems like a nice idea to have the flexibility, but I wonder about the time cost of using this format. 
One of the big gains from using tab-delimited output is that BLAST doesn't have to do all the post-processing to generate the alignment views. By doing the archive format, which if I understand it correctly is ASN.1, you're always paying the full price in time (and space, for that matter). Dave On Aug 24, 2010, at 18:17 , Chris Fields wrote: > FYI, > > Very interesting additions to BLAST+ (archive format). > > chris > > Begin forwarded message: > >> From: mcginnis >> Date: August 24, 2010 10:46:50 AM CDT >> To: NLM/NCBI List blast-announce >> Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement >> >> A new version of the stand-alone applications is available. >> >> Users are encouraged to use the BLAST+ applications available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ >> >> This release includes a number of bug fixes as well as new features for the BLAST+ applications: >> >> * Introduce BLAST Archive format to permit reformatting of stand-alone BLAST searches with the blast_formatter(see BLAST+ user manual) >> * Added the blast_formatter application (see BLAST+ user manual) >> * Added support for translated subject soft masking in the BLAST databases >> * Added support for the BLAST Trace-back operations (btop) output format >> * Added command line options to blastdbcmd for listing available BLAST databases >> * Improved performance of formatting of remote BLAST searches >> * Use a consistent exit code for out of memory conditions >> * Fixed bug in indexed megablast with multiple space-separated BLAST databases >> * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb >> * Fixed Windows installer for 64-bit installations >> >> BLAST+ applications, as well as the legacy C applications (e.g. 
blastall), may be downloaded from http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download

From cjfields at illinois.edu Tue Aug 24 13:26:49 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 12:26:49 -0500
Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release announcement
In-Reply-To: <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se>
References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov> <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se>
Message-ID: 

It's probably more applicable from the viewpoint of a cluster admin who would want to add the flexibility of having a single archive and allowing any format (as opposed to re-running the analysis). I'm just wondering if there is anything to glean there for possible alignment archiving purposes (ala SAM/BAM), but if it's ASN.1, likely not.

chris

From David.Messina at sbc.su.se Tue Aug 24 14:45:29 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 24 Aug 2010 20:45:29 +0200
Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release announcement
In-Reply-To: 
References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov> <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se>
Message-ID: <00C04DF9-F3C2-4574-B1E4-A3BF28EE953F@sbc.su.se>

> It's probably more applicable from the viewpoint of a cluster admin who would want to add the flexibility of having a single archive and allowing any format (as opposed to re-running the analysis).

Good point.

> I'm just wondering if there is anything to glean there for possible alignment archiving purposes (ala SAM/BAM), but if it's ASN.1, likely not.

To be honest, I didn't look that closely at it. It may be worth considering nevertheless.

Dave

From buiduyminh at gmail.com Tue Aug 24 14:56:43 2010
From: buiduyminh at gmail.com (Minh Bui)
Date: Tue, 24 Aug 2010 14:56:43 -0400
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
In-Reply-To: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>
References: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>
Message-ID: 

How can I find out where DBD::mysql is installed on my Mac? I am very new to the Mac, sorry.
I just checked, and mysql.pm is in /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm

On 8/21/10, Adam Witney wrote:
>
> On 20 Aug 2010, at 22:29, Minh Bui wrote:
>
> > Hi,,
> > I am trying to load my GFF file to mysql database but I got this error
> > when I ran the bp_seqfeature_load.pl ( bioperl 1.6.1 on MAC)
> >
> > [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
> > install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
> > contains: /sw/lib/perl5 /sw/lib/perl5/darwin
> > /System/Library/Perl/5.8.6/darwin-thread-multi-2level
> > /System/Library/Perl/5.8.6
> > /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
> > /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
> > /Network/Library/Perl/5.8.6 /Network/Library/Perl
> > /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
> > /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
> > line 3.
> > Perhaps the DBD::mysql perl module hasn't been fully installed,
> > or perhaps the capitalisation of 'mysql' isn't right.
> > Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
> > at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212
> >
> > I am using MAC OSX version 10.4.10 and MAMP? Isnt it the
> > "/Library/Perl/5.8.6" already in @INC? What am I missing?
> > I have been googling this error for a few hours. I also install
> > Bioperl and reinstall DBD::mysql using CPAN. It still doesnt work..
> >
> > Here is my $PERL5LIB: /sw/lib/perl5:/sw/lib/perl5/darwin/
>
> Where did DBD:mysql get installed? can you verify that DBD/mysql.pm is actually in one of those directories listed above?

From scott at scottcain.net Tue Aug 24 15:04:04 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 24 Aug 2010 15:04:04 -0400
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
In-Reply-To: 
References: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>
Message-ID: 

Hi Minh,

The file you found is not DBD::mysql, though; it is Bio::DB::SeqFeature::Store::DBI::mysql, which was installed along with BioPerl. How did you find that file? The same method would presumably turn up DBD::mysql if it existed. I would use a command like this:

  locate mysql.pm

which would locate all of the files named mysql.pm on your computer. I would expect it to be located in /Library/Perl/5.8.6/darwin-thread-multi-2level/DBD/ if it was installed in a "normal" way (that is, not involving MacPorts, Fink, or MAMP).

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From jason at bioperl.org Wed Aug 25 00:33:45 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 24 Aug 2010 21:33:45 -0700
Subject: [Bioperl-l] Enquiry on gi_taxid_nucl.dmp.gz
In-Reply-To: 
References: 
Message-ID: <4C749D29.3040003@bioperl.org>

Hi - please keep questions on list.

I think one of your problems is that your first use of $gi2taxidfile is wrong. When you call tie, you want to specify the DB file you want to store the index in, so call it "/tmp/gi2taxid.idx" or something like that. In my code here
http://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/classify_hits_kingdom.PLS
you will see that on line 97 we construct the name of the index file from the folder, plus 'idx', plus the name gi2taxid, which will be the name of the index file.

Also, it would be safer for the split to match whitespace, since you want the first two columns from the file. Doing this would eliminate the need for the chomp on the line above.
  my ($gi, $taxid) = split(/\s+/, $_);

instead of

  chomp;
  my ($gi, $taxid) = split(" ", $_,2);

There may be other problems but these should be fixed first -- and please send queries to the mailing list rather than to me directly so that others can answer questions.

-jason

Amali Thrimawithana wrote, On 8/24/10 8:13 PM:
> Dear Jason
>
> Thank you very much for the information. I managed to get the information on
> different taxonomic levels with the help of one of your example scripts,
> "local_taxonomydb_query". However, I am having trouble creating a local
> index file of gi_taxid_nucl.dmp so that I can get the taxonomic
> ID given an NCBI GI number. At the moment I am using the tie() function
> with DB_File and storing the details in a hash. However, when I try to
> retrieve a taxonomic ID given the GI number, it returns nothing
> but an error. Below is part of the code (borrowed from the example code
> classify kingdom); can you please let me know where I am going wrong?
> ...
> my $dbh2 = tie(%taxid4gi, 'DB_File', $gi2taxidfile);
>
> if( ! $done ) {
>     my $fh;
>     open(GI2TAXID, "$gi2taxidfile") or die $!; #here passing the unzipped gi_taxid_nucl.dmp
>     my $i=0;
>     while (<GI2TAXID>) {
>         chomp;
>         my ($gi, $taxid) = split(" ", $_, 2);
>         $taxid4gi{$gi} = $taxid
>             if exists $taxid4gi{$gi};
>         $i++;
>         unless( $DEBUG && $i % 100000 ) {
>             warn "$i\n";
>         }
>     }
>     $dbh2->sync;
> }
> my $gi2='183397240';
> my $taxd2=$taxid4gi{$gi2};
> print $taxd2, " \n";
>
> Any help would be much appreciated
>
> Thanking you
> Amali
>
> On 23 August 2010 06:29, Jason Stajich wrote:
>
>> Hi Amali -
>>
>> This is how I'd print out the full classification by using the Tree methods
>> (with probably a different way of initializing the $db object to your
>> flatfiles location).
>>
>> #!/usr/bin/perl -w
>> use strict;
>> use Bio::DB::Taxonomy;
>>
>> my $db= Bio::DB::Taxonomy->new(-source => 'flatfile',
>>                                -nodesfile => 'taxonomy/nodes.dmp',
>>                                -namesfile => 'taxonomy/names.dmp');
>>
>> my $taxonid = $db->get_taxonid('Homo sapiens');
>> my $taxon = $db->get_taxon(-taxonid => $taxonid);
>> my $tree = Bio::Tree::Tree->new(-node => $taxon);
>> my @taxa = $tree->get_nodes;
>> print join(",", map { $_->scientific_name } @taxa), "\n";
>>
>> -jason
>>
>> Amali Thrimawithana wrote, On 8/18/10 3:56 PM:
>>
>> Dear Dr Stajich,
>>
>>> I am a Masters student at Auckland university and my research is on
>>> identifying yeast species present in wine by the use of 454 sequencing. In
>>> order to carry out this research, a pipeline is being built in which at
>>> the
>>> final step each representative OTU need to be classified at different
>>> taxonomic levels (ie: at Phylum, family, class, genus and species) by
>>> using
>>> the results from BLAST. To identify the sequences at each taxonomic level,
>>> I
>>> have been trying out the Bio::DB::Taxonomy module in bioperl. Using this
>>> module, I am able to get the genus and species level by splitting the
>>> scientific name returned by the Bio::taxon object. But unfortunately I am
>>> uncertain on how to get the information for the other levels of the rank.
>>> I
>>> have tried several commands including "my @class =
>>> $node->classification;",
>>> but it does not work. Hence, could you please let me know how I might be
>>> able to get the higher levels of taxonomy such as class and phylum using
>>> bioperl?
>>>
>>> Look forward to hearing from you soon
>>>
>>> Thanking You
>>>
>>> Amali

From roy.chaudhuri at gmail.com Wed Aug 25 07:12:15 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 25 Aug 2010 12:12:15 +0100
Subject: [Bioperl-l] Enquiry on gi_taxid_nucl.dmp.gz
In-Reply-To: <4C749D29.3040003@bioperl.org>
References: <4C749D29.3040003@bioperl.org>
Message-ID: <4C74FA8F.3080506@gmail.com>

> Also it would be safer for the split to be whitespace matching and that
> you want the two first columns from the file. Doing this would
> eliminate the need for the chomp on the line above.
>
> my ($gi, $taxid) = split(/\s+/, $_);
>
> instead of
>
> chomp;
> my ($gi, $taxid) = split(" ", $_,2);

Sorry to be pedantic, but according to perldoc -f split:

"As a special case, specifying a PATTERN of space (' ') will split on white space just as "split" with no arguments does"

The only difference between patterns of " " and /\s+/ is that the latter will return an initial null field if there is leading white space, which may or may not be what you want.

$ perl -e 'print join("-", split(" ", " 1\t2 3")), "\n"'
1-2-3
$ perl -e 'print join("-", split(/\s+/, " 1\t2 3")), "\n"'
-1-2-3

Cheers.
Roy.

From kanmaninradha at gmail.com Thu Aug 26 04:29:08 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 01:29:08 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
Message-ID: 

Hi All,
I would like to get the DNA sequence from a GFF file. I'm using the Bio::Tools::GFF module. I could get everything else, but not the DNA sequence.

Can anyone help me figure this out, please? I appreciate your help very much.
thanks,
Kanmani

#!/usr/bin/perl

use strict;
use warnings;
use Bio::Tools::GFF;

my $file = shift;

my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
$gffio->features_attached_to_seqs(1);

while (my $feat = $gffio->next_feature()){
    my $start = $feat->start;
    my $end= $feat->end;
    my $size = $end-$start+1;
    my $strand = $feat->strand;
    my $seqid = $feat->seq_id;
    my $score = $feat->score;
    my $frame = $feat->frame;
    my $source = $feat->source_tag;
    my $type = $feat->primary_tag;
    my $gffstr = $gffio->gff_string($feat);
    my @alltags = $feat->all_tags();
    my @ID_tag_value = $feat->each_tag_value("ID");

    my $seq = $feat->seq();
    print "$seq\n";

    if($type eq "gene"){ #
        print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
    }
}

From David.Messina at sbc.su.se Thu Aug 26 04:53:48 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 26 Aug 2010 10:53:48 +0200
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: 
References: 
Message-ID: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>

Admittedly i'm not up on the latest uses of GFF, but as far as I know, GFF is an annotation format only - it does not contain the actual sequence.

Have you looked in your GFF file to see if there are nucleotides in there?

Dave

On Aug 26, 2010, at 10:29, kanmani radha wrote:
> Hi All,
> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
> module. I could get everything else but not the DNA seq.

From biopython at maubp.freeserve.co.uk Thu Aug 26 05:02:53 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Aug 2010 10:02:53 +0100
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
References: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
Message-ID: 

On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina wrote:
>
> Admittedly i'm not up on the latest uses of GFF, but as far as I know, GFF
> is an annotation format only -
it does not contain the actual sequence.
>
> Have you looked in your GFF file to see if there are nucleotides in there?
>
> Dave

Actually a GFF file can optionally include a FASTA format sequence at the end of the file, although it seems to be more common to just supply separate GFF and FASTA files and cross reference by ID.

Peter

From David.Messina at sbc.su.se Thu Aug 26 05:08:20 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 26 Aug 2010 11:08:20 +0200
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: 
References: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
Message-ID: 

Aha, great, thanks for clarifying, Peter.

And if I had bothered to look at the Bio::Tools::GFF documentation before answering :), I would have seen this:

http://doc.bioperl.org/bioperl-live/Bio/Tools/GFF.html#General

which describes how you can use $gffio->get_seqs() and related methods to pull out the sequence data.

Dave

From David.Messina at sbc.su.se Thu Aug 26 05:18:25 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 26 Aug 2010 11:18:25 +0200
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: 
References: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
Message-ID: <984552CF-01F3-4D29-932F-DD030CCC1448@sbc.su.se>

So, just to finish the thought:

Kanmani,

Apologies for my sloppy and uninformed answer. The following is only slightly less sloppy and uninformed, but may actually answer your question.

I think you need to call $gffio->get_seqs(), probably as

  my @seq_objects = $gffio->get_seqs();

and then loop through those, something like:

  foreach my $seq_object (@seq_objects) {
      my $seq = $seq_object->seq();   # the sequence as a plain string
      # features hang off the sequence object, not the string
      foreach my $feat ($seq_object->get_SeqFeatures) {
          # do your feature processing here
      }
  }

Note that I haven't tested the above code.

Dave

From fs5 at sanger.ac.uk Thu Aug 26 05:19:44 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 26 Aug 2010 10:19:44 +0100
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: 
References: 
Message-ID: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi Kanmani,

While GFF files may contain DNA sequence data, most of them don't, so you will have to use the location information you get from the GFF annotation file in conjunction with, e.g., a local FASTA database of the genomic sequence you are working with, or an online resource.

Frank

-- 
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From cjfields at illinois.edu Thu Aug 26 10:20:48 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 26 Aug 2010 09:20:48 -0500
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>

Kanmani,

If you are using BioPerl, the best option currently available is to load a database with all relevant information (GFF and FASTA), then use that database for querying.
The most commonly-used ones now are Bio::DB::SeqFeature::Store and Bio::DB::GFF; the former is very GFF3-centric, but I believe it can handle GFF/GTF, and it has various database adaptors (MySQL, Pg, BDB, SQLite).

chris

From cjfields at illinois.edu Thu Aug 26 10:31:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 26 Aug 2010 09:31:59 -0500
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: 
References: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
Message-ID: 

On Aug 26, 2010, at 4:02 AM, Peter wrote:

> On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina wrote:
>>
>> Admittedly i'm not up on the latest uses of GFF, but as far as I know, GFF
>> is an annotation format only - it does not contain the actual sequence.
>>
>> Have you looked in your GFF file to see if there are nucleotides in there?
>>
>> Dave
>
> Actually a GFF file can optionally include a FASTA format sequence
> at the end of the file, although it seems to be more common to just
> supply separate GFF and FASTA files and cross reference by ID.
>
> Peter

IIRC, optionally including FASTA sequence is specified only in the GFF3 spec; use of FASTA isn't explicitly mentioned in earlier versions. We only support it with earlier GFF due to convergence of the various GFF parsers. The original GFF spec proposed allowing sequence, but it's in the form of meta information and I have never seen it used in practice (as you mention, the FASTA is normally loaded separately).
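
To make the embedded-FASTA case concrete, here is a minimal sketch in plain Perl. It deliberately uses no BioPerl calls: the toy data, the feature_seqs helper, and the way it reads the nine GFF3 columns are all illustrative assumptions, not any module's API. It only shows how GFF3 coordinates (1-based, inclusive) map onto a sequence given after the ##FASTA directive, with minus-strand features reverse-complemented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up GFF3 text with an embedded ##FASTA section (illustrative data only).
# Columns are joined with real tabs so the example survives copy and paste.
my $gff3 = join "\n",
    '##gff-version 3',
    join("\t", qw(chr1 example gene 1 12 . + . ID=gene1)),
    join("\t", qw(chr1 example exon 1 4 . + . ID=exon1;Parent=gene1)),
    join("\t", qw(chr1 example exon 9 12 . - . ID=exon2;Parent=gene1)),
    '##FASTA',
    '>chr1',
    'ACGTTTTT',
    'AAAC',
    '';

# Hypothetical helper: returns a hash of feature ID => subsequence for every
# feature of the requested type whose reference sequence is in the FASTA part.
sub feature_seqs {
    my ($text, $type) = @_;
    my (@feats, %seq, $id, $in_fasta);

    for my $line (split /\n/, $text) {
        if ($line =~ /^##FASTA/) { $in_fasta = 1; next }
        if ($in_fasta) {
            if ($line =~ /^>(\S+)/) { $id = $1 }
            elsif (defined $id)     { $line =~ s/\s+//g; $seq{$id} .= $line }
            next;
        }
        next if $line =~ /^#/ or $line !~ /\S/;      # directives, blank lines
        my @col = split /\t/, $line;
        next unless @col == 9 and $col[2] eq $type;
        my ($fid) = $col[8] =~ /(?:^|;)ID=([^;]+)/;  # ID attribute, if any
        push @feats, { seqid => $col[0], start => $col[3], end => $col[4],
                       strand => $col[6], id => $fid };
    }

    my %out;
    for my $f (@feats) {
        next unless defined $f->{id} and exists $seq{ $f->{seqid} };
        # GFF coordinates are 1-based and inclusive; substr is 0-based.
        my $sub = substr($seq{ $f->{seqid} }, $f->{start} - 1,
                         $f->{end} - $f->{start} + 1);
        if ($f->{strand} eq '-') {                   # reverse-complement
            $sub = reverse $sub;
            $sub =~ tr/ACGTacgt/TGCAtgca/;
        }
        $out{ $f->{id} } = $sub;
    }
    return %out;
}

my %exon = feature_seqs($gff3, 'exon');
print "$_\t$exon{$_}\n" for sort keys %exon;
```

With the toy data this prints exon1 as ACGT and exon2 as GTTT (the reverse complement of AAAC). For real genomes, an indexed FASTA or one of the database-backed stores mentioned in this thread is likely the better route, since this sketch holds everything in memory.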
chris

From kanmaninradha at gmail.com Thu Aug 26 12:22:14 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 09:22:14 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
References: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk> <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
Message-ID: 

Hi Everyone,

Thanks very much for this clarification. Thanks a ton to everyone who spared their time to educate me.

I see your points. Please correct me if I am wrong.

I understand that it's better to use Bio::DB::SeqFeature or Bio::DB::GFF to load the FASTA sequences (from a separate multi-FASTA file), and then Bio::Tools::GFF to parse the feature info from a GFF file. Then query the created database for the relevant GFF coordinates.

I will implement this.

Thanks once again.
Kanmani

From cjfields at illinois.edu Thu Aug 26 13:08:56 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 26 Aug 2010 12:08:56 -0500
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: 
References: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk> <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
Message-ID: 

On Aug 26, 2010, at 11:22 AM, kanmani radha wrote:

> Hi Everyone,
>
> Thanks very much for this clarification. Thanks a ton for every one who
> spared their time to educate me.
>
> I see your points. Please correct me if I am wrong.
>
> I understand that, Its better to use use Bio::DB::SeqFeature or Bio::DB::GFF
> to load the fasta sequences (from a separate multifasta) file and
> then Bio::Tools::GFF to parse the feature info from a gff file . Then query
> the created database for the relevent GFF coordinates....
>
> I will implement this.
>
> Thanks once again.
> Kanmani

Yes, in general. I forgot to mention that you can have an in-memory database as well, but it's only suggested if you have a few thousand or so features and small sequences (I think bacterial chromosomes will work).

chris

From Havard.Aanes at nvh.no Wed Aug 25 11:47:12 2010
From: Havard.Aanes at nvh.no (Aanes Håvard)
Date: Wed, 25 Aug 2010 17:47:12 +0200
Subject: [Bioperl-l] bpfetch.pl
Message-ID: <897520BC3AAE754FA4E34E2FD26490A8021C61597B8D@A-EXMB1.veths.no>

Hi,

I am trying to obtain a set of mRNA sequences from a database made by the bpindex script. I thought this should be a trivial task, but it appears not to be.
I get the sequences if I do one by one, like this:

perl scripts/index/bpfetch.pl -dir ./ zebrafish:NM_201192 zebrafish:NM_212708

But I need hundreds of sequences, so my plan was to put the RefSeq IDs in a file and use that as an argument (or whatever it is called in perl). That does not work:

haavaaan at login2 ~/download/src/bioperl-1.2.3 $ perl scripts/index/bpfetch.pl -dir ./ zebrafish:./some_seqs

You are running bpindex.pl without installing bioperl.
You have done it from bioperl/scripts, and so we can find
the necessary information but it is much better to install bioperl

Please read the README in the bioperl distribution
Sequence %id in Database zebrafish is not present

Any suggestions on how to do this? Alternative approaches are also appreciated. I have no experience in perl, just started using linux, and for the moment there is no time to learn perl, so I would really be grateful for any help to solve this specific task.

Best regards
Håvard Aanes (M.Sc.)
Ph.D. student
Section for biochemistry and physiology
The Norwegian School of Veterinary Science
Telephone: +47 22597358

The new e-mail domain name for The Norwegian School of Veterinary Science is @nvh.no. The former domain address @veths.no will still be in use, but it will be discontinued within 1-2 years. Please update your e-mail records.

This message verifies that the e-mail has been scanned for virus, and deemed virus-free according to our scanengines.

From kanmaninradha at gmail.com Thu Aug 26 04:23:28 2010
From: kanmaninradha at gmail.com (kanmani)
Date: Thu, 26 Aug 2010 01:23:28 -0700 (PDT)
Subject: [Bioperl-l] Bio::Tools:GFF to get DNA sequences...
Message-ID: <9b7381d7-3596-4e60-a2ac-6c8c135d457d@s24g2000pri.googlegroups.com>

Hi

I am trying to get the DNA sequences for each exon feature. I have the following script. Everything works except getting sequences. Can some one correct me.....Thanks.
#!/usr/bin/perl

use strict;
use warnings;
use Bio::Tools::GFF;

my $file = shift;

my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
$gffio->features_attached_to_seqs(1);

while (my $feat = $gffio->next_feature()){
    my $start = $feat->start;
    my $end= $feat->end;
    my $size = $end-$start+1;
    my $strand = $feat->strand;
    my $seqid = $feat->seq_id;
    my $score = $feat->score;
    my $frame = $feat->frame;
    my $source = $feat->source_tag;
    my $type = $feat->primary_tag;
    my $gffstr = $gffio->gff_string($feat);
    my @alltags = $feat->all_tags();
    my @ID_tag_value = $feat->each_tag_value("ID");

    my $seq = $feat->seq();
    print "$seq\n";

    if($type eq "gene"){
        print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
    }
}

From kanmaninradha at gmail.com Thu Aug 26 17:24:40 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 14:24:40 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: 
References: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk> <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
Message-ID: 

Hi Chris and others,

For a brief time I could get away with using Bio::DB::Fasta to index FASTA files and Bio::Tools::GFF to iterate through GFF features. But I have hit the wall again. It looks like sequential access to GFF features is not sufficient; I want random access to them. I see the only way to do that is by using Bio::DB::GFF, as suggested by Chris.

Here is my question: is there any tutorial on configuring BioPerl, or this module in particular, to work with MySQL/Postgres? I would really appreciate it.

And thanks for all your help.
Kanmani

From kanmaninradha at gmail.com Thu Aug 26 18:04:20 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 15:04:20 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: 
References: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk> <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
Message-ID: 

Hi,

I have made some progress since then:

- Installing Bio::DB::DBI::mysql needed BioSQL.
- Downloaded and installed BioSQL, following the instructions given in their INSTALL file
- Created the biosql db in my MySQL server
- Loaded the schema using the script from BioSQL
- Installed DBI
- Now I have a problem with DBD::mysql.

That reminds me that a couple of years back I had to struggle installing this driver on another machine. I thought I would ask around this time. It fails with a bunch of error messages, the first of them being:

dbdimp.h:22:49 error: mysql.h no such file or directory

But my MySQL installation has the header file in "/usr/include/mysql3/mysql/mysql.h".

Can anyone suggest how to move forward from that?

thanks,
Kanmani

From rafalucas.unicamp at gmail.com Thu Aug 26 18:11:07 2010
From: rafalucas.unicamp at gmail.com (Rafael Lucas)
Date: Thu, 26 Aug 2010 19:11:07 -0300
Subject: [Bioperl-l] Help in algorithm Bio::Structure::IO::pdb
Message-ID: 

Hi folks,

How are you?

I'm from Brazil, and I am writing an algorithm that encrypts some data and then prints the result to a PDB file. I have a .fasta file and want to convert it to a .pdb file. Using a program like PyMOL takes too much time, so I want to use Bio::Structure::IO::pdb to speed up this process. Could you help me with this problem?
Thank you, Rafael Lucas Faculdade de Tecnologia em Analise e Desenvolvimento de Sistemas FT - UNICAMP +55 (19)9614-0533 From J.Christopher.Ellis at duke.edu Thu Aug 26 22:06:30 2010 From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis) Date: Thu, 26 Aug 2010 22:06:30 -0400 Subject: [Bioperl-l] standaloneblastplus blastn crash Message-ID: <55861.1282874790@duke.edu> When I run the standaloneblastplus I get the following error... ------------- EXCEPTION ------------- MSG: C:Program FilesNCBIblast-2.2.24+binblastn.exe call crashed: There was a problem running C:Program FilesNCBIblast-2.2.24+binblastn.exe :? at C:/Perl64/lib/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1001. STACK Bio::Tools::Run::WrapperBase::_run C:/Perl64/lib/Bio/Tools/Run/WrapperBase/CommandExts.pm:1006 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus.pm:1303 STACK Bio::Tools::Run::StandAloneBlastPlus::run C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:270 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus.pm:1301 STACK toplevel localBlast.pl:9 ------------------------------------- I have a sneaky suspicion that it is an easy fix but for the life of me I can not figure it out! 
:) Thanks in advance, Chris From indraniel at gmail.com Thu Aug 26 21:57:54 2010 From: indraniel at gmail.com (Indraniel) Date: Fri, 27 Aug 2010 01:57:54 +0000 (UTC) Subject: [Bioperl-l] How to convert SFF into Fastq References: Message-ID: A fourth option is the following tool, sff2fastq (written in C), described here: http://indraniel.wordpress.com/2010/04/23/sff2fastq/ and http://github.com/indraniel/sff2fastq Indraniel From David.Messina at sbc.su.se Fri Aug 27 03:41:21 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 27 Aug 2010 09:41:21 +0200 Subject: [Bioperl-l] [RFC] Interolog::Walk In-Reply-To: <4C6D0B50.4050902@sms.ed.ac.uk> References: <4C6BF4BD.5010200@sms.ed.ac.uk> <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se> <4C6D0B50.4050902@sms.ed.ac.uk> Message-ID: Hi Giuseppe, On Aug 19, 2010, at 12:45, Giuseppe Gallone wrote: > Bio::Orthology::InterologMap > Bio::Orthology::Interolog::Map, > just in case somebody else finds other interesting applications for the Interolog concept and would like to "plug in" their own contribution. Would this make any sense? Absolutely. I think either of the above is a good option, and I agree that the second is a little more flexible. Your POD looks great! Way better than most. Having seen the whole thing now, I think your description is fine as is. And if you have another tutorial and example scripts on top of it, that would really be terrific, above and beyond what most people would expect. So, time to unleash it on the world! 
:)

Dave

From David.Messina at sbc.su.se Fri Aug 27 03:58:12 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 27 Aug 2010 09:58:12 +0200
Subject: [Bioperl-l] standaloneblastplus blastn crash
In-Reply-To: <55861.1282874790@duke.edu>
References: <55861.1282874790@duke.edu>
Message-ID: <9275A540-AE42-47B0-BA73-A906964C451B@sbc.su.se>

Hi Chris,

If you look at the error message, it says what the problem is: it's trying to call the blastn executable with no backslashes (path separators) in the path name.

> MSG: C:Program FilesNCBIblast-2.2.24+binblastn.exe call crashed: There
> was a problem running C:Program FilesNCBIblast-2.2.24+binblastn.exe

Now, that could be a problem in BioPerl or it could be a problem in your code. It's hard to diagnose where the problem lies without your code, so please post your code.

Dave

From G.Gallone at sms.ed.ac.uk Fri Aug 27 07:07:57 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Fri, 27 Aug 2010 12:07:57 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To:
References: <4C6BF4BD.5010200@sms.ed.ac.uk> <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se> <4C6D0B50.4050902@sms.ed.ac.uk>
Message-ID: <4C779C8D.1090007@sms.ed.ac.uk>

Hi Dave,

thank you very much for your feedback :) . I will register the namespace right now. I think I will use 'homology' as the second level name though, because I plan to extend the module to work with paralogues as well.

As for the category, which one of the following do you reckon will fit a Bio:: package best?

http://www.cpan.org/modules/by-category/

Regards
Giuseppe

On 27/08/10 08:41, Dave Messina wrote:
> Hi Giuseppe,
>
>
> On Aug 19, 2010, at 12:45, Giuseppe Gallone wrote:
>> Bio::Orthology::InterologMap
>> Bio::Orthology::Interolog::Map,
>
>> just in case somebody else finds other interesting applications for the Interolog concept and would like to "plug in" their own contribution. Would this make any sense?
>
> Absolutely.
I think either of the above is a good option, and I agree that the second is a little more flexible. > > Your POD looks great! Way better than most. Having seen the whole thing now, I think your description is fine as is. And if you have another tutorial and example scripts on top of it, that would really be terrific, above and beyond what most people would expect. > > So, time to unleash it on the world! :) > > > Dave > > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From David.Messina at sbc.su.se Fri Aug 27 07:25:06 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 27 Aug 2010 13:25:06 +0200 Subject: [Bioperl-l] [RFC] Interolog::Walk In-Reply-To: <4C779C8D.1090007@sms.ed.ac.uk> References: <4C6BF4BD.5010200@sms.ed.ac.uk> <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se> <4C6D0B50.4050902@sms.ed.ac.uk> <4C779C8D.1090007@sms.ed.ac.uk> Message-ID: <80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se> Hi Giuseppe, > I think I will use 'homology' as the second level name though, because I plan to extend the module to work with paralogues as well. Sounds good. > As for the category, which one of the following you reckon it will fit a Bio:: package better > > http://www.cpan.org/modules/by-category/ Bio:: is in 23 - miscellaneous modules, so probably keeping with that makes sense. I don't know much about that stuff, though. Chris F. or other CPAN cognoscenti care to comment? 
Dave

From cjfields at illinois.edu Fri Aug 27 09:26:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 27 Aug 2010 08:26:51 -0500
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se>
References: <4C6BF4BD.5010200@sms.ed.ac.uk> <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se> <4C6D0B50.4050902@sms.ed.ac.uk> <4C779C8D.1090007@sms.ed.ac.uk> <80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se>
Message-ID: <88BB7813-E892-4BEC-9C49-5FD22325BBF7@illinois.edu>

On Aug 27, 2010, at 6:25 AM, Dave Messina wrote:

> Hi Giuseppe,
>
>
>> I think I will use 'homology' as the second level name though, because I plan to extend the module to work with paralogues as well.
>
> Sounds good.
>
>
>> As for the category, which one of the following you reckon it will fit a Bio:: package better
>>
>> http://www.cpan.org/modules/by-category/
>
>
> Bio:: is in 23 - miscellaneous modules, so probably keeping with that makes sense.
>
> I don't know much about that stuff, though. Chris F. or other CPAN cognoscenti care to comment?
>
>
> Dave

That's probably the best spot, as we cover a fairly broad range (mainly due to core monolithic structure). Though it's terribly non-descript, sort of the junk drawer of CPAN.

chris

From adamkennedybackup at gmail.com Sun Aug 29 07:35:50 2010
From: adamkennedybackup at gmail.com (Adam Kennedy)
Date: Sun, 29 Aug 2010 21:35:50 +1000
Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl 5.12.1?
In-Reply-To: <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com>
References: <78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com> <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com>
Message-ID:

http://strawberryperl.com/download/professional/strawberry-perl-professional-5.10.1.3-alpha-2.msi

You get BioPerl installed out the box.

Adam K

On 20 August 2010 03:20, Christopher Fields wrote:
> cc'ing list. Looks like the BioPerl PPM is possibly broken for perl 5.12. Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling...
>
> chris
>
> On Aug 19, 2010, at 11:29 AM, han sun wrote:
>
>> v5.10 works,thanks.
>>
>> 2010/8/19 Christopher Fields
>> Try using ActivePerl 5.10 instead of v5.12. It's very possible the PPM won't work for v5.12 yet.
>>
>> chris
>>
>> On Aug 19, 2010, at 9:25 AM, han sun wrote:
>>
>> > Hello everyone,
>> >
>> > I have used perl for several months,and I now want to feel the power of
>> > bioperl.
>> > But it seems that the installing is more difficult than I thought.
>> >
>> > I typed the commands.
>> >
>> > install-shell
>> >
>> > rep add bioperl http://bioperl.org/DIST
>> >
>> > rep add uwinnipeg
>> > http://cpan.uwinnipeg.ca/PPMPackages/12xx/
>> >
>> > rep add trouchelle http://trouchelle.com/ppm12/
>> >
>> > install BioPerl
>> >
>> > However,the installing failed,
>> >
>> > ppm install failed:
>> > Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
>> > Can't find any package that provides PostScript::TextBlock for
>> > Bundle-BioPerl-Core
>> > Can't find any package that provides Ace:: for Bundle-BioPerl-Core
>> > Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
>> > Can't find any package that provides Convert::Binary::C for
>> > Bundle-BioPerl-Core
>> > Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
>> > Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
>> > Can't find any package that provides IPC::Run for GraphViz
>> > Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
>> > Can't find any package that provides List-MoreUtils for Moose
>> > Can't find any package that provides List-MoreUtils for Class-MOP
>> >
>> >
>> > then I tried
>> >
>> > install http://www.bribes.org/perl/ppm/GD.ppd
>> >
>> > and tried the installation again,but it still didn't help.
>> > >> > * >> > * >> > * >> > * >> > * >> > * >> > >> > >> > *Do you konw what's wrong with the problem?* >> > * >> > * >> > * >> > * >> > *Please help me,thanks very much.* >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields1 at gmail.com Sun Aug 29 11:58:50 2010 From: cjfields1 at gmail.com (Christopher Fields) Date: Sun, 29 Aug 2010 10:58:50 -0500 Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl 5.12.1? In-Reply-To: References: <78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com> <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com> Message-ID: Yes, and I am thinking of pointing more and more users that direction instead. Can't say maintaining PPM packages with ever-fluctuating specs is easy when I don't work with Windows anymore. chris On Aug 29, 2010, at 6:35 AM, Adam Kennedy wrote: > http://strawberryperl.com/download/professional/strawberry-perl-professional-5.10.1.3-alpha-2.msi > > You get BioPerl installed out the box. > > Adam K > > On 20 August 2010 03:20, Christopher Fields wrote: >> cc'ing list. Looks like the BioPerl PPM is possibly broken for perl 5.12. Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling... >> >> chris >> >> On Aug 19, 2010, at 11:29 AM, han sun wrote: >> >>> v5.10 works,thanks. >>> >>> 2010/8/19 Christopher Fields >>> Try using ActivePerl 5.10 instead of v5.12. It's very possible the PPM won't work for v5.12 yet. >>> >>> chris >>> >>> On Aug 19, 2010, at 9:25 AM, han sun wrote: >>> >>>> Hello everyone, >>>> >>>> I have used perl for several months,and I now want to feel the power of >>>> bioperl. >>>> But it seems that the installing is more difficult than I thought. 
>>>> >>>> I typed the commands. >>>> >>>> >>>> >>>> install-shell >>>> >>>> >>>> rep add bioperl http://bioperl.org/DIST >>>> >>>> >>>> rep add uwinnipeg >>>> http://cpan.uwinnipeg.ca/PPMPackages/12xx/ >>>> >>>> >>>> rep add trouchelle http://trouchelle.com/ppm12/ >>>> >>>> install BioPerl >>>> >>>> However,the installing failed, >>>> >>>> ppm install failed: >>>> Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core >>>> Can't find any package that provides PostScript::TextBlock for >>>> Bundle-BioPerl-Core >>>> Can't find any package that provides Ace:: for Bundle-BioPerl-Core >>>> Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core >>>> Can't find any package that provides Convert::Binary::C for >>>> Bundle-BioPerl-Core >>>> Can't find any package that provides XML::Twig for Bundle-BioPerl-Core >>>> Can't find any package that provides DB_File:: for Bundle-BioPerl-Core >>>> Can't find any package that provides IPC::Run for GraphViz >>>> Can't find any package that provides XML-XPathEngine for XML-DOM-XPath >>>> Can't find any package that provides List-MoreUtils for Moose >>>> Can't find any package that provides List-MoreUtils for Class-MOP >>>> >>>> >>>> then I tried >>>> >>>> install http://www.bribes.org/perl/ppm/GD.ppd >>>> >>>> and tried the installation again,but it still didn't help. 
>>>> >>>> * >>>> * >>>> * >>>> * >>>> * >>>> * >>>> >>>> >>>> *Do you konw what's wrong with the problem?* >>>> * >>>> * >>>> * >>>> * >>>> *Please help me,thanks very much.* >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From odclerck at gmail.com Fri Aug 27 03:44:14 2010 From: odclerck at gmail.com (odclerck) Date: Fri, 27 Aug 2010 00:44:14 -0700 (PDT) Subject: [Bioperl-l] fasta header replace Message-ID: <29550202.post@talk.nabble.com> Hi, Was wondering if someone had an easy script available that converts the headers of a fasta sequences to a value stored in a separate text file. Macrogen produces files with sequences that look more or less like this: >100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1 1012, 1000 bases, 0 checksum. I can filter out the position on the plate e.g. "A1" easily but would like to replace this with the name of the strain stored in a different text file, e.g. "A1_D1222". Realize this sounds pretty basic to most of you, but I'm pretty new at scripting. Olivier -- View this message in context: http://old.nabble.com/fasta-header-replace-tp29550202p29550202.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From J.Christopher.Ellis at duke.edu Mon Aug 30 08:55:04 2010 From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis) Date: Mon, 30 Aug 2010 08:55:04 -0400 Subject: [Bioperl-l] Taxonomy DB problem Message-ID: <51468.1283172904@duke.edu> Hi All, I am trying to extract the entire taxonomy of an organism including the classifications. 
Some thing like...

Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia

I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error.

My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?

Thanks for all your help in advance!

Chris

From rafalucas.unicamp at gmail.com Mon Aug 30 09:24:11 2010
From: rafalucas.unicamp at gmail.com (Rafael Lucas)
Date: Mon, 30 Aug 2010 10:24:11 -0300
Subject: [Bioperl-l] help in algorithm Bio::Structure::IO::pdb
Message-ID:

Hi folks,

How are you? I'm from Brazil and I was making an algorithm that Cryptographyc a data and then print the result in a pdb file. So I have a .fasta file and want to pass this file to .pdb file, if I use a program, like PyMol, it will take so much time, so I wanna use the Bio::Structure::IO::pdb to accelerate this process, could you help me in this problem?

Thank you,

Rafael Lucas
Faculdade de Tecnologia em Analise e Desenvolvimento de Sistemas
FT - UNICAMP
+55 (19)9614-0533

From cjfields at illinois.edu Mon Aug 30 09:36:41 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 30 Aug 2010 08:36:41 -0500
Subject: [Bioperl-l] Taxonomy DB problem
In-Reply-To: <51468.1283172904@duke.edu>
References: <51468.1283172904@duke.edu>
Message-ID:

Chris,

Regarding a fix for that script, we would have to see your modified script and the error. However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy.

chris

On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:

> Hi All,
>
> I am trying to extract the entire taxonomy of an organism including the
> classifications. Some thing like...
>
> Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
>
> I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error.
>
> My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?
>
> Thanks for all your help in advance!
>
> Chris
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From fs5 at sanger.ac.uk Mon Aug 30 11:11:06 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Mon, 30 Aug 2010 16:11:06 +0100
Subject: [Bioperl-l] fasta header replace
In-Reply-To: <29550202.post@talk.nabble.com>
References: <29550202.post@talk.nabble.com>
Message-ID: <4C7BCA0A.70503@sanger.ac.uk>

Hi Olivier,

Do you know how to read a file and build a hash from the contents? This is what you will need to do, e.g. if your file is

A1 Strain_A
A2 Strain_A
A3 Strain_B

then you can do something like:

# open the mapping file for reading
open(INFILE, '<', $infile_path) or die "Cannot open $infile_path: $!";
my %well2strain;
while (<INFILE>) {
    my ($well, $strain) = ($_ =~ /^([A-Z]\d+)\s+(\w+)/);
    next unless defined $well;    # skip lines that do not match
    $well2strain{$well} = $strain;
}
close(INFILE);

You can then use the values of the hash to set the sequence ID as you parse the FASTA file. The BioPerl SeqIO howto gives details about how to read and write the FASTA file (http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples). You can change the id of a sequence object with

$some_seq_object->id('my new ID');

See http://doc.bioperl.org/releases/bioperl-1.0/Bio/Seq.html for details.

Hope that helps to get you started.
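
For anyone finding this thread later, here is a rough sketch of how the hash approach described above might be carried through to the header rewrite. It is only an illustration, not code from the original post: it uses core Perl rather than Bio::SeqIO, the sub names parse_well_map and rename_headers are made up for this example, and it assumes the plate position is the underscore-delimited token of the form A1..H12 in the first word of the header. The new ID is written as well_strain (like the "A1_D1222" example in the question); adjust to taste.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn "well strain" lines (e.g. "A1 Strain_A")
# into a hash reference mapping well => strain.
sub parse_well_map {
    my ($text) = @_;
    my %well2strain;
    for my $line (split /\n/, $text) {
        my ($well, $strain) = $line =~ /^([A-Z]\d+)\s+(\S+)/ or next;
        $well2strain{$well} = $strain;
    }
    return \%well2strain;
}

# Hypothetical helper: rewrite FASTA header lines.
# Assumption: the well is the last "_<letter A-H><1..12>_" token in the
# first word of the header, as in
#   >100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1 1012, 1000 bases, ...
# Headers with no recognisable or unmapped well are left unchanged.
sub rename_headers {
    my ($fasta, $well2strain) = @_;
    my @out;
    for my $line (split /\n/, $fasta) {
        if ($line =~ /^>\S*_([A-H](?:1[0-2]|[1-9]))_/
            && exists $well2strain->{$1}) {
            my $well = $1;
            $line = ">${well}_$well2strain->{$well}";
        }
        push @out, $line;
    }
    return join("\n", @out) . "\n";
}

# Small demonstration with inline data; in practice the two strings
# would be read from the mapping file and the FASTA file.
unless (caller) {
    my $demo_map = parse_well_map("A1 Strain_A\nA2 Strain_A\nA3 Strain_B\n");
    my $demo_fasta =
        ">100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1 1012, 1000 bases, 0 checksum.\n"
      . "ACGT\n";
    print rename_headers($demo_fasta, $demo_map);
    # prints ">A1_Strain_A" followed by "ACGT"
}
```

Keeping the well in the new ID avoids duplicate names when several wells hold the same strain (A1 and A2 in the example file above). If the sequences need further processing, the same hash lookup can be combined with Bio::SeqIO and the id() call mentioned above.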
Frank odclerck wrote: > Hi, > Was wondering if someone had an easy script available that converts the > headers of a fasta sequences to a value stored in a separate text file. > > Macrogen produces files with sequences that look more or less like this: > >> 100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1 1012, 1000 bases, 0 checksum. >> > > I can filter out the position on the plate e.g. "A1" easily but would like > to replace this with the name of the strain stored in a different text file, > e.g. "A1_D1222". > > Realize this sounds pretty basic to most of you, but I'm pretty new at > scripting. > Olivier > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From jessica.sun at gmail.com Mon Aug 30 11:51:39 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Mon, 30 Aug 2010 11:51:39 -0400 Subject: [Bioperl-l] Git for the lazy In-Reply-To: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se> References: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se> Message-ID: I want to add sequence features with tags and tag values, I want to have them in my order, however somehow it seems it is in default alphabetically orders of the tags, does any one knows how to fix? thanks a lot in advance. From G.Gallone at sms.ed.ac.uk Tue Aug 31 07:52:57 2010 From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone) Date: Tue, 31 Aug 2010 12:52:57 +0100 Subject: [Bioperl-l] New CPAN Release - Bio::Homology::InterologWalk - A Perl Module to retrieve putative PPIs through Interologs Message-ID: <4C7CED19.80802@sms.ed.ac.uk> Dear Bioperl users, I would like to announce the release of Bio::Homology::InterologWalk, a module that retrieves, scores and visualizes putative Protein-Protein Interactions through the orthology-walk method. 
The project is available from the following link http://search.cpan.org/~ggallone/ and a description of the idea behind it is here http://search.cpan.org/~ggallone/Bio-Homology-InterologWalk-0.02/lib/Bio/Homology/InterologWalk.pm#DESCRIPTION The project is in a very early stage (currently ver. 0.02 alpha) and has currently been tested only on Linux environments. It has not been tested on Macs, but it should work fine, and I would appreciate any feedback from Mac users who try it. *Any* form of feedback will be extremely appreciated (bug, typos, syntactical errors, verbal abuse etc :) ). Best, Giuseppe -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From cjfields at illinois.edu Tue Aug 31 11:01:59 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 31 Aug 2010 10:01:59 -0500 Subject: [Bioperl-l] Taxonomy DB problem In-Reply-To: <56973.1283255847@duke.edu> References: <56973.1283255847@duke.edu> Message-ID: <7167CA86-857E-4E16-A3D6-BA45045CF892@illinois.edu> Yes, I see that one. It may be the ID hash that is being returned is empty. I'll look into it. -c On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote: > Hi Chris, > > The error is... > > "Use of uninitialized value $id in join or string at C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363." > > The script from http://bioperl.org/wiki/Species_names_from_accession_numbers is as follows.... 
>
> use Bio::DB::EUtilities;
>
> my (%taxa, @taxa);
> my (%names, %idmap);
>
> # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
> # (probably)
>
> my @ids = qw(1621261 89318838 68536103 20807972 730439);
>
> my $factory = Bio::DB::EUtilities->new(-eutil          => 'elink',
>                                        -db             => 'taxonomy',
>                                        -dbfrom         => 'protein',
>                                        -correspondence => 1,
>                                        -id             => \@ids);
>
> # iterate through the LinkSet objects
> while (my $ds = $factory->next_LinkSet) {
>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> }
>
> @taxa = @taxa{@ids};
>
> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>                                     -db    => 'taxonomy',
>                                     -id    => \@taxa);
>
> while (local $_ = $factory->next_DocSum) {
>     $names{($_->get_contents_by_name('TaxId'))[0]} =
>         ($_->get_contents_by_name('ScientificName'))[0];
> }
>
> foreach (@ids) {
>     $idmap{$_} = $names{$taxa{$_}};
> }
>
> # %idmap is
> # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> # 68536103 => 'Corynebacterium jeikeium K411'
> # 730439 => 'Bacillus caldolyticus'
> # 89318838 => undef (this record has been removed from the db)
>
> 1;
>
> Thanks,
>
> Chris
>
> On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
> Chris,
>
> Regarding a fix for that script, we would have to see your modified script and the error. However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy.
>
> chris
>
> On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:
>
> > Hi All,
> >
> > I am trying to extract the entire taxonomy of an organism including the
> > classifications. Some thing like...
> >
> > Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
> >
> > I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error.
> >
> > My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?
> >
> > Thanks for all your help in advance!
> >
> > Chris
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >

From J.Christopher.Ellis at duke.edu Tue Aug 31 07:57:27 2010
From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis)
Date: Tue, 31 Aug 2010 07:57:27 -0400
Subject: [Bioperl-l] Taxonomy DB problem
Message-ID: <56973.1283255847@duke.edu>

Hi Chris,

The error is...

"Use of uninitialized value $id in join or string at C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363."

The script from http://bioperl.org/wiki/Species_names_from_accession_numbers is as follows....

use Bio::DB::EUtilities;

my (%taxa, @taxa);
my (%names, %idmap);

# these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
# (probably)

my @ids = qw(1621261 89318838 68536103 20807972 730439);

my $factory = Bio::DB::EUtilities->new(-eutil          => 'elink',
                                       -db             => 'taxonomy',
                                       -dbfrom         => 'protein',
                                       -correspondence => 1,
                                       -id             => \@ids);

# iterate through the LinkSet objects
while (my $ds = $factory->next_LinkSet) {
    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
}

@taxa = @taxa{@ids};

$factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
                                    -db    => 'taxonomy',
                                    -id    => \@taxa);

while (local $_ = $factory->next_DocSum) {
    $names{($_->get_contents_by_name('TaxId'))[0]} =
        ($_->get_contents_by_name('ScientificName'))[0];
}

foreach (@ids) {
    $idmap{$_} = $names{$taxa{$_}};
}

# %idmap is
# 1621261 => 'Mycobacterium tuberculosis H37Rv'
# 20807972 => 'Thermoanaerobacter tengcongensis MB4'
# 68536103 => 'Corynebacterium jeikeium K411'
# 730439 => 'Bacillus caldolyticus'
# 89318838 => undef (this record has been removed from the db)

1;

Thanks,

Chris

On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
Chris,

Regarding a fix for that script, we would have to see your modified script and the error. However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy.

chris

On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:

> Hi All,
>
> I am trying to extract the entire taxonomy of an organism including the
> classifications. Some thing like...
>
> Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
>
> I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error.
>
> My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?
>
> Thanks for all your help in advance!
> > Chris > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Sun Aug 1 15:17:14 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Sun, 01 Aug 2010 12:17:14 -0700 Subject: [Bioperl-l] GMOD Evo Hackathon Open Call for Participation Message-ID: <4C55C83A.3060700@cornell.edu> We are seeking participants for the GMOD Tools for Evolutionary Biology Hackathon, held November 8-12, 2010 at the US National Evolutionary Synthesis Center (NESCent) in Durham, NC. This hackathon targets three critical gaps in the capabilities of the GMOD toolbox that currently limit its utility for evolutionary research: 1. Visualization of comparative genomics data 2. Visualization of phylogenetic data and trees 3. Support for population diversity and phenotype data If you are interested in these areas and have relevant expertise, you are strongly encouraged to apply. Relevant areas of expertise include more than just software development: if you are a GMOD power user, visualization guru, domain expert (comparative, phylogenetics, population, ...), or documentation wizard, then your skills are needed! How To Apply: Fill out the online application form at http://bit.ly/gmodevohack. Applications are due August 25. About GMOD: GMOD is an intercompatible suite of open-source software components for storing, managing, analyzing, and visualizing genome-scale data. GMOD includes many widely-used software components: GBrowse and JBrowse, both genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a generic and modular database schema; CMap, a comparative map viewer; as well as many other components including Apollo, MAKER, BioMart, InterMine, and Galaxy. We hope to extend the functionality of existing GMOD components, and integrate new components as well. 
About Hackathons: A hackathon is an intense event at which a group of programmers with different backgrounds and skills collaborate hands-on and face-to-face to develop working code that is of utility to the community as a whole. The mix of people will include domain experts and computer-savvy end-users. More details about the event, its motivation, organization, procedures, and attendees, as well as URLs to the hackathon and related websites are included below. Sincerely, The GMOD EvoHack Organizing Committee (and project affiliations as relevant): Nicole Washington, Chair (LBNL, modENCODE, Phenote) Robert Buels (SGN, Chado NatDiv) Scott Cain (OICR, GMOD) Dave Clements (NESCent, GMOD) Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv) Sheldon McKay (University of Arizona, iPlant, GBrowse_syn) ----------------------------- About the GMOD Evo Hackathon Overview We are organizing a hackathon to fill critical gaps in the capabilities of the Generic Model Organism Database (GMOD) toolbox that currently limit its utility for evolutionary research. Specifically, we will focus on tools for 1) viewing comparative genomics data; 2) visualizing phylogenomic data; and 3) supporting population diversity data and phenotype annotation. The event will be hosted at NESCent and bring together a group of about 20+ software developers, end-user representatives, and documentation experts who would otherwise not meet. The participants will include key developers of GMOD components that currently lack features critical for emerging evolutionary biology research, developers of informatics tools in evolutionary research that lack GMOD integration, and informatics-savvy biologists who can represent end-user requirements. 
The event will provide a unique opportunity to infuse the GMOD developer community with a heightened awareness of unmet needs in evolutionary biology that GMOD components have the potential to fill, and for tool developers in evolutionary biology to better understand how best to extend or integrate with already existing GMOD components. Before the Event Discussion of ideas and sometimes even design actually starts well before the hackathon, on mailing lists, wiki pages, and conference calls set up among accepted attendees. This advance work lays the foundation for participants to be productive from the very first day. This also means that participants should be willing to contribute some time in advance of the hackathon itself to participate in this preparatory discussion. During the Event Typically, hackathon participants use the morning of the first day of the event to organize themselves into working groups of between 3 and 6 people, each with a focused implementation objective. Ideas and objectives are discussed, and attendees coalesce around the projects in which they have the most experience or interest. Deliverables / Event Results The meeting's attendance, working groups, and outcomes will be fully logged and documented on the GMOD wiki (http://gmod.org). Each working group during the event will typically have its own wiki page, linked from the main EvoHack page, where it documents its minutes and design notes, and provides links to the code and documentation it produces. Also, since GMOD and NESCent are both committed to open source principles, all code and documentation produced by participants during the hackathon must be published under an OSI-approved open source license. As contributions to existing GMOD tools, all hackathon products will most likely satisfy this requirement automatically. 
NESCent This event is sponsored by the US National Evolutionary Synthesis Center (NESCent, http://www.nescent.org) through its Informatics Whitepapers program (http://www.nescent.org/informatics/whitepapers.php). NESCent promotes the synthesis of information, concepts and knowledge to address significant, emerging, or novel questions in evolutionary science and its applications. NESCent achieves this by supporting research and education across disciplinary, institutional, geographic, and demographic boundaries (see http://www.nescent.org/science/proposals.php). Links Main GMOD EvoHack page, and full proposal: http://gmod.org/wiki/GMOD_Evo_Hackathon NESCent: http://www.nescent.org/ GMOD: http://gmod.org Similar past NESCent events, see: http://hackathon.nescent.org/ GMOD hackathon application: http://bit.ly/gmodevohack -- http://gmod.org/wiki/GMOD_News http://gmod.org/wiki/GMOD_Europe_2010 http://gmod.org/wiki/Help_Desk_Feedback From maj at fortinbras.us Sun Aug 1 19:19:16 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 1 Aug 2010 19:19:16 -0400 Subject: [Bioperl-l] SOAP Eutilities In-Reply-To: References: Message-ID: <627BEC8B2E624A69A0B11EEBC8C93B71@NewLife> Turns out that module lives in bioperl-run; try git clone git://github.com/bioperl/bioperl-run.git MAJ ----- Original Message ----- From: "Robson de Souza" To: Sent: Saturday, July 31, 2010 4:56 PM Subject: [Bioperl-l] SOAP Eutilities > Hi, > > Bio::DB::SoapEUtilities, referred in the HOWTO on EUtilities, seems to > have disappeared from the Git repository. > A simple > > git clone git://github.com/bioperl/bioperl-live.git > > does not download it. Any ideas why? 
> Robson
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From David.Messina at sbc.su.se Mon Aug 2 09:58:10 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 2 Aug 2010 15:58:10 +0200
Subject: [Bioperl-l] phyloxml and element order
In-Reply-To:
References:
Message-ID:

Hi Fred,

Thanks for letting us know about this - definitely sounds like a bug.

Would you please submit this to our bug tracker?

http://bugzilla.open-bio.org

(You can just copy and paste your previous email.)

Dave

On Jul 30, 2010, at 06:59, Frédéric Romagné wrote:

> Hi,
>
> I'm using bioperl to create phyloxml trees, after few tentatives, i got my
> tree with all the element/attributes i want but when I write the tree,
> element are not written following the order specified in the XSD Schema.
>
> For example, i got :
>
>
>
> Loxosceles intermedia
>
> Araneomorphae Sicariidae
>
>
> 969
> HAAERADSRKPIWDIAHMVNDLELVD
>
>
>
> Araneomorphae Sicariidae
>
>
>
> The program forester complains that should be written before the
> element.
>
> According to
> http://phyloxml.wordpress.com/2009/11/25/order-of-elements-in-phyloxml this
> is what bioperl is supposed to do.
>
> All my element/attributes are set before writing the tree using
> 'add_Annotation', 'add_tag_value' and 'sequence' methods from a
> Bio::Tree::AnnotatableNode object, so i think the error comes from the
> write_tree method.
>
> Any help would be appreciated.
>
> Thank you,
> Fred
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From shalabh.sharma7 at gmail.com Mon Aug 2 15:44:35 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Aug 2010 15:44:35 -0400
Subject: [Bioperl-l] clustalw to maf format
Message-ID:

Hi,
I am trying to convert clustalw to maf format.
I am trying to use AlignIO for that but its not working. Its giving me the following error: EXCEPTION Bio::Root::NotImplemented ------------- MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented by package Bio::AlignIO::maf. This is not your fault - author of Bio::AlignIO::maf should be blamed! STACK Bio::Root::RootI::throw_not_implemented /Library/Perl/5.8.8/Bio/Root/RootI.pm:707 STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/ maf.pm:176 STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492 STACK toplevel msf2mafy.pl:11 Is there any other way i can convert clustalw to maf? I would really appreciate if anyone can help me out. Thanks Shalabh From Russell.Smithies at agresearch.co.nz Mon Aug 2 16:25:26 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 3 Aug 2010 08:25:26 +1200 Subject: [Bioperl-l] clustalw to maf format In-Reply-To: References: Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz> This might work if you only have a few: http://www.ibi.vu.nl/programs/convertalignwww/ --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of shalabh sharma > Sent: Tuesday, 3 August 2010 7:45 a.m. > To: bioperl-l > Subject: [Bioperl-l] clustalw to maf format > > Hi, > I am trying to convert clustalw to maf format. > I am trying to use AlignIO for that but its not working. > > Its giving me the following error: > > EXCEPTION Bio::Root::NotImplemented ------------- > MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented by > package Bio::AlignIO::maf. > This is not your fault - author of Bio::AlignIO::maf should be blamed! 
> > STACK Bio::Root::RootI::throw_not_implemented > /Library/Perl/5.8.8/Bio/Root/RootI.pm:707 > STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/ > maf.pm:176 > STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492 > STACK toplevel msf2mafy.pl:11 > > > Is there any other way i can convert clustalw to maf? > > I would really appreciate if anyone can help me out. > > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From shalabh.sharma7 at gmail.com Mon Aug 2 16:53:31 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Mon, 2 Aug 2010 16:53:31 -0400 Subject: [Bioperl-l] clustalw to maf format In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz> Message-ID: Hi Russell, Thanks for the reply, but i have around 400 alignments and some huge ones :( Thanks Shalabh On Mon, Aug 2, 2010 at 4:25 PM, Smithies, Russell < Russell.Smithies at agresearch.co.nz> wrote: > This might work if you only have a few: > http://www.ibi.vu.nl/programs/convertalignwww/ > > --Russell > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of shalabh sharma > > Sent: Tuesday, 3 August 2010 7:45 a.m. > > To: bioperl-l > > Subject: [Bioperl-l] clustalw to maf format > > > > Hi, > > I am trying to convert clustalw to maf format. > > I am trying to use AlignIO for that but its not working. > > > > Its giving me the following error: > > > > EXCEPTION Bio::Root::NotImplemented ------------- > > MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented by > > package Bio::AlignIO::maf. > > This is not your fault - author of Bio::AlignIO::maf should be blamed! > > > > STACK Bio::Root::RootI::throw_not_implemented > > /Library/Perl/5.8.8/Bio/Root/RootI.pm:707 > > STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/ > > maf.pm:176 > > STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492 > > STACK toplevel msf2mafy.pl:11 > > > > > > Is there any other way i can convert clustalw to maf? > > > > I would really appreciate if anyone can help me out. 
> >
> > Thanks
> > Shalabh
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>

From biopython at maubp.freeserve.co.uk Mon Aug 2 17:24:09 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 2 Aug 2010 22:24:09 +0100
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To:
References:
Message-ID:

On Mon, Aug 2, 2010 at 8:44 PM, shalabh sharma wrote:
> Hi,
>    I am trying to convert clustalw to maf format.
> I am trying to use AlignIO for that but its not working.

Could you tell us why you have to use maf format? I'm curious
because all of the phylogenetics tools I've had to work with
personally will take some other format which is more widely
supported (e.g. FASTA, PFAM, ClustalW, PHYLIP, ...).

Peter

From bernd.web at gmail.com Mon Aug 2 17:25:52 2010
From: bernd.web at gmail.com (Bernd Web)
Date: Mon, 2 Aug 2010 23:25:52 +0200
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To:
References: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
Message-ID:

Hi Shalabh,

This ConvertAlign does not write maf either, it only reads it (i made it).
I found some other converters on the web but they do not export to maf
format either...
http://biotechvana.uv.es/servers/afc/main.php
http://www.hiv.lanl.gov/content/sequence/FORMAT_CONVERSION/form.html

Galaxy has a MAF to Fasta converter:
http://main.g2.bx.psu.edu/root?tool_id=MAF_To_Fasta1

Regards,
Bernd

On Mon, Aug 2, 2010 at 10:53 PM, shalabh sharma wrote:
> Hi Russell,
>            Thanks for the reply, but i have around 400 alignments and some
> huge ones :(
>
> Thanks
> Shalabh
>
>
> On Mon, Aug 2, 2010 at 4:25 PM, Smithies, Russell <
> Russell.Smithies at agresearch.co.nz> wrote:
>
>> This might work if you only have a few:
>> http://www.ibi.vu.nl/programs/convertalignwww/
>>
>> --Russell
>>
>>
>> > -----Original Message-----
>> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> > bounces at lists.open-bio.org] On Behalf Of shalabh sharma
>> > Sent: Tuesday, 3 August 2010 7:45 a.m.
>> > To: bioperl-l
>> > Subject: [Bioperl-l] clustalw to maf format
>> >
>> > Hi,
>> >     I am trying to convert clustalw to maf format.
>> > I am trying to use AlignIO for that but its not working.
>> >
>> > Its giving me the following error:
>> >
>> > EXCEPTION Bio::Root::NotImplemented -------------
>> > MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented by
>> > package Bio::AlignIO::maf.
>> > This is not your fault - author of Bio::AlignIO::maf should be blamed!
>> >
>> > STACK Bio::Root::RootI::throw_not_implemented
>> > /Library/Perl/5.8.8/Bio/Root/RootI.pm:707
>> > STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/maf.pm:176
>> > STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492
>> > STACK toplevel msf2mafy.pl:11
>> >
>> >
>> > Is there any other way i can convert clustalw to maf?
>> >
>> > I would really appreciate if anyone can help me out.
>> > >> > Thanks >> > Shalabh >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> ======================================================================= >> Attention: The information contained in this message and/or attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or privileged >> material. Any review, retransmission, dissemination or other use of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by AgResearch >> Limited. If you have received this message in error, please notify the >> sender immediately. >> ======================================================================= >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Aug 2 17:31:20 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 2 Aug 2010 16:31:20 -0500 Subject: [Bioperl-l] clustalw to maf format In-Reply-To: References: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz> Message-ID: <6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu> No other format will work? The main reason you see unimplemented methods like this is there is no active interest in working with this format beyond getting the information stored within them into objects and other commonly-used formats. 
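
For anyone who still needs MAF output while write_aln is unimplemented, here is a minimal sketch of writing a MAF block by hand. The helper name maf_block is hypothetical (it is not a BioPerl method), and the coordinates are assumptions: every sequence is treated as starting at position 0 on the '+' strand, with the source length equal to its ungapped length. Whether a given downstream tool accepts such placeholder coordinates would need to be checked.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): format one alignment as a
# MAF block. $rows is an array ref of [name, gapped_sequence] pairs.
# Assumptions: start = 0, strand = '+', srcSize = ungapped length.
sub maf_block {
    my ($rows) = @_;
    my $block = "a score=0.0\n";
    for my $row (@$rows) {
        my ($name, $text) = @$row;
        (my $ungapped = $text) =~ tr/-.//d;    # remove gap characters
        my $size = length $ungapped;
        # "s" line fields: src start size strand srcSize text
        $block .= sprintf "s %s %d %d + %d %s\n",
            $name, 0, $size, $size, $text;
    }
    return $block . "\n";                      # a blank line ends the block
}

print "##maf version=1\n";
print maf_block([ [ 'seq1', 'ACGU-ACG' ], [ 'seq2', 'ACGUUACG' ] ]);
```

With Bio::AlignIO reading the clustalw files, the rows could be built from each_seq on the alignment object, taking display_id and seq from each sequence.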
chris On Aug 2, 2010, at 3:53 PM, shalabh sharma wrote: > Hi Russell, > Thanks for the reply, but i have around 400 alignments and some > huge ones :( > > Thanks > Shalabh > > > On Mon, Aug 2, 2010 at 4:25 PM, Smithies, Russell < > Russell.Smithies at agresearch.co.nz> wrote: > >> This might work if you only have a few: >> http://www.ibi.vu.nl/programs/convertalignwww/ >> >> --Russell >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma >>> Sent: Tuesday, 3 August 2010 7:45 a.m. >>> To: bioperl-l >>> Subject: [Bioperl-l] clustalw to maf format >>> >>> Hi, >>> I am trying to convert clustalw to maf format. >>> I am trying to use AlignIO for that but its not working. >>> >>> Its giving me the following error: >>> >>> EXCEPTION Bio::Root::NotImplemented ------------- >>> MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented by >>> package Bio::AlignIO::maf. >>> This is not your fault - author of Bio::AlignIO::maf should be blamed! >>> >>> STACK Bio::Root::RootI::throw_not_implemented >>> /Library/Perl/5.8.8/Bio/Root/RootI.pm:707 >>> STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/ >>> maf.pm:176 >>> STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492 >>> STACK toplevel msf2mafy.pl:11 >>> >>> >>> Is there any other way i can convert clustalw to maf? >>> >>> I would really appreciate if anyone can help me out. 
>>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> ======================================================================= >> Attention: The information contained in this message and/or attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or privileged >> material. Any review, retransmission, dissemination or other use of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by AgResearch >> Limited. If you have received this message in error, please notify the >> sender immediately. >> ======================================================================= >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From shalabh.sharma7 at gmail.com Mon Aug 2 18:30:41 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Mon, 2 Aug 2010 18:30:41 -0400 Subject: [Bioperl-l] clustalw to maf format In-Reply-To: <6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu> References: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz> <6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu> Message-ID: Hi All, Thanks for the replies. Actually i am working on a pipeline involving RNAz. I had impression that there must be a converter available as their webserver can take xmfa or maf format but standalone is only accepting maf format. I think i will use a program that can output as xmfa and write to those people if they can provide me with the converter. Thanks Shalabh On Mon, Aug 2, 2010 at 5:31 PM, Chris Fields wrote: > No other format will work? 
The main reason you see unimplemented methods > like this is there is no active interest in working with this format beyond > getting the information stored within them into objects and other > commonly-used formats. > > chris > > On Aug 2, 2010, at 3:53 PM, shalabh sharma wrote: > > > Hi Russell, > > Thanks for the reply, but i have around 400 alignments and > some > > huge ones :( > > > > Thanks > > Shalabh > > > > > > On Mon, Aug 2, 2010 at 4:25 PM, Smithies, Russell < > > Russell.Smithies at agresearch.co.nz> wrote: > > > >> This might work if you only have a few: > >> http://www.ibi.vu.nl/programs/convertalignwww/ > >> > >> --Russell > >> > >> > >>> -----Original Message----- > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma > >>> Sent: Tuesday, 3 August 2010 7:45 a.m. > >>> To: bioperl-l > >>> Subject: [Bioperl-l] clustalw to maf format > >>> > >>> Hi, > >>> I am trying to convert clustalw to maf format. > >>> I am trying to use AlignIO for that but its not working. > >>> > >>> Its giving me the following error: > >>> > >>> EXCEPTION Bio::Root::NotImplemented ------------- > >>> MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented > by > >>> package Bio::AlignIO::maf. > >>> This is not your fault - author of Bio::AlignIO::maf should be blamed! > >>> > >>> STACK Bio::Root::RootI::throw_not_implemented > >>> /Library/Perl/5.8.8/Bio/Root/RootI.pm:707 > >>> STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/ > >>> maf.pm:176 > >>> STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492 > >>> STACK toplevel msf2mafy.pl:11 > >>> > >>> > >>> Is there any other way i can convert clustalw to maf? > >>> > >>> I would really appreciate if anyone can help me out. 
> >>> > >>> Thanks > >>> Shalabh > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> ======================================================================= > >> Attention: The information contained in this message and/or attachments > >> from AgResearch Limited is intended only for the persons or entities > >> to which it is addressed and may contain confidential and/or privileged > >> material. Any review, retransmission, dissemination or other use of, or > >> taking of any action in reliance upon, this information by persons or > >> entities other than the intended recipients is prohibited by AgResearch > >> Limited. If you have received this message in error, please notify the > >> sender immediately. > >> ======================================================================= > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From chiragmatkarbioinfo at gmail.com Tue Aug 3 03:47:37 2010 From: chiragmatkarbioinfo at gmail.com (chirag matkar) Date: Tue, 3 Aug 2010 13:17:37 +0530 Subject: [Bioperl-l] Pubmed Parsing Message-ID: Hello all, I have a list of Pubmed Ids. I want to parse articles to find specific SNP related information. Can i work it out using a Script? -- Regards, Chirag Matkar From genehack at genehack.org Tue Aug 3 05:03:35 2010 From: genehack at genehack.org (John Anderson) Date: Tue, 3 Aug 2010 05:03:35 -0400 Subject: [Bioperl-l] Pubmed Parsing In-Reply-To: References: Message-ID: <5E557C44-224B-4460-9C2C-E375555B8BE6@genehack.org> On Aug 3, 2010, at 3:47 AM, chirag matkar wrote: > I have a list of Pubmed Ids. > I want to parse articles to find specific SNP related information. > Can i work it out using a Script? Can you provide a more specific example of what you'd like to do? 
For example, something along the lines of, "for PMID 1234, get ... about SNP 5678" (where '...' is replaced with whatever it is you're trying to get).

Even describing how you would obtain this information using the website yourself will be helpful.

thanks,
john.

From gowthaman.ramasamy at seattlebiomed.org Tue Aug 3 01:29:10 2010
From: gowthaman.ramasamy at seattlebiomed.org (Gowthaman Ramasamy)
Date: Mon, 2 Aug 2010 22:29:10 -0700
Subject: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam
In-Reply-To:
Message-ID:

Hi List,
I am trying to find out the consensus using pileup via Bio::DB::Sam. Using the following script I could parse out the ref_base and different bases from reads at that position. Though, I am not able to find a method to derive consensus. Similar to the values produced by "samtools pileup -c -f xxxxxx.fasta yyyyyyy.bam".

The script I use now retrieves ref base, query bases for each position. How do I improve it to get the consensus?

Thanks very much in advance,
Gowthaman

use Bio::DB::Sam;

my $bam = Bio::DB::Sam->new(-bam   => 'something.bam',
                            -fasta => 'something.fasta'
                           );

my $cb = sub {
    my ($seqid, $pos, $pileups) = @_;
    my $refBase = $bam->segment($seqid, $pos, $pos)->dna;
    print "\n$pos\t$refBase=>";
    for my $pileup (@$pileups) {
        my $al    = $pileup->alignment;
        my $qBase = substr($al->qseq, $pileup->qpos, 1);
        print "$qBase,";
    }
};

$bam->pileup('Lin.chr10i', $cb);

From scott at scottcain.net Tue Aug 3 06:32:59 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 3 Aug 2010 06:32:59 -0400
Subject: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam
In-Reply-To:
References:
Message-ID:

Hi Gowthaman,

I don't see a method to extract the consensus. You are welcome to submit a patch :-)

Scott

On Tue, Aug 3, 2010 at 1:29 AM, Gowthaman Ramasamy wrote:
> Hi List,
> I am trying to find out the consensus using pileup via Bio::DB::Sam.
Using the following script I could parse out the ref_base and different bases from reads at that position. Though, I am not able to find a method to derive consensus. Similar to the values produced by "samtools pileup -c -f xxxxxx.fasta yyyyyyy.bam".
>
> The script I use now retrieves ref base, query bases for each position. How do I improve it to get the consensus?
>
> Thanks very much in advance,
> Gowthaman
>
>
> use Bio::DB::Sam;
>
> my $bam = Bio::DB::Sam->new(-bam   => 'something.bam',
>                             -fasta => 'something.fasta'
>                            );
>
> my $cb = sub {
>     my ($seqid, $pos, $pileups) = @_;
>     my $refBase = $bam->segment($seqid, $pos, $pos)->dna;
>     print "\n$pos\t$refBase=>";
>     for my $pileup (@$pileups) {
>         my $al    = $pileup->alignment;
>         my $qBase = substr($al->qseq, $pileup->qpos, 1);
>         print "$qBase,";
>     }
> };
>
> $bam->pileup('Lin.chr10i', $cb);
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From lincoln.stein at gmail.com Tue Aug 3 12:57:52 2010
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Tue, 3 Aug 2010 12:57:52 -0400
Subject: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam
In-Reply-To:
References:
Message-ID:

Samtools is running MAQ on the pileup. You could either implement MAQ in perl, or come up with your own consensus caller.
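
A minimal sketch of the second option, assuming plain majority rule over the bases seen at one position. It ignores base and mapping qualities and indels, so it will not reproduce the consensus column of "samtools pileup -c". The name majority_base and its threshold parameter are hypothetical, not part of Bio::DB::Sam.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::DB::Sam): majority-rule consensus
# for a single pileup column. $bases is an array ref of observed bases;
# $min_frac is the fraction the most common base must exceed (default 0.5).
# Returns 'N' when there is no usable base or no clear majority.
sub majority_base {
    my ($bases, $min_frac) = @_;
    $min_frac = 0.5 unless defined $min_frac;

    my %count;
    $count{ uc $_ }++ for grep { /^[ACGTacgt]$/ } @$bases;

    my $total = 0;
    $total += $_ for values %count;
    return 'N' unless $total;

    # Highest count first; alphabetical order makes ties deterministic.
    my ($top) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    return $count{$top} / $total > $min_frac ? $top : 'N';
}

print majority_base([qw(A A A c)]), "\n";   # clear majority
print majority_base([qw(A C G T)]), "\n";   # no majority
```

Inside the pileup callback from the original script, the $qBase values for a position could be collected into an array and passed as majority_base(\@bases) instead of being printed one by one.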
Lincoln

On Tue, Aug 3, 2010 at 1:29 AM, Gowthaman Ramasamy <
gowthaman.ramasamy at seattlebiomed.org> wrote:

> Hi List,
> I am trying to find out the consensus using pileup via Bio::DB::Sam. Using
> the following script I could parse out the ref_base and different bases from
> reads at that position. Though, I am not able to find a method to derive
> consensus. Similar to the values produced by "samtools pileup -c -f
> xxxxxx.fasta yyyyyyy.bam".
>
> The script I use now retrieves ref base, query bases for each position. How
> do I improve it to get the consensus?
>
> Thanks very much in advance,
> Gowthaman
>
>
> use Bio::DB::Sam;
>
> my $bam = Bio::DB::Sam->new(-bam   => 'something.bam',
>                             -fasta => 'something.fasta'
>                            );
>
> my $cb = sub {
>     my ($seqid, $pos, $pileups) = @_;
>     my $refBase = $bam->segment($seqid, $pos, $pos)->dna;
>     print "\n$pos\t$refBase=>";
>     for my $pileup (@$pileups) {
>         my $al    = $pileup->alignment;
>         my $qBase = substr($al->qseq, $pileup->qpos, 1);
>         print "$qBase,";
>     }
> };
>
> $bam->pileup('Lin.chr10i', $cb);
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa

From biopython at maubp.freeserve.co.uk Tue Aug 3 13:06:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 3 Aug 2010 18:06:46 +0100
Subject: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam
In-Reply-To:
References:
Message-ID:

On Tue, Aug 3, 2010 at 5:57 PM, Lincoln Stein wrote:
> Samtools is running MAQ on the pileup. You could either implement MAQ in
> perl, or come up with your own consensus caller.
> > Lincoln See also: http://seqanswers.com/forums/showthread.php?t=6241 From gowthaman.ramasamy at seattlebiomed.org Tue Aug 3 13:28:36 2010 From: gowthaman.ramasamy at seattlebiomed.org (Gowthaman Ramasamy) Date: Tue, 3 Aug 2010 10:28:36 -0700 Subject: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam In-Reply-To: References: , Message-ID: <89080953C3D300419AACB6E63A7EEFBA5C47613B34@mail02.sbri.org> Hi Lincoln, Thats a good lead. I will try to use MAQ in perl rather than using my simple majority rule. -gowtham ________________________________________ From: Lincoln Stein [lincoln.stein at gmail.com] Sent: Tuesday, August 03, 2010 9:57 AM To: Gowthaman Ramasamy Cc: bioperl-l Subject: Re: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam Samtools is running MAQ on the pileup. You could either implement MAQ in perl, or come up with your own consensus caller. Lincoln On Tue, Aug 3, 2010 at 1:29 AM, Gowthaman Ramasamy > wrote: Hi List, I am trying to find out the consensus using pileup via Bio::DB::Sam. Using the following script I could parse out the ref_base and different bases from reads at that position. Though, I am not able to find a method to derive consensus. Similar to the values produced by "samtools pileup -c -f xxxxxx.fasta yyyyyyy.bam". The script I use now retrives ref base, query bases for each position. How do I improve it to get the consensus? 
Thanks very much in advance, Gowthaman use Bio::DB::Sam; my $bam = Bio::DB::Sam->new(-bam => 'something.bam', -fasta => 'something.fasta' ); my $cb = sub { my ($seqid, $pos, $pileups) = @_; my $refBase = $bam->segment($seqid, $pos, $pos)->dna; print "\n$pos\t$refBase=>"; for my $pileup (@$pileups){ my $al = $pileup->alignment; my $qBase = substr($al->qseq, $pileup->qpos, 1); print "$qBase,"; } }; $bam->pileup('Lin.chr10i', $cb); _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa > From stefan.kirov at bms.com Tue Aug 3 16:22:35 2010 From: stefan.kirov at bms.com (Stefan Kirov) Date: Tue, 03 Aug 2010 16:22:35 -0400 Subject: [Bioperl-l] nmica parser Message-ID: <4C587A8B.8090603@bms.com> Has anyone written nmica parser? If not I will perhaps do that. It should be straightforward- the output is XML. Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: stefan_kirov.vcf Type: text/x-vcard Size: 207 bytes Desc: not available URL: From fs5 at sanger.ac.uk Wed Aug 4 04:45:39 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Wed, 04 Aug 2010 09:45:39 +0100 Subject: [Bioperl-l] Pubmed Parsing In-Reply-To: References: Message-ID: <1280911539.3499.46.camel@deskpro15336.dynamic.sanger.ac.uk> Hi Chiraq, have a look at this earlier post: http://bioperl.org/pipermail/bioperl-l/2009-April/029690.html However, you won't be able to retrieve all full texts and it is quite a task to parse natural language and get useful information about a gene, protein, SNP etc out of a manuscript. Frank On Tue, 2010-08-03 at 13:17 +0530, chirag matkar wrote: > Hello all, > I have a list of Pubmed Ids. 
> I want to parse articles to find specific SNP related information. > Can i work it out using a Script? > > > > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From David.Messina at sbc.su.se Thu Aug 5 08:16:17 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 5 Aug 2010 14:16:17 +0200 Subject: [Bioperl-l] call for a TreeIO volunteer Message-ID: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se> Hi everybody, We've got a couple of small open bugs related to the Bio::TreeIO modules, and we could really use someone to take a look at them. Ideally, that someone would have familiarity with TreeIO already.* It'd help us to get the next release (1.6.2) out the door. The bugs in question are: - TreeIO::newick writes root node branch length incorrectly http://bugzilla.open-bio.org/show_bug.cgi?id=3039 - Bio::TreeIO::nhx cannot parse empty [&&NHX] + round-trip failure http://bugzilla.open-bio.org/show_bug.cgi?id=3007 Thanks, Dave on behalf of the core developers * Even if you don't, though, if you've been looking for an opportunity to contribute to BioPerl, and this sounds like something you'd like to work on, by all means raise your hand. From clements at nescent.org Thu Aug 5 13:15:41 2010 From: clements at nescent.org (Dave Clements) Date: Thu, 5 Aug 2010 10:15:41 -0700 Subject: [Bioperl-l] GMOD Europe 2010, 13-16 Sept, Cambridge, UK In-Reply-To: References: Message-ID: GMOD Europe 2010 ================ 13-16 September 2010 Cambridge, UK http://gmod.org/wiki/GMOD_Europe_2010 We are pleased to announce GMOD Europe 2010, four days of GMOD events being held 13-16 September 2010, at the University of Cambridge. 
GMOD Europe 2010 includes:

1) GMOD Community Meeting, Monday & Tuesday: Project updates, developer and user presentations and best practices, project direction.
2) GMOD Satellite Meetings, Wednesday: Special interest groups where GMOD community members meet to discuss specific topics of interest.
3) InterMine Workshop, Wednesday: A one day workshop on installing, configuring and using the InterMine biological data warehouse system.
4) BioMart Workshop, Thursday: A one day workshop on using the BioMart biological data warehouse system, including accessing data through APIs.

Registration is now open for these events. There is a £50 registration fee for the GMOD Meeting to cover catered lunches and other expenses. Registration for all other events is free, but required, as space is limited.

These events are open to all: GMOD users, developers, prospective users, biologists, and computer scientists. See http://gmod.org/wiki/January_2010_GMOD_Meeting for an idea of what goes on at GMOD meetings.

GMOD is a collection of interoperable open source software components for managing, visualizing and annotating biological data. GMOD incorporates many widely used tools, including GBrowse and JBrowse for genome browsing, InterMine and BioMart for data mining, Galaxy and Ergatis for workflow, Chado for data management, GBrowse_syn and CMap for comparative genomics, plus many other tools (Apollo, MAKER, Pathway Tools, Textpresso, ...). GMOD is also an active community of researchers and developers addressing common challenges in exploiting their data.

If you are struggling to fully exploit your data then please consider attending GMOD Europe 2010.

Please let us know if you have any questions, and we hope to see you in Cambridge.
Thanks, Scott Cain and Dave Clements -- http://gmod.org/wiki/GMOD_News http://gmod.org/wiki/GMOD_Evo_Hackathon http://gmod.org/wiki/GMOD_Europe_2010 http://gmod.org/wiki/Help_Desk_Feedback From abhishek.vit at gmail.com Thu Aug 5 18:15:56 2010 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Thu, 5 Aug 2010 18:15:56 -0400 Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl Message-ID: Hi All Just wondering if there is any Picard wrapper/s available in Bioperl. Thanks! -Abhi ----------------------------- Abhishek Pratap Bioinformatics Software Engineer II Genomics Resource Center Institute for Genome Sciences School of Medicine, Univ of Maryland 801, W. Baltimore Street, Baltimore, MD 21209 Ph: (+1)-410-706-2296 www.igs.umaryland.edu/ From Russell.Smithies at agresearch.co.nz Thu Aug 5 18:37:46 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 6 Aug 2010 10:37:46 +1200 Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl In-Reply-To: References: Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F02262E96@exchsth.agresearch.co.nz> Might be part of the "Enterprise" package. If not, some developer should "make it so". :-) --Russell (I hate Fridays) > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Abhishek Pratap > Sent: Friday, 6 August 2010 10:16 a.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl > > Hi All > > Just wondering if there is any Picard wrapper/s available in Bioperl. > > > Thanks! > -Abhi > > ----------------------------- > Abhishek Pratap > Bioinformatics Software Engineer II > Genomics Resource Center > Institute for Genome Sciences > School of Medicine, Univ of Maryland > 801, W. 
Baltimore Street, Baltimore, MD 21209 > Ph: (+1)-410-706-2296 > www.igs.umaryland.edu/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at illinois.edu Thu Aug 5 19:10:16 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 5 Aug 2010 18:10:16 -0500 Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl In-Reply-To: References: Message-ID: <26E3E5B6-47CF-4744-9687-199C218B5571@illinois.edu> Picard uses samtools, which has a perl API: http://search.cpan.org/dist/Bio-SamTools/ which uses BioPerl. Ah, the circle of life... chris On Aug 5, 2010, at 5:15 PM, Abhishek Pratap wrote: > Hi All > > Just wondering if there is any Picard wrapper/s available in Bioperl. > > > Thanks! > -Abhi > > ----------------------------- > Abhishek Pratap > Bioinformatics Software Engineer II > Genomics Resource Center > Institute for Genome Sciences > School of Medicine, Univ of Maryland > 801, W. 
Baltimore Street, Baltimore, MD 21209
> Ph: (+1)-410-706-2296
> www.igs.umaryland.edu/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From dan.kortschak at adelaide.edu.au Thu Aug 5 21:06:45 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Fri, 06 Aug 2010 10:36:45 +0930
Subject: [Bioperl-l] MUMmer parser work
Message-ID: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>

Hello Everyone,

I've just noticed the absence of a MUMmer parser and thought that it might be a worthwhile contribution to bioperl-run (I won't be able to start on this for a while, but given Mark's excellent work on CommandExts, it shouldn't take too long to get up when I do have time). Has anyone made any effort in this direction that I would be stepping on, or if they have left it, that I could pick up to shorten the work time?

cheers
Dan

From cjfields at illinois.edu Thu Aug 5 23:13:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 5 Aug 2010 22:13:51 -0500
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>

Dan,

Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in:

http://bugzilla.open-bio.org/show_bug.cgi?id=2701

It currently lacks significant tests, so feel free to chip in there as needed.

chris

On Aug 5, 2010, at 8:06 PM, Dan Kortschak wrote:

> Hello Everyone,
>
> I've just noticed the absence of a MUMmer parser and thought that it
> might be a worthwhile contribution to bioperl-run (I won't be able to
> start on this for a while, but given Mark's excellent work on
> CommandExts, it shouldn't take too long to get up when I do have time).
Has > anyone made any effort in this direction that I would be stepping on, or > if they have left it, that I could pick up to shorten the work time? > > cheers > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From greg at ebi.ac.uk Fri Aug 6 05:47:21 2010 From: greg at ebi.ac.uk (Gregory Jordan) Date: Fri, 6 Aug 2010 10:47:21 +0100 Subject: [Bioperl-l] call for a TreeIO volunteer In-Reply-To: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se> References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se> Message-ID: I can help out with these. I'm pretty sure I've previously fought with (and perhaps even come up with a fix for) bug 3039, and I can take a look at 3007 too. Now lemme just see if I can get up and running with the Bioperl test suite. I'll give a shout if I run into any problems. Cheers, Greg On 5 August 2010 13:16, Dave Messina wrote: > Hi everybody, > > We've got a couple of small open bugs related to the Bio::TreeIO modules, > and we could really use someone to take a look at them. Ideally, that > someone would have familiarity with TreeIO already.* > > It'd help us to get the next release (1.6.2) out the door. > > The bugs in question are: > - TreeIO::newick writes root node branch length incorrectly > http://bugzilla.open-bio.org/show_bug.cgi?id=3039 > > - Bio::TreeIO::nhx cannot parse empty [&&NHX] + round-trip failure > http://bugzilla.open-bio.org/show_bug.cgi?id=3007 > > > Thanks, > Dave > on behalf of the core developers > > > * Even if you don't, though, if you've been looking for an opportunity to > contribute to BioPerl, and this sounds like something you'd like to work on, > by all means raise your hand. 
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From jun.yin at ucd.ie Fri Aug 6 06:52:14 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 06 Aug 2010 11:52:14 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
Message-ID: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>

Hi, all,

I am the Google Summer of Code student working on refactoring the Bio::Align subsystem. I recently implemented several packages for retrieving online alignment sequences.

The aim of the packages is to provide convenient methods for BioPerl users to retrieve online alignment sequences. After retrieval, the alignment sequences are converted into a Bio::SimpleAlign object, which is easy to manipulate and write to local disk. The packages currently support the Pfam, Rfam, Prosite and Entrez Protein Clusters databases.

Here is the structure of the packages:

Packages
Bio::DB::Align (interface, and calling other packages)
Bio::DB::Align::Pfam (retrieving alignment from Pfam)
Bio::DB::Align::Rfam (retrieving alignment from Rfam)
Bio::DB::Align::Prosite (retrieving alignment from Prosite)
Bio::DB::Align::ProtClustDB (retrieving alignment from Entrez Protein Clusters Database)

Usually four methods are provided for each package:

Methods
get_Aln_by_id (retrieving alignment by id and returns Bio::SimpleAlign object)
get_Aln_by_acc (retrieving alignment by accession and returns Bio::SimpleAlign object) (Rfam and Prosite support only this method)
id2acc (id to accession conversion)
acc2id (accession to id conversion)

These packages depend on LWP::UserAgent, HTTP::Request and Bio::DB::GenericWebAgent. Bio::DB::Align::ProtClustDB depends on Bio::DB::EUtilities.
Calling the packages can be:

my $dbobj=Bio::DB::Align->new(-db=>"rfam");
Or,
my $dbobj= Bio::DB::Align::Pfam->new();

my $aln=$dbobj->get_Aln_by_acc("RF0001");
my $aln2=$dbobj->get_Aln_by_acc(-accession=>"RF0001",-alignment=>"full");
print $aln->length();
foreach my $seq ($aln->each_Seq) {
    #do something
}

I have done some tests on these packages. And, I will write them into standard tests later.

Any suggestions on these packages are welcome.

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

From David.Messina at sbc.su.se Fri Aug 6 08:59:19 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 6 Aug 2010 14:59:19 +0200
Subject: [Bioperl-l] call for a TreeIO volunteer
In-Reply-To:
References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>
Message-ID: <6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>

> I can help out with these. I'm pretty sure I've previously fought with (and perhaps even come up with a fix for) bug 3039, and I can take a look at 3007 too.

Awesome - thanks Greg!

> Now lemme just see if I can get up and running with the Bioperl test suite. I'll give a shout if I run into any problems.

Please do.

Dave

From David.Messina at sbc.su.se Fri Aug 6 09:06:47 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 6 Aug 2010 15:06:47 +0200
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
Message-ID:

Sounds great, Jun!

Did you happen to test your code on very large alignments? I know there's one in Pfam that's something like 100,000 sequences. An rRNA, I believe.

Dave

From jun.yin at ucd.ie Fri Aug 6 09:11:41 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 06 Aug 2010 14:11:41 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To:
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
Message-ID: <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>

Hi, Dave,

Thx for reminding me this. I will definitely try it.

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

-----Original Message-----
From: Dave Messina [mailto:David.Messina at sbc.su.se]
Sent: Friday, August 06, 2010 2:07 PM
To: Jun Yin
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences

Sounds great, Jun!

Did you happen to test your code on very large alignments? I know there's one in Pfam that's something like 100,000 sequences. An rRNA, I believe.

Dave

From cjfields at illinois.edu Fri Aug 6 09:19:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 6 Aug 2010 08:19:54 -0500
Subject: [Bioperl-l] call for a TreeIO volunteer
In-Reply-To: <6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>
References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se> <6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>
Message-ID: <8CB3DE9A-4C5C-42A3-94B4-8818D7143951@illinois.edu>

On Aug 6, 2010, at 7:59 AM, Dave Messina wrote:

>
>> I can help out with these. I'm pretty sure I've previously fought with (and perhaps even come up with a fix for) bug 3039, and I can take a look at 3007 too.
>
> Awesome - thanks Greg!
> > >> Now lemme just see if I can get up and running with the Bioperl test suite. I'll give a shout if I run into any problems. > > Please do. > > > > Dave Agreed, and thanks for helping out! chris From dianabowley at gmail.com Fri Aug 6 18:33:57 2010 From: dianabowley at gmail.com (DRBowley) Date: Fri, 6 Aug 2010 15:33:57 -0700 (PDT) Subject: [Bioperl-l] BioPerl install issues Message-ID: I'm new to both perl and bioperl and I'm having issues installing bioperl. I'm trying to install on a Mac OS 10.6.4, and I've already installed perl (5.10.0). I tried installing using the recommended approach for Mac - via Fink... "fink install bioperl-pm5100" Looking back over the terminal window text it looks like the problem is: "This package requires Module::Build v0.2805 or greater to install itself." I tried doing "fink selfupdate" and that did not fix the problem. Any suggestions? Thanks! Diana From Kevin.M.Brown at asu.edu Fri Aug 6 18:50:45 2010 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Fri, 6 Aug 2010 15:50:45 -0700 Subject: [Bioperl-l] BioPerl install issues In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B406E44A05@EX02.asurite.ad.asu.edu> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPE RL_THE_EASY_WAY_USING_Build.PL Not sure why you had to install perl since it should have been part of the stock OSX install (or at least it was last time I logged onto a mac). Not sure why the Fink method has so many issues, but might try the above which works for linux or bsd. -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of DRBowley Sent: Friday, August 06, 2010 3:34 PM To: bioperl-l at bioperl.org Subject: [Bioperl-l] BioPerl install issues I'm new to both perl and bioperl and I'm having issues installing bioperl. I'm trying to install on a Mac OS 10.6.4, and I've already installed perl (5.10.0). 
I tried installing using the recommended approach for Mac - via Fink... "fink install bioperl-pm5100" Looking back over the terminal window text it looks like the problem is: "This package requires Module::Build v0.2805 or greater to install itself." I tried doing "fink selfupdate" and that did not fix the problem. Any suggestions? Thanks! Diana _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From skastu01 at students.poly.edu Fri Aug 6 20:03:50 2010 From: skastu01 at students.poly.edu (Lakshmi Kastury) Date: Sat, 7 Aug 2010 00:03:50 +0000 Subject: [Bioperl-l] BioPerl install issues Message-ID: Hi - I went through several failed attempts on MACOS Snow Leopard, and fink was a dead end. Eventually I succeeded to install on Windows Vista using CPAN. I am not sure if this method will work with MACOS: 1. Opened command prompt. 2. Typed command: >perl -MCPAN -e "install Bundle::BioPerl" 3. Answered yes to the series of questions, which prompts install of several bundles and a compiler. The instructions were in a link from: http://bioperl.org/Core/Latest/INSTALL All the best, Lakshmi > Date: Fri, 6 Aug 2010 15:33:57 -0700 > From: dianabowley at gmail.com > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] BioPerl install issues > > I'm new to both perl and bioperl and I'm having issues installing > bioperl. I'm trying to install on a Mac OS 10.6.4, and I've already > installed perl (5.10.0). I tried installing using the recommended > approach for Mac - via Fink... > "fink install bioperl-pm5100" > > Looking back over the terminal window text it looks like the problem > is: > "This package requires Module::Build v0.2805 or greater to install > itself." > > I tried doing "fink selfupdate" and that did not fix the problem. > > Any suggestions? > > Thanks! 
> Diana > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Sat Aug 7 02:47:40 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 7 Aug 2010 08:47:40 +0200 Subject: [Bioperl-l] BioPerl install issues In-Reply-To: References: Message-ID: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se> On Aug 7, 2010, at 02:03 , Lakshmi Kastury wrote: > I am not sure if this method will work with MACOS: It will. CPAN is cross-platform and is the best way to install BioPerl. Dave From cjfields at illinois.edu Sat Aug 7 09:58:56 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 7 Aug 2010 08:58:56 -0500 Subject: [Bioperl-l] BioPerl install issues In-Reply-To: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se> References: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se> Message-ID: It should work fine. Even installing from trunk right now works w/o failing tests. chris On Aug 7, 2010, at 1:47 AM, Dave Messina wrote: > > On Aug 7, 2010, at 02:03 , Lakshmi Kastury wrote: > >> I am not sure if this method will work with MACOS: > > It will. CPAN is cross-platform and is the best way to install BioPerl. > > > Dave > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From greg at ebi.ac.uk Sat Aug 7 17:14:58 2010 From: greg at ebi.ac.uk (Gregory Jordan) Date: Sat, 7 Aug 2010 22:14:58 +0100 Subject: [Bioperl-l] Packages retrieving online alignment sequences In-Reply-To: <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie> References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie> <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie> Message-ID: Maybe I'm just a bit naive here, but what is the expected difference between accession and ID and why do we need a separate method for each? 
Seems to me that one could just have a single method, get_Aln, which determines under the hood whether the query string is an accession or ID. It would be nice if the SimpleAlign object had its Annotation filled with some extra metadata (such as accession, ID, database version number, URI, etc.). One other thing: have you thought about adding an Ensembl adaptor? Or maybe something similar already exists in BioPerl...? Sure Ensembl provides their own Perl API, but for someone who doesn't want to go through the hassle of installing it from CVS (pardon my french, but wtf!?! Who still uses CVS) and learning a whole new API, it might be convenient to have a simple BioPerl module for quickly grabbing gene family alignments from the public Ensembl MySQL databases. I'd be willing to help write the necessary SQL queries for this. greg On 6 August 2010 14:11, Jun Yin wrote: > Hi, Dave, > > Thx for reminding me this. I will definitely try it. > > Cheers, > Jun Yin > Ph.D. student in U.C.D. > > Bioinformatics Laboratory > Conway Institute > University College Dublin > > > -----Original Message----- > From: Dave Messina [mailto:David.Messina at sbc.su.se] > Sent: Friday, August 06, 2010 2:07 PM > To: Jun Yin > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences > > Sounds great, Jun! > > Did you happen to test your code on very large alignments? I know there's > one in Pfam that's something like 100,000 sequences. An rRNA, I believe. > > > Dave > > > __________ Information from ESET Smart Security, version of virus signature > database 5346 (20100806) __________ > > The message was checked by ESET Smart Security. > > http://www.eset.com > > > > > __________ Information from ESET Smart Security, version of virus signature > database 5346 (20100806) __________ > > The message was checked by ESET Smart Security. 
> > http://www.eset.com > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Sat Aug 7 18:07:39 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 7 Aug 2010 17:07:39 -0500 Subject: [Bioperl-l] Packages retrieving online alignment sequences In-Reply-To: References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie> <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie> Message-ID: <21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu> On Aug 7, 2010, at 4:14 PM, Gregory Jordan wrote: > Maybe I'm just a bit naive here, but what is the expected difference between > accession and ID and why do we need a separate method for each? Depends on the remote service, but in many cases there is a difference. With NCBI eutils you can have either an accession and the unique identifier (UID, or GI for nuc/protein seqs). efetch can use both, but only the UID is guaranteed to retrieve a single sequence all the time; the accession can (very rarely) map to more than one sequence. The other eutils services require either a string (esearch) or a UID, but do not allow an accession. > Seems to me > that one could just have a single method, get_Aln, which determines under > the hood whether the query string is an accession or ID. A simpler method could be introduced, but I can see that being potentially brittle in the long run. A naked alphanumeric string doesn't reveal much about what it is at face value w/o knowing database/service-specific behavior. And then we're reliant on that behavior not changing, which we can't guarantee (this has bitten us in the past). What would one do if NCBI (for instance) allowed accessions derived completely of digits, or conversely a unique ID with mixed alphanumerics? 
Using methods specific for ID/acc at least guarantees a behavior on the backend w/o guessing, and if there is no danger of overlap (a service accepts either/or) one could simply be an alias of the other. > It would be nice if the SimpleAlign object had its Annotation filled with > some extra metadata (such as accession, ID, database version number, URI, > etc.). According to the deobfuscator SimpleAlign does have accession() and id(). The others could be simple attributes, and can be added as simple getter/setters, or as annotation via Bio::Annotation (this is the way Stockholm annotation is currently handled). Something to think about. > One other thing: have you thought about adding an Ensembl adaptor? Or maybe > something similar already exists in BioPerl...? That's a good idea, though it might make more sense if this was done when mem-efficient (possibly DB-dependent) AlignI modules are present within bioperl, which is part of the GSoC (see below). For instance, have a Bio::Align::AlignI with a backend ensembl DB adaptor that works lazily. If using the Ensembl Perl API, a few possible roadblocks/problems might pop up. Ensembl currently requires bioperl (v1.2.3, but it works with the latest as well, at least when I've used it). If using the ensembl perl API we would just need to ensure we aren't conflicting with ensembl code that pulls in bioperl classes expecting a v1.2.3 API when we only support the latest. I don't foresee this being an issue, though (there is precedent for this, see Sendu's Ensembl module Bio::Tools::Run::Ensembl in bioperl-run). > Sure Ensembl provides their own Perl API, but for someone who doesn't want > to go through the hassle of installing it from CVS (pardon my french, but > wtf!?! Who still uses CVS) and learning a whole new API, it might be > convenient to have a simple BioPerl module for quickly grabbing gene family > alignments from the public Ensembl MySQL databases. 
I'd be willing to help
> write the necessary SQL queries for this.
>
> greg

The GSoC project on alignment subsystem refactoring will be finishing up this month, so I'm sure Jun can discuss ideas for initial DB-dependent implementations. The more input and coders implementing the better, IMO.

As for writing up an adaptor to ensembl outside of its API, overall I don't think it's a bad idea, but if it's possible maybe start without reinventing things, then move to direct SQL. Unless it's easier to use SQL.

chris

> On 6 August 2010 14:11, Jun Yin wrote:
>
>> Hi, Dave,
>>
>> Thx for reminding me this. I will definitely try it.
>>
>> Cheers,
>> Jun Yin
>> Ph.D. student in U.C.D.
>>
>> Bioinformatics Laboratory
>> Conway Institute
>> University College Dublin
>>
>>
>> -----Original Message-----
>> From: Dave Messina [mailto:David.Messina at sbc.su.se]
>> Sent: Friday, August 06, 2010 2:07 PM
>> To: Jun Yin
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences
>>
>> Sounds great, Jun!
>>
>> Did you happen to test your code on very large alignments? I know there's
>> one in Pfam that's something like 100,000 sequences. An rRNA, I believe.
>>
>>
>> Dave
>>
>>
>> __________ Information from ESET Smart Security, version of virus signature
>> database 5346 (20100806) __________
>>
>> The message was checked by ESET Smart Security.
>>
>> http://www.eset.com
>>
>>
>>
>>
>> __________ Information from ESET Smart Security, version of virus signature
>> database 5346 (20100806) __________
>>
>> The message was checked by ESET Smart Security.
>> >> http://www.eset.com >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hartzell at alerce.com Sat Aug 7 17:45:04 2010 From: hartzell at alerce.com (George Hartzell) Date: Sat, 7 Aug 2010 14:45:04 -0700 Subject: [Bioperl-l] BioPerl install issues In-Reply-To: References: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se> Message-ID: <19549.54240.499140.501136@gargle.gargle.HOWL> Chris Fields writes: > It should work fine. Even installing from trunk right now works > w/o failing tests. As a slight aside, if you're looking to build a current perl binary for your mac (e.g. 5.12.1) you should take a look at perlbrew (http://search.cpan.org/dist/App-perlbrew/). The three steps at the top of the installation section of the README are all you need to get going. Even a manager can do it. If you're using bash on the mac via terminal you'll probably want to put the one-liner they prescribe into your .bash_profile instead of your .bashrc, but everything else just flows right along. Once you have that in place you have a nicely isolated system into which you can install things to your hearts content without worrying about PERL5LIB and local::lib and the rest. g. From cjfields at illinois.edu Sat Aug 7 21:19:54 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 7 Aug 2010 20:19:54 -0500 Subject: [Bioperl-l] BioPerl install issues In-Reply-To: <19549.54240.499140.501136@gargle.gargle.HOWL> References: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se> <19549.54240.499140.501136@gargle.gargle.HOWL> Message-ID: On Aug 7, 2010, at 4:45 PM, George Hartzell wrote: > Chris Fields writes: >> It should work fine. Even installing from trunk right now works >> w/o failing tests. 
> > As a slight aside, if you're looking to build a current perl binary > for your mac (e.g. 5.12.1) you should take a look at perlbrew > (http://search.cpan.org/dist/App-perlbrew/). The three steps at the > top of the installation section of the README are all you need to get > going. Even a manager can do it. > > If you're using bash on the mac via terminal you'll probably want to > put the one-liner they prescribe into your .bash_profile instead of > your .bashrc, but everything else just flows right along. > > Once you have that in place you have a nicely isolated system into > which you can install things to your hearts content without worrying > about PERL5LIB and local::lib and the rest. > > g. Have to second using perlbrew, started using it for my local Ubuntu installation (don't have it running on my macbook yet, but it's in the plans). chris From greg at ebi.ac.uk Sun Aug 8 02:12:41 2010 From: greg at ebi.ac.uk (Gregory Jordan) Date: Sun, 8 Aug 2010 07:12:41 +0100 Subject: [Bioperl-l] Packages retrieving online alignment sequences In-Reply-To: <21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu> References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie> <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie> <21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu> Message-ID: On 7 August 2010 23:07, Chris Fields wrote: > > A simpler method could be introduced, but I can see that being potentially > brittle in the long run. A naked alphanumeric string doesn't reveal much > about what it is at face value w/o knowing database/service-specific > behavior. And then we're reliant on that behavior not changing, which we > can't guarantee (this has bitten us in the past). What would one do if NCBI > (for instance) allowed accessions derived completely of digits, or > conversely a unique ID with mixed alphanumerics? 
> > Using methods specific for ID/acc at least guarantees a behavior on the > backend w/o guessing, and if there is no danger of overlap (a service > accepts either/or) one could simply be an alias of the other. > Thanks for the clarification on IDs vs accessions. As long as the behavior and distinction are well-documented, I'm sure it won't make too much of a difference. My main concern was just that having two similar methods -- with no clearly laid out distinction between the two and one of them only supported by half of the implementing subclasses -- might confuse potential users. As a point of reference: both Rfam and Pfam allow either an ID or an accession in their front-page search interface (http://www.pfam.org / http://www.rfam.org/). In fact, they seem to entirely hide the distinction between ID and Accession from the end user; nowhere on the Rfam page for an individual result is it clear which string is the accession and which is the ID (http://rfam.sanger.ac.uk/family/snoZ107_R87). Thus, a potential user of the Rfam module wouldn't know whether to call the get_by_ID or get_by_Accession method, even after looking at the Rfam page for his / her desired alignment! As you can probably tell, I'm all in favor of a unified search whenever feasible / possible. :-) > As for writing up an adaptor to ensembl outside of it's API, overall I > don't think it's a bad idea, but if it's possible maybe start without > reinventing things, then move to direct SQL. Unless it's easier to use SQL. > > For fetching Ensembl's gene family alignments, using the SQL will be easiest. They don't tend to get unreasonably large in terms of memory -- I think the biggest tend to be ~700 sequences with a few thousand alignment columns or so -- and it's a simple table join or two to get both the tree and alignment from the database. For genomic alignments, I agree that a more memory-efficient and/or lazy backend would be necessary. 
And it's pretty much impossible to get those things out of the Ensembl tables without using their API. --greg From dan.kortschak at adelaide.edu.au Sun Aug 8 20:53:43 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Mon, 09 Aug 2010 10:23:43 +0930 Subject: [Bioperl-l] MUMmer parser work In-Reply-To: <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu> References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au> <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu> Message-ID: <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au> Hi Chris, Is that set of files planned to be included in the git repository on bioperl-live? I don't want to push something that is being organised by someone else. cheers Dan On Thu, 2010-08-05 at 22:13 -0500, Chris Fields wrote: > Dan, > > Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2701 > > It currently lacks significant tests, so feel free to chip in there as needed. > > chris From genehack at genehack.org Sun Aug 8 21:42:27 2010 From: genehack at genehack.org (John SJ Anderson) Date: Sun, 8 Aug 2010 21:42:27 -0400 Subject: [Bioperl-l] MUMmer parser work In-Reply-To: <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au> References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au> <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu> <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org> I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. j. On Aug 8, 2010, at 20:53 , Dan Kortschak wrote: > Hi Chris, > > Is that set of files planned to be included in the git repository on > bioperl-live? I don't want to push something that is being organised by > someone else. 
> > cheers > Dan > > On Thu, 2010-08-05 at 22:13 -0500, Chris Fields wrote: >> Dan, >> >> Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in: >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2701 >> >> It currently lacks significant tests, so feel free to chip in there as needed. >> >> chris > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From dan.kortschak at adelaide.edu.au Sun Aug 8 22:03:52 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Mon, 09 Aug 2010 11:33:52 +0930 Subject: [Bioperl-l] MUMmer parser work In-Reply-To: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org> References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au> <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu> <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au> <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org> Message-ID: <1281319432.2414.49.camel@zoidberg.mbs.adelaide.edu.au> Excellent. Thanks for that. Dan On Sun, 2010-08-08 at 21:42 -0400, John SJ Anderson wrote: > I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. > > j. From cjfields at illinois.edu Mon Aug 9 22:40:07 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Aug 2010 21:40:07 -0500 Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio Message-ID: Any objections to moving the Bio directory to lib/Bio in bioperl-live? It's a more standard location for code in most distributions; I have a branch (topic/cjfields_standard_lib) that has this working, though it's possible that it needs more work. 
chris From genehack at genehack.org Tue Aug 10 04:30:44 2010 From: genehack at genehack.org (John SJ Anderson) Date: Tue, 10 Aug 2010 04:30:44 -0400 Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio In-Reply-To: References: Message-ID: On Aug 9, 2010, at 22:40 , Chris Fields wrote: > Any objections to moving the Bio directory to lib/Bio in bioperl-live? +1 on this idea. j. From genehack at genehack.org Tue Aug 10 07:21:51 2010 From: genehack at genehack.org (John Anderson) Date: Tue, 10 Aug 2010 07:21:51 -0400 Subject: [Bioperl-l] MUMmer parser work In-Reply-To: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org> References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au> <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu> <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au> <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org> Message-ID: <7A4F93AB-1BF7-4775-BC0E-38E7B431ECC6@genehack.org> On Aug 8, 2010, at 9:42 PM, John SJ Anderson wrote: > I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. Okay, the files have been added to topic/bug-2701 -- see . Please note, these are just the files from the bug report, slotted into the appropriate spots. I haven't reviewed the code or done anything about the non-BioPerl-y tests or the general lack of test coverage. I hope to do something about that in the coming week, but if somebody beats me to it, that would be okay too. j. From maj at fortinbras.us Tue Aug 10 19:52:05 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Tue, 10 Aug 2010 19:52:05 -0400 Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio In-Reply-To: References: Message-ID: <1C55239986494A8D82BDC21A85B324E9@NewLife> +1 ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Monday, August 09, 2010 10:40 PM Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio > Any objections to moving the Bio directory to lib/Bio in bioperl-live? It's a > more standard location for code in most distributions; I have a branch > (topic/cjfields_standard_lib) that has this working, though it's possible that > it needs more work. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From fayroz_farouk at yahoo.com Sun Aug 8 04:24:31 2010 From: fayroz_farouk at yahoo.com (fayroz) Date: Sun, 8 Aug 2010 01:24:31 -0700 (PDT) Subject: [Bioperl-l] using HMMER Message-ID: <603590.1072.qm@web112620.mail.gq1.yahoo.com> I need your help. I am a new Perl user and want to use BioPerl modules to run the HMMER program (hmmsearch). I have "model.hmm" and a FASTA file, and I want to see which of the sequences are similar to the model. I wrote this code, but there is a problem: #!/usr/local/bin/perl W use Bio::AlignIO; use Bio::SearchIO; use Bio::SeqIO ; use Bio::Tools::Run::Hmmer; # run hmmsearch (similar for hmmpfam) my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'h6_avian.hmm',-informat => 'fasta'); my $seq = Bio::SeqIO->new('-file'=> "one_seq.fa", '-format'=>'Fasta'); # Pass the factory a Bio::Seq object or a file name, returns a Bio::SearchIO my $searchio = $factory->hmmsearch($seq); while (my $result = $searchio->next_result){ while(my $hit = $result->next_hit){ while (my $hsp = $hit->next_hsp){ print join("\t", ( $result->query_name, $hsp->query->start, $hsp->query->end, $hit->name, $hsp->hit->start, $hsp->hit->end, $hsp->score, $hsp->evalue, $hsp->seq_str, )), "\n"; } } } exceptions: MSG: Unknown kind of
input 'Bio::SeqIO::fasta=HASH(0x329a504)' STACK Bio::Tools::Run::Hmmer::_setinput D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:381 STACK Bio::Tools::Run::Hmmer::hmmsearch D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:352 STACK toplevel test_bioperl.pl:12 thank you fayroz From douglas.hoen at gmail.com Tue Aug 10 21:54:53 2010 From: douglas.hoen at gmail.com (Douglas Hoen) Date: Tue, 10 Aug 2010 21:54:53 -0400 Subject: [Bioperl-l] Bio::SeqFeature::SimilarityPair->from_searchResult()? Message-ID: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com> Hi, I was wondering why the Synopsis in the docs for Bio::SeqFeature::SimilarityPair has the following: $sim_pair = Bio::SeqFeature::SimilarityPair->from_searchResult($blastHit); There doesn't actually seem to be a from_searchResult method. Am I missing something? Thanks, -- Doug From zhaoy at mail.cbi.pku.edu.cn Wed Aug 11 04:17:42 2010 From: zhaoy at mail.cbi.pku.edu.cn (zhaoy at mail.cbi.pku.edu.cn) Date: Wed, 11 Aug 2010 16:17:42 +0800 (CST) Subject: [Bioperl-l] About extracting sequence from genewise format result Message-ID: <53663.162.105.250.100.1281514662.squirrel@mail.cbi.pku.edu.cn> Dear authors: Hello! Recently I have been trying to parse genewise-format results to extract the nucleotide sequence using the "hit_string" method in the "SearchIO" module; however, the result is empty. Worse, the loop does not seem to work, because I always get the last result. I'm confused.
My perl code is shown below: #!/usr/bin/perl -w use strict; use warnings; use Bio::SearchIO; my $in = new Bio::SearchIO(-format => 'wise', -wisetype => 'genewise', -file => 'test'); while( my $result = $in->next_result ) { while (my $hit = $result->next_hit) { while (my $hsp = $hit->next_hsp){ print "Query=", $result->query_name, "\n", "Length=", $hsp->length('total'),"\n", "hit_string:", $hsp->hit_string, "\n"; } } } And one of the genewise format results is shown below: genewise $Name: wise2-4-0alpha $ (unreleased release) This program is freely distributed under a GPL. See source directory Copyright (c) GRL limited: portions of the code are from separate copyright Query protein: Cpa_s110_24 Comp Matrix: BLOSUM62.bla Gap open: 12 Gap extension: 2 Start/End global Target Sequence Bdi_chr3:38292015..38292302 Strand: forward Start/End (protein) global Gene Parameter file: gene.stat Splice site model: GT/AG only Codon Table: codon.table Subs error: 1e-06 Indel error: 1e-06 Null model syn Algorithm 623 genewise output Score 37.97 bits over entire alignment Scores as bits over a synchronous coding model Warning: The bits scores is not probablistically correct for single seqs See WWW help for more info Cpa_s110_24 1 MGNCQAVDAATLAIQHPS-GKVDRLYWPVSASEVMRTNPGHYVALLI-- MGNCQA DAA + IQHP+ GKV+RLYWP +A++VMR NPGHYVAL++ MGNCQAADAAAVVIQHPAEGKVERLYWPATAADVMRKNPGHYVALVVVH Bdi_chr3:382920 1 agatcggggggggacccgggaggccttcgaggggacaacgctggcgggc tgagaccaccctttaaccagatagtagcccccattgaacgaatctttta gctcgggtggcggcgcgcgggcgcccggccgcccgcgcccccccccccc Cpa_s110_24 47 ----STTLCPSNSNASNAESVRVTRIKLLRPTDTLVLGQVYRLITTQEV P+ + A + R+T++KLL+P DTL++GQVYRLIT+Q VSGGAGETDPAVAGGGAAAAARITKVKLLKPRDTLLIGQVYRLITSQ-- Bdi_chr3:382920 148 gtgggggagcgggggggggggaaaagaccaccgaccagcgtccaatc tcggcgacacctcgggcccccgtcatattacgactttgatagttcca cctcctgtcccacaaaattccgccgcgccgcgctgcccgccccccca Cpa_s110_24 92 MKGLWAKKCAKMKKYQEADHKDGLKPETIPGRRSGPERDTQVAKHERHR ------------------------------------------------- Bdi_chr3:382920 289 
Cpa_s110_24 141 SRVAASTNQAGLKSRTWQPSLKSISEAAS ----------------------------- Bdi_chr3:382920 289 // Gene 1 Gene 1 288 Exon 1 288 phase 0 Supporting 1 54 1 18 Supporting 58 141 19 46 Supporting 160 288 47 89 // ...... The part of output of this code is shown below: Query=Aly_481360 Length=0 hit_string: Query=Aly_481360 Length=0 hit_string: ...... What's wrong with my code and how can I get the correct result? I'm looking forward to your reply. Thanks very much! Best regards, Zackaly From roy.chaudhuri at gmail.com Wed Aug 11 10:32:39 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Wed, 11 Aug 2010 15:32:39 +0100 Subject: [Bioperl-l] using HMMER In-Reply-To: <603590.1072.qm@web112620.mail.gq1.yahoo.com> References: <603590.1072.qm@web112620.mail.gq1.yahoo.com> Message-ID: <4C62B487.9090103@gmail.com> Hi Fayroz, Your $seq variable contains a Bio::SeqIO object (a biological filehandle), not a Bio::Seq (sequence object). You need to change that line to: my $seqio = Bio::SeqIO->new(-file=>'one_seq.fa', -format=>'fasta'); my $seq=$seqio->next_seq; If you have multiple sequences in the file, then you will need to loop over them: while (my $seq=$seqio->next_seq) { # Code to run Hmmer goes here } Also, I don't think you need to specify -informat for your Bio::Tools::Run::Hmmer object, since you're passing it a sequence object, not a filename. Hope this helps. Roy. 
On 08/08/2010 09:24, fayroz wrote: > i need your help, i am a new perl user and want to use bioperl modules to run > HMMER program ( HMMsearch) i have" model.hmm" and a "fasta file" to see which of > them are similar with the model > i write this code but there is a problems > > #!/usr/local/bin/perl W > use Bio::AlignIO; > use Bio::SearchIO; > use Bio::SeqIO ; > use Bio::Tools::Run::Hmmer; > > # run hmmsearch (similar for hmmpfam) > my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'h6_avian.hmm',-informat => > 'fasta'); > my $seq = Bio::SeqIO->new('-file'=> "one_seq.fa", '-format'=>'Fasta'); > > # Pass the factory a Bio::Seq object or a file name, returns a Bio::SearchIO > my $searchio = $factory->hmmsearch($seq); > > while (my $result = $searchio->next_result){ > while(my $hit = $result->next_hit){ > while (my $hsp = $hit->next_hsp){ > print join("\t", ( $result->query_name, > $hsp->query->start, > $hsp->query->end, > $hit->name, > $hsp->hit->start, > $hsp->hit->end, > $hsp->score, > $hsp->evalue, > $hsp->seq_str, > )), "\n"; > } > } > } > > > exceptions: > MSG: Unknown kind of input 'Bio::SeqIO::fasta=HASH(0x329a504)' > STACK Bio::Tools::Run::Hmmer::_setinput > D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:381 > STACK Bio::Tools::Run::Hmmer::hmmsearch > D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:352 > STACK toplevel test_bioperl.pl:12 > thank you > > fayroz > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Aug 11 11:07:36 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 11 Aug 2010 10:07:36 -0500 Subject: [Bioperl-l] using HMMER In-Reply-To: <4C62B487.9090103@gmail.com> References: <603590.1072.qm@web112620.mail.gq1.yahoo.com> <4C62B487.9090103@gmail.com> Message-ID: <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu> might also want to check whether you are using hmmer2 vs hmmer3. 
not sure if the wrapper works for hmmer3. chris On Aug 11, 2010, at 9:32 AM, Roy Chaudhuri wrote: > Hi Fayroz, > > Your $seq variable contains a Bio::SeqIO object (a biological filehandle), not a Bio::Seq (sequence object). > > You need to change that line to: > my $seqio = Bio::SeqIO->new(-file=>'one_seq.fa', -format=>'fasta'); > my $seq=$seqio->next_seq; > > If you have multiple sequences in the file, then you will need to loop over them: > while (my $seq=$seqio->next_seq) { > # Code to run Hmmer goes here > } > > Also, I don't think you need to specify -informat for your Bio::Tools::Run::Hmmer object, since you're passing it a sequence object, not a filename. > > Hope this helps. > Roy. > > On 08/08/2010 09:24, fayroz wrote: >> i need your help, i am a new perl user and want to use bioperl modules to run >> HMMER program ( HMMsearch) i have" model.hmm" and a "fasta file" to see which of >> them are similar with the model >> i write this code but there is a problems >> >> #!/usr/local/bin/perl W >> use Bio::AlignIO; >> use Bio::SearchIO; >> use Bio::SeqIO ; >> use Bio::Tools::Run::Hmmer; >> >> # run hmmsearch (similar for hmmpfam) >> my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'h6_avian.hmm',-informat => >> 'fasta'); >> my $seq = Bio::SeqIO->new('-file'=> "one_seq.fa", '-format'=>'Fasta'); >> >> # Pass the factory a Bio::Seq object or a file name, returns a Bio::SearchIO >> my $searchio = $factory->hmmsearch($seq); >> >> while (my $result = $searchio->next_result){ >> while(my $hit = $result->next_hit){ >> while (my $hsp = $hit->next_hsp){ >> print join("\t", ( $result->query_name, >> $hsp->query->start, >> $hsp->query->end, >> $hit->name, >> $hsp->hit->start, >> $hsp->hit->end, >> $hsp->score, >> $hsp->evalue, >> $hsp->seq_str, >> )), "\n"; >> } >> } >> } >> >> >> exceptions: >> MSG: Unknown kind of input 'Bio::SeqIO::fasta=HASH(0x329a504)' >> STACK Bio::Tools::Run::Hmmer::_setinput >> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:381 >> STACK 
Bio::Tools::Run::Hmmer::hmmsearch >> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:352 >> STACK toplevel test_bioperl.pl:12 >> thank you >> >> fayroz >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From douglas.hoen at gmail.com Wed Aug 11 15:13:49 2010 From: douglas.hoen at gmail.com (Doug) Date: Wed, 11 Aug 2010 12:13:49 -0700 (PDT) Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA? Message-ID: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> Hi, I am trying to store in a SeqFeature::Store database the results of searches of translated DNA. The DB contains the original DNA sequences. For instance, I have done HMMER searches of 6-frame translations of the sequences stored in the DB. I want to store these results "at" their (equivalent) DNA positions, which I can calculate. Preferably, I would like to directly store the SeqFeature::Similarity objects that I get from parsing these searches. But they are of course located on different coordinate systems than the DNA, so I guess I can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct DNA position and then store the Similarity's as sub-SeqFeatures. I could just set the Similarity's position to the (calculated) DNA coordinates, or alternately make a new SeqFeature and copy in the attributes I want. But is there a more elegant solution? Thanks, -- Doug From douglas.hoen at gmail.com Wed Aug 11 16:11:26 2010 From: douglas.hoen at gmail.com (Doug) Date: Wed, 11 Aug 2010 13:11:26 -0700 (PDT) Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA? 
In-Reply-To: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> Message-ID: One possible answer to my own question: Use Bio::SeqFeature::PositionProxy's? Would this work? On Aug 11, 3:13 pm, Doug wrote: > Hi, > > I am trying to store in a SeqFeature::Store database the results of > searches of translated DNA. The DB contains the original DNA > sequences. For instance, I have done HMMER searches of 6-frame > translations of the sequences stored in the DB. I want to store these > results "at" their (equivalent) DNA positions, which I can calculate. > Preferably, I would like to directly store the SeqFeature::Similarity > objects that I get from parsing these searches. But they are of course > located on different coordinate systems than the DNA, so I guess I > can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct > DNA position and then store the Similarity's as sub-SeqFeatures. > > I could just set the Similarity's position to the (calculated) DNA > coordinates, or alternately make a new SeqFeature and copy in the > attributes I want. But is there a more elegant solution? > > Thanks, > -- Doug > _______________________________________________ > Bioperl-l mailing list > Bioper... at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Wed Aug 11 16:16:22 2010 From: scott at scottcain.net (Scott Cain) Date: Wed, 11 Aug 2010 16:16:22 -0400 Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA? In-Reply-To: References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> Message-ID: Hi Doug, I don't know if any of the things you've thought of would work; I've never tried it. My inclination would be to express your data in GFF3 and use the standard loader.
Scott On Wed, Aug 11, 2010 at 4:11 PM, Doug wrote: > One possible answer to my own question: Use > Bio::SeqFeature::PositionProxy's? Would this work? > > On Aug 11, 3:13 pm, Doug wrote: >> Hi, >> >> I am trying to store in a SeqFeature::Store database the results of >> searches of translated DNA. The DB contains the original DNA >> sequences. For instance, I have done HMMER searches of 6-frame >> translations of the sequences stored in the DB. I want to store these >> results "at" their (equivalent) DNA positions, which I can calculate. >> Preferably, I would like to directly store the SeqFeature::Similarity >> objects that I get from parsing these searches. But they are of course >> located on different coordinate systems than the DNA, so I guess I >> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct >> DNA position and then store the Similarity's as sub-SeqFeatures. >> >> I could just set the Similarity's position to the (calculated) DNA >> coordinates, or alternately make a new SeqFeature and copy in the >> attributes I want. But is there a more elegant solution? >> >> Thanks, >> -- Doug >> _______________________________________________ >> Bioperl-l mailing list >> Bioper... at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From douglas.hoen at gmail.com Wed Aug 11 16:38:54 2010 From: douglas.hoen at gmail.com (Doug) Date: Wed, 11 Aug 2010 13:38:54 -0700 (PDT) Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
In-Reply-To: References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> Message-ID: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> Hi Scott, Good idea. Would you happen to know of an existing HMMER3 to GFF3 converter? Thanks for your advice, -- Doug On Aug 11, 4:16 pm, Scott Cain wrote: > Hi Doug, > > I don't know if any of the things you've thought of would work; I've > never tried it. My inclination would be to express your data in GFF3 > and use the standard loader. > > Scott > > > > > > On Wed, Aug 11, 2010 at 4:11 PM, Doug wrote: > > One possible answer to my own question: Use > > Bio::SeqFeature::PositionProxy's? Would this work? > > > On Aug 11, 3:13 pm, Doug wrote: > >> Hi, > > >> I am trying to store in a SeqFeature::Store database the results of > >> searches of translated DNA. The DB contains the original DNA > >> sequences. For instance, I have done HMMER searches of 6-frame > >> translations of the sequences stored in the DB. I want to store these > >> results "at" their (equivalent) DNA positions, which I can calculate. > >> Preferably, I would like to directly store the SeqFeature::Similarity > >> objects that I get from parsing these searches. But they are of course > >> located on different coordinate systems than the DNA, so I guess I > >> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct > >> DNA position and then store the Similarity's as sub-SeqFeatures. > > >> I could just set the Similarity's position to the (calculated) DNA > >> coordinates, or alternately make a new SeqFeature and copy in the > >> attributes I want. But is there a more elegant solution? > > >> Thanks, > >> -- Doug > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioper... at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > > Bioperl-l mailing list > > Bioper...
at lists.open-bio.org > >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > Bioperl-l mailing list > Bioper... at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From douglas.hoen at gmail.com Wed Aug 11 16:53:35 2010 From: douglas.hoen at gmail.com (Doug) Date: Wed, 11 Aug 2010 13:53:35 -0700 (PDT) Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA? In-Reply-To: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> Message-ID: One more note: I did try using PositionProxy but it failed. It doesn't implement seq_id() and so can't be stored in the DB: ------------- EXCEPTION: Bio::Root::NotImplemented ------------- MSG: Abstract method "Bio::SeqFeatureI::seq_id" is not implemented by package Bio::SeqFeature::PositionProxy. This is not your fault - author of Bio::SeqFeature::PositionProxy should be blamed! ... On Aug 11, 4:38 pm, Doug wrote: > Hi Scott, > > Good idea. Would you happen to know of an existing HMMER3 to GFF3 > converter? > > Thanks for your advice, > -- Doug > > On Aug 11, 4:16 pm, Scott Cain wrote: > > > > > > > Hi Doug, > > > I don't know if any of the things you've thought of would work; I've > > never tried it. My inclination would be to express your data in GFF3 > > and use the standard loader. > > > Scott > > > On Wed, Aug 11, 2010 at 4:11 PM, Doug wrote: > > > One possible answer to my own question: Use > > > Bio::SeqFeature::PositionProxy's? Would this work?
> > > > On Aug 11, 3:13 pm, Doug wrote: > > >> Hi, > > > >> I am trying to store in a SeqFeature::Store database the results of > > >> searches of translated DNA. The DB contains the original DNA > > >> sequences. For instance, I have done HMMER searches of 6-frame > > >> translations of the sequences stored in the DB. I want to store these > > >> results "at" their (equivalent) DNA positions, which I can calculate. > > >> Preferably, I would like to directly store the SeqFeature::Similarity > > >> objects that I get from parsing these searches. But they are of course > > >> located on different coordinate systems than the DNA, so I guess I > > >> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct > > >> DNA position and then store the Similarity's as sub-SeqFeatures. > > > >> I could just set the Similarity's position to the (calculated) DNA > > >> coordinates, or alternately make a new SeqFeature and copy in the > > >> attributes I want. But is there a more elegant solution? > > > >> Thanks, > > >> -- Doug > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioper... at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioper... at lists.open-bio.org > > >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. scott at scottcain dot net > > GMOD Coordinator (http://gmod.org/) 216-392-3087 > > Ontario Institute for Cancer Research > > > _______________________________________________ > > Bioperl-l mailing list > > Bioper... at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioper...
at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Aug 11 16:45:00 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 11 Aug 2010 15:45:00 -0500 Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA? In-Reply-To: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> Message-ID: <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu> HMMER3 is parsed by Bio::SearchIO now in bioperl-live, and I think there is a generic SearchIO->GFF3 script floating around the intertubes somewheres... chris On Aug 11, 2010, at 3:38 PM, Doug wrote: > Hi Scott, > > Good idea. Would you happen to know of an existing HMMER3 to GFF3 > converter? > > Thanks for your advice, > -- Doug > > On Aug 11, 4:16 pm, Scott Cain wrote: >> Hi Doug, >> >> I don't know if any of the things you've thought of would work; I've >> never tried it. My inclination would be to express your data in GFF3 >> and use the standard loader. >> >> Scott >> >> >> >> >> >> On Wed, Aug 11, 2010 at 4:11 PM, Doug wrote: >>> One possible answer to my own question: Use >>> Bio::SeqFeature::PositionProxy's? Would this work? >> >>> On Aug 11, 3:13 pm, Doug wrote: >>>> Hi, >> >>>> I am trying to store in a SeqFeature::Store database the results of >>>> searches of translated DNA. The DB contains the original DNA >>>> sequences. For instance, I have done HMMER searches of 6-frame >>>> translations of the sequences stored in the DB. I want to store these >>>> results "at" their (equivalent) DNA positions, which I can calculate. >>>> Preferably, I would like to directly store the SeqFeature::Similarity >>>> objects that I get from parsing these searches. 
But they are of course >>>> located on different coordinate systems than the DNA, so I guess I >>>> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct >>>> DNA position and then store the Similarity's as sub-SeqFeatures. >> >>>> I could just set the Similarity's position to the (calculated) DNA >>>> coordinates, or alternately make a new SeqFeature and copy in the >>>> attributes I want. But is there a more elegant solution? >> >>>> Thanks, >>>> -- Doug >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l >> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioper... at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. scott at scottcain dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Wed Aug 11 17:05:25 2010 From: scott at scottcain.net (Scott Cain) Date: Wed, 11 Aug 2010 17:05:25 -0400 Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA? In-Reply-To: <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu> References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu> Message-ID: Um, yeah, it's in bioperl: bp_search2gff.pl. 
Scott

On Wed, Aug 11, 2010 at 4:45 PM, Chris Fields wrote:
> HMMER3 is parsed by Bio::SearchIO now in bioperl-live, and I think there is a generic SearchIO->GFF3 script floating around the intertubes somewheres...
>
> chris
>
> On Aug 11, 2010, at 3:38 PM, Doug wrote:
>
>> Hi Scott,
>>
>> Good idea. Would you happen to know of an existing HMMER3 to GFF3
>> converter?
>>
>> Thanks for your advice,
>> -- Doug
>>
>> On Aug 11, 4:16 pm, Scott Cain wrote:
>>> Hi Doug,
>>>
>>> I don't know if any of the things you've thought of would work; I've
>>> never tried it. My inclination would be to express your data in GFF3
>>> and use the standard loader.
>>>
>>> Scott
>>>
>>> On Wed, Aug 11, 2010 at 4:11 PM, Doug wrote:
>>>> One possible answer to my own question: Use
>>>> Bio::SeqFeature::PositionProxy's? Would this work?
>>>
>>>> On Aug 11, 3:13 pm, Doug wrote:
>>>>> Hi,
>>>
>>>>> I am trying to store in a SeqFeature::Store database the results of
>>>>> searches of translated DNA. The DB contains the original DNA
>>>>> sequences. For instance, I have done HMMER searches of 6-frame
>>>>> translations of the sequences stored in the DB. I want to store these
>>>>> results "at" their (equivalent) DNA positions, which I can calculate.
>>>>> Preferably, I would like to directly store the SeqFeature::Similarity
>>>>> objects that I get from parsing these searches. But they are of course
>>>>> located on different coordinate systems than the DNA, so I guess I
>>>>> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
>>>>> DNA position and then store the Similarity's as sub-SeqFeatures.
>>>
>>>>> I could just set the Similarity's position to the (calculated) DNA
>>>>> coordinates, or alternately make a new SeqFeature and copy in the
>>>>> attributes I want. But is there a more elegant solution?
>>>
>>>>> Thanks,
>>>>> -- Doug
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioper... at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioper... at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> --
>>> ------------------------------------------------------------------------
>>> Scott Cain, Ph. D.                          scott at scottcain dot net
>>> GMOD Coordinator (http://gmod.org/)         216-392-3087
>>> Ontario Institute for Cancer Research
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioper... at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu Wed Aug 11 17:07:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 16:07:20 -0500
Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
In-Reply-To:
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
Message-ID:

For some reason I thought there was a more up-to-date one somewhere.
Ah well, can't keep track of all the code in bioperl :> chris On Aug 11, 2010, at 4:05 PM, Scott Cain wrote: > Um, yeah, it's in bioperl: bp_search2gff.pl. > > Scott > > > On Wed, Aug 11, 2010 at 4:45 PM, Chris Fields wrote: >> HMMER3 is parsed by Bio::SearchIO now in bioperl-live, and I think there is a generic SearchIO->GFF3 script floating around the intertubes somewheres... >> >> chris >> >> On Aug 11, 2010, at 3:38 PM, Doug wrote: >> >>> Hi Scott, >>> >>> Good idea. Would you happen to know of an existing HMMER3 to GFF3 >>> converter? >>> >>> Thanks for your advice, >>> -- Doug >>> >>> On Aug 11, 4:16 pm, Scott Cain wrote: >>>> Hi Doug, >>>> >>>> I don't know if any of the things you've thought of would work; I've >>>> never tried it. My inclination would be to express your data in GFF3 >>>> and use the standard loader. >>>> >>>> Scott >>>> >>>> >>>> >>>> >>>> >>>> On Wed, Aug 11, 2010 at 4:11 PM, Doug wrote: >>>>> One possible answer to my own question: Use >>>>> Bio::SeqFeature::PositionProxy's? Would this work? >>>> >>>>> On Aug 11, 3:13 pm, Doug wrote: >>>>>> Hi, >>>> >>>>>> I am trying to store in a SeqFeature::Store database the results of >>>>>> searches of translated DNA. The DB contains the original DNA >>>>>> sequences. For instance, I have done HMMER searches of 6-frame >>>>>> translations of the sequences stored in the DB. I want to store these >>>>>> results "at" their (equivalent) DNA positions, which I can calculate. >>>>>> Preferably, I would like to directly store the SeqFeature::Similarity >>>>>> objects that I get from parsing these searches. But they are of course >>>>>> located on different coordinate systems than the DNA, so I guess I >>>>>> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct >>>>>> DNA position and then store the Similarity's as sub-SeqFeatures. 
>>>> >>>>>> I could just set the Similarity's position to the (calculated) DNA >>>>>> coordinates, or alternately make a new SeqFeature and copy in the >>>>>> attributes I want. But is there a more elegant solution? >>>> >>>>>> Thanks, >>>>>> -- Doug >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioper... at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> -- >>>> ------------------------------------------------------------------------ >>>> Scott Cain, Ph. D. scott at scottcain dot net >>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>> Ontario Institute for Cancer Research >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research From douglas.hoen at gmail.com Wed Aug 11 17:11:20 2010 From: douglas.hoen at gmail.com (Douglas Hoen) Date: Wed, 11 Aug 2010 17:11:20 -0400 Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA? 
In-Reply-To: References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu> Message-ID: Great, thanks so much for the info. On 2010-08-11, at 5:05 PM, Scott Cain wrote: > Um, yeah, it's in bioperl: bp_search2gff.pl. > > Scott > > > On Wed, Aug 11, 2010 at 4:45 PM, Chris Fields wrote: >> HMMER3 is parsed by Bio::SearchIO now in bioperl-live, and I think there is a generic SearchIO->GFF3 script floating around the intertubes somewheres... >> >> chris >> >> On Aug 11, 2010, at 3:38 PM, Doug wrote: >> >>> Hi Scott, >>> >>> Good idea. Would you happen to know of an existing HMMER3 to GFF3 >>> converter? >>> >>> Thanks for your advice, >>> -- Doug >>> >>> On Aug 11, 4:16 pm, Scott Cain wrote: >>>> Hi Doug, >>>> >>>> I don't know if any of the things you've thought of would work; I've >>>> never tried it. My inclination would be to express your data in GFF3 >>>> and use the standard loader. >>>> >>>> Scott >>>> >>>> >>>> >>>> >>>> >>>> On Wed, Aug 11, 2010 at 4:11 PM, Doug wrote: >>>>> One possible answer to my own question: Use >>>>> Bio::SeqFeature::PositionProxy's? Would this work? >>>> >>>>> On Aug 11, 3:13 pm, Doug wrote: >>>>>> Hi, >>>> >>>>>> I am trying to store in a SeqFeature::Store database the results of >>>>>> searches of translated DNA. The DB contains the original DNA >>>>>> sequences. For instance, I have done HMMER searches of 6-frame >>>>>> translations of the sequences stored in the DB. I want to store these >>>>>> results "at" their (equivalent) DNA positions, which I can calculate. >>>>>> Preferably, I would like to directly store the SeqFeature::Similarity >>>>>> objects that I get from parsing these searches. But they are of course >>>>>> located on different coordinate systems than the DNA, so I guess I >>>>>> can't (or shouldn't) create a SeqFeature (e.g. 
Generic) at the correct >>>>>> DNA position and then store the Similarity's as sub-SeqFeatures. >>>> >>>>>> I could just set the Similarity's position to the (calculated) DNA >>>>>> coordinates, or alternately make a new SeqFeature and copy in the >>>>>> attributes I want. But is there a more elegant solution? >>>> >>>>>> Thanks, >>>>>> -- Doug >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioper... at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> -- >>>> ------------------------------------------------------------------------ >>>> Scott Cain, Ph. D. scott at scottcain dot net >>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>> Ontario Institute for Cancer Research >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)         216-392-3087
> Ontario Institute for Cancer Research

From Russell.Smithies at agresearch.co.nz Wed Aug 11 17:31:32 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 12 Aug 2010 09:31:32 +1200
Subject: [Bioperl-l] AlignIO and Gbrowse_syn
In-Reply-To:
References:
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>

I know there was some brief discussion about .maf format a few weeks ago, but I've had an enquiry (as below) from a colleague.
If GBrowse_syn is using .maf format, does AlignIO need more work?
Any comments?

--Russell


I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output). GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce clustalw, I've tried parsing LASTZ maf output to clustalw (and other alignment formats) using AlignIO; however, I run into the following issues:

* Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification).
* The coordinate system for reverse strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-rev complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result all reverse match coordinates in the resulting clustalw output file are incorrect.
* AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them.

I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned.
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From cjfields at illinois.edu Wed Aug 11 18:02:38 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 17:02:38 -0500
Subject: [Bioperl-l] AlignIO and Gbrowse_syn
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
Message-ID:

Russell,

We have had very few requests to support .maf until recently, which is why there has been little done with it. We welcome any help to improve it.

chris

On Aug 11, 2010, at 4:31 PM, Smithies, Russell wrote:

> I know there was some brief discussion about .maf format a few weeks ago, but I've had an enquiry (as below) from a colleague.
> If GBrowse_syn is using .maf format, does AlignIO need more work?
> Any comments?
>
> --Russell
>
>
> I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output).
GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database) . Since LASTZ doesn't produce clustalw, I've tried parsing LASTZ maf output to clustalw (and other alignment formats) using AlignIO, however I run into the following issues: > *Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification) > *The coordinate system for reverse strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-rev complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result all reverse match coordinates in the resulting clustalw output file are incorrect. > *AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them > > I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned. > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. 
If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From douglas.hoen at gmail.com Thu Aug 12 01:59:37 2010
From: douglas.hoen at gmail.com (Doug Hoen)
Date: Wed, 11 Aug 2010 22:59:37 -0700 (PDT)
Subject: [Bioperl-l] HMMER3 to GFF3
Message-ID: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com>

Hi,

I am trying to convert HMMER3 (hmmscan) output files into GFF3 files. Based on previous advice (see the thread, "How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?"), I have installed bioperl-live for its new HMMER3 parsing capabilities (in SearchIO) and am trying to use bp_search2gff.pl to do the file conversion.

The hmmscan was done on translated chromosome sequences with conserved domain models. I want to get the GFF 'start' and 'end' columns to be based on these coordinates, not those of the models. To do this (with my files), it seems I need to use the option "--type hit". However, this changes the "Target" sequence name from the model name to the chromosome name, and the model name does not appear anywhere in the output (see below).

Could someone please confirm whether the results are incorrect and, if so, perhaps suggest a fix? It may well be that this problem is due to the unusual way I am using hmmscan, rather than a problem with HMMER3 parsing...?

Many thanks,
-- Doug

========================================================

Here's what it looks like if I do *not* use the "--type hit" option. (RVT_2 is a conserved domain name. I need this in the output.)

COMMAND:
------------------
bp_search2gff.pl -i ../chr1-tesigsv2.hmmscan -o chr1-tesigsv2-hmmscan-original-locations-v2.gff3 --format hmmer3 --source HMMER3 --version 3 --component

OUTPUT:
------------------
==> chr1-tesigsv2-hmmscan-original-locations-v2.gff3 <==
##gff-version 3
Chr1_1  chromosome  Component   1   10142557  .      .  1  sequence=Chr1_1
Chr1_1  HMMER3      similarity  1   245       307.3  .  0  Target=Sequence:RVT_2 1898330 1898579
Chr1_1  HMMER3      similarity  1   244       329.5  .  0  Target=Sequence:RVT_2 2573551 2573796
Chr1_1  HMMER3      similarity  1   245       308.8  .  0  Target=Sequence:RVT_2 3159685 3159930
Chr1_1  HMMER3      similarity  1   102       108.2  .  0  Target=Sequence:RVT_2 3438684 3438791
Chr1_1  HMMER3      similarity  2   245       277.2  .  0  Target=Sequence:RVT_2 3566642 3566891
Chr1_1  HMMER3      similarity  13  213       251.4  .  0  Target=Sequence:RVT_2 4251160 4251373
Chr1_1  HMMER3      similarity  1   244       310.6  .  0  Target=Sequence:RVT_2 4252791 4253036
Chr1_1  HMMER3      similarity  6   99        94.2   .  0  Target=Sequence:RVT_2 4271555 4271653

========================================================

And here's what it looks like if I *do* use the "--type hit" option. The coordinates look good but the model name has disappeared (and the Target=Sequence seems wrong).

COMMAND:
------------------
bp_search2gff.pl -i ../chr1-tesigsv2.hmmscan -o chr1-tesigsv2-hmmscan-original-locations-v3.gff3 --format hmmer3 --type hit --source HMMER3 --version 3 --component

OUTPUT:
------------------
==> chr1-tesigsv2-hmmscan-original-locations-v3.gff3 <==
##gff-version 3
RVT_2  HMMER3  similarity  1898330  1898579  307.3  .  0  Target=Sequence:Chr1_1 1 245
RVT_2  HMMER3  similarity  2573551  2573796  329.5  .  0  Target=Sequence:Chr1_1 1 244
RVT_2  HMMER3  similarity  3159685  3159930  308.8  .  0  Target=Sequence:Chr1_1 1 245
RVT_2  HMMER3  similarity  3438684  3438791  108.2  .  0  Target=Sequence:Chr1_1 1 102
RVT_2  HMMER3  similarity  3566642  3566891  277.2  .  0  Target=Sequence:Chr1_1 2 245
RVT_2  HMMER3  similarity  4251160  4251373  251.4  .  0  Target=Sequence:Chr1_1 13 213
RVT_2  HMMER3  similarity  4252791  4253036  310.6  .  0  Target=Sequence:Chr1_1 1 244
RVT_2  HMMER3  similarity  4271555  4271653  94.2   .  0  Target=Sequence:Chr1_1 6 99
RVT_2  HMMER3  similarity  4481232  4481477  281.5  .  0  Target=Sequence:Chr1_1 2 245

========================================================

And here's what the input HMMER3 result file looks like:

==> ../chr1-tesigsv2.hmmscan <==
# hmmscan :: search sequence(s) against a profile database
# HMMER 3.0rc1 (February 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query sequence file:             [...]/whole_chromosomes/translated/chr1.pep
# target HMM database:             [...]/signatures/Pfam-A.hmm
# output directed to file:         chr1-tesigsv2.hmmscan
# model-specific thresholding:     TC cutoffs
# Max sensitivity mode:            on [all heuristic filters off]
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       Chr1_1  [L=10142557]
Description: CHROMOSOME dumped from ADB: Jun/20/09 14:53; last updated: 2009-02-02
Scores for complete sequence (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Model          Description
    ------- ------ -----    ------- ------ -----   ---- --  --------       -----------
          0 3971.3  17.7   2.6e-101  329.5   0.6   19.4 17  RVT_2          Reverse transcriptase (RNA-dependent DNA pol
          0 3040.7  23.0     1e-206  678.6   0.1   12.2 10  ATHILA         ATHILA ORF-1 family
          0 1681.9  79.1    1.9e-46  149.9   0.4   28.0 21  RVT_1          Reverse transcriptase (RNA-dependent DNA pol
          0 1446.9  27.4    3.6e-95  309.1   0.2    7.6  5  Transposase_21 Transposase family tnp2
          0 1168.4  50.3    1.4e-29   94.4   0.3   21.5 18  rve            Integrase core domain
   9.1e-300  960.0  69.0    3.1e-20   64.0   0.0   28.8 20  Retrotrans_gag Retrotransposon gag protein
   1.5e-180  577.0  31.6    1.6e-29   93.1   1.5    9.5  8  Transposase_23 TNP1/EN/SPM transposase
   4.4e-143  456.9  82.8    4.8e-18
56.4 0.1 12.9 11 MuDR MuDR family transposase 3.8e-116 371.4 19.6 1.2e-18 58.9 0.0 13.7 7 MULE MULE transposase domain 7.1e-106 344.1 5.6 2.7e-97 316.0 0.0 3.6 1 Plant_tran Plant transposon protein 9.2e-85 275.4 22.9 5.4e-60 194.4 0.3 6.4 3 Peptidase_C48 Ulp1 protease family, C-terminal catalytic d 1.8e-77 249.8 24.8 4.4e-28 89.8 0.1 10.8 3 Transposase_24 Plant transposase (Ptta/En/Spm family) 2.8e-47 150.1 1.2 5.5e-23 72.3 0.2 3.7 2 hATC hAT family dimerisation domain 5.7e-28 89.4 3.6 4.7e-13 41.1 0.0 6.5 1 RVP_2 Retroviral aspartyl protease 1e-16 53.3 0.0 4.4e-07 22.1 0.0 6.8 1 RnaseH RNase H 1.5e-08 25.3 2.4 0.00016 12.1 0.0 4.9 0 Transposase_mut Transposase, Mutator family Domain annotation for each model (and alignments): >> RVT_2 Reverse transcriptase (RNA-dependent DNA polymerase) # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc --- ------ ----- --------- --------- ------- ------- ------- ------- ------- ------- ---- 1 ! 307.3 0.0 5.3e-95 1.5e-94 1 245 [. 1898330 1898578 .. 1898330 1898579 .. 0.99 2 ! 329.5 0.6 8.9e-102 2.6e-101 1 244 [. 2573551 2573794 .. 2573551 2573796 .. 0.99 3 ! 308.8 0.0 1.8e-95 5.2e-95 1 245 [. 3159685 3159929 .. 3159685 3159930 .. 0.99 4 ! 108.2 0.1 3.4e-34 9.7e-34 1 102 [. 3438684 3438785 .. 3438684 3438791 .. 0.96 5 ! 277.2 0.0 8.1e-86 2.3e-85 2 245 .. 3566643 3566890 .. 3566642 3566891 .. 0.99 6 ! 251.4 0.0 6.2e-78 1.8e-77 13 213 .. 4251164 4251364 .. 4251160 4251373 .. 0.97 7 ! 310.6 0.0 5.1e-96 1.5e-95 1 244 [. 4252791 4253034 .. 4252791 4253036 .. 0.99 8 ! 94.2 0.1 6.1e-30 1.8e-29 6 99 .. 4271560 4271653 .. 4271555 4271653 .. 0.97 9 ! 281.5 0.9 3.9e-87 1.1e-86 2 245 .. 4481233 4481476 .. 4481232 4481477 .. 0.98 10 ! 248.2 0.0 5.9e-77 1.7e-76 1 190 [. 4521040 4521233 .. 4521040 4521237 .. 0.97 11 ! 314.6 0.1 3.2e-97 9.2e-97 1 244 [. 4652456 4652702 .. 4652456 4652704 .. 0.98 12 ! 40.7 0.0 1.3e-13 3.7e-13 2 92 .. 5219607 5219697 .. 5219606 5219701 .. 0.90 13 ! 221.0 0.0 1.2e-68 3.4e-68 2 245 .. 
5241015 5241258 .. 5241014 5241259 .. 0.95 14 ! 81.2 0.0 5.6e-26 1.6e-25 2 115 .. 5501957 5502070 .. 5501956 5502080 .. 0.92 15 ! 272.4 0.0 2.3e-84 6.7e-84 30 245 .. 6483057 6483271 .. 6483050 6483272 .. 0.98 16 ! 178.5 0.0 1.2e-55 3.3e-55 81 244 .. 7250563 7250726 .. 7250552 7250728 .. 0.96 17 ! 313.7 0.0 5.9e-97 1.7e-96 2 245 .. 7707124 7707367 .. 7707123 7707368 .. 0.99 Alignments for each domain: == domain 1 score: 307.3 bits; conditional E-value: 5.3e-95 RVT_2 1 nktwelvelpkgkkviglkWvfklKlnedgeierykARlVakGftqkegidyeetfspvvklesirlllalaaekkleleqlDvktaFLngelee 95 n tw +++lp gkk++g+kWv+k+Kln+dg++erykARlVakG+tq+eg+dy +tfspv+kl++++ll+a+aa+k+++l+qlD+++aFLng+l+e Chr1_1 1898330 NGTWVVCSLPVGKKAVGCKWVYKIKLNADGSLERYKARLVAKGYTQTEGLDYVDTFSPVAKLTTVKLLIAVAAAKGWSLSQLDISNAFLNGSLDE 1898424 68********************************************************************************************* PP RVT_2 96 evYvkqpeGfedkkk....enkvckLkkslYgLkqapraWyeklsevllklgfkkseadkclfvkkkeeeliivllYVDDlliagsskelieelk 186 e+Y++ p+G++ ++ +n vc+LkkslYgLkqa+r+Wy k+se l++lgf+ +s+ d++lf++k++++ ++vl+YVDD++ia+s +++ e l Chr1_1 1898425 EIYMTLPPGYSPRQGdsfpPNAVCRLKKSLYGLKQASRQWYLKFSESLKALGFTQSSGDHTLFTRKSKNSYMAVLVYVDDIIIASSCDRETELLR 1898519 ***********998889999*************************************************************************** PP RVT_2 187 eeLkkefemkdlgelkyfLgleierkeegillsqekyvkkllkkfkmedakpvstplea 245 ++L+++ +++dlg+l+yfLglei+r+++gi+++q+ky+ +ll+++++ +k++s +p+e+ Chr1_1 1898520 DALQRSSKLRDLGTLRYFLGLEIARNTDGISICQRKYTLELLAETGLLGCKSSSVPMEP 1898578 *********************************************************97 PP == domain 2 score: 329.5 bits; conditional E-value: 8.9e-102 RVT_2 1 nktwelvelpkgkkviglkWvfklKlnedgeierykARlVakGftqkegidyeetfspvvklesirlllalaaekkleleqlDvktaFLngelee 95 n+twel++lp+g+k+ig+kWv+k K+n++ge+erykARlVakG++q++gidy+e +f+pv++le++rl+++laa++k++++q+D k aFLng++ee Chr1_1 2573551 NDTWELTSLPNGHKAIGVKWVYKAKKNSKGEVERYKARLVAKGYSQRAGIDYDEVFAPVARLETVRLIISLAAQNKWKIHQMDFKLAFLNGDFEE 2573645 
79********************************************************************************************* PP RVT_2 96 evYvkqpeGfedkkkenkvckLkkslYgLkqapraWyeklsevllklgfkkseadkclfvkkkeeeliivllYVDDlliagsskelieelkeeLk 190 evY++qp+G+ +k++e+kv++Lkk+lYgLkqapraW++++++++++++f k+ + +++l++k ++e+++i +lYVDDl+++g++ ++ ee+k+e++ Chr1_1 2573646 EVYIEQPQGYIVKGEEDKVLRLKKALYGLKQAPRAWNTRIDKYFKEKDFIKCPYEHALYIKIQKEDILIACLYVDDLIFTGNNPSMFEEFKKEMT 2573740 *********************************************************************************************** PP RVT_2 191 kefemkdlgelkyfLgleierkeegillsqekyvkkllkkfkmedakpvstple 244 kefem+d+g ++y+Lg+e+++++++i+++qe y+k++lkkfkm+d++pv tp +e Chr1_1 2573741 KEFEMTDIGLMSYYLGIEVKQEDNRIFITQEGYAKEVLKKFKMDDSNPVCTPME 2573794 ****************************************************97 PP From kai.blin at biotech.uni-tuebingen.de Thu Aug 12 08:16:45 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Thu, 12 Aug 2010 14:16:45 +0200 Subject: [Bioperl-l] HMMER3 to GFF3 In-Reply-To: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com> References: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com> Message-ID: <20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de> On Wed, 11 Aug 2010 22:59:37 -0700 (PDT) Doug Hoen wrote: Hi Doug, > Could someone please confirm whether the results are incorrect and, if > so, perhaps suggest a fix? It may well be that this problem is due to > the unusual way I am using hmmscan, rather than a problem with HMMER3 > parsing...? Can you please attach your hmmer input file? Along the way something inserted line breaks, making it unreadable. It might well be possible that the HMMer3 parser still handles a little different from the HMMer2 parser, I haven't tried that script. Cheers, Kai -- Dipl.-Inform. 
Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From kai.blin at biotech.uni-tuebingen.de Thu Aug 12 08:09:00 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 12 Aug 2010 14:09:00 +0200
Subject: [Bioperl-l] using HMMER
In-Reply-To: <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com> <4C62B487.9090103@gmail.com> <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
Message-ID: <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>

On Wed, 11 Aug 2010 10:07:36 -0500
Chris Fields wrote:

> might also want to check whether you are using hmmer2 vs hmmer3. not sure if the wrapper works for hmmer3.

It might if you initialize it using

my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 'hmmer3');

at least for the programs that still exist with the same name in
hmmer3. It won't support hmmer3 using the default options, though.

If I have some spare time, I'll look into this, no promises on the
timeframe, though.

Cheers,
Kai

--
Dipl.-Inform.
Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From cjfields at illinois.edu Thu Aug 12 11:28:50 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 12 Aug 2010 10:28:50 -0500
Subject: [Bioperl-l] using HMMER
In-Reply-To: <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com> <4C62B487.9090103@gmail.com> <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu> <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>
Message-ID: <8129B813-5B15-4DDC-AB0D-5D95EFFCE78D@illinois.edu>

On Aug 12, 2010, at 7:09 AM, Kai Blin wrote:

> On Wed, 11 Aug 2010 10:07:36 -0500
> Chris Fields wrote:
>
>> might also want to check whether you are using hmmer2 vs hmmer3. not sure if the wrapper works for hmmer3.
>
> It might if you initialize it using
> my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 'hmmer3');
>
> at least for the programs that still exist with the same name in
> hmmer3. It won't support hmmer3 using the default options, though.
>
> If I have some spare time, I'll look into this, no promises on the
> timeframe, though.
>
> Cheers,
> Kai
>
> --
> Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
> D-72076 Tübingen                        Fax :   ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

Would be nice to convert this over (at some point) to use Mark's CommandExts.
I'm thinking of doing this with Infernal, so if I get that running it wouldn't be terribly difficult to get hmmer3 working as well. chris From cjfields at illinois.edu Thu Aug 12 12:14:44 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 12 Aug 2010 11:14:44 -0500 Subject: [Bioperl-l] using HMMER In-Reply-To: <857996.8184.qm@web112610.mail.gq1.yahoo.com> References: <603590.1072.qm@web112620.mail.gq1.yahoo.com> <4C62B487.9090103@gmail.com> <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu> <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de> <8129B813-5B15-4DDC-AB0D-5D95EFFCE78D@illinois.edu> <857996.8184.qm@web112610.mail.gq1.yahoo.com> Message-ID: <43FD0A31-DB95-4AE9-B678-937EE6346BC2@illinois.edu> Fayroz, Please keep responses on-list. It seems you need to update your local bioperl, as 'hmmer3' is a recent addition, after 1.6.1. It will be in 1.6.2 if I can get the time to make a release :> chris On Aug 12, 2010, at 10:58 AM, fayroz wrote: > dear chris, > from HMMER documentation i found this statement > "The HMMER programs must either be in your path, or you must set the environment > variable HMMERDIR to point to their location." > is it will solve the problem? > how can i do it please ? i work under windows7 platform > > > when i appled this line with hmmer3 > my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => > 'hmmer3'); > > this output apper: > > Bio::SearchIO: hmmer3 cannot be found > > and when try with hmmer2 the same output apper: > > Exception > ------------- EXCEPTION ------------- > MSG: Failed to load module Bio::SearchIO::hmmer3. Can't locate > Bio\SearchIO\hmmer3.pm in @INC (@INC contains: D:\Perl\bin\ D:/Perl/site/lib > D:/Perl/lib .) at D:/Perl/site/lib/Bio/Root/Root.pm line 439, line 1. 
> STACK Bio::Root::Root::_load_module D:/Perl/site/lib/Bio/Root/Root.pm:441 > STACK (eval) D:/Perl/site/lib/Bio/SearchIO.pm:446 > STACK Bio::SearchIO::_load_format_module D:/Perl/site/lib/Bio/SearchIO.pm:445 > STACK Bio::SearchIO::new D:/Perl/site/lib/Bio/SearchIO.pm:189 > STACK Bio::Tools::Run::Hmmer::_run D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:431 > STACK Bio::Tools::Run::Hmmer::hmmsearch > D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:353 > STACK toplevel C:\Users\Khaled\AppData\Local\Temp\dzprltmp.pl:13 > ------------------------------------- > For more information about the SearchIO system please see the SearchIO docs. > This includes ways of checking for formats at compile time, not run time > '--informat' is not recognized as an internal or external command, > operable program or batch file. > Can't call method "next_result" on an undefined value at > C:\Users\Khaled\AppData\Local\Temp\dzprltmp.pl line 15, line 1. > > > > ----- Original Message ---- > From: Chris Fields > To: Kai Blin > Cc: fayroz ; bioperl-l at bioperl.org > Sent: Thu, August 12, 2010 6:28:50 PM > Subject: Re: [Bioperl-l] using HMMER > > On Aug 12, 2010, at 7:09 AM, Kai Blin wrote: > >> On Wed, 11 Aug 2010 10:07:36 -0500 >> Chris Fields wrote: >> >>> might also want to check whether you are using hmmer2 vs hmmer3. not sure if >>> the wrapper works for hmmer3. >> >> It might if you initialize it using >> my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => >> 'hmmer3'); >> >> at least for the programs that still exist with the same name in >> hmmer3. It won't support hmmer3 using the default options, though. >> >> If I have some spare time, I'll look into this, no promises on the >> timeframe, though. >> >> Cheers, >> Kai >> >> -- >> Dipl.-Inform. 
Kai Blin         kai.blin at biotech.uni-tuebingen.de
>> Institute for Microbiology and Infection Medicine
>> Division of Microbiology/Biotechnology
>> Eberhard-Karls-University of Tübingen
>> Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
>> D-72076 Tübingen                        Fax :   ++49 7071 29-5979
>> Deutschland
>> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
>
> Would be nice to convert this over (at some point) to use Mark's CommandExts.
> I'm thinking of doing this with Infernal, so if I get that running it wouldn't
> be terribly difficult to get hmmer3 working as well.
>
> chris
>
>
>

From jason at bioperl.org Thu Aug 12 14:37:11 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 12 Aug 2010 11:37:11 -0700
Subject: [Bioperl-l] Other: Script for editing alignments?
In-Reply-To: <20100812061811.4D92468539@evol.biology.mcmaster.ca>
References: <20100812061811.4D92468539@evol.biology.mcmaster.ca>
Message-ID: <4C643F57.3040408@bioperl.org>

Hi Si -

This is pretty straightforward with Bioperl. Here's one solution:

#!/usr/bin/perl -w
use strict;
use Bio::AlignIO;

my $in = Bio::AlignIO->new(-format => 'fasta',
                           -file   => shift @ARGV);
my $out = Bio::AlignIO->new(-format => 'fasta');
while( my $aln = $in->next_aln ) {
    for my $seq ( $aln->each_seq ) {
        my $str = $seq->seq;
        if( $str =~ /^(-+)/ ) {
            my $rep = length($1);
            # replace from the 5' end
            substr($str,0,$rep,'N'x$rep);
        }
        if( $str =~ /(-+)$/ ) {
            my $rep = length($1);
            # replace from the 3' end
            substr($str,-1 * $rep,length($str),'N'x$rep);
        }
        $seq->seq($str);
    }
    # don't print the /start-end info in the FASTA ID
    $aln->set_displayname_flat(1);
    $out->write_aln($aln);
}

-jason

evoldir at evol.biology.mcmaster.ca wrote, On 8/11/10 11:18 PM:
> Dear All
>
> Alignment programs like MUSCLE and Clustal often output alignments with
> "-" symbols indicating indels (real events) within sequence alignments,
> but also "-" symbols at the 5' and 3' ends of sequences.
The latter > however, are not real evolutionary events and really should be Ns > (missing data), depending on the sort of analytical framework you use. > > If there is sufficient heterogeneity and signal within the 5' and 3' > ends of sequences, the "-"s can be manually edited in a text editor to > Ns with no problem, if the alignment is small. If it is large (e.g. 2000 > seqs), or there are lots of alignments, it becomes a lengthy task. > > I'm investigating such alignments presently and so was wondering if > anyone had a clever way of implementing sed, or had a Perl script that > would perform such a task. Simply put, it would require replacing the 5' > and 3' "-" below only with Ns and leaving the within sequence "-"s > alone. The sequences naturally may span more than one line. > > >Taxon 1 > -----ATGCTG--TGACTG----TGACT--- > >Taxon 2 > ---GTATGTTG--TGACTGCT--TGACCGTC > > to > > >Taxon 1 > NNNNNATGCTG--TGACTG----TGACTNNN > >Taxon 2 > NNNGTATGTTG--TGACTGCT--TGACCGTC > > It's a simple task, but I haven't seen any scripts out there to do the job. > > If there are any scripters out there who can help, or if someone knows > of an application that would help, it would be great to hear from you. > > With best wishes and thanks > > Si Creer > > From genehack at genehack.org Thu Aug 12 20:32:07 2010 From: genehack at genehack.org (John SJ Anderson) Date: Thu, 12 Aug 2010 20:32:07 -0400 Subject: [Bioperl-l] Bio::SeqFeature::SimilarityPair->from_searchResult()? In-Reply-To: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com> References: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com> Message-ID: On Aug 10, 2010, at 21:54 , Douglas Hoen wrote: > I was wondering why the Synopsis in the docs for Bio::SeqFeature::SimilarityPair has the following: > $sim_pair = Bio::SeqFeature::SimilarityPair->from_searchResult($blastHit); > > There doesn't actually seem to be a from_searchResult method. Am I missing something? 
No, it looks like that method got removed back in 2002 as a part of
moving to Bio::SearchIO (which was removed still later...):

Unfortunately, the commit didn't update the documentation.

From the tiny little bit I've looked at the code, it looks like you
should just be calling the 'new()' method instead (note that it takes a
set of arguments, not just a BLAST hit object).

Hope this helps -- if you should happen to have the tuits, a patch to
update the documentation to reflect the current interface would be
awesome...

chrs,
john.

From david.breimann at gmail.com Fri Aug 13 09:01:10 2010
From: david.breimann at gmail.com (David Breimann)
Date: Fri, 13 Aug 2010 16:01:10 +0300
Subject: [Bioperl-l] Problem executing bp_genbank2gff3.pl from another perl script
Message-ID:

Hi,

I am trying to run bp_genbank2gff3.pl from another perl script that gets a
genbank file as its argument.

This does not work (no output files are generated):

my $command = "bp_genbank2gff3.pl -y -o /tmp $ARGV[0]";
open( my $command_out, "-|", $command );
close $command_out;

but this does:

open( my $command_out, "-|", $command );
sleep 3; # why do I need to sleep?
close $command_out;

Why? I thought that close is supposed to block until the command is done:

  Closing any piped filehandle causes the parent process to wait for the
  child to finish...

(see http://perldoc.perl.org/functions/open.html).

Thanks
Dave

From jun.yin at ucd.ie Fri Aug 13 09:36:34 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 13 Aug 2010 14:36:34 +0100
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
Message-ID: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>

Hi, all,

I am the Google Summer of Code student working on Bio::Align subsystem
refactoring. The code (Bio::SimpleAlign) I re-implemented has now passed
nearly all the tests, except a few tests on seq/start-end checking.

But here comes a problem. This may be an old issue: the
Bio::LocatableSeq end assignment and checking are inconsistent.
The current end checking method is based on: $end=$seq->_ungapped_len+$seq->start-1 However, this checking may not fit the real world case. The inconsistency usually happens when a few columns of the sequence are removed. For example: my $a = Bio::LocatableSeq->new( -id => 'a', -strand => 1, -seq => '-tcgatc-atcgatcg', -start => 30, -end => 43 ); If we remove the 1st, 8th and the last columns $a->seq() will be 'tcgatcatcgatc' $a->_ungapped_len==12 Actually, in the real world, the first residue will still be 30 (the old $seq->start), and the last residue is the residue before the 43 (the old $seq->end), thus 42. But if you call a validation, the calculation is $a->_ungapped_len+$a->start-1=12+30-1=41 So the reassignment of the $seq->end will not pass the validation. So unless you save the information to a new sequence object, the original position information will be lost anyway. But in some cases, we have to change the sequence in its original sequence object .. What is your suggestion on this issue? A. pass the test and lose the information #convenient in coding but the start-end annotation is not right any more B. keep the information and forget the test #the object will still remember where the last residue was in the original sequence. But is it really meaningful at all? Because all the other residues may come from nowhere C. Neither of above #any other suggestions? Cheers, Jun Yin Ph.D. student in U.C.D. Bioinformatics Laboratory Conway Institute University College Dublin From jessica.sun at gmail.com Fri Aug 13 11:06:46 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Fri, 13 Aug 2010 11:06:46 -0400 Subject: [Bioperl-l] Add sequence feature Message-ID: Does anyone knows how to open a genbank file, add new feature and then save a new genbank file with new feature added in bioperl ? 
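A minimal sketch of one way to approach this, assuming BioPerl is installed: read each record with Bio::SeqIO, attach a Bio::SeqFeature::Generic to the sequence object that next_seq returns, and write that same object to a new GenBank file. The helper name add_feature_to_genbank, the file names, the coordinates and the misc_feature key are illustrative placeholders, not taken from the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): copy every record from a GenBank
# file to a new file, adding one extra feature to each record on the way.
sub add_feature_to_genbank {
    my ($infile, $outfile) = @_;

    # Loaded at run time so the script still compiles where BioPerl is absent.
    require Bio::SeqIO;
    require Bio::SeqFeature::Generic;

    my $in  = Bio::SeqIO->new(-file => $infile,     -format => 'genbank');
    my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'genbank');

    # next_seq returns the sequence object; that object, not the
    # Bio::SeqIO stream, is what carries the features.
    while (my $seq = $in->next_seq) {
        my $feat = Bio::SeqFeature::Generic->new(
            -start       => 20,              # placeholder coordinates
            -end         => 40,
            -strand      => 1,
            -primary_tag => 'misc_feature',  # placeholder feature key
            -tag         => { note => 'this is a note' },
        );
        $seq->add_SeqFeature($feat);   # features already on the record are kept
        $out->write_seq($seq);
    }
    return;
}

# Example call with placeholder file names:
# add_feature_to_genbank('input.gbk', 'output.gbk');
```

Because the new feature is added to the object read from the file, the existing annotation should be written out alongside it.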
thx

--
Jessica Jingping Sun

From jessica.sun at gmail.com Fri Aug 13 11:27:10 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 13 Aug 2010 11:27:10 -0400
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <4C6562E0.7090008@gmail.com>
References: <4C6562E0.7090008@gmail.com>
Message-ID:

Unfortunately, I want to add the feature to the sequence object I got from
the GenBank file. I do not mind saving a new GenBank file, but the new file
should contain the original GenBank format and info plus the new feature
tags I need to add. Any quick solution to this?

thx

Jessica

On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri wrote:

> Hi Jessica.
>
> You need to use Bio::SeqIO to read in the GenBank file to a BioPerl
> sequence object, and to write your new GenBank file:
> http://www.bioperl.org/wiki/HOWTO:SeqIO
>
> To add a new feature follow the instructions here:
>
> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences
>
> (except that you are adding the feature to the sequence object you got from
> the Genbank file, not a new Bio::Seq object).
>
> Cheers.
> Roy.
>
>
> On 13/08/2010 16:06, Jessica Sun wrote:
>
>> Does anyone knows how to open a genbank file, add new feature and then
>> save
>> a new genbank
>> file with new feature added in bioperl ?
>>
>> thx
>>
>>
>

--
Jessica Jingping Sun

From roy.chaudhuri at gmail.com Fri Aug 13 11:21:04 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 13 Aug 2010 16:21:04 +0100
Subject: [Bioperl-l] Add sequence feature
In-Reply-To:
References:
Message-ID: <4C6562E0.7090008@gmail.com>

Hi Jessica.
You need to use Bio::SeqIO to read in the GenBank file to a BioPerl sequence object, and to write your new GenBank file: http://www.bioperl.org/wiki/HOWTO:SeqIO To add a new feature follow the instructions here: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences (except that you are adding the feature to the sequence object you got from the Genbank file, not a new Bio::Seq object). Cheers. Roy. On 13/08/2010 16:06, Jessica Sun wrote: > Does anyone knows how to open a genbank file, add new feature and then save > a new genbank > file with new feature added in bioperl ? > > thx > From roy.chaudhuri at gmail.com Fri Aug 13 11:37:20 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Fri, 13 Aug 2010 16:37:20 +0100 Subject: [Bioperl-l] Add sequence feature In-Reply-To: References: <4C6562E0.7090008@gmail.com> Message-ID: <4C6566B0.60706@gmail.com> I'm not sure I understand, do you mean that you want to load just the sequence from the GenBank file (ignoring the existing annotation), then add your own features? There are instructions on how to do that here: http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder On 13/08/2010 16:27, Jessica Sun wrote: > unfortunately. I want to add the feature to the sequence object I got > from the Genbank file, I do not mind to save a new genbank file but > these new genbank file contains the original genbank format and info I > got plus the new feature tags I need to added to. Any quick solution to > this? > > thx > > Jessica > > > > On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri > wrote: > > Hi Jessica. 
> > You need to use Bio::SeqIO to read in the GenBank file to a BioPerl > sequence object, and to write your new GenBank file: > http://www.bioperl.org/wiki/HOWTO:SeqIO > > To add a new feature follow the instructions here: > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences > > (except that you are adding the feature to the sequence object you > got from the Genbank file, not a new Bio::Seq object). > > Cheers. > Roy. > > > On 13/08/2010 16:06, Jessica Sun wrote: > > Does anyone knows how to open a genbank file, add new feature > and then save > a new genbank > file with new feature added in bioperl ? > > thx > > > > > > -- > Jessica Jingping Sun From roy.chaudhuri at gmail.com Fri Aug 13 11:57:27 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Fri, 13 Aug 2010 16:57:27 +0100 Subject: [Bioperl-l] Add sequence feature In-Reply-To: References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> Message-ID: <4C656B67.5020402@gmail.com> Please remember to copy replies to the mailing list. You can loop over the features in your Bio::Seq object: for my $feat ($seq->get_SeqFeatures) { # do something } And once you have found the feature you want to modify, you can add a tag using something like: $feat->add_tag_value('note',"this is a note"); When you're finished you can write out the modified sequence object to a new GenBank file. On 13/08/2010 16:40, Jessica Sun wrote: > no i want to load the genbank file with existing features and I need to > add some new feature tags to the existing ones and then save to a new > update genbank file for local usage. I just not quite good on how to > easily merge the two steps you recommended into one in a neat way. > > thx > > > On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri > wrote: > > I'm not sure I understand, do you mean that you want to load just > the sequence from the GenBank file (ignoring the existing > annotation), then add your own features? 
There are instructions on > how to do that here: > http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder > > > On 13/08/2010 16:27, Jessica Sun wrote: > > unfortunately. I want to add the feature to the sequence object > I got > from the Genbank file, I do not mind to save a new genbank file but > these new genbank file contains the original genbank format and > info I > got plus the new feature tags I need to added to. Any quick > solution to > this? > > thx > > Jessica > > > > On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri > > >> wrote: > > Hi Jessica. > > You need to use Bio::SeqIO to read in the GenBank file to a > BioPerl > sequence object, and to write your new GenBank file: > http://www.bioperl.org/wiki/HOWTO:SeqIO > > To add a new feature follow the instructions here: > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences > > (except that you are adding the feature to the sequence > object you > got from the Genbank file, not a new Bio::Seq object). > > Cheers. > Roy. > > > On 13/08/2010 16:06, Jessica Sun wrote: > > Does anyone knows how to open a genbank file, add new > feature > and then save > a new genbank > file with new feature added in bioperl ? > > thx > > > > > > -- > Jessica Jingping Sun > > > > > > -- > Jessica Jingping Sun From jessica.sun at gmail.com Fri Aug 13 13:06:32 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Fri, 13 Aug 2010 13:06:32 -0400 Subject: [Bioperl-l] Add sequence feature In-Reply-To: <4C656B67.5020402@gmail.com> References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> <4C656B67.5020402@gmail.com> Message-ID: Thanks. I somehow get these error messages. --------------------- WARNING --------------------- MSG: Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module. Attempting to dump, but may fail! 
--------------------------------------------------- Can't locate object method "seq" via package "Bio::SeqIO::genbank" at /Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, line 447. by doing this, my $feat = new Bio::SeqFeature::Generic(-start =>20, -end => $40, -primary_tag => 'newfeature' ); $feat->add_tag_value("note","this is notes"); $f->add_SeqFeature($feat); ## f is original feature pointer $io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" ); $io->write_seq($seqio_object); On Fri, Aug 13, 2010 at 11:57 AM, Roy Chaudhuri wrote: > Please remember to copy replies to the mailing list. > > You can loop over the features in your Bio::Seq object: > for my $feat ($seq->get_SeqFeatures) { # do something } > > And once you have found the feature you want to modify, you can add a tag > using something like: > $feat->add_tag_value('note',"this is a note"); > > When you're finished you can write out the modified sequence object to a > new GenBank file. > > > On 13/08/2010 16:40, Jessica Sun wrote: > >> no i want to load the genbank file with existing features and I need to >> add some new feature tags to the existing ones and then save to a new >> update genbank file for local usage. I just not quite good on how to >> easily merge the two steps you recommended into one in a neat way. >> >> thx >> >> >> On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri > > wrote: >> >> I'm not sure I understand, do you mean that you want to load just >> the sequence from the GenBank file (ignoring the existing >> annotation), then add your own features? There are instructions on >> how to do that here: >> http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder >> >> >> On 13/08/2010 16:27, Jessica Sun wrote: >> >> unfortunately. 
I want to add the feature to the sequence object >> I got >> from the Genbank file, I do not mind to save a new genbank file but >> these new genbank file contains the original genbank format and >> info I >> got plus the new feature tags I need to added to. Any quick >> solution to >> this? >> >> thx >> >> Jessica >> >> >> >> On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri >> >> > >> wrote: >> >> Hi Jessica. >> >> You need to use Bio::SeqIO to read in the GenBank file to a >> BioPerl >> sequence object, and to write your new GenBank file: >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> >> To add a new feature follow the instructions here: >> >> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences >> >> (except that you are adding the feature to the sequence >> object you >> got from the Genbank file, not a new Bio::Seq object). >> >> Cheers. >> Roy. >> >> >> On 13/08/2010 16:06, Jessica Sun wrote: >> >> Does anyone knows how to open a genbank file, add new >> feature >> and then save >> a new genbank >> file with new feature added in bioperl ? >> >> thx >> >> >> >> >> >> -- >> Jessica Jingping Sun >> >> >> >> >> >> -- >> Jessica Jingping Sun >> > > -- Jessica Jingping Sun From drummike at gmail.com Fri Aug 13 13:41:55 2010 From: drummike at gmail.com (Mike Williams) Date: Fri, 13 Aug 2010 13:41:55 -0400 Subject: [Bioperl-l] Add sequence feature In-Reply-To: References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> <4C656B67.5020402@gmail.com> Message-ID: On Fri, Aug 13, 2010 at 1:06 PM, Jessica Sun wrote: > Thanks. I somehow get these error messages. > by doing this, > > my $feat = new Bio::SeqFeature::Generic(-start =>20, > -end => $40, > -primary_tag => 'newfeature' ); > $feat->add_tag_value("note","this is > notes"); > That $40 looks fishy. Try deleting the dollar sign. You did mean just 40, right? 
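The point about $40 can be checked in plain Perl, without BioPerl. $40 is the 40th regex capture variable, so unless a match with 40 capture groups has just succeeded it is undefined, and -end => $40 hands undef to the constructor. A small sketch (the hash names %typo and %fixed are illustrative only):

```perl
use strict;
use warnings;

# No regex match has run, so the capture variable $40 is undefined.
my %typo  = (-start => 20, -end => $40);   # what was written
my %fixed = (-start => 20, -end => 40);    # what was presumably meant

printf "with \$40: -end is %s\n", defined $typo{-end} ? $typo{-end} : 'undef';
printf "with 40:  -end is %s\n", $fixed{-end};
```

Note that strict does not complain about $40, since the numbered capture variables are exempt from the declaration check.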
Mike From MEC at stowers.org Fri Aug 13 13:37:50 2010 From: MEC at stowers.org (Cook, Malcolm) Date: Fri, 13 Aug 2010 12:37:50 -0500 Subject: [Bioperl-l] Add sequence feature In-Reply-To: References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> <4C656B67.5020402@gmail.com> Message-ID: Jessica, Show more code! In particular, where did $f get set? --Malcolm -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun Sent: Friday, August 13, 2010 12:07 PM To: Roy Chaudhuri Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Add sequence feature Thanks. I somehow get these error messages. --------------------- WARNING --------------------- MSG: Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module. Attempting to dump, but may fail! --------------------------------------------------- Can't locate object method "seq" via package "Bio::SeqIO::genbank" at /Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, line 447. by doing this, my $feat = new Bio::SeqFeature::Generic(-start =>20, -end => $40, -primary_tag => 'newfeature' ); $feat->add_tag_value("note","this is notes"); $f->add_SeqFeature($feat); ## f is original feature pointer $io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" ); $io->write_seq($seqio_object); On Fri, Aug 13, 2010 at 11:57 AM, Roy Chaudhuri wrote: > Please remember to copy replies to the mailing list. > > You can loop over the features in your Bio::Seq object: > for my $feat ($seq->get_SeqFeatures) { # do something } > > And once you have found the feature you want to modify, you can add a > tag using something like: > $feat->add_tag_value('note',"this is a note"); > > When you're finished you can write out the modified sequence object to > a new GenBank file. 
> > > On 13/08/2010 16:40, Jessica Sun wrote: > >> no i want to load the genbank file with existing features and I need >> to add some new feature tags to the existing ones and then save to a >> new update genbank file for local usage. I just not quite good on how >> to easily merge the two steps you recommended into one in a neat way. >> >> thx >> >> >> On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri >> > wrote: >> >> I'm not sure I understand, do you mean that you want to load just >> the sequence from the GenBank file (ignoring the existing >> annotation), then add your own features? There are instructions on >> how to do that here: >> >> http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder >> >> >> On 13/08/2010 16:27, Jessica Sun wrote: >> >> unfortunately. I want to add the feature to the sequence object >> I got >> from the Genbank file, I do not mind to save a new genbank file but >> these new genbank file contains the original genbank format and >> info I >> got plus the new feature tags I need to added to. Any quick >> solution to >> this? >> >> thx >> >> Jessica >> >> >> >> On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri >> >> > >> wrote: >> >> Hi Jessica. >> >> You need to use Bio::SeqIO to read in the GenBank file to a >> BioPerl >> sequence object, and to write your new GenBank file: >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> >> To add a new feature follow the instructions here: >> >> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Ow >> n_Sequences >> >> (except that you are adding the feature to the sequence >> object you >> got from the Genbank file, not a new Bio::Seq object). >> >> Cheers. >> Roy. >> >> >> On 13/08/2010 16:06, Jessica Sun wrote: >> >> Does anyone knows how to open a genbank file, add new >> feature >> and then save >> a new genbank >> file with new feature added in bioperl ? 
>> >> thx >> >> >> >> >> >> -- >> Jessica Jingping Sun >> >> >> >> >> >> -- >> Jessica Jingping Sun >> > > -- Jessica Jingping Sun _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From Kevin.M.Brown at asu.edu Fri Aug 13 13:53:50 2010 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Fri, 13 Aug 2010 10:53:50 -0700 Subject: [Bioperl-l] Add sequence feature In-Reply-To: References: <4C6562E0.7090008@gmail.com><4C6566B0.60706@gmail.com><4C656B67.5020402@gmail.com> Message-ID: <1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu> If I'm reading your sample code correctly, then you are mistakenly trying to output the input SeqIO object and not the actual Bio::Seq object that was read in by SeqIO. My $seqio = Bio::SeqIO->new; My $seq = $seqio->next_seq; #manipulate $seq My $out = Bio::SeqIO->new; $out->write_seq($seq); -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun Sent: Friday, August 13, 2010 10:07 AM To: Roy Chaudhuri Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Add sequence feature Thanks. I somehow get these error messages. --------------------- WARNING --------------------- MSG: Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module. Attempting to dump, but may fail! --------------------------------------------------- Can't locate object method "seq" via package "Bio::SeqIO::genbank" at /Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, line 447. 
by doing this, my $feat = new Bio::SeqFeature::Generic(-start =>20, -end => $40, -primary_tag => 'newfeature' ); $feat->add_tag_value("note","this is notes"); $f->add_SeqFeature($feat); ## f is original feature pointer $io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" ); $io->write_seq($seqio_object); On Fri, Aug 13, 2010 at 11:57 AM, Roy Chaudhuri wrote: > Please remember to copy replies to the mailing list. > > You can loop over the features in your Bio::Seq object: > for my $feat ($seq->get_SeqFeatures) { # do something } > > And once you have found the feature you want to modify, you can add a tag > using something like: > $feat->add_tag_value('note',"this is a note"); > > When you're finished you can write out the modified sequence object to a > new GenBank file. > > > On 13/08/2010 16:40, Jessica Sun wrote: > >> no i want to load the genbank file with existing features and I need to >> add some new feature tags to the existing ones and then save to a new >> update genbank file for local usage. I just not quite good on how to >> easily merge the two steps you recommended into one in a neat way. >> >> thx >> >> >> On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri > > wrote: >> >> I'm not sure I understand, do you mean that you want to load just >> the sequence from the GenBank file (ignoring the existing >> annotation), then add your own features? There are instructions on >> how to do that here: >> http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder >> >> >> On 13/08/2010 16:27, Jessica Sun wrote: >> >> unfortunately. I want to add the feature to the sequence object >> I got >> from the Genbank file, I do not mind to save a new genbank file but >> these new genbank file contains the original genbank format and >> info I >> got plus the new feature tags I need to added to. Any quick >> solution to >> this? >> >> thx >> >> Jessica >> >> >> >> On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri >> >> > >> wrote: >> >> Hi Jessica. 
>> >> You need to use Bio::SeqIO to read in the GenBank file to a >> BioPerl >> sequence object, and to write your new GenBank file: >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> >> To add a new feature follow the instructions here: >> >> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_S equences >> >> (except that you are adding the feature to the sequence >> object you >> got from the Genbank file, not a new Bio::Seq object). >> >> Cheers. >> Roy. >> >> >> On 13/08/2010 16:06, Jessica Sun wrote: >> >> Does anyone knows how to open a genbank file, add new >> feature >> and then save >> a new genbank >> file with new feature added in bioperl ? >> >> thx >> >> >> >> >> >> -- >> Jessica Jingping Sun >> >> >> >> >> >> -- >> Jessica Jingping Sun >> > > -- Jessica Jingping Sun _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From jessica.sun at gmail.com Fri Aug 13 15:16:51 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Fri, 13 Aug 2010 15:16:51 -0400 Subject: [Bioperl-l] Fwd: Add sequence feature In-Reply-To: References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> <4C656B67.5020402@gmail.com> <1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu> Message-ID: ---------- Forwarded message ---------- From: Jessica Sun Date: Fri, Aug 13, 2010 at 3:16 PM Subject: Re: [Bioperl-l] Add sequence feature To: Kevin Brown yes, I change that, somehow it still did not take the added features in. On Fri, Aug 13, 2010 at 1:53 PM, Kevin Brown wrote: > If I'm reading your sample code correctly, then you are mistakenly > trying to output the input SeqIO object and not the actual Bio::Seq > object that was read in by SeqIO. 
> > My $seqio = Bio::SeqIO->new; > My $seq = $seqio->next_seq; > > #manipulate $seq > > My $out = Bio::SeqIO->new; > $out->write_seq($seq); > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun > Sent: Friday, August 13, 2010 10:07 AM > To: Roy Chaudhuri > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Add sequence feature > > Thanks. I somehow get these error messages. > > --------------------- WARNING --------------------- > MSG: Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module. > Attempting to dump, but may fail! > --------------------------------------------------- > Can't locate object method "seq" via package "Bio::SeqIO::genbank" at > /Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, line 447. > > by doing this, > > my $feat = new Bio::SeqFeature::Generic(-start =>20, > -end => $40, > -primary_tag => 'newfeature' ); > $feat->add_tag_value("note","this is > notes"); > $f->add_SeqFeature($feat); ## f is original feature pointer > $io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" ); > > $io->write_seq($seqio_object); > > On Fri, Aug 13, 2010 at 11:57 AM, Roy Chaudhuri > wrote: > > > Please remember to copy replies to the mailing list. > > > > You can loop over the features in your Bio::Seq object: > > for my $feat ($seq->get_SeqFeatures) { # do something } > > > > And once you have found the feature you want to modify, you can add a > tag > > using something like: > > $feat->add_tag_value('note',"this is a note"); > > > > When you're finished you can write out the modified sequence object to > a > > new GenBank file. > > > > > > On 13/08/2010 16:40, Jessica Sun wrote: > > > >> no i want to load the genbank file with existing features and I need > to > >> add some new feature tags to the existing ones and then save to a new > >> update genbank file for local usage. 
I just not quite good on how to > >> easily merge the two steps you recommended into one in a neat way. > >> > >> thx > >> > >> > >> On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri > >> > wrote: > >> > >> I'm not sure I understand, do you mean that you want to load just > >> the sequence from the GenBank file (ignoring the existing > >> annotation), then add your own features? There are instructions on > >> how to do that here: > >> > http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder > >> > >> > >> On 13/08/2010 16:27, Jessica Sun wrote: > >> > >> unfortunately. I want to add the feature to the sequence > object > >> I got > >> from the Genbank file, I do not mind to save a new genbank > file but > >> these new genbank file contains the original genbank format > and > >> info I > >> got plus the new feature tags I need to added to. Any quick > >> solution to > >> this? > >> > >> thx > >> > >> Jessica > >> > >> > >> > >> On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri > >> > >> >> >> wrote: > >> > >> Hi Jessica. > >> > >> You need to use Bio::SeqIO to read in the GenBank file to > a > >> BioPerl > >> sequence object, and to write your new GenBank file: > >> http://www.bioperl.org/wiki/HOWTO:SeqIO > >> > >> To add a new feature follow the instructions here: > >> > >> > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_S > equences > >> > >> (except that you are adding the feature to the sequence > >> object you > >> got from the Genbank file, not a new Bio::Seq object). > >> > >> Cheers. > >> Roy. > >> > >> > >> On 13/08/2010 16:06, Jessica Sun wrote: > >> > >> Does anyone knows how to open a genbank file, add new > >> feature > >> and then save > >> a new genbank > >> file with new feature added in bioperl ? 
> >> > >> thx > >> > >> > >> > >> > >> > >> -- > >> Jessica Jingping Sun > >> > >> > >> > >> > >> > >> -- > >> Jessica Jingping Sun > >> > > > > > > > -- > Jessica Jingping Sun > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Jessica Jingping Sun -- Jessica Jingping Sun From MEC at stowers.org Fri Aug 13 15:56:09 2010 From: MEC at stowers.org (Cook, Malcolm) Date: Fri, 13 Aug 2010 14:56:09 -0500 Subject: [Bioperl-l] Fwd: Add sequence feature In-Reply-To: References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> <4C656B67.5020402@gmail.com> <1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu> Message-ID: if you want to show all your code we might not have to guess at what the problem is..... Malcolm Cook Stowers Institute for Medical Research - Bioinformatics Kansas City, Missouri USA -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun Sent: Friday, August 13, 2010 2:17 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Fwd: Add sequence feature ---------- Forwarded message ---------- From: Jessica Sun Date: Fri, Aug 13, 2010 at 3:16 PM Subject: Re: [Bioperl-l] Add sequence feature To: Kevin Brown yes, I change that, somehow it still did not take the added features in. On Fri, Aug 13, 2010 at 1:53 PM, Kevin Brown wrote: > If I'm reading your sample code correctly, then you are mistakenly > trying to output the input SeqIO object and not the actual Bio::Seq > object that was read in by SeqIO. 
> > My $seqio = Bio::SeqIO->new; > My $seq = $seqio->next_seq; > > #manipulate $seq > > My $out = Bio::SeqIO->new; > $out->write_seq($seq); > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun > Sent: Friday, August 13, 2010 10:07 AM > To: Roy Chaudhuri > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Add sequence feature > > Thanks. I somehow get these error messages. > > --------------------- WARNING --------------------- > MSG: Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module. > Attempting to dump, but may fail! > --------------------------------------------------- > Can't locate object method "seq" via package "Bio::SeqIO::genbank" at > /Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, line 447. > > by doing this, > > my $feat = new Bio::SeqFeature::Generic(-start =>20, > -end => $40, > -primary_tag => 'newfeature' ); > $feat->add_tag_value("note","this > is notes"); $f->add_SeqFeature($feat); ## f is original feature > pointer $io = Bio::SeqIO->new(-format => "genbank", -file => > ">$newoutfile" ); > > $io->write_seq($seqio_object); > > On Fri, Aug 13, 2010 at 11:57 AM, Roy Chaudhuri > wrote: > > > Please remember to copy replies to the mailing list. > > > > You can loop over the features in your Bio::Seq object: > > for my $feat ($seq->get_SeqFeatures) { # do something } > > > > And once you have found the feature you want to modify, you can add > > a > tag > > using something like: > > $feat->add_tag_value('note',"this is a note"); > > > > When you're finished you can write out the modified sequence object > > to > a > > new GenBank file. > > > > > > On 13/08/2010 16:40, Jessica Sun wrote: > > > >> no i want to load the genbank file with existing features and I > >> need > to > >> add some new feature tags to the existing ones and then save to a > >> new update genbank file for local usage. 
I just not quite good on > >> how to easily merge the two steps you recommended into one in a neat way. > >> > >> thx > >> > >> > >> On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri > >> > wrote: > >> > >> I'm not sure I understand, do you mean that you want to load just > >> the sequence from the GenBank file (ignoring the existing > >> annotation), then add your own features? There are instructions on > >> how to do that here: > >> > http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder > >> > >> > >> On 13/08/2010 16:27, Jessica Sun wrote: > >> > >> unfortunately. I want to add the feature to the sequence > object > >> I got > >> from the Genbank file, I do not mind to save a new genbank > file but > >> these new genbank file contains the original genbank format > and > >> info I > >> got plus the new feature tags I need to added to. Any quick > >> solution to > >> this? > >> > >> thx > >> > >> Jessica > >> > >> > >> > >> On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri > >> > >> >> >> wrote: > >> > >> Hi Jessica. > >> > >> You need to use Bio::SeqIO to read in the GenBank file > >> to > a > >> BioPerl > >> sequence object, and to write your new GenBank file: > >> http://www.bioperl.org/wiki/HOWTO:SeqIO > >> > >> To add a new feature follow the instructions here: > >> > >> > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own > _S > equences > >> > >> (except that you are adding the feature to the sequence > >> object you > >> got from the Genbank file, not a new Bio::Seq object). > >> > >> Cheers. > >> Roy. > >> > >> > >> On 13/08/2010 16:06, Jessica Sun wrote: > >> > >> Does anyone knows how to open a genbank file, add new > >> feature > >> and then save > >> a new genbank > >> file with new feature added in bioperl ? 
> >> > >> thx > >> > >> > >> > >> > >> > >> -- > >> Jessica Jingping Sun > >> > >> > >> > >> > >> > >> -- > >> Jessica Jingping Sun > >> > > > > > > > -- > Jessica Jingping Sun > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Jessica Jingping Sun -- Jessica Jingping Sun _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 16 14:02:15 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 16 Aug 2010 13:02:15 -0500 Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping Message-ID: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> All, This is in reference to a bug report I filed a while back. In the below test script, two features with the same start/end are compared. If the features have the same seq_id(), overlap succeeds. If the seq_id is changed (e.g. is on another chromosome, for instance), the overlap still succeeds. The question is: is this a bug? My vote would be 'yes', but there have been various arguments to say it's not. 
chris (maybe I'll make this a regular thing on the list, just to hash out some of the edge cases I run into periodically)

=========================================
#!/usr/bin/perl -w

use strict;
use warnings;
use Test::More;
use Bio::SeqFeature::Generic;

my ( $feat1, $feat2 );

$feat1 = Bio::SeqFeature::Generic->new(
    -start  => 40,
    -end    => 80,
    -strand => 1,
    -seq_id => 'ABC123',
);

is $feat1->start,  40,       'start of feature location';
is $feat1->end,    80,       'end of feature location';
is $feat1->seq_id, 'ABC123', 'seq_id';

$feat2 = Bio::SeqFeature::Generic->new(
    -start  => 40,
    -end    => 80,
    -strand => 1,
    -seq_id => 'ABC123',
);

is $feat2->start,  40,       'start of feature location';
is $feat2->end,    80,       'end of feature location';
is $feat2->seq_id, 'ABC123', 'seq_id';

# Generic features with same Seq ID should overlap
ok( $feat2->overlaps($feat1), 'feat2 overlaps feat1' );

# Generic features with different Seq IDs shouldn't overlap
is( $feat2->seq_id('XYZ678'), 'XYZ678', 'change seq_id' );

# this currently fails
# (the negation applies to the overlaps() result only; the test name is
# passed as the second argument to ok())
ok( !$feat2->overlaps($feat1), 'feat2 doesn\'t overlap feat1' );

done_testing();

From David.Messina at sbc.su.se Mon Aug 16 14:51:54 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 16 Aug 2010 20:51:54 +0200 Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping In-Reply-To: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> Message-ID: > The question is: is this a bug? Hmm, tricky. Genomic start and end positions with differing IDs shouldn't overlap, but can't SeqFeatures apply to proteins and other molecules where one would want to compare positions without regard to ID? Dave From cjfields at illinois.edu Mon Aug 16 21:39:00 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 16 Aug 2010 20:39:00 -0500 Subject: [Bioperl-l] Bug?
Features with similar ranges, different IDs are considered overlapping In-Reply-To: References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> Message-ID: On Aug 16, 2010, at 1:51 PM, Dave Messina wrote: >> The question is: is this a bug? > > Hmm, tricky. > > Genomic start and end positions with differing IDs shouldn't overlap, but can't SeqFeatures apply to proteins and other molecules where one would want to compare positions without regard to ID? > > Dave Good point; it's probably the context the methods are used that matters. So, maybe just a document clarification? chris From David.Messina at sbc.su.se Tue Aug 17 05:06:05 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 17 Aug 2010 11:06:05 +0200 Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping In-Reply-To: References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> Message-ID: <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se> > Good point; it's probably the context the methods are used that matters. So, maybe just a document clarification? That's always good, but it really doesn't solve the issue you're describing. I mean, who would expect to get overlaps for features on different chromosomes? To me, that's a clear violation of reasonable user expectations. You shouldn't have to read the docs about something like that. So what's the solution for these duelling use cases? I haven't thought about it much, but a first approximation might be to add a -genomic boolean flag that, when true, would do the right thing and check the ID when doing overlaps or other positional comparisons. (Maybe -genomic is too obscure. Maybe it should be -same_id_for_overlaps or something like that.) And maybe having to know to set a flag is effectively the same thing as having to read the docs to understand SeqFeature's overlap behavior. What do the rest of you out there think? 
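To make the flag idea concrete, here is a minimal sketch in plain Perl. It is only an illustration under stated assumptions: the helper overlaps_with_flag and its genomic option are hypothetical names, not part of BioPerl, and features are modelled as simple hashes rather than Bio::SeqFeature::Generic objects.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (NOT a BioPerl method): range overlap that also
# compares sequence IDs when the caller passes genomic => 1.
sub overlaps_with_flag {
    my ( $f1, $f2, %opt ) = @_;

    if ( $opt{genomic} ) {
        my ( $id1, $id2 ) = ( $f1->{seq_id}, $f2->{seq_id} );
        # Features on different sequences can never overlap in genomic mode.
        return 0 if defined $id1 && defined $id2 && $id1 ne $id2;
    }

    # Plain closed-interval overlap test on start/end.
    return ( $f1->{start} <= $f2->{end} && $f2->{start} <= $f1->{end} ) ? 1 : 0;
}

# Same coordinates as the test script in this thread, different seq IDs.
my $feat1 = { seq_id => 'ABC123', start => 40, end => 80 };
my $feat2 = { seq_id => 'XYZ678', start => 40, end => 80 };

print overlaps_with_flag( $feat1, $feat2, genomic => 1 ), "\n";    # IDs compared
print overlaps_with_flag( $feat1, $feat2 ), "\n";                  # IDs ignored
```

With the flag set, the two features are reported as non-overlapping; without it, only the coordinates are compared, which is the behaviour the protein-space use case would want.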
Dave From scott at scottcain.net Tue Aug 17 08:45:27 2010 From: scott at scottcain.net (Scott Cain) Date: Tue, 17 Aug 2010 08:45:27 -0400 Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping In-Reply-To: <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se> References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se> Message-ID: Hi Dave and Chris, It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison and if somebody is doing the protein space comparison and not getting the expected results, they'll probably read the docs to find out why. Scott -- Scott Cain, Ph. D. scott at scottcain dot net Ontario Institute for Cancer Research http://gmod.org/ 216 392 3087 Sent from my iPhone. On Aug 17, 2010, at 5:06 AM, Dave Messina wrote: >> Good point; it's probably the context the methods are used that matters. So, maybe just a document clarification? > > That's always good, but it really doesn't solve the issue you're describing. > > I mean, who would expect to get overlaps for features on different chromosomes? > > To me, that's a clear violation of reasonable user expectations. You shouldn't have to read the docs about something like that. > > So what's the solution for these duelling use cases? I haven't thought about it much, but a first approximation might be to add a -genomic boolean flag that, when true, would do the right thing and check the ID when doing overlaps or other positional comparisons. > > (Maybe -genomic is too obscure. Maybe it should be -same_id_for_overlaps or something like that.) > > And maybe having to know to set a flag is effectively the same thing as having to read the docs to understand SeqFeature's overlap behavior. > > What do the rest of you out there think?
> > > Dave > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From david.breimann at gmail.com Tue Aug 17 09:44:08 2010 From: david.breimann at gmail.com (David Breimann) Date: Tue, 17 Aug 2010 16:44:08 +0300 Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes Message-ID: Hello, The following genbank has a gene that runs over the 'end" of the chromosome and into its "beginning", and the script generates an error. ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk NC_005707 Unflattening error: Details: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: PROBLEM, SEVERITY==2 Ranges not in correct order. Strange ensembl genbank entry? Range: [207497,208369] [1,687] STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473 STACK: Bio::SeqFeature::Tools::Unflattener::problem /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952 STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842 STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713 STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532 STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023 STACK: /usr/local/bin/bp_genbank2gff3.pl:506 ----------------------------------------------------------- Best, Dave From cjfields at illinois.edu Tue Aug 17 09:51:02 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 17 Aug 2010 08:51:02 -0500 Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes In-Reply-To: References: Message-ID: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu> I think Chris Mungall has a branch set up for this 
in bioperl: http://github.com/bioperl/bioperl-live/tree/circular Is that correct? Should we merge that code into the master branch? chris On Aug 17, 2010, at 8:44 AM, David Breimann wrote: > Hello, > > The following genbank has a gene that runs over the 'end" of the > chromosome and into its "beginning", and the script generates an > error. > > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk > > NC_005707 Unflattening error: > Details: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: PROBLEM, SEVERITY==2 > Ranges not in correct order. Strange ensembl genbank entry? Range: > [207497,208369] [1,687] > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473 > STACK: Bio::SeqFeature::Tools::Unflattener::problem > /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952 > STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent > /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842 > STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS > /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713 > STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq > /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532 > STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023 > STACK: /usr/local/bin/bp_genbank2gff3.pl:506 > ----------------------------------------------------------- > > Best, > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Tue Aug 17 09:52:11 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 17 Aug 2010 15:52:11 +0200 Subject: [Bioperl-l] Bug? 
Features with similar ranges, different IDs are considered overlapping In-Reply-To: References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se> Message-ID: > It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison Yep, agreed. And such a flag should be named for the non-default behavior, then, like: -ignore_IDs_for_overlaps Dave From douglas.hoen at gmail.com Thu Aug 12 10:24:27 2010 From: douglas.hoen at gmail.com (Douglas Hoen) Date: Thu, 12 Aug 2010 10:24:27 -0400 Subject: [Bioperl-l] HMMER3 to GFF3 In-Reply-To: <20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de> References: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com> <20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de> Message-ID: Hi Kai, Here it is. Thanks, -- Doug -------------- next part -------------- A non-text attachment was scrubbed... Name: chr1-tesigsv2.hmmscan Type: application/octet-stream Size: 676132 bytes Desc: not available URL: -------------- next part -------------- On 2010-08-12, at 8:16 AM, Kai Blin wrote: > On Wed, 11 Aug 2010 22:59:37 -0700 (PDT) > Doug Hoen wrote: > > Hi Doug, > >> Could someone please confirm whether the results are incorrect and, if >> so, perhaps suggest a fix? It may well be that this problem is due to >> the unusual way I am using hmmscan, rather than a problem with HMMER3 >> parsing...? > > Can you please attach your hmmer input file? Along the way something > inserted line breaks, making it unreadable. > > It might well be possible that the HMMer3 parser still handles a little > different from the HMMer2 parser, I haven't tried that script. > > Cheers, > Kai > > -- > Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de > Institute for Microbiology and Infection Medicine > Division of Microbiology/Biotechnology > Eberhard-Karls-University of Tübingen > Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 > D-72076 Tübingen Fax : ++49 7071 29-5979 > Deutschland > Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From CJMungall at lbl.gov Tue Aug 17 11:53:15 2010 From: CJMungall at lbl.gov (Chris Mungall) Date: Tue, 17 Aug 2010 08:53:15 -0700 Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes In-Reply-To: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu> References: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu> Message-ID: You can merge this in. It should allow David to proceed. I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that it essentially preserves the genbank coordinates even when the origin is crossed: http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length. On Aug 17, 2010, at 6:51 AM, Chris Fields wrote: > I think Chris Mungall has a branch set up for this in bioperl: > > http://github.com/bioperl/bioperl-live/tree/circular > > Is that correct? Should we merge that code into the master branch? > > chris > > On Aug 17, 2010, at 8:44 AM, David Breimann wrote: > >> Hello, >> >> The following genbank has a gene that runs over the 'end" of the >> chromosome and into its "beginning", and the script generates an >> error. >> >> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk >> >> NC_005707 Unflattening error: >> Details: >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: PROBLEM, SEVERITY==2 >> Ranges not in correct order. Strange ensembl genbank entry?
Range: >> [207497,208369] [1,687] >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/ >> Root.pm:473 >> STACK: Bio::SeqFeature::Tools::Unflattener::problem >> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952 >> STACK: >> Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent >> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842 >> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS >> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713 >> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq >> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532 >> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023 >> STACK: /usr/local/bin/bp_genbank2gff3.pl:506 >> ----------------------------------------------------------- >> >> Best, >> Dave >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Tue Aug 17 15:24:23 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 17 Aug 2010 14:24:23 -0500 Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes In-Reply-To: References: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu> Message-ID: <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu> On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote: > You can merge this in. It should allow David to proceed. Will do. I'll go ahead and delete the remote branch as well. > I haven't kept up on synchrony between bioperl and GFF on circular genomes. 
The above fix is conservative in that essentially preserves the genbank coordinates even when the origin is crossed: > > http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf > > However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length Yes, that is a problem that needs to be addressed. Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174. chris > On Aug 17, 2010, at 6:51 AM, Chris Fields wrote: > >> I think Chris Mungall has a branch set up for this in bioperl: >> >> http://github.com/bioperl/bioperl-live/tree/circular >> >> Is that correct? Should we merge that code into the master branch? >> >> chris >> >> On Aug 17, 2010, at 8:44 AM, David Breimann wrote: >> >>> Hello, >>> >>> The following genbank has a gene that runs over the 'end" of the >>> chromosome and into its "beginning", and the script generates an >>> error. >>> >>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk >>> >>> NC_005707 Unflattening error: >>> Details: >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: PROBLEM, SEVERITY==2 >>> Ranges not in correct order. Strange ensembl genbank entry? 
Range: >>> [207497,208369] [1,687] >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473 >>> STACK: Bio::SeqFeature::Tools::Unflattener::problem >>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952 >>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent >>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842 >>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS >>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713 >>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq >>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532 >>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023 >>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506 >>> ----------------------------------------------------------- >>> >>> Best, >>> Dave >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sheldon.mckay at gmail.com Tue Aug 17 16:42:50 2010 From: sheldon.mckay at gmail.com (Sheldon McKay) Date: Tue, 17 Aug 2010 16:42:50 -0400 Subject: [Bioperl-l] AlignIO and Gbrowse_syn In-Reply-To: References: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz> Message-ID: The GBrowse_syn dev team is pretty small (n=1) right now, so any patches would be welcome. Sheldon On Wed, Aug 11, 2010 at 6:02 PM, Chris Fields wrote: > Russell, > > We have had very few requests to support .maf until recently, which is why there has been little done with it. We welcome any help to improve it.
> > chris > > On Aug 11, 2010, at 4:31 PM, Smithies, Russell wrote: > >> I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague. >> If GBrowse_syn is using .maf format, does AlignIO need more work? >> Any comments? >> >> --Russell >> >> >> I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output). GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce clustalw, I've tried parsing LASTZ maf output to clustalw (and other alignment formats) using AlignIO, however I run into the following issues: >> *Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification) >> *The coordinate system for reverse strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-rev complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result all reverse match coordinates in the resulting clustalw output file are incorrect. >> *AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them >> >> I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned.
From hxu.hong at gmail.com Tue Aug 17 16:50:43 2010
From: hxu.hong at gmail.com (Hong Xu)
Date: Tue, 17 Aug 2010 16:50:43 -0400
Subject: [Bioperl-l] Bio::Tools::Primer3 question
Message-ID:

Hello all,

I'm working on parsing the output of Primer3 release 2.2.2-beta. I made the necessary changes to make Bio::Tools::Primer3 work with the new output tags of Primer3 release 2.2.2. But when I tried to get the primer Tm, I found that Bio::Tools::Primer3 gave a different Tm from the one in the Primer3 result file. Then I learned that the Tm was calculated by the Bio::SeqFeature::Primer module, not parsed from the Primer3 result. If I want to get data by parsing the Primer3 result, should I write my own Primer3 parser instead of using Bio::Tools::Primer3?
thanks a lot,
Hong

From cjfields at illinois.edu Tue Aug 17 17:14:02 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 16:14:02 -0500
Subject: [Bioperl-l] Bio::Tools::Primer3 question
In-Reply-To:
References:
Message-ID:

Already ahead of you there, unfortunately. I wrote a complete reimplementation of both the Primer3 parser and the Primer3 wrapper that handles both v1 and v2 of primer3_core. Lack of tuits lately has prevented me from getting tests written up, so for the time being it's sitting in bioperl-dev:

http://github.com/bioperl/bioperl-dev

They are Bio::Tools::Primer3Redux (parser) and Bio::Tools::Run::Primer3Redux (wrapper).

I rewrote those b/c I found the original modules inadequate in many ways for my purposes then (the newer version uses simple features or feature pairs instead of the primer features, for the same reasons you mention re: Tm). You're more than welcome to hack on the code a bit. I'm planning on pulling it out into my own github repo for separate submission to CPAN.

chris

On Aug 17, 2010, at 3:50 PM, Hong Xu wrote:

> Hello all,
>
> I'm working to parse the Primer3 release 2.2.2-beta result. I made the
> necessary changes to make Bio::Tools::Primer3 work with the new output
> tags of Primer3 release 2.2.2. But when I tried to get the primer Tm,
> I found that Bio::Tools::Primer3 gave different Tm from Primer3 result
> file. Then I learned that the Tm was calculated by
> Bio::SeqFeature::Primer module, not from parsing Primer3 result. If I
> want to get data from parsing Primer3 result, should I write my own
> Primer3 parser instead of Bio::Tools::Primer3?
> > thanks a lot, > Hong > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Aug 17 23:42:59 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 17 Aug 2010 22:42:59 -0500 Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes In-Reply-To: <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu> References: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu> <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu> Message-ID: Chris, David, The branch is now merged back to trunk. David, let us know if this helps. chris (f) On Aug 17, 2010, at 2:24 PM, Chris Fields wrote: > On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote: > >> You can merge this in. It should allow David to proceed. > > Will do. I'll go ahead and delete the remote branch as well. > >> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that essentially preserves the genbank coordinates even when the origin is crossed: >> >> http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf >> >> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length > > Yes, that is a problem that needs to be addressed. Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174. > > chris > >> On Aug 17, 2010, at 6:51 AM, Chris Fields wrote: >> >>> I think Chris Mungall has a branch set up for this in bioperl: >>> >>> http://github.com/bioperl/bioperl-live/tree/circular >>> >>> Is that correct? Should we merge that code into the master branch? 
>>> >>> chris >>> >>> On Aug 17, 2010, at 8:44 AM, David Breimann wrote: >>> >>>> Hello, >>>> >>>> The following genbank has a gene that runs over the 'end" of the >>>> chromosome and into its "beginning", and the script generates an >>>> error. >>>> >>>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk >>>> >>>> NC_005707 Unflattening error: >>>> Details: >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>> MSG: PROBLEM, SEVERITY==2 >>>> Ranges not in correct order. Strange ensembl genbank entry? Range: >>>> [207497,208369] [1,687] >>>> STACK: Error::throw >>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473 >>>> STACK: Bio::SeqFeature::Tools::Unflattener::problem >>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952 >>>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent >>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842 >>>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS >>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713 >>>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq >>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532 >>>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023 >>>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506 >>>> ----------------------------------------------------------- >>>> >>>> Best, >>>> Dave >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From 
cjfields at illinois.edu Wed Aug 18 00:48:55 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 23:48:55 -0500
Subject: [Bioperl-l] Bio::Tools::Primer3 question
In-Reply-To:
References:
Message-ID:

Hong,

The latest code, along with working tests, is present here:

http://github.com/cjfields/Bio-Tools-Primer3Redux

It needs a few more tests, but the initial wrapper tests work fine for primer3 v2.2.1 on both Mac and Linux. Will try pushing this to CPAN after a bit more cleanup.

chris

On Aug 17, 2010, at 4:14 PM, Chris Fields wrote:

> Already ahead of you there, unfortunately. I wrote a complete reimplementation of both the Primer3 parser and the Primer3 wrapper that handles both v1 and v2 of primer3_core. Lack of tuits lately have prevented me from getting tests written up, so for the time being it's sitting in bioperl-dev:
>
> http://github.com/bioperl/bioperl-dev
>
> They are Bio::Tools::Primer3Redux (parser) and Bio::Tools::Run::Primer3Redux (wrapper).
>
> I rewrote those b/c I found the original modules not adequate enough in many ways for my purposes then (the newer version uses simple features or feature pairs instead of the primer features, for the same reasons you mention re: Tm). You're more than welcome to hack on the code a bit. I'm planning on pulling it out into my own github repo for separate submission to CPAN.
>
> chris
>
> On Aug 17, 2010, at 3:50 PM, Hong Xu wrote:
>
>> Hello all,
>>
>> I'm working to parse the Primer3 release 2.2.2-beta result. I made the
>> necessary changes to make Bio::Tools::Primer3 work with the new output
>> tags of Primer3 release 2.2.2. But when I tried to get the primer Tm,
>> I found that Bio::Tools::Primer3 gave different Tm from Primer3 result
>> file. Then I learned that the Tm was calculated by
>> Bio::SeqFeature::Primer module, not from parsing Primer3 result. If I
>> want to get data from parsing Primer3 result, should I write my own
>> Primer3 parser instead of Bio::Tools::Primer3?
>>
>> thanks a lot,
>> Hong

From david.breimann at gmail.com Wed Aug 18 02:46:58 2010
From: david.breimann at gmail.com (David Breimann)
Date: Wed, 18 Aug 2010 09:46:58 +0300
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To:
References: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu> <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>
Message-ID:

Dear Chris's,

I tested the updated version on multiple genomes that previously returned errors (for future reference: NC_005707, NC_006578, NC_007103, NC_007104, NC_007106, NC_007107, NC_008573, NC_008762, NC_008763, NC_008785, NC_009457, NC_012040). The script now ends normally on all of them. However, as you mentioned, the resulting GFF3 file does not comply with the GFF3 specification for circular genomes. This in turn causes some unexpected results in other applications.

Best,
Dave

On Wed, Aug 18, 2010 at 6:42 AM, Chris Fields wrote:
> Chris, David,
>
> The branch is now merged back to trunk. David, let us know if this helps.
>
> chris (f)
>
> On Aug 17, 2010, at 2:24 PM, Chris Fields wrote:
>
>> On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote:
>>
>>> You can merge this in. It should allow David to proceed.
>>
>> Will do. I'll go ahead and delete the remote branch as well.
>>
>>> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that it essentially preserves the genbank coordinates even when the origin is crossed:
>>>
>>> http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf
>>>
>>> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length
>>
>> Yes, that is a problem that needs to be addressed. Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174.
>>
>> chris
>>
>>> On Aug 17, 2010, at 6:51 AM, Chris Fields wrote:
>>>
>>>> I think Chris Mungall has a branch set up for this in bioperl:
>>>>
>>>> http://github.com/bioperl/bioperl-live/tree/circular
>>>>
>>>> Is that correct? Should we merge that code into the master branch?
>>>>
>>>> chris
>>>>
>>>> On Aug 17, 2010, at 8:44 AM, David Breimann wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> The following genbank has a gene that runs over the "end" of the
>>>>> chromosome and into its "beginning", and the script generates an
>>>>> error.
>>>>>
>>>>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk
>>>>>
>>>>> NC_005707 Unflattening error:
>>>>> Details:
>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>> MSG: PROBLEM, SEVERITY==2
>>>>> Ranges not in correct order. Strange ensembl genbank entry?
Range: >>>>> [207497,208369] [1,687] >>>>> STACK: Error::throw >>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473 >>>>> STACK: Bio::SeqFeature::Tools::Unflattener::problem >>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952 >>>>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent >>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842 >>>>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS >>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713 >>>>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq >>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532 >>>>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023 >>>>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506 >>>>> ----------------------------------------------------------- >>>>> >>>>> Best, >>>>> Dave >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From G.Gallone at sms.ed.ac.uk Wed Aug 18 10:57:01 2010 From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone) Date: Wed, 18 Aug 2010 15:57:01 +0100 Subject: [Bioperl-l] [RFC] Interolog::Walk Message-ID: <4C6BF4BD.5010200@sms.ed.ac.uk> Hello BioPerl community - I've written a new module called Interolog::Walk that I'm planning to put on CPAN. I would be grateful if you might take a look at the brief description I attached and tell me what you think. 
I'll be more than happy to post further details should the module be of interest to anyone.

Also, I am not totally sure I have the correct name for it. This is my first module and it would be great if you could advise on naming it appropriately. Hopefully the following description will give an idea of what it does.

===================

NAME
Interolog::Walk - Retrieve, score and visualize putative Protein-Protein Interactions through the orthology-walk method

DESCRIPTION
A common activity in computational biology is to mine protein-protein interactions from publicly available databases in order to build Protein-Protein Interaction (PPI) datasets.
In many instances, however, experimentally obtained, annotated PPIs are scarce, and it would be helpful to enrich the experimental dataset with high-quality, computationally inferred PPIs. Such computationally obtained datasets can extend, support or enrich experimental PPI datasets, and are of crucial importance in high-throughput gene prioritization studies, i.e. to drive hypotheses and restrict the dimensionality of many gene functional discovery problems.
This Perl module, Interolog::Walk, is aimed at building putative PPI datasets on the basis of a number of comparative biology paradigms: the module implements a collection of computational biology algorithms based on the concept of "orthology projection". If interacting proteins A and B in organism X have orthologs A' and B' in organism Y, under certain conditions one can assume that the interaction will be conserved in organism Y, i.e. the A-B interaction can be "projected through the orthologies" to obtain a putative A'-B' interaction. The pair of interactions (A-B) and (A'-B') are named "Interologs" (see for instance [1] and [2]).

Interolog::Walk collects, analyses and collates gene orthology data provided by the Ensembl Consortium (www.ensembl.org) as well as PPI data provided by EBI IntAct (http://www.ebi.ac.uk/intact/).
It provides the user with the possibility of rating the quality and reliability of the putative interactions collected, by means of confidence scores, and optionally outputs network representations of the datasets, compatible with the biological network representation standard, Cytoscape.

USAGE
In order to carry out an interolog walk we start with a set of gene identifiers in one organism of interest. We query those ids against a number of comparative biology databases to retrieve a list of orthologues for each gene id of interest, in one or more species.
In the following step we rely on PPI databases to retrieve the list of available interactors for the protein ids obtained. The output at this stage consists of a list of interactors of the orthologues of the initial gene set, plus several fields of ancillary data.
In the last step of the process we project the interactions - again using orthology data - back to the original species of interest. The output of the process is a list of PUTATIVE INTERACTORS of the initial gene set, plus several fields of ancillary data.

====================

Given the scope and the focus of the project, I would imagine that viable alternatives for the namespace might be

Bio::Orthology::InterologWalk
Bio::InterologMap

or maybe

Interolog::Map
Orthology::Map
Orthology::InterologMap

There are no similar projects as far as I could see so I shouldn't run the risk of overlapping namespaces. Still I would love to know your informed opinion about it.

best,
Giuseppe

REFERENCES
[1] Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, Vidal M, Gerstein M. Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. Genome Research 2004 Jun;14(6):1107-18.
[2] Wiles AM, Doderer M, Ruan J, Gu T-T, Ravi D, Blackman BA, Bishop AJR. "Building and Analyzing Protein Interactome Networks by Cross-species Comparisons."
BMC Systems Biology 2010, 4:36 - PMID: 20353594

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

From David.Messina at sbc.su.se Wed Aug 18 12:52:58 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 18 Aug 2010 18:52:58 +0200
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <4C6BF4BD.5010200@sms.ed.ac.uk>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
Message-ID: <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>

Hi Giuseppe,

Sounds really interesting - thanks for posting this.

> Bio::Orthology::InterologWalk

I vote for this name, or in any case something with Bio:: as the top-level namespace since it's a biology-related package.

I like that you're providing a lot of background and information about the project in the documentation. However, the USAGE section should give information about how to use the module, with example code. You can look at other modules on CPAN (or in BioPerl) to see the conventions for writing documentation.

Also, from what you wrote, it sounds like this might be a pipeline or a script rather than a module per se, or perhaps a script and a set of modules. It would be helpful to clarify in your documentation (if you haven't already) how exactly things are organized (and of course example code will help with that, too).

Hope that's helpful, and let us know when you've got it up on CPAN so we can try it out!

Dave

From cjfields at illinois.edu Wed Aug 18 14:24:16 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 18 Aug 2010 13:24:16 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To:
References: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu> <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>
Message-ID:

Okay, will file this as a bug. Thanks!
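For tracking the bug, the GFF3 convention mentioned in the thread (a feature that crosses the origin of a circular sequence gets its end incremented by the sequence length, so that start <= end still holds) can be sketched in Perl as follows. This is only an illustration under stated assumptions: `gff3_span_for_circular` is a hypothetical helper and not the code in bp_genbank2gff3.pl, the sequence length of 208369 is inferred from the first range in David's error message, and the strand, feature type and ID in the printed line are placeholders.

```perl
use strict;
use warnings;

# Collapse the GenBank sub-ranges of one feature on a circular sequence
# into a single [start, end] span, following the GFF3 convention for
# features that cross the origin: end = wrapped end + sequence length.
# ASSUMPTION: ranges are 1-based [start, end] pairs given in join order
# for a plus-strand feature.
sub gff3_span_for_circular {
    my ($seq_length, @ranges) = @_;
    my $start = $ranges[0][0];
    my $end   = $ranges[-1][1];

    # The feature wraps around the origin if it ends before it starts.
    $end += $seq_length if $end < $start;
    return ($start, $end);
}

# Ranges from the error message in this thread; the length is an assumption.
my $seq_length = 208369;
my ($start, $end) =
    gff3_span_for_circular($seq_length, [207497, 208369], [1, 687]);

# Placeholder type, strand and ID; only the coordinates are the point here.
print join("\t", 'NC_005707', '.', 'gene', $start, $end,
           '.', '+', '.', 'ID=example_gene'), "\n";
```

The GFF3 specification additionally expects the landmark feature itself to carry an Is_circular=true attribute, which this sketch does not emit.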
chris On Aug 18, 2010, at 1:46 AM, David Breimann wrote: > Dear Chris's, > > I tested the updated version on multiple genomes that previously > returned errors (for future reference: NC_005707, NC_006578, > NC_007103, NC_007104, NC_007106, NC_007107, NC_008573, NC_008762, > NC_008763, NC_008785, NC_009457, NC_012040). The script now ends > normally on all of them. However, as you mentioned, the result GFF3 > file does not comply with GFF3 specifications for circular genomes. > This in turn causes some unexpected results in other applications. > > Best, > Dave > > On Wed, Aug 18, 2010 at 6:42 AM, Chris Fields wrote: >> Chris, David, >> >> The branch is now merged back to trunk. David, let us know if this helps. >> >> chris (f) >> >> On Aug 17, 2010, at 2:24 PM, Chris Fields wrote: >> >>> On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote: >>> >>>> You can merge this in. It should allow David to proceed. >>> >>> Will do. I'll go ahead and delete the remote branch as well. >>> >>>> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that essentially preserves the genbank coordinates even when the origin is crossed: >>>> >>>> http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf >>>> >>>> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length >>> >>> Yes, that is a problem that needs to be addressed. Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174. >>> >>> chris >>> >>>> On Aug 17, 2010, at 6:51 AM, Chris Fields wrote: >>>> >>>>> I think Chris Mungall has a branch set up for this in bioperl: >>>>> >>>>> http://github.com/bioperl/bioperl-live/tree/circular >>>>> >>>>> Is that correct? Should we merge that code into the master branch? 
>>>>> >>>>> chris >>>>> >>>>> On Aug 17, 2010, at 8:44 AM, David Breimann wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> The following genbank has a gene that runs over the 'end" of the >>>>>> chromosome and into its "beginning", and the script generates an >>>>>> error. >>>>>> >>>>>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk >>>>>> >>>>>> NC_005707 Unflattening error: >>>>>> Details: >>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>>> MSG: PROBLEM, SEVERITY==2 >>>>>> Ranges not in correct order. Strange ensembl genbank entry? Range: >>>>>> [207497,208369] [1,687] >>>>>> STACK: Error::throw >>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473 >>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::problem >>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952 >>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent >>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842 >>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS >>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713 >>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq >>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532 >>>>>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023 >>>>>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506 >>>>>> ----------------------------------------------------------- >>>>>> >>>>>> Best, >>>>>> Dave >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing 
list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cdavis at bcm.tmc.edu Wed Aug 18 15:19:53 2010
From: cdavis at bcm.tmc.edu (Caleb Davis)
Date: Wed, 18 Aug 2010 14:19:53 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq question
Message-ID: <4C6C3259.4060304@bcm.tmc.edu>

Hello, thank you for bioperl!

I am getting discrepancies between the online bl2seq (www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi) and bioperl's implementation, and I'm not sure why. I'm seeing a desired behavior through the web interface but can't replicate it locally. Specifically, online bl2seq aligns across a 1 bp insertion in the subject whereas the local bl2seq just reports a shorter alignment. Any ideas?

Thanks again,
--Caleb

The desired parameter differences from default are -F F -W 7 (turn complexity filter off, word size = 7).
Below I present the online and local results given the following input sequences: >consensus GAGGATCCAGAATTCTC >FVFTF6N01A86BR AACCCAATGTAAGGAAGCTAAGAACCTTGAAAAGAGGATACCAGAATTCTC Here are the parameters and result I'm getting online: Blast4-request ::= { body queue-search { program "blastn", service "plain", queries bioseq-set { seq-set { seq { id { local id 26297 }, descr { title "consensus", user { type str "CFastaReader", data { { label str "DefLine", data str ">consensus" } } } }, inst { repr raw, mol na, length 17, seq-data ncbi2na '8A3520F740'H } } } }, subject sequences { { id { local id 26299 }, descr { title "FVFTF6N01A86BR", user { type str "CFastaReader", data { { label str "DefLine", data str ">FVFTF6N01A86BR" } } } }, inst { repr raw, mol na, length 51, seq-data ncbi2na '0543B0A09C205F80228C520F74'H } } }, algorithm-options { { name "EvalueThreshold", value cutoff e-value { 1, 10, 1 } }, { name "UngappedMode", value boolean FALSE }, { name "PercentIdentity", value real { 0, 10, 0 } }, { name "HitlistSize", value integer 100 }, { name "EffectiveSearchSpace", value big-integer 0 }, { name "DbLength", value big-integer 0 }, { name "WindowSize", value integer 0 }, { name "DustFiltering", value boolean FALSE }, { name "RepeatFiltering", value boolean FALSE }, { name "MaskAtHash", value boolean TRUE }, { name "MismatchPenalty", value integer -3 }, { name "MatchReward", value integer 2 }, { name "GapOpeningCost", value integer 5 }, { name "GapExtensionCost", value integer 2 }, { name "StrandOption", value strand-type both-strands }, { name "WordSize", value integer 7 } }, format-options { { name "Web_JobTitle", value string "consensus" }, { name "Web_BlastSpecialPage", value string "blast2seq" } } } } >lcl|30439 FVFTF6N01A86BR Length=51 Sort alignments for this subject sequence by: E value Score Percent identity Query start position Subject start position Score = 24.7 bits (26), Expect = 2e-05 Identities = 17/18 (94%), Gaps = 1/18 (5%) Strand=Plus/Plus Query 
1 GAGGAT-CCAGAATTCTC 17 |||||| ||||||||||| Sbjct 34 GAGGATACCAGAATTCTC 51 Here's the output from a local search (I changed the expect to 5.0 just to prove to myself that some parameters are getting through OK): my @params = (-program => 'blastn', -outfile => 'bl2seq.out', -FILTER => 'F', -WORDSIZE => 7, -expect => 5.0); my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); my $bl2seq_report = $factory->bl2seq($cons_seqobj, $single_seqobj); #consensus vs. FVFTF6N01A86BR print Dumper $bl2seq_report->next_result; $VAR1 = bless( { '_inclusion_threshold' => undef, '_queryacc' => 'adapter_consensus', '_iteration_index' => 0, '_iteration_count' => 1, '_hits' => [], '_hitindex' => 0, '_querylength' => '17', '_querydesc' => '', '_iterations' => [ bless( { '_oldhits_not_below_threshold' => [], '_newhits_unclassified' => [], '_number' => 1, '_oldhits_newly_below_threshold' => [], '_hit_factory' => bless( { 'interface' => 'Bio::Search::Hit::HitI', 'type' => 'Bio::Search::Hit::BlastHit', '_loaded_types' => { 'Bio::Search::Hit::BlastHit' => 1 }, '_root_verbose' => 0 }, 'Bio::Factory::ObjectFactory' ), '_newhits_below_threshold' => [ { '-algorithm' => 'BLASTN', '-description' => '', '-length' => '51', '-query_len' => '17', '-hsp_factory' => bless( { 'interface' => 'Bio::Search::HSP::HSPI', 'type' => 'Bio::Search::HSP::GenericHSP', '_loaded_types' => { 'Bio::Search::HSP::GenericHSP' => 1 }, '_root_verbose' => 0 }, 'Bio::Factory::ObjectFactory' ), '-name' => 'FVFTF6N01A86BR', '-rank' => 1, '-hsps' => [ { '-query_start' => '7', '-algorithm' => 'BLASTN', '-hit_seq' => 'ccagaattctc', '-hit_length' => '51', '-query_length' => '17', '-query_desc' => '', '-query_frame' => 0, '-rank' => 1, '-hit_desc' => '', '-query_end' => '17', '-hit_name' => 'FVFTF6N01A86BR', '-identical' => '11', '-query_name' => 'adapter_consensus', '-evalue' => '1e-04', '-score' => '11', '-conserved' => '11', '-hit_frame' => 0, '-hsp_length' => '11', '-query_seq' => 'ccagaattctc', '-hit_start' => '41', 
'-homology_seq' => '|||||||||||', '-hit_end' => '51', '-bits' => '22.3' }, { '-query_start' => '9', '-algorithm' => 'BLASTN', '-hit_seq' => 'agaattct', '-hit_length' => '51', '-query_length' => '17', '-query_desc' => '', '-query_frame' => 0, '-rank' => 2, '-hit_desc' => '', '-query_end' => '16', '-hit_name' => 'FVFTF6N01A86BR', '-identical' => '8', '-query_name' => 'adapter_consensus', '-evalue' => '0.007', '-score' => '8', '-conserved' => '8', '-hit_frame' => 0, '-hsp_length' => '8', '-query_seq' => 'agaattct', '-hit_start' => '50', '-homology_seq' => '||||||||', '-hit_end' => '43', '-bits' => '16.4' } ], '-accession' => 'FVFTF6N01A86BR', '-significance' => '1e-04' } ], '_root_verbose' => 0, '_newhits_not_below_threshold' => [], '_oldhits_below_threshold' => [] }, 'Bio::Search::Iteration::GenericIteration' ) ], '_hit_factory' => $VAR1->{'_iterations'}[0]{'_hit_factory'}, '_statistics' => bless( { 'stats' => { 'S1' => '4', 'S1_bits' => '8.4', 'kappa_gapped' => '0.711', 'X3_bits' => '99.1', 'X1' => '4', 'lambda_gapped' => '1.37', 'X2' => '15', 'S2' => '4', 'seqs_better_than_cutoff' => '1', 'Hits_to_DB' => '5', 'num_extensions' => '2', 'num_successful_extensions' => '2', 'X1_bits' => '7.9', 'X3' => '50', 'dbentries' => '1', 'entropy_gapped' => '1.31', 'X2_bits' => '29.7', 'S2_bits' => '8.4' } }, 'Bio::Search::GenericStatistics' ), '_algorithm' => 'BLASTN', '_parameters' => bless( { 'params' => { 'gapext' => '2', 'matrix' => 'blastn matrix:1 -3', 'expect' => '5.0', 'allowgaps' => 'yes', 'gapopen' => '5' } }, 'Bio::Tools::Run::GenericParameters' ), '_root_verbose' => 0, '_queryname' => 'adapter_consensus' }, 'Bio::Search::Result::BlastResult' ); From David.Messina at sbc.su.se Wed Aug 18 18:32:37 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 19 Aug 2010 00:32:37 +0200 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq question In-Reply-To: <4C6C3259.4060304@bcm.tmc.edu> References: <4C6C3259.4060304@bcm.tmc.edu> Message-ID: Hi Caleb, 
The first thing I would do is take BioPerl out of the equation and test your local bl2seq on the command line. If you get the same output locally as on the web version, then there is a problem with BioPerl. If you're still seeing a discrepancy between the web and your local run, then this isn't a problem with BioPerl.

Just to be clear, BioPerl doesn't "implement" any of the BLAST programs; it is simply a wrapper around the programs that you download from NCBI. That doesn't mean BioPerl isn't at fault, of course, just that it's important to isolate the problem carefully.

The most common reasons for these discrepancies are:

- different version numbers of BLAST
  2.2.21? 2.2.22? Is it the same on the web as locally?

- similarly, different implementations of BLAST
  NCBI's old BLAST suite is now deprecated and replaced with BLAST+. All of the online BLAST web queries are Blast+ now - are you running BLAST+ locally? (there's also a separate BioPerl wrapper for BLAST+ called Bio::Tools::Run::BlastPlus)

- hidden "default" parameters
  Even though you're only changing a handful of parameters, the defaults (particularly on the web version) may be different than what you expect. In your case, it looks like on the web version, match score is 2 and mismatch is -3. However, in the local version I believe match score is 1 and a mismatch is -3. See this line in the params block near the end of your post:

  'matrix' => 'blastn matrix:1 -3',

Dave

From sidd.basu at gmail.com Wed Aug 18 20:28:32 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Wed, 18 Aug 2010 19:28:32 -0500
Subject: [Bioperl-l] Re: [RFC] Interolog::Walk
In-Reply-To: <4C6BF4BD.5010200@sms.ed.ac.uk>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
Message-ID: <20100819002830.GA366@Macintosh-235.local>

Hi,

On Wed, 18 Aug 2010, Giuseppe Gallone wrote:
> Hello BioPerl community - I've written a new module called Interolog::Walk
> that I'm planning to put on CPAN.
I would be grateful if you might take a > look at the brief description I attached and tell me what you think. I'll > be more than happy to post further details should the module be of some > interest for someone. > > Also, I am not totally sure about having the correct name for it. This is > my first module and It would be great if you could advise on naming it > appropriately. Hopefully the following description will give an idea on > what it does. > > =================== > > > NAME > Interolog::Walk - Retrieve, score and visualize putative > Protein-Protein Interactions through the orthology-walk method > > DESCRIPTION > A common activity in computational biology is to mine protein-protein > interactions from publicly available databases in order to build > Protein-Protein Interaction (PPI) datasets. > In many instances, however, the number of experimentally obtained annotated > PPIs is very scarce and it would be helpful to enrich the experimental > dataset with high-quality, computationally-inferred PPIs. Such > computationally-obtained dataset can extend, support or enrich experimental > PPI datasets, and are of crucial importance in high-throughput gene > prioritization studies, i.e. to drive hypotheses and restrict the > dimensionality of many gene functional discovery problems. > This Perl Module, Interolog::Walk, is aimed at building putative PPI > datasets on the basis of a number of comparative biology paradigms: the > module implements a collection of computational biology algorithms based on > the concept of "orthology projection". If interacting proteins A and B in > organism X have orthologs A' and B' in organism Y, under certain conditions > one can assume that the interaction will be conserved in organism Y, i.e. > the A-B interaction can be "projected through the orthologies" to obtain a > putative A'-B' interaction. The pair of interactions (A-B) and (A'-B') are > named "Interologs" (see for instance [1] and [2]). 
> > Interolog::Walk collects, analyses and collates gene orthology data > provided by the Ensembl Consortium (www.ensembl.org) as well as PPI data > provided by EBI Intact (http://www.ebi.ac.uk/intact/). It provides the user > with the possibility of rating the quality and reliability of the putative > interactions collected, by means of confidence scores, and optionally > outputs network representations of the datasets, compatible with the > biological network representation standard, Cytoscape. Sounds interesting. I am currently playing around with a perl based webapp for displaying interactome using cytoscapeweb. Depending how your design pans out, would be happy to use your module as a backend analysis layer. And on a related note, you might want to have a look at bioperl-network and if there is any overlap might be worth contributing. -siddhartha > > USAGE > In order to carry out an interolog walk we start with a set of gene > identifiers in one organism of interest. We query those ids against a > number of comparative biology databases to retrieve a list of orthologues > for each gene id of interest, in one or more species. > In the following step we rely on PPI databases to retrieve the list of > available interactors for the protein ids obtained. The output at this > stage consists of a list of interactors of the orthologues of the initial > gene set, plus several fields of ancillary data. > In the last step of the process we project the interactions - again using > orthology data - back to the original species of interest. The output of > the process is a list of PUTATIVE INTERACTORS of the initial gene set, plus > several fields of ancillary data. 
> > ==================== > > Given the scope and the focus of the project, I would imagine that viable > alternatives for the namespace might be > > Bio::Orthology::InterologWalk > Bio::InterologMap > > or maybe > Interolog::Map > Orthology::Map > Orthology::InterologMap > > There are no similar projects as far as I could see so I shouldn't run the > risk of overlapping namespaces. Still I would love to know your informed > opinion about it. > > best, > Giuseppe > > > > REFERENCES > [1] Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, > Vidal M, Gerstein M. Annotation transfer between genomes: protein-protein > interologs and protein-DNA regulogs. Genome Research 2004 > Jun;14(6):1107-18. > > [2]Wiles AM, Doderer M, Ruan J, Gu T-T, Ravi D, Blackman BA, Bishop AJR. > "Building and Analyzing Protein Interactome Networks by Cross-species > Comparisons." BMC Systems Biology 2010, 4:36 - PMID: 20353594 > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From dan.kortschak at adelaide.edu.au Wed Aug 18 22:15:03 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Thu, 19 Aug 2010 11:45:03 +0930 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query Message-ID: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> Hi Everyone, I'm wanting to set up a persistent data store for some of my work and am in the process of choosing parts for my system. From my brief look around I think I'd like to use BioSQL (next best choice being Chado - but BioPerl bindings in bioperl-db for BioSQL being the decider here), but have noticed comments some time back that bioperl-db and PostgreSQL 8.3 (my prefered engine - though MySQL is possible, but makes the whole system messier) don't play well together. 
What is the status of the casting expectation conflict between bioperl-db and Pg8.3? The scripts are run with safe data, so placeholders aren't strictly crucial (though speed may be an issue?) and `$dbh->{pg_server_prepare} = 0;' seems like it could be an option. Can anybody provide any advice on this issue? thanks Dan Kortschak From cjfields at illinois.edu Wed Aug 18 23:29:36 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 18 Aug 2010 22:29:36 -0500 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq question In-Reply-To: References: <4C6C3259.4060304@bcm.tmc.edu> Message-ID: <194D43EC-A44C-450A-B57B-EC379DBCB935@illinois.edu> Wouldn't surprise me too much if the parameters are not set the same; IIRC the main BLAST URL API and the online NCBI Web-BLAST have different default settings. chris On Aug 18, 2010, at 5:32 PM, Dave Messina wrote: > Hi Caleb, > > The first thing I would do is take BioPerl out of the equation and test your local bl2seq on the command line. If you get the same output locally as on the web version, then there is a problem with BioPerl. If you're still seeing a discrepancy between the web and your local run, then this isn't a problem with BioPerl. > > Just to be clear, BioPerl doesn't "implement" any of the BLAST programs; it is simply a wrapper around the programs that you download from NCBI. That doesn't mean BioPerl isn't at fault, of course, just that it's important to isolate the problem carefully. > > The most common reasons for these discrepancies are: > > - different version numbers of BLAST > > 2.2.21? 2.2.22? Is it the same on the web as locally? > > - similarly, different implementations of BLAST > > NCBI's old BLAST suite is now deprecated and replaced with BLAST+. All of the online BLAST web queries are Blast+ now ? are you running BLAST+ locally? 
(there's also a separate BioPerl wrapper for BLAST+ called Bio::Tools::Run::BlastPlus) > > - hidden "default" parameters > > Even though you're only changing a handful of parameters, the defaults (particularly on the web version) may be different than what you expect. > > In your case, it looks like on the web version, match score is 2 and mismatch is -3. However, in the local version I believe match score is 1 and a mismatch is -3. > > See this line in the params block near the end of your post: > > 'matrix' => 'blastn matrix:1 -3', > > > > Dave > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at drycafe.net Thu Aug 19 01:48:19 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 19 Aug 2010 01:48:19 -0400 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> Hi Dan, the casting isn't an issue anymore, I think. (And even if it were, there is actually a small script that brings back the casts that were built into 8.2.) Have you found an example where it still is? -hilmar On Aug 18, 2010, at 10:15 PM, Dan Kortschak wrote: > Hi Everyone, > > I'm wanting to set up a persistent data store for some of my work > and am > in the process of choosing parts for my system. From my brief look > around I think I'd like to use BioSQL (next best choice being Chado - > but BioPerl bindings in bioperl-db for BioSQL being the decider here), > but have noticed comments some time back that bioperl-db and > PostgreSQL > 8.3 (my prefered engine - though MySQL is possible, but makes the > whole > system messier) don't play well together. > > What is the status of the casting expectation conflict between > bioperl-db and Pg8.3? 
The scripts are run with safe data, so > placeholders aren't strictly crucial (though speed may be an issue?) > and > `$dbh->{pg_server_prepare} = 0;' seems like it could be an option. > > Can anybody provide any advice on this issue? > > thanks > Dan Kortschak > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From dan.kortschak at adelaide.edu.au Thu Aug 19 01:54:03 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Thu, 19 Aug 2010 15:24:03 +0930 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> Message-ID: <1282197243.14127.27.camel@zoidberg.mbs.adelaide.edu.au> Hi Hilmar, No, I haven't found any problems, just hoping to avoid them by prior research. thanks Dan On Thu, 2010-08-19 at 01:48 -0400, Hilmar Lapp wrote: > Hi Dan, > > the casting isn't an issue anymore, I think. (And even if it were, > there is actually a small script that brings back the casts that > were > built into 8.2.) Have you found an example where it still is? > > -hilmar From biopython at maubp.freeserve.co.uk Thu Aug 19 06:01:03 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 19 Aug 2010 11:01:03 +0100 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> Message-ID: On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp wrote: > Hi Dan, > > the casting isn't an issue anymore, I think. 
> (And even if it were, there is
> actually a small script that brings back the casts that were built into
> 8.2.) Have you found an example where it still is?
>
>        -hilmar

Hi Hilmar,

Do the bioperl-db bindings for BioSQL on PostgreSQL still require those
extra rules in the schema?
http://bugzilla.open-bio.org/show_bug.cgi?id=2839

Peter

From G.Gallone at sms.ed.ac.uk Thu Aug 19 06:45:36 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Thu, 19 Aug 2010 11:45:36 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
References: <4C6BF4BD.5010200@sms.ed.ac.uk> <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
Message-ID: <4C6D0B50.4050902@sms.ed.ac.uk>

Hi Dave,

thank you very much for your helpful comments.

Regarding the module name: I will follow your advice and avoid proposing
a new root during the module registration. As for the second level, I
haven't been able to find anything related to homology/orthology, so I'm
not sure whether I should go for

Bio::Orthology::InterologMap
or
Bio::Homology::InterologMap

The first one is maybe a bit more specific. I might also expand further,
as in Bio::Orthology::Interolog::Map, just in case somebody else finds
other interesting applications for the Interolog concept and would like
to "plug in" their own contribution. Would this make any sense?

I also appreciate your comments on the documentation. The one I provided
is actually not the full pod I was planning to include, but rather an
extract. What I have at the moment is a description, for each method, in
the following form:

=====================================
remove_duplicate_rows
 Usage    : $RC = InterologMap::remove_duplicate_rows(input_handle  => $dbh,
                                                      output_handle => $out_data,
                                                      header        => 'standard',
                                                     );
 Purpose  : This is used to clean up a TSV data file of duplicate entries.
            Occasionally, Intact can return duplicate entries. This
            routine will make sure no such duplicates are kept. A new
            datafile is built. The number of unique data rows is updated.
 Returns  : success/error
 Argument : database handle to input file, filehandle to output file,
            header type. Header type is one of the following:
            - "standard": when the routine is used to clean up an
              interolog walk file (the header will be longer)
            - "direct": when the routine is used to clean up a file of
              real db interactions (the header is shorter)
            - no field provided: default is standard
 Throws   : -
 Comment  : Sample
 See Also :
=======================================

On top of that, there is a DESCRIPTION, USAGE, and SYNOPSIS. The synopsis
has some code with an example of typical usage of the module. Please take
a look at this (attached below) and tell me what you think. You mention
that the description contains a lot of background information. Would you
recommend reducing it, or placing it elsewhere? I was considering writing
a short tutorial in LaTeX as soon as possible anyway, to provide a
"centralised" source of information for getting familiar with the module.
Does this respect the CPAN regulations?

As for your question on the structure of the module: you are indeed
right, the idea when running the "orthology walk" is to create a pipeline
of subroutines: there's a core set of subroutines meant to run in strict
sequence. Each of these subroutines expects, as input, the output of the
previous one. The input/output dataset is currently in the form of a TSV
text file, which I process with the help of the DBI module (to be more
specific, I use DBD::CSV).
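To make the duplicate clean-up described above concrete, here is a minimal
sketch in plain Perl. It is only an illustration under stated assumptions,
not the module's actual remove_duplicate_rows(): the helper name
dedupe_tsv_lines, the column names and the gene identifiers are all
hypothetical, and the sketch works on in-memory lines rather than going
through DBI/DBD::CSV.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT the module's remove_duplicate_rows()):
# keep the header line, then keep only the first copy of each data row.
sub dedupe_tsv_lines {
    my @lines = @_;
    return () unless @lines;
    my $header = shift @lines;
    my %seen;
    my @unique = grep { !$seen{$_}++ } @lines;   # first occurrence wins
    return ($header, @unique);
}

# Toy data: invented column names and identifiers.
my @rows = (
    "gene_id\tinteractor_id",
    "geneA\tgeneB",
    "geneA\tgeneC",
    "geneA\tgeneB",    # exact duplicate of the first data row
);

my @clean = dedupe_tsv_lines(@rows);
print "$_\n" for @clean;
printf "unique data rows: %d\n", scalar(@clean) - 1;   # prints 2
```

Keying the hash on the whole line only catches exact duplicates; rows that
differ in any ancillary field would be kept, which may or may not match
what the real routine does.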
While there's a certain flexibility regarding how to use the module, one
core idea remains: in order to get the set of putative interactors, the
user would have to call at least three basic routines:

(A)
=================
1) get_forward_orthologies(): this queries the initial gene list against
   one or more Ensembl dbs (using the Ensembl Perl API) and retrieves
   their orthologues, plus a number of ancillary data fields (mainly
   conservation data, e.g. dn/ds ratio, distance from ancestor, orthology
   type, etc.)

2) get_interactors(): this queries the orthology list built in the
   previous stage against a PSICQUIC-enabled PPI db using REST (at the
   moment I only query the EBI Intact DB, but it should be easy to expand
   this and query all PSICQUIC-compatible PPI dbs transparently). This
   step will "fatten" the dataset built in (1) with the interactors of
   those orthologues, plus ancillary data (including lots of parameters
   describing the quality, nature and origin of the annotated
   interaction)

3) get_backward_orthologies(): this queries the interactor list built in
   the previous stage against one or more Ensembl dbs to find orthologues
   *back* in the original species. It also adds a number of supplementary
   data fields, just like in (1).
==================

At the end of this procedure the user will have a TSV file where each row
contains a binary putative interaction plus (currently) 37 supplementary
data fields. One can then scan these results to check for duplicates, to
compute counts, and to see if we have discovered new gene ids that were
not present in the original dataset (hopefully we have :) ).
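The three steps in (A) can be illustrated with a toy, in-memory version of
the walk. This is a sketch under loud assumptions: the three hashes stand
in for the Ensembl and Intact queries, every identifier is invented, and
interolog_walk() is a hypothetical function, not part of the module. The
real routines work on TSV files and carry many ancillary fields.

```perl
use strict;
use warnings;

# Toy lookup tables standing in for the database queries.
# All identifiers are invented for illustration.
my %fly_to_mouse = ( flyA => ['mouseA'], flyB => ['mouseB'] );  # step 1 data
my %mouse_ppi    = ( mouseA => ['mouseB', 'mouseC'] );           # step 2 data
my %mouse_to_fly = ( mouseB => ['flyB'] );  # step 3 data; mouseC has no fly orthologue

# Hypothetical function: forward orthologies, interactors, backward orthologies.
sub interolog_walk {
    my ($genes, $forward, $ppi, $backward) = @_;
    my @putative;
    for my $gene (@$genes) {
        for my $orth (@{ $forward->{$gene} || [] }) {               # step 1
            for my $partner (@{ $ppi->{$orth} || [] }) {            # step 2
                for my $back (@{ $backward->{$partner} || [] }) {   # step 3
                    # putative interaction plus the interaction it was projected from
                    push @putative, [ $gene, $back, "$orth-$partner" ];
                }
            }
        }
    }
    return @putative;
}

my @ppis = interolog_walk(['flyA'], \%fly_to_mouse, \%mouse_ppi, \%mouse_to_fly);
print join("\t", @$_), "\n" for @ppis;   # flyA  flyB  mouseA-mouseB
```

The mouseA-mouseC interaction is dropped because mouseC cannot be mapped
back to the source species, which is why each stage needs the output of
the previous one.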
Most importantly, one can then further process these results to do one or
more of the following:

(B) compute a global confidence score to assess the reliability of each
    binary putative interaction

(C) extract the binary putative PPIs from the dataset and save them in a
    format compatible with Cytoscape: this helps to visualise the result,
    and one can then apply network analysis tools to discover motifs,
    clusters, etc. The format I use is currently .SIF + attributes, as
    detailed in
    http://cytoscape.wodaklab.org/wiki/Cytoscape_User_Manual/Network_Formats

(D) given the same initial gene list, one can also build a dataset of
    REAL, experimentally-obtained PPIs (without mapping through
    orthologies in other species). One can then compare this dataset with
    the putative dataset to see if/where the two overlap, what the
    intersection or the differences are, etc.

In order to suggest ways of using the module I have written 4 sample
scripts and I will include them in the module. Each script uses the
module and uses/reuses subroutines in a pipeline fashion, and does the
following:

1) doInterologWalk.pl: runs the basic pipeline in (A)
2) doScores.pl: computes and adds confidence scores as explained in (B)
3) doNetworks.pl: computes SIF network + attributes as in (C)
4) getRealInteractions.pl: runs a pipeline to obtain real PPIs from the
   initial gene set, as in (D).

Hope I didn't make this too confusing. I would love to hear back from you
and from anybody else who would like to provide feedback.

Cheers
Giuseppe

On 18/08/10 17:52, Dave Messina wrote:
> Hi Giuseppe,
>
> Sounds really interesting - thanks for posting this.
>
>> Bio::Orthology::InterologWalk
>
> I vote for this name, or in any case something with Bio:: as the top-level namespace since it's a biology-related package.
>
> I like that you're providing a lot of background and information about the project in the documentation.
> However, the USAGE section should give information about how to use the module, with example code. You can look at other modules on CPAN (or in BioPerl) to see the conventions for writing documentation.
>
> Also, from what you wrote, it sounds like this might be a pipeline or a script rather than a module per se, or perhaps a script and a set of modules. It would be helpful to clarify in your documentation (if you haven't already) how exactly things are organized (and of course example code will help with that, too).
>
>
> Hope that's helpful, and let us know when you've got it up on CPAN so we can try it out!
>
>
> Dave
>
>

NAME
    Interolog::Walk - Retrieve, score and visualize putative
    Protein-Protein Interactions through the orthology-walk method

SYNOPSIS
      use Interolog::Walk;

    First, obtain Intact Interactions for the dataset (see example in
    "getDirectInteractions.pl"):

      #get a registry from Ensembl
      my $registry = InterologMap::setup_ensembl_adaptor(connect_to_db  => $ensembl_db,
                                                         source_species => $sourceorg,
                                                         verbose        => 1
                                                        );

      #query actual interactions
      $RC = InterologMap::Direct::get_direct_interactions(registry       => $registry,
                                                          source_species => $sourceorg,
                                                          input_path     => $in_path,
                                                          output_path    => $out_path,
                                                          url            => $url,
                                                         );

    do some postprocessing (see "do_counts()" and "extract_unseen_ids()" )
    and then do the actual interolog walk on the dataset with the
    following sequence of three methods.

    get orthologues of starting set:

      $RC = InterologMap::get_forward_orthologies(registry    => $registry,
                                                  ensembl_db  => $ensembl_db,
                                                  input_path  => $in_path,
                                                  output_path => $out_path,
                                                  source_org  => $sourceorg,
                                                  dest_org    => $destorg,
                                                 );

    add interactors of orthologues found by "get_forward_orthologies()":

      $RC = InterologMap::get_interactions(input_path  => $in_path,
                                           output_path => $out_path,
                                           url         => $url,
                                           url_global  => $url_global,
                                          );

    add orthologues of interactors found by "get_interactions()":

      $RC = InterologMap::get_backward_orthologies(registry    => $registry,
                                                   ensembl_db  => $ensembl_db,
                                                   input_path  => $in_path,
                                                   output_path => $out_path,
                                                   error_path  => $err_path,
                                                   source_org  => $sourceorg,
                                                  );

    do some postprocessing (see "remove_duplicate_rows()", "do_counts()",
    "extract_unseen_ids()") and then optionally compute a composite score
    for the putative interactions obtained:

      $RC = InterologMap::Scores::compute_scores(input_path       => $in_path,
                                                 score_path       => $score_path,
                                                 output_path      => $out_path,
                                                 term_graph       => $onto_graph,
                                                 M_IT_SCORE       => $M_IT,
                                                 M_DM_SCORE       => $M_DM,
                                                 M_ME_DM_SCORE    => $M_MDM,
                                                 M_ME_TAXA_SCORE  => $M_MTAXA
                                                );

    get some networks and network attributes which you can then visualise
    with cytoscape

      $RC = InterologMap::Networks::do_network(registry       => $registry,
                                               db             => $ensembl_db,
                                               input_path     => $in_path,
                                               output_path    => $out_path,
                                               source_org     => $sourceorg,
                                               orthology_type => $orthtype,
                                              );

      $RC = InterologMap::Networks::do_attributes(registry    => $registry,
                                                  input_path  => $in_path,
                                                  output_path => $out_path,
                                                  source_org  => $sourceorg,
                                                  label_type  => 'external name'
                                                 );

    *The synopsis above only lists the major methods and parameters.*

DESCRIPTION
    A common activity in computational biology is to mine protein-protein
    interactions from publicly available databases to build
    *Protein-Protein Interaction* (PPI) datasets.
    In many instances, however, the number of experimentally obtained
    annotated PPIs is very small and it would be helpful to enrich the
    experimental dataset with high-quality, computationally inferred PPIs.
    Such computationally obtained datasets can extend, support or enrich
    experimental PPI datasets, and are of crucial importance in
    high-throughput gene prioritization studies, i.e. to drive hypotheses
    and restrict the dimensionality of functional discovery problems.

    This Perl module, Interolog::Walk, is aimed at building putative PPI
    datasets on the basis of a number of comparative biology paradigms:
    the module implements a collection of computational biology
    algorithms based on the concept of "orthology projection". If
    interacting proteins A and B in organism X have orthologs A' and B' in
    organism Y, under certain conditions one can assume that the
    interaction will be conserved in organism Y, i.e. the A-B interaction
    can be "projected through the orthologies" to obtain a putative A'-B'
    interaction. The pair of interactions (A-B) and (A'-B') are named
    "Interologs".

    Interolog::Walk collects, analyses and collates gene orthology data
    provided by the Ensembl Consortium as well as PPI data provided by EBI
    Intact. It provides the user with the possibility of rating the
    quality and reliability of the putative interactions collected, by
    means of confidence scores, and optionally outputs network
    representations of the datasets, compatible with the biological
    network representation standard, Cytoscape.

BASIC USAGE
    Rationale behind "Interolog::Walk".

                                   \EBI Intact API/
              .--------------.            |             .-------------.
          (2) | A(e.g. mouse)|<------------------------>| B(mouse)    | (3)
              `--------------'                          `-------------'
                     ^                                         |
       /Ensembl\     |                                         |  \ Ensembl /
      / Compara \    |                                         |   \Compara/
     /    Api    \   |                                         |    \ Api /
                     |                                         |
              .--------------.                          .-------------.
          (1) | A'(e.g. fly) |. . . . . . . . . . . . . | B'(fly)     | (4)
              `--------------'   [SCORED]PUTATIVE PPI   `-------------'
                             (Output of Interolog::Walk)

    In order to carry out an interolog walk we start with a set of gene
    identifiers in one organism of interest (1). We query those ids
    against a number of comparative biology databases to retrieve a list
    of orthologues for the gene id of interest, in one or more species
    (2). In the next step we rely instead on PPI databases to retrieve
    the list of available interactors for the protein ids obtained in
    (2). The output at this stage consists of a list of interactors of
    the orthologues of the initial gene set, plus several fields of
    ancillary data (whose importance will be explained later) (3). In the
    last step of this process we will need to project the interactions in
    (3) - again using orthology data - back to the original species of
    interest. The output of the process is a list of PUTATIVE INTERACTORS
    of the initial gene set, plus several fields of ancillary data.

    "Interolog::Walk" provides three main functions to carry out the basic
    walk, "get_forward_orthologies()", "get_interactions()" and
    "get_backward_orthologies()". These functions must be called strictly
    sequentially in your script, as they process, analyse and attach data
    to the output in a pipeline-like fashion, i.e. processing the output
    of the preceding function.

  get_forward_orthologies
  get_interactions
  get_backward_orthologies

SCORING THE PUTATIVE INTERACTIONS

BUILDING PUTATIVE INTERACTION NETWORKS

BUGS
    Please report any you find

SUPPORT

TODO

AUTHOR
    Giuseppe Gallone
    CPAN ID: GGALLONE
    University of Edinburgh

COPYRIGHT
    The Interolog::Walk module is Copyright (c) 2010 Giuseppe Gallone
    All rights reserved.

    You may distribute under the terms of either the GNU General Public
    License or the Artistic License, as specified in the Perl 5.10.0
    README file.

SEE ALSO

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
From G.Gallone at sms.ed.ac.uk Thu Aug 19 08:42:28 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Thu, 19 Aug 2010 13:42:28 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <20100819002830.GA366@Macintosh-235.local>
References: <4C6BF4BD.5010200@sms.ed.ac.uk> <20100819002830.GA366@Macintosh-235.local>
Message-ID: <4C6D26B4.5090702@sms.ed.ac.uk>

Dear Siddhartha,

glad to hear this might be helpful.

As for the bioperl-network package, thank you for mentioning it. I had a
quick look at its documentation and it looks like a much deeper and more
complex effort than what I have in my package. I've actually been using
the Graph package, on which it seems to be based, a lot and have found it
very helpful. I'm not sure whether the network routines in my module
overlap with it, though: all I do in my package is parse the dataset,
extracting only what is required to build a Cytoscape SIF file and
optionally some Cytoscape NOA attribute files, as required by the
Cytoscape specification at

http://cytoscape.wodaklab.org/wiki/Cytoscape_User_Manual/Network_Formats

bioperl-network, on the other hand, seems to build some kind of internal
representation of the network for further manipulation in Perl, if I
understand it correctly?

Kind regards
Giuseppe

On 19/08/10 01:28, Siddhartha Basu wrote:
> Sounds interesting. I am currently playing around with a perl based webapp for displaying interactome
> using cytoscapeweb. Depending how your design pans out, would be happy to
> use your module as a backend analysis layer. And on a related note, you
> might want to have a look at bioperl-network and if there is any overlap
> might be worth contributing.
>
> -siddhartha
>

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

From xupeng86 at gmail.com Thu Aug 19 04:02:48 2010
From: xupeng86 at gmail.com (xupeng)
Date: Thu, 19 Aug 2010 16:02:48 +0800
Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database?
Message-ID: <201008191602.49068.xupeng86@gmail.com>

I've downloaded the biosql-1.0.1.tar.gz. It works well. But I can't find
the 'load_seqdatabase.pl' when I try to import the Genbank files into the
biosql database.
Can anyone give me a copy of that file?
many thanks !

From sunhanifk at gmail.com Thu Aug 19 10:25:38 2010
From: sunhanifk at gmail.com (han sun)
Date: Thu, 19 Aug 2010 22:25:38 +0800
Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl 5.12.1?
Message-ID:

Hello everyone,

I have used perl for several months, and I now want to feel the power of
bioperl. But it seems that installing it is more difficult than I
thought.

I typed the commands.

install-shell
rep add bioperl http://bioperl.org/DIST
rep add uwinnipeg http://cpan.uwinnipeg.ca/PPMPackages/12xx/
rep add trouchelle http://trouchelle.com/ppm12/
install BioPerl

However, the installation failed:

ppm install failed:
Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
Can't find any package that provides PostScript::TextBlock for Bundle-BioPerl-Core
Can't find any package that provides Ace:: for Bundle-BioPerl-Core
Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
Can't find any package that provides Convert::Binary::C for Bundle-BioPerl-Core
Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
Can't find any package that provides IPC::Run for GraphViz
Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
Can't find any package that provides List-MoreUtils for Moose
Can't find any package that provides List-MoreUtils for Class-MOP

then I tried

install http://www.bribes.org/perl/ppm/GD.ppd

and tried the
installation again, but it still didn't help.

Do you know what's wrong?
Please help me, thanks very much.

From cjfields1 at gmail.com Thu Aug 19 10:33:26 2010
From: cjfields1 at gmail.com (Christopher Fields)
Date: Thu, 19 Aug 2010 09:33:26 -0500
Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl 5.12.1?
In-Reply-To:
References:
Message-ID: <78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com>

Try using ActivePerl 5.10 instead of v5.12. It's very possible the PPM
won't work for v5.12 yet.

chris

On Aug 19, 2010, at 9:25 AM, han sun wrote:

> Hello everyone,
>
> I have used perl for several months, and I now want to feel the power of
> bioperl.
> But it seems that installing it is more difficult than I thought.
>
> I typed the commands.
>
> install-shell
> rep add bioperl http://bioperl.org/DIST
> rep add uwinnipeg http://cpan.uwinnipeg.ca/PPMPackages/12xx/
> rep add trouchelle http://trouchelle.com/ppm12/
> install BioPerl
>
> However, the installation failed:
>
> ppm install failed:
> Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
> Can't find any package that provides PostScript::TextBlock for
> Bundle-BioPerl-Core
> Can't find any package that provides Ace:: for Bundle-BioPerl-Core
> Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
> Can't find any package that provides Convert::Binary::C for
> Bundle-BioPerl-Core
> Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
> Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
> Can't find any package that provides IPC::Run for GraphViz
> Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
> Can't find any package that provides List-MoreUtils for Moose
> Can't find any package that provides List-MoreUtils for Class-MOP
>
> then I tried
>
> install http://www.bribes.org/perl/ppm/GD.ppd
>
> and tried the installation again, but it still didn't help.
>
> Do you know what's wrong?
> Please help me, thanks very much.

From hlapp at drycafe.net Thu Aug 19 10:53:22 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 10:53:22 -0400
Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database?
In-Reply-To: <201008191602.49068.xupeng86@gmail.com>
References: <201008191602.49068.xupeng86@gmail.com>
Message-ID: <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>

The file comes with Bioperl-db, not BioSQL. That is so because it depends
on BioPerl and on Bioperl-db, and so you will need to have both
installed.

-hilmar

On Aug 19, 2010, at 4:02 AM, xupeng wrote:

> I've downloaded the biosql-1.0.1.tar.gz. It works well. But I
> can't find the 'load_seqdatabase.pl' when I try to import the
> Genbank files into the biosql database.
> Can anyone give me a copy of that file?
> many thanks !

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From hlapp at drycafe.net Thu Aug 19 10:58:46 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 10:58:46 -0400
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To:
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
Message-ID: <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>

Yes, unfortunately they do.
The feature for obviating them (namely nested transactions) is there in Pg 8.2+, but Bioperl-db doesn't use them yet ... I have to learn more about Class::DBIx first to decide whether it's better to first implement nested transactions in the home- grown ORM that Bioperl-db in essence is, or whether it's better to reimplement everything in Class::DBIx instead. There are new datatypes in Bioperl, and relations in BioSQL that could hold them, and so I need to decide what's the way forward. -hilmar On Aug 19, 2010, at 6:01 AM, Peter wrote: > On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp > wrote: >> Hi Dan, >> >> the casting isn't an issue anymore, I think. (And even if it were, >> there is >> actually a small script that brings back the casts that were built >> into >> 8.2.) Have you found an example where it still is? >> >> -hilmar > > Hi Hilmar, > > Do the bioperl-db bindings for BioSQL on PostgreSQL still require > those > extra rules in the schema? > http://bugzilla.open-bio.org/show_bug.cgi?id=2839 > > Peter -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From mmuratet at hudsonalpha.org Thu Aug 19 11:00:52 2010 From: mmuratet at hudsonalpha.org (Michael Muratet) Date: Thu, 19 Aug 2010 10:00:52 -0500 Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database? In-Reply-To: <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net> References: <201008191602.49068.xupeng86@gmail.com> <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net> Message-ID: On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote: > The file comes with Bioperl-db, not BioSQL. That is so because it > depends on BioPerl and on Bioperl-db, and so you will need to have > both installed. Is load_seqdatabase.pl still the best method? I vaguely remember a post that said that load_seqdatabase was deprecated, but I can't find it in the archives. 
Mike > > -hilmar > > On Aug 19, 2010, at 4:02 AM, xupeng wrote: > >> I've downloaded the biosql-1.0.1.tar.gz. It works well. But I >> can't find the 'load_seqdatabase.pl' when I try to import the >> Genbank files into biosql databsase. >> Can anyone give me a copy of that file? >> many thanks ! >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Michael Muratet, Ph.D. Senior Scientist HudsonAlpha Institute for Biotechnology mmuratet at hudsonalpha.org (256) 327-0473 (p) (256) 327-0966 (f) Room 4005 601 Genome Way Huntsville, Alabama 35806 From hlapp at drycafe.net Thu Aug 19 11:29:31 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 19 Aug 2010 11:29:31 -0400 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> Message-ID: <5F77404A-086D-4D0C-B3A5-F5119FCF878A@drycafe.net> On Aug 19, 2010, at 11:09 AM, Chris Fields wrote: > DBIx::Class Did I have this in the wrong order :-) More coffee, please. 
-- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From hlapp at drycafe.net Thu Aug 19 11:30:26 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 19 Aug 2010 11:30:26 -0400 Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database? In-Reply-To: References: <201008191602.49068.xupeng86@gmail.com> <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net> Message-ID: It's not deprecated. Unless I'm again mixing up something? -hilmar On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote: > > On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote: > >> The file comes with Bioperl-db, not BioSQL. That is so because it >> depends on BioPerl and on Bioperl-db, and so you will need to have >> both installed. > > Is load_seqdatabase.pl still the best method? I vaguely remember a > post that said that load_seqdatabase was deprecated, but I can't > find it in the archives. > > Mike > >> >> -hilmar >> >> On Aug 19, 2010, at 4:02 AM, xupeng wrote: >> >>> I've downloaded the biosql-1.0.1.tar.gz. It works well. But I >>> can't find the 'load_seqdatabase.pl' when I try to import the >>> Genbank files into biosql databsase. >>> Can anyone give me a copy of that file? >>> many thanks ! >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >> =========================================================== >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Michael Muratet, Ph.D. 
> Senior Scientist > HudsonAlpha Institute for Biotechnology > mmuratet at hudsonalpha.org > (256) 327-0473 (p) > (256) 327-0966 (f) > > Room 4005 > 601 Genome Way > Huntsville, Alabama 35806 > > > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From cjfields at illinois.edu Thu Aug 19 11:09:13 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Aug 2010 10:09:13 -0500 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> Message-ID: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> I think it's worth exploring having a DBIx::Class-based middle-ware approach similar to what Rob Buels has done for Chado. That would be fairly easy to get started using DBIx::Class::Schema::Loader. After that it would require optimization and tweaking, which is potentially more complex than Rob's setup as Chado is very Pg-specific, but maybe Rob can elaborate... chris On Aug 19, 2010, at 9:58 AM, Hilmar Lapp wrote: > Yes, unfortunately they do. The feature for obviating them (namely nested transactions) is there in Pg 8.2+, but Bioperl-db doesn't use them yet ... I have to learn more about Class::DBIx first to decide whether it's better to first implement nested transactions in the home-grown ORM that Bioperl-db in essence is, or whether it's better to reimplement everything in Class::DBIx instead. > > There are new datatypes in Bioperl, and relations in BioSQL that could hold them, and so I need to decide what's the way forward. 
> > -hilmar > > On Aug 19, 2010, at 6:01 AM, Peter wrote: > >> On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp wrote: >>> Hi Dan, >>> >>> the casting isn't an issue anymore, I think. (And even if it were, there is >>> actually a small script that brings back the casts that were built into >>> 8.2.) Have you found an example where it still is? >>> >>> -hilmar >> >> Hi Hilmar, >> >> Do the bioperl-db bindings for BioSQL on PostgreSQL still require those >> extra rules in the schema? >> http://bugzilla.open-bio.org/show_bug.cgi?id=2839 >> >> Peter > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 19 11:37:39 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Aug 2010 10:37:39 -0500 Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database? In-Reply-To: References: <201008191602.49068.xupeng86@gmail.com> <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net> Message-ID: <68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu> I don't recall this either. So, can't blame it on lack of coffee :) chris On Aug 19, 2010, at 10:30 AM, Hilmar Lapp wrote: > It's not deprecated. Unless I'm again mixing up something? > > -hilmar > > On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote: > >> >> On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote: >> >>> The file comes with Bioperl-db, not BioSQL. That is so because it depends on BioPerl and on Bioperl-db, and so you will need to have both installed. >> >> Is load_seqdatabase.pl still the best method? I vaguely remember a post that said that load_seqdatabase was deprecated, but I can't find it in the archives. 
>> >> Mike >> >>> >>> -hilmar >>> >>> On Aug 19, 2010, at 4:02 AM, xupeng wrote: >>> >>>> I've downloaded the biosql-1.0.1.tar.gz. It works well. But I >>>> can't find the 'load_seqdatabase.pl' when I try to import the >>>> Genbank files into biosql databsase. >>>> Can anyone give me a copy of that file? >>>> many thanks ! >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >>> =========================================================== >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Michael Muratet, Ph.D. >> Senior Scientist >> HudsonAlpha Institute for Biotechnology >> mmuratet at hudsonalpha.org >> (256) 327-0473 (p) >> (256) 327-0966 (f) >> >> Room 4005 >> 601 Genome Way >> Huntsville, Alabama 35806 >> >> >> >> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From mmuratet at hudsonalpha.org Thu Aug 19 11:40:02 2010 From: mmuratet at hudsonalpha.org (Michael Muratet) Date: Thu, 19 Aug 2010 10:40:02 -0500 Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database? 
In-Reply-To: <68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu> References: <201008191602.49068.xupeng86@gmail.com> <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net> <68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu> Message-ID: On Aug 19, 2010, at 10:37 AM, Chris Fields wrote: > I don't recall this either. So, can't blame it on lack of coffee :) Thanks. I'll keep using it! Mike > > chris > > On Aug 19, 2010, at 10:30 AM, Hilmar Lapp wrote: > >> It's not deprecated. Unless I'm again mixing up something? >> >> -hilmar >> >> On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote: >> >>> >>> On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote: >>> >>>> The file comes with Bioperl-db, not BioSQL. That is so because it >>>> depends on BioPerl and on Bioperl-db, and so you will need to >>>> have both installed. >>> >>> Is load_seqdatabase.pl still the best method? I vaguely remember a >>> post that said that load_seqdatabase was deprecated, but I can't >>> find it in the archives. >>> >>> Mike >>> >>>> >>>> -hilmar >>>> >>>> On Aug 19, 2010, at 4:02 AM, xupeng wrote: >>>> >>>>> I've downloaded the biosql-1.0.1.tar.gz. It works well. But I >>>>> can't find the 'load_seqdatabase.pl' when I try to import the >>>>> Genbank files into biosql databsase. >>>>> Can anyone give me a copy of that file? >>>>> many thanks ! >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> -- >>>> =========================================================== >>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >>>> =========================================================== >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> Michael Muratet, Ph.D. 
>>> Senior Scientist >>> HudsonAlpha Institute for Biotechnology >>> mmuratet at hudsonalpha.org >>> (256) 327-0473 (p) >>> (256) 327-0966 (f) >>> >>> Room 4005 >>> 601 Genome Way >>> Huntsville, Alabama 35806 >>> >>> >>> >>> >>> >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >> =========================================================== >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > Michael Muratet, Ph.D. Senior Scientist HudsonAlpha Institute for Biotechnology mmuratet at hudsonalpha.org (256) 327-0473 (p) (256) 327-0966 (f) Room 4005 601 Genome Way Huntsville, Alabama 35806 From cjfields at illinois.edu Thu Aug 19 11:55:54 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Aug 2010 10:55:54 -0500 Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping In-Reply-To: References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se> Message-ID: <5611499B-FA63-4A52-8279-99B554418374@illinois.edu> On Aug 17, 2010, at 8:52 AM, Dave Messina wrote: >> It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison > > Yep, agreed. > > And such a flag should be named for the non-default behavior, then, like: -ignore_IDs_for_overlaps > > Dave Probably would just be -ignore_ids as this behavior would have to be consistent across the various Bio::RangeI methods (overlaps, contains, etc). The params are case-insensitive IIRC, so the _IDs would just be lc(). RangeI doesn't define a seq_id(), though, so we either use can() in RangeI (which is dirtier IMO) or define this in the appropriate class, probably LocationI or SeqFeatureI. 
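A minimal sketch of the flag semantics discussed above, assuming a plain hash with hypothetical seq_id/start/end keys and a hypothetical ignore_ids option. It is not the actual Bio::RangeI code and says nothing about where seq_id() ends up being defined; it only illustrates "check IDs by default, skip the check on request":

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a feature: a hash ref with seq_id, start, end.
# This is NOT Bio::RangeI; it only illustrates the proposed behaviour.
sub overlaps {
    my ($x, $y, %opt) = @_;

    # Default (genomic) behaviour: ranges on different sequences never overlap.
    # With ignore_ids => 1 the seq_id comparison is skipped entirely.
    # Assumption: if either seq_id is undefined, only coordinates are compared.
    unless ($opt{ignore_ids}) {
        my ($id_x, $id_y) = ($x->{seq_id}, $y->{seq_id});
        return 0 if defined $id_x && defined $id_y && $id_x ne $id_y;
    }

    # Closed-interval overlap test on 1-based inclusive coordinates.
    return ($x->{start} <= $y->{end} && $y->{start} <= $x->{end}) ? 1 : 0;
}

my $f1 = { seq_id => 'chr1', start => 100, end => 200 };
my $f2 = { seq_id => 'chr2', start => 150, end => 250 };
my $f3 = { seq_id => 'chr1', start => 150, end => 250 };

print overlaps($f1, $f2), "\n";                    # different seq_id: 0
print overlaps($f1, $f2, ignore_ids => 1), "\n";   # coordinates only: 1
print overlaps($f1, $f3), "\n";                    # same seq_id: 1
```

Making the ID check the default and naming the option after the non-default behaviour follows Scott's and Dave's suggestions earlier in the thread.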
chris From cjfields at illinois.edu Thu Aug 19 11:56:11 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Aug 2010 10:56:11 -0500 Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping In-Reply-To: References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se> Message-ID: <7CF700A0-C7A0-4BD2-9757-50B693B3B614@illinois.edu> Makes sense. chris On Aug 17, 2010, at 7:45 AM, Scott Cain wrote: > Hi Dave and Chris, > > It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison and if somebody is doing the protein space comparison and not getting the the expected results, they'll probably read the docs to find out why. > > Scott > > -- > Scott Cain, Ph. D. > scott at scottcain dot net > Ontario Institute for Cancer Research > http://gmod.org/ > 216 392 3087 > > Snet from my iPhone. > > On Aug 17, 2010, at 5:06 AM, Dave Messina wrote: > >>> Good point; it's probably the context the methods are used that matters. So, maybe just a document clarification? >> >> That's always good, but it really doesn't solve the issue you're describing. >> >> I mean, who would expect to get overlaps for features on different chromosomes? >> >> To me, that's a clear violation of reasonable user expectations. You shouldn't have to read the docs about something like that. >> >> So what's the solution for these duelling use cases? I haven't thought about it much, but a first approximation might be to add a -genomic boolean flag that, when true, would do the right thing and check the ID when doing overlaps or other positional comparisons. >> >> (Maybe -genomic is too obscure. Maybe it should be -same_id_for_overlaps or something like that.) >> >> And maybe having to know to set a flag is effectively the same thing as having to read the docs to understand SeqFeature's overlap behavior. 
>> >> What do the rest of you out there think? >> >> >> Dave >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Thu Aug 19 12:54:23 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 19 Aug 2010 18:54:23 +0200 Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping References: <83299B71-0F73-440D-A9C5-DC1DA2AFF605@davemessina.com> Message-ID: <1EFB951F-AEE1-4B2A-9E29-114E40B25D21@sbc.su.se> [Ccing list for real this time] On Aug 19, 2010, at 17:55, Chris Fields wrote: > Probably would just be -ignore_ids You're right, that's the way to go. > define this in the appropriate class, probably LocationI or Yep, that's cleaner. Thanks! Dave From cjfields1 at gmail.com Thu Aug 19 13:20:32 2010 From: cjfields1 at gmail.com (Christopher Fields) Date: Thu, 19 Aug 2010 12:20:32 -0500 Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl 5.12.1? In-Reply-To: References: <78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com> Message-ID: <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com> cc'ing list. Looks like the BioPerl PPM is possibly broken for perl 5.12. Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling... chris On Aug 19, 2010, at 11:29 AM, han sun wrote: > v5.10 works,thanks. > > 2010/8/19 Christopher Fields > Try using ActivePerl 5.10 instead of v5.12. It's very possible the PPM won't work for v5.12 yet. > > chris > > On Aug 19, 2010, at 9:25 AM, han sun wrote: > > > Hello everyone, > > > > I have used perl for several months,and I now want to feel the power of > > bioperl. > > But it seems that the installing is more difficult than I thought. 
> > > > I typed the commands. > > > > > > > > install-shell > > > > > > rep add bioperl http://bioperl.org/DIST > > > > > > rep add uwinnipeg > > http://cpan.uwinnipeg.ca/PPMPackages/12xx/ > > > > > > rep add trouchelle http://trouchelle.com/ppm12/ > > > > install BioPerl > > > > However,the installing failed, > > > > ppm install failed: > > Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core > > Can't find any package that provides PostScript::TextBlock for > > Bundle-BioPerl-Core > > Can't find any package that provides Ace:: for Bundle-BioPerl-Core > > Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core > > Can't find any package that provides Convert::Binary::C for > > Bundle-BioPerl-Core > > Can't find any package that provides XML::Twig for Bundle-BioPerl-Core > > Can't find any package that provides DB_File:: for Bundle-BioPerl-Core > > Can't find any package that provides IPC::Run for GraphViz > > Can't find any package that provides XML-XPathEngine for XML-DOM-XPath > > Can't find any package that provides List-MoreUtils for Moose > > Can't find any package that provides List-MoreUtils for Class-MOP > > > > > > then I tried > > > > install http://www.bribes.org/perl/ppm/GD.ppd > > > > and tried the installation again,but it still didn't help. 
> > > > * > > * > > * > > * > > * > > * > > > > > > *Do you konw what's wrong with the problem?* > > * > > * > > * > > * > > *Please help me,thanks very much.* > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rmb32 at cornell.edu Thu Aug 19 13:09:45 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 19 Aug 2010 10:09:45 -0700 Subject: [Bioperl-l] reminder: Aug 25 deadline for GMOD Hackathon application Message-ID: <4C6D6559.3080809@cornell.edu> Hi all, This is your one-week reminder: the deadline for open applications to the GMOD Evo hackathon is Wednesday, August 25th. Rob ======================================== We are seeking participants for the GMOD Tools for Evolutionary Biology Hackathon, held November 8-12, 2010 at the US National Evolutionary Synthesis Center (NESCent) in Durham, NC. This hackathon targets three critical gaps in the capabilities of the GMOD toolbox that currently limit its utility for evolutionary research: 1. Visualization of comparative genomics data 2. Visualization of phylogenetic data and trees 3. Support for population diversity and phenotype data If you are interested in these areas and have relevant expertise, you are strongly encouraged to apply. Relevant areas of expertise include more than just software development: if you are a GMOD power user, visualization guru, domain expert (comparative, phylogenetics, population, ...), or documentation wizard, then your skills are needed! How To Apply: Fill out the online application form at http://bit.ly/gmodevohack. Applications are due August 25. About GMOD: GMOD is an intercompatible suite of open-source software components for storing, managing, analyzing, and visualizing genome-scale data. 
GMOD includes many widely-used software components: GBrowse and JBrowse, both genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a generic and modular database schema; CMap, a comparative map viewer; as well as many other components including Apollo, MAKER, BioMart, InterMine, and Galaxy. We hope to extend the functionality of existing GMOD components, and integrate new components as well. About Hackathons: A hackathon is an intense event at which a group of programmers with different backgrounds and skills collaborate hands-on and face-to-face to develop working code that is of utility to the community as a whole. The mix of people will include domain experts and computer-savvy end-users. More details about the event, its motivation, organization, procedures, and attendees, as well as URLs to the hackathon and related websites are included below. Sincerely, The GMOD EvoHack Organizing Committee (and project affiliations as relevant): Nicole Washington, Chair (LBNL, modENCODE, Phenote) Robert Buels (SGN, Chado NatDiv) Scott Cain (OICR, GMOD) Dave Clements (NESCent, GMOD) Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv) Sheldon McKay (University of Arizona, iPlant, GBrowse_syn) ----------------------------- About the GMOD Evo Hackathon Overview We are organizing a hackathon to fill critical gaps in the capabilities of the Generic Model Organism Database (GMOD) toolbox that currently limit its utility for evolutionary research. Specifically, we will focus on tools for 1) viewing comparative genomics data; 2) visualizing phylogenomic data; and 3) supporting population diversity data and phenotype annotation. The event will be hosted at NESCent and bring together a group of about 20+ software developers, end-user representatives, and documentation experts who would otherwise not meet. 
The participants will include key developers of GMOD components that currently lack features critical for emerging evolutionary biology research, developers of informatics tools in evolutionary research that lack GMOD integration, and informatics-savvy biologists who can represent end-user requirements. The event will provide a unique opportunity to infuse the GMOD developer community with a heightened awareness of unmet needs in evolutionary biology that GMOD components have the potential to fill, and for tool developers in evolutionary biology to better understand how best to extend or integrate with already existing GMOD components. Before the Event Discussion of ideas and sometimes even design actually starts well before the hackathon, on mailing lists, wiki pages, and conference calls set up among accepted attendees. This advance work lays the foundation for participants to be productive from the very first day. This also means that participants should be willing to contribute some time in advance of the hackathon itself to participate in this preparatory discussion. During the Event Typically, hackathon participants use the morning of the first day of the event to organize themselves into working groups of between 3 and 6 people, each with a focused implementation objective. Ideas and objectives are discussed, and attendees coalesce around the projects in which they have the most experience or interest. Deliverables / Event Results The meeting's attendance, working groups, and outcomes will be fully logged and documented on the GMOD wiki (http://gmod.org). Each working group during the event will typically have its own wiki page, linked from the main EvoHack page, where it documents its minutes and design notes, and provides links to the code and documentation it produces. 
Also, since GMOD and NESCent are both committed to open source principles, all code and documentation produced by participants during the hackathon must be published under an OSI-approved open source license. As contributions to existing GMOD tools, all hackathon products will most likely satisfy this requirement automatically. NESCent This event is sponsored by the US National Evolutionary Synthesis Center (NESCent, http://www.nescent.org) through its Informatics Whitepapers program (http://www.nescent.org/informatics/whitepapers.php). NESCent promotes the synthesis of information, concepts and knowledge to address significant, emerging, or novel questions in evolutionary science and its applications. NESCent achieves this by supporting research and education across disciplinary, institutional, geographic, and demographic boundaries (see http://www.nescent.org/science/proposals.php). Links Main GMOD EvoHack page, and full proposal: http://gmod.org/wiki/GMOD_Evo_Hackathon NESCent: http://www.nescent.org/ GMOD: http://gmod.org Similar past NESCent events, see: http://hackathon.nescent.org/ GMOD hackathon application: http://bit.ly/gmodevohack -- http://gmod.org/wiki/GMOD_News http://gmod.org/wiki/GMOD_Europe_2010 http://gmod.org/wiki/Help_Desk_Feedback From David.Messina at sbc.su.se Thu Aug 19 14:55:50 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 19 Aug 2010 20:55:50 +0200 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq question In-Reply-To: <4C6D7123.9080908@bcm.tmc.edu> References: <4C6C3259.4060304@bcm.tmc.edu> <4C6D7123.9080908@bcm.tmc.edu> Message-ID: <4E977318-05AC-4D8E-9A39-8C07A2419198@sbc.su.se> Glad I could help, Caleb. Dave On Aug 19, 2010, at 20:00, Caleb Davis wrote: > Hi Dave, > > Thank you so much for your detailed response! Fixing the reward parameter replicated the online result for me. All of the other factors you brought up will help me track down any future problems. Thanks again. 
> > --Caleb > From rmb32 at cornell.edu Thu Aug 19 18:19:11 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 19 Aug 2010 15:19:11 -0700 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> Message-ID: <4C6DADDF.1000103@cornell.edu> Chris Fields wrote: > I think it's worth exploring having a DBIx::Class-based middle-ware approach similar to what Rob Buels has done for Chado. That would be fairly easy to get started using DBIx::Class::Schema::Loader. > > After that it would require optimization and tweaking, which is potentially more complex than Rob's setup as Chado is very Pg-specific, but maybe Rob can elaborate... Elaborating on how Bio::Chado::Schema is developed: The vast majority of the code and POD in BCS is autogenerated by DBIx::Class::Schema::Loader. DBICSL gives you a baseline set of DBIx::Class classes that covers all the tables, views, columns, unique constraints, and foreign key relationships. Beyond that, you have to add on yourself. In BCS, we have mostly done things like: * make better-named aliases for some of the autogenerated relationships (though DBICSL does a surprisingly good job of naming relationships automatically most of the time) * add a tiny bit of bioperl compatibility (this needs a lot more work by somebody, volunteers needed!) * add convenience methods for using some of the Chado property tables * use DBIx::Class::Tree::NestedSet to add some powerful ways of traversing phylogenetic tree relationships Regarding DB backend specificity, BCS isn't Pg-specific at all, because DBIx::Class itself goes to great lengths to be compatible (and performant!) with just about every relational database out there. 
In fact, the BCS test suite deploys a Chado schema into a temporary SQLite database using DBIC::Schema's deploy() method, and runs all of its tests on that. Very handy. Chado's Pg-specific server-side functions can of course be called through BCS if they are present, but it's perfectly possible to use Chado without any of the server-side functions, and mostly the way I use it. Rob From David.Messina at sbc.su.se Fri Aug 20 05:19:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 20 Aug 2010 11:19:14 +0200 Subject: [Bioperl-l] Git for the lazy Message-ID: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se> Hi everyone, If you're like me and still getting up to speed with Git, you might find this helpful: http://www.spheredev.org/wiki/Git_for_the_lazy Dave From bgs500 at york.ac.uk Fri Aug 20 09:07:50 2010 From: bgs500 at york.ac.uk (Ben Saville) Date: Fri, 20 Aug 2010 14:07:50 +0100 Subject: [Bioperl-l] Problem Parsing BLAST output Message-ID: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk> Hi Everyone, I'm very much new to the world of sequence data analysis (and this mailing list!), and have reached a roadblock. I have BLASTed some contigs against a series of databases that I created. From this I would like to parse through the data and separate it before extracting the information of interest at a later point. I would like to separate the data by query ID. 
I found the following Bioperl script;

#!/usr/bin/perl

use Bio::Search::Result::BlastResult;
use Bio::SearchIO;

my $report = Bio::SearchIO->new( -file=>'All_BCM_results.bls', -format => blast);
my $result = $report->next_result;
my %hits_by_query;
while (my $hit = $result->next_hit) {
  push @{$hits_by_query{$hit->name}}, $hit;
}

foreach my $qid ( keys %hits_by_query ) {
  my $result = Bio::Search::Result::BlastResult->new();
  $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
  my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' );
  $blio->write_result($result);
}

running this script resulted in the following error;

BlastResult::new(): Not adding iterations.

------------- EXCEPTION: Bio::Root::NoSuchThing -------------
MSG: No such iteration number: 0. Valid range=1-0
VALUE: The number zero (0)
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Search::Result::BlastResult::iteration /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:328
STACK: Bio::Search::Result::BlastResult::add_hit /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:258
STACK: /Users/bsaville/Desktop/Parsing_BLAST_by_query.pl:15
-------------------------------------------------------------

So I added

my $result = Bio::Search::Result::BlastResult->new(1);

The 1 to the line shown above, as it told me this was within the valid range. This produced the following error;

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Must define arrayref of Iterations when initializing a Bio::Search::Result::BlastResult
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Search::Result::BlastResult::new /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:128
STACK: /Users/bsaville/Desktop/Parsing_BLAST_by_query.pl:14
-----------------------------------------------------------

I know that it is my inexperience that is causing this problem, but I really can't figure this out.


Regards

Ben Saville

From David.Messina at sbc.su.se  Fri Aug 20 09:48:28 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 20 Aug 2010 15:48:28 +0200
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
Message-ID: <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>

Hi Ben,

I would not use the script you posted; I don't think it does what you want.

If you haven't already, you should take a look at the beginners' HOWTO

http://www.bioperl.org/wiki/HOWTO:Beginners

the SearchIO HOWTO

http://www.bioperl.org/wiki/HOWTO:SearchIO

and the example scripts included with BioPerl:

http://www.bioperl.org/wiki/Scripts

Incidentally, it's a lot of fiddly data processing to parse blast reports for many contigs against multiple databases and then go back and collate the results by query.

I'm not sure exactly what you want to do once you've separated by query. If you provide some more information, we could suggest ways to best get you where you want to go.

I will mention, though, that BLAST has the ability to search multiple separate databases in one go and collate the results for you. So that's something to consider.

Dave

From bernd.web at gmail.com  Fri Aug 20 11:17:05 2010
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 20 Aug 2010 17:17:05 +0200
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
In-Reply-To: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
References: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
Message-ID:

Hi Yin,

I am not quite sure if the following is also related to your gapped length issue but I found I had to adapt the calculation of ungapped_len in Bio::LocatableSeq. If my slices did not contain any letters or a new gap char I used, SimpleAlign could not find the sequences when outputting the alignment.
This was due to a difference in length calculation:

SimpleAlign: uses \W: $slice_seq =~ s/\W//g;
Bio::LocatableSeq::ungapped_len uses "$string =~ s/[\.\-]+//g;"

I had to include '~' (for my local sequences) in the ungapped_len; otherwise i would run into the end issues with SimpleAlign.

Kind regards,
Bernd

On Fri, Aug 13, 2010 at 3:36 PM, Jun Yin wrote:
> Hi, all,
>
> I am the google summer of code student working on Bio::Align subsystem
> refactoring. The code (Bio::SimpleAlign) I re-implemented now has passed
> nearly all the test, except a few tests on seq/start-end testing. But here
> comes a problem. This may be an old issue, that the Bio::LocatableSeq end
> assignment and checking are inconsistent.
>
> The current end checking method is based on:
>
> $end=$seq->_ungapped_len+$seq->start-1
>
> However, this checking may not fit the real world case.
>
> The inconsistency usually happens when a few columns of the sequence are
> removed.
>
> For example:
>
> my $a = Bio::LocatableSeq->new(
>     -id     => 'a',
>     -strand => 1,
>     -seq    => '-tcgatc-atcgatcg',
>     -start  => 30,
>     -end    => 43
> );
>
> If we remove the 1st, 8th and the last columns
>
> $a->seq() will be 'tcgatcatcgatc'
>
> $a->_ungapped_len==12
>
> Actually, in the real world, the first residue will still be 30 (the old
> $seq->start), and the last residue is the residue before the 43 (the old
> $seq->end), thus 42.
>
> But if you call a validation, the calculation is
> $a->_ungapped_len+$a->start-1=12+30-1=41
>
> So the reassignment of the $seq->end will not pass the validation.
>
> So unless you save the information to a new sequence object, the original
> position information will be lost anyway. But in some cases, we have to
> change the sequence in its original sequence object ..
>
> What is your suggestion on this issue?
>
> A. pass the test and lose the information      #convenient in coding but the
> start-end annotation is not right any more
>
> B. keep the information and forget the test    #the object will still
> remember where the last residue was in the original sequence. But is it
> really meaningful at all? Because all the other residues may come from
> nowhere
>
> C. Neither of above #any other suggestions?
>
> Cheers,
>
> Jun Yin
>
> Ph.D. student in U.C.D.
>
> Bioinformatics Laboratory
>
> Conway Institute
>
> University College Dublin
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From sidd.basu at gmail.com  Fri Aug 20 11:59:59 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Fri, 20 Aug 2010 10:59:59 -0500
Subject: [Bioperl-l] Re: bioperl-db and postgres8.3 - status query
In-Reply-To: <4C6DADDF.1000103@cornell.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> <4C6DADDF.1000103@cornell.edu>
Message-ID: <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>

Hi,

On Thu, 19 Aug 2010, Robert Buels wrote:

> Chris Fields wrote:
> > I think it's worth exploring having a DBIx::Class-based middle-ware
> > approach similar to what Rob Buels has done for Chado. That would be
> > fairly easy to get started using DBIx::Class::Schema::Loader.
> > After that it would require optimization and tweaking, which is
> > potentially more complex than Rob's setup as Chado is very Pg-specific,
> > but maybe Rob can elaborate...
>
> Elaborating on how Bio::Chado::Schema is developed:
>
> The vast majority of the code and POD in BCS is autogenerated by
> DBIx::Class::Schema::Loader.
DBICSL gives you a baseline set of > DBIx::Class classes that covers all the tables, views, columns, unique > constraints, and foreign key relationships. > > Beyond that, you have to add on yourself. In BCS, we have mostly done > things like: > > * make better-named aliases for some of the autogenerated > relationships (though DBICSL does a surprisingly good job of naming > relationships automatically most of the time) > * add a tiny bit of bioperl compatibility (this needs a lot more work > by somebody, volunteers needed!) > * add convenience methods for using some of the Chado property tables > * use DBIx::Class::Tree::NestedSet to add some powerful ways of > traversing phylogenetic tree relationships > > Regarding DB backend specificity, BCS isn't Pg-specific at all, because > DBIx::Class itself goes to great lengths to be compatible (and performant!) > with just about every relational database out there. I would vouch for that at least as far as chado in oracle is concerned. So, far BCS works out flawlessly with our oracle chado instance at dictybase. Quite a chunk of BCS based code is also active in couple of our Mojo based webapps. The part which i still couldn't use directly is the 'synonym' table as it clashes with oracle specific reserved keywords. However, overall it seems to quite cross-RDMS compatible and highly recommended. -siddhartha >In fact, the BCS test > suite deploys a Chado schema into a temporary SQLite database using > DBIC::Schema's deploy() method, and runs all of its tests on that. Very > handy. > > Chado's Pg-specific server-side functions can of course be called through > BCS if they are present, but it's perfectly possible to use Chado without > any of the server-side functions, and mostly the way I use it. 
>
> Rob
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jun.yin at ucd.ie  Fri Aug 20 12:17:33 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 20 Aug 2010 17:17:33 +0100
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
In-Reply-To:
References: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
Message-ID: <000b01cb4083$31f98280$95ec8780$%yin@ucd.ie>

Hi, Bernd,

Thx for your input. Yes, this is one of the old bugs in Bio::SimpleAlign.

$aln->slice simply uses

$slice_seq =~ s/\W//g

to calculate the ungapped length. But $seq->_ungapped_len uses

$string =~ s{[$GAP_SYMBOLS$FRAMESHIFT_SYMBOLS]+}{}g;

which is '\-\.=~\\\/ ', to calculate the ungapped length.

To solve this problem, first, I now use

$nonres = join("",$self->gap_char, $self->match_char,$self->missing_char);

which is '-\.&', to remove the non-residue chars in the alignment sequence (though using '=', '~', '\' or '/' will still cause problems).

Secondly, I have merged slice, remove_columns and remove_gaps so that they use the same internal function. Thus it is easier to debug. These changes will be merged into the main BioPerl branch after the next version.

But anyway, the conflict is still there, because the non-residue chars are defined as:

In Bio::SimpleAlign,
$aln->gap_char, $aln->missing_char, $aln->match_char

In Bio::LocatableSeq
$GAP_SYMBOLS = '\-\.=~';
$FRAMESHIFT_SYMBOLS = '\\\/';

So try to use '-' or '.' for your gap char at the moment, otherwise you may encounter end warnings in calculation.

And, if you want to keep gap-only sequences, you can call the method as:

$aln2 = $aln->slice(20,30,1)

The last parameter is to keep gap-only sequences.

Cheers,

Jun Yin
Ph.D. student in U.C.D.
Bioinformatics Laboratory
Conway Institute
University College Dublin

From cjfields at illinois.edu  Fri Aug 20 12:23:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 20 Aug 2010 11:23:07 -0500
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> <4C6DADDF.1000103@cornell.edu> <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
Message-ID: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>

On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote:
> Hi,
>
> On Thu, 19 Aug 2010, Robert Buels wrote:
>
> > Chris Fields wrote:
> > > I think it's worth exploring having a DBIx::Class-based middle-ware
> > > approach similar to what Rob Buels has done for Chado. That would be
> > > fairly easy to get started using DBIx::Class::Schema::Loader.
> > > After that it would require optimization and tweaking, which is
> > > potentially more complex than Rob's setup as Chado is very Pg-specific,
> > > but maybe Rob can elaborate...
> >
> > Elaborating on how Bio::Chado::Schema is developed:
> >
> > The vast majority of the code and POD in BCS is autogenerated by
> > DBIx::Class::Schema::Loader. DBICSL gives you a baseline set of
> > DBIx::Class classes that covers all the tables, views, columns, unique
> > constraints, and foreign key relationships.
> >
> > Beyond that, you have to add on yourself. In BCS, we have mostly done
> > things like:
> >
> > * make better-named aliases for some of the autogenerated
> > relationships (though DBICSL does a surprisingly good job of naming
> > relationships automatically most of the time)
> > * add a tiny bit of bioperl compatibility (this needs a lot more work
> > by somebody, volunteers needed!)
> > * add convenience methods for using some of the Chado property tables > > * use DBIx::Class::Tree::NestedSet to add some powerful ways of > > traversing phylogenetic tree relationships > > > > Regarding DB backend specificity, BCS isn't Pg-specific at all, because > > DBIx::Class itself goes to great lengths to be compatible (and performant!) > > with just about every relational database out there. > I would vouch for that at least as far as chado in oracle is concerned. > So, far BCS works out flawlessly with our oracle chado instance at > dictybase. Quite a chunk of BCS based code is also active in couple of > our Mojo based webapps. The part which i still couldn't use directly is > the 'synonym' table as it clashes with oracle specific reserved keywords. > However, overall it seems to quite cross-RDMS compatible and highly > recommended. > > -siddhartha Just to point out, I didn't say BCS is Pg-specific, but that Chado is (that was the DBMS it was designed for). Maybe that should be amended to 'was' now :) I recall seeing a page on this somewhere on the GMOD website along the lines of "MySQL has problems so we chose Pg", and that Chado support would focus on Pg. I'm guessing that's no longer the case? Or is only the server-side stuff Pg-specific. > >In fact, the BCS test > > suite deploys a Chado schema into a temporary SQLite database using > > DBIC::Schema's deploy() method, and runs all of its tests on that. Very > > handy. > > > > Chado's Pg-specific server-side functions can of course be called through > > BCS if they are present, but it's perfectly possible to use Chado without > > any of the server-side functions, and mostly the way I use it. > > > > Rob I think this opens up the possibility of starting a DBIx::Class-based middleware solution. Hilmar, did you want to take that on? 
chris From sidd.basu at gmail.com Fri Aug 20 13:39:44 2010 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Fri, 20 Aug 2010 12:39:44 -0500 Subject: [Bioperl-l] Re: bioperl-db and postgres8.3 - status query In-Reply-To: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> <4C6DADDF.1000103@cornell.edu> <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu> <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu> Message-ID: <20100820173942.GC400@vpn-165-124-164-118.vpn.northwestern.edu> On Fri, 20 Aug 2010, Chris Fields wrote: > On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote: > > Hi, > > > > On Thu, 19 Aug 2010, Robert Buels wrote: > > > > > Chris Fields wrote: > > > > I think it's worth exploring having a DBIx::Class-based middle-ware > > > > approach similar to what Rob Buels has done for Chado. That would be > > > > fairly easy to get started using DBIx::Class::Schema::Loader. > > > > After that it would require optimization and tweaking, which is > > > > potentially more complex than Rob's setup as Chado is very Pg-specific, > > > > but maybe Rob can elaborate... > > > > > > Elaborating on how Bio::Chado::Schema is developed: > > > > > > The vast majority of the code and POD in BCS is autogenerated by > > > DBIx::Class::Schema::Loader. DBICSL gives you a baseline set of > > > DBIx::Class classes that covers all the tables, views, columns, unique > > > constraints, and foreign key relationships. > > > > > > Beyond that, you have to add on yourself. 
In BCS, we have mostly done > > > things like: > > > > > > * make better-named aliases for some of the autogenerated > > > relationships (though DBICSL does a surprisingly good job of naming > > > relationships automatically most of the time) > > > * add a tiny bit of bioperl compatibility (this needs a lot more work > > > by somebody, volunteers needed!) > > > * add convenience methods for using some of the Chado property tables > > > * use DBIx::Class::Tree::NestedSet to add some powerful ways of > > > traversing phylogenetic tree relationships > > > > > > Regarding DB backend specificity, BCS isn't Pg-specific at all, because > > > DBIx::Class itself goes to great lengths to be compatible (and performant!) > > > with just about every relational database out there. > > I would vouch for that at least as far as chado in oracle is concerned. > > So, far BCS works out flawlessly with our oracle chado instance at > > dictybase. Quite a chunk of BCS based code is also active in couple of > > our Mojo based webapps. The part which i still couldn't use directly is > > the 'synonym' table as it clashes with oracle specific reserved keywords. > > However, overall it seems to quite cross-RDMS compatible and highly > > recommended. > > > > -siddhartha > > Just to point out, I didn't say BCS is Pg-specific, but that Chado is > (that was the DBMS it was designed for). Maybe that should be amended > to 'was' now :) > > I recall seeing a page on this somewhere on the GMOD website along the > lines of "MySQL has problems so we chose Pg", and that Chado support > would focus on Pg. As far as i understand GMOD stongly recommends and the popular backend for chado is Pg. However, my point was if anybody wants to use or tryout chado schema on a different backend or have an existing setup, tools like DBIx::Class or particularly BCS makes it quite easier to do so. The code developed on top also become quite robust and portable. -siddhartha >I'm guessing that's no longer the case? 
Or is only
> the server-side stuff Pg-specific.
>
> > >In fact, the BCS test
> > > suite deploys a Chado schema into a temporary SQLite database using
> > > DBIC::Schema's deploy() method, and runs all of its tests on that. Very
> > > handy.
> > >
> > > Chado's Pg-specific server-side functions can of course be called through
> > > BCS if they are present, but it's perfectly possible to use Chado without
> > > any of the server-side functions, and mostly the way I use it.
> > >
> > > Rob
>
> I think this opens up the possibility of starting a DBIx::Class-based
> middleware solution. Hilmar, did you want to take that on?
>
> chris
>
>

From buiduyminh at gmail.com  Fri Aug 20 17:29:00 2010
From: buiduyminh at gmail.com (Minh Bui)
Date: Fri, 20 Aug 2010 17:29:00 -0400
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
Message-ID:

Hi,

I am trying to load my GFF file into a MySQL database, but I got this error when I ran bp_seqfeature_load.pl (BioPerl 1.6.1 on Mac):

[BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
contains: /sw/lib/perl5 /sw/lib/perl5/darwin
/System/Library/Perl/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/5.8.6
/Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
/Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
/Network/Library/Perl/5.8.6 /Network/Library/Perl
/System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
line 3.
Perhaps the DBD::mysql perl module hasn't been fully installed,
or perhaps the capitalisation of 'mysql' isn't right.
Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
 at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212

I am using Mac OS X version 10.4.10 and MAMP. Isn't "/Library/Perl/5.8.6" already in @INC? What am I missing?
I have been googling this error for a few hours. I also installed BioPerl and reinstalled DBD::mysql using CPAN. It still doesn't work.

Here is my $PERL5LIB: /sw/lib/perl5:/sw/lib/perl5/darwin/

I really need help on this.
Thank you,

From awitney at sgul.ac.uk  Sat Aug 21 06:39:10 2010
From: awitney at sgul.ac.uk (Adam Witney)
Date: Sat, 21 Aug 2010 11:39:10 +0100
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
In-Reply-To:
References:
Message-ID: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>

On 20 Aug 2010, at 22:29, Minh Bui wrote:

> Hi,,
> I am trying to load my GFF file to mysql database but I got this error
> when I ran the bp_seqfeature_load.pl ( bioperl 1.6.1 on MAC)
>
> [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
> install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
> contains: /sw/lib/perl5 /sw/lib/perl5/darwin
> /System/Library/Perl/5.8.6/darwin-thread-multi-2level
> /System/Library/Perl/5.8.6
> /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
> /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
> /Network/Library/Perl/5.8.6 /Network/Library/Perl
> /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
> /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
> line 3.
> Perhaps the DBD::mysql perl module hasn't been fully installed,
> or perhaps the capitalisation of 'mysql' isn't right.
> Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
> at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212
>
> I am using MAC OSX version 10.4.10 and MAMP? Isnt it the
> "/Library/Perl/5.8.6" already in @INC? What am I missing?
> I have been googling this error for a few hours. I also install
> Bioperl and reinstall DBD::mysql using CPAN. It still doesnt work..
>
> Here is my $PERL5LIB: /sw/lib/perl5:/sw/lib/perl5/darwin/

Where did DBD::mysql get installed? Can you verify that DBD/mysql.pm is actually in one of those directories listed above?
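
One way to answer that question is to ask the same perl that runs bp_seqfeature_load.pl where, if anywhere, it can see DBD/mysql.pm. The following is only a minimal sketch: find_module_dirs is a hypothetical helper written for this note (it is not part of BioPerl or DBI), and it assumes the module name maps to a file path in the usual way (DBD::mysql to DBD/mysql.pm).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return every directory (from @INC by default, or from
# a supplied list) that contains the file for the given module name.
sub find_module_dirs {
    my ($module, @dirs) = @_;
    @dirs = @INC unless @dirs;

    # 'DBD::mysql' becomes ('DBD', 'mysql.pm')
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';

    return grep {
        my $path = File::Spec->catfile($_, @parts);
        -f $path;
    } grep { !ref } @dirs;    # skip any code-reference hooks in @INC
}

my $module = 'DBD::mysql';
my @found  = find_module_dirs($module);

if (@found) {
    print "$module found in: $_\n" for @found;
}
else {
    # $^X is the path of the perl binary executing this script
    print "$module is not in any \@INC directory of this perl ($^X)\n";
}
```

If nothing is reported as found, one possible explanation (an assumption, not something confirmed in this thread) is that the CPAN shell installed the module under a different perl than the one running the script, for example a Fink perl under /sw versus the system perl. Comparing the path printed for $^X with the perl used by the cpan command would be a reasonable next check.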
From i.hatethispart at ymail.com Sat Aug 21 10:07:28 2010 From: i.hatethispart at ymail.com (keiko) Date: Sat, 21 Aug 2010 07:07:28 -0700 (PDT) Subject: [Bioperl-l] clustalw.exe In-Reply-To: <3612399.post@talk.nabble.com> References: <3612399.post@talk.nabble.com> Message-ID: <29499435.post@talk.nabble.com> Katrin wrote: > > hello, I am a new Perl/Bioperl-User and first I must excuse me for my > really bad english, but I hope everybody will understand me. I have the > following problem: In my Perl-skript is the following system call: > $y=exec("C:\\Programme\\xampp-win32-1.5.1\\xampp\\perl\\clustalw.exe > C:\\Programme\\xampp-win32-1.5.1\\xampp\\htdocs\\gene\\clustal.fasta"); If > I call this Script with the Shell (cmd.exe) everything works correctly. > But if I call this script with PHP I get the following error message: > Error: unknown option > /C:\Programme\xampp-win32-1.5.1\xampp\htdocs\gene\clustal.fasta. I tried > also system and qx. And I tested the environment variables: I wrote a > bat-file with the definition of all environment-variables and the system > call, but this did not work, too. The same problem is in php. The > PHP-Scipt is called from html and I worked under WindowsXP with xampp. I > hope, somebody can help me. greetings Katrin > Hi. I also have a problem with this one. I want to call clustalw using php. Can I ask what you included in your bat-file and where did you download your clustal? thanks a lot! -- View this message in context: http://old.nabble.com/clustalw.exe-tp3612399p29499435.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 

From jason at bioperl.org  Sun Aug 22 14:29:30 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 22 Aug 2010 11:29:30 -0700
Subject: [Bioperl-l] Enquiry on Bio::DB::Taxonomy
In-Reply-To:
References:
Message-ID: <4C716C8A.3010000@bioperl.org>

Hi Amali -

This is how I'd print out the full classification by using the Tree methods (with probably a different way of initializing the $db object to your flatfiles location).

#!/usr/bin/perl -w
use strict;
use Bio::DB::Taxonomy;
use Bio::Tree::Tree;   # loaded explicitly, since Bio::Tree::Tree->new is called below

my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                -nodesfile => 'taxonomy/nodes.dmp',
                                -namesfile => 'taxonomy/names.dmp');
my $taxonid = $db->get_taxonid('Homo sapiens');
my $taxon   = $db->get_taxon(-taxonid => $taxonid);
my $tree    = Bio::Tree::Tree->new(-node => $taxon);
my @taxa    = $tree->get_nodes;
print join(",", map { $_->scientific_name } @taxa), "\n";

-jason

Amali Thrimawithana wrote, On 8/18/10 3:56 PM:
> Dear Dr Stajich,
>
> I am a Masters student at Auckland university and my research is on
> identifying yeast species present in wine by the use of 454 sequencing. In
> order to carry out this research, a pipeline is being built in which at the
> final step each representative OTU need to be classified at different
> taxonomic levels (ie: at Phylum, family, class, genus and species) by using
> the results from BLAST. To identify the sequences at each taxonomic level, I
> have been trying out the Bio::DB::Taxonomy module in bioperl. Using this
> module, I am able to get the genus and species level by splitting the
> scientific name returned by the Bio::taxon object. But unfortunately I am
> uncertain on how to get the information for the other levels of the rank. I
> have tried several commands including "my @class = $node->classification;",
> but it does not work. Hence, could you please let me know how I might be
> able to get the higher levels of taxonomy such as class and phylum using
> bioperl?
> > Look forward to hearing from you soon > > Thanking You > > Amali > From cjfields at illinois.edu Sun Aug 22 15:56:58 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 22 Aug 2010 14:56:58 -0500 Subject: [Bioperl-l] clustalw.exe In-Reply-To: <29499435.post@talk.nabble.com> References: <3612399.post@talk.nabble.com> <29499435.post@talk.nabble.com> Message-ID: On Aug 21, 2010, at 9:07 AM, keiko wrote: > Katrin wrote: >> >> hello, I am a new Perl/Bioperl-User and first I must excuse me for my >> really bad english, but I hope everybody will understand me. I have the >> following problem: In my Perl-skript is the following system call: >> $y=exec("C:\\Programme\\xampp-win32-1.5.1\\xampp\\perl\\clustalw.exe >> C:\\Programme\\xampp-win32-1.5.1\\xampp\\htdocs\\gene\\clustal.fasta"); If >> I call this Script with the Shell (cmd.exe) everything works correctly. >> But if I call this script with PHP I get the following error message: >> Error: unknown option >> /C:\Programme\xampp-win32-1.5.1\xampp\htdocs\gene\clustal.fasta. I tried >> also system and qx. And I tested the environment variables: I wrote a >> bat-file with the definition of all environment-variables and the system >> call, but this did not work, too. The same problem is in php. The >> PHP-Scipt is called from html and I worked under WindowsXP with xampp. I >> hope, somebody can help me. greetings Katrin >> > > Hi. I also have a problem with this one. I want to call clustalw using php. > Can I ask what you included in your bat-file and where did you download your > clustal? thanks a lot! Not sure, but what does this have to do with BioPerl? 

chris

From jason at bioperl.org  Mon Aug 23 11:56:47 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 23 Aug 2010 08:56:47 -0700
Subject: [Bioperl-l] a problem when using the Bioperl modules
In-Reply-To:
References:
Message-ID: <4C729A3F.7080304@bioperl.org>

Wei -

Please ask your questions on the bioperl mailing list, I cannot answer questions directly for all requests. Your problem has been answered by me on the list before so I urge you to use the list archives as a starting point.

The line lengths of the fasta file sequence aren't the same length. you need to run this

bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW
mv NEW ORIGINAL

or with sreformat

sreformat fasta ORIGINAL > NEW
mv NEW ORIGINAL

Guifeng Wei wrote, On 8/23/10 4:57 AM:
> Dear professor Stajich,
> So sorry to interrupt you. i came across a problem when i use the
> Bio::DB::Fasta modules of BioPerl. The aim i want to arrive at is to
> extract the subsequences accoording to the *.bed files which are the
> C.elegans genomic sequnece annotation. The code i programed is in the
> attached file.
> The genomic sequences file contains sequences from 6 chromosomes of
> C.elegans.
> when i run this program in the command line, the following error
> warnings was coming.
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
> Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::Fasta::calculate_offsets
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> STACK: Bio::DB::Fasta::index_file
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:680
> STACK: Bio::DB::Fasta::new
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:491
> STACK: bed_to_fasta.pl:14
> -----------------------------------------------------------
> indexing was interrupted, so unlinking
> /home/wgf/WORM_DATA/elegans.WS190.dna.fa.index at
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053.
>
> and therefore i write to you in hope that you can help me solve this
> problem,as well as, give me some suggestion about how to learn Bioperl
> well.
> thank you very very much.
> yours sincerely
> Wei Guifeng

From jason.stajich at ucr.edu  Mon Aug 23 11:58:07 2010
From: jason.stajich at ucr.edu (Jason Stajich)
Date: Mon, 23 Aug 2010 08:58:07 -0700
Subject: [Bioperl-l] a problem when using the Bioperl modules
In-Reply-To:
References:
Message-ID: <4C729A8F.1070506@ucr.edu>

You haven't defined this variable $db - you need to not skip the part that initializes the Bio::DB::Fasta object that you had previously asked about.

Please send all your future queries to the mailing list.

Guifeng Wei wrote, On 8/23/10 8:14 AM:
> Dear professor,
> after that, i revised my scripts, which is that i divide the genomic
> sequences into 7 single file, every file contains the sequence from a
> chromosome.
> however, when i try to run the scripts, the following error was coming.
> Can't call method "seq" on an undefined value at bed_to_fasta.pl
> line 29, <IN> line 1.
> while(<IN>){
> chomp $_;
> my @bed=split(/\s+/, $_ );
> #print length($db->seq('chrI'));
> my $chr_id=$bed[0];
> my $start=$bed[1];
> my $end=$bed[2];
> my $seq_name=$bed[3];
> my $strand=$bed[5];
> my $segment = $db ->seq($chr_id,$start=>$end);
> print ">",$seq_name,"_",$chr_id,":",$start=>$end;
> print "$segment\n";
> }
> the blue line is .
> why?

--
Jason E. Stajich, PhD
Assistant Professor
Department of Plant Pathology & Microbiology
University of California
Riverside, CA 92521

jason.stajich at ucr.edu
office: 951.827.2363
http://lab.stajich.org/
http://twitter.com/stajichlab
http://fungalgenomes.org/blog/
http://plantpathology.ucr.edu/
http://genomics.ucr.edu/
http://cepceb.ucr.edu/

From guifengwei at gmail.com Mon Aug 23 22:44:57 2010
From: guifengwei at gmail.com (Guifeng Wei)
Date: Tue, 24 Aug 2010 10:44:57 +0800
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
Message-ID:

Hi,

i came across a problem when i use the Bio::DB::Fasta modules of BioPerl. The aim i want to arrive at is to extract the subsequences accoording to the *.bed files which are the C.elegans genomic sequnece annotation.

when i tried to run the scripts i wrote, the error message was coming, as follows:

Can't call method "seq" on an undefined value at bed_to_fasta.pl line 28, <IN> line 1.

so, ask for favor to slove this problem.
Here is my perl scripts.

#!/usr/bin/perl -w
# Purpose: extract sequences from genomic sequences
use strict;
use Bio::DB::Fasta;
open(IN,$ARGV[0]) || die "sorry, the program cannot open the .bed file, plea check it. \n";
my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' );
# The dir ...../elegans190.dna/ includes 6 files:chrI,chrII,chrIII,chrIV,chrV,chrX,
#each stands for the sequences from the coressponding chromosome.
while(<IN>){
    chomp $_;
    my @bed=split(/\s+/, $_ );

    my $chr_id=$bed[0];
    my $start=$bed[1];
    my $end=$bed[2];
    my $seq_name=$bed[3];
    my $strand=$bed[5];

    my $segment = $db->seq( $chr_id, $start=>$end );

    print ">",$seq_name,"_",$chr_id,":",$start=>$end;
    print "$segment\n";

}

close(IN);

From florent.angly at gmail.com Tue Aug 24 01:06:21 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 24 Aug 2010 15:06:21 +1000
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To:
References:
Message-ID: <4C73534D.6080607@gmail.com>

Hi Guifeng,

From the Bio::DB::Fasta documentation:
> $db = Bio::DB::Fasta->new($fasta_path [,%options])
> Create a new Bio::DB::Fasta object from the Fasta file or files
> indicated by $fasta_path. Indexing will be performed automatically
> if needed. If successful, new() will return the database accessor
> object. Otherwise it will return undef.

Hence, after you create the database object $db, you should check that it was successful, e.g.:
> my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' );
> if (not defined $db) {
>     die "There was a problem creating the database\n";
> }

A problem creating the database would explain the message you get. If the extension of the FASTA files in the directory path that you gave as input is not fa, fasta, fast, FA, FASTA, FAST or dna, then you should use the -glob option when constructing your database object. From the documentation:
> -glob    Glob expression to use     *.{fa,fasta,fast,FA,FASTA,FAST,dna}
>          for searching for Fasta
>          files in directories.

Florent

On 24/08/10 12:44, Guifeng Wei wrote:
> Hi,
>
> i came across a problem when i use the Bio::DB::Fasta modules of
> BioPerl. The aim i want to arrive at is to extract the subsequences
> accoording to the *.bed files which are the C.elegans genomic sequnece
> annotation.
>
> when i tried to run the scripts i wrote, the error message was coming, as
> follows:
>
> Can't call method "seq" on an undefined value at bed_to_fasta.pl line 28,
> <IN> line 1.
>
> so, ask for favor to slove this problem.
> Here is my perl scripts.
>
> #!/usr/bin/perl -w
> # Purpose: extract sequences from genomic sequences
> use strict;
> use Bio::DB::Fasta;
> open(IN,$ARGV[0]) || die "sorry, the program cannot open the .bed file, plea
> check it. \n";
> my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' );
> # The dir ...../elegans190.dna/ includes 6
> files:chrI,chrII,chrIII,chrIV,chrV,chrX,
> #each stands for the sequences from the coressponding chromosome.
>
> while(<IN>){
> chomp $_;
> my @bed=split(/\s+/, $_ );
>
> my $chr_id=$bed[0];
> my $start=$bed[1];
> my $end=$bed[2];
> my $seq_name=$bed[3];
> my $strand=$bed[5];
>
> my $segment = $db->seq( $chr_id, $start=>$end );
>
> print ">",$seq_name,"_",$chr_id,":",$start=>$end;
> print "$segment\n";
>
> }
>
> close(IN);
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From guifengwei at gmail.com Tue Aug 24 07:28:16 2010
From: guifengwei at gmail.com (Guifeng Wei)
Date: Tue, 24 Aug 2010 19:28:16 +0800
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To:
References:
Message-ID:

Hi,

i have revised my scripts according to the previous email from Florent. However, there were still some errors which frustrated me so much.

The errors are as follows:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Each line of the fasta entry must be the same length except the last.
Line above #301451 '
..' is 22 != 51 chars.
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::DB::Fasta::calculate_offsets
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
STACK: Bio::DB::Fasta::index_dir
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
STACK: Bio::DB::Fasta::new
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
STACK: bed2fasta.pl:13
-----------------------------------------------------------
indexing was interrupted, so unlinking
/home/wgf/elegans190.dna//directory.index at
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053

But in the directory /home/wgf/elegans190.dna/ , it concludes 6 files, each contains the complete sequences from one single chromosome, the format is fasta. The extension of the FASTA files is .fa. Every single file is started as ">chromosoemeXXX" followed by the thousands of sequences.

and therefore, it warn me that "Each line of the fasta entry must be the same length except the last". and "indexing was interrupted, so unlinking /home/wgf/elegans190.dna//directory".

i was much confused about this. so for help.

Wei Guifeng

From biopython at maubp.freeserve.co.uk Tue Aug 24 09:28:33 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 24 Aug 2010 14:28:33 +0100
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To:
References:
Message-ID:

On Tue, Aug 24, 2010 at 12:28 PM, Guifeng Wei wrote:
> Hi,
>
> i have revised my scripts according to the previous email from Florent.
> However, there were still some errors which frustrated me so much.
>
> The errors are as follows:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
>    Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::Fasta::calculate_offsets
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> STACK: Bio::DB::Fasta::index_dir
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
> STACK: Bio::DB::Fasta::new
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
> STACK: bed2fasta.pl:13
> -----------------------------------------------------------
> indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory.index at
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053
> But in the directory /home/wgf/elegans190.dna/ , it concludes 6 files,
> each contains the complete sequences from one single chromosome, the format
> is fasta. The extension of the FASTA files is .fa. Every single file is
> started as ">chromosoemeXXX" followed by the thousands of sequences.
>
> and therefore, it warn me that "Each line of the fasta entry must be the
> same length except the last". and "indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory".
>
> i was much confused about this. so for help.
>
> Wei Guifeng

Hi Wei,

It sounds like there is inconsistent line wrapping in your FASTA file. This is often not a problem at all, but the DB indexing system (and indeed other indexing tools like the samtools fasta index) requires all the entries have the same wrapping.

e.g. This is a valid FASTA file but would not be suitable for indexing:

>Test
ACGTACGT
ACGTACGT
ACGTACGT
ACGT
ACGT
T

Ignoring the final line (special case - here length one) that uses a mixture of line lengths, 8 and 4. If you had used this it should be fine:

>Test
ACGTACGT
ACGTACGT
ACGTACGT
ACGTACGT
T

All the lines are now wrapped at length 8 (and the final line is less than or equal to length 8).
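The rewrapping described above can be sketched with standard tools when sreformat is not at hand. This is only a minimal illustration, assuming a POSIX shell with awk; the function name rewrap_fasta and the file names in the usage comment are hypothetical and are not part of BioPerl or sreformat. It joins each record's sequence and prints it back at a fixed width, so every line except the last of each record has the same length.

```shell
# Hypothetical helper (not part of BioPerl or sreformat):
# read FASTA on stdin, write it to stdout with every sequence
# line wrapped at the width given as the first argument.
rewrap_fasta() {
  awk -v w="$1" '
    # Print the buffered sequence in chunks of w characters, then clear it.
    function flush(   i) {
      for (i = 1; i <= length(seq); i += w) print substr(seq, i, w)
      seq = ""
    }
    # Header line: emit the previous record, then the header itself.
    /^>/ { flush(); print; next }
    # Sequence line: drop spaces, tabs and carriage returns, then buffer it.
    { gsub(/[ \t\r]/, ""); seq = seq $0 }
    END { flush() }
  '
}

# Possible usage, with placeholder file names:
#   rewrap_fasta 60 < ORIGINAL > NEW && mv NEW ORIGINAL
```

The whole sequence of a record is held in memory before it is printed, which should be acceptable for chromosome-sized records but is an assumption worth checking on very large inputs.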
Of course, in a real file wrapping at 60 or 80 characters is more common ;)

Peter

From cjfields at illinois.edu Tue Aug 24 09:38:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 08:38:45 -0500
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To:
References:
Message-ID: <995BCF30-99B2-46C2-A4E8-681F9E2A0BB5@illinois.edu>

Guifeng,

Did you follow Jason's advice yesterday about converting the FASTA over to a more consistent length? Or checking the database itself? These are both things reiterated by Florent and Peter.

From Jason's last response:

-------------------------
Wei -

Please ask your questions on the bioperl mailing list, I cannot answer questions directly for all requests. Your problem has been answered by me on the list before so I urge you to use the list archives as a starting point.

The line lengths of the fasta file sequence aren't the same length.

you need to run this
bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW
mv NEW ORIGINAL

or with sreformat
sreformat fasta ORIGINAL > NEW
mv NEW ORIGINAL
-------------------------

chris

On Aug 24, 2010, at 6:28 AM, Guifeng Wei wrote:
> Hi,
>
> i have revised my scripts according to the previous email from Florent.
> However, there were still some errors which frustrated me so much.
>
> The errors are as follows:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
> Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::DB::Fasta::calculate_offsets > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770 > STACK: Bio::DB::Fasta::index_dir > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593 > STACK: Bio::DB::Fasta::new > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488 > STACK: bed2fasta.pl:13 > ----------------------------------------------------------- > indexing was interrupted, so unlinking > /home/wgf/elegans190.dna//directory.index at > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053 > But in the directory /home/wgf/elegans190.dna/ , it concludes 6 files, > each contains the complete sequences from one single chromosome, the format > is fasta. The extension of the FASTA files is .fa. Every single file is > started as ">chromosoemeXXX" followed by the thousands of sequences. > > and therefore, it warn me that "Each line of the fasta entry must be the > same length except the last". and "indexing was interrupted, so unlinking > /home/wgf/elegans190.dna//directory". > > i was much confused about this. so for help. 
>
> Wei Guifeng
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From scott at scottcain.net Tue Aug 24 11:01:47 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 24 Aug 2010 11:01:47 -0400
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> <4C6DADDF.1000103@cornell.edu> <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu> <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
Message-ID:

Hi Chris,

GMOD still only supports Chado with Postgres (for example, the GFF loader assumes a Postgres database), but when I reengineered the GFF loader a few years ago, I tried to do it with subclassing the loader in mind so that it could be subclassed to work with other RDMS.

Scott

On Fri, Aug 20, 2010 at 12:23 PM, Chris Fields wrote:
> On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote:
>> Hi,
>>
>> On Thu, 19 Aug 2010, Robert Buels wrote:
>>
>> > Chris Fields wrote:
>> > > I think it's worth exploring having a DBIx::Class-based middle-ware
>> > > approach similar to what Rob Buels has done for Chado. That would be
>> > > fairly easy to get started using DBIx::Class::Schema::Loader.
>> > > After that it would require optimization and tweaking, which is
>> > > potentially more complex than Rob's setup as Chado is very Pg-specific,
>> > > but maybe Rob can elaborate...
>> >
>> > Elaborating on how Bio::Chado::Schema is developed:
>> >
>> > The vast majority of the code and POD in BCS is autogenerated by
>> > DBIx::Class::Schema::Loader.
>> > DBICSL gives you a baseline set of
>> > DBIx::Class classes that covers all the tables, views, columns, unique
>> > constraints, and foreign key relationships.
>> >
>> > Beyond that, you have to add on yourself. In BCS, we have mostly done
>> > things like:
>> >
>> >   * make better-named aliases for some of the autogenerated
>> >     relationships (though DBICSL does a surprisingly good job of naming
>> >     relationships automatically most of the time)
>> >   * add a tiny bit of bioperl compatibility (this needs a lot more work
>> >     by somebody, volunteers needed!)
>> >   * add convenience methods for using some of the Chado property tables
>> >   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
>> >     traversing phylogenetic tree relationships
>> >
>> > Regarding DB backend specificity, BCS isn't Pg-specific at all, because
>> > DBIx::Class itself goes to great lengths to be compatible (and performant!)
>> > with just about every relational database out there.
>> I would vouch for that at least as far as chado in oracle is concerned.
>> So, far BCS works out flawlessly with our oracle chado instance at
>> dictybase. Quite a chunk of BCS based code is also active in couple of
>> our Mojo based webapps. The part which i still couldn't use directly is
>> the 'synonym' table as it clashes with oracle specific reserved keywords.
>> However, overall it seems to quite cross-RDMS compatible and highly
>> recommended.
>>
>> -siddhartha
>
> Just to point out, I didn't say BCS is Pg-specific, but that Chado is
> (that was the DBMS it was designed for). Maybe that should be amended
> to 'was' now :)
>
> I recall seeing a page on this somewhere on the GMOD website along the
> lines of "MySQL has problems so we chose Pg", and that Chado support
> would focus on Pg. I'm guessing that's no longer the case? Or is only
> the server-side stuff Pg-specific.
>
>> >In fact, the BCS test
>> > suite deploys a Chado schema into a temporary SQLite database using
>> > DBIC::Schema's deploy() method, and runs all of its tests on that. Very
>> > handy.
>> >
>> > Chado's Pg-specific server-side functions can of course be called through
>> > BCS if they are present, but it's perfectly possible to use Chado without
>> > any of the server-side functions, and mostly the way I use it.
>> >
>> > Rob
>
> I think this opens up the possibility of starting a DBIx::Class-based
> middleware solution. Hilmar, did you want to take that on?
>
> chris
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                     scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)    216-392-3087
Ontario Institute for Cancer Research

From bgs500 at york.ac.uk Tue Aug 24 11:35:53 2010
From: bgs500 at york.ac.uk (Ben Saville)
Date: Tue, 24 Aug 2010 16:35:53 +0100
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk> <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>
Message-ID: <34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk>

Sorry for the Delay in replying, 454 data analysis is very time consuming.

please see http://seqanswers.com/forums/showthread.php?t=6484
For a discussion about this problem, and how we solved the issue.

Thanks for the reply though, much appreciated!

Regards
Ben Saville

On 20 Aug 2010, at 14:48, Dave Messina wrote:
> Hi Ben,
>
> I would not use the script you posted - I don't think it does what
> you want.
>
> If you haven't already, you should take a look at the beginners' HOWTO
>
> http://www.bioperl.org/wiki/HOWTO:Beginners
>
>
> the SearchIO HOWTO
>
> http://www.bioperl.org/wiki/HOWTO:SearchIO
>
>
> and the example scripts included with BioPerl:
>
> http://www.bioperl.org/wiki/Scripts
>
>
>
> Incidentally, it's a lot of fiddly data processing to parse blast
> reports for many contigs against multiple databases and then go back
> and collate the results by query. I'm not sure exactly what you want
> to do once you've separated by query - if you provide some more
> information, we could suggest ways to best get you where you want to
> go.
>
> I will mention, though, that BLAST has the ability to search
> multiple separate databases in one go and collate the results for
> you. So that's something to consider.
>
>
>
> Dave
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Tue Aug 24 11:54:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 10:54:20 -0500
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To:
References: <995BCF30-99B2-46C2-A4E8-681F9E2A0BB5@illinois.edu>
Message-ID:

Please keep all responses on-list.

Regarding sreformat:

http://tinyurl.com/28q75rr

Judging by the stack traces below, you are also running off a UNIX-like system. To concatenate files, use 'cat'. So, for all files ending with .fa:

cat *.fa >> all.fa

chris

On Aug 24, 2010, at 8:54 AM, Guifeng Wei wrote:
> Hello Fields,
>
> i have checked the fasta files. i suddenly find that the last line is blank line, and the last second is less than common.
>
> i am not able to run the command line as Jason's advice because i have no knowledge about "sreformat".
>
> i also want to ask a more question. i want megre the several single chromosome sequence file into one, OK?
>
> thank you very much.
> > Wei Guifeng > 2010/8/24 Chris Fields > Guifeng, > > Did you follow Jason's advice yesterday about converting the FASTA over to a more consistent length? Or checking the database itself? These are both things reiterated by Florent and Peter. > > From Jason's last response: > > ------------------------- > Wei - > > Please ask your questions on the bioperl mailing list, I cannot answer questions directly for all requests. > Your problem has been answered by me on the list before so I urge you to use the list archives as a starting point. > > The line lengths of the fasta file sequence aren't the same length. > > you need to run this > bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW > mv NEW ORIGINAL > > or with sreformat > sreformat fasta ORIGINAL > NEW > mv NEW ORIGINAL > ------------------------- > > chris > > > On Aug 24, 2010, at 6:28 AM, Guifeng Wei wrote: > > > Hi, > > > > i have revised my scripts according to the previous email from Florent. > > However, there were still some errors which frustrated me so much. > > > > The errors are as follows: > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: Each line of the fasta entry must be the same length except the last. > > Line above #301451 ' > > ..' is 22 != 51 chars. 
> > STACK: Error::throw > > STACK: Bio::Root::Root::throw > > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 > > STACK: Bio::DB::Fasta::calculate_offsets > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770 > > STACK: Bio::DB::Fasta::index_dir > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593 > > STACK: Bio::DB::Fasta::new > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488 > > STACK: bed2fasta.pl:13 > > ----------------------------------------------------------- > > indexing was interrupted, so unlinking > > /home/wgf/elegans190.dna//directory.index at > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053 > > But in the directory /home/wgf/elegans190.dna/ , it concludes 6 files, > > each contains the complete sequences from one single chromosome, the format > > is fasta. The extension of the FASTA files is .fa. Every single file is > > started as ">chromosoemeXXX" followed by the thousands of sequences. > > > > and therefore, it warn me that "Each line of the fasta entry must be the > > same length except the last". and "indexing was interrupted, so unlinking > > /home/wgf/elegans190.dna//directory". > > > > i was much confused about this. so for help. > > > > Wei Guifeng > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- > ?????? 
Wei Guifeng
>
>
>

From cjfields at illinois.edu Tue Aug 24 12:14:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 11:14:51 -0500
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk> <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se> <34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk>
Message-ID: <69C47A74-09C7-4024-9303-A3893658A2A8@illinois.edu>

Just in case anyone needs it, there is a way to index these as well (both BLAST and the two tabular BLAST versions) for fast lookups of specific reports, if needed. See Bio::Index::Blast and Bio::Index::BlastTable in BioPerl.

Caveat: I believe there is a bug with BLAST+ text output indexing (it chops the header off subsequent reports). I haven't investigated it enough, though, but I'll try looking into it today.

chris

On Aug 24, 2010, at 10:35 AM, Ben Saville wrote:
> Sorry for the Delay in replying, 454 data analysis is very time consuming.
>
> please see http://seqanswers.com/forums/showthread.php?t=6484
> For a discussion about this problem, and how we solved the issue.
>
> Thanks for the reply though, much appreciated!
>
> Regards
> Ben Saville
>
>
>
>
>
> On 20 Aug 2010, at 14:48, Dave Messina wrote:
>
>> Hi Ben,
>>
>> I would not use the script you posted - I don't think it does what you want.
>>
>> If you haven't already, you should take a look at the beginners' HOWTO
>>
>> http://www.bioperl.org/wiki/HOWTO:Beginners
>>
>>
>> the SearchIO HOWTO
>>
>> http://www.bioperl.org/wiki/HOWTO:SearchIO
>>
>>
>> and the example scripts included with BioPerl:
>>
>> http://www.bioperl.org/wiki/Scripts
>>
>>
>>
>> Incidentally, it's a lot of fiddly data processing to parse blast reports for many contigs against multiple databases and then go back and collate the results by query. I'm not sure exactly what you want to do once you've separated by query -
if you provide some more information, we could suggest ways to best get you where you want to go. >> >> I will mention, though, that BLAST has the ability to search multiple separate databases in one go and collate the results for you. So that's something to consider. >> >> >> >> Dave >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Aug 24 12:17:17 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 24 Aug 2010 11:17:17 -0500 Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release announcement References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov> Message-ID: FYI, Very interesting additions to BLAST+ (archive format). chris Begin forwarded message: > From: mcginnis > Date: August 24, 2010 10:46:50 AM CDT > To: NLM/NCBI List blast-announce > Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement > > A new version of the stand-alone applications is available. 
> > Users are encouraged to use the BLAST+ applications available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ > > This release includes a number of bug fixes as well as new features for the BLAST+ applications: > > * Introduce BLAST Archive format to permit reformatting of stand-alone BLAST searches with the blast_formatter(see BLAST+ user manual) > * Added the blast_formatter application (see BLAST+ user manual) > * Added support for translated subject soft masking in the BLAST databases > * Added support for the BLAST Trace-back operations (btop) output format > * Added command line options to blastdbcmd for listing available BLAST databases > * Improved performance of formatting of remote BLAST searches > * Use a consistent exit code for out of memory conditions > * Fixed bug in indexed megablast with multiple space-separated BLAST databases > * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb > * Fixed Windows installer for 64-bit installations > > BLAST+ applications, as well as the legacy C applications (e.g. blastall), may be downloaded from http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download From David.Messina at sbc.su.se Tue Aug 24 13:00:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 24 Aug 2010 19:00:14 +0200 Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release announcement In-Reply-To: References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov> Message-ID: <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se> Here's a link to the manual: ftp://ftp.ncbi.nlm.nih.gov//blast/executables/blast%2B/2.2.24/user_manual.pdf (Is it on the NCBI website somewhere? Strange to have only a downloadable PDF.) The section on the new archive format is on page 27. It seems like a nice idea to have the flexibility, but I wonder about the time cost of using this format. 
One of the big gains from using tab-delimited output is that BLAST doesn't have to do all the post-processing to generate the alignment views. By doing the archive format, which if I understand it correctly is ASN.1, you're always paying the full price in time (and space, for that matter). Dave On Aug 24, 2010, at 18:17 , Chris Fields wrote: > FYI, > > Very interesting additions to BLAST+ (archive format). > > chris > > Begin forwarded message: > >> From: mcginnis >> Date: August 24, 2010 10:46:50 AM CDT >> To: NLM/NCBI List blast-announce >> Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement >> >> A new version of the stand-alone applications is available. >> >> Users are encouraged to use the BLAST+ applications available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ >> >> This release includes a number of bug fixes as well as new features for the BLAST+ applications: >> >> * Introduce BLAST Archive format to permit reformatting of stand-alone BLAST searches with the blast_formatter(see BLAST+ user manual) >> * Added the blast_formatter application (see BLAST+ user manual) >> * Added support for translated subject soft masking in the BLAST databases >> * Added support for the BLAST Trace-back operations (btop) output format >> * Added command line options to blastdbcmd for listing available BLAST databases >> * Improved performance of formatting of remote BLAST searches >> * Use a consistent exit code for out of memory conditions >> * Fixed bug in indexed megablast with multiple space-separated BLAST databases >> * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb >> * Fixed Windows installer for 64-bit installations >> >> BLAST+ applications, as well as the legacy C applications (e.g. 
blastall), may be downloaded from http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Aug 24 13:26:49 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 24 Aug 2010 12:26:49 -0500 Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release announcement In-Reply-To: <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se> References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov> <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se> Message-ID: It's probably more applicable from the viewpoint of a cluster admin who would want to add the flexibility of having a single archive and allowing any format (as opposed to re-running the analysis). I'm just wondering if there is anything to glean there for possible alignment archiving purposes (ala SAM/BAM), but if it's ASN.1, likely not. chris On Aug 24, 2010, at 12:00 PM, Dave Messina wrote: > Here's a link to the manual: > ftp://ftp.ncbi.nlm.nih.gov//blast/executables/blast%2B/2.2.24/user_manual.pdf > > (Is it on the NCBI website somewhere? Strange to have only a downloadable PDF.) The section on the new archive format is on page 27. > > It seems like a nice idea to have the flexibility, but I wonder about the time cost of using this format. > > One of the big gains from using tab-delimited output is that BLAST doesn't have to do all the post-processing to generate the alignment views. By doing the archive format, which if I understand it correctly is ASN.1, you're always paying the full price in time (and space, for that matter). > > > > Dave > > > > > On Aug 24, 2010, at 18:17 , Chris Fields wrote: > >> FYI, >> >> Very interesting additions to BLAST+ (archive format). 
>> >> chris >> >> Begin forwarded message: >> >>> From: mcginnis >>> Date: August 24, 2010 10:46:50 AM CDT >>> To: NLM/NCBI List blast-announce >>> Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement >>> >>> A new version of the stand-alone applications is available. >>> >>> Users are encouraged to use the BLAST+ applications available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ >>> >>> This release includes a number of bug fixes as well as new features for the BLAST+ applications: >>> >>> * Introduce BLAST Archive format to permit reformatting of stand-alone BLAST searches with the blast_formatter(see BLAST+ user manual) >>> * Added the blast_formatter application (see BLAST+ user manual) >>> * Added support for translated subject soft masking in the BLAST databases >>> * Added support for the BLAST Trace-back operations (btop) output format >>> * Added command line options to blastdbcmd for listing available BLAST databases >>> * Improved performance of formatting of remote BLAST searches >>> * Use a consistent exit code for out of memory conditions >>> * Fixed bug in indexed megablast with multiple space-separated BLAST databases >>> * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb >>> * Fixed Windows installer for 64-bit installations >>> >>> BLAST+ applications, as well as the legacy C applications (e.g. 
>>> blastall), may be downloaded from http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download

From David.Messina at sbc.su.se Tue Aug 24 14:45:29 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 24 Aug 2010 20:45:29 +0200
Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release announcement
In-Reply-To:
References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov> <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se>
Message-ID: <00C04DF9-F3C2-4574-B1E4-A3BF28EE953F@sbc.su.se>

> It's probably more applicable from the viewpoint of a cluster admin who would want to add the flexibility of having a single archive and allowing any format (as opposed to re-running the analysis).

Good point.

> I'm just wondering if there is anything to glean there for possible alignment archiving purposes (ala SAM/BAM), but if it's ASN.1, likely not.

To be honest, I didn't look that closely at it. It may be worth considering nevertheless.

Dave

From buiduyminh at gmail.com Tue Aug 24 14:56:43 2010
From: buiduyminh at gmail.com (Minh Bui)
Date: Tue, 24 Aug 2010 14:56:43 -0400
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
In-Reply-To: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>
References: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>
Message-ID:

How can I find out where DBD::mysql is installed on my Mac? I am very new to Mac, sorry.
I just checked, and mysql.pm is in /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm

On 8/21/10, Adam Witney wrote:
>
> On 20 Aug 2010, at 22:29, Minh Bui wrote:
>
> > Hi,
> > I am trying to load my GFF file to mysql database but I got this error
> > when I ran the bp_seqfeature_load.pl (bioperl 1.6.1 on Mac)
> >
> > [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
> > install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
> > contains: /sw/lib/perl5 /sw/lib/perl5/darwin
> > /System/Library/Perl/5.8.6/darwin-thread-multi-2level
> > /System/Library/Perl/5.8.6
> > /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
> > /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
> > /Network/Library/Perl/5.8.6 /Network/Library/Perl
> > /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
> > /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
> > line 3.
> > Perhaps the DBD::mysql perl module hasn't been fully installed,
> > or perhaps the capitalisation of 'mysql' isn't right.
> > Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
> > at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212
> >
> > I am using Mac OS X version 10.4.10 and MAMP. Isn't
> > "/Library/Perl/5.8.6" already in @INC? What am I missing?
> > I have been googling this error for a few hours. I also installed
> > BioPerl and reinstalled DBD::mysql using CPAN. It still doesn't work.
> >
> > Here is my $PERL5LIB: /sw/lib/perl5:/sw/lib/perl5/darwin/
>
> Where did DBD::mysql get installed? Can you verify that DBD/mysql.pm is actually in one of those directories listed above?

From scott at scottcain.net Tue Aug 24 15:04:04 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 24 Aug 2010 15:04:04 -0400
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
In-Reply-To:
References: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>
Message-ID:

Hi Minh,

The file you found is not DBD::mysql though; it is Bio::DB::SeqFeature::Store::DBI::mysql, which was installed along with BioPerl. How did you find that file? The same method presumably would turn up DBD::mysql if it existed. I would use a command like this:

  locate mysql.pm

which would locate all of the instances of files named mysql.pm on your computer. I would expect it to be located in /Library/Perl/5.8.6/darwin-thread-multi-2level/DBD/ if it was installed in a "normal" way (that is, not involving macports or fink or MAMP).

Scott

On Tue, Aug 24, 2010 at 2:56 PM, Minh Bui wrote:
> How can I find out where DBD::mysql is installed on my Mac? I am very new to Mac, sorry.
>
> I just checked, and mysql.pm is in
> /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm
>
> On 8/21/10, Adam Witney wrote:
>>
>> On 20 Aug 2010, at 22:29, Minh Bui wrote:
>>
>> > Hi,
>> > I am trying to load my GFF file to mysql database but I got this error
>> > when I ran the bp_seqfeature_load.pl (bioperl 1.6.1 on Mac)
>> >
>> > [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
>> > install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
>> > contains: /sw/lib/perl5 /sw/lib/perl5/darwin
>> > /System/Library/Perl/5.8.6/darwin-thread-multi-2level
>> > /System/Library/Perl/5.8.6
>> > /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
>> > /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
>> > /Network/Library/Perl/5.8.6 /Network/Library/Perl
>> > /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
>> > /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
>> > line 3.
>> > Perhaps the DBD::mysql perl module hasn't been fully installed,
>> > or perhaps the capitalisation of 'mysql' isn't right.
>> > Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
>> > at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212
>> >
>> > I am using Mac OS X version 10.4.10 and MAMP. Isn't
>> > "/Library/Perl/5.8.6" already in @INC? What am I missing?
>> > I have been googling this error for a few hours. I also installed
>> > BioPerl and reinstalled DBD::mysql using CPAN. It still doesn't work.
>> >
>> > Here is my $PERL5LIB: /sw/lib/perl5:/sw/lib/perl5/darwin/
>>
>> Where did DBD::mysql get installed? Can you verify that DBD/mysql.pm is actually in one of those directories listed above?

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                      scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)     216-392-3087
Ontario Institute for Cancer Research

From jason at bioperl.org Wed Aug 25 00:33:45 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 24 Aug 2010 21:33:45 -0700
Subject: [Bioperl-l] Enquiry on gi_taxid_nucl.dmp.gz
In-Reply-To:
References:
Message-ID: <4C749D29.3040003@bioperl.org>

hi - please keep questions on list.

I think one of your problems is that your first use of $gi2taxidfile is wrong. When you call tie you want to specify a dbfile you want to store the index in. So call it "/tmp/gi2taxid.idx" or something like that.

In my code here
http://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/classify_hits_kingdom.PLS
you will see that on line 97 we construct the name of the index file from the folder, plus 'idx', plus the name gi2taxid.

Also it would be safer for the split to match on whitespace and to take just the first two columns from the file. Doing this would eliminate the need for the chomp on the line above.
my ($gi, $taxid) = split(/\s+/, $_);

instead of

chomp;
my ($gi, $taxid) = split(" ", $_,2);

There may be other problems but these should be fixed first -- and please send queries to the mailing list rather than to me directly so that others can answer questions.

-jason

Amali Thrimawithana wrote, On 8/24/10 8:13 PM:
> Dear Jason
>
> Thank you very much for the information. I manage to get the information on
> different taxonomic levels with the help of one of your example code
> "local_taxonomydb_query". However I am having trouble with creating a local
> index file of the gi_taxid_nucl.dmp so that I am able to get the taxonomic
> id given the GI number of NCBI. At the moment I am using the tie() function
> with DB_file and then storing the detail into a hash. However when I try to
> retrieve a taxonomic ID given the GI number, it is not returning any thing
> but an error. Below is part of the code (borrowed from the example code
> classify kingdom), can you please let me know where I am going wrong?
> ...
> my $dbh2 = tie(%taxid4gi, 'DB_File', $gi2taxidfile);
>
> if( ! $done ) {
>     my $fh;
>     open(GI2TAXID, "$gi2taxidfile") or die $!; #here passing the unzipped
> gi_taxid_nucl.dmp
>     my$i=0;
>     while () {
>         chomp;
>         my ($gi, $taxid) = split(" ", $_, 2);
>         $taxid4gi{$gi} = $taxid
>             if exists $taxid4gi{$gi};
>         $i++;
>         unless( $DEBUG&& $i % 100000 ) {
>             warn "$i\n";
>         }
>     }
>     $dbh2->sync;
> }
> my $gi2='183397240';
> my $taxd2=$taxid4gi{$gi2};
> print $taxd2, " \n";
>
> Any help would be much appreciated
>
> Thanking you
> Amali
>
> On 23 August 2010 06:29, Jason Stajich wrote:
>
>> Hi Amali -
>>
>> This is how I'd print out the full classification by using the Tree methods
>> (with probably a different way of initializing the $db object to your
>> flatfiles location).
>>
>> #!/usr/bin/perl -w
>> use strict;
>> use Bio::DB::Taxonomy;
>> use Bio::Tree::Tree;
>>
>> my $db= Bio::DB::Taxonomy->new(-source => 'flatfile',
>>                                -nodesfile => 'taxonomy/nodes.dmp',
>>                                -namesfile => 'taxonomy/names.dmp');
>>
>> my $taxonid = $db->get_taxonid('Homo sapiens');
>> my $taxon = $db->get_taxon(-taxonid => $taxonid);
>> my $tree = Bio::Tree::Tree->new(-node => $taxon);
>> my @taxa = $tree->get_nodes;
>> print join(",", map { $_->scientific_name } @taxa), "\n";
>>
>> -jason
>>
>> Amali Thrimawithana wrote, On 8/18/10 3:56 PM:
>>
>> Dear Dr Stajich,
>>
>>> I am a Masters student at Auckland university and my research is on
>>> identifying yeast species present in wine by the use of 454 sequencing. In
>>> order to carry out this research, a pipeline is being built in which at
>>> the
>>> final step each representative OTU need to be classified at different
>>> taxonomic levels (ie: at Phylum, family, class, genus and species) by
>>> using
>>> the results from BLAST. To identify the sequences at each taxonomic level,
>>> I
>>> have been trying out the Bio::DB::Taxonomy module in bioperl. Using this
>>> module, I am able to get the genus and species level by splitting the
>>> scientific name returned by the Bio::taxon object. But unfortunately I am
>>> uncertain on how to get the information for the other levels of the rank.
>>> I
>>> have tried several commands including "my @class =
>>> $node->classification;",
>>> but it does not work. Hence, could you please let me know how I might be
>>> able to get the higher levels of taxonomy such as class and phylum using
>>> bioperl?
>>>
>>> Look forward to hearing from you soon
>>>
>>> Thanking You
>>>
>>> Amali
>>>

From roy.chaudhuri at gmail.com Wed Aug 25 07:12:15 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 25 Aug 2010 12:12:15 +0100
Subject: [Bioperl-l] Enquiry on gi_taxid_nucl.dmp.gz
In-Reply-To: <4C749D29.3040003@bioperl.org>
References: <4C749D29.3040003@bioperl.org>
Message-ID: <4C74FA8F.3080506@gmail.com>

> Also it would be safer for the split to be whitespace matching and that
> you want the the two first columns from the file. Doing this would
> eliminate the need for the chomp on the line above.
>
> my ($gi, $taxid) = split(/\s+/, $_);
>
> instead of
>
> chomp;
> my ($gi, $taxid) = split(" ", $_,2);

Sorry to be pedantic, but according to perldoc -f split:

"As a special case, specifying a PATTERN of space (' ') will split on white space just as "split" with no arguments does"

The only difference between patterns of " " and /\s+/ is that the latter will return an initial null field if there is leading white space, which may or may not be what you want.

$ perl -e 'print join("-", split(" ", " 1\t2 3")), "\n"'
1-2-3
$ perl -e 'print join("-", split(/\s+/, " 1\t2 3")), "\n"'
-1-2-3

Cheers.
Roy.

From kanmaninradha at gmail.com Thu Aug 26 04:29:08 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 01:29:08 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
Message-ID:

Hi All,
I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF module. I could get everything else but not the DNA seq.

Can anyone help me to find this out, Please. I appreciate your help very much.
thanks,
Kanmani

#!/usr/bin/perl

use strict;
use warnings;
use Bio::Tools::GFF;

my $file = shift;

my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
$gffio->features_attached_to_seqs(1);

while (my $feat = $gffio->next_feature()){
    my $start = $feat->start;
    my $end= $feat->end;
    my $size = $end-$start+1;
    my $strand = $feat->strand;
    my $seqid = $feat->seq_id;
    my $score = $feat->score;
    my $frame = $feat->frame;
    my $source = $feat->source_tag;
    my $type = $feat->primary_tag;
    my $gffstr = $gffio->gff_string($feat);
    my @alltags = $feat->all_tags();
    my @ID_tag_value = $feat->each_tag_value("ID");

    my $seq = $feat->seq();
    print "$seq\n";

    if($type eq "gene"){ #
        print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
    }
}

From David.Messina at sbc.su.se Thu Aug 26 04:53:48 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 26 Aug 2010 10:53:48 +0200
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To:
References:
Message-ID: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>

Admittedly i'm not up on the latest uses of GFF, but as far as I know, GFF is an annotation format only - it does not contain the actual sequence.

Have you looked in your GFF file to see if there are nucleotides in there?

Dave

On Aug 26, 2010, at 10:29, kanmani radha wrote:

> Hi All,
> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
> module. I could get everything else but not the DNA seq.

From biopython at maubp.freeserve.co.uk Thu Aug 26 05:02:53 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Aug 2010 10:02:53 +0100
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
References: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
Message-ID:

On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina wrote:
>
> Admittedly i'm not up on the latest uses of GFF, but as far as I know, GFF
> is an annotation format only -
> it does not contain the actual sequence.
>
> Have you looked in your GFF file to see if there are nucleotides in there?
>
> Dave

Actually a GFF file can optionally include a FASTA format sequence at the end of the file, although it seems to be more common to just supply separate GFF and FASTA files and cross reference by ID.

Peter

From David.Messina at sbc.su.se Thu Aug 26 05:08:20 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 26 Aug 2010 11:08:20 +0200
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To:
References: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
Message-ID:

Aha, great, thanks for clarifying, Peter.

And if I bothered to look at the Bio::Tools::GFF documentation before answering :), I would have seen this:

http://doc.bioperl.org/bioperl-live/Bio/Tools/GFF.html#General

which describes how you can use $gffio->get_seqs() and related methods to pull out the sequence data.

Dave

On Aug 26, 2010, at 11:02, Peter wrote:

> On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina wrote:
>>
>> Admittedly i'm not up on the latest uses of GFF, but as far as I know, GFF
>> is an annotation format only - it does not contain the actual sequence.
>>
>> Have you looked in your GFF file to see if there are nucleotides in there?
>>
>> Dave
>
> Actually a GFF file can optionally include a FASTA format sequence
> at the end of the file, although it seems to be more common to just
> supply separate GFF and FASTA files and cross reference by ID.
>
> Peter

From David.Messina at sbc.su.se Thu Aug 26 05:18:25 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 26 Aug 2010 11:18:25 +0200
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To:
References: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
Message-ID: <984552CF-01F3-4D29-932F-DD030CCC1448@sbc.su.se>

So, just to finish the thought:

Kanmani,

Apologies for my sloppy and uninformed answer.
The following is only slightly less sloppy and uninformed, but may actually answer your question.

I think you need to call $gffio->get_seqs(), probably as

my @seq_objects = $gffio->get_seqs();

and then loop through those something like:

foreach my $seq_object (@seq_objects) {
    my $seq = $seq_object->seq();   # the sequence as a plain string
    # features hang off the sequence object, not off the string
    foreach my $feat ($seq_object->get_SeqFeatures) {
        # do your feature processing here
    }
}

Note that I haven't tested the above code.

Dave

From fs5 at sanger.ac.uk Thu Aug 26 05:19:44 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 26 Aug 2010 10:19:44 +0100
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To:
References:
Message-ID: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi Kanmani,

While GFF files may contain DNA sequence data, most of them don't, so you will have to use the location information you get from the GFF annotation file in conjunction with, e.g., a local FASTA database of the genomic sequence you are working with or an online resource.

Frank

On Thu, 2010-08-26 at 01:29 -0700, kanmani radha wrote:
> Hi All,
> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
> module. I could get everything else but not the DNA seq.
>
> Can anyone help me to find this out, Please. I appreciate your help very
> much.
> thanks,
> Kanmani
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
> use Bio::Tools::GFF;
>
> my $file = shift;
>
> my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
> $gffio->features_attached_to_seqs(1);
>
> while (my $feat = $gffio->next_feature()){
>     my $start = $feat->start;
>     my $end= $feat->end;
>     my $size = $end-$start+1;
>     my $strand = $feat->strand;
>     my $seqid = $feat->seq_id;
>     my $score = $feat->score;
>     my $frame = $feat->frame;
>     my $source = $feat->source_tag;
>     my $type = $feat->primary_tag;
>     my $gffstr = $gffio->gff_string($feat);
>     my @alltags = $feat->all_tags();
>     my @ID_tag_value = $feat->each_tag_value("ID");
>
>     my $seq = $feat->seq();
>     print "$seq\n";
>
>     if($type eq "gene"){ #
>         print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
>     }
> }

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From cjfields at illinois.edu Thu Aug 26 10:20:48 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 26 Aug 2010 09:20:48 -0500
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>

Kammani,

If you are using BioPerl, the best option currently available is to load a database with all relevant information (GFF and FASTA), then use that database for querying.
The most commonly-used ones now are Bio::DB::SeqFeature::Store and Bio::DB::GFF; the former is very GFF3-centric, but I believe it can handle GFF/GTF, and it has various database adaptors (MySQL, Pg, BDB, SQLite).

chris

On Aug 26, 2010, at 4:19 AM, Frank Schwach wrote:

> Hi Kammani,
>
> While GFF files may contain DNA sequence data, most of them don't, so
> you will have to use the location information you get from the GFF
> annotation file in conjunction with, e.g., a local FASTA database of the
> genomic sequence you are working with or an online resource.
>
> Frank
>
> On Thu, 2010-08-26 at 01:29 -0700, kanmani radha wrote:
>> Hi All,
>> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
>> module. I could get everything else but not the DNA seq.
>>
>> Can anyone help me to find this out, Please. I appreciate your help very
>> much.
>> thanks,
>> Kanmani
>>
>> #!/usr/bin/perl
>>
>> use strict;
>> use warnings;
>> use Bio::Tools::GFF;
>>
>> my $file = shift;
>>
>> my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
>> $gffio->features_attached_to_seqs(1);
>>
>> while (my $feat = $gffio->next_feature()){
>>     my $start = $feat->start;
>>     my $end= $feat->end;
>>     my $size = $end-$start+1;
>>     my $strand = $feat->strand;
>>     my $seqid = $feat->seq_id;
>>     my $score = $feat->score;
>>     my $frame = $feat->frame;
>>     my $source = $feat->source_tag;
>>     my $type = $feat->primary_tag;
>>     my $gffstr = $gffio->gff_string($feat);
>>     my @alltags = $feat->all_tags();
>>     my @ID_tag_value = $feat->each_tag_value("ID");
>>
>>     my $seq = $feat->seq();
>>     print "$seq\n";
>>
>>     if($type eq "gene"){ #
>>         print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
>>     }
>> }
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in
> England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.

From cjfields at illinois.edu Thu Aug 26 10:31:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 26 Aug 2010 09:31:59 -0500
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To:
References: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
Message-ID:

On Aug 26, 2010, at 4:02 AM, Peter wrote:

> On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina wrote:
>>
>> Admittedly i'm not up on the latest uses of GFF, but as far as I know, GFF
>> is an annotation format only - it does not contain the actual sequence.
>>
>> Have you looked in your GFF file to see if there are nucleotides in there?
>>
>> Dave
>
> Actually a GFF file can optionally include a FASTA format sequence
> at the end of the file, although it seems to be more common to just
> supply separate GFF and FASTA files and cross reference by ID.
>
> Peter

IIRC, optionally including FASTA sequence is specified only in the GFF3 spec; use of FASTA isn't explicitly mentioned in earlier versions. We only support it with earlier GFF due to convergence of the various GFF parsers. The original GFF spec proposed allowing sequence, but it's in the form of meta information and I have never seen it used in practice (as you mention, the FASTA is normally loaded separately).
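
For anyone who wants to see what the embedded-FASTA layout looks like in practice, here is a rough sketch in plain Perl. It deliberately uses no BioPerl calls, so it says nothing about how Bio::Tools::GFF is implemented. The helper names parse_gff3_with_fasta and feature_dna, the "demo" source column, and the toy chr1 record are all made up for illustration. It assumes the file has nine tab-separated columns per feature and that everything after a "##FASTA" line is FASTA.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): split GFF3 text into
# feature rows and the sequences found after the "##FASTA" directive.
sub parse_gff3_with_fasta {
    my ($text) = @_;
    my (@features, %seq_for, $in_fasta, $current_id);
    for my $line (split /\n/, $text) {
        if ($line =~ /^##FASTA/) { $in_fasta = 1; next; }
        if ($in_fasta) {
            if ($line =~ /^>(\S+)/) {
                # FASTA header: start a new sequence record
                $current_id = $1;
                $seq_for{$current_id} = '';
            }
            elsif (defined $current_id) {
                # sequence line: strip whitespace and append
                $line =~ s/\s+//g;
                $seq_for{$current_id} .= $line;
            }
            next;
        }
        next if $line =~ /^#/ || $line !~ /\S/;   # directives, comments, blanks
        my @col = split /\t/, $line;
        next unless @col == 9;                    # ignore malformed rows
        push @features, {
            seqid      => $col[0],
            type       => $col[2],
            start      => $col[3],
            end        => $col[4],
            strand     => $col[6],
            attributes => $col[8],
        };
    }
    return (\@features, \%seq_for);
}

# Hypothetical helper: cut out the bases of one feature. GFF coordinates
# are 1-based and inclusive; minus-strand features are reverse-complemented.
sub feature_dna {
    my ($feature, $seq_for) = @_;
    my $seq = $seq_for->{ $feature->{seqid} };
    return unless defined $seq;
    my $length = $feature->{end} - $feature->{start} + 1;
    my $dna = substr($seq, $feature->{start} - 1, $length);
    if ($feature->{strand} eq '-') {
        $dna = reverse $dna;
        $dna =~ tr/ACGTacgt/TGCAtgca/;
    }
    return $dna;
}

# Toy input: two exons on a 16-base sequence stored in the same file.
my $gff3 = join "\n",
    '##gff-version 3',
    join("\t", 'chr1', 'demo', 'exon', 3, 6,  '.', '+', '.', 'ID=exon1'),
    join("\t", 'chr1', 'demo', 'exon', 9, 12, '.', '-', '.', 'ID=exon2'),
    '##FASTA',
    '>chr1',
    'AACCGGTTAACC',
    'GGTT';

my ($features, $seq_for) = parse_gff3_with_fasta($gff3);
for my $feature (@$features) {
    my $dna = feature_dna($feature, $seq_for);
    print "$feature->{attributes}\t", (defined $dna ? $dna : 'NA'), "\n";
}
```

When the FASTA lives in a separate file, the same feature_dna step should still apply; only the code that fills the sequence hash would change.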
chris

From kanmaninradha at gmail.com Thu Aug 26 12:22:14 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 09:22:14 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
References: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk> <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
Message-ID:

Hi Everyone,

Thanks very much for this clarification. Thanks a ton to everyone who spared their time to educate me.

I see your points. Please correct me if I am wrong.

I understand that it's better to use Bio::DB::SeqFeature or Bio::DB::GFF to load the FASTA sequences (from a separate multi-FASTA file) and then Bio::Tools::GFF to parse the feature info from a GFF file. Then query the created database for the relevant GFF coordinates.

I will implement this.

Thanks once again.
Kanmani

On Thu, Aug 26, 2010 at 7:20 AM, Chris Fields wrote:

> Kammani,
>
> If you are using BioPerl, the best option currently available is to load a
> database with all relevant information (GFF and FASTA), then use that
> database for querying. The most commonly-used ones now are
> Bio::DB::SeqFeature::Store and Bio::DB::GFF; the former is very
> GFF3-centric, but I believe it can handle GFF/GTF, and it has various
> database adaptors (MySQL, Pg, BDB, SQLite).
>
> chris
>
> On Aug 26, 2010, at 4:19 AM, Frank Schwach wrote:
>
> > Hi Kammani,
> >
> > While GFF files may contain DNA sequence data, most of them don't, so
> > you will have to use the location information you get from the GFF
> > annotation file in conjunction with, e.g., a local FASTA database of the
> > genomic sequence you are working with or an online resource.
> >
> > Frank
> >
> > On Thu, 2010-08-26 at 01:29 -0700, kanmani radha wrote:
> >> Hi All,
> >> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
> >> module. I could get everything else but not the DNA seq.
> >>
> >> Can anyone help me to find this out, Please. I appreciate your help very
> >> much.
> >> thanks,
> >> Kanmani
> >>
> >> #!/usr/bin/perl
> >>
> >> use strict;
> >> use warnings;
> >> use Bio::Tools::GFF;
> >>
> >> my $file = shift;
> >>
> >> my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
> >> $gffio->features_attached_to_seqs(1);
> >>
> >> while (my $feat = $gffio->next_feature()){
> >>     my $start = $feat->start;
> >>     my $end= $feat->end;
> >>     my $size = $end-$start+1;
> >>     my $strand = $feat->strand;
> >>     my $seqid = $feat->seq_id;
> >>     my $score = $feat->score;
> >>     my $frame = $feat->frame;
> >>     my $source = $feat->source_tag;
> >>     my $type = $feat->primary_tag;
> >>     my $gffstr = $gffio->gff_string($feat);
> >>     my @alltags = $feat->all_tags();
> >>     my @ID_tag_value = $feat->each_tag_value("ID");
> >>
> >>     my $seq = $feat->seq();
> >>     print "$seq\n";
> >>
> >>     if($type eq "gene"){ #
> >>         print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
> >>     }
> >> }
From cjfields at illinois.edu Thu Aug 26 13:08:56 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 26 Aug 2010 12:08:56 -0500
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To:
References: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk> <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
Message-ID:

On Aug 26, 2010, at 11:22 AM, kanmani radha wrote:

> Hi Everyone,
>
> Thanks very much for this clarification. Thanks a ton for every one who
> spared their time to educate me.
>
> I see your points. Please correct me if I am wrong.
>
> I understand that, Its better to use use Bio::DB::SeqFeature or Bio::DB::GFF
> to load the fasta sequences (from a separate multifasta) file and
> then Bio::Tools::GFF to parse the feature info from a gff file . Then query
> the created database for the relevent GFF coordinates....
>
> I will implement this.
>
> Thanks once again.
> Kanmani

Yes, in general. I forgot to mention that you can have an in-memory database as well, but it's only suggested if you have a few thousand or so features and small sequences (I think bacterial chromosomes will work).

chris

From Havard.Aanes at nvh.no Wed Aug 25 11:47:12 2010
From: Havard.Aanes at nvh.no (Aanes Håvard)
Date: Wed, 25 Aug 2010 17:47:12 +0200
Subject: [Bioperl-l] bpfetch.pl
Message-ID: <897520BC3AAE754FA4E34E2FD26490A8021C61597B8D@A-EXMB1.veths.no>

Hi,

I am trying to obtain a set of mRNA sequences from a database, made by the bpindex script. I thought this should be a trivial task, but it appears not to be.
I get the sequences if I do one by one, like this:

perl scripts/index/bpfetch.pl -dir ./ zebrafish:NM_201192 zebrafish:NM_212708

But I need hundreds of sequences, so my plan was to put the RefSeq IDs in a file and use that as an argument (or whatever it is called in perl). That does not work:

haavaaan at login2 ~/download/src/bioperl-1.2.3 $ perl scripts/index/bpfetch.pl -dir ./ zebrafish:./some_seqs

You are running bpindex.pl without installing bioperl.
You have done it from bioperl/scripts, and so we can find
the necessary information but it is much better to install bioperl

Please read the README in the bioperl distribution
Sequence %id in Database zebrafish is not present

Any suggestions on how to do this? Alternative approaches are also appreciated. I have no experience in perl, just started using linux, and for the moment there is no time to learn perl, so I would really be grateful for any help to solve this specific task.

Best regards
Håvard Aanes (M.Sc.)
Ph.D. student
Section for biochemistry and physiology
The Norwegian School of Veterinary Science
Telephone: +47 22597358

From kanmaninradha at gmail.com Thu Aug 26 04:23:28 2010
From: kanmaninradha at gmail.com (kanmani)
Date: Thu, 26 Aug 2010 01:23:28 -0700 (PDT)
Subject: [Bioperl-l] Bio::Tools:GFF to get DNA sequences...
Message-ID: <9b7381d7-3596-4e60-a2ac-6c8c135d457d@s24g2000pri.googlegroups.com>

Hi

I am trying to get the DNA sequences for each exon feature. I have the following script. Everything works except getting sequences. Can some one correct me.....Thanks.
#!/usr/bin/perl

use strict;
use warnings;
use Bio::Tools::GFF;

my $file = shift;

my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
$gffio->features_attached_to_seqs(1);

while (my $feat = $gffio->next_feature()){
    my $start = $feat->start;
    my $end= $feat->end;
    my $size = $end-$start+1;
    my $strand = $feat->strand;
    my $seqid = $feat->seq_id;
    my $score = $feat->score;
    my $frame = $feat->frame;
    my $source = $feat->source_tag;
    my $type = $feat->primary_tag;
    my $gffstr = $gffio->gff_string($feat);
    my @alltags = $feat->all_tags();
    my @ID_tag_value = $feat->each_tag_value("ID");

    my $seq = $feat->seq();
    print "$seq\n";

    if($type eq "gene"){
        print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
    }
}

From kanmaninradha at gmail.com Thu Aug 26 17:24:40 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 14:24:40 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To:
References: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk> <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
Message-ID:

Hi Chris and others,

For a brief amount of time I could get away with using Bio::DB::Fasta to index FASTA files and Bio::Tools::GFF to iterate through GFF features. But I hit the wall again. It looks like sequential access to GFF features is not sufficient; I want random access to them. I see the only way to do that is by using Bio::DB::GFF as suggested by Chris.

Here is my question: is there any tutorial on configuring BioPerl, or this module in particular, to work with MySQL/Postgres? I would really appreciate it.

And thanks for all your help.
Kanmani

On Thu, Aug 26, 2010 at 10:08 AM, Chris Fields wrote:

> On Aug 26, 2010, at 11:22 AM, kanmani radha wrote:
>
> > Hi Everyone,
> >
> > Thanks very much for this clarification. Thanks a ton for every one who
> > spared their time to educate me.
> >
> > I see your points. Please correct me if I am wrong.
> > > > I understand that, Its better to use use Bio::DB::SeqFeature or > Bio::DB::GFF > > to load the fasta sequences (from a separate multifasta) file and > > then Bio::Tools::GFF to parse the feature info from a gff file . Then > query > > the created database for the relevent GFF coordinates.... > > > > I will implement this. > > > > Thanks once again. > > Kanmani > > Yes, in general. I forgot to mention that you can have an in-memory > database as well, but it's only suggested if you have a few thousand or so > features and small sequences (I think bacterial chromosomes will work). > > chris From kanmaninradha at gmail.com Thu Aug 26 18:04:20 2010 From: kanmaninradha at gmail.com (kanmani radha) Date: Thu, 26 Aug 2010 15:04:20 -0700 Subject: [Bioperl-l] getting DNA sequence for exon features from GFF In-Reply-To: References: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk> <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu> Message-ID: HI, I made some progress since then.... - Installing Bio::DB::DBI::mysql needed Biosql. - Downloaded and installed biosql follow the instruction as given in their INSTALL file - Created biosql db in my mysql server - loaded schema using script from biosql - installed DBI - Now, I have problem with DBD::mysql. That reminds me couple years back i had to struggle installing this driver on another machine. I thought i ask around this time. It fails with a bunch of error messages.....the first of it being.... dbdimp.h:22:49 error: mysql.h no such filer or directory But, My mysql installation has header file in "/usr/include/mysql3/mysql/mysql.h". Can anyone suggest how to move forward from that..... thanks, Kanmani On Thu, Aug 26, 2010 at 2:24 PM, kanmani radha wrote: > Hi Chris and others, > > For a brief amount time i could get away using Bio::DB::Fasta to index > fasta files and Bio::Tools::GFF to iterate thru GFF features. But, i hit the > wall again. 
Looks like sequential access of GFF featuers is not sufficient, > I want to have a random access to it. I see the only way to do that is by > using Bio::DB::GFF as suggested by Chris. > > Here is my question. Is there any tutorial to configure Bioperl or this > module in particular to work with MySQL/postgres. I will really appreciate > it. > > And thanks for all your help. > Kanmani > > > On Thu, Aug 26, 2010 at 10:08 AM, Chris Fields wrote: > >> On Aug 26, 2010, at 11:22 AM, kanmani radha wrote: >> >> > Hi Everyone, >> > >> > Thanks very much for this clarification. Thanks a ton for every one who >> > spared their time to educate me. >> > >> > I see your points. Please correct me if I am wrong. >> > >> > I understand that, Its better to use use Bio::DB::SeqFeature or >> Bio::DB::GFF >> > to load the fasta sequences (from a separate multifasta) file and >> > then Bio::Tools::GFF to parse the feature info from a gff file . Then >> query >> > the created database for the relevent GFF coordinates.... >> > >> > I will implement this. >> > >> > Thanks once again. >> > Kanmani >> >> Yes, in general. I forgot to mention that you can have an in-memory >> database as well, but it's only suggested if you have a few thousand or so >> features and small sequences (I think bacterial chromosomes will work). >> >> chris > > > From rafalucas.unicamp at gmail.com Thu Aug 26 18:11:07 2010 From: rafalucas.unicamp at gmail.com (Rafael Lucas) Date: Thu, 26 Aug 2010 19:11:07 -0300 Subject: [Bioperl-l] Help in algorithm Bio::Structure::IO::pdb Message-ID: Hi folks, How are you? I'm from Brazil and I was making an algorithm that Cryptographyc a data and then print the result in a pdb file. So I have a .fasta file and want to pass this file to .pdb file, if I use a program, like PyMol, it will take so much time, so I wanna use the Bio::Structure::IO::pdb to accelerate this process, could you help me in this problem? 
Thank you, Rafael Lucas Faculdade de Tecnologia em Analise e Desenvolvimento de Sistemas FT - UNICAMP +55 (19)9614-0533 From J.Christopher.Ellis at duke.edu Thu Aug 26 22:06:30 2010 From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis) Date: Thu, 26 Aug 2010 22:06:30 -0400 Subject: [Bioperl-l] standaloneblastplus blastn crash Message-ID: <55861.1282874790@duke.edu> When I run the standaloneblastplus I get the following error... ------------- EXCEPTION ------------- MSG: C:Program FilesNCBIblast-2.2.24+binblastn.exe call crashed: There was a problem running C:Program FilesNCBIblast-2.2.24+binblastn.exe :? at C:/Perl64/lib/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1001. STACK Bio::Tools::Run::WrapperBase::_run C:/Perl64/lib/Bio/Tools/Run/WrapperBase/CommandExts.pm:1006 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus.pm:1303 STACK Bio::Tools::Run::StandAloneBlastPlus::run C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:270 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus.pm:1301 STACK toplevel localBlast.pl:9 ------------------------------------- I have a sneaky suspicion that it is an easy fix but for the life of me I can not figure it out! 
:) Thanks in advance, Chris From indraniel at gmail.com Thu Aug 26 21:57:54 2010 From: indraniel at gmail.com (Indraniel) Date: Fri, 27 Aug 2010 01:57:54 +0000 (UTC) Subject: [Bioperl-l] How to convert SFF into Fastq References: Message-ID: A fourth option is the following tool, sff2fastq (written in C), described here: http://indraniel.wordpress.com/2010/04/23/sff2fastq/ and http://github.com/indraniel/sff2fastq Indraniel From David.Messina at sbc.su.se Fri Aug 27 03:41:21 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 27 Aug 2010 09:41:21 +0200 Subject: [Bioperl-l] [RFC] Interolog::Walk In-Reply-To: <4C6D0B50.4050902@sms.ed.ac.uk> References: <4C6BF4BD.5010200@sms.ed.ac.uk> <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se> <4C6D0B50.4050902@sms.ed.ac.uk> Message-ID: Hi Giuseppe, On Aug 19, 2010, at 12:45, Giuseppe Gallone wrote: > Bio::Orthology::InterologMap > Bio::Orthology::Interolog::Map, > just in case somebody else finds other interesting applications for the Interolog concept and would like to "plug in" their own contribution. Would this make any sense? Absolutely. I think either of the above is a good option, and I agree that the second is a little more flexible. Your POD looks great! Way better than most. Having seen the whole thing now, I think your description is fine as is. And if you have another tutorial and example scripts on top of it, that would really be terrific, above and beyond what most people would expect. So, time to unleash it on the world! 
:)

Dave

From David.Messina at sbc.su.se Fri Aug 27 03:58:12 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 27 Aug 2010 09:58:12 +0200
Subject: [Bioperl-l] standaloneblastplus blastn crash
In-Reply-To: <55861.1282874790@duke.edu>
References: <55861.1282874790@duke.edu>
Message-ID: <9275A540-AE42-47B0-BA73-A906964C451B@sbc.su.se>

Hi Chris,

If you look at the error message, it says what the problem is: it's trying to call the blastn executable with no spaces in the path name.

> MSG: C:Program FilesNCBIblast-2.2.24+binblastn.exe call crashed: There
> was a problem running C:Program FilesNCBIblast-2.2.24+binblastn.exe

Now, that could be a problem in BioPerl or it could be a problem in your code. It's hard to diagnose where the problem lies without your code, so please post your code.

Dave

From G.Gallone at sms.ed.ac.uk Fri Aug 27 07:07:57 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Fri, 27 Aug 2010 12:07:57 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To:
References: <4C6BF4BD.5010200@sms.ed.ac.uk> <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se> <4C6D0B50.4050902@sms.ed.ac.uk>
Message-ID: <4C779C8D.1090007@sms.ed.ac.uk>

Hi Dave,

thank you very much for your feedback :) . I will register the namespace right now.

I think I will use 'homology' as the second level name though, because I plan to extend the module to work with paralogues as well.

As for the category, which one of the following do you reckon will fit a Bio:: package better?

http://www.cpan.org/modules/by-category/

Regards
Giuseppe

On 27/08/10 08:41, Dave Messina wrote:
> Hi Giuseppe,
>
>
> On Aug 19, 2010, at 12:45, Giuseppe Gallone wrote:
>> Bio::Orthology::InterologMap
>> Bio::Orthology::Interolog::Map,
>
>> just in case somebody else finds other interesting applications for the Interolog concept and would like to "plug in" their own contribution. Would this make any sense?
>
> Absolutely.
I think either of the above is a good option, and I agree that the second is a little more flexible. > > Your POD looks great! Way better than most. Having seen the whole thing now, I think your description is fine as is. And if you have another tutorial and example scripts on top of it, that would really be terrific, above and beyond what most people would expect. > > So, time to unleash it on the world! :) > > > Dave > > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From David.Messina at sbc.su.se Fri Aug 27 07:25:06 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 27 Aug 2010 13:25:06 +0200 Subject: [Bioperl-l] [RFC] Interolog::Walk In-Reply-To: <4C779C8D.1090007@sms.ed.ac.uk> References: <4C6BF4BD.5010200@sms.ed.ac.uk> <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se> <4C6D0B50.4050902@sms.ed.ac.uk> <4C779C8D.1090007@sms.ed.ac.uk> Message-ID: <80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se> Hi Giuseppe, > I think I will use 'homology' as the second level name though, because I plan to extend the module to work with paralogues as well. Sounds good. > As for the category, which one of the following you reckon it will fit a Bio:: package better > > http://www.cpan.org/modules/by-category/ Bio:: is in 23 - miscellaneous modules, so probably keeping with that makes sense. I don't know much about that stuff, though. Chris F. or other CPAN cognoscenti care to comment? 

Dave

From cjfields at illinois.edu Fri Aug 27 09:26:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 27 Aug 2010 08:26:51 -0500
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se>
References: <4C6BF4BD.5010200@sms.ed.ac.uk> <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se> <4C6D0B50.4050902@sms.ed.ac.uk> <4C779C8D.1090007@sms.ed.ac.uk> <80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se>
Message-ID: <88BB7813-E892-4BEC-9C49-5FD22325BBF7@illinois.edu>

On Aug 27, 2010, at 6:25 AM, Dave Messina wrote:

> Hi Giuseppe,
>
>
>> I think I will use 'homology' as the second level name though, because I plan to extend the module to work with paralogues as well.
>
> Sounds good.
>
>
>> As for the category, which one of the following you reckon it will fit a Bio:: package better
>>
>> http://www.cpan.org/modules/by-category/
>
>
> Bio:: is in 23 - miscellaneous modules, so probably keeping with that makes sense.
>
> I don't know much about that stuff, though. Chris F. or other CPAN cognoscenti care to comment?
>
>
> Dave

That's probably the best spot, as we cover a fairly broad range (mainly due to core monolithic structure). Though it's terribly non-descript, sort of the junk drawer of CPAN.

chris

From adamkennedybackup at gmail.com Sun Aug 29 07:35:50 2010
From: adamkennedybackup at gmail.com (Adam Kennedy)
Date: Sun, 29 Aug 2010 21:35:50 +1000
Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl 5.12.1?
In-Reply-To: <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com>
References: <78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com> <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com>
Message-ID:

http://strawberryperl.com/download/professional/strawberry-perl-professional-5.10.1.3-alpha-2.msi

You get BioPerl installed out the box.

Adam K

On 20 August 2010 03:20, Christopher Fields wrote:
> cc'ing list. Looks like the BioPerl PPM is possibly broken for perl 5.12. Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling...
>
> chris
>
> On Aug 19, 2010, at 11:29 AM, han sun wrote:
>
>> v5.10 works,thanks.
>>
>> 2010/8/19 Christopher Fields
>> Try using ActivePerl 5.10 instead of v5.12. It's very possible the PPM won't work for v5.12 yet.
>>
>> chris
>>
>> On Aug 19, 2010, at 9:25 AM, han sun wrote:
>>
>> > Hello everyone,
>> >
>> > I have used perl for several months,and I now want to feel the power of
>> > bioperl.
>> > But it seems that the installing is more difficult than I thought.
>> >
>> > I typed the commands.
>> >
>> >
>> >
>> > install-shell
>> >
>> >
>> > rep add bioperl http://bioperl.org/DIST
>> >
>> >
>> > rep add uwinnipeg
>> > http://cpan.uwinnipeg.ca/PPMPackages/12xx/
>> >
>> >
>> > rep add trouchelle http://trouchelle.com/ppm12/
>> >
>> > install BioPerl
>> >
>> > However,the installing failed,
>> >
>> > ppm install failed:
>> > Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
>> > Can't find any package that provides PostScript::TextBlock for
>> > Bundle-BioPerl-Core
>> > Can't find any package that provides Ace:: for Bundle-BioPerl-Core
>> > Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
>> > Can't find any package that provides Convert::Binary::C for
>> > Bundle-BioPerl-Core
>> > Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
>> > Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
>> > Can't find any package that provides IPC::Run for GraphViz
>> > Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
>> > Can't find any package that provides List-MoreUtils for Moose
>> > Can't find any package that provides List-MoreUtils for Class-MOP
>> >
>> >
>> > then I tried
>> >
>> > install http://www.bribes.org/perl/ppm/GD.ppd
>> >
>> > and tried the installation again,but it still didn't help.
>> > >> > * >> > * >> > * >> > * >> > * >> > * >> > >> > >> > *Do you konw what's wrong with the problem?* >> > * >> > * >> > * >> > * >> > *Please help me,thanks very much.* >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields1 at gmail.com Sun Aug 29 11:58:50 2010 From: cjfields1 at gmail.com (Christopher Fields) Date: Sun, 29 Aug 2010 10:58:50 -0500 Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl 5.12.1? In-Reply-To: References: <78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com> <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com> Message-ID: Yes, and I am thinking of pointing more and more users that direction instead. Can't say maintaining PPM packages with ever-fluctuating specs is easy when I don't work with Windows anymore. chris On Aug 29, 2010, at 6:35 AM, Adam Kennedy wrote: > http://strawberryperl.com/download/professional/strawberry-perl-professional-5.10.1.3-alpha-2.msi > > You get BioPerl installed out the box. > > Adam K > > On 20 August 2010 03:20, Christopher Fields wrote: >> cc'ing list. Looks like the BioPerl PPM is possibly broken for perl 5.12. Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling... >> >> chris >> >> On Aug 19, 2010, at 11:29 AM, han sun wrote: >> >>> v5.10 works,thanks. >>> >>> 2010/8/19 Christopher Fields >>> Try using ActivePerl 5.10 instead of v5.12. It's very possible the PPM won't work for v5.12 yet. >>> >>> chris >>> >>> On Aug 19, 2010, at 9:25 AM, han sun wrote: >>> >>>> Hello everyone, >>>> >>>> I have used perl for several months,and I now want to feel the power of >>>> bioperl. >>>> But it seems that the installing is more difficult than I thought. 
>>>> >>>> I typed the commands. >>>> >>>> >>>> >>>> install-shell >>>> >>>> >>>> rep add bioperl http://bioperl.org/DIST >>>> >>>> >>>> rep add uwinnipeg >>>> http://cpan.uwinnipeg.ca/PPMPackages/12xx/ >>>> >>>> >>>> rep add trouchelle http://trouchelle.com/ppm12/ >>>> >>>> install BioPerl >>>> >>>> However,the installing failed, >>>> >>>> ppm install failed: >>>> Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core >>>> Can't find any package that provides PostScript::TextBlock for >>>> Bundle-BioPerl-Core >>>> Can't find any package that provides Ace:: for Bundle-BioPerl-Core >>>> Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core >>>> Can't find any package that provides Convert::Binary::C for >>>> Bundle-BioPerl-Core >>>> Can't find any package that provides XML::Twig for Bundle-BioPerl-Core >>>> Can't find any package that provides DB_File:: for Bundle-BioPerl-Core >>>> Can't find any package that provides IPC::Run for GraphViz >>>> Can't find any package that provides XML-XPathEngine for XML-DOM-XPath >>>> Can't find any package that provides List-MoreUtils for Moose >>>> Can't find any package that provides List-MoreUtils for Class-MOP >>>> >>>> >>>> then I tried >>>> >>>> install http://www.bribes.org/perl/ppm/GD.ppd >>>> >>>> and tried the installation again,but it still didn't help. 
>>>> >>>> * >>>> * >>>> * >>>> * >>>> * >>>> * >>>> >>>> >>>> *Do you konw what's wrong with the problem?* >>>> * >>>> * >>>> * >>>> * >>>> *Please help me,thanks very much.* >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From odclerck at gmail.com Fri Aug 27 03:44:14 2010 From: odclerck at gmail.com (odclerck) Date: Fri, 27 Aug 2010 00:44:14 -0700 (PDT) Subject: [Bioperl-l] fasta header replace Message-ID: <29550202.post@talk.nabble.com> Hi, Was wondering if someone had an easy script available that converts the headers of a fasta sequences to a value stored in a separate text file. Macrogen produces files with sequences that look more or less like this: >100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1 1012, 1000 bases, 0 checksum. I can filter out the position on the plate e.g. "A1" easily but would like to replace this with the name of the strain stored in a different text file, e.g. "A1_D1222". Realize this sounds pretty basic to most of you, but I'm pretty new at scripting. Olivier -- View this message in context: http://old.nabble.com/fasta-header-replace-tp29550202p29550202.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From J.Christopher.Ellis at duke.edu Mon Aug 30 08:55:04 2010 From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis) Date: Mon, 30 Aug 2010 08:55:04 -0400 Subject: [Bioperl-l] Taxonomy DB problem Message-ID: <51468.1283172904@duke.edu> Hi All, I am trying to extract the entire taxonomy of an organism including the classifications. 
Some thing like...

Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia

I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error.

My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?

Thanks for all your help in advance!

Chris

From rafalucas.unicamp at gmail.com Mon Aug 30 09:24:11 2010
From: rafalucas.unicamp at gmail.com (Rafael Lucas)
Date: Mon, 30 Aug 2010 10:24:11 -0300
Subject: [Bioperl-l] help in algorithm Bio::Structure::IO::pdb
Message-ID:

Hi folks,

How are you? I'm from Brazil and I was making an algorithm that Cryptographyc a data and then print the result in a pdb file. So I have a .fasta file and want to pass this file to .pdb file, if I use a program, like PyMol, it will take so much time, so I wanna use the Bio::Structure::IO::pdb to accelerate this process, could you help me in this problem?

Thank you,

Rafael Lucas
Faculdade de Tecnologia em Analise e Desenvolvimento de Sistemas
FT - UNICAMP
+55 (19)9614-0533

From cjfields at illinois.edu Mon Aug 30 09:36:41 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 30 Aug 2010 08:36:41 -0500
Subject: [Bioperl-l] Taxonomy DB problem
In-Reply-To: <51468.1283172904@duke.edu>
References: <51468.1283172904@duke.edu>
Message-ID:

Chris,

Regarding a fix for that script, we would have to see your modified script and the error. However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy.

chris

On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:

> Hi All,
>
> I am trying to extract the entire taxonomy of an organism including the
> classifications. Some thing like...
>
> Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
>
> I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error.
>
> My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?
>
> Thanks for all your help in advance!
>
> Chris
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From fs5 at sanger.ac.uk Mon Aug 30 11:11:06 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Mon, 30 Aug 2010 16:11:06 +0100
Subject: [Bioperl-l] fasta header replace
In-Reply-To: <29550202.post@talk.nabble.com>
References: <29550202.post@talk.nabble.com>
Message-ID: <4C7BCA0A.70503@sanger.ac.uk>

Hi Olivier,

Do you know how to read a file and build a hash from the contents? This is what you will need to do, e.g. if your file is

A1 Strain_A
A2 Strain_A
A3 Strain_B

then you can do something like:

    # open the mapping file for reading
    open (INFILE, '<', $infile_path) or die "Cannot open $infile_path: $!";
    my %well2strain;
    while (<INFILE>){
        # first column: well position, second column: strain name
        my ($well, $strain) = ($_=~/^([A-Z]\d+)\s+(\w+)/);
        $well2strain{$well}=$strain;
    }
    close INFILE;

You can then use the values of the hash to set the sequence ID as you parse the FASTA file. The BioPerl SeqIO howto gives details about how to read and write the FASTA file (http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples). You can change the id of a sequence object with

    $some_seq_object->id( 'my new ID');

See http://doc.bioperl.org/releases/bioperl-1.0/Bio/Seq.html for details.

Hope that helps to get you started.
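
For what it's worth, the two steps above (build the hash, then rewrite the headers) can be sketched end to end in plain Perl without BioPerl. This is only a sketch under assumptions that are not stated in the thread: the mapping file is taken to be two whitespace-separated columns (well, strain), the headers are taken to end in _<well>_<primer>.ab1 as in the Macrogen example, and the sub names read_well_map and rename_header are made up for illustration. In-memory filehandles stand in for the real files so the example runs as is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a well -> strain lookup from lines such as "A1 D1222".
# (Assumed layout: two whitespace-separated columns.)
sub read_well_map {
    my ($fh) = @_;
    my %well2strain;
    while (my $line = <$fh>) {
        # Skip anything that does not look like "<well> <strain>"
        next unless $line =~ /^([A-Z]\d+)\s+(\S+)/;
        $well2strain{$1} = $2;
    }
    return \%well2strain;
}

# Rewrite one FASTA header line; every other line is returned unchanged.
sub rename_header {
    my ($line, $well2strain) = @_;
    # Assumed header layout: >..._<well>_<primer>.ab1 <rest of line>
    if ($line =~ /^>\S*_([A-Z]\d+)_[^_\s]+\.ab1/) {
        my $well = $1;
        return ">${well}_$well2strain->{$well}\n"
            if exists $well2strain->{$well};
    }
    return $line;
}

# Demo data held in strings; for real data, pass file names to open()
# instead of the scalar references.
my $map_text   = "A1 D1222\nA2 D1300\n";
my $fasta_text = ">100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1 1012, 1000 bases, 0 checksum.\n"
               . "ACGTACGT\n";

open my $map_fh, '<', \$map_text or die "Cannot read mapping: $!";
my $map = read_well_map($map_fh);
close $map_fh;

open my $in, '<', \$fasta_text or die "Cannot read FASTA: $!";
while (my $line = <$in>) {
    print rename_header($line, $map);
}
close $in;
```

Headers whose well is missing from the mapping are passed through untouched rather than dropped, so nothing is silently lost; whether that is the right choice depends on the data.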
Frank odclerck wrote: > Hi, > Was wondering if someone had an easy script available that converts the > headers of a fasta sequences to a value stored in a separate text file. > > Macrogen produces files with sequences that look more or less like this: > >> 100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1 1012, 1000 bases, 0 checksum. >> > > I can filter out the position on the plate e.g. "A1" easily but would like > to replace this with the name of the strain stored in a different text file, > e.g. "A1_D1222". > > Realize this sounds pretty basic to most of you, but I'm pretty new at > scripting. > Olivier > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From jessica.sun at gmail.com Mon Aug 30 11:51:39 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Mon, 30 Aug 2010 11:51:39 -0400 Subject: [Bioperl-l] Git for the lazy In-Reply-To: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se> References: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se> Message-ID: I want to add sequence features with tags and tag values, I want to have them in my order, however somehow it seems it is in default alphabetically orders of the tags, does any one knows how to fix? thanks a lot in advance. From G.Gallone at sms.ed.ac.uk Tue Aug 31 07:52:57 2010 From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone) Date: Tue, 31 Aug 2010 12:52:57 +0100 Subject: [Bioperl-l] New CPAN Release - Bio::Homology::InterologWalk - A Perl Module to retrieve putative PPIs through Interologs Message-ID: <4C7CED19.80802@sms.ed.ac.uk> Dear Bioperl users, I would like to announce the release of Bio::Homology::InterologWalk, a module that retrieves, scores and visualizes putative Protein-Protein Interactions through the orthology-walk method. 
The project is available from the following link http://search.cpan.org/~ggallone/ and a description of the idea behind it is here http://search.cpan.org/~ggallone/Bio-Homology-InterologWalk-0.02/lib/Bio/Homology/InterologWalk.pm#DESCRIPTION The project is in a very early stage (currently ver. 0.02 alpha) and has currently been tested only on Linux environments. It has not been tested on Macs, but it should work fine, and I would appreciate any feedback from Mac users who try it. *Any* form of feedback will be extremely appreciated (bug, typos, syntactical errors, verbal abuse etc :) ). Best, Giuseppe -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From cjfields at illinois.edu Tue Aug 31 11:01:59 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 31 Aug 2010 10:01:59 -0500 Subject: [Bioperl-l] Taxonomy DB problem In-Reply-To: <56973.1283255847@duke.edu> References: <56973.1283255847@duke.edu> Message-ID: <7167CA86-857E-4E16-A3D6-BA45045CF892@illinois.edu> Yes, I see that one. It may be the ID hash that is being returned is empty. I'll look into it. -c On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote: > Hi Chris, > > The error is... > > "Use of uninitialized value $id in join or string at C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363." > > The script from http://bioperl.org/wiki/Species_names_from_accession_numbers is as follows.... 
>
> use Bio::DB::EUtilities;
>
> my (%taxa, @taxa);
> my (%names, %idmap);
>
> # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
> # (probably)
>
> my @ids = qw(1621261 89318838 68536103 20807972 730439);
>
> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>                                        -db => 'taxonomy',
>                                        -dbfrom => 'protein',
>                                        -correspondence => 1,
>                                        -id => \@ids);
>
> # iterate through the LinkSet objects
> while (my $ds = $factory->next_LinkSet) {
>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> }
>
> @taxa = @taxa{@ids};
>
> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>                                     -db => 'taxonomy',
>                                     -id => \@taxa );
>
> while (local $_ = $factory->next_DocSum) {
>     $names{($_->get_contents_by_name('TaxId'))[0]} =
>         ($_->get_contents_by_name('ScientificName'))[0];
> }
>
> foreach (@ids) {
>     $idmap{$_} = $names{$taxa{$_}};
> }
>
> # %idmap is
> # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> # 68536103 => 'Corynebacterium jeikeium K411'
> # 730439 => 'Bacillus caldolyticus'
> # 89318838 => undef (this record has been removed from the db)
>
> 1;
>
> Thanks,
>
> Chris
>
> On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
> Chris,
>
> Regarding a fix for that script, we would have to see your modified script and the error. However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy.
>
> chris
>
> On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:
>
> > Hi All,
> >
> > I am trying to extract the entire taxonomy of an organism including the
> > classifications. Some thing like...
> >
> > Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
> >
> > I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error.
> >
> > My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?
> >
> > Thanks for all your help in advance!
> >
> > Chris
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >

From J.Christopher.Ellis at duke.edu Tue Aug 31 07:57:27 2010
From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis)
Date: Tue, 31 Aug 2010 07:57:27 -0400
Subject: [Bioperl-l] Taxonomy DB problem
Message-ID: <56973.1283255847@duke.edu>

Hi Chris,

The error is...

"Use of uninitialized value $id in join or string at C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363."

The script from http://bioperl.org/wiki/Species_names_from_accession_numbers is as follows....

use Bio::DB::EUtilities;

my (%taxa, @taxa);
my (%names, %idmap);

# these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
# (probably)

my @ids = qw(1621261 89318838 68536103 20807972 730439);

my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
                                       -db => 'taxonomy',
                                       -dbfrom => 'protein',
                                       -correspondence => 1,
                                       -id => \@ids);

# iterate through the LinkSet objects
while (my $ds = $factory->next_LinkSet) {
    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
}

@taxa = @taxa{@ids};

$factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
                                    -db => 'taxonomy',
                                    -id => \@taxa );

while (local $_ = $factory->next_DocSum) {
    $names{($_->get_contents_by_name('TaxId'))[0]} =
        ($_->get_contents_by_name('ScientificName'))[0];
}

foreach (@ids) {
    $idmap{$_} = $names{$taxa{$_}};
}

# %idmap is
# 1621261 => 'Mycobacterium tuberculosis H37Rv'
# 20807972 => 'Thermoanaerobacter tengcongensis MB4'
# 68536103 => 'Corynebacterium jeikeium K411'
# 730439 => 'Bacillus caldolyticus'
# 89318838 => undef (this record has been removed from the db)

1;

Thanks,

Chris

On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
Chris,

Regarding a fix for that script, we would have to see your modified script and the error. However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy.

chris

On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:

> Hi All,
>
> I am trying to extract the entire taxonomy of an organism including the
> classifications. Some thing like...
>
> Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
>
> I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error.
>
> My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?
>
> Thanks for all your help in advance!
> > Chris > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Sun Aug 1 19:17:14 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Sun, 01 Aug 2010 12:17:14 -0700 Subject: [Bioperl-l] GMOD Evo Hackathon Open Call for Participation Message-ID: <4C55C83A.3060700@cornell.edu> We are seeking participants for the GMOD Tools for Evolutionary Biology Hackathon, held November 8-12, 2010 at the US National Evolutionary Synthesis Center (NESCent) in Durham, NC. This hackathon targets three critical gaps in the capabilities of the GMOD toolbox that currently limit its utility for evolutionary research: 1. Visualization of comparative genomics data 2. Visualization of phylogenetic data and trees 3. Support for population diversity and phenotype data If you are interested in these areas and have relevant expertise, you are strongly encouraged to apply. Relevant areas of expertise include more than just software development: if you are a GMOD power user, visualization guru, domain expert (comparative, phylogenetics, population, ...), or documentation wizard, then your skills are needed! How To Apply: Fill out the online application form at http://bit.ly/gmodevohack. Applications are due August 25. About GMOD: GMOD is an intercompatible suite of open-source software components for storing, managing, analyzing, and visualizing genome-scale data. GMOD includes many widely-used software components: GBrowse and JBrowse, both genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a generic and modular database schema; CMap, a comparative map viewer; as well as many other components including Apollo, MAKER, BioMart, InterMine, and Galaxy. We hope to extend the functionality of existing GMOD components, and integrate new components as well. 
About Hackathons: A hackathon is an intense event at which a group of programmers with different backgrounds and skills collaborate hands-on and face-to-face to develop working code that is of utility to the community as a whole. The mix of people will include domain experts and computer-savvy end-users. More details about the event, its motivation, organization, procedures, and attendees, as well as URLs to the hackathon and related websites are included below. Sincerely, The GMOD EvoHack Organizing Committee (and project affiliations as relevant): Nicole Washington, Chair (LBNL, modENCODE, Phenote) Robert Buels (SGN, Chado NatDiv) Scott Cain (OICR, GMOD) Dave Clements (NESCent, GMOD) Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv) Sheldon McKay (University of Arizona, iPlant, GBrowse_syn) ----------------------------- About the GMOD Evo Hackathon Overview We are organizing a hackathon to fill critical gaps in the capabilities of the Generic Model Organism Database (GMOD) toolbox that currently limit its utility for evolutionary research. Specifically, we will focus on tools for 1) viewing comparative genomics data; 2) visualizing phylogenomic data; and 3) supporting population diversity data and phenotype annotation. The event will be hosted at NESCent and bring together a group of about 20+ software developers, end-user representatives, and documentation experts who would otherwise not meet. The participants will include key developers of GMOD components that currently lack features critical for emerging evolutionary biology research, developers of informatics tools in evolutionary research that lack GMOD integration, and informatics-savvy biologists who can represent end-user requirements. 
The event will provide a unique opportunity to infuse the GMOD developer community with a heightened awareness of unmet needs in evolutionary biology that GMOD components have the potential to fill, and for tool developers in evolutionary biology to better understand how best to extend or integrate with already existing GMOD components. Before the Event Discussion of ideas and sometimes even design actually starts well before the hackathon, on mailing lists, wiki pages, and conference calls set up among accepted attendees. This advance work lays the foundation for participants to be productive from the very first day. This also means that participants should be willing to contribute some time in advance of the hackathon itself to participate in this preparatory discussion. During the Event Typically, hackathon participants use the morning of the first day of the event to organize themselves into working groups of between 3 and 6 people, each with a focused implementation objective. Ideas and objectives are discussed, and attendees coalesce around the projects in which they have the most experience or interest. Deliverables / Event Results The meeting's attendance, working groups, and outcomes will be fully logged and documented on the GMOD wiki (http://gmod.org). Each working group during the event will typically have its own wiki page, linked from the main EvoHack page, where it documents its minutes and design notes, and provides links to the code and documentation it produces. Also, since GMOD and NESCent are both committed to open source principles, all code and documentation produced by participants during the hackathon must be published under an OSI-approved open source license. As contributions to existing GMOD tools, all hackathon products will most likely satisfy this requirement automatically. 
NESCent This event is sponsored by the US National Evolutionary Synthesis Center (NESCent, http://www.nescent.org) through its Informatics Whitepapers program (http://www.nescent.org/informatics/whitepapers.php). NESCent promotes the synthesis of information, concepts and knowledge to address significant, emerging, or novel questions in evolutionary science and its applications. NESCent achieves this by supporting research and education across disciplinary, institutional, geographic, and demographic boundaries (see http://www.nescent.org/science/proposals.php). Links Main GMOD EvoHack page, and full proposal: http://gmod.org/wiki/GMOD_Evo_Hackathon NESCent: http://www.nescent.org/ GMOD: http://gmod.org Similar past NESCent events, see: http://hackathon.nescent.org/ GMOD hackathon application: http://bit.ly/gmodevohack -- http://gmod.org/wiki/GMOD_News http://gmod.org/wiki/GMOD_Europe_2010 http://gmod.org/wiki/Help_Desk_Feedback From maj at fortinbras.us Sun Aug 1 23:19:16 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 1 Aug 2010 19:19:16 -0400 Subject: [Bioperl-l] SOAP Eutilities In-Reply-To: References: Message-ID: <627BEC8B2E624A69A0B11EEBC8C93B71@NewLife> Turns out that module lives in bioperl-run; try git clone git://github.com/bioperl/bioperl-run.git MAJ ----- Original Message ----- From: "Robson de Souza" To: Sent: Saturday, July 31, 2010 4:56 PM Subject: [Bioperl-l] SOAP Eutilities > Hi, > > Bio::DB::SoapEUtilities, referred in the HOWTO on EUtilities, seems to > have disappeared from the Git repository. > A simple > > git clone git://github.com/bioperl/bioperl-live.git > > does not download it. Any ideas why? 
> Robson > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From David.Messina at sbc.su.se Mon Aug 2 13:58:10 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 2 Aug 2010 15:58:10 +0200 Subject: [Bioperl-l] phyloxml and element order In-Reply-To: References: Message-ID: Hi Fred, Thanks for letting us know about this - definitely sounds like a bug. Would you please submit this to our bug tracker? http://bugzilla.open-bio.org (You can just copy and paste your previous email.) Dave On Jul 30, 2010, at 06:59, Frédéric Romagné wrote: > Hi, > > I'm using bioperl to create phyloxml trees, after few tentatives, i got my > tree with all the element/attributes i want but when I write the tree, > element are not written following the order specified in the XSD Schema. > > For example, i got : > > > > Loxosceles intermedia > > Araneomorphae Sicariidae > > > 969 > HAAERADSRKPIWDIAHMVNDLELVD > > > > Araneomorphae Sicariidae > > > > The program forester complains that should be written before the > element. > > According to > http://phyloxml.wordpress.com/2009/11/25/order-of-elements-in-phyloxml this > is what bioperl is supposed to do. > > All my element/attributes are set before writing the tree using > 'add_Annotation', 'add_tag_value' and 'sequence' methods from a > Bio::Tree::AnnotatableNode object, so i think the error comes from the > write_tree method. > > Any help would be appreciated. > > Thank you, > Fred > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From shalabh.sharma7 at gmail.com Mon Aug 2 19:44:35 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Mon, 2 Aug 2010 15:44:35 -0400 Subject: [Bioperl-l] clustalw to maf format Message-ID: Hi, I am trying to convert clustalw to maf format.
I am trying to use AlignIO for that but its not working. Its giving me the following error: EXCEPTION Bio::Root::NotImplemented ------------- MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented by package Bio::AlignIO::maf. This is not your fault - author of Bio::AlignIO::maf should be blamed! STACK Bio::Root::RootI::throw_not_implemented /Library/Perl/5.8.8/Bio/Root/RootI.pm:707 STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/ maf.pm:176 STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492 STACK toplevel msf2mafy.pl:11 Is there any other way i can convert clustalw to maf? I would really appreciate if anyone can help me out. Thanks Shalabh From Russell.Smithies at agresearch.co.nz Mon Aug 2 20:25:26 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 3 Aug 2010 08:25:26 +1200 Subject: [Bioperl-l] clustalw to maf format In-Reply-To: References: Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz> This might work if you only have a few: http://www.ibi.vu.nl/programs/convertalignwww/ --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of shalabh sharma > Sent: Tuesday, 3 August 2010 7:45 a.m. > To: bioperl-l > Subject: [Bioperl-l] clustalw to maf format > > Hi, > I am trying to convert clustalw to maf format. > I am trying to use AlignIO for that but its not working. > > Its giving me the following error: > > EXCEPTION Bio::Root::NotImplemented ------------- > MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented by > package Bio::AlignIO::maf. > This is not your fault - author of Bio::AlignIO::maf should be blamed! 
> > STACK Bio::Root::RootI::throw_not_implemented > /Library/Perl/5.8.8/Bio/Root/RootI.pm:707 > STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/ > maf.pm:176 > STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492 > STACK toplevel msf2mafy.pl:11 > > > Is there any other way i can convert clustalw to maf? > > I would really appreciate if anyone can help me out. > > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From shalabh.sharma7 at gmail.com Mon Aug 2 20:53:31 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Mon, 2 Aug 2010 16:53:31 -0400 Subject: [Bioperl-l] clustalw to maf format In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz> Message-ID: Hi Russell, Thanks for the reply, but i have around 400 alignments and some huge ones :( Thanks Shalabh On Mon, Aug 2, 2010 at 4:25 PM, Smithies, Russell < Russell.Smithies at agresearch.co.nz> wrote: > This might work if you only have a few: > http://www.ibi.vu.nl/programs/convertalignwww/ > > --Russell > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of shalabh sharma > > Sent: Tuesday, 3 August 2010 7:45 a.m. > > To: bioperl-l > > Subject: [Bioperl-l] clustalw to maf format > > > > Hi, > > I am trying to convert clustalw to maf format. > > I am trying to use AlignIO for that but its not working. > > > > Its giving me the following error: > > > > EXCEPTION Bio::Root::NotImplemented ------------- > > MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented by > > package Bio::AlignIO::maf. > > This is not your fault - author of Bio::AlignIO::maf should be blamed! > > > > STACK Bio::Root::RootI::throw_not_implemented > > /Library/Perl/5.8.8/Bio/Root/RootI.pm:707 > > STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/ > > maf.pm:176 > > STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492 > > STACK toplevel msf2mafy.pl:11 > > > > > > Is there any other way i can convert clustalw to maf? > > > > I would really appreciate if anyone can help me out. 
> > > > Thanks > > Shalabh > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > From biopython at maubp.freeserve.co.uk Mon Aug 2 21:24:09 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 2 Aug 2010 22:24:09 +0100 Subject: [Bioperl-l] clustalw to maf format In-Reply-To: References: Message-ID: On Mon, Aug 2, 2010 at 8:44 PM, shalabh sharma wrote: > Hi, > I am trying to convert clustalw to maf format. > I am trying to use AlignIO for that but its not working. Could you tell us why you have to use maf format? I'm curious because all of the phylogenetics tools I've had to work with personally will take some other format which is more widely supported (e.g. FASTA, PFAM, ClustalW, PHYLIP, ...). Peter From bernd.web at gmail.com Mon Aug 2 21:25:52 2010 From: bernd.web at gmail.com (Bernd Web) Date: Mon, 2 Aug 2010 23:25:52 +0200 Subject: [Bioperl-l] clustalw to maf format In-Reply-To: References: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz> Message-ID: Hi Shalabh, This ConvertAlign does not write maf either, it only reads it (i made it). I found some other converters on the web but they do not export to maf format either...
http://biotechvana.uv.es/servers/afc/main.php http://www.hiv.lanl.gov/content/sequence/FORMAT_CONVERSION/form.html Galaxy has a MAF to Fasta converter: http://main.g2.bx.psu.edu/root?tool_id=MAF_To_Fasta1 Regards, Bernd On Mon, Aug 2, 2010 at 10:53 PM, shalabh sharma wrote: > Hi Russell, > Thanks for the reply, but i have around 400 alignments and some > huge ones :( > > Thanks > Shalabh > > > On Mon, Aug 2, 2010 at 4:25 PM, Smithies, Russell < > Russell.Smithies at agresearch.co.nz> wrote: > >> This might work if you only have a few: >> http://www.ibi.vu.nl/programs/convertalignwww/ >> >> --Russell >> >> >> > -----Original Message----- >> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> > bounces at lists.open-bio.org] On Behalf Of shalabh sharma >> > Sent: Tuesday, 3 August 2010 7:45 a.m. >> > To: bioperl-l >> > Subject: [Bioperl-l] clustalw to maf format >> > >> > Hi, >> > I am trying to convert clustalw to maf format. >> > I am trying to use AlignIO for that but its not working. >> > >> > Its giving me the following error: >> > >> > EXCEPTION Bio::Root::NotImplemented ------------- >> > MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented by >> > package Bio::AlignIO::maf. >> > This is not your fault - author of Bio::AlignIO::maf should be blamed! >> > >> > STACK Bio::Root::RootI::throw_not_implemented >> > /Library/Perl/5.8.8/Bio/Root/RootI.pm:707 >> > STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/ >> > maf.pm:176 >> > STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492 >> > STACK toplevel msf2mafy.pl:11 >> > >> > >> > Is there any other way i can convert clustalw to maf? >> > >> > I would really appreciate if anyone can help me out.
>> > >> > Thanks >> > Shalabh >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> ======================================================================= >> Attention: The information contained in this message and/or attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or privileged >> material. Any review, retransmission, dissemination or other use of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by AgResearch >> Limited. If you have received this message in error, please notify the >> sender immediately. >> ======================================================================= >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Aug 2 21:31:20 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 2 Aug 2010 16:31:20 -0500 Subject: [Bioperl-l] clustalw to maf format In-Reply-To: References: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz> Message-ID: <6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu> No other format will work? The main reason you see unimplemented methods like this is there is no active interest in working with this format beyond getting the information stored within them into objects and other commonly-used formats. 
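
Since write_aln is not implemented for MAF, one workaround is to print the alignment block by hand. The sketch below is plain Perl and not a BioPerl method: maf_block is a hypothetical helper, and the score of 0, the start of 0, the "+" strand and a source size equal to the ungapped length are placeholder assumptions for alignments that carry no genomic coordinates. Whether a downstream tool accepts such placeholder values has not been checked here.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not part of BioPerl): format one
# alignment as a MAF block. Each row is [name, gapped_sequence].
# Assumes every row covers its whole source sequence on the forward
# strand, so start = 0 and srcSize = number of non-gap characters.
sub maf_block {
    my @rows = @_;
    my $out  = "a score=0\n";                   # placeholder score
    for my $row (@rows) {
        my ($name, $text) = @$row;
        (my $ungapped = $text) =~ tr/-//d;      # copy, then strip gap characters
        my $size = length $ungapped;
        # MAF "s" line: src start size strand srcSize text
        $out .= sprintf "s %s 0 %d + %d %s\n", $name, $size, $size, $text;
    }
    return $out;
}

# A MAF file starts with a version header; blocks end with a blank line.
print "##maf version=1\n\n";
print maf_block([ 'seqA', 'AC-GT' ], [ 'seqB', 'ACTGT' ]), "\n";
```

The rows could presumably be filled from a ClustalW file read with Bio::AlignIO, taking display_id and seq from each sequence returned by each_seq on the alignment object.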
chris On Aug 2, 2010, at 3:53 PM, shalabh sharma wrote: > Hi Russell, > Thanks for the reply, but i have around 400 alignments and some > huge ones :( > > Thanks > Shalabh > > > On Mon, Aug 2, 2010 at 4:25 PM, Smithies, Russell < > Russell.Smithies at agresearch.co.nz> wrote: > >> This might work if you only have a few: >> http://www.ibi.vu.nl/programs/convertalignwww/ >> >> --Russell >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma >>> Sent: Tuesday, 3 August 2010 7:45 a.m. >>> To: bioperl-l >>> Subject: [Bioperl-l] clustalw to maf format >>> >>> Hi, >>> I am trying to convert clustalw to maf format. >>> I am trying to use AlignIO for that but its not working. >>> >>> Its giving me the following error: >>> >>> EXCEPTION Bio::Root::NotImplemented ------------- >>> MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented by >>> package Bio::AlignIO::maf. >>> This is not your fault - author of Bio::AlignIO::maf should be blamed! >>> >>> STACK Bio::Root::RootI::throw_not_implemented >>> /Library/Perl/5.8.8/Bio/Root/RootI.pm:707 >>> STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/ >>> maf.pm:176 >>> STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492 >>> STACK toplevel msf2mafy.pl:11 >>> >>> >>> Is there any other way i can convert clustalw to maf? >>> >>> I would really appreciate if anyone can help me out. 
>>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> ======================================================================= >> Attention: The information contained in this message and/or attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or privileged >> material. Any review, retransmission, dissemination or other use of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by AgResearch >> Limited. If you have received this message in error, please notify the >> sender immediately. >> ======================================================================= >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From shalabh.sharma7 at gmail.com Mon Aug 2 22:30:41 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Mon, 2 Aug 2010 18:30:41 -0400 Subject: [Bioperl-l] clustalw to maf format In-Reply-To: <6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu> References: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz> <6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu> Message-ID: Hi All, Thanks for the replies. Actually i am working on a pipeline involving RNAz. I had impression that there must be a converter available as their webserver can take xmfa or maf format but standalone is only accepting maf format. I think i will use a program that can output as xmfa and write to those people if they can provide me with the converter. Thanks Shalabh On Mon, Aug 2, 2010 at 5:31 PM, Chris Fields wrote: > No other format will work? 
The main reason you see unimplemented methods > like this is there is no active interest in working with this format beyond > getting the information stored within them into objects and other > commonly-used formats. > > chris > > On Aug 2, 2010, at 3:53 PM, shalabh sharma wrote: > > > Hi Russell, > > Thanks for the reply, but i have around 400 alignments and > some > > huge ones :( > > > > Thanks > > Shalabh > > > > > > On Mon, Aug 2, 2010 at 4:25 PM, Smithies, Russell < > > Russell.Smithies at agresearch.co.nz> wrote: > > > >> This might work if you only have a few: > >> http://www.ibi.vu.nl/programs/convertalignwww/ > >> > >> --Russell > >> > >> > >>> -----Original Message----- > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma > >>> Sent: Tuesday, 3 August 2010 7:45 a.m. > >>> To: bioperl-l > >>> Subject: [Bioperl-l] clustalw to maf format > >>> > >>> Hi, > >>> I am trying to convert clustalw to maf format. > >>> I am trying to use AlignIO for that but its not working. > >>> > >>> Its giving me the following error: > >>> > >>> EXCEPTION Bio::Root::NotImplemented ------------- > >>> MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented > by > >>> package Bio::AlignIO::maf. > >>> This is not your fault - author of Bio::AlignIO::maf should be blamed! > >>> > >>> STACK Bio::Root::RootI::throw_not_implemented > >>> /Library/Perl/5.8.8/Bio/Root/RootI.pm:707 > >>> STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/ > >>> maf.pm:176 > >>> STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492 > >>> STACK toplevel msf2mafy.pl:11 > >>> > >>> > >>> Is there any other way i can convert clustalw to maf? > >>> > >>> I would really appreciate if anyone can help me out. 
> >>> > >>> Thanks > >>> Shalabh > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> ======================================================================= > >> Attention: The information contained in this message and/or attachments > >> from AgResearch Limited is intended only for the persons or entities > >> to which it is addressed and may contain confidential and/or privileged > >> material. Any review, retransmission, dissemination or other use of, or > >> taking of any action in reliance upon, this information by persons or > >> entities other than the intended recipients is prohibited by AgResearch > >> Limited. If you have received this message in error, please notify the > >> sender immediately. > >> ======================================================================= > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From chiragmatkarbioinfo at gmail.com Tue Aug 3 07:47:37 2010 From: chiragmatkarbioinfo at gmail.com (chirag matkar) Date: Tue, 3 Aug 2010 13:17:37 +0530 Subject: [Bioperl-l] Pubmed Parsing Message-ID: Hello all, I have a list of Pubmed Ids. I want to parse articles to find specific SNP related information. Can i work it out using a Script? -- Regards, Chirag Matkar From genehack at genehack.org Tue Aug 3 09:03:35 2010 From: genehack at genehack.org (John Anderson) Date: Tue, 3 Aug 2010 05:03:35 -0400 Subject: [Bioperl-l] Pubmed Parsing In-Reply-To: References: Message-ID: <5E557C44-224B-4460-9C2C-E375555B8BE6@genehack.org> On Aug 3, 2010, at 3:47 AM, chirag matkar wrote: > I have a list of Pubmed Ids. > I want to parse articles to find specific SNP related information. > Can i work it out using a Script? Can you provide a more specific example of what you'd like to do? 
For example, something along the lines of, "for PMID 1234, get ... about SNP 5678" (where '...' is replaced with whatever it is you're trying to get). Even describing how you would obtain this information using the website yourself will be helpful. thanks, john. From gowthaman.ramasamy at seattlebiomed.org Tue Aug 3 05:29:10 2010 From: gowthaman.ramasamy at seattlebiomed.org (Gowthaman Ramasamy) Date: Mon, 2 Aug 2010 22:29:10 -0700 Subject: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam In-Reply-To: Message-ID: Hi List, I am trying to find out the consensus using pileup via Bio::DB::Sam. Using the following script I could parse out the ref_base and different bases from reads at that position. Though, I am not able to find a method to derive consensus. Similar to the values produced by "samtools pileup -c -f xxxxxx.fasta yyyyyyy.bam". The script I use now retrives ref base, query bases for each position. How do I improve it to get the consensus? Thanks very much in advance, Gowthaman use Bio::DB::Sam; my $bam = Bio::DB::Sam->new(-bam => 'something.bam', -fasta => 'something.fasta' ); my $cb = sub { my ($seqid, $pos, $pileups) = @_; my $refBase = $bam->segment($seqid, $pos, $pos)->dna; print "\n$pos\t$refBase=>"; for my $pileup (@$pileups){ my $al = $pileup->alignment; my $qBase = substr($al->qseq, $pileup->qpos, 1); print "$qBase,"; } }; $bam->pileup('Lin.chr10i', $cb); From scott at scottcain.net Tue Aug 3 10:32:59 2010 From: scott at scottcain.net (Scott Cain) Date: Tue, 3 Aug 2010 06:32:59 -0400 Subject: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam In-Reply-To: References: Message-ID: Hi Gowthaman, I don't see a method to extract the consensus. You are welcome to submit a patch :-) Scott On Tue, Aug 3, 2010 at 1:29 AM, Gowthaman Ramasamy wrote: > Hi List, > I am trying to find out the consensus using pileup via Bio::DB::Sam. 
Using the following script I could parse out the ref_base and different bases from reads at that position. Though, I am not able to find a method to derive consensus. Similar to the values produced by "samtools pileup -c -f xxxxxx.fasta yyyyyyy.bam".
>
> The script I use now retrives ref base, query bases for each position. How do I improve it to get the consensus?
>
> Thanks very much in advance,
> Gowthaman
>
>
> use Bio::DB::Sam;
>
> my $bam = Bio::DB::Sam->new(-bam => 'something.bam',
>                             -fasta => 'something.fasta'
>                            );
>
> my $cb = sub {
>     my ($seqid, $pos, $pileups) = @_;
>     my $refBase = $bam->segment($seqid, $pos, $pos)->dna;
>     print "\n$pos\t$refBase=>";
>     for my $pileup (@$pileups){
>         my $al = $pileup->alignment;
>         my $qBase = substr($al->qseq, $pileup->qpos, 1);
>         print "$qBase,";
>     }
> };
>
> $bam->pileup('Lin.chr10i', $cb);
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From lincoln.stein at gmail.com Tue Aug 3 16:57:52 2010
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Tue, 3 Aug 2010 12:57:52 -0400
Subject: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam
In-Reply-To: References: Message-ID:

Samtools is running MAQ on the pileup. You could either implement MAQ in perl, or come up with your own consensus caller.
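
A consensus caller of the simplest kind, majority rule per pileup column, could look like the sketch below. It is plain Perl and ignores base qualities, so it should not be expected to reproduce the values from "samtools pileup -c". The helper name majority_base, the alphabetical tie-break and the fall-back to the reference base when no usable read base is present are assumptions made for illustration; none of them is part of Bio::DB::Sam.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not a Bio::DB::Sam method):
# majority-rule consensus for one pileup column.
# Arguments: the reference base, then the read bases at that position.
sub majority_base {
    my ($ref_base, @bases) = @_;
    my %count;
    # Count A/C/G/T case-insensitively; skip N, gaps and anything else.
    $count{ uc $_ }++ for grep { /^[ACGTacgt]$/ } @bases;
    # Assumption: with no usable read base, report the reference base.
    return uc $ref_base unless %count;
    # Highest count wins; ties are broken alphabetically for a stable result.
    my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    return $best;
}

print majority_base('A', qw(A G G g N)), "\n";   # prints G
```

Inside the pileup callback from the script quoted above, the $qBase values for one position could be collected into an array and passed to this helper together with $refBase.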
Lincoln On Tue, Aug 3, 2010 at 1:29 AM, Gowthaman Ramasamy < gowthaman.ramasamy at seattlebiomed.org> wrote: > Hi List, > I am trying to find out the consensus using pileup via Bio::DB::Sam. Using > the following script I could parse out the ref_base and different bases from > reads at that position. Though, I am not able to find a method to derive > consensus. Similar to the values produced by "samtools pileup -c -f > xxxxxx.fasta yyyyyyy.bam". > > The script I use now retrives ref base, query bases for each position. How > do I improve it to get the consensus? > > Thanks very much in advance, > Gowthaman > > > use Bio::DB::Sam; > > my $bam = Bio::DB::Sam->new(-bam => 'something.bam', > -fasta => 'something.fasta' > ); > > my $cb = sub { > my ($seqid, $pos, $pileups) = @_; > my $refBase = $bam->segment($seqid, $pos, > $pos)->dna; > print "\n$pos\t$refBase=>"; > for my $pileup (@$pileups){ > my $al = $pileup->alignment; > my $qBase = substr($al->qseq, $pileup->qpos, > 1); > print "$qBase,"; > } > }; > > $bam->pileup('Lin.chr10i', $cb); > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From biopython at maubp.freeserve.co.uk Tue Aug 3 17:06:46 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 3 Aug 2010 18:06:46 +0100 Subject: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam In-Reply-To: References: Message-ID: On Tue, Aug 3, 2010 at 5:57 PM, Lincoln Stein wrote: > Samtools is running MAQ on the pileup. You could either implement MAQ in > perl, or come up with your own consensus caller. 
> > Lincoln See also: http://seqanswers.com/forums/showthread.php?t=6241 From gowthaman.ramasamy at seattlebiomed.org Tue Aug 3 17:28:36 2010 From: gowthaman.ramasamy at seattlebiomed.org (Gowthaman Ramasamy) Date: Tue, 3 Aug 2010 10:28:36 -0700 Subject: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam In-Reply-To: References: , Message-ID: <89080953C3D300419AACB6E63A7EEFBA5C47613B34@mail02.sbri.org> Hi Lincoln, Thats a good lead. I will try to use MAQ in perl rather than using my simple majority rule. -gowtham ________________________________________ From: Lincoln Stein [lincoln.stein at gmail.com] Sent: Tuesday, August 03, 2010 9:57 AM To: Gowthaman Ramasamy Cc: bioperl-l Subject: Re: [Bioperl-l] Getting pileup consensus from BAM files using Bio::DB::Sam Samtools is running MAQ on the pileup. You could either implement MAQ in perl, or come up with your own consensus caller. Lincoln On Tue, Aug 3, 2010 at 1:29 AM, Gowthaman Ramasamy > wrote: Hi List, I am trying to find out the consensus using pileup via Bio::DB::Sam. Using the following script I could parse out the ref_base and different bases from reads at that position. Though, I am not able to find a method to derive consensus. Similar to the values produced by "samtools pileup -c -f xxxxxx.fasta yyyyyyy.bam". The script I use now retrives ref base, query bases for each position. How do I improve it to get the consensus? 
Thanks very much in advance,
Gowthaman


use Bio::DB::Sam;

my $bam = Bio::DB::Sam->new(-bam => 'something.bam',
                            -fasta => 'something.fasta'
                           );

my $cb = sub {
    my ($seqid, $pos, $pileups) = @_;
    my $refBase = $bam->segment($seqid, $pos, $pos)->dna;
    print "\n$pos\t$refBase=>";
    for my $pileup (@$pileups){
        my $al = $pileup->alignment;
        my $qBase = substr($al->qseq, $pileup->qpos, 1);
        print "$qBase,";
    }
};

$bam->pileup('Lin.chr10i', $cb);

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l



--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa

From stefan.kirov at bms.com Tue Aug 3 20:22:35 2010
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 03 Aug 2010 16:22:35 -0400
Subject: [Bioperl-l] nmica parser
Message-ID: <4C587A8B.8090603@bms.com>

Has anyone written an nmica parser? If not, I will perhaps do that. It should be straightforward; the output is XML.

Stefan

From fs5 at sanger.ac.uk Wed Aug 4 08:45:39 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Wed, 04 Aug 2010 09:45:39 +0100
Subject: [Bioperl-l] Pubmed Parsing
In-Reply-To:
References:
Message-ID: <1280911539.3499.46.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi Chirag,

have a look at this earlier post:
http://bioperl.org/pipermail/bioperl-l/2009-April/029690.html

However, you won't be able to retrieve all full texts and it is quite a task to parse natural language and get useful information about a gene, protein, SNP etc out of a manuscript.

Frank


On Tue, 2010-08-03 at 13:17 +0530, chirag matkar wrote:
> Hello all,
> I have a list of Pubmed Ids.
> I want to parse articles to find specific SNP related information. > Can i work it out using a Script? > > > > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From David.Messina at sbc.su.se Thu Aug 5 12:16:17 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 5 Aug 2010 14:16:17 +0200 Subject: [Bioperl-l] call for a TreeIO volunteer Message-ID: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se> Hi everybody, We've got a couple of small open bugs related to the Bio::TreeIO modules, and we could really use someone to take a look at them. Ideally, that someone would have familiarity with TreeIO already.* It'd help us to get the next release (1.6.2) out the door. The bugs in question are: - TreeIO::newick writes root node branch length incorrectly http://bugzilla.open-bio.org/show_bug.cgi?id=3039 - Bio::TreeIO::nhx cannot parse empty [&&NHX] + round-trip failure http://bugzilla.open-bio.org/show_bug.cgi?id=3007 Thanks, Dave on behalf of the core developers * Even if you don't, though, if you've been looking for an opportunity to contribute to BioPerl, and this sounds like something you'd like to work on, by all means raise your hand. From clements at nescent.org Thu Aug 5 17:15:41 2010 From: clements at nescent.org (Dave Clements) Date: Thu, 5 Aug 2010 10:15:41 -0700 Subject: [Bioperl-l] GMOD Europe 2010, 13-16 Sept, Cambridge, UK In-Reply-To: References: Message-ID: GMOD Europe 2010 ================ 13-16 September 2010 Cambridge, UK http://gmod.org/wiki/GMOD_Europe_2010 We are pleased to announce GMOD Europe 2010, four days of GMOD events being held 13-16 September 2010, at the University of Cambridge. 
GMOD Europe 2010 includes:

1) GMOD Community Meeting, Monday & Tuesday:
   Project updates, developer and user presentations and best practices, project direction.

2) GMOD Satellite Meetings, Wednesday:
   Special interest groups where GMOD community members meet to discuss specific topics of interest.

3) InterMine Workshop, Wednesday:
   A one day workshop on installing, configuring and using the InterMine biological data warehouse system.

4) BioMart Workshop, Thursday:
   A one day workshop on using the BioMart biological data warehouse system, including accessing data through APIs.

Registration is now open for these events. There is a £50 registration fee for the GMOD Meeting to cover catered lunches and other expenses. Registration for all other events is free, but required, as space is limited.

These events are open to all: GMOD users, developers, prospective users, biologists, and computer scientists. See http://gmod.org/wiki/January_2010_GMOD_Meeting for an idea of what goes on at GMOD meetings.

GMOD is a collection of interoperable open source software components for managing, visualizing and annotating biological data. GMOD incorporates many widely used tools, including GBrowse and JBrowse for genome browsing, InterMine and BioMart for data mining, Galaxy and Ergatis for workflow, Chado for data management, GBrowse_syn and CMap for comparative genomics, plus many other tools (Apollo, MAKER, Pathway Tools, Textpresso, ...). GMOD is also an active community of researchers and developers addressing common challenges in exploiting their data.

If you are struggling to fully exploit your data then please consider attending GMOD Europe 2010.

Please let us know if you have any questions, and we hope to see you in Cambridge.
Thanks, Scott Cain and Dave Clements -- http://gmod.org/wiki/GMOD_News http://gmod.org/wiki/GMOD_Evo_Hackathon http://gmod.org/wiki/GMOD_Europe_2010 http://gmod.org/wiki/Help_Desk_Feedback From abhishek.vit at gmail.com Thu Aug 5 22:15:56 2010 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Thu, 5 Aug 2010 18:15:56 -0400 Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl Message-ID: Hi All Just wondering if there is any Picard wrapper/s available in Bioperl. Thanks! -Abhi ----------------------------- Abhishek Pratap Bioinformatics Software Engineer II Genomics Resource Center Institute for Genome Sciences School of Medicine, Univ of Maryland 801, W. Baltimore Street, Baltimore, MD 21209 Ph: (+1)-410-706-2296 www.igs.umaryland.edu/ From Russell.Smithies at agresearch.co.nz Thu Aug 5 22:37:46 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 6 Aug 2010 10:37:46 +1200 Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl In-Reply-To: References: Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F02262E96@exchsth.agresearch.co.nz> Might be part of the "Enterprise" package. If not, some developer should "make it so". :-) --Russell (I hate Fridays) > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Abhishek Pratap > Sent: Friday, 6 August 2010 10:16 a.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl > > Hi All > > Just wondering if there is any Picard wrapper/s available in Bioperl. > > > Thanks! > -Abhi > > ----------------------------- > Abhishek Pratap > Bioinformatics Software Engineer II > Genomics Resource Center > Institute for Genome Sciences > School of Medicine, Univ of Maryland > 801, W. 
Baltimore Street, Baltimore, MD 21209 > Ph: (+1)-410-706-2296 > www.igs.umaryland.edu/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at illinois.edu Thu Aug 5 23:10:16 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 5 Aug 2010 18:10:16 -0500 Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl In-Reply-To: References: Message-ID: <26E3E5B6-47CF-4744-9687-199C218B5571@illinois.edu> Picard uses samtools, which has a perl API: http://search.cpan.org/dist/Bio-SamTools/ which uses BioPerl. Ah, the circle of life... chris On Aug 5, 2010, at 5:15 PM, Abhishek Pratap wrote: > Hi All > > Just wondering if there is any Picard wrapper/s available in Bioperl. > > > Thanks! > -Abhi > > ----------------------------- > Abhishek Pratap > Bioinformatics Software Engineer II > Genomics Resource Center > Institute for Genome Sciences > School of Medicine, Univ of Maryland > 801, W. 
Baltimore Street, Baltimore, MD 21209
> Ph: (+1)-410-706-2296
> www.igs.umaryland.edu/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From dan.kortschak at adelaide.edu.au Fri Aug 6 01:06:45 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Fri, 06 Aug 2010 10:36:45 +0930
Subject: [Bioperl-l] MUMmer parser work
Message-ID: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>

Hello Everyone,

I've just noticed the absence of a MUMmer parser and thought that it might be a worthwhile contribution to bioperl-run (I won't be able to start on this for a while, but given Mark's excellent work on CommandExts, it shouldn't take too long to get up when I do have time). Has anyone made any effort in this direction that I would be stepping on, or if they have left it, that I could pick up to shorten the work time?

cheers
Dan

From cjfields at illinois.edu Fri Aug 6 03:13:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 5 Aug 2010 22:13:51 -0500
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>

Dan,

Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in:

http://bugzilla.open-bio.org/show_bug.cgi?id=2701

It currently lacks significant tests, so feel free to chip in there as needed.

chris

On Aug 5, 2010, at 8:06 PM, Dan Kortschak wrote:

> Hello Everyone,
>
> I've just noticed the absence of a MUMmer parser and thought that it
> might be a worthwhile contribution to bioperl-run (I won't be able to
> start on this for a while, but given Mark's excellent work on
> CommandExts, it shouldn't take too long to get up when I do have time).
Has > anyone made any effort in this direction that I would be stepping on, or > if they have left it, that I could pick up to shorten the work time? > > cheers > Dan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From greg at ebi.ac.uk Fri Aug 6 09:47:21 2010 From: greg at ebi.ac.uk (Gregory Jordan) Date: Fri, 6 Aug 2010 10:47:21 +0100 Subject: [Bioperl-l] call for a TreeIO volunteer In-Reply-To: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se> References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se> Message-ID: I can help out with these. I'm pretty sure I've previously fought with (and perhaps even come up with a fix for) bug 3039, and I can take a look at 3007 too. Now lemme just see if I can get up and running with the Bioperl test suite. I'll give a shout if I run into any problems. Cheers, Greg On 5 August 2010 13:16, Dave Messina wrote: > Hi everybody, > > We've got a couple of small open bugs related to the Bio::TreeIO modules, > and we could really use someone to take a look at them. Ideally, that > someone would have familiarity with TreeIO already.* > > It'd help us to get the next release (1.6.2) out the door. > > The bugs in question are: > - TreeIO::newick writes root node branch length incorrectly > http://bugzilla.open-bio.org/show_bug.cgi?id=3039 > > - Bio::TreeIO::nhx cannot parse empty [&&NHX] + round-trip failure > http://bugzilla.open-bio.org/show_bug.cgi?id=3007 > > > Thanks, > Dave > on behalf of the core developers > > > * Even if you don't, though, if you've been looking for an opportunity to > contribute to BioPerl, and this sounds like something you'd like to work on, > by all means raise your hand. 
> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jun.yin at ucd.ie Fri Aug 6 10:52:14 2010 From: jun.yin at ucd.ie (Jun Yin) Date: Fri, 06 Aug 2010 11:52:14 +0100 Subject: [Bioperl-l] Packages retrieving online alignment sequences Message-ID: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie> Hi, all, I am the google summer of code student working on refactoring Bio::Align subsystem. I recently implemented several packages retrieving online alignment sequences. The aim of the packages are to provide convenient methods to retrieve online alignment sequences for the BioPerl users. The alignment sequences are converted into Bio::SimpleAlign object after the retrieval, which will be easy to manipulate and write to local disk. Now the packages support Pfam, Rfam, Prosite and Entrez Protein Clusters databases. Here is the structure of the packages: Packages Bio::DB::Align (interface, and calling other packages) Bio::DB::Align::Pfam (retrieving alignment from Pfam) Bio::DB::Align::Rfam (retrieving alignment from Rfam) Bio::DB::Align:Prosite (retrieving alignment from Prosite) Bio::DB::Align:ProtClustDB (retrieving alignment from Entrez Protein Clusters Database) Usually four methods are provided for each package: Methods get_Aln_by_id (retrieving alignment by id and returns Bio::SimpleAlign object) get_Aln_by_acc (retrieving alignment by acession and returns Bio::SimpleAlign object) (Rfam and Prosite only supports this method) id2acc (id to accession conversion) acc2id (accession to id conversion) These packages are built dependent on LWP::UserAgent, HTTP::Request and Bio::DB::GenericWebAgent. Bio::DB::Align::ProtClustDB is dependent on Bio::DB::EUtilities. 
Calling the packages can be:

my $dbobj=Bio::DB::Align->new(-db=>"rfam");

Or,

my $dbobj= Bio::DB::Align::Pfam->new();

my $aln=$dbobj->get_Aln_by_acc("RF0001");
my $aln2=$dbobj->get_Aln_by_acc(-accession=>"RF0001",-alignment=>"full");
print $aln->length();
foreach my $seq ($aln->each_seq) {
    #do something
}

I have done some tests on these packages. And, I will write them into standard tests later.

Any suggestions on these packages are welcome.

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

From David.Messina at sbc.su.se Fri Aug 6 12:59:19 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 6 Aug 2010 14:59:19 +0200
Subject: [Bioperl-l] call for a TreeIO volunteer
In-Reply-To:
References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>
Message-ID: <6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>

> I can help out with these. I'm pretty sure I've previously fought with (and perhaps even come up with a fix for) bug 3039, and I can take a look at 3007 too.

Awesome, thanks Greg!

> Now lemme just see if I can get up and running with the Bioperl test suite. I'll give a shout if I run into any problems.

Please do.

Dave

From David.Messina at sbc.su.se Fri Aug 6 13:06:47 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 6 Aug 2010 15:06:47 +0200
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
Message-ID:

Sounds great, Jun!

Did you happen to test your code on very large alignments? I know there's one in Pfam that's something like 100,000 sequences. An rRNA, I believe.
Dave

From jun.yin at ucd.ie Fri Aug 6 13:11:41 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 06 Aug 2010 14:11:41 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To:
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
Message-ID: <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>

Hi, Dave,

Thx for reminding me this. I will definitely try it.

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

-----Original Message-----
From: Dave Messina [mailto:David.Messina at sbc.su.se]
Sent: Friday, August 06, 2010 2:07 PM
To: Jun Yin
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences

Sounds great, Jun!

Did you happen to test your code on very large alignments? I know there's one in Pfam that's something like 100,000 sequences. An rRNA, I believe.

Dave

From cjfields at illinois.edu Fri Aug 6 13:19:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 6 Aug 2010 08:19:54 -0500
Subject: [Bioperl-l] call for a TreeIO volunteer
In-Reply-To: <6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>
References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se> <6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>
Message-ID: <8CB3DE9A-4C5C-42A3-94B4-8818D7143951@illinois.edu>

On Aug 6, 2010, at 7:59 AM, Dave Messina wrote:

>
>> I can help out with these. I'm pretty sure I've previously fought with (and perhaps even come up with a fix for) bug 3039, and I can take a look at 3007 too.
>
> Awesome, thanks Greg!
>
>
>> Now lemme just see if I can get up and running with the Bioperl test suite. I'll give a shout if I run into any problems.
>
> Please do.
>
>
>
> Dave

Agreed, and thanks for helping out!

chris

From dianabowley at gmail.com Fri Aug 6 22:33:57 2010
From: dianabowley at gmail.com (DRBowley)
Date: Fri, 6 Aug 2010 15:33:57 -0700 (PDT)
Subject: [Bioperl-l] BioPerl install issues
Message-ID:

I'm new to both perl and bioperl and I'm having issues installing bioperl. I'm trying to install on a Mac OS 10.6.4, and I've already installed perl (5.10.0). I tried installing using the recommended approach for Mac - via Fink...
"fink install bioperl-pm5100"

Looking back over the terminal window text it looks like the problem is:
"This package requires Module::Build v0.2805 or greater to install itself."

I tried doing "fink selfupdate" and that did not fix the problem.

Any suggestions?

Thanks!
Diana

From Kevin.M.Brown at asu.edu Fri Aug 6 22:50:45 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 6 Aug 2010 15:50:45 -0700
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To:
References:
Message-ID: <1A4207F8295607498283FE9E93B775B406E44A05@EX02.asurite.ad.asu.edu>

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_THE_EASY_WAY_USING_Build.PL

Not sure why you had to install perl since it should have been part of the stock OSX install (or at least it was last time I logged onto a mac). Not sure why the Fink method has so many issues, but might try the above which works for linux or bsd.

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of DRBowley
Sent: Friday, August 06, 2010 3:34 PM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] BioPerl install issues

I'm new to both perl and bioperl and I'm having issues installing bioperl. I'm trying to install on a Mac OS 10.6.4, and I've already installed perl (5.10.0).
I tried installing using the recommended approach for Mac - via Fink... "fink install bioperl-pm5100" Looking back over the terminal window text it looks like the problem is: "This package requires Module::Build v0.2805 or greater to install itself." I tried doing "fink selfupdate" and that did not fix the problem. Any suggestions? Thanks! Diana _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From skastu01 at students.poly.edu Sat Aug 7 00:03:50 2010 From: skastu01 at students.poly.edu (Lakshmi Kastury) Date: Sat, 7 Aug 2010 00:03:50 +0000 Subject: [Bioperl-l] BioPerl install issues Message-ID: Hi - I went through several failed attempts on MACOS Snow Leopard, and fink was a dead end. Eventually I succeeded to install on Windows Vista using CPAN. I am not sure if this method will work with MACOS: 1. Opened command prompt. 2. Typed command: >perl -MCPAN -e "install Bundle::BioPerl" 3. Answered yes to the series of questions, which prompts install of several bundles and a compiler. The instructions were in a link from: http://bioperl.org/Core/Latest/INSTALL All the best, Lakshmi > Date: Fri, 6 Aug 2010 15:33:57 -0700 > From: dianabowley at gmail.com > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] BioPerl install issues > > I'm new to both perl and bioperl and I'm having issues installing > bioperl. I'm trying to install on a Mac OS 10.6.4, and I've already > installed perl (5.10.0). I tried installing using the recommended > approach for Mac - via Fink... > "fink install bioperl-pm5100" > > Looking back over the terminal window text it looks like the problem > is: > "This package requires Module::Build v0.2805 or greater to install > itself." > > I tried doing "fink selfupdate" and that did not fix the problem. > > Any suggestions? > > Thanks! 
> Diana > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Sat Aug 7 06:47:40 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 7 Aug 2010 08:47:40 +0200 Subject: [Bioperl-l] BioPerl install issues In-Reply-To: References: Message-ID: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se> On Aug 7, 2010, at 02:03 , Lakshmi Kastury wrote: > I am not sure if this method will work with MACOS: It will. CPAN is cross-platform and is the best way to install BioPerl. Dave From cjfields at illinois.edu Sat Aug 7 13:58:56 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 7 Aug 2010 08:58:56 -0500 Subject: [Bioperl-l] BioPerl install issues In-Reply-To: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se> References: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se> Message-ID: It should work fine. Even installing from trunk right now works w/o failing tests. chris On Aug 7, 2010, at 1:47 AM, Dave Messina wrote: > > On Aug 7, 2010, at 02:03 , Lakshmi Kastury wrote: > >> I am not sure if this method will work with MACOS: > > It will. CPAN is cross-platform and is the best way to install BioPerl. > > > Dave > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From greg at ebi.ac.uk Sat Aug 7 21:14:58 2010 From: greg at ebi.ac.uk (Gregory Jordan) Date: Sat, 7 Aug 2010 22:14:58 +0100 Subject: [Bioperl-l] Packages retrieving online alignment sequences In-Reply-To: <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie> References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie> <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie> Message-ID: Maybe I'm just a bit naive here, but what is the expected difference between accession and ID and why do we need a separate method for each? 
Seems to me that one could just have a single method, get_Aln, which determines under the hood whether the query string is an accession or ID. It would be nice if the SimpleAlign object had its Annotation filled with some extra metadata (such as accession, ID, database version number, URI, etc.). One other thing: have you thought about adding an Ensembl adaptor? Or maybe something similar already exists in BioPerl...? Sure Ensembl provides their own Perl API, but for someone who doesn't want to go through the hassle of installing it from CVS (pardon my french, but wtf!?! Who still uses CVS) and learning a whole new API, it might be convenient to have a simple BioPerl module for quickly grabbing gene family alignments from the public Ensembl MySQL databases. I'd be willing to help write the necessary SQL queries for this. greg On 6 August 2010 14:11, Jun Yin wrote: > Hi, Dave, > > Thx for reminding me this. I will definitely try it. > > Cheers, > Jun Yin > Ph.D. student in U.C.D. > > Bioinformatics Laboratory > Conway Institute > University College Dublin > > > -----Original Message----- > From: Dave Messina [mailto:David.Messina at sbc.su.se] > Sent: Friday, August 06, 2010 2:07 PM > To: Jun Yin > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences > > Sounds great, Jun! > > Did you happen to test your code on very large alignments? I know there's > one in Pfam that's something like 100,000 sequences. An rRNA, I believe. > > > Dave > > > __________ Information from ESET Smart Security, version of virus signature > database 5346 (20100806) __________ > > The message was checked by ESET Smart Security. > > http://www.eset.com > > > > > __________ Information from ESET Smart Security, version of virus signature > database 5346 (20100806) __________ > > The message was checked by ESET Smart Security. 
> > http://www.eset.com > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Sat Aug 7 22:07:39 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 7 Aug 2010 17:07:39 -0500 Subject: [Bioperl-l] Packages retrieving online alignment sequences In-Reply-To: References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie> <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie> Message-ID: <21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu> On Aug 7, 2010, at 4:14 PM, Gregory Jordan wrote: > Maybe I'm just a bit naive here, but what is the expected difference between > accession and ID and why do we need a separate method for each? Depends on the remote service, but in many cases there is a difference. With NCBI eutils you can have either an accession and the unique identifier (UID, or GI for nuc/protein seqs). efetch can use both, but only the UID is guaranteed to retrieve a single sequence all the time; the accession can (very rarely) map to more than one sequence. The other eutils services require either a string (esearch) or a UID, but do not allow an accession. > Seems to me > that one could just have a single method, get_Aln, which determines under > the hood whether the query string is an accession or ID. A simpler method could be introduced, but I can see that being potentially brittle in the long run. A naked alphanumeric string doesn't reveal much about what it is at face value w/o knowing database/service-specific behavior. And then we're reliant on that behavior not changing, which we can't guarantee (this has bitten us in the past). What would one do if NCBI (for instance) allowed accessions derived completely of digits, or conversely a unique ID with mixed alphanumerics? 
Using methods specific for ID/acc at least guarantees a behavior on the backend w/o guessing, and if there is no danger of overlap (a service accepts either/or) one could simply be an alias of the other. > It would be nice if the SimpleAlign object had its Annotation filled with > some extra metadata (such as accession, ID, database version number, URI, > etc.). According to the deobfuscator SimpleAlign does have accession() and id(). The others could be simple attributes, and can be added as simple getter/setters, or as annotation via Bio::Annotation (this is the way Stockholm annotation is currently handled). Something to think about. > One other thing: have you thought about adding an Ensembl adaptor? Or maybe > something similar already exists in BioPerl...? That's a good idea, though it might make more sense if this was done when mem-efficient (possibly DB-dependent) AlignI modules are present within bioperl, which is part of the GSoC (see below). For instance, have a Bio::Align::AlignI with a backend ensembl DB adaptor that works lazily. If using the Ensembl Perl API, a few possible roadblocks/problems might pop up. Ensembl currently requires bioperl (v1.2.3, but it works with the latest as well, at least when I've used it). If using the ensembl perl API we would just need to ensure we aren't conflicting with ensembl code that pulls in bioperl classes expecting a v1.2.3 API when we only support the latest. I don't foresee this being an issue, though (there is precedent for this, see Sendu's Ensembl module Bio::Tools::Run::Ensembl in bioperl-run). > Sure Ensembl provides their own Perl API, but for someone who doesn't want > to go through the hassle of installing it from CVS (pardon my french, but > wtf!?! Who still uses CVS) and learning a whole new API, it might be > convenient to have a simple BioPerl module for quickly grabbing gene family > alignments from the public Ensembl MySQL databases. 
I'd be willing to help
> write the necessary SQL queries for this.
>
> greg

The GSoC project on alignment subsystem refactoring will be finishing up this month, so I'm sure Jun can discuss ideas for initial DB-dependent implementations. The more input and coders implementing the better, IMO.

As for writing up an adaptor to ensembl outside of its API, overall I don't think it's a bad idea, but if it's possible maybe start without reinventing things, then move to direct SQL. Unless it's easier to use SQL.

chris

> On 6 August 2010 14:11, Jun Yin wrote:
>
>> Hi, Dave,
>>
>> Thx for reminding me this. I will definitely try it.
>>
>> Cheers,
>> Jun Yin
>> Ph.D. student in U.C.D.
>>
>> Bioinformatics Laboratory
>> Conway Institute
>> University College Dublin
>>
>>
>> -----Original Message-----
>> From: Dave Messina [mailto:David.Messina at sbc.su.se]
>> Sent: Friday, August 06, 2010 2:07 PM
>> To: Jun Yin
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences
>>
>> Sounds great, Jun!
>>
>> Did you happen to test your code on very large alignments? I know there's
>> one in Pfam that's something like 100,000 sequences. An rRNA, I believe.
>>
>>
>> Dave
>>
>>
>> __________ Information from ESET Smart Security, version of virus signature
>> database 5346 (20100806) __________
>>
>> The message was checked by ESET Smart Security.
>>
>> http://www.eset.com
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From hartzell at alerce.com Sat Aug 7 21:45:04 2010
From: hartzell at alerce.com (George Hartzell)
Date: Sat, 7 Aug 2010 14:45:04 -0700
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To:
References: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se>
Message-ID: <19549.54240.499140.501136@gargle.gargle.HOWL>

Chris Fields writes:
> It should work fine. Even installing from trunk right now works
> w/o failing tests.

As a slight aside, if you're looking to build a current perl binary
for your mac (e.g. 5.12.1) you should take a look at perlbrew
(http://search.cpan.org/dist/App-perlbrew/). The three steps at the
top of the installation section of the README are all you need to get
going. Even a manager can do it.

If you're using bash on the mac via terminal you'll probably want to
put the one-liner they prescribe into your .bash_profile instead of
your .bashrc, but everything else just flows right along.

Once you have that in place you have a nicely isolated system into
which you can install things to your heart's content without worrying
about PERL5LIB and local::lib and the rest.

g.

From cjfields at illinois.edu Sun Aug 8 01:19:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 7 Aug 2010 20:19:54 -0500
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To: <19549.54240.499140.501136@gargle.gargle.HOWL>
References: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se> <19549.54240.499140.501136@gargle.gargle.HOWL>
Message-ID:

On Aug 7, 2010, at 4:45 PM, George Hartzell wrote:

> Chris Fields writes:
>> It should work fine. Even installing from trunk right now works
>> w/o failing tests.
> > As a slight aside, if you're looking to build a current perl binary > for your mac (e.g. 5.12.1) you should take a look at perlbrew > (http://search.cpan.org/dist/App-perlbrew/). The three steps at the > top of the installation section of the README are all you need to get > going. Even a manager can do it. > > If you're using bash on the mac via terminal you'll probably want to > put the one-liner they prescribe into your .bash_profile instead of > your .bashrc, but everything else just flows right along. > > Once you have that in place you have a nicely isolated system into > which you can install things to your hearts content without worrying > about PERL5LIB and local::lib and the rest. > > g. Have to second using perlbrew, started using it for my local Ubuntu installation (don't have it running on my macbook yet, but it's in the plans). chris From greg at ebi.ac.uk Sun Aug 8 06:12:41 2010 From: greg at ebi.ac.uk (Gregory Jordan) Date: Sun, 8 Aug 2010 07:12:41 +0100 Subject: [Bioperl-l] Packages retrieving online alignment sequences In-Reply-To: <21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu> References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie> <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie> <21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu> Message-ID: On 7 August 2010 23:07, Chris Fields wrote: > > A simpler method could be introduced, but I can see that being potentially > brittle in the long run. A naked alphanumeric string doesn't reveal much > about what it is at face value w/o knowing database/service-specific > behavior. And then we're reliant on that behavior not changing, which we > can't guarantee (this has bitten us in the past). What would one do if NCBI > (for instance) allowed accessions derived completely of digits, or > conversely a unique ID with mixed alphanumerics? 
> > Using methods specific for ID/acc at least guarantees a behavior on the > backend w/o guessing, and if there is no danger of overlap (a service > accepts either/or) one could simply be an alias of the other. > Thanks for the clarification on IDs vs accessions. As long as the behavior and distinction are well-documented, I'm sure it won't make too much of a difference. My main concern was just that having two similar methods -- with no clearly laid out distinction between the two and one of them only supported by half of the implementing subclasses -- might confuse potential users. As a point of reference: both Rfam and Pfam allow either an ID or an accession in their front-page search interface (http://www.pfam.org / http://www.rfam.org/). In fact, they seem to entirely hide the distinction between ID and Accession from the end user; nowhere on the Rfam page for an individual result is it clear which string is the accession and which is the ID (http://rfam.sanger.ac.uk/family/snoZ107_R87). Thus, a potential user of the Rfam module wouldn't know whether to call the get_by_ID or get_by_Accession method, even after looking at the Rfam page for his / her desired alignment! As you can probably tell, I'm all in favor of a unified search whenever feasible / possible. :-) > As for writing up an adaptor to ensembl outside of it's API, overall I > don't think it's a bad idea, but if it's possible maybe start without > reinventing things, then move to direct SQL. Unless it's easier to use SQL. > > For fetching Ensembl's gene family alignments, using the SQL will be easiest. They don't tend to get unreasonably large in terms of memory -- I think the biggest tend to be ~700 sequences with a few thousand alignment columns or so -- and it's a simple table join or two to get both the tree and alignment from the database. For genomic alignments, I agree that a more memory-efficient and/or lazy backend would be necessary. 
And it's pretty much impossible to get those things out of the Ensembl tables without using their API. --greg From dan.kortschak at adelaide.edu.au Mon Aug 9 00:53:43 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Mon, 09 Aug 2010 10:23:43 +0930 Subject: [Bioperl-l] MUMmer parser work In-Reply-To: <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu> References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au> <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu> Message-ID: <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au> Hi Chris, Is that set of files planned to be included in the git repository on bioperl-live? I don't want to push something that is being organised by someone else. cheers Dan On Thu, 2010-08-05 at 22:13 -0500, Chris Fields wrote: > Dan, > > Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2701 > > It currently lacks significant tests, so feel free to chip in there as needed. > > chris From genehack at genehack.org Mon Aug 9 01:42:27 2010 From: genehack at genehack.org (John SJ Anderson) Date: Sun, 8 Aug 2010 21:42:27 -0400 Subject: [Bioperl-l] MUMmer parser work In-Reply-To: <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au> References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au> <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu> <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org> I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. j. On Aug 8, 2010, at 20:53 , Dan Kortschak wrote: > Hi Chris, > > Is that set of files planned to be included in the git repository on > bioperl-live? I don't want to push something that is being organised by > someone else. 
> > cheers > Dan > > On Thu, 2010-08-05 at 22:13 -0500, Chris Fields wrote: >> Dan, >> >> Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in: >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2701 >> >> It currently lacks significant tests, so feel free to chip in there as needed. >> >> chris > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From dan.kortschak at adelaide.edu.au Mon Aug 9 02:03:52 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Mon, 09 Aug 2010 11:33:52 +0930 Subject: [Bioperl-l] MUMmer parser work In-Reply-To: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org> References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au> <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu> <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au> <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org> Message-ID: <1281319432.2414.49.camel@zoidberg.mbs.adelaide.edu.au> Excellent. Thanks for that. Dan On Sun, 2010-08-08 at 21:42 -0400, John SJ Anderson wrote: > I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. > > j. From cjfields at illinois.edu Tue Aug 10 02:40:07 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Aug 2010 21:40:07 -0500 Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio Message-ID: Any objections to moving the Bio directory to lib/Bio in bioperl-live? It's a more standard location for code in most distributions; I have a branch (topic/cjfields_standard_lib) that has this working, though it's possible that it needs more work. 
chris From genehack at genehack.org Tue Aug 10 08:30:44 2010 From: genehack at genehack.org (John SJ Anderson) Date: Tue, 10 Aug 2010 04:30:44 -0400 Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio In-Reply-To: References: Message-ID: On Aug 9, 2010, at 22:40 , Chris Fields wrote: > Any objections to moving the Bio directory to lib/Bio in bioperl-live? +1 on this idea. j. From genehack at genehack.org Tue Aug 10 11:21:51 2010 From: genehack at genehack.org (John Anderson) Date: Tue, 10 Aug 2010 07:21:51 -0400 Subject: [Bioperl-l] MUMmer parser work In-Reply-To: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org> References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au> <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu> <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au> <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org> Message-ID: <7A4F93AB-1BF7-4775-BC0E-38E7B431ECC6@genehack.org> On Aug 8, 2010, at 9:42 PM, John SJ Anderson wrote: > I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. Okay, the files have been added to topic/bug-2701 -- see . Please note, these are just the files from the bug report, slotted into the appropriate spots. I haven't reviewed the code or done anything about the non-BioPerl-y tests or the general lack of test coverage. I hope to do something about that in the coming week, but if somebody beats me to it, that would be okay too. j. From maj at fortinbras.us Tue Aug 10 23:52:05 2010 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Tue, 10 Aug 2010 19:52:05 -0400
Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio
In-Reply-To:
References:
Message-ID: <1C55239986494A8D82BDC21A85B324E9@NewLife>

+1

----- Original Message -----
From: "Chris Fields"
To: "BioPerl List"
Sent: Monday, August 09, 2010 10:40 PM
Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio

> Any objections to moving the Bio directory to lib/Bio in bioperl-live? It's a
> more standard location for code in most distributions; I have a branch
> (topic/cjfields_standard_lib) that has this working, though it's possible that
> it needs more work.
>
> chris

From fayroz_farouk at yahoo.com Sun Aug 8 08:24:31 2010
From: fayroz_farouk at yahoo.com (fayroz)
Date: Sun, 8 Aug 2010 01:24:31 -0700 (PDT)
Subject: [Bioperl-l] using HMMER
Message-ID: <603590.1072.qm@web112620.mail.gq1.yahoo.com>

I need your help. I am a new Perl user and want to use BioPerl modules to run the HMMER program (hmmsearch). I have "model.hmm" and a FASTA file, and want to see which of the sequences are similar to the model. I wrote this code, but there is a problem:

#!/usr/local/bin/perl W
use Bio::AlignIO;
use Bio::SearchIO;
use Bio::SeqIO ;
use Bio::Tools::Run::Hmmer;

# run hmmsearch (similar for hmmpfam)
my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'h6_avian.hmm',-informat => 'fasta');
my $seq = Bio::SeqIO->new('-file'=> "one_seq.fa", '-format'=>'Fasta');

# Pass the factory a Bio::Seq object or a file name, returns a Bio::SearchIO
my $searchio = $factory->hmmsearch($seq);

while (my $result = $searchio->next_result){
    while(my $hit = $result->next_hit){
        while (my $hsp = $hit->next_hsp){
            print join("\t", ( $result->query_name,
                               $hsp->query->start,
                               $hsp->query->end,
                               $hit->name,
                               $hsp->hit->start,
                               $hsp->hit->end,
                               $hsp->score,
                               $hsp->evalue,
                               $hsp->seq_str,
                             )), "\n";
        }
    }
}

Exceptions:
MSG: Unknown kind of input 'Bio::SeqIO::fasta=HASH(0x329a504)'
STACK Bio::Tools::Run::Hmmer::_setinput D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:381
STACK Bio::Tools::Run::Hmmer::hmmsearch D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:352
STACK toplevel test_bioperl.pl:12

Thank you,
fayroz

From douglas.hoen at gmail.com Wed Aug 11 01:54:53 2010
From: douglas.hoen at gmail.com (Douglas Hoen)
Date: Tue, 10 Aug 2010 21:54:53 -0400
Subject: [Bioperl-l] Bio::SeqFeature::SimilarityPair->from_searchResult()?
Message-ID: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com>

Hi,

I was wondering why the Synopsis in the docs for Bio::SeqFeature::SimilarityPair has the following:

  $sim_pair = Bio::SeqFeature::SimilarityPair->from_searchResult($blastHit);

There doesn't actually seem to be a from_searchResult method. Am I missing something?

Thanks,
-- Doug

From zhaoy at mail.cbi.pku.edu.cn Wed Aug 11 08:17:42 2010
From: zhaoy at mail.cbi.pku.edu.cn (zhaoy at mail.cbi.pku.edu.cn)
Date: Wed, 11 Aug 2010 16:17:42 +0800 (CST)
Subject: [Bioperl-l] About extracting sequence from genewise format result
Message-ID: <53663.162.105.250.100.1281514662.squirrel@mail.cbi.pku.edu.cn>

Dear authors:

Hello! Recently I have been trying to parse genewise-format results to extract the nucleotide sequence using the "hit_string" method in the "SearchIO" module; however, the result is empty. What's worse, the loop does not seem to work, because I always get the last result. I'm confused.
My Perl code is shown below:

#!/usr/bin/perl -w
use strict;
use warnings;
use Bio::SearchIO;

my $in = new Bio::SearchIO(-format => 'wise', -wisetype => 'genewise', -file => 'test');
while( my $result = $in->next_result ) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp){
            print "Query=", $result->query_name, "\n",
                  "Length=", $hsp->length('total'),"\n",
                  "hit_string:", $hsp->hit_string, "\n";
        }
    }
}

And one of the genewise format results is shown below:

genewise $Name: wise2-4-0alpha $ (unreleased release)
This program is freely distributed under a GPL. See source directory
Copyright (c) GRL limited: portions of the code are from separate copyright

Query protein: Cpa_s110_24
Comp Matrix: BLOSUM62.bla
Gap open: 12
Gap extension: 2
Start/End global
Target Sequence Bdi_chr3:38292015..38292302
Strand: forward
Start/End (protein) global
Gene Parameter file: gene.stat
Splice site model: GT/AG only
Codon Table: codon.table
Subs error: 1e-06
Indel error: 1e-06
Null model syn
Algorithm 623

genewise output
Score 37.97 bits over entire alignment
Scores as bits over a synchronous coding model

Warning: The bits scores is not probablistically correct for single seqs
See WWW help for more info

Cpa_s110_24        1 MGNCQAVDAATLAIQHPS-GKVDRLYWPVSASEVMRTNPGHYVALLI--
                     MGNCQA DAA + IQHP+ GKV+RLYWP +A++VMR NPGHYVAL++
                     MGNCQAADAAAVVIQHPAEGKVERLYWPATAADVMRKNPGHYVALVVVH
Bdi_chr3:382920    1 agatcggggggggacccgggaggccttcgaggggacaacgctggcgggc
                     tgagaccaccctttaaccagatagtagcccccattgaacgaatctttta
                     gctcgggtggcggcgcgcgggcgcccggccgcccgcgcccccccccccc

Cpa_s110_24       47 ----STTLCPSNSNASNAESVRVTRIKLLRPTDTLVLGQVYRLITTQEV
                              P+ +    A + R+T++KLL+P DTL++GQVYRLIT+Q
                     VSGGAGETDPAVAGGGAAAAARITKVKLLKPRDTLLIGQVYRLITSQ--
Bdi_chr3:382920  148 gtgggggagcgggggggggggaaaagaccaccgaccagcgtccaatc
                     tcggcgacacctcgggcccccgtcatattacgactttgatagttcca
                     cctcctgtcccacaaaattccgccgcgccgcgctgcccgccccccca

Cpa_s110_24       92 MKGLWAKKCAKMKKYQEADHKDGLKPETIPGRRSGPERDTQVAKHERHR
                     -------------------------------------------------
Bdi_chr3:382920  289

Cpa_s110_24      141 SRVAASTNQAGLKSRTWQPSLKSISEAAS
                     -----------------------------
Bdi_chr3:382920  289
//
Gene 1
Gene 1 288
  Exon 1 288 phase 0
     Supporting 1 54 1 18
     Supporting 58 141 19 46
     Supporting 160 288 47 89
//
......

Part of the output of this code is shown below:

Query=Aly_481360
Length=0
hit_string:
Query=Aly_481360
Length=0
hit_string:
......

What's wrong with my code and how can I get the correct result? I'm looking forward to your reply. Thanks very much!

Best regards,
Zackaly

From roy.chaudhuri at gmail.com Wed Aug 11 14:32:39 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 11 Aug 2010 15:32:39 +0100
Subject: [Bioperl-l] using HMMER
In-Reply-To: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
Message-ID: <4C62B487.9090103@gmail.com>

Hi Fayroz,

Your $seq variable contains a Bio::SeqIO object (a biological filehandle), not a Bio::Seq (sequence object).

You need to change that line to:

  my $seqio = Bio::SeqIO->new(-file=>'one_seq.fa', -format=>'fasta');
  my $seq=$seqio->next_seq;

If you have multiple sequences in the file, then you will need to loop over them:

  while (my $seq=$seqio->next_seq) {
      # Code to run Hmmer goes here
  }

Also, I don't think you need to specify -informat for your Bio::Tools::Run::Hmmer object, since you're passing it a sequence object, not a filename.

Hope this helps.
Roy.

From cjfields at illinois.edu Wed Aug 11 15:07:36 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 10:07:36 -0500
Subject: [Bioperl-l] using HMMER
In-Reply-To: <4C62B487.9090103@gmail.com>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com> <4C62B487.9090103@gmail.com>
Message-ID: <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>

might also want to check whether you are using hmmer2 vs hmmer3. not sure if the wrapper works for hmmer3.

chris

From douglas.hoen at gmail.com Wed Aug 11 19:13:49 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 12:13:49 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
Message-ID: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>

Hi,

I am trying to store in a SeqFeature::Store database the results of
searches of translated DNA. The DB contains the original DNA
sequences. For instance, I have done HMMER searches of 6-frame
translations of the sequences stored in the DB. I want to store these
results "at" their (equivalent) DNA positions, which I can calculate.
Preferably, I would like to directly store the SeqFeature::Similarity
objects that I get from parsing these searches. But they are of course
located on different coordinate systems than the DNA, so I guess I
can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
DNA position and then store the Similarity's as sub-SeqFeatures.

I could just set the Similarity's position to the (calculated) DNA
coordinates, or alternately make a new SeqFeature and copy in the
attributes I want. But is there a more elegant solution?

Thanks,
-- Doug

From douglas.hoen at gmail.com Wed Aug 11 20:11:26 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 13:11:26 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
In-Reply-To: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
Message-ID:

One possible answer to my own question: Use
Bio::SeqFeature::PositionProxy's? Would this work?

From scott at scottcain.net Wed Aug 11 20:16:22 2010
From: scott at scottcain.net (Scott Cain)
Date: Wed, 11 Aug 2010 16:16:22 -0400
Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
In-Reply-To:
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
Message-ID:

Hi Doug,

I don't know if any of the things you've thought of would work; I've
never tried it. My inclination would be to express your data in GFF3
and use the standard loader.

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From douglas.hoen at gmail.com Wed Aug 11 20:38:54 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 13:38:54 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
In-Reply-To: References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> Message-ID: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> Hi Scott, Good idea. Would you happen to know of an existing HMMER3 to GFF3 converter? Thanks for your advice, -- Doug On Aug 11, 4:16?pm, Scott Cain wrote: > Hi Doug, > > I don't know if any of the things you've thought of would work; I've > never tried it. ?My inclination would be to express your data in GFF3 > and use the standard loader. > > Scott > > > > > > On Wed, Aug 11, 2010 at 4:11 PM, Doug wrote: > > One possible answer to my own question: Use > > Bio::SeqFeature::PositionProxy's? Would this work? > > > On Aug 11, 3:13?pm, Doug wrote: > >> Hi, > > >> I am trying to store in a SeqFeature::Store database the results of > >> searches of translated DNA. The DB contains the original DNA > >> sequences. For instance, I have done HMMER searches of 6-frame > >> translations of the sequences stored in the DB. I want to store these > >> results "at" their (equivalent) DNA positions, which I can calculate. > >> Preferably, I would like to directly store the SeqFeature::Similarity > >> objects that I get from parsing these searches. But they are of course > >> located on different coordinate systems than the DNA, so I guess I > >> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct > >> DNA position and then store the Similarity's as sub-SeqFeatures. > > >> I could just set the Similarity's position to the (calculated) DNA > >> coordinates, or alternately make a new SeqFeature and copy in the > >> attributes I want. But is there a more elegant solution? > > >> Thanks, > >> -- Doug > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > > Bioperl-l mailing list > > Bioper... 
at lists.open-bio.org
> >http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                  216-392-3087
> Ontario Institute for Cancer Research
>
> _______________________________________________
> Bioperl-l mailing list
> Bioper... at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From douglas.hoen at gmail.com Wed Aug 11 20:53:35 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 13:53:35 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
In-Reply-To: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
Message-ID:

One more note: I did try using PositionProxy but it failed. It doesn't implement seq_id() and so can't be stored in the DB:

------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::SeqFeatureI::seq_id" is not implemented by package Bio::SeqFeature::PositionProxy.
This is not your fault - author of Bio::SeqFeature::PositionProxy should be blamed!
...

On Aug 11, 4:38 pm, Doug wrote:
> Hi Scott,
>
> Good idea. Would you happen to know of an existing HMMER3 to GFF3
> converter?
>
> Thanks for your advice,
> -- Doug
>
> On Aug 11, 4:16 pm, Scott Cain wrote:
> > Hi Doug,
> >
> > I don't know if any of the things you've thought of would work; I've
> > never tried it. My inclination would be to express your data in GFF3
> > and use the standard loader.
> >
> > Scott
> >
> > On Wed, Aug 11, 2010 at 4:11 PM, Doug wrote:
> > > One possible answer to my own question: Use
> > > Bio::SeqFeature::PositionProxy's? Would this work?
> > >
> > > On Aug 11, 3:13 pm, Doug wrote:
> > >> Hi,
> > >>
> > >> I am trying to store in a SeqFeature::Store database the results of
> > >> searches of translated DNA. The DB contains the original DNA
> > >> sequences. For instance, I have done HMMER searches of 6-frame
> > >> translations of the sequences stored in the DB. I want to store these
> > >> results "at" their (equivalent) DNA positions, which I can calculate.
> > >> Preferably, I would like to directly store the SeqFeature::Similarity
> > >> objects that I get from parsing these searches. But they are of course
> > >> located on different coordinate systems than the DNA, so I guess I
> > >> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
> > >> DNA position and then store the Similarity's as sub-SeqFeatures.
> > >>
> > >> I could just set the Similarity's position to the (calculated) DNA
> > >> coordinates, or alternately make a new SeqFeature and copy in the
> > >> attributes I want. But is there a more elegant solution?
> > >>
> > >> Thanks,
> > >> -- Doug

From cjfields at illinois.edu Wed Aug 11 20:45:00 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 15:45:00 -0500
Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
In-Reply-To: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
Message-ID: <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>

HMMER3 is parsed by Bio::SearchIO now in bioperl-live, and I think there is a generic SearchIO->GFF3 script floating around the intertubes somewheres...

chris

From scott at scottcain.net Wed Aug 11 21:05:25 2010
From: scott at scottcain.net (Scott Cain)
Date: Wed, 11 Aug 2010 17:05:25 -0400
Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
In-Reply-To: <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
Message-ID:

Um, yeah, it's in bioperl: bp_search2gff.pl.
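Whichever converter is used, hits from a search of translated DNA still have to be mapped back to nucleotide coordinates before they are loaded, which is the calculation Doug mentions at the start of this thread. A minimal sketch of that arithmetic follows. It assumes frames are numbered 1..3 on the forward strand and -1..-3 on the reverse strand, and that frame k begins at base k of the strand being read. The sub names, the "Chr1" sequence ID, the "protein_match" type, the 30,427,671-base length and the frame-1 choice for the example hit are all hypothetical and are not taken from the thread or from bp_search2gff.pl.

```perl
use strict;
use warnings;

# Map a 1-based, inclusive peptide interval onto the DNA it was translated from.
# Assumed convention (hypothetical, not from the thread): frames 1..3 are
# forward, -1..-3 are reverse, and frame k starts at base k of the strand read.
sub pep_to_dna {
    my ($pep_start, $pep_end, $frame, $dna_len) = @_;
    die "frame must be one of -3..-1 or 1..3\n"
        unless $frame =~ /^-?[123]$/;
    my $offset = abs($frame) - 1;
    my $start  = ($pep_start - 1) * 3 + $offset + 1;   # first base of first codon
    my $end    = $pep_end * 3 + $offset;               # last base of last codon
    return ($start, $end, '+') if $frame > 0;
    # Reverse frames: flip the interval onto forward-strand coordinates.
    return ($dna_len - $end + 1, $dna_len - $start + 1, '-');
}

# Format one nine-column GFF3 line. The model name is kept in both Name and
# Target so that it survives the conversion; phase is '.' for non-CDS features.
sub gff3_line {
    my (%f) = @_;
    my $attrs = "Name=$f{model};Target=$f{model} $f{hmm_from} $f{hmm_to}";
    return join("\t",
        @f{qw(seqid source type start end score strand)}, '.', $attrs) . "\n";
}

# Example values: an RVT_2 domain hit at peptide positions 1898330..1898578,
# matching model positions 1..245 with score 307.3 (frame 1 is an assumption).
my ($start, $end, $strand) = pep_to_dna(1898330, 1898578, 1, 30_427_671);
print gff3_line(
    seqid  => 'Chr1',          # hypothetical DNA sequence ID in the database
    source => 'HMMER3',
    type   => 'protein_match', # assumed feature type
    start  => $start,
    end    => $end,
    score  => 307.3,
    strand => $strand,
    model  => 'RVT_2',
    hmm_from => 1,
    hmm_to   => 245,
);
```

A file of such lines could then be handed to the standard GFF3 loader, as suggested above, so that no custom SeqFeature classes are needed.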
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                  216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu Wed Aug 11 21:07:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 16:07:20 -0500
Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
In-Reply-To:
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
Message-ID:

For some reason I thought there was a more up-to-date one somewhere. Ah well, can't keep track of all the code in bioperl :>

chris

From douglas.hoen at gmail.com Wed Aug 11 21:11:20 2010
From: douglas.hoen at gmail.com (Douglas Hoen)
Date: Wed, 11 Aug 2010 17:11:20 -0400
Subject: [Bioperl-l] How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?
In-Reply-To:
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com> <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com> <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
Message-ID:

Great, thanks so much for the info.

From Russell.Smithies at agresearch.co.nz Wed Aug 11 21:31:32 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 12 Aug 2010 09:31:32 +1200
Subject: [Bioperl-l] AlignIO and Gbrowse_syn
In-Reply-To:
References:
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>

I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague.
If GBrowse_syn is using .maf format, does AlignIO need more work?
Any comments?

--Russell

I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output). GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce clustalw, I've tried parsing LASTZ maf output to clustalw (and other alignment formats) using AlignIO; however, I run into the following issues:

* Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification)
* The coordinate system for reverse strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-rev complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result all reverse match coordinates in the resulting clustalw output file are incorrect.
* AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them

I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned.

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From cjfields at illinois.edu Wed Aug 11 22:02:38 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 17:02:38 -0500
Subject: [Bioperl-l] AlignIO and Gbrowse_syn
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
Message-ID:

Russell,

We have had very few requests to support .maf until recently, which is why there has been little done with it. We welcome any help to improve it.

chris

From douglas.hoen at gmail.com Thu Aug 12 05:59:37 2010
From: douglas.hoen at gmail.com (Doug Hoen)
Date: Wed, 11 Aug 2010 22:59:37 -0700 (PDT)
Subject: [Bioperl-l] HMMER3 to GFF3
Message-ID: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com>

Hi,

I am trying to convert HMMER3 (hmmscan) output files into GFF3 files. Based on previous advice (see the thread, "How to store results of searches of translated DNA in SeqFeature::Store database of the original DNA?"), I have installed bioperl-live for its new HMMER3 parsing capabilities (in SearchIO) and am trying to use bp_search2gff.pl to do the file conversion.

The hmmscan was done on translated chromosome sequences with conserved domain models. I want to get the GFF 'start' and 'end' columns to be based on these coordinates, not those of the models. To do this (with my files), it seems I need to use the option "--type hit". However, this changes the "Target" sequence name from the model name to chromosome name, and the model name does not appear anywhere in the output (see below).

Could someone please confirm whether the results are incorrect and, if so, perhaps suggest a fix? It may well be that this problem is due to the unusual way I am using hmmscan, rather than a problem with HMMER3 parsing...?

Many thanks,
-- Doug

========================================================
Here's what it looks like if I do *not* use the "--type hit" option. (RVT_2 is a conserved domain name. I need this in the output.)
COMMAND:
------------------
bp_search2gff.pl -i ../chr1-tesigsv2.hmmscan -o chr1-tesigsv2-hmmscan-original-locations-v2.gff3 --format hmmer3 --source HMMER3 --version 3 --component

OUTPUT:
------------------
==> chr1-tesigsv2-hmmscan-original-locations-v2.gff3 <==
##gff-version 3
Chr1_1 chromosome Component 1 10142557 . . 1 sequence=Chr1_1
Chr1_1 HMMER3 similarity 1 245 307.3 . 0 Target=Sequence:RVT_2 1898330 1898579
Chr1_1 HMMER3 similarity 1 244 329.5 . 0 Target=Sequence:RVT_2 2573551 2573796
Chr1_1 HMMER3 similarity 1 245 308.8 . 0 Target=Sequence:RVT_2 3159685 3159930
Chr1_1 HMMER3 similarity 1 102 108.2 . 0 Target=Sequence:RVT_2 3438684 3438791
Chr1_1 HMMER3 similarity 2 245 277.2 . 0 Target=Sequence:RVT_2 3566642 3566891
Chr1_1 HMMER3 similarity 13 213 251.4 . 0 Target=Sequence:RVT_2 4251160 4251373
Chr1_1 HMMER3 similarity 1 244 310.6 . 0 Target=Sequence:RVT_2 4252791 4253036
Chr1_1 HMMER3 similarity 6 99 94.2 . 0 Target=Sequence:RVT_2 4271555 4271653

========================================================
And here's what it looks like if I *do* use the "--type hit" option. The coordinates look good but the model name has disappeared (and the Target=Sequence seems wrong).

COMMAND:
------------------
bp_search2gff.pl -i ../chr1-tesigsv2.hmmscan -o chr1-tesigsv2-hmmscan-original-locations-v3.gff3 --format hmmer3 --type hit --source HMMER3 --version 3 --component

OUTPUT:
------------------
==> chr1-tesigsv2-hmmscan-original-locations-v3.gff3 <==
##gff-version 3
RVT_2 HMMER3 similarity 1898330 1898579 307.3 . 0 Target=Sequence:Chr1_1 1 245
RVT_2 HMMER3 similarity 2573551 2573796 329.5 . 0 Target=Sequence:Chr1_1 1 244
RVT_2 HMMER3 similarity 3159685 3159930 308.8 . 0 Target=Sequence:Chr1_1 1 245
RVT_2 HMMER3 similarity 3438684 3438791 108.2 . 0 Target=Sequence:Chr1_1 1 102
RVT_2 HMMER3 similarity 3566642 3566891 277.2 . 0 Target=Sequence:Chr1_1 2 245
RVT_2 HMMER3 similarity 4251160 4251373 251.4 . 0 Target=Sequence:Chr1_1 13 213
RVT_2 HMMER3 similarity 4252791 4253036 310.6 . 0 Target=Sequence:Chr1_1 1 244
RVT_2 HMMER3 similarity 4271555 4271653 94.2 . 0 Target=Sequence:Chr1_1 6 99
RVT_2 HMMER3 similarity 4481232 4481477 281.5 . 0 Target=Sequence:Chr1_1 2 245

========================================================
And here's what the input HMMER3 result file looks like:

==> ../chr1-tesigsv2.hmmscan <==
# hmmscan :: search sequence(s) against a profile database
# HMMER 3.0rc1 (February 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query sequence file:             [...]/whole_chromosomes/translated/chr1.pep
# target HMM database:             [...]/signatures/Pfam-A.hmm
# output directed to file:         chr1-tesigsv2.hmmscan
# model-specific thresholding:     TC cutoffs
# Max sensitivity mode:            on [all heuristic filters off]
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       Chr1_1  [L=10142557]
Description: CHROMOSOME dumped from ADB: Jun/20/09 14:53; last updated: 2009-02-02
Scores for complete sequence (score includes all domains):
   --- full sequence ---    --- best 1 domain ---    -#dom-
    E-value   score  bias    E-value  score  bias     exp   N  Model            Description
    -------  ------ -----    ------- ------ -----    ----  --  --------         -----------
          0  3971.3  17.7   2.6e-101  329.5   0.6    19.4  17  RVT_2            Reverse transcriptase (RNA-dependent DNA pol
          0  3040.7  23.0     1e-206  678.6   0.1    12.2  10  ATHILA           ATHILA ORF-1 family
          0  1681.9  79.1    1.9e-46  149.9   0.4    28.0  21  RVT_1            Reverse transcriptase (RNA-dependent DNA pol
          0  1446.9  27.4    3.6e-95  309.1   0.2     7.6   5  Transposase_21   Transposase family tnp2
          0  1168.4  50.3    1.4e-29   94.4   0.3    21.5  18  rve              Integrase core domain
   9.1e-300   960.0  69.0    3.1e-20   64.0   0.0    28.8  20  Retrotrans_gag   Retrotransposon gag protein
   1.5e-180   577.0  31.6    1.6e-29   93.1   1.5     9.5   8  Transposase_23   TNP1/EN/SPM transposase
   4.4e-143   456.9  82.8    4.8e-18   56.4   0.1    12.9  11  MuDR             MuDR family transposase
   3.8e-116   371.4  19.6    1.2e-18   58.9   0.0    13.7   7  MULE             MULE transposase domain
   7.1e-106   344.1   5.6    2.7e-97  316.0   0.0     3.6   1  Plant_tran       Plant transposon protein
    9.2e-85   275.4  22.9    5.4e-60  194.4   0.3     6.4   3  Peptidase_C48    Ulp1 protease family, C-terminal catalytic d
    1.8e-77   249.8  24.8    4.4e-28   89.8   0.1    10.8   3  Transposase_24   Plant transposase (Ptta/En/Spm family)
    2.8e-47   150.1   1.2    5.5e-23   72.3   0.2     3.7   2  hATC             hAT family dimerisation domain
    5.7e-28    89.4   3.6    4.7e-13   41.1   0.0     6.5   1  RVP_2            Retroviral aspartyl protease
      1e-16    53.3   0.0    4.4e-07   22.1   0.0     6.8   1  RnaseH           RNase H
    1.5e-08    25.3   2.4    0.00016   12.1   0.0     4.9   0  Transposase_mut  Transposase, Mutator family

Domain annotation for each model (and alignments):
>> RVT_2  Reverse transcriptase (RNA-dependent DNA polymerase)
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  307.3   0.0   5.3e-95   1.5e-94       1     245 [. 1898330 1898578 .. 1898330 1898579 .. 0.99
   2 !  329.5   0.6  8.9e-102  2.6e-101       1     244 [. 2573551 2573794 .. 2573551 2573796 .. 0.99
   3 !  308.8   0.0   1.8e-95   5.2e-95       1     245 [. 3159685 3159929 .. 3159685 3159930 .. 0.99
   4 !  108.2   0.1   3.4e-34   9.7e-34       1     102 [. 3438684 3438785 .. 3438684 3438791 .. 0.96
   5 !  277.2   0.0   8.1e-86   2.3e-85       2     245 .. 3566643 3566890 .. 3566642 3566891 .. 0.99
   6 !  251.4   0.0   6.2e-78   1.8e-77      13     213 .. 4251164 4251364 .. 4251160 4251373 .. 0.97
   7 !  310.6   0.0   5.1e-96   1.5e-95       1     244 [. 4252791 4253034 .. 4252791 4253036 .. 0.99
   8 !   94.2   0.1   6.1e-30   1.8e-29       6      99 .. 4271560 4271653 .. 4271555 4271653 .. 0.97
   9 !  281.5   0.9   3.9e-87   1.1e-86       2     245 .. 4481233 4481476 .. 4481232 4481477 .. 0.98
  10 !  248.2   0.0   5.9e-77   1.7e-76       1     190 [. 4521040 4521233 .. 4521040 4521237 .. 0.97
  11 !  314.6   0.1   3.2e-97   9.2e-97       1     244 [. 4652456 4652702 .. 4652456 4652704 .. 0.98
  12 !   40.7   0.0   1.3e-13   3.7e-13       2      92 .. 5219607 5219697 .. 5219606 5219701 .. 0.90
  13 !  221.0   0.0   1.2e-68   3.4e-68       2     245 ..
5241015 5241258 .. 5241014 5241259 .. 0.95 14 ! 81.2 0.0 5.6e-26 1.6e-25 2 115 .. 5501957 5502070 .. 5501956 5502080 .. 0.92 15 ! 272.4 0.0 2.3e-84 6.7e-84 30 245 .. 6483057 6483271 .. 6483050 6483272 .. 0.98 16 ! 178.5 0.0 1.2e-55 3.3e-55 81 244 .. 7250563 7250726 .. 7250552 7250728 .. 0.96 17 ! 313.7 0.0 5.9e-97 1.7e-96 2 245 .. 7707124 7707367 .. 7707123 7707368 .. 0.99 Alignments for each domain: == domain 1 score: 307.3 bits; conditional E-value: 5.3e-95 RVT_2 1 nktwelvelpkgkkviglkWvfklKlnedgeierykARlVakGftqkegidyeetfspvvklesirlllalaaekkleleqlDvktaFLngelee 95 n tw +++lp gkk++g+kWv+k+Kln+dg++erykARlVakG+tq+eg+dy +tfspv+kl++++ll+a+aa+k+++l+qlD+++aFLng+l+e Chr1_1 1898330 NGTWVVCSLPVGKKAVGCKWVYKIKLNADGSLERYKARLVAKGYTQTEGLDYVDTFSPVAKLTTVKLLIAVAAAKGWSLSQLDISNAFLNGSLDE 1898424 68********************************************************************************************* PP RVT_2 96 evYvkqpeGfedkkk....enkvckLkkslYgLkqapraWyeklsevllklgfkkseadkclfvkkkeeeliivllYVDDlliagsskelieelk 186 e+Y++ p+G++ ++ +n vc+LkkslYgLkqa+r+Wy k+se l++lgf+ +s+ d++lf++k++++ ++vl+YVDD++ia+s +++ e l Chr1_1 1898425 EIYMTLPPGYSPRQGdsfpPNAVCRLKKSLYGLKQASRQWYLKFSESLKALGFTQSSGDHTLFTRKSKNSYMAVLVYVDDIIIASSCDRETELLR 1898519 ***********998889999*************************************************************************** PP RVT_2 187 eeLkkefemkdlgelkyfLgleierkeegillsqekyvkkllkkfkmedakpvstplea 245 ++L+++ +++dlg+l+yfLglei+r+++gi+++q+ky+ +ll+++++ +k++s +p+e+ Chr1_1 1898520 DALQRSSKLRDLGTLRYFLGLEIARNTDGISICQRKYTLELLAETGLLGCKSSSVPMEP 1898578 *********************************************************97 PP == domain 2 score: 329.5 bits; conditional E-value: 8.9e-102 RVT_2 1 nktwelvelpkgkkviglkWvfklKlnedgeierykARlVakGftqkegidyeetfspvvklesirlllalaaekkleleqlDvktaFLngelee 95 n+twel++lp+g+k+ig+kWv+k K+n++ge+erykARlVakG++q++gidy+e +f+pv++le++rl+++laa++k++++q+D k aFLng++ee Chr1_1 2573551 NDTWELTSLPNGHKAIGVKWVYKAKKNSKGEVERYKARLVAKGYSQRAGIDYDEVFAPVARLETVRLIISLAAQNKWKIHQMDFKLAFLNGDFEE 2573645 
79********************************************************************************************* PP RVT_2 96 evYvkqpeGfedkkkenkvckLkkslYgLkqapraWyeklsevllklgfkkseadkclfvkkkeeeliivllYVDDlliagsskelieelkeeLk 190 evY++qp+G+ +k++e+kv++Lkk+lYgLkqapraW++++++++++++f k+ + +++l++k ++e+++i +lYVDDl+++g++ ++ ee+k+e++ Chr1_1 2573646 EVYIEQPQGYIVKGEEDKVLRLKKALYGLKQAPRAWNTRIDKYFKEKDFIKCPYEHALYIKIQKEDILIACLYVDDLIFTGNNPSMFEEFKKEMT 2573740 *********************************************************************************************** PP RVT_2 191 kefemkdlgelkyfLgleierkeegillsqekyvkkllkkfkmedakpvstple 244 kefem+d+g ++y+Lg+e+++++++i+++qe y+k++lkkfkm+d++pv tp +e Chr1_1 2573741 KEFEMTDIGLMSYYLGIEVKQEDNRIFITQEGYAKEVLKKFKMDDSNPVCTPME 2573794 ****************************************************97 PP From kai.blin at biotech.uni-tuebingen.de Thu Aug 12 12:16:45 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Thu, 12 Aug 2010 14:16:45 +0200 Subject: [Bioperl-l] HMMER3 to GFF3 In-Reply-To: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com> References: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com> Message-ID: <20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de> On Wed, 11 Aug 2010 22:59:37 -0700 (PDT) Doug Hoen wrote: Hi Doug, > Could someone please confirm whether the results are incorrect and, if > so, perhaps suggest a fix? It may well be that this problem is due to > the unusual way I am using hmmscan, rather than a problem with HMMER3 > parsing...? Can you please attach your hmmer input file? Along the way something inserted line breaks, making it unreadable. It might well be possible that the HMMer3 parser still handles a little different from the HMMer2 parser, I haven't tried that script. Cheers, Kai -- Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-University of Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From kai.blin at biotech.uni-tuebingen.de Thu Aug 12 12:09:00 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Thu, 12 Aug 2010 14:09:00 +0200 Subject: [Bioperl-l] using HMMER In-Reply-To: <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu> References: <603590.1072.qm@web112620.mail.gq1.yahoo.com> <4C62B487.9090103@gmail.com> <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu> Message-ID: <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de> On Wed, 11 Aug 2010 10:07:36 -0500 Chris Fields wrote: > might also want to check whether you are using hmmer2 vs hmmer3. not sure if the wrapper works for hmmer3. It might if you initialize it using my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 'hmmer3'); at least for the programs that still exist with the same name in hmmer3. It won't support hmmer3 using the default options, though. If I have some spare time, I'll look into this, no promises on the timeframe, though. Cheers, Kai -- Dipl.-Inform.
Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-University of Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From cjfields at illinois.edu Thu Aug 12 15:28:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 12 Aug 2010 10:28:50 -0500 Subject: [Bioperl-l] using HMMER In-Reply-To: <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de> References: <603590.1072.qm@web112620.mail.gq1.yahoo.com> <4C62B487.9090103@gmail.com> <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu> <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de> Message-ID: <8129B813-5B15-4DDC-AB0D-5D95EFFCE78D@illinois.edu> On Aug 12, 2010, at 7:09 AM, Kai Blin wrote: > On Wed, 11 Aug 2010 10:07:36 -0500 > Chris Fields wrote: > >> might also want to check whether you are using hmmer2 vs hmmer3. not sure if the wrapper works for hmmer3. > > It might if you initialize it using > my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 'hmmer3'); > > at least for the programs that still exist with the same name in > hmmer3. It won't support hmmer3 using the default options, though. > > If I have some spare time, I'll look into this, no promises on the > timeframe, though. > > Cheers, > Kai > > -- > Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de > Institute for Microbiology and Infection Medicine > Division of Microbiology/Biotechnology > Eberhard-Karls-University of Tübingen > Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 > D-72076 Tübingen Fax : ++49 7071 29-5979 > Deutschland > Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben Would be nice to convert this over (at some point) to use Mark's CommandExts.
I'm thinking of doing this with Infernal, so if I get that running it wouldn't be terribly difficult to get hmmer3 working as well. chris From cjfields at illinois.edu Thu Aug 12 16:14:44 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 12 Aug 2010 11:14:44 -0500 Subject: [Bioperl-l] using HMMER In-Reply-To: <857996.8184.qm@web112610.mail.gq1.yahoo.com> References: <603590.1072.qm@web112620.mail.gq1.yahoo.com> <4C62B487.9090103@gmail.com> <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu> <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de> <8129B813-5B15-4DDC-AB0D-5D95EFFCE78D@illinois.edu> <857996.8184.qm@web112610.mail.gq1.yahoo.com> Message-ID: <43FD0A31-DB95-4AE9-B678-937EE6346BC2@illinois.edu> Fayroz, Please keep responses on-list. It seems you need to update your local bioperl, as 'hmmer3' is a recent addition, after 1.6.1. It will be in 1.6.2 if I can get the time to make a release :> chris On Aug 12, 2010, at 10:58 AM, fayroz wrote: > dear chris, > from the HMMER documentation i found this statement > "The HMMER programs must either be in your path, or you must set the environment > variable HMMERDIR to point to their location." > will it solve the problem? > how can i do it please? i work on the Windows 7 platform > > > when i applied this line with hmmer3 > my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => > 'hmmer3'); > > this output appears: > > Bio::SearchIO: hmmer3 cannot be found > > and when i try with hmmer2 the same output appears: > > Exception > ------------- EXCEPTION ------------- > MSG: Failed to load module Bio::SearchIO::hmmer3. Can't locate > Bio\SearchIO\hmmer3.pm in @INC (@INC contains: D:\Perl\bin\ D:/Perl/site/lib > D:/Perl/lib .) at D:/Perl/site/lib/Bio/Root/Root.pm line 439, line 1.
> STACK Bio::Root::Root::_load_module D:/Perl/site/lib/Bio/Root/Root.pm:441 > STACK (eval) D:/Perl/site/lib/Bio/SearchIO.pm:446 > STACK Bio::SearchIO::_load_format_module D:/Perl/site/lib/Bio/SearchIO.pm:445 > STACK Bio::SearchIO::new D:/Perl/site/lib/Bio/SearchIO.pm:189 > STACK Bio::Tools::Run::Hmmer::_run D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:431 > STACK Bio::Tools::Run::Hmmer::hmmsearch > D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:353 > STACK toplevel C:\Users\Khaled\AppData\Local\Temp\dzprltmp.pl:13 > ------------------------------------- > For more information about the SearchIO system please see the SearchIO docs. > This includes ways of checking for formats at compile time, not run time > '--informat' is not recognized as an internal or external command, > operable program or batch file. > Can't call method "next_result" on an undefined value at > C:\Users\Khaled\AppData\Local\Temp\dzprltmp.pl line 15, line 1. > > > > ----- Original Message ---- > From: Chris Fields > To: Kai Blin > Cc: fayroz ; bioperl-l at bioperl.org > Sent: Thu, August 12, 2010 6:28:50 PM > Subject: Re: [Bioperl-l] using HMMER > > On Aug 12, 2010, at 7:09 AM, Kai Blin wrote: > >> On Wed, 11 Aug 2010 10:07:36 -0500 >> Chris Fields wrote: >> >>> might also want to check whether you are using hmmer2 vs hmmer3. not sure if >>> the wrapper works for hmmer3. >> >> It might if you initialize it using >> my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => >> 'hmmer3'); >> >> at least for the programs that still exist with the same name in >> hmmer3. It won't support hmmer3 using the default options, though. >> >> If I have some spare time, I'll look into this, no promises on the >> timeframe, though. >> >> Cheers, >> Kai >> >> -- >> Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de >> Institute for Microbiology and Infection Medicine >> Division of Microbiology/Biotechnology >> Eberhard-Karls-University of Tübingen >> Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 >> D-72076 Tübingen Fax : ++49 7071 29-5979 >> Deutschland >> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben > > Would be nice to convert this over (at some point) to use Mark's CommandExts. > I'm thinking of doing this with Infernal, so if I get that running it wouldn't > be terribly difficult to get hmmer3 working as well. > > chris > > >
From jason at bioperl.org Thu Aug 12 18:37:11 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 12 Aug 2010 11:37:11 -0700 Subject: [Bioperl-l] Other: Script for editing alignments? In-Reply-To: <20100812061811.4D92468539@evol.biology.mcmaster.ca> References: <20100812061811.4D92468539@evol.biology.mcmaster.ca> Message-ID: <4C643F57.3040408@bioperl.org>
Hi Si - This is pretty straightforward with Bioperl. Here's one solution:

#!/usr/bin/perl -w
use strict;
use Bio::AlignIO;

my $in  = Bio::AlignIO->new(-format => 'fasta', -file => shift @ARGV);
my $out = Bio::AlignIO->new(-format => 'fasta');
while( my $aln = $in->next_aln ) {
  for my $seq ( $aln->each_seq ) {
    my $str = $seq->seq;
    if( $str =~ /^(-+)/ ) {
      my $rep = length($1);
      # replace from the 5' end
      substr($str,0,$rep,'N'x$rep);
    }
    if( $str =~ /(-+)$/ ) {
      my $rep = length($1);
      # replace from the 3' end
      substr($str,-1 * $rep,length($str),'N'x$rep);
    }
    $seq->seq($str);
  }
  # don't print the /start-end info in the FASTA ID
  $aln->set_displayname_flat(1);
  $out->write_aln($aln);
}

-jason
evoldir at evol.biology.mcmaster.ca wrote, On 8/11/10 11:18 PM: > Dear All > > Alignment programs like MUSCLE and Clustal often output alignments with > "-" symbols indicating indels (real events) within sequence alignments, > but also "-" symbols at the 5' and 3' ends of sequences.
The latter > however, are not real evolutionary events and really should be Ns > (missing data), depending on the sort of analytical framework you use. > > If there is sufficient heterogeneity and signal within the 5' and 3' > ends of sequences, the "-"s can be manually edited in a text editor to > Ns with no problem, if the alignment is small. If it is large (e.g. 2000 > seqs), or there are lots of alignments, it becomes a lengthy task. > > I'm investigating such alignments presently and so was wondering if > anyone had a clever way of implementing sed, or had a Perl script that > would perform such a task. Simply put, it would require replacing the 5' > and 3' "-" below only with Ns and leaving the within sequence "-"s > alone. The sequences naturally may span more than one line. > > >Taxon 1 > -----ATGCTG--TGACTG----TGACT--- > >Taxon 2 > ---GTATGTTG--TGACTGCT--TGACCGTC > > to > > >Taxon 1 > NNNNNATGCTG--TGACTG----TGACTNNN > >Taxon 2 > NNNGTATGTTG--TGACTGCT--TGACCGTC > > It's a simple task, but I haven't seen any scripts out there to do the job. > > If there are any scripters out there who can help, or if someone knows > of an application that would help, it would be great to hear from you. > > With best wishes and thanks > > Si Creer > > From genehack at genehack.org Fri Aug 13 00:32:07 2010 From: genehack at genehack.org (John SJ Anderson) Date: Thu, 12 Aug 2010 20:32:07 -0400 Subject: [Bioperl-l] Bio::SeqFeature::SimilarityPair->from_searchResult()? In-Reply-To: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com> References: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com> Message-ID: On Aug 10, 2010, at 21:54 , Douglas Hoen wrote: > I was wondering why the Synopsis in the docs for Bio::SeqFeature::SimilarityPair has the following: > $sim_pair = Bio::SeqFeature::SimilarityPair->from_searchResult($blastHit); > > There doesn't actually seem to be a from_searchResult method. Am I missing something? 
No, it looks like that method got removed back in 2002 as a part of moving to Bio::SearchIO (which was removed still later...): Unfortunately, the commit didn't update the documentation. From the tiny little bit I've looked at the code, it looks like you should just be calling the 'new()' method instead (note that it takes a set of arguments, not just a BLAST hit object). Hope this helps -- if you should happen to have the tuits, a patch to update the documentation to reflect the current interface would be awesome... chrs, john.
From david.breimann at gmail.com Fri Aug 13 13:01:10 2010 From: david.breimann at gmail.com (David Breimann) Date: Fri, 13 Aug 2010 16:01:10 +0300 Subject: [Bioperl-l] Problem executing bp_genbank2gff3.pl from another perl script Message-ID:
Hi, I am trying to run bp_genbank2gff3.pl from another perl script that gets a GenBank file as its argument. This does not work (no output files are generated):

my $command = "bp_genbank2gff3.pl -y -o /tmp $ARGV[0]";
open( my $command_out, "-|", $command );
close $command_out;

but this does:

open( my $command_out, "-|", $command );
sleep 3; # why do I need to sleep?
close $command_out;

Why? I thought that close is supposed to block until the command is done: Closing any piped filehandle causes the parent process to wait for the child to finish... (see http://perldoc.perl.org/functions/open.html). Thanks Dave
From jun.yin at ucd.ie Fri Aug 13 13:36:34 2010 From: jun.yin at ucd.ie (Jun Yin) Date: Fri, 13 Aug 2010 14:36:34 +0100 Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency Message-ID: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
Hi, all, I am the Google Summer of Code student working on Bio::Align subsystem refactoring. The code (Bio::SimpleAlign) I re-implemented has now passed nearly all the tests, except a few tests on seq/start-end checking. But here comes a problem. This may be an old issue: the Bio::LocatableSeq end assignment and end checking are inconsistent.
The current end checking method is based on: $end=$seq->_ungapped_len+$seq->start-1 However, this checking may not fit the real world case. The inconsistency usually happens when a few columns of the sequence are removed. For example: my $a = Bio::LocatableSeq->new( -id => 'a', -strand => 1, -seq => '-tcgatc-atcgatcg', -start => 30, -end => 43 ); If we remove the 1st, 8th and the last columns $a->seq() will be 'tcgatcatcgatc' $a->_ungapped_len==12 Actually, in the real world, the first residue will still be 30 (the old $seq->start), and the last residue is the residue before the 43 (the old $seq->end), thus 42. But if you call a validation, the calculation is $a->_ungapped_len+$a->start-1=12+30-1=41 So the reassignment of the $seq->end will not pass the validation. So unless you save the information to a new sequence object, the original position information will be lost anyway. But in some cases, we have to change the sequence in its original sequence object .. What is your suggestion on this issue? A. pass the test and lose the information #convenient in coding but the start-end annotation is not right any more B. keep the information and forget the test #the object will still remember where the last residue was in the original sequence. But is it really meaningful at all? Because all the other residues may come from nowhere C. Neither of above #any other suggestions? Cheers, Jun Yin Ph.D. student in U.C.D. Bioinformatics Laboratory Conway Institute University College Dublin From jessica.sun at gmail.com Fri Aug 13 15:06:46 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Fri, 13 Aug 2010 11:06:46 -0400 Subject: [Bioperl-l] Add sequence feature Message-ID: Does anyone knows how to open a genbank file, add new feature and then save a new genbank file with new feature added in bioperl ? 
thx -- Jessica Jingping Sun From jessica.sun at gmail.com Fri Aug 13 15:27:10 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Fri, 13 Aug 2010 11:27:10 -0400 Subject: [Bioperl-l] Add sequence feature In-Reply-To: <4C6562E0.7090008@gmail.com> References: <4C6562E0.7090008@gmail.com> Message-ID: unfortunately. I want to add the feature to the sequence object I got from the Genbank file, I do not mind to save a new genbank file but these new genbank file contains the original genbank format and info I got plus the new feature tags I need to added to. Any quick solution to this? thx Jessica On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri wrote: > Hi Jessica. > > You need to use Bio::SeqIO to read in the GenBank file to a BioPerl > sequence object, and to write your new GenBank file: > http://www.bioperl.org/wiki/HOWTO:SeqIO > > To add a new feature follow the instructions here: > > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences > > (except that you are adding the feature to the sequence object you got from > the Genbank file, not a new Bio::Seq object). > > Cheers. > Roy. > > > On 13/08/2010 16:06, Jessica Sun wrote: > >> Does anyone knows how to open a genbank file, add new feature and then >> save >> a new genbank >> file with new feature added in bioperl ? >> >> thx >> >> > -- Jessica Jingping Sun From roy.chaudhuri at gmail.com Fri Aug 13 15:21:04 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Fri, 13 Aug 2010 16:21:04 +0100 Subject: [Bioperl-l] Add sequence feature In-Reply-To: References: Message-ID: <4C6562E0.7090008@gmail.com> Hi Jessica. 
You need to use Bio::SeqIO to read in the GenBank file to a BioPerl sequence object, and to write your new GenBank file: http://www.bioperl.org/wiki/HOWTO:SeqIO To add a new feature follow the instructions here: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences (except that you are adding the feature to the sequence object you got from the Genbank file, not a new Bio::Seq object). Cheers. Roy. On 13/08/2010 16:06, Jessica Sun wrote: > Does anyone knows how to open a genbank file, add new feature and then save > a new genbank > file with new feature added in bioperl ? > > thx > From roy.chaudhuri at gmail.com Fri Aug 13 15:37:20 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Fri, 13 Aug 2010 16:37:20 +0100 Subject: [Bioperl-l] Add sequence feature In-Reply-To: References: <4C6562E0.7090008@gmail.com> Message-ID: <4C6566B0.60706@gmail.com> I'm not sure I understand, do you mean that you want to load just the sequence from the GenBank file (ignoring the existing annotation), then add your own features? There are instructions on how to do that here: http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder On 13/08/2010 16:27, Jessica Sun wrote: > unfortunately. I want to add the feature to the sequence object I got > from the Genbank file, I do not mind to save a new genbank file but > these new genbank file contains the original genbank format and info I > got plus the new feature tags I need to added to. Any quick solution to > this? > > thx > > Jessica > > > > On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri > wrote: > > Hi Jessica. 
> > You need to use Bio::SeqIO to read in the GenBank file to a BioPerl > sequence object, and to write your new GenBank file: > http://www.bioperl.org/wiki/HOWTO:SeqIO > > To add a new feature follow the instructions here: > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences > > (except that you are adding the feature to the sequence object you > got from the Genbank file, not a new Bio::Seq object). > > Cheers. > Roy. > > > On 13/08/2010 16:06, Jessica Sun wrote: > > Does anyone knows how to open a genbank file, add new feature > and then save > a new genbank > file with new feature added in bioperl ? > > thx > > > > > > -- > Jessica Jingping Sun From roy.chaudhuri at gmail.com Fri Aug 13 15:57:27 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Fri, 13 Aug 2010 16:57:27 +0100 Subject: [Bioperl-l] Add sequence feature In-Reply-To: References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> Message-ID: <4C656B67.5020402@gmail.com> Please remember to copy replies to the mailing list. You can loop over the features in your Bio::Seq object: for my $feat ($seq->get_SeqFeatures) { # do something } And once you have found the feature you want to modify, you can add a tag using something like: $feat->add_tag_value('note',"this is a note"); When you're finished you can write out the modified sequence object to a new GenBank file. On 13/08/2010 16:40, Jessica Sun wrote: > no i want to load the genbank file with existing features and I need to > add some new feature tags to the existing ones and then save to a new > update genbank file for local usage. I just not quite good on how to > easily merge the two steps you recommended into one in a neat way. > > thx > > > On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri > wrote: > > I'm not sure I understand, do you mean that you want to load just > the sequence from the GenBank file (ignoring the existing > annotation), then add your own features? 
There are instructions on > how to do that here: > http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder > > > On 13/08/2010 16:27, Jessica Sun wrote: > > unfortunately. I want to add the feature to the sequence object > I got > from the Genbank file, I do not mind to save a new genbank file but > these new genbank file contains the original genbank format and > info I > got plus the new feature tags I need to added to. Any quick > solution to > this? > > thx > > Jessica > > > > On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri > > >> wrote: > > Hi Jessica. > > You need to use Bio::SeqIO to read in the GenBank file to a > BioPerl > sequence object, and to write your new GenBank file: > http://www.bioperl.org/wiki/HOWTO:SeqIO > > To add a new feature follow the instructions here: > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences > > (except that you are adding the feature to the sequence > object you > got from the Genbank file, not a new Bio::Seq object). > > Cheers. > Roy. > > > On 13/08/2010 16:06, Jessica Sun wrote: > > Does anyone knows how to open a genbank file, add new > feature > and then save > a new genbank > file with new feature added in bioperl ? > > thx > > > > > > -- > Jessica Jingping Sun > > > > > > -- > Jessica Jingping Sun From jessica.sun at gmail.com Fri Aug 13 17:06:32 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Fri, 13 Aug 2010 13:06:32 -0400 Subject: [Bioperl-l] Add sequence feature In-Reply-To: <4C656B67.5020402@gmail.com> References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> <4C656B67.5020402@gmail.com> Message-ID: Thanks. I somehow get these error messages. --------------------- WARNING --------------------- MSG: Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module. Attempting to dump, but may fail! 
--------------------------------------------------- Can't locate object method "seq" via package "Bio::SeqIO::genbank" at /Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, line 447. by doing this, my $feat = new Bio::SeqFeature::Generic(-start =>20, -end => $40, -primary_tag => 'newfeature' ); $feat->add_tag_value("note","this is notes"); $f->add_SeqFeature($feat); ## f is original feature pointer $io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" ); $io->write_seq($seqio_object); On Fri, Aug 13, 2010 at 11:57 AM, Roy Chaudhuri wrote: > Please remember to copy replies to the mailing list. > > You can loop over the features in your Bio::Seq object: > for my $feat ($seq->get_SeqFeatures) { # do something } > > And once you have found the feature you want to modify, you can add a tag > using something like: > $feat->add_tag_value('note',"this is a note"); > > When you're finished you can write out the modified sequence object to a > new GenBank file. > > > On 13/08/2010 16:40, Jessica Sun wrote: > >> no i want to load the genbank file with existing features and I need to >> add some new feature tags to the existing ones and then save to a new >> update genbank file for local usage. I just not quite good on how to >> easily merge the two steps you recommended into one in a neat way. >> >> thx >> >> >> On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri > > wrote: >> >> I'm not sure I understand, do you mean that you want to load just >> the sequence from the GenBank file (ignoring the existing >> annotation), then add your own features? There are instructions on >> how to do that here: >> http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder >> >> >> On 13/08/2010 16:27, Jessica Sun wrote: >> >> unfortunately. 
I want to add the feature to the sequence object >> I got >> from the Genbank file, I do not mind to save a new genbank file but >> these new genbank file contains the original genbank format and >> info I >> got plus the new feature tags I need to added to. Any quick >> solution to >> this? >> >> thx >> >> Jessica >> >> >> >> On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri >> >> > >> wrote: >> >> Hi Jessica. >> >> You need to use Bio::SeqIO to read in the GenBank file to a >> BioPerl >> sequence object, and to write your new GenBank file: >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> >> To add a new feature follow the instructions here: >> >> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences >> >> (except that you are adding the feature to the sequence >> object you >> got from the Genbank file, not a new Bio::Seq object). >> >> Cheers. >> Roy. >> >> >> On 13/08/2010 16:06, Jessica Sun wrote: >> >> Does anyone knows how to open a genbank file, add new >> feature >> and then save >> a new genbank >> file with new feature added in bioperl ? >> >> thx >> >> >> >> >> >> -- >> Jessica Jingping Sun >> >> >> >> >> >> -- >> Jessica Jingping Sun >> > > -- Jessica Jingping Sun From drummike at gmail.com Fri Aug 13 17:41:55 2010 From: drummike at gmail.com (Mike Williams) Date: Fri, 13 Aug 2010 13:41:55 -0400 Subject: [Bioperl-l] Add sequence feature In-Reply-To: References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> <4C656B67.5020402@gmail.com> Message-ID: On Fri, Aug 13, 2010 at 1:06 PM, Jessica Sun wrote: > Thanks. I somehow get these error messages. > by doing this, > > my $feat = new Bio::SeqFeature::Generic(-start =>20, > -end => $40, > -primary_tag => 'newfeature' ); > $feat->add_tag_value("note","this is > notes"); > That $40 looks fishy. Try deleting the dollar sign. You did mean just 40, right? 
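For reference, a minimal sketch of the whole round trip: read the GenBank record, add the feature with a plain numeric end, and write the record back out. The file names come from the command line and are placeholders, and the add_new_feature helper name is made up for illustration; the coordinates, primary tag and note are the ones from the snippet above. It is only a sketch under those assumptions. Note that the feature is added to the Bio::Seq object returned by next_seq, and that same object (not the SeqIO reader) is what gets passed to write_seq.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;

# Hypothetical helper: attach one new feature to a sequence object.
sub add_new_feature {
    my ($seq) = @_;
    my $feat = Bio::SeqFeature::Generic->new(
        -start       => 20,
        -end         => 40,            # plain number, no leading '$'
        -primary_tag => 'newfeature',
    );
    $feat->add_tag_value('note', 'this is a note');
    $seq->add_SeqFeature($feat);       # add to the Bio::Seq, not the SeqIO
    return $feat;
}

# Only run the file round trip when both (placeholder) file names are given.
if (@ARGV == 2) {
    my ($infile, $outfile) = @ARGV;
    my $in  = Bio::SeqIO->new(-format => 'genbank', -file => $infile);
    my $out = Bio::SeqIO->new(-format => 'genbank', -file => ">$outfile");

    while ( my $seq = $in->next_seq ) {
        add_new_feature($seq);
        $out->write_seq($seq);         # existing features are kept as well
    }
}
```

Run it as something like "perl add_feature.pl in.gbk out.gbk" (script and file names are hypothetical).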
Mike From MEC at stowers.org Fri Aug 13 17:37:50 2010 From: MEC at stowers.org (Cook, Malcolm) Date: Fri, 13 Aug 2010 12:37:50 -0500 Subject: [Bioperl-l] Add sequence feature In-Reply-To: References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> <4C656B67.5020402@gmail.com> Message-ID: Jessica, Show more code! In particular, where did $f get set? --Malcolm -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun Sent: Friday, August 13, 2010 12:07 PM To: Roy Chaudhuri Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Add sequence feature Thanks. I somehow get these error messages. --------------------- WARNING --------------------- MSG: Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module. Attempting to dump, but may fail! --------------------------------------------------- Can't locate object method "seq" via package "Bio::SeqIO::genbank" at /Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, line 447. by doing this, my $feat = new Bio::SeqFeature::Generic(-start =>20, -end => $40, -primary_tag => 'newfeature' ); $feat->add_tag_value("note","this is notes"); $f->add_SeqFeature($feat); ## f is original feature pointer $io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" ); $io->write_seq($seqio_object); On Fri, Aug 13, 2010 at 11:57 AM, Roy Chaudhuri wrote: > Please remember to copy replies to the mailing list. > > You can loop over the features in your Bio::Seq object: > for my $feat ($seq->get_SeqFeatures) { # do something } > > And once you have found the feature you want to modify, you can add a > tag using something like: > $feat->add_tag_value('note',"this is a note"); > > When you're finished you can write out the modified sequence object to > a new GenBank file. 
> > > On 13/08/2010 16:40, Jessica Sun wrote: > >> no i want to load the genbank file with existing features and I need >> to add some new feature tags to the existing ones and then save to a >> new update genbank file for local usage. I just not quite good on how >> to easily merge the two steps you recommended into one in a neat way. >> >> thx >> >> >> On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri >> > wrote: >> >> I'm not sure I understand, do you mean that you want to load just >> the sequence from the GenBank file (ignoring the existing >> annotation), then add your own features? There are instructions on >> how to do that here: >> >> http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder >> >> >> On 13/08/2010 16:27, Jessica Sun wrote: >> >> unfortunately. I want to add the feature to the sequence object >> I got >> from the Genbank file, I do not mind to save a new genbank file but >> these new genbank file contains the original genbank format and >> info I >> got plus the new feature tags I need to added to. Any quick >> solution to >> this? >> >> thx >> >> Jessica >> >> >> >> On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri >> >> > >> wrote: >> >> Hi Jessica. >> >> You need to use Bio::SeqIO to read in the GenBank file to a >> BioPerl >> sequence object, and to write your new GenBank file: >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> >> To add a new feature follow the instructions here: >> >> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Ow >> n_Sequences >> >> (except that you are adding the feature to the sequence >> object you >> got from the Genbank file, not a new Bio::Seq object). >> >> Cheers. >> Roy. >> >> >> On 13/08/2010 16:06, Jessica Sun wrote: >> >> Does anyone knows how to open a genbank file, add new >> feature >> and then save >> a new genbank >> file with new feature added in bioperl ? 
>> >> thx >> >> >> >> >> >> -- >> Jessica Jingping Sun >> >> >> >> >> >> -- >> Jessica Jingping Sun >> > > -- Jessica Jingping Sun _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From Kevin.M.Brown at asu.edu Fri Aug 13 17:53:50 2010 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Fri, 13 Aug 2010 10:53:50 -0700 Subject: [Bioperl-l] Add sequence feature In-Reply-To: References: <4C6562E0.7090008@gmail.com><4C6566B0.60706@gmail.com><4C656B67.5020402@gmail.com> Message-ID: <1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu> If I'm reading your sample code correctly, then you are mistakenly trying to output the input SeqIO object and not the actual Bio::Seq object that was read in by SeqIO. My $seqio = Bio::SeqIO->new; My $seq = $seqio->next_seq; #manipulate $seq My $out = Bio::SeqIO->new; $out->write_seq($seq); -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun Sent: Friday, August 13, 2010 10:07 AM To: Roy Chaudhuri Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Add sequence feature Thanks. I somehow get these error messages. --------------------- WARNING --------------------- MSG: Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module. Attempting to dump, but may fail! --------------------------------------------------- Can't locate object method "seq" via package "Bio::SeqIO::genbank" at /Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, line 447. 
by doing this,

my $feat = new Bio::SeqFeature::Generic(-start =>20,
                                        -end => $40,
                                        -primary_tag => 'newfeature' );
$feat->add_tag_value("note","this is notes");
$f->add_SeqFeature($feat); ## f is original feature pointer
$io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" );

$io->write_seq($seqio_object);

On Fri, Aug 13, 2010 at 11:57 AM, Roy Chaudhuri wrote:

> Please remember to copy replies to the mailing list.
>
> You can loop over the features in your Bio::Seq object:
> for my $feat ($seq->get_SeqFeatures) {
>     # do something
> }
>
> And once you have found the feature you want to modify, you can add a tag
> using something like:
> $feat->add_tag_value('note',"this is a note");
>
> When you're finished you can write out the modified sequence object to a
> new GenBank file.
>
> On 13/08/2010 16:40, Jessica Sun wrote:
>
>> no i want to load the genbank file with existing features and I need to
>> add some new feature tags to the existing ones and then save to a new
>> update genbank file for local usage. I just not quite good on how to
>> easily merge the two steps you recommended into one in a neat way.
>>
>> thx
>>
>> On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri wrote:
>>
>>     I'm not sure I understand, do you mean that you want to load just
>>     the sequence from the GenBank file (ignoring the existing
>>     annotation), then add your own features? There are instructions on
>>     how to do that here:
>>     http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder
>>
>>     On 13/08/2010 16:27, Jessica Sun wrote:
>>
>>         unfortunately. I want to add the feature to the sequence object
>>         I got from the Genbank file, I do not mind to save a new genbank
>>         file but these new genbank file contains the original genbank
>>         format and info I got plus the new feature tags I need to added
>>         to. Any quick solution to this?
>>
>>         thx
>>
>>         Jessica
>>
>>         On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri wrote:
>>
>>             Hi Jessica.
>>
>>             You need to use Bio::SeqIO to read in the GenBank file to a
>>             BioPerl sequence object, and to write your new GenBank file:
>>             http://www.bioperl.org/wiki/HOWTO:SeqIO
>>
>>             To add a new feature follow the instructions here:
>>             http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences
>>
>>             (except that you are adding the feature to the sequence
>>             object you got from the Genbank file, not a new Bio::Seq
>>             object).
>>
>>             Cheers.
>>             Roy.
>>
>>             On 13/08/2010 16:06, Jessica Sun wrote:
>>
>>                 Does anyone knows how to open a genbank file, add new
>>                 feature and then save a new genbank file with new
>>                 feature added in bioperl ?
>>
>>                 thx
>>
>> --
>> Jessica Jingping Sun
>
> --
Jessica Jingping Sun

From jessica.sun at gmail.com Fri Aug 13 19:16:51 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 13 Aug 2010 15:16:51 -0400
Subject: [Bioperl-l] Fwd: Add sequence feature
In-Reply-To:
References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> <4C656B67.5020402@gmail.com> <1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu>
Message-ID:

---------- Forwarded message ----------
From: Jessica Sun
Date: Fri, Aug 13, 2010 at 3:16 PM
Subject: Re: [Bioperl-l] Add sequence feature
To: Kevin Brown

yes, I change that, somehow it still did not take the added features in.

On Fri, Aug 13, 2010 at 1:53 PM, Kevin Brown wrote:

> If I'm reading your sample code correctly, then you are mistakenly
> trying to output the input SeqIO object and not the actual Bio::Seq
> object that was read in by SeqIO.
--
Jessica Jingping Sun

From MEC at stowers.org Fri Aug 13 19:56:09 2010
From: MEC at stowers.org (Cook, Malcolm)
Date: Fri, 13 Aug 2010 14:56:09 -0500
Subject: [Bioperl-l] Fwd: Add sequence feature
In-Reply-To:
References: <4C6562E0.7090008@gmail.com> <4C6566B0.60706@gmail.com> <4C656B67.5020402@gmail.com> <1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu>
Message-ID:

if you want to show all your code we might not have to guess at what the
problem is.....

Malcolm Cook
Stowers Institute for Medical Research - Bioinformatics
Kansas City, Missouri USA

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun
Sent: Friday, August 13, 2010 2:17 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Fwd: Add sequence feature

---------- Forwarded message ----------
From: Jessica Sun
Date: Fri, Aug 13, 2010 at 3:16 PM
Subject: Re: [Bioperl-l] Add sequence feature
To: Kevin Brown

yes, I change that, somehow it still did not take the added features in.

On Fri, Aug 13, 2010 at 1:53 PM, Kevin Brown wrote:

> If I'm reading your sample code correctly, then you are mistakenly
> trying to output the input SeqIO object and not the actual Bio::Seq
> object that was read in by SeqIO.

From cjfields at illinois.edu Mon Aug 16 18:02:15 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 16 Aug 2010 13:02:15 -0500
Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping
Message-ID: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>

All,

This is in reference to a bug report I filed a while back. In the below
test script, two features with the same start/end are compared. If the
features have the same seq_id(), overlap succeeds. If the seq_id is
changed (e.g. is on another chromosome, for instance), the overlap still
succeeds.

The question is: is this a bug? My vote would be 'yes', but there have
been various arguments to say it's not.

chris

(maybe I'll make this a regular thing on the list, just to hash out some
of the edge cases I run into periodically)

=========================================

#!/usr/bin/perl -w

use strict;
use warnings;
use Test::More;
use Bio::SeqFeature::Generic;

my ( $feat1, $feat2 );

$feat1 = Bio::SeqFeature::Generic->new(
    -start  => 40,
    -end    => 80,
    -strand => 1,
    -seq_id => 'ABC123',
);

is $feat1->start,  40,       'start of feature location';
is $feat1->end,    80,       'end of feature location';
is $feat1->seq_id, 'ABC123', 'seq_id';

$feat2 = Bio::SeqFeature::Generic->new(
    -start  => 40,
    -end    => 80,
    -strand => 1,
    -seq_id => 'ABC123',
);

is $feat2->start,  40,       'start of feature location';
is $feat2->end,    80,       'end of feature location';
is $feat2->seq_id, 'ABC123', 'seq_id';

# Generic features with same Seq ID should overlap
ok( $feat2->overlaps($feat1), 'feat2 overlaps feat1' );

# Generic features with different Seq IDs shouldn't overlap
is( $feat2->seq_id('XYZ678'), 'XYZ678', 'change seq_id' );

# this currently fails
# (the test name is passed to ok() as its second argument, not negated
# together with the overlaps() result)
ok( !$feat2->overlaps($feat1), 'feat2 doesn\'t overlap feat1' );

done_testing();

From David.Messina at sbc.su.se Mon Aug 16 18:51:54 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 16 Aug 2010 20:51:54 +0200
Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping
In-Reply-To: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
Message-ID:

> The question is: is this a bug?

Hmm, tricky.

Genomic start and end positions with differing IDs shouldn't overlap, but
can't SeqFeatures apply to proteins and other molecules where one would
want to compare positions without regard to ID?

Dave

From cjfields at illinois.edu Tue Aug 17 01:39:00 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 16 Aug 2010 20:39:00 -0500
Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping
In-Reply-To:
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
Message-ID:

On Aug 16, 2010, at 1:51 PM, Dave Messina wrote:

>> The question is: is this a bug?
>
> Hmm, tricky.
>
> Genomic start and end positions with differing IDs shouldn't overlap, but
> can't SeqFeatures apply to proteins and other molecules where one would
> want to compare positions without regard to ID?
>
> Dave

Good point; it's probably the context in which the methods are used that
matters. So, maybe just a document clarification?

chris

From David.Messina at sbc.su.se Tue Aug 17 09:06:05 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 17 Aug 2010 11:06:05 +0200
Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping
In-Reply-To:
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
Message-ID: <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>

> Good point; it's probably the context the methods are used that matters.
> So, maybe just a document clarification?

That's always good, but it really doesn't solve the issue you're
describing.

I mean, who would expect to get overlaps for features on different
chromosomes?

To me, that's a clear violation of reasonable user expectations. You
shouldn't have to read the docs about something like that.

So what's the solution for these duelling use cases? I haven't thought
about it much, but a first approximation might be to add a -genomic
boolean flag that, when true, would do the right thing and check the ID
when doing overlaps or other positional comparisons.

(Maybe -genomic is too obscure. Maybe it should be -same_id_for_overlaps
or something like that.)

And maybe having to know to set a flag is effectively the same thing as
having to read the docs to understand SeqFeature's overlap behavior.

What do the rest of you out there think?
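
For illustration only, the flag idea can be sketched in a few lines of
plain Perl. This is not code from BioPerl or from anyone in this thread:
features are reduced to plain hashes with seq_id, start and end keys
(standing in for Bio::SeqFeature::Generic objects), the function name
features_overlap is hypothetical, and the option simply borrows one of the
names floated above.

```perl
use strict;
use warnings;

# Hypothetical helper, not a BioPerl method. Features are plain hashes
# here, standing in for objects with seq_id/start/end accessors.
sub features_overlap {
    my ( $f1, $f2, %opt ) = @_;

    # With the flag set, features on different sequences never overlap.
    if ( $opt{same_id_for_overlaps} ) {
        return 0 if $f1->{seq_id} ne $f2->{seq_id};
    }

    # Plain positional test on closed, 1-based intervals.
    return ( $f1->{start} <= $f2->{end} && $f2->{start} <= $f1->{end} ) ? 1 : 0;
}

my $feat1 = { seq_id => 'ABC123', start => 40, end => 80 };
my $feat2 = { seq_id => 'XYZ678', start => 40, end => 80 };

# Position-only comparison: the two ranges coincide.
print features_overlap( $feat1, $feat2 ), "\n";    # prints 1

# ID-aware comparison: different seq_ids, so no overlap.
print features_overlap( $feat1, $feat2, same_id_for_overlaps => 1 ), "\n";    # prints 0
```

Whether the ID check should be on or off by default is exactly the open
question; in this sketch it is off, which mirrors the behaviour the test
script above runs into.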

Dave

From scott at scottcain.net Tue Aug 17 12:45:27 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 17 Aug 2010 08:45:27 -0400
Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping
In-Reply-To: <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
Message-ID:

Hi Dave and Chris,

It seems to me that the genomic comparison is the thing people would do
more often, so if you're going to create a flag, the default should be for
the genomic comparison and if somebody is doing the protein space
comparison and not getting the expected results, they'll probably read the
docs to find out why.

Scott

--
Scott Cain, Ph. D. scott at scottcain dot net
Ontario Institute for Cancer Research
http://gmod.org/
216 392 3087
Snet from my iPhone.

From david.breimann at gmail.com Tue Aug 17 13:44:08 2010
From: david.breimann at gmail.com (David Breimann)
Date: Tue, 17 Aug 2010 16:44:08 +0300
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
Message-ID:

Hello,

The following genbank has a gene that runs over the "end" of the
chromosome and into its "beginning", and the script generates an
error.

ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk

NC_005707 Unflattening error:
Details:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: PROBLEM, SEVERITY==2
Ranges not in correct order. Strange ensembl genbank entry? Range:
[207497,208369] [1,687]
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473
STACK: Bio::SeqFeature::Tools::Unflattener::problem
/usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952
STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent
/usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842
STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS
/usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713
STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq
/usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532
STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023
STACK: /usr/local/bin/bp_genbank2gff3.pl:506
-----------------------------------------------------------

Best,
Dave

From cjfields at illinois.edu Tue Aug 17 13:51:02 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 08:51:02 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To:
References:
Message-ID: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>

I think Chris Mungall has a branch set up for this
in bioperl:

http://github.com/bioperl/bioperl-live/tree/circular

Is that correct? Should we merge that code into the master branch?

chris

From David.Messina at sbc.su.se Tue Aug 17 13:52:11 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 17 Aug 2010 15:52:11 +0200
Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping
In-Reply-To:
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
Message-ID:

> It seems to me that the genomic comparison is the thing people would do
> more often, so if you're going to create a flag, the default should be
> for the genomic comparison

Yep, agreed. And such a flag should be named for the non-default behavior,
then, like:

-ignore_IDs_for_overlaps

Dave

From douglas.hoen at gmail.com Thu Aug 12 14:24:27 2010
From: douglas.hoen at gmail.com (Douglas Hoen)
Date: Thu, 12 Aug 2010 10:24:27 -0400
Subject: [Bioperl-l] HMMER3 to GFF3
In-Reply-To: <20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de>
References: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com> <20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de>
Message-ID:

Hi Kai,

Here it is.

Thanks,
-- Doug

-------------- next part --------------
A non-text attachment was scrubbed...
Name: chr1-tesigsv2.hmmscan
Type: application/octet-stream
Size: 676132 bytes
Desc: not available
URL:
-------------- next part --------------

On 2010-08-12, at 8:16 AM, Kai Blin wrote:

> On Wed, 11 Aug 2010 22:59:37 -0700 (PDT)
> Doug Hoen wrote:
>
> Hi Doug,
>
>> Could someone please confirm whether the results are incorrect and, if
>> so, perhaps suggest a fix? It may well be that this problem is due to
>> the unusual way I am using hmmscan, rather than a problem with HMMER3
>> parsing...?
>
> Can you please attach your hmmer input file? Along the way something
> inserted line breaks, making it unreadable.
>
> It might well be possible that the HMMer3 parser still handles a little
> different from the HMMer2 parser, I haven't tried that script.
>
> Cheers,
> Kai
>
> --
> Dipl.-Inform.
> Kai Blin kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28 Phone : ++49 7071 29-78841
> D-72076 Tübingen Fax : ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From CJMungall at lbl.gov Tue Aug 17 15:53:15 2010
From: CJMungall at lbl.gov (Chris Mungall)
Date: Tue, 17 Aug 2010 08:53:15 -0700
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
References: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
Message-ID:

You can merge this in. It should allow David to proceed.

I haven't kept up on synchrony between bioperl and GFF on circular
genomes. The above fix is conservative in that essentially preserves the
genbank coordinates even when the origin is crossed:

http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf

However, if this is to conform to GFF3 then the resulting coordinates that
cross the origin should have start/end incremented by the genome length

From cjfields at illinois.edu Tue Aug 17 19:24:23 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 14:24:23 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To:
References: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
Message-ID: <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>

On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote:

> You can merge this in. It should allow David to proceed.

Will do. I'll go ahead and delete the remote branch as well.

> I haven't kept up on synchrony between bioperl and GFF on circular genomes.
> The above fix is conservative in that essentially preserves the genbank
> coordinates even when the origin is crossed:
>
> http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf
>
> However, if this is to conform to GFF3 then the resulting coordinates
> that cross the origin should have start/end incremented by the genome
> length

Yes, that is a problem that needs to be addressed. Might be worth filing a
bug report for tracking this; we can use David's example, or the one I
recently added for phi-X174.

chris

From sheldon.mckay at gmail.com Tue Aug 17 20:42:50 2010
From: sheldon.mckay at gmail.com (Sheldon McKay)
Date: Tue, 17 Aug 2010 16:42:50 -0400
Subject: [Bioperl-l] AlignIO and Gbrowse_syn
In-Reply-To:
References: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
Message-ID:

The GBrowse_syn dev team is pretty small (n=1) right now, so any patches
would be welcome.

Sheldon

On Wed, Aug 11, 2010 at 6:02 PM, Chris Fields wrote:

> Russell,
>
> We have had very few requests to support .maf until recently, which is
> why there has been little done with it. We welcome any help to improve it.
>
> chris
>
> On Aug 11, 2010, at 4:31 PM, Smithies, Russell wrote:
>
>> I know there was some brief discussion about .maf format a few weeks
>> ago but I've had an enquiry (as below) from a colleague.
>> If GBrowse_syn is using .maf format, does AlignIO need more work?
>> Any comments?
>>
>> --Russell
>>
>>
>> I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a
>> limited number of alignment formats
>> (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output).
>> GBrowse_syn accepts clustalw format plus "other commonly used formats
>> recognized by BioPerl's AlignIO parser"
>> (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce
>> clustalw, I've tried parsing LASTZ maf output to clustalw (and other
>> alignment formats) using AlignIO, however I run into the following
>> issues:
>> * Strand info is lost (probably fair enough, since this isn't part of
>>   the clustalw format per se; incorporating strand info within sequence
>>   IDs is a GBrowse_syn clustalw specification)
>> * The coordinate system for reverse strand matches differs between
>>   LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the
>>   reverse complemented sequence, whereas for BioPerl/GBrowse,
>>   coordinates relate to the original (non-rev complemented) sequence.
>>   E.g. a coordinate of "1" in the LASTZ .maf file refers to the last
>>   base of the original sequence; AlignIO prints "1" to the output
>>   clustalw file, but since strand info is lost it is construed as the
>>   first position at the very start of the original sequence. As a
>>   result all reverse match coordinates in the resulting clustalw output
>>   file are incorrect.
>> * AlignIO is unable to parse multiple, individual aligned regions
>>   within the same .maf file; it interleaves them
>>
>> I would be interested to hear whether anyone has already found a
>> solution to integrating LASTZ and GBrowse_syn... and also whether any
>> development of AlignIO to improve support of maf format is planned.

From hxu.hong at gmail.com Tue Aug 17 20:50:43 2010
From: hxu.hong at gmail.com (Hong Xu)
Date: Tue, 17 Aug 2010 16:50:43 -0400
Subject: [Bioperl-l] Bio::Tools::Primer3 question
Message-ID:

Hello all,

I'm working to parse the Primer3 release 2.2.2-beta result. I made the
necessary changes to make Bio::Tools::Primer3 work with the new output
tags of Primer3 release 2.2.2. But when I tried to get the primer Tm,
I found that Bio::Tools::Primer3 gave different Tm from Primer3 result
file. Then I learned that the Tm was calculated by
Bio::SeqFeature::Primer module, not from parsing Primer3 result. If I
want to get data from parsing Primer3 result, should I write my own
Primer3 parser instead of Bio::Tools::Primer3?
thanks a lot, Hong From cjfields at illinois.edu Tue Aug 17 21:14:02 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 17 Aug 2010 16:14:02 -0500 Subject: [Bioperl-l] Bio::Tools::Primer3 question In-Reply-To: References: Message-ID: Already ahead of you there, unfortunately. I wrote a complete reimplementation of both the Primer3 parser and the Primer3 wrapper that handles both v1 and v2 of primer3_core. Lack of tuits lately have prevented me from getting tests written up, so for the time being it's sitting in bioperl-dev: http://github.com/bioperl/bioperl-dev They are Bio::Tools::Primer3Redux (parser) and Bio::Tools::Run::Primer3Redux (wrapper). I rewrote those b/c I found the original modules not adequate enough in many ways for my purposes then (the newer version uses simple features or feature pairs instead of the primer features, for the same reasons you mention re: Tm). You're more than welcome to hack on the code a bit. I'm planning on pulling it out into my own github repo for separate submission to CPAN. chris On Aug 17, 2010, at 3:50 PM, Hong Xu wrote: > Hello all, > > I'm working to parse the Primer3 release 2.2.2-beta result. I made the > necessary changes to make Bio::Tools::Primer3 work with the new output > tags of Primer3 release 2.2.2. But when I tried to get the primer Tm, > I found that Bio::Tools::Primer3 gave different Tm from Primer3 result > file. Then I learned that the Tm was calculated by > Bio::SeqFeature::Primer module, not from parsing Primer3 result. If I > want to get data from parsing Primer3 result, should I write my own > Primer3 parser instead of Bio::Tools::Primer3? 
> > thanks a lot, > Hong > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Aug 18 03:42:59 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 17 Aug 2010 22:42:59 -0500 Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes In-Reply-To: <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu> References: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu> <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu> Message-ID: Chris, David, The branch is now merged back to trunk. David, let us know if this helps. chris (f) On Aug 17, 2010, at 2:24 PM, Chris Fields wrote: > On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote: > >> You can merge this in. It should allow David to proceed. > > Will do. I'll go ahead and delete the remote branch as well. > >> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that essentially preserves the genbank coordinates even when the origin is crossed: >> >> http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf >> >> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length > > Yes, that is a problem that needs to be addressed. Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174. > > chris > >> On Aug 17, 2010, at 6:51 AM, Chris Fields wrote: >> >>> I think Chris Mungall has a branch set up for this in bioperl: >>> >>> http://github.com/bioperl/bioperl-live/tree/circular >>> >>> Is that correct? Should we merge that code into the master branch? 
>>> >>> chris >>> >>> On Aug 17, 2010, at 8:44 AM, David Breimann wrote: >>> >>>> Hello, >>>> >>>> The following genbank has a gene that runs over the 'end" of the >>>> chromosome and into its "beginning", and the script generates an >>>> error. >>>> >>>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk >>>> >>>> NC_005707 Unflattening error: >>>> Details: >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>> MSG: PROBLEM, SEVERITY==2 >>>> Ranges not in correct order. Strange ensembl genbank entry? Range: >>>> [207497,208369] [1,687] >>>> STACK: Error::throw >>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473 >>>> STACK: Bio::SeqFeature::Tools::Unflattener::problem >>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952 >>>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent >>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842 >>>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS >>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713 >>>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq >>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532 >>>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023 >>>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506 >>>> ----------------------------------------------------------- >>>> >>>> Best, >>>> Dave >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From 
cjfields at illinois.edu Wed Aug 18 04:48:55 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 17 Aug 2010 23:48:55 -0500 Subject: [Bioperl-l] Bio::Tools::Primer3 question In-Reply-To: References: Message-ID: Hong, The latest code, along with working tests, is present here: http://github.com/cjfields/Bio-Tools-Primer3Redux It needs a few more tests but the initial wrapper tests work fine for primer3 v2.2.1 on both Mac and Linux. Will try using this to CPAN after a bit more cleanup. chris On Aug 17, 2010, at 4:14 PM, Chris Fields wrote: > Already ahead of you there, unfortunately. I wrote a complete reimplementation of both the Primer3 parser and the Primer3 wrapper that handles both v1 and v2 of primer3_core. Lack of tuits lately have prevented me from getting tests written up, so for the time being it's sitting in bioperl-dev: > > http://github.com/bioperl/bioperl-dev > > They are Bio::Tools::Primer3Redux (parser) and Bio::Tools::Run::Primer3Redux (wrapper). > > I rewrote those b/c I found the original modules not adequate enough in many ways for my purposes then (the newer version uses simple features or feature pairs instead of the primer features, for the same reasons you mention re: Tm). You're more than welcome to hack on the code a bit. I'm planning on pulling it out into my own github repo for separate submission to CPAN. > > chris > > On Aug 17, 2010, at 3:50 PM, Hong Xu wrote: > >> Hello all, >> >> I'm working to parse the Primer3 release 2.2.2-beta result. I made the >> necessary changes to make Bio::Tools::Primer3 work with the new output >> tags of Primer3 release 2.2.2. But when I tried to get the primer Tm, >> I found that Bio::Tools::Primer3 gave different Tm from Primer3 result >> file. Then I learned that the Tm was calculated by >> Bio::SeqFeature::Primer module, not from parsing Primer3 result. If I >> want to get data from parsing Primer3 result, should I write my own >> Primer3 parser instead of Bio::Tools::Primer3? 
>> >> thanks a lot, >> Hong >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From david.breimann at gmail.com Wed Aug 18 06:46:58 2010 From: david.breimann at gmail.com (David Breimann) Date: Wed, 18 Aug 2010 09:46:58 +0300 Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes In-Reply-To: References: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu> <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu> Message-ID: Dear Chris's, I tested the updated version on multiple genomes that previously returned errors (for future reference: NC_005707, NC_006578, NC_007103, NC_007104, NC_007106, NC_007107, NC_008573, NC_008762, NC_008763, NC_008785, NC_009457, NC_012040). The script now ends normally on all of them. However, as you mentioned, the result GFF3 file does not comply with GFF3 specifications for circular genomes. This in turn causes some unexpected results in other applications. Best, Dave On Wed, Aug 18, 2010 at 6:42 AM, Chris Fields wrote: > Chris, David, > > The branch is now merged back to trunk. David, let us know if this helps. > > chris (f) > > On Aug 17, 2010, at 2:24 PM, Chris Fields wrote: > >> On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote: >> >>> You can merge this in. It should allow David to proceed. >> >> Will do. I'll go ahead and delete the remote branch as well. >> >>> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that essentially preserves the genbank coordinates even when the origin is crossed: >>> >>>
http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf >>> >>> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length >> >> Yes, that is a problem that needs to be addressed. Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174. >> >> chris >> >>> On Aug 17, 2010, at 6:51 AM, Chris Fields wrote: >>> >>>> I think Chris Mungall has a branch set up for this in bioperl: >>>> >>>> http://github.com/bioperl/bioperl-live/tree/circular >>>> >>>> Is that correct? Should we merge that code into the master branch? >>>> >>>> chris >>>> >>>> On Aug 17, 2010, at 8:44 AM, David Breimann wrote: >>>> >>>>> Hello, >>>>> >>>>> The following genbank has a gene that runs over the 'end" of the >>>>> chromosome and into its "beginning", and the script generates an >>>>> error. >>>>> >>>>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk >>>>> >>>>> NC_005707 Unflattening error: >>>>> Details: >>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>> MSG: PROBLEM, SEVERITY==2 >>>>> Ranges not in correct order. Strange ensembl genbank entry?
Range: >>>>> [207497,208369] [1,687] >>>>> STACK: Error::throw >>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473 >>>>> STACK: Bio::SeqFeature::Tools::Unflattener::problem >>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952 >>>>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent >>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842 >>>>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS >>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713 >>>>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq >>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532 >>>>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023 >>>>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506 >>>>> ----------------------------------------------------------- >>>>> >>>>> Best, >>>>> Dave >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From G.Gallone at sms.ed.ac.uk Wed Aug 18 14:57:01 2010 From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone) Date: Wed, 18 Aug 2010 15:57:01 +0100 Subject: [Bioperl-l] [RFC] Interolog::Walk Message-ID: <4C6BF4BD.5010200@sms.ed.ac.uk> Hello BioPerl community - I've written a new module called Interolog::Walk that I'm planning to put on CPAN. I would be grateful if you might take a look at the brief description I attached and tell me what you think. 
I'll be more than happy to post further details should the module be of some interest for someone. Also, I am not totally sure about having the correct name for it. This is my first module and It would be great if you could advise on naming it appropriately. Hopefully the following description will give an idea on what it does. =================== NAME Interolog::Walk - Retrieve, score and visualize putative Protein-Protein Interactions through the orthology-walk method DESCRIPTION A common activity in computational biology is to mine protein-protein interactions from publicly available databases in order to build Protein-Protein Interaction (PPI) datasets. In many instances, however, the number of experimentally obtained annotated PPIs is very scarce and it would be helpful to enrich the experimental dataset with high-quality, computationally-inferred PPIs. Such computationally-obtained dataset can extend, support or enrich experimental PPI datasets, and are of crucial importance in high-throughput gene prioritization studies, i.e. to drive hypotheses and restrict the dimensionality of many gene functional discovery problems. This Perl Module, Interolog::Walk, is aimed at building putative PPI datasets on the basis of a number of comparative biology paradigms: the module implements a collection of computational biology algorithms based on the concept of "orthology projection". If interacting proteins A and B in organism X have orthologs A' and B' in organism Y, under certain conditions one can assume that the interaction will be conserved in organism Y, i.e. the A-B interaction can be "projected through the orthologies" to obtain a putative A'-B' interaction. The pair of interactions (A-B) and (A'-B') are named "Interologs" (see for instance [1] and [2]). Interolog::Walk collects, analyses and collates gene orthology data provided by the Ensembl Consortium (www.ensembl.org) as well as PPI data provided by EBI Intact (http://www.ebi.ac.uk/intact/). 
It provides the user with the possibility of rating the quality and reliability of the putative interactions collected, by means of confidence scores, and optionally outputs network representations of the datasets, compatible with the biological network representation standard, Cytoscape. USAGE In order to carry out an interolog walk we start with a set of gene identifiers in one organism of interest. We query those ids against a number of comparative biology databases to retrieve a list of orthologues for each gene id of interest, in one or more species. In the following step we rely on PPI databases to retrieve the list of available interactors for the protein ids obtained. The output at this stage consists of a list of interactors of the orthologues of the initial gene set, plus several fields of ancillary data. In the last step of the process we project the interactions - again using orthology data - back to the original species of interest. The output of the process is a list of PUTATIVE INTERACTORS of the initial gene set, plus several fields of ancillary data. ==================== Given the scope and the focus of the project, I would imagine that viable alternatives for the namespace might be Bio::Orthology::InterologWalk Bio::InterologMap or maybe Interolog::Map Orthology::Map Orthology::InterologMap There are no similar projects as far as I could see so I shouldn't run the risk of overlapping namespaces. Still I would love to know your informed opinion about it. best, Giuseppe REFERENCES [1] Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, Vidal M, Gerstein M. Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. Genome Research 2004 Jun;14(6):1107-18. [2]Wiles AM, Doderer M, Ruan J, Gu T-T, Ravi D, Blackman BA, Bishop AJR. "Building and Analyzing Protein Interactome Networks by Cross-species Comparisons." 
BMC Systems Biology 2010, 4:36 - PMID: 20353594 -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From David.Messina at sbc.su.se Wed Aug 18 16:52:58 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 18 Aug 2010 18:52:58 +0200 Subject: [Bioperl-l] [RFC] Interolog::Walk In-Reply-To: <4C6BF4BD.5010200@sms.ed.ac.uk> References: <4C6BF4BD.5010200@sms.ed.ac.uk> Message-ID: <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se> Hi Giuseppe, Sounds really interesting - thanks for posting this. > Bio::Orthology::InterologWalk I vote for this name, or in any case something with Bio:: as the top-level namespace since it's a biology-related package. I like that you're providing a lot of background and information about the project in the documentation. However, the USAGE section should give information about how to use the module, with example code. You can look at other modules on CPAN (or in BioPerl) to see the conventions for writing documentation. Also, from what you wrote, it sounds like this might be a pipeline or a script rather than a module per se, or perhaps a script and a set of modules. It would be helpful to clarify in your documentation (if you haven't already) how exactly things are organized (and of course example code will help with that, too). Hope that's helpful, and let us know when you've got it up on CPAN so we can try it out! Dave From cjfields at illinois.edu Wed Aug 18 18:24:16 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 18 Aug 2010 13:24:16 -0500 Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes In-Reply-To: References: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu> <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu> Message-ID: Okay, will file this as a bug. Thanks!
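The GFF3 convention raised in this thread (a feature that crosses the origin of a circular landmark keeps its start, and its end is incremented by the genome length so that start <= end) can be sketched roughly as follows. This is only an illustration, not the code merged into bioperl-live: the helper name gff3_span_across_origin is hypothetical, the landmark length of 208369 is an assumption inferred from the first range in the error report ending there, and the source, type, strand and ID columns are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): collapse a GenBank-style
# join such as join(207497..208369,1..687) into a single GFF3 interval
# on a circular landmark of length $seq_len.
sub gff3_span_across_origin {
    my ($seq_len, @ranges) = @_;    # each range is [start, end], 1-based
    my $start = $ranges[0][0];      # start of the first segment
    my $end   = $ranges[-1][1];     # end of the last segment
    # GFF3 requires start <= end; if the feature wraps past the origin,
    # report the end as (end + landmark length).
    $end += $seq_len if $end < $start;
    return ($start, $end);
}

# Ranges taken from the error report in this thread.
# ASSUMPTION: the landmark is 208369 bp long.
my ($start, $end) =
    gff3_span_across_origin(208369, [207497, 208369], [1, 687]);

# Placeholder columns: source ".", type "gene", strand ".", ID "example".
print join("\t", 'NC_005707', '.', 'gene', $start, $end,
           '.', '.', '.', 'ID=example'), "\n";
```

Under that length assumption the feature would be written with start 207497 and end 209056 (687 + 208369). The GFF3 specification also expects the landmark feature itself to carry the attribute Is_circular=true, so that downstream tools know how to interpret an end beyond the sequence length.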
chris On Aug 18, 2010, at 1:46 AM, David Breimann wrote: > Dear Chris's, > > I tested the updated version on multiple genomes that previously > returned errors (for future reference: NC_005707, NC_006578, > NC_007103, NC_007104, NC_007106, NC_007107, NC_008573, NC_008762, > NC_008763, NC_008785, NC_009457, NC_012040). The script now ends > normally on all of them. However, as you mentioned, the result GFF3 > file does not comply with GFF3 specifications for circular genomes. > This in turn causes some unexpected results in other applications. > > Best, > Dave > > On Wed, Aug 18, 2010 at 6:42 AM, Chris Fields wrote: >> Chris, David, >> >> The branch is now merged back to trunk. David, let us know if this helps. >> >> chris (f) >> >> On Aug 17, 2010, at 2:24 PM, Chris Fields wrote: >> >>> On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote: >>> >>>> You can merge this in. It should allow David to proceed. >>> >>> Will do. I'll go ahead and delete the remote branch as well. >>> >>>> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that essentially preserves the genbank coordinates even when the origin is crossed: >>>> >>>> http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf >>>> >>>> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length >>> >>> Yes, that is a problem that needs to be addressed. Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174. >>> >>> chris >>> >>>> On Aug 17, 2010, at 6:51 AM, Chris Fields wrote: >>>> >>>>> I think Chris Mungall has a branch set up for this in bioperl: >>>>> >>>>> http://github.com/bioperl/bioperl-live/tree/circular >>>>> >>>>> Is that correct? Should we merge that code into the master branch? 
>>>>> >>>>> chris >>>>> >>>>> On Aug 17, 2010, at 8:44 AM, David Breimann wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> The following genbank has a gene that runs over the 'end" of the >>>>>> chromosome and into its "beginning", and the script generates an >>>>>> error. >>>>>> >>>>>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk >>>>>> >>>>>> NC_005707 Unflattening error: >>>>>> Details: >>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>>> MSG: PROBLEM, SEVERITY==2 >>>>>> Ranges not in correct order. Strange ensembl genbank entry? Range: >>>>>> [207497,208369] [1,687] >>>>>> STACK: Error::throw >>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473 >>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::problem >>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952 >>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent >>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842 >>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS >>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713 >>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq >>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532 >>>>>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023 >>>>>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506 >>>>>> ----------------------------------------------------------- >>>>>> >>>>>> Best, >>>>>> Dave >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing 
list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cdavis at bcm.tmc.edu Wed Aug 18 19:19:53 2010 From: cdavis at bcm.tmc.edu (Caleb Davis) Date: Wed, 18 Aug 2010 14:19:53 -0500 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq question Message-ID: <4C6C3259.4060304@bcm.tmc.edu> Hello, thank you for bioperl! I am getting discrepancies between the online bl2seq (www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi) and bioperl's implementation, and I'm not sure why. I'm seeing a desired behavior through the web interface but can't replicate it locally. Specifically, online bl2seq aligns across a 1 bp insertion in the subject whereas the local bl2seq just reports a shorter alignment. Any ideas? Thanks again, --Caleb The desired parameter differences from default are -F F -W 7 (turn complexity filter off, word size = 7).
Below I present the online and local results given the following input sequences: >consensus GAGGATCCAGAATTCTC >FVFTF6N01A86BR AACCCAATGTAAGGAAGCTAAGAACCTTGAAAAGAGGATACCAGAATTCTC Here are the parameters and result I'm getting online: Blast4-request ::= { body queue-search { program "blastn", service "plain", queries bioseq-set { seq-set { seq { id { local id 26297 }, descr { title "consensus", user { type str "CFastaReader", data { { label str "DefLine", data str ">consensus" } } } }, inst { repr raw, mol na, length 17, seq-data ncbi2na '8A3520F740'H } } } }, subject sequences { { id { local id 26299 }, descr { title "FVFTF6N01A86BR", user { type str "CFastaReader", data { { label str "DefLine", data str ">FVFTF6N01A86BR" } } } }, inst { repr raw, mol na, length 51, seq-data ncbi2na '0543B0A09C205F80228C520F74'H } } }, algorithm-options { { name "EvalueThreshold", value cutoff e-value { 1, 10, 1 } }, { name "UngappedMode", value boolean FALSE }, { name "PercentIdentity", value real { 0, 10, 0 } }, { name "HitlistSize", value integer 100 }, { name "EffectiveSearchSpace", value big-integer 0 }, { name "DbLength", value big-integer 0 }, { name "WindowSize", value integer 0 }, { name "DustFiltering", value boolean FALSE }, { name "RepeatFiltering", value boolean FALSE }, { name "MaskAtHash", value boolean TRUE }, { name "MismatchPenalty", value integer -3 }, { name "MatchReward", value integer 2 }, { name "GapOpeningCost", value integer 5 }, { name "GapExtensionCost", value integer 2 }, { name "StrandOption", value strand-type both-strands }, { name "WordSize", value integer 7 } }, format-options { { name "Web_JobTitle", value string "consensus" }, { name "Web_BlastSpecialPage", value string "blast2seq" } } } } >lcl|30439 FVFTF6N01A86BR Length=51 Sort alignments for this subject sequence by: E value Score Percent identity Query start position Subject start position Score = 24.7 bits (26), Expect = 2e-05 Identities = 17/18 (94%), Gaps = 1/18 (5%) Strand=Plus/Plus Query 
1 GAGGAT-CCAGAATTCTC 17 |||||| ||||||||||| Sbjct 34 GAGGATACCAGAATTCTC 51 Here's the output from a local search (I changed the expect to 5.0 just to prove to myself that some parameters are getting through OK): my @params = (-program => 'blastn', -outfile => 'bl2seq.out', -FILTER => 'F', -WORDSIZE => 7, -expect => 5.0); my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); my $bl2seq_report = $factory->bl2seq($cons_seqobj, $single_seqobj); #consensus vs. FVFTF6N01A86BR print Dumper $bl2seq_report->next_result; $VAR1 = bless( { '_inclusion_threshold' => undef, '_queryacc' => 'adapter_consensus', '_iteration_index' => 0, '_iteration_count' => 1, '_hits' => [], '_hitindex' => 0, '_querylength' => '17', '_querydesc' => '', '_iterations' => [ bless( { '_oldhits_not_below_threshold' => [], '_newhits_unclassified' => [], '_number' => 1, '_oldhits_newly_below_threshold' => [], '_hit_factory' => bless( { 'interface' => 'Bio::Search::Hit::HitI', 'type' => 'Bio::Search::Hit::BlastHit', '_loaded_types' => { 'Bio::Search::Hit::BlastHit' => 1 }, '_root_verbose' => 0 }, 'Bio::Factory::ObjectFactory' ), '_newhits_below_threshold' => [ { '-algorithm' => 'BLASTN', '-description' => '', '-length' => '51', '-query_len' => '17', '-hsp_factory' => bless( { 'interface' => 'Bio::Search::HSP::HSPI', 'type' => 'Bio::Search::HSP::GenericHSP', '_loaded_types' => { 'Bio::Search::HSP::GenericHSP' => 1 }, '_root_verbose' => 0 }, 'Bio::Factory::ObjectFactory' ), '-name' => 'FVFTF6N01A86BR', '-rank' => 1, '-hsps' => [ { '-query_start' => '7', '-algorithm' => 'BLASTN', '-hit_seq' => 'ccagaattctc', '-hit_length' => '51', '-query_length' => '17', '-query_desc' => '', '-query_frame' => 0, '-rank' => 1, '-hit_desc' => '', '-query_end' => '17', '-hit_name' => 'FVFTF6N01A86BR', '-identical' => '11', '-query_name' => 'adapter_consensus', '-evalue' => '1e-04', '-score' => '11', '-conserved' => '11', '-hit_frame' => 0, '-hsp_length' => '11', '-query_seq' => 'ccagaattctc', '-hit_start' => '41', 
'-homology_seq' => '|||||||||||', '-hit_end' => '51', '-bits' => '22.3' }, { '-query_start' => '9', '-algorithm' => 'BLASTN', '-hit_seq' => 'agaattct', '-hit_length' => '51', '-query_length' => '17', '-query_desc' => '', '-query_frame' => 0, '-rank' => 2, '-hit_desc' => '', '-query_end' => '16', '-hit_name' => 'FVFTF6N01A86BR', '-identical' => '8', '-query_name' => 'adapter_consensus', '-evalue' => '0.007', '-score' => '8', '-conserved' => '8', '-hit_frame' => 0, '-hsp_length' => '8', '-query_seq' => 'agaattct', '-hit_start' => '50', '-homology_seq' => '||||||||', '-hit_end' => '43', '-bits' => '16.4' } ], '-accession' => 'FVFTF6N01A86BR', '-significance' => '1e-04' } ], '_root_verbose' => 0, '_newhits_not_below_threshold' => [], '_oldhits_below_threshold' => [] }, 'Bio::Search::Iteration::GenericIteration' ) ], '_hit_factory' => $VAR1->{'_iterations'}[0]{'_hit_factory'}, '_statistics' => bless( { 'stats' => { 'S1' => '4', 'S1_bits' => '8.4', 'kappa_gapped' => '0.711', 'X3_bits' => '99.1', 'X1' => '4', 'lambda_gapped' => '1.37', 'X2' => '15', 'S2' => '4', 'seqs_better_than_cutoff' => '1', 'Hits_to_DB' => '5', 'num_extensions' => '2', 'num_successful_extensions' => '2', 'X1_bits' => '7.9', 'X3' => '50', 'dbentries' => '1', 'entropy_gapped' => '1.31', 'X2_bits' => '29.7', 'S2_bits' => '8.4' } }, 'Bio::Search::GenericStatistics' ), '_algorithm' => 'BLASTN', '_parameters' => bless( { 'params' => { 'gapext' => '2', 'matrix' => 'blastn matrix:1 -3', 'expect' => '5.0', 'allowgaps' => 'yes', 'gapopen' => '5' } }, 'Bio::Tools::Run::GenericParameters' ), '_root_verbose' => 0, '_queryname' => 'adapter_consensus' }, 'Bio::Search::Result::BlastResult' ); From David.Messina at sbc.su.se Wed Aug 18 22:32:37 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 19 Aug 2010 00:32:37 +0200 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq question In-Reply-To: <4C6C3259.4060304@bcm.tmc.edu> References: <4C6C3259.4060304@bcm.tmc.edu> Message-ID: Hi Caleb, 
The first thing I would do is take BioPerl out of the equation and test your local bl2seq on the command line. If you get the same output locally as on the web version, then there is a problem with BioPerl. If you're still seeing a discrepancy between the web and your local run, then this isn't a problem with BioPerl. Just to be clear, BioPerl doesn't "implement" any of the BLAST programs; it is simply a wrapper around the programs that you download from NCBI. That doesn't mean BioPerl isn't at fault, of course, just that it's important to isolate the problem carefully. The most common reasons for these discrepancies are: - different version numbers of BLAST 2.2.21? 2.2.22? Is it the same on the web as locally? - similarly, different implementations of BLAST NCBI's old BLAST suite is now deprecated and replaced with BLAST+. All of the online BLAST web queries are Blast+ now - are you running BLAST+ locally? (there's also a separate BioPerl wrapper for BLAST+ called Bio::Tools::Run::BlastPlus) - hidden "default" parameters Even though you're only changing a handful of parameters, the defaults (particularly on the web version) may be different than what you expect. In your case, it looks like on the web version, match score is 2 and mismatch is -3. However, in the local version I believe match score is 1 and a mismatch is -3. See this line in the params block near the end of your post: 'matrix' => 'blastn matrix:1 -3', Dave From sidd.basu at gmail.com Thu Aug 19 00:28:32 2010 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Wed, 18 Aug 2010 19:28:32 -0500 Subject: [Bioperl-l] Re: [RFC] Interolog::Walk In-Reply-To: <4C6BF4BD.5010200@sms.ed.ac.uk> References: <4C6BF4BD.5010200@sms.ed.ac.uk> Message-ID: <20100819002830.GA366@Macintosh-235.local> Hi, On Wed, 18 Aug 2010, Giuseppe Gallone wrote: > Hello BioPerl community - I've written a new module called Interolog::Walk > that I'm planning to put on CPAN.
I would be grateful if you might take a > look at the brief description I attached and tell me what you think. I'll > be more than happy to post further details should the module be of some > interest for someone. > > Also, I am not totally sure about having the correct name for it. This is > my first module and It would be great if you could advise on naming it > appropriately. Hopefully the following description will give an idea on > what it does. > > =================== > > > NAME > Interolog::Walk - Retrieve, score and visualize putative > Protein-Protein Interactions through the orthology-walk method > > DESCRIPTION > A common activity in computational biology is to mine protein-protein > interactions from publicly available databases in order to build > Protein-Protein Interaction (PPI) datasets. > In many instances, however, the number of experimentally obtained annotated > PPIs is very scarce and it would be helpful to enrich the experimental > dataset with high-quality, computationally-inferred PPIs. Such > computationally-obtained dataset can extend, support or enrich experimental > PPI datasets, and are of crucial importance in high-throughput gene > prioritization studies, i.e. to drive hypotheses and restrict the > dimensionality of many gene functional discovery problems. > This Perl Module, Interolog::Walk, is aimed at building putative PPI > datasets on the basis of a number of comparative biology paradigms: the > module implements a collection of computational biology algorithms based on > the concept of "orthology projection". If interacting proteins A and B in > organism X have orthologs A' and B' in organism Y, under certain conditions > one can assume that the interaction will be conserved in organism Y, i.e. > the A-B interaction can be "projected through the orthologies" to obtain a > putative A'-B' interaction. The pair of interactions (A-B) and (A'-B') are > named "Interologs" (see for instance [1] and [2]). 
>
> Interolog::Walk collects, analyses and collates gene orthology data
> provided by the Ensembl Consortium (www.ensembl.org) as well as PPI data
> provided by EBI Intact (http://www.ebi.ac.uk/intact/). It provides the user
> with the possibility of rating the quality and reliability of the putative
> interactions collected, by means of confidence scores, and optionally
> outputs network representations of the datasets, compatible with the
> biological network representation standard, Cytoscape.

Sounds interesting. I am currently playing around with a Perl-based webapp for displaying interactomes using cytoscapeweb. Depending on how your design pans out, I would be happy to use your module as a backend analysis layer. And on a related note, you might want to have a look at bioperl-network; if there is any overlap, it might be worth contributing.

-siddhartha

>
> USAGE
> In order to carry out an interolog walk we start with a set of gene
> identifiers in one organism of interest. We query those ids against a
> number of comparative biology databases to retrieve a list of orthologues
> for each gene id of interest, in one or more species.
> In the following step we rely on PPI databases to retrieve the list of
> available interactors for the protein ids obtained. The output at this
> stage consists of a list of interactors of the orthologues of the initial
> gene set, plus several fields of ancillary data.
> In the last step of the process we project the interactions - again using
> orthology data - back to the original species of interest. The output of
> the process is a list of PUTATIVE INTERACTORS of the initial gene set, plus
> several fields of ancillary data.
> > ==================== > > Given the scope and the focus of the project, I would imagine that viable > alternatives for the namespace might be > > Bio::Orthology::InterologWalk > Bio::InterologMap > > or maybe > Interolog::Map > Orthology::Map > Orthology::InterologMap > > There are no similar projects as far as I could see so I shouldn't run the > risk of overlapping namespaces. Still I would love to know your informed > opinion about it. > > best, > Giuseppe > > > > REFERENCES > [1] Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, > Vidal M, Gerstein M. Annotation transfer between genomes: protein-protein > interologs and protein-DNA regulogs. Genome Research 2004 > Jun;14(6):1107-18. > > [2]Wiles AM, Doderer M, Ruan J, Gu T-T, Ravi D, Blackman BA, Bishop AJR. > "Building and Analyzing Protein Interactome Networks by Cross-species > Comparisons." BMC Systems Biology 2010, 4:36 - PMID: 20353594 > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From dan.kortschak at adelaide.edu.au Thu Aug 19 02:15:03 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Thu, 19 Aug 2010 11:45:03 +0930 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query Message-ID: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> Hi Everyone, I'm wanting to set up a persistent data store for some of my work and am in the process of choosing parts for my system. From my brief look around I think I'd like to use BioSQL (next best choice being Chado - but BioPerl bindings in bioperl-db for BioSQL being the decider here), but have noticed comments some time back that bioperl-db and PostgreSQL 8.3 (my prefered engine - though MySQL is possible, but makes the whole system messier) don't play well together. 
What is the status of the casting expectation conflict between bioperl-db and Pg8.3? The scripts are run with safe data, so placeholders aren't strictly crucial (though speed may be an issue?) and `$dbh->{pg_server_prepare} = 0;' seems like it could be an option.

Can anybody provide any advice on this issue?

thanks
Dan Kortschak

From cjfields at illinois.edu Thu Aug 19 03:29:36 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 18 Aug 2010 22:29:36 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq question
In-Reply-To:
References: <4C6C3259.4060304@bcm.tmc.edu>
Message-ID: <194D43EC-A44C-450A-B57B-EC379DBCB935@illinois.edu>

Wouldn't surprise me too much if the parameters are not set the same; IIRC the main BLAST URL API and the online NCBI Web-BLAST have different default settings.

chris

On Aug 18, 2010, at 5:32 PM, Dave Messina wrote:

> Hi Caleb,
>
> The first thing I would do is take BioPerl out of the equation and test your local bl2seq on the command line. If you get the same output locally as on the web version, then there is a problem with BioPerl. If you're still seeing a discrepancy between the web and your local run, then this isn't a problem with BioPerl.
>
> Just to be clear, BioPerl doesn't "implement" any of the BLAST programs; it is simply a wrapper around the programs that you download from NCBI. That doesn't mean BioPerl isn't at fault, of course, just that it's important to isolate the problem carefully.
>
> The most common reasons for these discrepancies are:
>
> - different version numbers of BLAST
>
> 2.2.21? 2.2.22? Is it the same on the web as locally?
>
> - similarly, different implementations of BLAST
>
> NCBI's old BLAST suite is now deprecated and replaced with BLAST+. All of the online BLAST web queries are Blast+ now - are you running BLAST+ locally?
(there's also a separate BioPerl wrapper for BLAST+ called Bio::Tools::Run::BlastPlus) > > - hidden "default" parameters > > Even though you're only changing a handful of parameters, the defaults (particularly on the web version) may be different than what you expect. > > In your case, it looks like on the web version, match score is 2 and mismatch is -3. However, in the local version I believe match score is 1 and a mismatch is -3. > > See this line in the params block near the end of your post: > > 'matrix' => 'blastn matrix:1 -3', > > > > Dave > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at drycafe.net Thu Aug 19 05:48:19 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 19 Aug 2010 01:48:19 -0400 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> Hi Dan, the casting isn't an issue anymore, I think. (And even if it were, there is actually a small script that brings back the casts that were built into 8.2.) Have you found an example where it still is? -hilmar On Aug 18, 2010, at 10:15 PM, Dan Kortschak wrote: > Hi Everyone, > > I'm wanting to set up a persistent data store for some of my work > and am > in the process of choosing parts for my system. From my brief look > around I think I'd like to use BioSQL (next best choice being Chado - > but BioPerl bindings in bioperl-db for BioSQL being the decider here), > but have noticed comments some time back that bioperl-db and > PostgreSQL > 8.3 (my prefered engine - though MySQL is possible, but makes the > whole > system messier) don't play well together. > > What is the status of the casting expectation conflict between > bioperl-db and Pg8.3? 
The scripts are run with safe data, so > placeholders aren't strictly crucial (though speed may be an issue?) > and > `$dbh->{pg_server_prepare} = 0;' seems like it could be an option. > > Can anybody provide any advice on this issue? > > thanks > Dan Kortschak > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From dan.kortschak at adelaide.edu.au Thu Aug 19 05:54:03 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Thu, 19 Aug 2010 15:24:03 +0930 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> Message-ID: <1282197243.14127.27.camel@zoidberg.mbs.adelaide.edu.au> Hi Hilmar, No, I haven't found any problems, just hoping to avoid them by prior research. thanks Dan On Thu, 2010-08-19 at 01:48 -0400, Hilmar Lapp wrote: > Hi Dan, > > the casting isn't an issue anymore, I think. (And even if it were, > there is actually a small script that brings back the casts that > were > built into 8.2.) Have you found an example where it still is? > > -hilmar From biopython at maubp.freeserve.co.uk Thu Aug 19 10:01:03 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 19 Aug 2010 11:01:03 +0100 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> Message-ID: On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp wrote: > Hi Dan, > > the casting isn't an issue anymore, I think. 
(And even if it were, there is
> actually a small script that brings back the casts that were built into
> 8.2.) Have you found an example where it still is?
>
> -hilmar

Hi Hilmar,

Do the bioperl-db bindings for BioSQL on PostgreSQL still require those extra rules in the schema?
http://bugzilla.open-bio.org/show_bug.cgi?id=2839

Peter

From G.Gallone at sms.ed.ac.uk Thu Aug 19 10:45:36 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Thu, 19 Aug 2010 11:45:36 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
References: <4C6BF4BD.5010200@sms.ed.ac.uk> <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
Message-ID: <4C6D0B50.4050902@sms.ed.ac.uk>

Hi Dave,

thank you very much for your helpful comments.

Regarding the module name: I will follow your advice and avoid proposing a new root during the module registration. As for the second level, I haven't been able to find anything related to homology/orthology, therefore I'm not sure whether I should go for

Bio::Orthology::InterologMap
or
Bio::Homology::InterologMap

the first one being maybe a bit more specific. I might also expand further as in Bio::Orthology::Interolog::Map, just in case somebody else finds other interesting applications for the Interolog concept and would like to "plug in" their own contribution. Would this make any sense?

I also appreciate your comments on the documentation. The one I provided is actually not the full pod I was planning to include, but rather an extract. What I have at the moment is a description, for each method, in the following form:

=====================================
remove_duplicate_rows

 Usage     : $RC = InterologMap::remove_duplicate_rows(input_handle  => $dbh,
                                                       output_handle => $out_data,
                                                       header        => 'standard',
                                                       );
 Purpose   : This is used to clean up a TSV data file of duplicate entries.
             Occasionally, Intact can return duplicate entries. This routine
             will make sure no such duplicates are kept. A new datafile is
             built. The number of unique data rows is updated.
 Returns   : success/error
 Argument  : database handle to input file, filehandle to outputfile, header type.
             Header type is one of the following:
             - "standard": when the routine is used to clean up an interolog
               walk file (the header will be longer)
             - "direct": when the routine is used to clean up a file of real
               db interaction (the header is shorter)
             - no field provided: default is standard
 Throws    : -
 Comment   : Sample
 See Also  :
=======================================

On top of that, there is a DESCRIPTION, USAGE, and SYNOPSIS. The synopsis has some code with an example of typical usage of the module. Please take a look at this (attached below) and tell me what you think.

You mention that the description contains a lot of background information. Would you recommend reducing it, or placing it elsewhere? I was considering writing a little tutorial in LaTeX as soon as possible anyway, to provide a "centralised" source of information for getting familiar with the module. Does this respect the CPAN regulations?

As for your question on the structure of the module: you are indeed right, the idea when running the "orthology walk" is to create a pipeline of subroutines: there's a core set of subroutines meant to work in strict sequence. Each of these subroutines expects, as input, the output of the previous one. The input/output dataset is currently in the form of a TSV text file, which I process with the help of the DBI module (to be more specific, I use DBD::CSV).

While there's a certain flexibility regarding how to use the module, one core idea remains: in order to get the set of putative interactors, the user would have to call at least three basic routines:

(A)
=================
1) get_forward_orthologies(): this queries the initial gene list against one or more Ensembl dbs (using the Ensembl Perl API) and retrieves their orthologues, plus a number of ancillary data fields (mainly conservation data, e.g. dN/dS ratio, distance from ancestor, orthology type, etc.)

2) get_interactions(): this queries the orthology list built in the previous stage against a PSICQUIC-enabled PPI db using REST (at the moment I only query the EBI Intact DB, but it should be easy to expand this and query all PSICQUIC-compatible PPI dbs transparently). This step will "fatten" the dataset built in (1) with the interactors of those orthologues, plus ancillary data (including lots of parameters describing the quality, nature and origin of the annotated interaction)

3) get_backward_orthologies(): this queries the interactor list built in the previous stage against one or more Ensembl dbs to find orthologues *back* in the original species. It also adds a number of supplementary data fields just like in (1).
==================

At the end of this procedure the user will have a TSV file where each row contains a binary putative interaction plus (currently) 37 supplementary data fields. One can then scan these results to check for duplicates, to compute counts, to see if we have discovered new gene ids that were not present in the original dataset (hopefully we have :) ).
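
A minimal sketch of that post-processing step (duplicate check and search for new gene ids) could look like the following. It uses only core Perl on in-memory rows rather than the DBD::CSV route described above, and the helper names unique_rows() and unseen_ids(), the two-column layout and the ids are all assumptions made up for illustration; they are not the module's actual routines or output.

```perl
use strict;
use warnings;

# Drop repeated data rows from tab-separated text, keeping the header
# and the first occurrence of each row. Returns an array reference.
# (Hypothetical helper, not part of the module.)
sub unique_rows {
    my ($header, @rows) = @_;
    my %seen;
    return [ $header, grep { !$seen{$_}++ } @rows ];
}

# Return the ids found in a given (0-based) column that were not part
# of the starting gene list, in order of first appearance.
# (Hypothetical helper, not part of the module.)
sub unseen_ids {
    my ($rows, $column, $initial_ids) = @_;
    my %known = map { $_ => 1 } @$initial_ids;
    my (%reported, @new);
    for my $row (@$rows) {
        my $id = (split /\t/, $row)[$column];
        next if !defined $id || $known{$id} || $reported{$id}++;
        push @new, $id;
    }
    return \@new;
}

# Toy input: column names and ids are placeholders, not real output.
my @lines = (
    "initial_id\tputative_interactor",
    "geneA\tgeneX",
    "geneA\tgeneX",    # duplicate row
    "geneB\tgeneA",
    "geneB\tgeneY",
);

my $clean = unique_rows(@lines);
my ($header, @data) = @$clean;
my $new_ids = unseen_ids(\@data, 1, [ 'geneA', 'geneB' ]);

printf "%d unique data rows\n", scalar @data;
print  "new ids: @$new_ids\n";
```

Keying the duplicate check on the whole row keeps the sketch short; a real file with 37 ancillary fields might need a key built from a subset of columns instead.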
Most importantly, one can then further process these results to do one or more of the following:

(B) compute a global confidence score to assess the reliability of each binary putative interaction

(C) extract the binary putative PPIs from the dataset and save them in a format compatible with Cytoscape: this helps give the result a visual quality, and one can then apply network analysis tools to discover motifs, clusters, etc. The format I use is currently .SIF + attributes, as detailed in
http://cytoscape.wodaklab.org/wiki/Cytoscape_User_Manual/Network_Formats

(D) given the same initial gene list, one can also build a dataset of REAL, experimentally-obtained PPIs (without mapping through orthologies in other species). One can then compare this dataset with the Putative dataset to see if/where the two overlap, what the intersection or the differences are, etc.

In order to suggest ways of using the module I have written 4 sample scripts and I will include them in the module. Each script utilises the module and uses/reuses subroutines in a pipeline fashion, and does the following:

1) doInterologWalk.pl: runs the basic pipeline in (A)
2) doScores.pl: computes and adds confidence scores as explained in (B)
3) doNetworks.pl: computes SIF network + attributes as in (C)
4) getRealInteractions.pl: runs a pipeline to obtain real PPIs from the initial gene set.

Hope I didn't make this too confusing. I would love to hear back from you and from anybody else who would like to provide feedback.

Cheers
Giuseppe

On 18/08/10 17:52, Dave Messina wrote:
> Hi Giuseppe,
>
> Sounds really interesting - thanks for posting this.
>
>> Bio::Orthology::InterologWalk
>
> I vote for this name, or in any case something with Bio:: as the top-level namespace since it's a biology-related package.
>
> I like that you're providing a lot of background and information about the project in the documentation. However, the USAGE section should give information about how to use the module, with example code. You can look at other modules on CPAN (or in BioPerl) to see the conventions for writing documentation.
>
> Also, from what you wrote, it sounds like this might be a pipeline or a script rather than a module per se, or perhaps a script and a set of modules. It would be helpful to clarify in your documentation (if you haven't already) how exactly things are organized (and of course example code will help with that, too).
>
>
> Hope that's helpful, and let us know when you've got it up on CPAN so we can try it out!
>
>
> Dave
>
>

NAME
    Interolog::Walk - Retrieve, score and visualize putative
    Protein-Protein Interactions through the orthology-walk method

SYNOPSIS
      use Interolog::Walk;

    First, obtain Intact Interactions for the dataset (see example in
    "getDirectInteractions.pl"):

      #get a registry from Ensembl
      my $registry = InterologMap::setup_ensembl_adaptor(
                         connect_to_db  => $ensembl_db,
                         source_species => $sourceorg,
                         verbose        => 1
                     );

      #query actual interactions
      $RC = InterologMap::Direct::get_direct_interactions(
                registry       => $registry,
                source_species => $sourceorg,
                input_path     => $in_path,
                output_path    => $out_path,
                url            => $url,
            );

    do some postprocessing (see "do_counts()" and "extract_unseen_ids()")
    and then do the actual interolog walk on the dataset with the following
    sequence of three methods.

    get orthologues of starting set:

      $RC = InterologMap::get_forward_orthologies(
                registry    => $registry,
                ensembl_db  => $ensembl_db,
                input_path  => $in_path,
                output_path => $out_path,
                source_org  => $sourceorg,
                dest_org    => $destorg,
            );

    add interactors of orthologues found by "get_forward_orthologies()":

      $RC = InterologMap::get_interactions(
                input_path  => $in_path,
                output_path => $out_path,
                url         => $url,
                url_global  => $url_global,
            );

    add orthologues of interactors found by "get_interactions()":

      $RC = InterologMap::get_backward_orthologies(
                registry    => $registry,
                ensembl_db  => $ensembl_db,
                input_path  => $in_path,
                output_path => $out_path,
                error_path  => $err_path,
                source_org  => $sourceorg,
            );

    do some postprocessing (see "remove_duplicate_rows()", "do_counts()",
    "extract_unseen_ids()") and then optionally compute a composite score
    for the putative interactions obtained:

      $RC = InterologMap::Scores::compute_scores(
                input_path       => $in_path,
                score_path       => $score_path,
                output_path      => $out_path,
                term_graph       => $onto_graph,
                M_IT_SCORE       => $M_IT,
                M_DM_SCORE       => $M_DM,
                M_ME_DM_SCORE    => $M_MDM,
                M_ME_TAXA_SCORE  => $M_MTAXA
            );

    get some networks and network attributes which you can then visualise
    with cytoscape

      $RC = InterologMap::Networks::do_network(
                registry       => $registry,
                db             => $ensembl_db,
                input_path     => $in_path,
                output_path    => $out_path,
                source_org     => $sourceorg,
                orthology_type => $orthtype,
            );

      $RC = InterologMap::Networks::do_attributes(
                registry    => $registry,
                input_path  => $in_path,
                output_path => $out_path,
                source_org  => $sourceorg,
                label_type  => 'external name'
            );

    *The synopsis above only lists the major methods and parameters.*

DESCRIPTION
    A common activity in computational biology is to mine protein-protein
    interactions from publicly available databases to build
    *Protein-Protein Interaction* (PPI) datasets.
    In many instances, however, experimentally obtained annotated PPIs are
    very scarce and it would be helpful to enrich the experimental dataset
    with high-quality, computationally-inferred PPIs. Such
    computationally-obtained datasets can extend, support or enrich
    experimental PPI datasets, and are of crucial importance in
    high-throughput gene prioritization studies, i.e. to drive hypotheses
    and restrict the dimensionality of functional discovery problems.

    This Perl Module, Interolog::Walk, is aimed at building putative PPI
    datasets on the basis of a number of comparative biology paradigms: the
    module implements a collection of computational biology algorithms
    based on the concept of "orthology projection". If interacting proteins
    A and B in organism X have orthologs A' and B' in organism Y, under
    certain conditions one can assume that the interaction will be
    conserved in organism Y, i.e. the A-B interaction can be "projected
    through the orthologies" to obtain a putative A'-B' interaction. The
    pair of interactions (A-B) and (A'-B') are named "Interologs".

    Interolog::Walk collects, analyses and collates gene orthology data
    provided by the Ensembl Consortium as well as PPI data provided by EBI
    Intact. It provides the user with the possibility of rating the quality
    and reliability of the putative interactions collected, by means of
    confidence scores, and optionally outputs network representations of
    the datasets, compatible with the biological network representation
    standard, Cytoscape.

BASIC USAGE
    Rationale behind "Interolog::Walk".

                             \EBI Intact API/
        .--------------.            |             .-------------.
    (2) | A(e.g. mouse)|<------------------------>| B(mouse)    | (3)
        `--------------'                          `-------------'
               ^                                         |
     /Ensembl\ |                                         | \ Ensembl /
   / Compara \ |                                         |  \Compara/
 /    Api    \ |                                         |   \ Api /
               |                                         |
        .--------------.                          .-------------.
    (1) | A'(e.g. fly) |. . . . . . . . . . . . . | B'(fly)     | (4)
        `--------------'   [SCORED]PUTATIVE PPI   `-------------'
                       (Output of Interolog::Walk)

    In order to carry out an interolog walk we start with a set of gene
    identifiers in one organism of interest (1). We query those ids against
    a number of comparative biology databases to retrieve a list of
    orthologues for the gene id of interest, in one or more species (2). In
    the next step we rely instead on PPI databases to retrieve the list of
    available interactors for the protein ids obtained in (2). The output
    at this stage consists of a list of interactors of the orthologues of
    the initial gene set, plus several fields of ancillary data (whose
    importance will be explained later) (3). In the last step of this
    process we will need to project the interactions in (3) - again using
    orthology data - back to the original species of interest. The output
    of the process is a list of PUTATIVE INTERACTORS of the initial gene
    set, plus several fields of ancillary data.

    "Interolog::Walk" provides three main functions to carry out the basic
    walk, "get_forward_orthologies()", "get_interactions()" and
    "get_backward_orthologies()". These functions must be called strictly
    sequentially in your script, as they process, analyse and attach data
    to the output in a pipeline-like fashion, i.e. each one processes the
    output of the preceding function.

  get_forward_orthologies
  get_interactions
  get_backward_orthologies

SCORING THE PUTATIVE INTERACTIONS

BUILDING PUTATIVE INTERACTION NETWORKS

BUGS
    Please report any you find

SUPPORT

TODO

AUTHOR
    Giuseppe Gallone
    CPAN ID: GGALLONE
    University of Edinburgh

COPYRIGHT
    The Interolog::Walk module is Copyright (c) 2010 Giuseppe Gallone
    All rights reserved.

    You may distribute under the terms of either the GNU General Public
    License or the Artistic License, as specified in the Perl 5.10.0 README
    file.

SEE ALSO

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
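
The numbered steps of the walk in the BASIC USAGE text can be sketched in a few lines of plain Perl. This is only an illustration under stated assumptions: the three lookup tables stand in for the Ensembl Compara and Intact queries, every identifier is an invented placeholder, and interolog_walk() is a hypothetical helper rather than a routine of the module.

```perl
use strict;
use warnings;

# Toy data only: every identifier below is an invented placeholder.
# Orthologue lookup from the species of interest to a second species.
my %forward_orthologues = (
    geneA => ['orthA'],
    geneB => ['orthB'],
);
# Known interactions in the second species (what a PPI database would return).
my %interactors = (
    orthA => ['orthC'],
    orthB => [],
);
# Orthologue lookup back to the species of interest.
my %backward_orthologues = (
    orthC => ['geneC'],
);

# Walk: gene -> orthologue -> interactor of the orthologue -> orthologue
# of the interactor. Each result row is
# [ initial gene, putative interactor, orthologue, interactor of orthologue ],
# so the intermediate ids are kept as ancillary data.
sub interolog_walk {
    my (@genes) = @_;
    my @putative;
    for my $gene (@genes) {
        for my $orth (@{ $forward_orthologues{$gene} || [] }) {
            for my $partner (@{ $interactors{$orth} || [] }) {
                for my $back (@{ $backward_orthologues{$partner} || [] }) {
                    push @putative, [ $gene, $back, $orth, $partner ];
                }
            }
        }
    }
    return @putative;
}

my @putative = interolog_walk(qw(geneA geneB));
print join("\t", @$_), "\n" for @putative;
```

In this toy run geneB yields nothing because its orthologue has no annotated interactor, which mirrors how a real walk only produces putative pairs where all three lookups succeed.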
From G.Gallone at sms.ed.ac.uk Thu Aug 19 12:42:28 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Thu, 19 Aug 2010 13:42:28 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <20100819002830.GA366@Macintosh-235.local>
References: <4C6BF4BD.5010200@sms.ed.ac.uk> <20100819002830.GA366@Macintosh-235.local>
Message-ID: <4C6D26B4.5090702@sms.ed.ac.uk>

Dear Siddhartha,

glad to hear this might be helpful. As for the bioperl-network package, thank you for mentioning it. I had a quick look at its documentation and it looks like a much deeper and more complex effort than what I have in my package. I've actually been making heavy use of the Graph package, on which it seems to be based, and found it very helpful. I'm not sure whether the network routines in my module overlap with it, though: all I do in my package is parse the dataset, keeping only what is needed to build a Cytoscape SIF file and optionally some Cytoscape NOA attribute files, as required by the Cytoscape specification in

http://cytoscape.wodaklab.org/wiki/Cytoscape_User_Manual/Network_Formats

whereas it looks like bioperl-network actually builds some kind of internal representation of the network for further manipulation in Perl, if I understand it correctly?

Kind regards
Giuseppe

On 19/08/10 01:28, Siddhartha Basu wrote:
> Sounds interesting. I am currently playing around with a perl based webapp for displaying interactome
> using cytoscapeweb. Depending how your design pans out, would be happy to
> use your module as a backend analysis layer. And on a related note, you
> might want to have a look at bioperl-network and if there is any overlap
> might be worth contributing.
>
> -siddhartha
>

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
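
As a rough illustration of the SIF step discussed in this message, the sketch below turns a list of interactor pairs into SIF lines (source, interaction type, target). It assumes a tab-delimited layout and the 'pp' interaction type; pairs_to_sif() and the gene ids are hypothetical and are not taken from the module.

```perl
use strict;
use warnings;

# Turn a list of [source, target] pairs into SIF lines of the form
# "source<TAB>type<TAB>target". A-B and B-A are treated as the same
# edge, so only the first one seen is kept.
# (Hypothetical helper, not part of the module.)
sub pairs_to_sif {
    my ($pairs, $type) = @_;
    $type = 'pp' unless defined $type;    # assumed protein-protein label
    my (%seen, @lines);
    for my $pair (@$pairs) {
        my ($source, $target) = @$pair;
        # Order-independent key, so that B-A matches an earlier A-B.
        my $key = join "\t", sort { $a cmp $b } $source, $target;
        next if $seen{$key}++;
        push @lines, join("\t", $source, $type, $target);
    }
    return @lines;
}

# Toy input: the ids are placeholders.
my @sif = pairs_to_sif([
    [ 'geneA', 'geneC' ],
    [ 'geneC', 'geneA' ],    # same edge, reversed
    [ 'geneB', 'geneD' ],
]);

print "$_\n" for @sif;
```

Node attributes (the NOA files mentioned above) would be written separately; the sketch covers only the edge list.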
From xupeng86 at gmail.com Thu Aug 19 08:02:48 2010 From: xupeng86 at gmail.com (xupeng) Date: Thu, 19 Aug 2010 16:02:48 +0800 Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database? Message-ID: <201008191602.49068.xupeng86@gmail.com> I've downloaded the biosql-1.0.1.tar.gz. It works well. But I can't find the 'load_seqdatabase.pl' when I try to import the Genbank files into biosql databsase. Can anyone give me a copy of that file? many thanks ! From sunhanifk at gmail.com Thu Aug 19 14:25:38 2010 From: sunhanifk at gmail.com (han sun) Date: Thu, 19 Aug 2010 22:25:38 +0800 Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl 5.12.1? Message-ID: Hello everyone, I have used perl for several months,and I now want to feel the power of bioperl. But it seems that the installing is more difficult than I thought. I typed the commands. install-shell rep add bioperl http://bioperl.org/DIST rep add uwinnipeg http://cpan.uwinnipeg.ca/PPMPackages/12xx/ rep add trouchelle http://trouchelle.com/ppm12/ install BioPerl However,the installing failed, ppm install failed: Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core Can't find any package that provides PostScript::TextBlock for Bundle-BioPerl-Core Can't find any package that provides Ace:: for Bundle-BioPerl-Core Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core Can't find any package that provides Convert::Binary::C for Bundle-BioPerl-Core Can't find any package that provides XML::Twig for Bundle-BioPerl-Core Can't find any package that provides DB_File:: for Bundle-BioPerl-Core Can't find any package that provides IPC::Run for GraphViz Can't find any package that provides XML-XPathEngine for XML-DOM-XPath Can't find any package that provides List-MoreUtils for Moose Can't find any package that provides List-MoreUtils for Class-MOP then I tried install http://www.bribes.org/perl/ppm/GD.ppd and tried the 
installation again,but it still didn't help. * * * * * * *Do you konw what's wrong with the problem?* * * * * *Please help me,thanks very much.* From cjfields1 at gmail.com Thu Aug 19 14:33:26 2010 From: cjfields1 at gmail.com (Christopher Fields) Date: Thu, 19 Aug 2010 09:33:26 -0500 Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl 5.12.1? In-Reply-To: References: Message-ID: <78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com> Try using ActivePerl 5.10 instead of v5.12. It's very possible the PPM won't work for v5.12 yet. chris On Aug 19, 2010, at 9:25 AM, han sun wrote: > Hello everyone, > > I have used perl for several months,and I now want to feel the power of > bioperl. > But it seems that the installing is more difficult than I thought. > > I typed the commands. > > > > install-shell > > > rep add bioperl http://bioperl.org/DIST > > > rep add uwinnipeg > http://cpan.uwinnipeg.ca/PPMPackages/12xx/ > > > rep add trouchelle http://trouchelle.com/ppm12/ > > install BioPerl > > However,the installing failed, > > ppm install failed: > Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core > Can't find any package that provides PostScript::TextBlock for > Bundle-BioPerl-Core > Can't find any package that provides Ace:: for Bundle-BioPerl-Core > Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core > Can't find any package that provides Convert::Binary::C for > Bundle-BioPerl-Core > Can't find any package that provides XML::Twig for Bundle-BioPerl-Core > Can't find any package that provides DB_File:: for Bundle-BioPerl-Core > Can't find any package that provides IPC::Run for GraphViz > Can't find any package that provides XML-XPathEngine for XML-DOM-XPath > Can't find any package that provides List-MoreUtils for Moose > Can't find any package that provides List-MoreUtils for Class-MOP > > > then I tried > > install http://www.bribes.org/perl/ppm/GD.ppd > > and tried the installation again,but it still 
didn't help. > > * > * > * > * > * > * > > > *Do you konw what's wrong with the problem?* > * > * > * > * > *Please help me,thanks very much.* > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at drycafe.net Thu Aug 19 14:53:22 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 19 Aug 2010 10:53:22 -0400 Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database? In-Reply-To: <201008191602.49068.xupeng86@gmail.com> References: <201008191602.49068.xupeng86@gmail.com> Message-ID: <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net> The file comes with Bioperl-db, not BioSQL. That is so because it depends on BioPerl and on Bioperl-db, and so you will need to have both installed. -hilmar On Aug 19, 2010, at 4:02 AM, xupeng wrote: > I've downloaded the biosql-1.0.1.tar.gz. It works well. But I > can't find the 'load_seqdatabase.pl' when I try to import the > Genbank files into biosql databsase. > Can anyone give me a copy of that file? > many thanks ! > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From hlapp at drycafe.net Thu Aug 19 14:58:46 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 19 Aug 2010 10:58:46 -0400 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> Message-ID: <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> Yes, unfortunately they do. 
The feature for obviating them (namely nested transactions) is there in Pg 8.2+, but Bioperl-db doesn't use them yet ... I have to learn more about Class::DBIx first to decide whether it's better to first implement nested transactions in the home-grown ORM that Bioperl-db in essence is, or whether it's better to reimplement everything in Class::DBIx instead.

There are new datatypes in Bioperl, and relations in BioSQL that could hold them, and so I need to decide what's the way forward.

-hilmar

On Aug 19, 2010, at 6:01 AM, Peter wrote:

> On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp wrote:
>> Hi Dan,
>>
>> the casting isn't an issue anymore, I think. (And even if it were, there is
>> actually a small script that brings back the casts that were built into
>> 8.2.) Have you found an example where it still is?
>>
>> -hilmar
>
> Hi Hilmar,
>
> Do the bioperl-db bindings for BioSQL on PostgreSQL still require those
> extra rules in the schema?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2839
>
> Peter

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From mmuratet at hudsonalpha.org Thu Aug 19 15:00:52 2010
From: mmuratet at hudsonalpha.org (Michael Muratet)
Date: Thu, 19 Aug 2010 10:00:52 -0500
Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database?
In-Reply-To: <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
References: <201008191602.49068.xupeng86@gmail.com> <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
Message-ID:

On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote:

> The file comes with Bioperl-db, not BioSQL. That is so because it
> depends on BioPerl and on Bioperl-db, and so you will need to have
> both installed.

Is load_seqdatabase.pl still the best method? I vaguely remember a post that said that load_seqdatabase was deprecated, but I can't find it in the archives.

Mike > > -hilmar > > On Aug 19, 2010, at 4:02 AM, xupeng wrote: > >> I've downloaded the biosql-1.0.1.tar.gz. It works well. But I >> can't find the 'load_seqdatabase.pl' when I try to import the >> Genbank files into biosql databsase. >> Can anyone give me a copy of that file? >> many thanks ! >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Michael Muratet, Ph.D. Senior Scientist HudsonAlpha Institute for Biotechnology mmuratet at hudsonalpha.org (256) 327-0473 (p) (256) 327-0966 (f) Room 4005 601 Genome Way Huntsville, Alabama 35806 From hlapp at drycafe.net Thu Aug 19 15:29:31 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 19 Aug 2010 11:29:31 -0400 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> Message-ID: <5F77404A-086D-4D0C-B3A5-F5119FCF878A@drycafe.net> On Aug 19, 2010, at 11:09 AM, Chris Fields wrote: > DBIx::Class Did I have this in the wrong order :-) More coffee, please. 
-- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From hlapp at drycafe.net Thu Aug 19 15:30:26 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 19 Aug 2010 11:30:26 -0400 Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database? In-Reply-To: References: <201008191602.49068.xupeng86@gmail.com> <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net> Message-ID: It's not deprecated. Unless I'm again mixing up something? -hilmar On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote: > > On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote: > >> The file comes with Bioperl-db, not BioSQL. That is so because it >> depends on BioPerl and on Bioperl-db, and so you will need to have >> both installed. > > Is load_seqdatabase.pl still the best method? I vaguely remember a > post that said that load_seqdatabase was deprecated, but I can't > find it in the archives. > > Mike > >> >> -hilmar >> >> On Aug 19, 2010, at 4:02 AM, xupeng wrote: >> >>> I've downloaded the biosql-1.0.1.tar.gz. It works well. But I >>> can't find the 'load_seqdatabase.pl' when I try to import the >>> Genbank files into biosql databsase. >>> Can anyone give me a copy of that file? >>> many thanks ! >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >> =========================================================== >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Michael Muratet, Ph.D. 
> Senior Scientist > HudsonAlpha Institute for Biotechnology > mmuratet at hudsonalpha.org > (256) 327-0473 (p) > (256) 327-0966 (f) > > Room 4005 > 601 Genome Way > Huntsville, Alabama 35806 > > > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From cjfields at illinois.edu Thu Aug 19 15:09:13 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Aug 2010 10:09:13 -0500 Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query In-Reply-To: <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> Message-ID: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> I think it's worth exploring having a DBIx::Class-based middle-ware approach similar to what Rob Buels has done for Chado. That would be fairly easy to get started using DBIx::Class::Schema::Loader. After that it would require optimization and tweaking, which is potentially more complex than Rob's setup as Chado is very Pg-specific, but maybe Rob can elaborate... chris On Aug 19, 2010, at 9:58 AM, Hilmar Lapp wrote: > Yes, unfortunately they do. The feature for obviating them (namely nested transactions) is there in Pg 8.2+, but Bioperl-db doesn't use them yet ... I have to learn more about Class::DBIx first to decide whether it's better to first implement nested transactions in the home-grown ORM that Bioperl-db in essence is, or whether it's better to reimplement everything in Class::DBIx instead. > > There are new datatypes in Bioperl, and relations in BioSQL that could hold them, and so I need to decide what's the way forward. 
> > -hilmar > > On Aug 19, 2010, at 6:01 AM, Peter wrote: > >> On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp wrote: >>> Hi Dan, >>> >>> the casting isn't an issue anymore, I think. (And even if it were, there is >>> actually a small script that brings back the casts that were built into >>> 8.2.) Have you found an example where it still is? >>> >>> -hilmar >> >> Hi Hilmar, >> >> Do the bioperl-db bindings for BioSQL on PostgreSQL still require those >> extra rules in the schema? >> http://bugzilla.open-bio.org/show_bug.cgi?id=2839 >> >> Peter > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 19 15:37:39 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Aug 2010 10:37:39 -0500 Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database? In-Reply-To: References: <201008191602.49068.xupeng86@gmail.com> <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net> Message-ID: <68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu> I don't recall this either. So, can't blame it on lack of coffee :) chris On Aug 19, 2010, at 10:30 AM, Hilmar Lapp wrote: > It's not deprecated. Unless I'm again mixing up something? > > -hilmar > > On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote: > >> >> On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote: >> >>> The file comes with Bioperl-db, not BioSQL. That is so because it depends on BioPerl and on Bioperl-db, and so you will need to have both installed. >> >> Is load_seqdatabase.pl still the best method? I vaguely remember a post that said that load_seqdatabase was deprecated, but I can't find it in the archives. 
>> >> Mike >> >>> >>> -hilmar >>> >>> On Aug 19, 2010, at 4:02 AM, xupeng wrote: >>> >>>> I've downloaded the biosql-1.0.1.tar.gz. It works well. But I >>>> can't find the 'load_seqdatabase.pl' when I try to import the >>>> Genbank files into biosql databsase. >>>> Can anyone give me a copy of that file? >>>> many thanks ! >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >>> =========================================================== >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Michael Muratet, Ph.D. >> Senior Scientist >> HudsonAlpha Institute for Biotechnology >> mmuratet at hudsonalpha.org >> (256) 327-0473 (p) >> (256) 327-0966 (f) >> >> Room 4005 >> 601 Genome Way >> Huntsville, Alabama 35806 >> >> >> >> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From mmuratet at hudsonalpha.org Thu Aug 19 15:40:02 2010 From: mmuratet at hudsonalpha.org (Michael Muratet) Date: Thu, 19 Aug 2010 10:40:02 -0500 Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl" when use biosql database? 
In-Reply-To: <68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu> References: <201008191602.49068.xupeng86@gmail.com> <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net> <68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu> Message-ID: On Aug 19, 2010, at 10:37 AM, Chris Fields wrote: > I don't recall this either. So, can't blame it on lack of coffee :) Thanks. I'll keep using it! Mike > > chris > > On Aug 19, 2010, at 10:30 AM, Hilmar Lapp wrote: > >> It's not deprecated. Unless I'm again mixing up something? >> >> -hilmar >> >> On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote: >> >>> >>> On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote: >>> >>>> The file comes with Bioperl-db, not BioSQL. That is so because it >>>> depends on BioPerl and on Bioperl-db, and so you will need to >>>> have both installed. >>> >>> Is load_seqdatabase.pl still the best method? I vaguely remember a >>> post that said that load_seqdatabase was deprecated, but I can't >>> find it in the archives. >>> >>> Mike >>> >>>> >>>> -hilmar >>>> >>>> On Aug 19, 2010, at 4:02 AM, xupeng wrote: >>>> >>>>> I've downloaded the biosql-1.0.1.tar.gz. It works well. But I >>>>> can't find the 'load_seqdatabase.pl' when I try to import the >>>>> Genbank files into biosql databsase. >>>>> Can anyone give me a copy of that file? >>>>> many thanks ! >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> -- >>>> =========================================================== >>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >>>> =========================================================== >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> Michael Muratet, Ph.D. 
>>> Senior Scientist >>> HudsonAlpha Institute for Biotechnology >>> mmuratet at hudsonalpha.org >>> (256) 327-0473 (p) >>> (256) 327-0966 (f) >>> >>> Room 4005 >>> 601 Genome Way >>> Huntsville, Alabama 35806 >>> >>> >>> >>> >>> >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >> =========================================================== >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > Michael Muratet, Ph.D. Senior Scientist HudsonAlpha Institute for Biotechnology mmuratet at hudsonalpha.org (256) 327-0473 (p) (256) 327-0966 (f) Room 4005 601 Genome Way Huntsville, Alabama 35806 From cjfields at illinois.edu Thu Aug 19 15:55:54 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Aug 2010 10:55:54 -0500 Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping In-Reply-To: References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se> Message-ID: <5611499B-FA63-4A52-8279-99B554418374@illinois.edu> On Aug 17, 2010, at 8:52 AM, Dave Messina wrote: >> It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison > > Yep, agreed. > > And such a flag should be named for the non-default behavior, then, like: -ignore_IDs_for_overlaps > > Dave Probably would just be -ignore_ids as this behavior would have to be consistent across the various Bio::RangeI methods (overlaps, contains, etc). The params are case-insensitive IIRC, so the _IDs would just be lc(). RangeI doesn't define a seq_id(), though, so we either use can() in RangeI (which is dirtier IMO) or define this in the appropriate class, probably LocationI or SeqFeatureI. 
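
[Editorial sketch] The behaviour discussed in this thread (compare sequence IDs by default, and skip that check only when an -ignore_ids style flag is set) can be illustrated roughly as below. This is a minimal, self-contained sketch and not BioPerl code: the function name features_overlap, the ignore_ids option, and the plain-hash features are all hypothetical stand-ins, and nothing here reflects how Bio::RangeI, LocationI or SeqFeatureI actually implement or would implement the check.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of BioPerl. Features are plain hashes
# with seq_id, start and end keys (1-based, inclusive coordinates).
sub features_overlap {
    my ($f1, $f2, %opt) = @_;

    # Default ("genomic") behaviour: features on different sequences
    # are never reported as overlapping.
    unless ($opt{ignore_ids}) {
        return 0 if $f1->{seq_id} ne $f2->{seq_id};
    }

    # Two closed intervals overlap when each one starts at or before
    # the point where the other one ends.
    return ($f1->{start} <= $f2->{end} && $f2->{start} <= $f1->{end}) ? 1 : 0;
}

my $gene_a = { seq_id => 'chr1', start => 100, end => 200 };
my $gene_b = { seq_id => 'chr2', start => 150, end => 250 };

# Same coordinates range, different sequences: no overlap by default.
print features_overlap($gene_a, $gene_b), "\n";                   # prints 0

# Coordinates only (e.g. a protein-space comparison): overlap reported.
print features_overlap($gene_a, $gene_b, ignore_ids => 1), "\n";  # prints 1
```

Naming the option after the non-default behaviour, as suggested in the thread, keeps the common genomic case free of extra arguments.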
chris From cjfields at illinois.edu Thu Aug 19 15:56:11 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Aug 2010 10:56:11 -0500 Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping In-Reply-To: References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu> <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se> Message-ID: <7CF700A0-C7A0-4BD2-9757-50B693B3B614@illinois.edu> Makes sense. chris On Aug 17, 2010, at 7:45 AM, Scott Cain wrote: > Hi Dave and Chris, > > It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison and if somebody is doing the protein space comparison and not getting the the expected results, they'll probably read the docs to find out why. > > Scott > > -- > Scott Cain, Ph. D. > scott at scottcain dot net > Ontario Institute for Cancer Research > http://gmod.org/ > 216 392 3087 > > Snet from my iPhone. > > On Aug 17, 2010, at 5:06 AM, Dave Messina wrote: > >>> Good point; it's probably the context the methods are used that matters. So, maybe just a document clarification? >> >> That's always good, but it really doesn't solve the issue you're describing. >> >> I mean, who would expect to get overlaps for features on different chromosomes? >> >> To me, that's a clear violation of reasonable user expectations. You shouldn't have to read the docs about something like that. >> >> So what's the solution for these duelling use cases? I haven't thought about it much, but a first approximation might be to add a -genomic boolean flag that, when true, would do the right thing and check the ID when doing overlaps or other positional comparisons. >> >> (Maybe -genomic is too obscure. Maybe it should be -same_id_for_overlaps or something like that.) >> >> And maybe having to know to set a flag is effectively the same thing as having to read the docs to understand SeqFeature's overlap behavior. 
>> >> What do the rest of you out there think? >> >> >> Dave >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Thu Aug 19 16:54:23 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 19 Aug 2010 18:54:23 +0200 Subject: [Bioperl-l] Bug? Features with similar ranges, different IDs are considered overlapping References: <83299B71-0F73-440D-A9C5-DC1DA2AFF605@davemessina.com> Message-ID: <1EFB951F-AEE1-4B2A-9E29-114E40B25D21@sbc.su.se> [Ccing list for real this time] On Aug 19, 2010, at 17:55, Chris Fields wrote: > Probably would just be -ignore_ids You're right, that's the way to go. > define this in the appropriate class, probably LocationI or Yep, that's cleaner. Thanks! Dave From cjfields1 at gmail.com Thu Aug 19 17:20:32 2010 From: cjfields1 at gmail.com (Christopher Fields) Date: Thu, 19 Aug 2010 12:20:32 -0500 Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl 5.12.1? In-Reply-To: References: <78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com> Message-ID: <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com> cc'ing list. Looks like the BioPerl PPM is possibly broken for perl 5.12. Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling... chris On Aug 19, 2010, at 11:29 AM, han sun wrote: > v5.10 works,thanks. > > 2010/8/19 Christopher Fields > Try using ActivePerl 5.10 instead of v5.12. It's very possible the PPM won't work for v5.12 yet. > > chris > > On Aug 19, 2010, at 9:25 AM, han sun wrote: > > > Hello everyone, > > > > I have used perl for several months,and I now want to feel the power of > > bioperl. > > But it seems that the installing is more difficult than I thought. 
> > > > I typed the commands. > > > > > > > > install-shell > > > > > > rep add bioperl http://bioperl.org/DIST > > > > > > rep add uwinnipeg > > http://cpan.uwinnipeg.ca/PPMPackages/12xx/ > > > > > > rep add trouchelle http://trouchelle.com/ppm12/ > > > > install BioPerl > > > > However,the installing failed, > > > > ppm install failed: > > Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core > > Can't find any package that provides PostScript::TextBlock for > > Bundle-BioPerl-Core > > Can't find any package that provides Ace:: for Bundle-BioPerl-Core > > Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core > > Can't find any package that provides Convert::Binary::C for > > Bundle-BioPerl-Core > > Can't find any package that provides XML::Twig for Bundle-BioPerl-Core > > Can't find any package that provides DB_File:: for Bundle-BioPerl-Core > > Can't find any package that provides IPC::Run for GraphViz > > Can't find any package that provides XML-XPathEngine for XML-DOM-XPath > > Can't find any package that provides List-MoreUtils for Moose > > Can't find any package that provides List-MoreUtils for Class-MOP > > > > > > then I tried > > > > install http://www.bribes.org/perl/ppm/GD.ppd > > > > and tried the installation again,but it still didn't help. 
> > > > * > > * > > * > > * > > * > > * > > > > > > *Do you konw what's wrong with the problem?* > > * > > * > > * > > * > > *Please help me,thanks very much.* > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rmb32 at cornell.edu Thu Aug 19 17:09:45 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 19 Aug 2010 10:09:45 -0700 Subject: [Bioperl-l] reminder: Aug 25 deadline for GMOD Hackathon application Message-ID: <4C6D6559.3080809@cornell.edu> Hi all, This is your one-week reminder: the deadline for open applications to the GMOD Evo hackathon is Wednesday, August 25th. Rob ======================================== We are seeking participants for the GMOD Tools for Evolutionary Biology Hackathon, held November 8-12, 2010 at the US National Evolutionary Synthesis Center (NESCent) in Durham, NC. This hackathon targets three critical gaps in the capabilities of the GMOD toolbox that currently limit its utility for evolutionary research: 1. Visualization of comparative genomics data 2. Visualization of phylogenetic data and trees 3. Support for population diversity and phenotype data If you are interested in these areas and have relevant expertise, you are strongly encouraged to apply. Relevant areas of expertise include more than just software development: if you are a GMOD power user, visualization guru, domain expert (comparative, phylogenetics, population, ...), or documentation wizard, then your skills are needed! How To Apply: Fill out the online application form at http://bit.ly/gmodevohack. Applications are due August 25. About GMOD: GMOD is an intercompatible suite of open-source software components for storing, managing, analyzing, and visualizing genome-scale data. 
GMOD includes many widely-used software components: GBrowse and JBrowse, both genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a generic and modular database schema; CMap, a comparative map viewer; as well as many other components including Apollo, MAKER, BioMart, InterMine, and Galaxy. We hope to extend the functionality of existing GMOD components, and integrate new components as well. About Hackathons: A hackathon is an intense event at which a group of programmers with different backgrounds and skills collaborate hands-on and face-to-face to develop working code that is of utility to the community as a whole. The mix of people will include domain experts and computer-savvy end-users. More details about the event, its motivation, organization, procedures, and attendees, as well as URLs to the hackathon and related websites are included below. Sincerely, The GMOD EvoHack Organizing Committee (and project affiliations as relevant): Nicole Washington, Chair (LBNL, modENCODE, Phenote) Robert Buels (SGN, Chado NatDiv) Scott Cain (OICR, GMOD) Dave Clements (NESCent, GMOD) Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv) Sheldon McKay (University of Arizona, iPlant, GBrowse_syn) ----------------------------- About the GMOD Evo Hackathon Overview We are organizing a hackathon to fill critical gaps in the capabilities of the Generic Model Organism Database (GMOD) toolbox that currently limit its utility for evolutionary research. Specifically, we will focus on tools for 1) viewing comparative genomics data; 2) visualizing phylogenomic data; and 3) supporting population diversity data and phenotype annotation. The event will be hosted at NESCent and bring together a group of about 20+ software developers, end-user representatives, and documentation experts who would otherwise not meet. 
The participants will include key developers of GMOD components that currently lack features critical for emerging evolutionary biology research, developers of informatics tools in evolutionary research that lack GMOD integration, and informatics-savvy biologists who can represent end-user requirements. The event will provide a unique opportunity to infuse the GMOD developer community with a heightened awareness of unmet needs in evolutionary biology that GMOD components have the potential to fill, and for tool developers in evolutionary biology to better understand how best to extend or integrate with already existing GMOD components. Before the Event Discussion of ideas and sometimes even design actually starts well before the hackathon, on mailing lists, wiki pages, and conference calls set up among accepted attendees. This advance work lays the foundation for participants to be productive from the very first day. This also means that participants should be willing to contribute some time in advance of the hackathon itself to participate in this preparatory discussion. During the Event Typically, hackathon participants use the morning of the first day of the event to organize themselves into working groups of between 3 and 6 people, each with a focused implementation objective. Ideas and objectives are discussed, and attendees coalesce around the projects in which they have the most experience or interest. Deliverables / Event Results The meeting's attendance, working groups, and outcomes will be fully logged and documented on the GMOD wiki (http://gmod.org). Each working group during the event will typically have its own wiki page, linked from the main EvoHack page, where it documents its minutes and design notes, and provides links to the code and documentation it produces. 
Also, since GMOD and NESCent are both committed to open source principles, all code and documentation produced by participants during the hackathon must be published under an OSI-approved open source license. As contributions to existing GMOD tools, all hackathon products will most likely satisfy this requirement automatically. NESCent This event is sponsored by the US National Evolutionary Synthesis Center (NESCent, http://www.nescent.org) through its Informatics Whitepapers program (http://www.nescent.org/informatics/whitepapers.php). NESCent promotes the synthesis of information, concepts and knowledge to address significant, emerging, or novel questions in evolutionary science and its applications. NESCent achieves this by supporting research and education across disciplinary, institutional, geographic, and demographic boundaries (see http://www.nescent.org/science/proposals.php). Links Main GMOD EvoHack page, and full proposal: http://gmod.org/wiki/GMOD_Evo_Hackathon NESCent: http://www.nescent.org/ GMOD: http://gmod.org Similar past NESCent events, see: http://hackathon.nescent.org/ GMOD hackathon application: http://bit.ly/gmodevohack -- http://gmod.org/wiki/GMOD_News http://gmod.org/wiki/GMOD_Europe_2010 http://gmod.org/wiki/Help_Desk_Feedback From David.Messina at sbc.su.se Thu Aug 19 18:55:50 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 19 Aug 2010 20:55:50 +0200 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq question In-Reply-To: <4C6D7123.9080908@bcm.tmc.edu> References: <4C6C3259.4060304@bcm.tmc.edu> <4C6D7123.9080908@bcm.tmc.edu> Message-ID: <4E977318-05AC-4D8E-9A39-8C07A2419198@sbc.su.se> Glad I could help, Caleb. Dave On Aug 19, 2010, at 20:00, Caleb Davis wrote: > Hi Dave, > > Thank you so much for your detailed response! Fixing the reward parameter replicated the online result for me. All of the other factors you brought up will help me track down any future problems. Thanks again. 
>
> --Caleb
>

From rmb32 at cornell.edu Thu Aug 19 22:19:11 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 19 Aug 2010 15:19:11 -0700
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
Message-ID: <4C6DADDF.1000103@cornell.edu>

Chris Fields wrote:
> I think it's worth exploring having a DBIx::Class-based middle-ware approach similar to what Rob Buels has done for Chado. That would be fairly easy to get started using DBIx::Class::Schema::Loader.
>
> After that it would require optimization and tweaking, which is potentially more complex than Rob's setup as Chado is very Pg-specific, but maybe Rob can elaborate...

Elaborating on how Bio::Chado::Schema is developed:

The vast majority of the code and POD in BCS is autogenerated by DBIx::Class::Schema::Loader. DBICSL gives you a baseline set of DBIx::Class classes that covers all the tables, views, columns, unique constraints, and foreign key relationships. Beyond that, you have to add on yourself. In BCS, we have mostly done things like:

* make better-named aliases for some of the autogenerated relationships (though DBICSL does a surprisingly good job of naming relationships automatically most of the time)
* add a tiny bit of bioperl compatibility (this needs a lot more work by somebody, volunteers needed!)
* add convenience methods for using some of the Chado property tables
* use DBIx::Class::Tree::NestedSet to add some powerful ways of traversing phylogenetic tree relationships

Regarding DB backend specificity, BCS isn't Pg-specific at all, because DBIx::Class itself goes to great lengths to be compatible (and performant!) with just about every relational database out there.
In fact, the BCS test suite deploys a Chado schema into a temporary SQLite database using DBIC::Schema's deploy() method, and runs all of its tests on that. Very handy. Chado's Pg-specific server-side functions can of course be called through BCS if they are present, but it's perfectly possible to use Chado without any of the server-side functions, and mostly the way I use it. Rob From David.Messina at sbc.su.se Fri Aug 20 09:19:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 20 Aug 2010 11:19:14 +0200 Subject: [Bioperl-l] Git for the lazy Message-ID: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se> Hi everyone, If you're like me and still getting up to speed with Git, you might find this helpful: http://www.spheredev.org/wiki/Git_for_the_lazy Dave From bgs500 at york.ac.uk Fri Aug 20 13:07:50 2010 From: bgs500 at york.ac.uk (Ben Saville) Date: Fri, 20 Aug 2010 14:07:50 +0100 Subject: [Bioperl-l] Problem Parsing BLAST output Message-ID: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk> Hi Everyone, I'm very much new to the world of sequence data analysis (and this mailing list!), and have reached a roadblock. I have BLASTed some contigs against a series of databases that I created. From this I would like to parse through the data and separate it before extracting the information of interest at a later point. I would like to separate the data by query ID. 
I found the following Bioperl script;

  #!/usr/bin/perl
  use Bio::Search::Result::BlastResult;
  use Bio::SearchIO;

  my $report = Bio::SearchIO->new( -file=>'All_BCM_results.bls', -format => blast);
  my $result = $report->next_result;
  my %hits_by_query;
  while (my $hit = $result->next_hit) {
    push @{$hits_by_query{$hit->name}}, $hit;
  }

  foreach my $qid ( keys %hits_by_query ) {
    my $result = Bio::Search::Result::BlastResult->new();
    $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
    my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' );
    $blio->write_result($result);
  }

running this script resulted in the following error;

  BlastResult::new(): Not adding iterations.

  ------------- EXCEPTION: Bio::Root::NoSuchThing -------------
  MSG: No such iteration number: 0. Valid range=1-0
  VALUE: The number zero (0)
  STACK: Error::throw
  STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.8/Bio/Root/Root.pm:368
  STACK: Bio::Search::Result::BlastResult::iteration /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:328
  STACK: Bio::Search::Result::BlastResult::add_hit /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:258
  STACK: /Users/bsaville/Desktop/Parsing_BLAST_by_query.pl:15
  -------------------------------------------------------------

So I added

  my $result = Bio::Search::Result::BlastResult->new(1);

The 1 to the line shown above, as it told me this was within the valid range. This produced the following error;

  ------------- EXCEPTION: Bio::Root::Exception -------------
  MSG: Must define arrayref of Iterations when initializing a Bio::Search::Result::BlastResult
  STACK: Error::throw
  STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.8/Bio/Root/Root.pm:368
  STACK: Bio::Search::Result::BlastResult::new /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:128
  STACK: /Users/bsaville/Desktop/Parsing_BLAST_by_query.pl:14
  -----------------------------------------------------------

I know that it is my inexperience that is causing this problem, but I really can't figure this out.


Regards

Ben Saville

From David.Messina at sbc.su.se Fri Aug 20 13:48:28 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 20 Aug 2010 15:48:28 +0200
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
Message-ID: <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>

Hi Ben,

I would not use the script you posted; I don't think it does what you want.

If you haven't already, you should take a look at

the beginners' HOWTO
http://www.bioperl.org/wiki/HOWTO:Beginners

the SearchIO HOWTO
http://www.bioperl.org/wiki/HOWTO:SearchIO

and the example scripts included with BioPerl:
http://www.bioperl.org/wiki/Scripts

Incidentally, it's a lot of fiddly data processing to parse blast reports for many contigs against multiple databases and then go back and collate the results by query.

I'm not sure exactly what you want to do once you've separated by query. If you provide some more information, we could suggest ways to best get you where you want to go.

I will mention, though, that BLAST has the ability to search multiple separate databases in one go and collate the results for you. So that's something to consider.

Dave

From bernd.web at gmail.com Fri Aug 20 15:17:05 2010
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 20 Aug 2010 17:17:05 +0200
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
In-Reply-To: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
References: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
Message-ID:

Hi Yin,

I am not quite sure if the following is also related to your gapped length issue but I found I had to adapt the calculation of ungapped_len in Bio::LocatableSeq. If my slices did not contain any letters or a new gap char I used, SimpleAlign could not find the sequences when outputting the alignment.
This was due to a difference in length calculation:
SimpleAlign uses \W: $slice_seq =~ s/\W//g;
Bio::LocatableSeq::ungapped_len uses "$string =~ s/[\.\-]+//g;"

I had to include '~' (for my local sequences) in the ungapped_len; otherwise I would run into the end issues with SimpleAlign.

Kind regards,
Bernd

On Fri, Aug 13, 2010 at 3:36 PM, Jun Yin wrote:
> Hi, all,
>
> I am the google summer of code student working on Bio::Align subsystem refactoring. The code (Bio::SimpleAlign) I re-implemented now has passed nearly all the test, except a few tests on seq/start-end testing. But here comes a problem. This may be an old issue, that the Bio::LocatableSeq end assignment and checking are inconsistent.
>
> The current end checking method is based on:
>
> $end=$seq->_ungapped_len+$seq->start-1
>
> However, this checking may not fit the real world case.
>
> The inconsistency usually happens when a few columns of the sequence are removed.
>
> For example:
>
> my $a = Bio::LocatableSeq->new(
>     -id     => 'a',
>     -strand => 1,
>     -seq    => '-tcgatc-atcgatcg',
>     -start  => 30,
>     -end    => 43
> );
>
> If we remove the 1st, 8th and the last columns
>
> $a->seq() will be 'tcgatcatcgatc'
> $a->_ungapped_len==12
>
> Actually, in the real world, the first residue will still be 30 (the old $seq->start), and the last residue is the residue before the 43 (the old $seq->end), thus 42.
>
> But if you call a validation, the calculation is $a->_ungapped_len+$a->start-1=12+30-1=41
>
> So the reassignment of the $seq->end will not pass the validation.
>
> So unless you save the information to a new sequence object, the original position information will be lost anyway. But in some cases, we have to change the sequence in its original sequence object ..
>
> What is your suggestion on this issue?
>
> A. pass the test and lose the information     #convenient in coding but the start-end annotation is not right any more
>
> B. keep the information and forget the test   #the object will still remember where the last residue was in the original sequence. But is it really meaningful at all? Because all the other residues may come from nowhere
>
> C. Neither of above                           #any other suggestions?
>
> Cheers,
>
> Jun Yin
> Ph.D. student in U.C.D.
>
> Bioinformatics Laboratory
> Conway Institute
> University College Dublin
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From sidd.basu at gmail.com Fri Aug 20 15:59:59 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Fri, 20 Aug 2010 10:59:59 -0500
Subject: [Bioperl-l] Re: bioperl-db and postgres8.3 - status query
In-Reply-To: <4C6DADDF.1000103@cornell.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> <4C6DADDF.1000103@cornell.edu>
Message-ID: <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>

Hi,

On Thu, 19 Aug 2010, Robert Buels wrote:
> Chris Fields wrote:
> > I think it's worth exploring having a DBIx::Class-based middle-ware approach similar to what Rob Buels has done for Chado. That would be fairly easy to get started using DBIx::Class::Schema::Loader.
> > After that it would require optimization and tweaking, which is potentially more complex than Rob's setup as Chado is very Pg-specific, but maybe Rob can elaborate...
>
> Elaborating on how Bio::Chado::Schema is developed:
>
> The vast majority of the code and POD in BCS is autogenerated by DBIx::Class::Schema::Loader.
> DBICSL gives you a baseline set of DBIx::Class classes that covers all the tables, views, columns, unique constraints, and foreign key relationships.
>
> Beyond that, you have to add on yourself. In BCS, we have mostly done things like:
>
> * make better-named aliases for some of the autogenerated relationships (though DBICSL does a surprisingly good job of naming relationships automatically most of the time)
> * add a tiny bit of bioperl compatibility (this needs a lot more work by somebody, volunteers needed!)
> * add convenience methods for using some of the Chado property tables
> * use DBIx::Class::Tree::NestedSet to add some powerful ways of traversing phylogenetic tree relationships
>
> Regarding DB backend specificity, BCS isn't Pg-specific at all, because DBIx::Class itself goes to great lengths to be compatible (and performant!) with just about every relational database out there.

I would vouch for that, at least as far as Chado in Oracle is concerned. So far, BCS works flawlessly with our Oracle Chado instance at dictyBase. Quite a chunk of BCS-based code is also active in a couple of our Mojo-based webapps. The part which I still couldn't use directly is the 'synonym' table, as it clashes with Oracle-specific reserved keywords. However, overall it seems to be quite cross-RDBMS compatible and is highly recommended.

-siddhartha

> In fact, the BCS test suite deploys a Chado schema into a temporary SQLite database using DBIC::Schema's deploy() method, and runs all of its tests on that. Very handy.
>
> Chado's Pg-specific server-side functions can of course be called through BCS if they are present, but it's perfectly possible to use Chado without any of the server-side functions, and mostly the way I use it.
>
> Rob

From jun.yin at ucd.ie Fri Aug 20 16:17:33 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 20 Aug 2010 17:17:33 +0100
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
In-Reply-To:
References: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
Message-ID: <000b01cb4083$31f98280$95ec8780$%yin@ucd.ie>

Hi, Bernd,

Thanks for your input. Yes, this is one of the old bugs in Bio::SimpleAlign.

$aln->slice simply uses

$slice_seq =~ s/\W//g

to calculate the ungapped length. But $seq->_ungapped_len uses

$string =~ s{[$GAP_SYMBOLS$FRAMESHIFT_SYMBOLS]+}{}g;

which is '\-\.=~\\\/ ', to calculate the ungapped length.

To solve this problem, first, I now use

$nonres = join("",$self->gap_char, $self->match_char,$self->missing_char);

which is '-\.&', to remove the non-residue chars in the alignment sequence (though using '=', '~', '\' or '/' will still cause problems).

Secondly, I have merged slice, remove_columns and remove_gaps so that they use the same internal function. Thus it is easier to debug. These changes will be merged into the main BioPerl branch after the next version.

But anyway, the conflict is still there, because the non-residue chars are defined as:

In Bio::SimpleAlign:
$aln->gap_char, $aln->missing_char, $aln->match_char

In Bio::LocatableSeq:
$GAP_SYMBOLS = '\-\.=~';
$FRAMESHIFT_SYMBOLS = '\\\/';

So try to use '-' or '.' for your gap char at the moment, otherwise you may encounter end warnings in the calculation.

And, if you want to keep gap-only sequences, you can call the method as:

$aln2 = $aln->slice(20,30,1)

The last parameter is to keep gap-only sequences.

Cheers,

Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

From cjfields at illinois.edu Fri Aug 20 16:23:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 20 Aug 2010 11:23:07 -0500
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> <4C6DADDF.1000103@cornell.edu> <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
Message-ID: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>

On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote:
> On Thu, 19 Aug 2010, Robert Buels wrote:
> > Regarding DB backend specificity, BCS isn't Pg-specific at all, because DBIx::Class itself goes to great lengths to be compatible (and performant!) with just about every relational database out there.
>
> I would vouch for that at least as far as chado in oracle is concerned. So, far BCS works out flawlessly with our oracle chado instance at dictybase. Quite a chunk of BCS based code is also active in couple of our Mojo based webapps. The part which i still couldn't use directly is the 'synonym' table as it clashes with oracle specific reserved keywords. However, overall it seems to quite cross-RDMS compatible and highly recommended.
>
> -siddhartha

Just to point out, I didn't say BCS is Pg-specific, but that Chado is (that was the DBMS it was designed for). Maybe that should be amended to 'was' now :)

I recall seeing a page on this somewhere on the GMOD website along the lines of "MySQL has problems so we chose Pg", and that Chado support would focus on Pg. I'm guessing that's no longer the case? Or is only the server-side stuff Pg-specific.

> > In fact, the BCS test suite deploys a Chado schema into a temporary SQLite database using DBIC::Schema's deploy() method, and runs all of its tests on that. Very handy.
> >
> > Chado's Pg-specific server-side functions can of course be called through BCS if they are present, but it's perfectly possible to use Chado without any of the server-side functions, and mostly the way I use it.
> >
> > Rob

I think this opens up the possibility of starting a DBIx::Class-based middleware solution. Hilmar, did you want to take that on?

chris

From sidd.basu at gmail.com Fri Aug 20 17:39:44 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Fri, 20 Aug 2010 12:39:44 -0500
Subject: [Bioperl-l] Re: bioperl-db and postgres8.3 - status query
In-Reply-To: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au> <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net> <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net> <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu> <4C6DADDF.1000103@cornell.edu> <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu> <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
Message-ID: <20100820173942.GC400@vpn-165-124-164-118.vpn.northwestern.edu>

On Fri, 20 Aug 2010, Chris Fields wrote:
> Just to point out, I didn't say BCS is Pg-specific, but that Chado is (that was the DBMS it was designed for). Maybe that should be amended to 'was' now :)
>
> I recall seeing a page on this somewhere on the GMOD website along the lines of "MySQL has problems so we chose Pg", and that Chado support would focus on Pg.

As far as I understand, GMOD strongly recommends Pg, and it is the popular backend for Chado. However, my point was that if anybody wants to use or try out the Chado schema on a different backend, or has an existing setup, tools like DBIx::Class, and BCS in particular, make it much easier to do so. The code developed on top also becomes quite robust and portable.

-siddhartha

> I'm guessing that's no longer the case? Or is only the server-side stuff Pg-specific.
>
> I think this opens up the possibility of starting a DBIx::Class-based middleware solution. Hilmar, did you want to take that on?
>
> chris

From buiduyminh at gmail.com Fri Aug 20 21:29:00 2010
From: buiduyminh at gmail.com (Minh Bui)
Date: Fri, 20 Aug 2010 17:29:00 -0400
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
Message-ID:

Hi,

I am trying to load my GFF file into a MySQL database, but I got this error when I ran bp_seqfeature_load.pl (BioPerl 1.6.1 on a Mac):

[BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC contains:
/sw/lib/perl5
/sw/lib/perl5/darwin
/System/Library/Perl/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/5.8.6
/Library/Perl/5.8.6/darwin-thread-multi-2level
/Library/Perl/5.8.6
/Library/Perl
/Network/Library/Perl/5.8.6/darwin-thread-multi-2level
/Network/Library/Perl/5.8.6
/Network/Library/Perl
/System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.6
/Library/Perl/5.8.1
.) at (eval 44) line 3.
Perhaps the DBD::mysql perl module hasn't been fully installed,
or perhaps the capitalisation of 'mysql' isn't right.
Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212

I am using Mac OS X version 10.4.10 and MAMP. Isn't "/Library/Perl/5.8.6" already in @INC? What am I missing?

I have been googling this error for a few hours.
I also installed BioPerl and reinstalled DBD::mysql using CPAN. It still doesn't work.

Here is my $PERL5LIB: /sw/lib/perl5:/sw/lib/perl5/darwin/

I really need help on this.

Thank you,

From awitney at sgul.ac.uk Sat Aug 21 10:39:10 2010
From: awitney at sgul.ac.uk (Adam Witney)
Date: Sat, 21 Aug 2010 11:39:10 +0100
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
In-Reply-To:
References:
Message-ID: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>

On 20 Aug 2010, at 22:29, Minh Bui wrote:

> Hi,,
> I am trying to load my GFF file to mysql database but I got this error
> when I ran the bp_seqfeature_load.pl ( bioperl 1.6.1 on MAC)
>
> [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
> install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
> contains: /sw/lib/perl5 /sw/lib/perl5/darwin
> /System/Library/Perl/5.8.6/darwin-thread-multi-2level
> /System/Library/Perl/5.8.6
> /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
> /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
> /Network/Library/Perl/5.8.6 /Network/Library/Perl
> /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
> /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
> line 3.
> Perhaps the DBD::mysql perl module hasn't been fully installed,
> or perhaps the capitalisation of 'mysql' isn't right.
> Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
> at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212
>
> I am using MAC OSX version 10.4.10 and MAMP? Isnt it the
> "/Library/Perl/5.8.6" already in @INC? What am I missing?
> I have been googling this error for a few hours. I also install
> Bioperl and reinstall DBD::mysql using CPAN. It still doesnt work..
>
> Here is my $PERL5LIB: /sw/lib/perl5:/sw/lib/perl5/darwin/

Where did DBD::mysql get installed? Can you verify that DBD/mysql.pm is actually in one of those directories listed above?
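
One way to check this, sketched in plain Perl with no BioPerl or DBI dependency: walk a list of library directories and report which of them contain the module's file. The helper name find_module_dirs is made up for this illustration, and the default module name is simply the one from this thread; treat it as a rough diagnostic, not as part of any BioPerl tool.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of BioPerl or DBI): return every directory
# in @search_path that contains the file for $module,
# e.g. 'DBD::mysql' is looked up as <dir>/DBD/mysql.pm.
sub find_module_dirs {
    my ($module, @search_path) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    my @hits;
    for my $dir (@search_path) {
        next if ref $dir;    # @INC may also hold code refs; skip those
        my $path = File::Spec->catfile($dir, @parts);
        push @hits, $dir if -e $path;
    }
    return @hits;
}

# Module to look for: first command-line argument, else DBD::mysql.
my $module = @ARGV ? $ARGV[0] : 'DBD::mysql';
my @found  = find_module_dirs($module, @INC);

if (@found) {
    print "$module found in $_\n" for @found;
}
else {
    print "$module not found in any \@INC directory\n";
}
```

It is worth running this with the same perl binary that runs bp_seqfeature_load.pl. If the CPAN install was done under a different perl than the one executing the script, the module may have landed in a directory that the other perl never searches.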

From i.hatethispart at ymail.com Sat Aug 21 14:07:28 2010
From: i.hatethispart at ymail.com (keiko)
Date: Sat, 21 Aug 2010 07:07:28 -0700 (PDT)
Subject: [Bioperl-l] clustalw.exe
In-Reply-To: <3612399.post@talk.nabble.com>
References: <3612399.post@talk.nabble.com>
Message-ID: <29499435.post@talk.nabble.com>

Katrin wrote:
>
> hello, I am a new Perl/Bioperl user, and first I must apologise for my really bad English, but I hope everybody will understand me. I have the following problem: in my Perl script there is the following system call:
>
> $y=exec("C:\\Programme\\xampp-win32-1.5.1\\xampp\\perl\\clustalw.exe C:\\Programme\\xampp-win32-1.5.1\\xampp\\htdocs\\gene\\clustal.fasta");
>
> If I call this script from the shell (cmd.exe) everything works correctly. But if I call this script from PHP I get the following error message:
>
> Error: unknown option /C:\Programme\xampp-win32-1.5.1\xampp\htdocs\gene\clustal.fasta.
>
> I also tried system and qx. And I tested the environment variables: I wrote a bat-file with the definition of all environment variables and the system call, but this did not work either. The same problem occurs in PHP. The PHP script is called from HTML and I work under Windows XP with xampp. I hope somebody can help me. greetings Katrin
>

Hi. I also have a problem with this one. I want to call clustalw using PHP. Can I ask what you included in your bat-file and where you downloaded your clustal from? Thanks a lot!

--
View this message in context: http://old.nabble.com/clustalw.exe-tp3612399p29499435.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From jason at bioperl.org Sun Aug 22 18:29:30 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 22 Aug 2010 11:29:30 -0700
Subject: [Bioperl-l] Enquiry on Bio::DB::Taxonomy
In-Reply-To:
References:
Message-ID: <4C716C8A.3010000@bioperl.org>

Hi Amali -

This is how I'd print out the full classification by using the Tree methods (with probably a different way of initializing the $db object to your flatfiles location).

#!/usr/bin/perl -w
use strict;
use Bio::DB::Taxonomy;
use Bio::Tree::Tree;

# open the NCBI taxonomy dump files
my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                -nodesfile => 'taxonomy/nodes.dmp',
                                -namesfile => 'taxonomy/names.dmp');
my $taxonid = $db->get_taxonid('Homo sapiens');
my $taxon   = $db->get_taxon(-taxonid => $taxonid);
# build a tree holding the lineage of this taxon and print every node in it
my $tree = Bio::Tree::Tree->new(-node => $taxon);
my @taxa = $tree->get_nodes;
print join(",", map { $_->scientific_name } @taxa), "\n";

-jason

Amali Thrimawithana wrote, On 8/18/10 3:56 PM:
> Dear Dr Stajich,
>
> I am a Masters student at Auckland university and my research is on identifying yeast species present in wine by the use of 454 sequencing. In order to carry out this research, a pipeline is being built in which at the final step each representative OTU need to be classified at different taxonomic levels (ie: at Phylum, family, class, genus and species) by using the results from BLAST. To identify the sequences at each taxonomic level, I have been trying out the Bio::DB::Taxonomy module in bioperl. Using this module, I am able to get the genus and species level by splitting the scientific name returned by the Bio::Taxon object. But unfortunately I am uncertain on how to get the information for the other levels of the rank. I have tried several commands including "my @class = $node->classification;", but it does not work. Hence, could you please let me know how I might be able to get the higher levels of taxonomy such as class and phylum using bioperl?
>
> Look forward to hearing from you soon
>
> Thanking You
>
> Amali
>

From cjfields at illinois.edu Sun Aug 22 19:56:58 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 22 Aug 2010 14:56:58 -0500
Subject: [Bioperl-l] clustalw.exe
In-Reply-To: <29499435.post@talk.nabble.com>
References: <3612399.post@talk.nabble.com> <29499435.post@talk.nabble.com>
Message-ID:

On Aug 21, 2010, at 9:07 AM, keiko wrote:

> Hi. I also have a problem with this one. I want to call clustalw using php.
> Can I ask what you included in your bat-file and where did you download your
> clustal? thanks a lot!

Not sure, but what does this have to do with BioPerl?

chris

From jason at bioperl.org Mon Aug 23 15:56:47 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 23 Aug 2010 08:56:47 -0700
Subject: [Bioperl-l] a problem when using the Bioperl modules
In-Reply-To:
References:
Message-ID: <4C729A3F.7080304@bioperl.org>

Wei -

Please ask your questions on the bioperl mailing list; I cannot answer questions directly for all requests. Your problem has been answered by me on the list before, so I urge you to use the list archives as a starting point.

The lines of the sequence in the fasta file aren't all the same length. You need to run this:

bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW
mv NEW ORIGINAL

or with sreformat:

sreformat fasta ORIGINAL > NEW
mv NEW ORIGINAL

Guifeng Wei wrote, On 8/23/10 4:57 AM:
> Dear professor Stajich,
> So sorry to interrupt you. i came across a problem when i use the Bio::DB::Fasta modules of BioPerl. The aim i want to arrive at is to extract the subsequences accoording to the *.bed files which are the C.elegans genomic sequnece annotation. The code i programed is in the attached file.
> The genomic sequences file contains sequences from 6 chromosomes of C.elegans.
> when i run this program in the command line, the following error warnings was coming.
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
> Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::Fasta::calculate_offsets /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> STACK: Bio::DB::Fasta::index_file /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:680
> STACK: Bio::DB::Fasta::new /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:491
> STACK: bed_to_fasta.pl:14
> -----------------------------------------------------------
> indexing was interrupted, so unlinking /home/wgf/WORM_DATA/elegans.WS190.dna.fa.index at /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053.
>
> and therefore i write to you in hope that you can help me solve this problem,as well as, give me some suggestion about how to learn Bioperl well.
> thank you very very much.
> yours sincerely
> Wei Guifeng

From jason.stajich at ucr.edu Mon Aug 23 15:58:07 2010
From: jason.stajich at ucr.edu (Jason Stajich)
Date: Mon, 23 Aug 2010 08:58:07 -0700
Subject: [Bioperl-l] a problem when using the Bioperl modules
In-Reply-To:
References:
Message-ID: <4C729A8F.1070506@ucr.edu>

You haven't defined this variable $db - you need to not skip the part that initializes the Bio::DB::Fasta object that you had previously asked about.

Please send all your future queries to the mailing list.

Guifeng Wei wrote, On 8/23/10 8:14 AM:
> Dear professor,
> after that, i revised my scripts, which is that i divide the genomic sequences into 7 single file, every file contains the sequence from a chromosome.
> however, when i try to run the scripts, the following error was coming.
> Can't call method "seq" on an undefined value at bed_to_fasta.pl line 29, <IN> line 1.
> while(<IN>){
>     chomp $_;
>     my @bed=split(/\s+/, $_ );
>     #print length($db->seq('chrI'));
>     my $chr_id=$bed[0];
>     my $start=$bed[1];
>     my $end=$bed[2];
>     my $seq_name=$bed[3];
>     my $strand=$bed[5];
>     my $segment = $db ->seq($chr_id,$start=>$end);
>     print ">",$seq_name,"_",$chr_id,":",$start=>$end;
>     print "$segment\n";
> }
> the blue line is .
> why?

--
Jason E. Stajich, PhD
Assistant Professor
Department of Plant Pathology & Microbiology
University of California
Riverside, CA 92521

jason.stajich at ucr.edu
office: 951.827.2363
http://lab.stajich.org/
http://twitter.com/stajichlab
http://fungalgenomes.org/blog/
http://plantpathology.ucr.edu/
http://genomics.ucr.edu/
http://cepceb.ucr.edu/

From guifengwei at gmail.com Tue Aug 24 02:44:57 2010
From: guifengwei at gmail.com (Guifeng Wei)
Date: Tue, 24 Aug 2010 10:44:57 +0800
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
Message-ID:

Hi,

i came across a problem when i use the Bio::DB::Fasta modules of
BioPerl. The aim i want to arrive at is to extract the subsequences
according to the *.bed files which are the C.elegans genomic sequence
annotation.

when i tried to run the scripts i wrote, the error message was coming, as
follows:

Can't call method "seq" on an undefined value at bed_to_fasta.pl line 28,
<IN> line 1.

so, ask for favor to solve this problem.
Here is my perl scripts.

#!/usr/bin/perl -w
# Purpose: extract sequences from genomic sequences
use strict;
use Bio::DB::Fasta;
open(IN,$ARGV[0]) || die "sorry, the program cannot open the .bed file, please check it. \n";
my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' );
# The dir ...../elegans190.dna/ includes 6 files:chrI,chrII,chrIII,chrIV,chrV,chrX,
#each stands for the sequences from the corresponding chromosome.

while(<IN>){
    chomp $_;
    my @bed=split(/\s+/, $_ );

    my $chr_id=$bed[0];
    my $start=$bed[1];
    my $end=$bed[2];
    my $seq_name=$bed[3];
    my $strand=$bed[5];

    my $segment = $db->seq( $chr_id, $start=>$end );

    print ">",$seq_name,"_",$chr_id,":",$start=>$end;
    print "$segment\n";

}

close(IN);

From florent.angly at gmail.com Tue Aug 24 05:06:21 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 24 Aug 2010 15:06:21 +1000
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To:
References:
Message-ID: <4C73534D.6080607@gmail.com>

Hi Guifeng,

From the Bio::DB::Fasta documentation:
>     $db = Bio::DB::Fasta->new($fasta_path [,%options])
>         Create a new Bio::DB::Fasta object from the Fasta file or files
>         indicated by $fasta_path. Indexing will be performed automatically
>         if needed. If successful, new() will return the database accessor
>         object. Otherwise it will return undef.

Hence, after you create the database object $db, you should check that
it was successful, e.g.:
> my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' );
> if (not defined $db) {
>     die "There was a problem creating the database\n";
> }

A problem creating the database would explain the message you get.

If the extension of the FASTA files in the directory path that you gave
as input is not fa, fasta, fast, FA, FASTA, FAST or dna, then you should
use the -glob option when constructing your database object. From the
documentation:
>     -glob         Glob expression to use     *.{fa,fasta,fast,FA,FASTA,FAST,dna}
>                   for searching for Fasta
>                   files in directories.

Florent

On 24/08/10 12:44, Guifeng Wei wrote:
> Hi,
>
> i came across a problem when i use the Bio::DB::Fasta modules of
> BioPerl. The aim i want to arrive at is to extract the subsequences
> according to the *.bed files which are the C.elegans genomic sequence
> annotation.
>
> when i tried to run the scripts i wrote, the error message was coming, as
> follows:
>
> Can't call method "seq" on an undefined value at bed_to_fasta.pl line 28,
> <IN> line 1.
>
> so, ask for favor to solve this problem.
> Here is my perl scripts.
>
> #!/usr/bin/perl -w
> # Purpose: extract sequences from genomic sequences
> use strict;
> use Bio::DB::Fasta;
> open(IN,$ARGV[0]) || die "sorry, the program cannot open the .bed file, please
> check it. \n";
> my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' );
> # The dir ...../elegans190.dna/ includes 6
> files:chrI,chrII,chrIII,chrIV,chrV,chrX,
> #each stands for the sequences from the corresponding chromosome.
>
> while(<IN>){
>     chomp $_;
>     my @bed=split(/\s+/, $_ );
>
>     my $chr_id=$bed[0];
>     my $start=$bed[1];
>     my $end=$bed[2];
>     my $seq_name=$bed[3];
>     my $strand=$bed[5];
>
>     my $segment = $db->seq( $chr_id, $start=>$end );
>
>     print ">",$seq_name,"_",$chr_id,":",$start=>$end;
>     print "$segment\n";
>
> }
>
> close(IN);
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From guifengwei at gmail.com Tue Aug 24 11:28:16 2010
From: guifengwei at gmail.com (Guifeng Wei)
Date: Tue, 24 Aug 2010 19:28:16 +0800
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To:
References:
Message-ID:

Hi,

i have revised my scripts according to the previous email from Florent.
However, there were still some errors which frustrated me so much.

The errors are as follows:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Each line of the fasta entry must be the same length except the last.
    Line above #301451 '
..' is 22 != 51 chars.
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::DB::Fasta::calculate_offsets /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
STACK: Bio::DB::Fasta::index_dir /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
STACK: Bio::DB::Fasta::new /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
STACK: bed2fasta.pl:13
-----------------------------------------------------------
indexing was interrupted, so unlinking
/home/wgf/elegans190.dna//directory.index at
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053

But in the directory /home/wgf/elegans190.dna/ , it includes 6 files,
each contains the complete sequences from one single chromosome, the format
is fasta. The extension of the FASTA files is .fa. Every single file is
started as ">chromosomeXXX" followed by the thousands of sequences.

and therefore, it warns me that "Each line of the fasta entry must be the
same length except the last". and "indexing was interrupted, so unlinking
/home/wgf/elegans190.dna//directory".

i was much confused about this. so for help.

Wei Guifeng

From biopython at maubp.freeserve.co.uk Tue Aug 24 13:28:33 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 24 Aug 2010 14:28:33 +0100
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To:
References:
Message-ID:

On Tue, Aug 24, 2010 at 12:28 PM, Guifeng Wei wrote:
> Hi,
>
> i have revised my scripts according to the previous email from Florent.
> However, there were still some errors which frustrated me so much.
>
> The errors are as follows:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
>     Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::Fasta::calculate_offsets
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> STACK: Bio::DB::Fasta::index_dir
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
> STACK: Bio::DB::Fasta::new
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
> STACK: bed2fasta.pl:13
> -----------------------------------------------------------
> indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory.index at
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053
> But in the directory /home/wgf/elegans190.dna/ , it includes 6 files,
> each contains the complete sequences from one single chromosome, the format
> is fasta. The extension of the FASTA files is .fa. Every single file is
> started as ">chromosomeXXX" followed by the thousands of sequences.
>
> and therefore, it warns me that "Each line of the fasta entry must be the
> same length except the last". and "indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory".
>
> i was much confused about this. so for help.
>
> Wei Guifeng

Hi Wei,

It sounds like there is inconsistent line wrapping in your FASTA file.
This is often not a problem at all, but the DB indexing system (and indeed
other indexing tools like the samtools fasta index) requires all the
entries have the same wrapping.

e.g. This is a valid FASTA file but would not be suitable for indexing:

>Test
ACGTACGT
ACGTACGT
ACGTACGT
ACGT
ACGT
T

Ignoring the final line (special case - here length one) that uses a
mixture of line lengths, 8 and 4. If you had used this it should be fine:

>Test
ACGTACGT
ACGTACGT
ACGTACGT
ACGTACGT
T

All the lines are now wrapped at length 8 (and the final line is less
than or equal to length 8).
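
The re-wrapping described above can be sketched in plain Perl. This is only a minimal illustration, assuming the whole file fits in memory; the subroutine name rewrap_fasta and the default width of 60 are hypothetical choices, not part of BioPerl or of the tools mentioned in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Re-wrap every sequence in a FASTA string to a fixed line width.
# rewrap_fasta() is a hypothetical helper, not a BioPerl function.
sub rewrap_fasta {
    my ($fasta, $width) = @_;
    $width ||= 60;
    my ($out, $seq) = ('', '');

    # Emit the buffered sequence in chunks of $width characters.
    my $flush = sub {
        while (length $seq) {
            $out .= substr($seq, 0, $width, '') . "\n";
        }
    };

    for my $line (split /\r?\n/, $fasta) {
        if ($line =~ /^>/) {       # header: flush the previous record first
            $flush->();
            $out .= "$line\n";
        } else {
            $line =~ s/\s+//g;     # drop stray whitespace; blank lines vanish
            $seq .= $line;
        }
    }
    $flush->();                    # flush the last record
    return $out;
}

# The unevenly wrapped example from above, re-wrapped at 8 columns.
my $uneven = ">Test\nACGTACGT\nACGTACGT\nACGTACGT\nACGT\nACGT\nT\n";
print rewrap_fasta($uneven, 8);
```

For chromosome-sized files it is probably better to stream the input line by line, or simply to use bp_sreformat or sreformat as suggested earlier in the thread.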
Of course, in a real file wrapping a 60 or 80 characters is more common ;) Peter From cjfields at illinois.edu Tue Aug 24 13:38:45 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 24 Aug 2010 08:38:45 -0500 Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta In-Reply-To: References: Message-ID: <995BCF30-99B2-46C2-A4E8-681F9E2A0BB5@illinois.edu> Guifeng, Did you follow Jason's advice yesterday about converting the FASTA over to a more consistent length? Or checking the database itself? These are both things reiterated by Florent and Peter. >From Jason's last response: ------------------------- Wei - Please ask your questions on the bioperl mailing list, I cannot answer questions directly for all requests. Your problem has been answered by me on the list before so I urge you to use the list archives as a starting point. The line lengths of the fasta file sequence aren't the same length. you need to run this bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW mv NEW ORIGINAL or with sreformat sreformat fasta ORIGINAL > NEW mv NEW ORIGINAL ------------------------- chris On Aug 24, 2010, at 6:28 AM, Guifeng Wei wrote: > Hi, > > i have revised my scripts according to the previous email from Florent. > However, there were still some errors which frustrated me so much. > > The errors are as follows: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Each line of the fasta entry must be the same length except the last. > Line above #301451 ' > ..' is 22 != 51 chars. 
> STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::DB::Fasta::calculate_offsets > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770 > STACK: Bio::DB::Fasta::index_dir > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593 > STACK: Bio::DB::Fasta::new > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488 > STACK: bed2fasta.pl:13 > ----------------------------------------------------------- > indexing was interrupted, so unlinking > /home/wgf/elegans190.dna//directory.index at > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053 > But in the directory /home/wgf/elegans190.dna/ , it concludes 6 files, > each contains the complete sequences from one single chromosome, the format > is fasta. The extension of the FASTA files is .fa. Every single file is > started as ">chromosoemeXXX" followed by the thousands of sequences. > > and therefore, it warn me that "Each line of the fasta entry must be the > same length except the last". and "indexing was interrupted, so unlinking > /home/wgf/elegans190.dna//directory". > > i was much confused about this. so for help. 
>
> Wei Guifeng
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From scott at scottcain.net Tue Aug 24 15:01:47 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 24 Aug 2010 11:01:47 -0400
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
 <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
 <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
 <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
 <4C6DADDF.1000103@cornell.edu>
 <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
 <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
Message-ID:

Hi Chris,

GMOD still only supports Chado with Postgres (for example, the GFF
loader assumes a Postgres database), but when I reengineered the GFF
loader a few years ago, I tried to do it with subclassing the loader
in mind so that it could be subclassed to work with other RDBMS.

Scott


On Fri, Aug 20, 2010 at 12:23 PM, Chris Fields wrote:
> On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote:
>> Hi,
>>
>> On Thu, 19 Aug 2010, Robert Buels wrote:
>>
>> > Chris Fields wrote:
>> > > I think it's worth exploring having a DBIx::Class-based middle-ware
>> > > approach similar to what Rob Buels has done for Chado.  That would be
>> > > fairly easy to get started using DBIx::Class::Schema::Loader.
>> > > After that it would require optimization and tweaking, which is
>> > > potentially more complex than Rob's setup as Chado is very Pg-specific,
>> > > but maybe Rob can elaborate...
>> >
>> > Elaborating on how Bio::Chado::Schema is developed:
>> >
>> > The vast majority of the code and POD in BCS is autogenerated by
>> > DBIx::Class::Schema::Loader.  DBICSL gives you a baseline set of
>> > DBIx::Class classes that covers all the tables, views, columns, unique
>> > constraints, and foreign key relationships.
>> >
>> > Beyond that, you have to add on yourself.  In BCS, we have mostly done
>> > things like:
>> >
>> >   * make better-named aliases for some of the autogenerated
>> >     relationships (though DBICSL does a surprisingly good job of naming
>> >     relationships automatically most of the time)
>> >   * add a tiny bit of bioperl compatibility (this needs a lot more work
>> >     by somebody, volunteers needed!)
>> >   * add convenience methods for using some of the Chado property tables
>> >   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
>> >     traversing phylogenetic tree relationships
>> >
>> > Regarding DB backend specificity, BCS isn't Pg-specific at all, because
>> > DBIx::Class itself goes to great lengths to be compatible (and performant!)
>> > with just about every relational database out there.
>> I would vouch for that at least as far as chado in oracle is concerned.
>> So far BCS works out flawlessly with our oracle chado instance at
>> dictybase. Quite a chunk of BCS based code is also active in a couple of
>> our Mojo based webapps. The part which i still couldn't use directly is
>> the 'synonym' table as it clashes with oracle specific reserved keywords.
>> However, overall it seems to be quite cross-RDBMS compatible and highly
>> recommended.
>>
>> -siddhartha
>
> Just to point out, I didn't say BCS is Pg-specific, but that Chado is
> (that was the DBMS it was designed for).  Maybe that should be amended
> to 'was' now :)
>
> I recall seeing a page on this somewhere on the GMOD website along the
> lines of "MySQL has problems so we chose Pg", and that Chado support
> would focus on Pg.  I'm guessing that's no longer the case?  Or is only
> the server-side stuff Pg-specific.
>
>> >In fact, the BCS test
>> > suite deploys a Chado schema into a temporary SQLite database using
>> > DBIC::Schema's deploy() method, and runs all of its tests on that.  Very
>> > handy.
>> >
>> > Chado's Pg-specific server-side functions can of course be called through
>> > BCS if they are present, but it's perfectly possible to use Chado without
>> > any of the server-side functions, and mostly the way I use it.
>> >
>> > Rob
>
> I think this opens up the possibility of starting a DBIx::Class-based
> middleware solution.  Hilmar, did you want to take that on?
>
> chris
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From bgs500 at york.ac.uk Tue Aug 24 15:35:53 2010
From: bgs500 at york.ac.uk (Ben Saville)
Date: Tue, 24 Aug 2010 16:35:53 +0100
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
 <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>
Message-ID: <34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk>

Sorry for the delay in replying; 454 data analysis is very time
consuming.

Please see http://seqanswers.com/forums/showthread.php?t=6484
for a discussion about this problem, and how we solved the issue.

Thanks for the reply though, much appreciated!

Regards
Ben Saville


On 20 Aug 2010, at 14:48, Dave Messina wrote:

> Hi Ben,
>
> I would not use the script you posted - I don't think it does what
> you want.
> > If you haven't already, you should take a look at the beginners' HOWTO > > http://www.bioperl.org/wiki/HOWTO:Beginners > > > the SearchIO HOWTO > > http://www.bioperl.org/wiki/HOWTO:SearchIO > > > and the example scripts included with BioPerl: > > http://www.bioperl.org/wiki/Scripts > > > > Incidentally, it's a lot of fiddly data processing to parse blast > reports for many contigs against multiple databases and then go back > and collate the results by query. I'm not sure exactly what you want > to do once you've separated by query ? if you provide some more > information, we could suggest ways to best get you where you want to > go. > > I will mention, though, that BLAST has the ability to search > multiple separate databases in one go and collate the results for > you. So that's something to consider. > > > > Dave > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Aug 24 15:54:20 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 24 Aug 2010 10:54:20 -0500 Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta In-Reply-To: References: <995BCF30-99B2-46C2-A4E8-681F9E2A0BB5@illinois.edu> Message-ID: Please keep all responses on-list. Regarding sreformat: http://tinyurl.com/28q75rr Judging by the stack traces below, you are also running off a UNIX-like system. To concatenate files, use 'cat'. So, for all files ending with .fa: cat *.fa >> all.fa chris On Aug 24, 2010, at 8:54 AM, Guifeng Wei wrote: > Hello Fields, > > i have checked the fasta files. i suddenly find that the last line is blank line, and the last second is less than common. > > i am not able to run the command line as Jason's advice because i have no knowledge about "sreformat". > > i also want to ask a more question. i want megre the several single chromosome sequence file into one, OK? > > thank you very much. 
> > Wei Guifeng > 2010/8/24 Chris Fields > Guifeng, > > Did you follow Jason's advice yesterday about converting the FASTA over to a more consistent length? Or checking the database itself? These are both things reiterated by Florent and Peter. > > From Jason's last response: > > ------------------------- > Wei - > > Please ask your questions on the bioperl mailing list, I cannot answer questions directly for all requests. > Your problem has been answered by me on the list before so I urge you to use the list archives as a starting point. > > The line lengths of the fasta file sequence aren't the same length. > > you need to run this > bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW > mv NEW ORIGINAL > > or with sreformat > sreformat fasta ORIGINAL > NEW > mv NEW ORIGINAL > ------------------------- > > chris > > > On Aug 24, 2010, at 6:28 AM, Guifeng Wei wrote: > > > Hi, > > > > i have revised my scripts according to the previous email from Florent. > > However, there were still some errors which frustrated me so much. > > > > The errors are as follows: > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: Each line of the fasta entry must be the same length except the last. > > Line above #301451 ' > > ..' is 22 != 51 chars. 
> > STACK: Error::throw > > STACK: Bio::Root::Root::throw > > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 > > STACK: Bio::DB::Fasta::calculate_offsets > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770 > > STACK: Bio::DB::Fasta::index_dir > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593 > > STACK: Bio::DB::Fasta::new > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488 > > STACK: bed2fasta.pl:13 > > ----------------------------------------------------------- > > indexing was interrupted, so unlinking > > /home/wgf/elegans190.dna//directory.index at > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053 > > But in the directory /home/wgf/elegans190.dna/ , it concludes 6 files, > > each contains the complete sequences from one single chromosome, the format > > is fasta. The extension of the FASTA files is .fa. Every single file is > > started as ">chromosoemeXXX" followed by the thousands of sequences. > > > > and therefore, it warn me that "Each line of the fasta entry must be the > > same length except the last". and "indexing was interrupted, so unlinking > > /home/wgf/elegans190.dna//directory". > > > > i was much confused about this. so for help. > > > > Wei Guifeng > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- > ??? 
Wei Guifeng
>
>
>

From cjfields at illinois.edu Tue Aug 24 16:14:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 11:14:51 -0500
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
 <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>
 <34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk>
Message-ID: <69C47A74-09C7-4024-9303-A3893658A2A8@illinois.edu>

Just in case anyone needs it, there is a way to index these as well (both BLAST and the two tabular BLAST versions) for fast lookups of specific reports, if needed. See Bio::Index::Blast and Bio::Index::BlastTable in BioPerl.

Caveat: I believe there is a bug with BLAST+ text output indexing (it chops the header off subsequent reports). I haven't investigated it enough yet, but I'll try looking into it today.

chris

On Aug 24, 2010, at 10:35 AM, Ben Saville wrote:

> Sorry for the delay in replying; 454 data analysis is very time consuming.
>
> Please see http://seqanswers.com/forums/showthread.php?t=6484
> for a discussion about this problem, and how we solved the issue.
>
> Thanks for the reply though, much appreciated!
>
> Regards
> Ben Saville
>
>
>
>
>
> On 20 Aug 2010, at 14:48, Dave Messina wrote:
>
>> Hi Ben,
>>
>> I would not use the script you posted - I don't think it does what you want.
>>
>> If you haven't already, you should take a look at the beginners' HOWTO
>>
>> http://www.bioperl.org/wiki/HOWTO:Beginners
>>
>>
>> the SearchIO HOWTO
>>
>> http://www.bioperl.org/wiki/HOWTO:SearchIO
>>
>>
>> and the example scripts included with BioPerl:
>>
>> http://www.bioperl.org/wiki/Scripts
>>
>>
>>
>> Incidentally, it's a lot of fiddly data processing to parse blast reports for many contigs against multiple databases and then go back and collate the results by query. I'm not sure exactly what you want to do once you've separated by query - if you provide some more information, we could suggest ways to best get you where you want to go.
>>
>> I will mention, though, that BLAST has the ability to search multiple separate databases in one go and collate the results for you. So that's something to consider.
>>
>>
>>
>> Dave
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Tue Aug 24 16:17:17 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 11:17:17 -0500
Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release announcement
References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov>
Message-ID:

FYI,

Very interesting additions to BLAST+ (archive format).

chris

Begin forwarded message:

> From: mcginnis
> Date: August 24, 2010 10:46:50 AM CDT
> To: NLM/NCBI List blast-announce
> Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement
>
> A new version of the stand-alone applications is available.
> > Users are encouraged to use the BLAST+ applications available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ > > This release includes a number of bug fixes as well as new features for the BLAST+ applications: > > * Introduce BLAST Archive format to permit reformatting of stand-alone BLAST searches with the blast_formatter(see BLAST+ user manual) > * Added the blast_formatter application (see BLAST+ user manual) > * Added support for translated subject soft masking in the BLAST databases > * Added support for the BLAST Trace-back operations (btop) output format > * Added command line options to blastdbcmd for listing available BLAST databases > * Improved performance of formatting of remote BLAST searches > * Use a consistent exit code for out of memory conditions > * Fixed bug in indexed megablast with multiple space-separated BLAST databases > * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb > * Fixed Windows installer for 64-bit installations > > BLAST+ applications, as well as the legacy C applications (e.g. blastall), may be downloaded from http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download From David.Messina at sbc.su.se Tue Aug 24 17:00:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 24 Aug 2010 19:00:14 +0200 Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release announcement In-Reply-To: References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov> Message-ID: <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se> Here's a link to the manual: ftp://ftp.ncbi.nlm.nih.gov//blast/executables/blast%2B/2.2.24/user_manual.pdf (Is it on the NCBI website somewhere? Strange to have only a downloadable PDF.) The section on the new archive format is on page 27. It seems like a nice idea to have the flexibility, but I wonder about the time cost of using this format. 
One of the big gains from using tab-delimited output is that BLAST doesn't have to do all the post-processing to generate the alignment views. By doing the archive format, which if I understand it correctly is ASN.1, you're always paying the full price in time (and space, for that matter). Dave On Aug 24, 2010, at 18:17 , Chris Fields wrote: > FYI, > > Very interesting additions to BLAST+ (archive format). > > chris > > Begin forwarded message: > >> From: mcginnis >> Date: August 24, 2010 10:46:50 AM CDT >> To: NLM/NCBI List blast-announce >> Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement >> >> A new version of the stand-alone applications is available. >> >> Users are encouraged to use the BLAST+ applications available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ >> >> This release includes a number of bug fixes as well as new features for the BLAST+ applications: >> >> * Introduce BLAST Archive format to permit reformatting of stand-alone BLAST searches with the blast_formatter(see BLAST+ user manual) >> * Added the blast_formatter application (see BLAST+ user manual) >> * Added support for translated subject soft masking in the BLAST databases >> * Added support for the BLAST Trace-back operations (btop) output format >> * Added command line options to blastdbcmd for listing available BLAST databases >> * Improved performance of formatting of remote BLAST searches >> * Use a consistent exit code for out of memory conditions >> * Fixed bug in indexed megablast with multiple space-separated BLAST databases >> * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb >> * Fixed Windows installer for 64-bit installations >> >> BLAST+ applications, as well as the legacy C applications (e.g. 
blastall), may be downloaded from http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Aug 24 17:26:49 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 24 Aug 2010 12:26:49 -0500 Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release announcement In-Reply-To: <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se> References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov> <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se> Message-ID: It's probably more applicable from the viewpoint of a cluster admin who would want to add the flexibility of having a single archive and allowing any format (as opposed to re-running the analysis). I'm just wondering if there is anything to glean there for possible alignment archiving purposes (ala SAM/BAM), but if it's ASN.1, likely not. chris On Aug 24, 2010, at 12:00 PM, Dave Messina wrote: > Here's a link to the manual: > ftp://ftp.ncbi.nlm.nih.gov//blast/executables/blast%2B/2.2.24/user_manual.pdf > > (Is it on the NCBI website somewhere? Strange to have only a downloadable PDF.) The section on the new archive format is on page 27. > > It seems like a nice idea to have the flexibility, but I wonder about the time cost of using this format. > > One of the big gains from using tab-delimited output is that BLAST doesn't have to do all the post-processing to generate the alignment views. By doing the archive format, which if I understand it correctly is ASN.1, you're always paying the full price in time (and space, for that matter). > > > > Dave > > > > > On Aug 24, 2010, at 18:17 , Chris Fields wrote: > >> FYI, >> >> Very interesting additions to BLAST+ (archive format). 
>> >> chris >> >> Begin forwarded message: >> >>> From: mcginnis >>> Date: August 24, 2010 10:46:50 AM CDT >>> To: NLM/NCBI List blast-announce >>> Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement >>> >>> A new version of the stand-alone applications is available. >>> >>> Users are encouraged to use the BLAST+ applications available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ >>> >>> This release includes a number of bug fixes as well as new features for the BLAST+ applications: >>> >>> * Introduce BLAST Archive format to permit reformatting of stand-alone BLAST searches with the blast_formatter(see BLAST+ user manual) >>> * Added the blast_formatter application (see BLAST+ user manual) >>> * Added support for translated subject soft masking in the BLAST databases >>> * Added support for the BLAST Trace-back operations (btop) output format >>> * Added command line options to blastdbcmd for listing available BLAST databases >>> * Improved performance of formatting of remote BLAST searches >>> * Use a consistent exit code for out of memory conditions >>> * Fixed bug in indexed megablast with multiple space-separated BLAST databases >>> * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb >>> * Fixed Windows installer for 64-bit installations >>> >>> BLAST+ applications, as well as the legacy C applications (e.g. 
blastall), may be downloaded from http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Tue Aug 24 18:45:29 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 24 Aug 2010 20:45:29 +0200 Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release announcement In-Reply-To: References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov> <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se> Message-ID: <00C04DF9-F3C2-4574-B1E4-A3BF28EE953F@sbc.su.se> > It's probably more applicable from the viewpoint of a cluster admin who would want to add the flexibility of having a single archive and allowing any format (as opposed to re-running the analysis). Good point. > I'm just wondering if there is anything to glean there for possible alignment archiving purposes (ala SAM/BAM), but if it's ASN.1, likely not. To be honest, I didn't look that closely at it. It may be worth considering nevertheless. Dave From buiduyminh at gmail.com Tue Aug 24 18:56:43 2010 From: buiduyminh at gmail.com (Minh Bui) Date: Tue, 24 Aug 2010 14:56:43 -0400 Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help. In-Reply-To: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk> References: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk> Message-ID: How can I know where DBD:mysql PATH on my MAC? I am very new to MAC sorry. 
I just check and mysql.pm is in /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm On 8/21/10, Adam Witney wrote: > > On 20 Aug 2010, at 22:29, Minh Bui wrote: > > > Hi,, > > I am trying to load my GFF file to mysql database but I got this error > > when I ran the bp_seqfeature_load.pl ( bioperl 1.6.1 on ?MAC) > > > > [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl > > install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC > > contains: /sw/lib/perl5 /sw/lib/perl5/darwin > > /System/Library/Perl/5.8.6/darwin-thread-multi-2level > > /System/Library/Perl/5.8.6 > > /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6 > > /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level > > /Network/Library/Perl/5.8.6 /Network/Library/Perl > > /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level > > /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44) > > line 3. > > Perhaps the DBD::mysql perl module hasn't been fully installed, > > or perhaps the capitalisation of 'mysql' isn't right. > > Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge. > > at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212 > > > > I am using MAC OSX version 10.4.10 and MAMP? Isnt it the > > "/Library/Perl/5.8.6" already in @INC? What am I missing? > > I have been googling this error for a few hours. I also install > > Bioperl and reinstall DBD::mysql using CPAN. It still doesnt work.. > > > > Here is my $PERL5LIB: ?/sw/lib/perl5:/sw/lib/perl5/darwin/ > > > > Where did DBD:mysql get installed? can you verify that DBD/mysql.pm is actually in one of those directories listed above? > > From scott at scottcain.net Tue Aug 24 19:04:04 2010 From: scott at scottcain.net (Scott Cain) Date: Tue, 24 Aug 2010 15:04:04 -0400 Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help. 
In-Reply-To:
References: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>
Message-ID:

Hi Minh,

The file you found is not DBD::mysql though; it is Bio::DB::SeqFeature::Store::DBI::mysql, which was installed along with BioPerl. How did you find that file? The same method would presumably turn up DBD::mysql if it existed. I would use a command like this:

  locate mysql.pm

which would locate all of the files named mysql.pm on your computer. I would expect it to be located in /Library/Perl/5.8.6/darwin-thread-multi-2level/DBD/ if it was installed in a "normal" way (that is, not involving macports or fink or MAMP).

Scott

On Tue, Aug 24, 2010 at 2:56 PM, Minh Bui wrote:
> How can I know where DBD:mysql PATH on my MAC? I am very new to MAC sorry.
>
> I just check and mysql.pm is in
> /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm
>
> On 8/21/10, Adam Witney wrote:
>>
>> On 20 Aug 2010, at 22:29, Minh Bui wrote:
>>
>> > Hi,,
>> > I am trying to load my GFF file to mysql database but I got this error
>> > when I ran the bp_seqfeature_load.pl ( bioperl 1.6.1 on MAC)
>> >
>> > [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
>> > install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
>> > contains: /sw/lib/perl5 /sw/lib/perl5/darwin
>> > /System/Library/Perl/5.8.6/darwin-thread-multi-2level
>> > /System/Library/Perl/5.8.6
>> > /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
>> > /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
>> > /Network/Library/Perl/5.8.6 /Network/Library/Perl
>> > /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
>> > /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
>> > line 3.
>> > Perhaps the DBD::mysql perl module hasn't been fully installed,
>> > or perhaps the capitalisation of 'mysql' isn't right.
>> > Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
>> > at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212
>> >
>> > I am using MAC OSX version 10.4.10 and MAMP? Isnt it the
>> > "/Library/Perl/5.8.6" already in @INC? What am I missing?
>> > I have been googling this error for a few hours. I also install
>> > Bioperl and reinstall DBD::mysql using CPAN. It still doesnt work..
>> >
>> > Here is my $PERL5LIB: /sw/lib/perl5:/sw/lib/perl5/darwin/
>>
>> Where did DBD:mysql get installed? can you verify that DBD/mysql.pm is actually in one of those directories listed above?
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From jason at bioperl.org Wed Aug 25 04:33:45 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 24 Aug 2010 21:33:45 -0700
Subject: [Bioperl-l] Enquiry on gi_taxid_nucl.dmp.gz
In-Reply-To:
References:
Message-ID: <4C749D29.3040003@bioperl.org>

Hi - please keep questions on the list.

I think one of your problems is that your first use of $gi2taxidfile is wrong. When you call tie, you want to specify the DB file in which to store the index, so call it "/tmp/gi2taxid.idx" or something like that. In my code here

http://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/classify_hits_kingdom.PLS

you will see that on line 97 we construct the name of the index file from the folder, plus 'idx', plus the name gi2taxid.

Also, it would be safer for the split to match on whitespace, since you want only the first two columns from the file. Doing this would eliminate the need for the chomp on the line above.

  my ($gi, $taxid) = split(/\s+/, $_);

instead of

  chomp;
  my ($gi, $taxid) = split(" ", $_,2);

There may be other problems, but these should be fixed first -- and please send queries to the mailing list rather than to me directly so that others can answer questions.

-jason

Amali Thrimawithana wrote, On 8/24/10 8:13 PM:
> Dear Jason
>
> Thank you very much for the information. I manage to get the information on
> different taxonomic levels with the help of one of your example code
> "local_taxonomydb_query". However I am having trouble with creating a local
> index file of the gi_taxid_nucl.dmp so that I am able to get the taxonomic
> id given the GI number of NCBI. At the moment I am using the tie() function
> with DB_file and then storing the detail into a hash. However when I try to
> retrieve a taxonomic ID given the GI number, it is not returning any thing
> but an error. Below is part of the code (borrowed from the example code
> classify kingdom), can you please let me know where I am going wrong?
> ...
> my $dbh2 = tie(%taxid4gi, 'DB_File', $gi2taxidfile);
>
> if( ! $done ) {
>     my $fh;
>     open(GI2TAXID, "$gi2taxidfile") or die $!; #here passing the unzipped
> gi_taxid_nucl.dmp
>     my$i=0;
>     while (<GI2TAXID>) {
>         chomp;
>         my ($gi, $taxid) = split(" ", $_, 2);
>         $taxid4gi{$gi} = $taxid
>             if exists $taxid4gi{$gi};
>         $i++;
>         unless( $DEBUG&& $i % 100000 ) {
>             warn "$i\n";
>         }
>     }
>     $dbh2->sync;
> }
> my $gi2='183397240';
> my $taxd2=$taxid4gi{$gi2};
> print $taxd2, " \n";
>
> Any help would be much appreciated
>
> Thanking you
> Amali
>
> On 23 August 2010 06:29, Jason Stajich wrote:
>
>> Hi Amali -
>>
>> This is how I'd print out the full classification by using the Tree methods
>> (with probably a different way of initializing the $db object to your
>> flatfiles location).
>> >> #!/usr/bin/perl -w >> use strict; >> use Bio::DB::Taxonomy; >> >> my $db= Bio::DB::Taxonomy->new(-source => 'flatfile', >> -nodesfile => 'taxonomy/nodes.dmp', >> -namesfile => 'taxonomy/names.dmp'); >> >> my $taxonid = $db->get_taxonid('Homo sapiens'); >> my $taxon = $db->get_taxon(-taxonid => $taxonid); >> my $tree = Bio::Tree::Tree->new(-node => $taxon); >> my @taxa = $tree->get_nodes; >> print join(",", map { $_->scientific_name } @taxa), "\n"; >> >> -jason >> >> Amali Thrimawithana wrote, On 8/18/10 3:56 PM: >> >> Dear Dr Stajich, >> >>> I am a Masters student at Auckland university and my research is on >>> identifying yeast species present in wine by the use of 454 sequencing. In >>> order to carry out this research, a pipeline is being built in which at >>> the >>> final step each representative OTU need to be classified at different >>> taxonomic levels (ie: at Phylum, family, class, genus and species) by >>> using >>> the results from BLAST. To identify the sequences at each taxonomic level, >>> I >>> have been trying out the Bio::DB::Taxonomy module in bioperl. Using this >>> module, I am able to get the genus and species level by splitting the >>> scientific name returned by the Bio::taxon object. But unfortunately I am >>> uncertain on how to get the information for the other levels of the rank. >>> I >>> have tried several commands including "my @class = >>> $node->classification;", >>> but it does not work. Hence, could you please let me know how I might be >>> able to get the higher levels of taxonomy such as class and phylum using >>> bioperl? 
>>>
>>> Look forward to hearing from you soon
>>>
>>> Thanking You
>>>
>>> Amali
>>>

From roy.chaudhuri at gmail.com Wed Aug 25 11:12:15 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 25 Aug 2010 12:12:15 +0100
Subject: [Bioperl-l] Enquiry on gi_taxid_nucl.dmp.gz
In-Reply-To: <4C749D29.3040003@bioperl.org>
References: <4C749D29.3040003@bioperl.org>
Message-ID: <4C74FA8F.3080506@gmail.com>

> Also it would be safer for the split to be whitespace matching and that
> you want the the two first columns from the file. Doing this would
> eliminate the need for the chomp on the line above.
>
> my ($gi, $taxid) = split(/\s+/, $_);
>
> instead of
>
> chomp;
> my ($gi, $taxid) = split(" ", $_,2);

Sorry to be pedantic, but according to perldoc -f split:

"As a special case, specifying a PATTERN of space (' ') will split on white space just as "split" with no arguments does"

The only difference between patterns of " " and /\s+/ is that the latter will return an initial null field if there is leading white space, which may or may not be what you want.

$ perl -e 'print join("-", split(" ", " 1\t2 3")), "\n"'
1-2-3
$ perl -e 'print join("-", split(/\s+/, " 1\t2 3")), "\n"'
-1-2-3

Cheers.
Roy.

From kanmaninradha at gmail.com Thu Aug 26 08:29:08 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 01:29:08 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
Message-ID:

Hi All,

I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF module. I could get everything else but not the DNA seq.

Can anyone help me to find this out, Please. I appreciate your help very much.
thanks,
Kanmani

#!/usr/bin/perl

use strict;
use warnings;
use Bio::Tools::GFF;

my $file = shift;

my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
$gffio->features_attached_to_seqs(1);

while (my $feat = $gffio->next_feature()){
    my $start = $feat->start;
    my $end= $feat->end;
    my $size = $end-$start+1;
    my $strand = $feat->strand;
    my $seqid = $feat->seq_id;
    my $score = $feat->score;
    my $frame = $feat->frame;
    my $source = $feat->source_tag;
    my $type = $feat->primary_tag;
    my $gffstr = $gffio->gff_string($feat);
    my @alltags = $feat->all_tags();
    my @ID_tag_value = $feat->each_tag_value("ID");

    my $seq = $feat->seq();
    print "$seq\n";

    if($type eq "gene"){ #
        print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
    }
}

From David.Messina at sbc.su.se Thu Aug 26 08:53:48 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 26 Aug 2010 10:53:48 +0200
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To:
References:
Message-ID: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>

Admittedly i'm not up on the latest uses of GFF, but as far as I know, GFF is an annotation format only - it does not contain the actual sequence.

Have you looked in your GFF file to see if there are nucleotides in there?

Dave

On Aug 26, 2010, at 10:29, kanmani radha wrote:

> Hi All,
> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
> module. I could get everything else but not the DNA seq.

From biopython at maubp.freeserve.co.uk Thu Aug 26 09:02:53 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Aug 2010 10:02:53 +0100
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
References: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
Message-ID:

On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina wrote:
>
> Admittedly i'm not up on the latest uses of GFF, but as far as I know, GFF
> is an annotation format only -
it does not contain the actual sequence. > > Have you looked in your GFF file to see if there are nucleotides in there? > > Dave Actually a GFF file can optionally include a FASTA format sequence at the end of the file, although it seems to be more common to just supply separate GFF and FASTA files and cross reference by ID. Peter From David.Messina at sbc.su.se Thu Aug 26 09:08:20 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 26 Aug 2010 11:08:20 +0200 Subject: [Bioperl-l] getting DNA sequence for exon features from GFF In-Reply-To: References: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se> Message-ID: Aha, great, thanks for clarifying, Peter. And if I bothered to look at the Bio::Tools::GFF documentation before answering :), I would have seen this: http://doc.bioperl.org/bioperl-live/Bio/Tools/GFF.html#General which describes how you can use $gffio->get_seqs() and related methods to pull out the sequence data. Dave On Aug 26, 2010, at 11:02, Peter wrote: > On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina wrote: >> >> Admittedly i'm not up on the latest uses of GFF, but as far as I know, GFF >> is an annotation format only ? it does not contain the actual sequence. >> >> Have you looked in your GFF file to see if there are nucleotides in there? >> >> Dave > > Actually a GFF file can optionally include a FASTA format sequence > at the end of the file, although it seems to be more common to just > supply separate GFF and FASTA files and cross reference by ID. > > Peter From David.Messina at sbc.su.se Thu Aug 26 09:18:25 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 26 Aug 2010 11:18:25 +0200 Subject: [Bioperl-l] getting DNA sequence for exon features from GFF In-Reply-To: References: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se> Message-ID: <984552CF-01F3-4D29-932F-DD030CCC1448@sbc.su.se> So, just to finish the thought: Kanmani, Apologies for my sloppy and uninformed answer. 
The following is only slightly less sloppy and uninformed, but may actually answer your question.

I think you need to call $gffio->get_seqs(), probably as

  my @seq_objects = $gffio->get_seqs();

and then loop through those, something like:

  foreach my $seq_object (@seq_objects) {
      my $seq = $seq_object->seq();   # the sequence as a plain string

      # features hang off the sequence object, not the sequence string
      foreach my $feat ($seq_object->get_SeqFeatures) {
          # do your feature processing here
      }
  }

Note that I haven't tested the above code.

Dave

From fs5 at sanger.ac.uk Thu Aug 26 09:19:44 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 26 Aug 2010 10:19:44 +0100
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To:
References:
Message-ID: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi Kammani,

While GFF files may contain DNA sequence data, most of them don't, so you will have to use the location information you get from the GFF annotation file in conjunction with, e.g., a local FASTA database of the genomic sequence you are working with or an online resource.

Frank

On Thu, 2010-08-26 at 01:29 -0700, kanmani radha wrote:
> Hi All,
> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
> module. I could get everything else but not the DNA seq.
>
> Can anyone help me to find this out, Please. I appreciate your help very
> much.
> thanks, > Kanmani > > #!/usr/bin/perl > > use strict; > use warnings; > use Bio::Tools::GFF; > > my $file = shift; > > my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3); > $gffio->features_attached_to_seqs(1); > > while (my $feat = $gffio->next_feature()){ > my $start = $feat->start; > my $end= $feat->end; > my $size = $end-$start+1; > my $strand = $feat->strand; > my $seqid = $feat->seq_id; > my $score = $feat->score; > my $frame = $feat->frame; > my $source = $feat->source_tag; > my $type = $feat->primary_tag; > my $gffstr = $gffio->gff_string($feat); > my @alltags = $feat->all_tags(); > my @ID_tag_value = $feat->each_tag_value("ID"); > > my $seq = $feat->seq(); > print "$seq\n"; > > if($type eq "gene"){ # > print "@ID_tag_value\t$size\t$type\t$start\t$end\n"; > } > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From cjfields at illinois.edu Thu Aug 26 14:20:48 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 26 Aug 2010 09:20:48 -0500 Subject: [Bioperl-l] getting DNA sequence for exon features from GFF In-Reply-To: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk> References: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk> Message-ID: <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu> Kammani, If you are using BioPerl, the best option currently available is to load a database with all relevant information (GFF and FASTA), then use that database for querying. 
The most commonly-used ones now are Bio::DB::SeqFeature::Store and Bio::DB::GFF; the former is very GFF3-centric, but I believe it can handle GFF/GTF, and it has various database adaptors (MySQL, Pg, BDB, SQLite). chris On Aug 26, 2010, at 4:19 AM, Frank Schwach wrote: > Hi Kammani, > > While GFF files may contain DNA sequence data, most of them don't, so > you will have to use the location information you get from the GFF > annotation file in conjunction with, e.g., a local FASTA database of the > genomic sequence you are working with or an online resource. > > > Frank > > > > On Thu, 2010-08-26 at 01:29 -0700, kanmani radha wrote: >> Hi All, >> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF >> module. I could get everything else but not the DNA seq. >> >> Can anyone help me to find this out, Please. I appreciate your help very >> much. >> thanks, >> Kanmani >> >> #!/usr/bin/perl >> >> use strict; >> use warnings; >> use Bio::Tools::GFF; >> >> my $file = shift; >> >> my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3); >> $gffio->features_attached_to_seqs(1); >> >> while (my $feat = $gffio->next_feature()){ >> my $start = $feat->start; >> my $end= $feat->end; >> my $size = $end-$start+1; >> my $strand = $feat->strand; >> my $seqid = $feat->seq_id; >> my $score = $feat->score; >> my $frame = $feat->frame; >> my $source = $feat->source_tag; >> my $type = $feat->primary_tag; >> my $gffstr = $gffio->gff_string($feat); >> my @alltags = $feat->all_tags(); >> my @ID_tag_value = $feat->each_tag_value("ID"); >> >> my $seq = $feat->seq(); >> print "$seq\n"; >> >> if($type eq "gene"){ # >> print "@ID_tag_value\t$size\t$type\t$start\t$end\n"; >> } >> } >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in 
England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 26 14:31:59 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 26 Aug 2010 09:31:59 -0500 Subject: [Bioperl-l] getting DNA sequence for exon features from GFF In-Reply-To: References: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se> Message-ID: On Aug 26, 2010, at 4:02 AM, Peter wrote: > On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina wrote: >> >> Admittedly i'm not up on the latest uses of GFF, but as far as I know, GFF >> is an annotation format only ? it does not contain the actual sequence. >> >> Have you looked in your GFF file to see if there are nucleotides in there? >> >> Dave > > Actually a GFF file can optionally include a FASTA format sequence > at the end of the file, although it seems to be more common to just > supply separate GFF and FASTA files and cross reference by ID. > > Peter IIRC, optionally including FASTA sequence is specified only in the GFF3 spec; use of FASTA isn't explicitly mentioned in earlier versions. We only support it with earlier GFF due to convergence of the various GFF parsers. The original GFF spec proposed allowing sequence, but it's in the form of meta information and I have never seen it used in practice (as you mention, the FASTA is normally loaded separately). 
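
As an illustration of the GFF3 case described here, the following is a minimal sketch in plain Perl (no BioPerl calls) of pulling feature sequences out of a GFF3 document whose sequence follows a ##FASTA directive. The sub names parse_gff3_with_fasta and feature_seq and the tiny inline data set are made up for this example; it assumes tab-separated nine-column feature lines and 1-based inclusive coordinates, and it is not a substitute for loading the data into Bio::DB::SeqFeature::Store.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split GFF3 text into feature rows and the sequences that follow ##FASTA.
# (Hypothetical helper for illustration only.)
sub parse_gff3_with_fasta {
    my ($text) = @_;
    my (@features, %seqs, $in_fasta, $id);
    for my $line (split /\n/, $text) {
        if ($line =~ /^##FASTA/) { $in_fasta = 1; next; }
        if ($in_fasta) {
            if ($line =~ /^>(\S+)/) { $id = $1; $seqs{$id} = ''; }
            elsif (defined $id)     { $line =~ s/\s+//g; $seqs{$id} .= $line; }
            next;
        }
        next if $line =~ /^#/ or $line !~ /\S/;   # skip directives and blanks
        my @col = split /\t/, $line;
        next unless @col >= 9;                    # expect nine GFF3 columns
        push @features, {
            seqid  => $col[0],
            type   => $col[2],
            start  => $col[3],
            end    => $col[4],
            strand => $col[6],
        };
    }
    return (\@features, \%seqs);
}

# Return the feature's sequence, reverse-complemented on the minus strand.
sub feature_seq {
    my ($feat, $seqs) = @_;
    my $parent = $seqs->{ $feat->{seqid} };
    return unless defined $parent;
    my $sub = substr($parent, $feat->{start} - 1,
                     $feat->{end} - $feat->{start} + 1);
    if ($feat->{strand} eq '-') {
        $sub = reverse $sub;
        $sub =~ tr/ACGTacgt/TGCAtgca/;
    }
    return $sub;
}

# Made-up example data: two exons on one 16 bp sequence.
my $gff = join "\n",
    '##gff-version 3',
    join("\t", 'chr1', 'demo', 'exon', 3, 8,  '.', '+', '.', 'ID=exon1'),
    join("\t", 'chr1', 'demo', 'exon', 9, 12, '.', '-', '.', 'ID=exon2'),
    '##FASTA',
    '>chr1',
    'AACCGGTTAACC',
    'GGTT';

my ($features, $seqs) = parse_gff3_with_fasta($gff);
for my $feat (@$features) {
    print join("\t", $feat->{type}, $feat->{start}, $feat->{end},
               feature_seq($feat, $seqs)), "\n";
}
```

When the GFF and FASTA arrive as separate files, the same feature_seq logic applies once the sequences have been read into a hash keyed by sequence ID.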
chris From kanmaninradha at gmail.com Thu Aug 26 16:22:14 2010 From: kanmaninradha at gmail.com (kanmani radha) Date: Thu, 26 Aug 2010 09:22:14 -0700 Subject: [Bioperl-l] getting DNA sequence for exon features from GFF In-Reply-To: <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu> References: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk> <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu> Message-ID: Hi Everyone, Thanks very much for this clarification. Thanks a ton for every one who spared their time to educate me. I see your points. Please correct me if I am wrong. I understand that, Its better to use use Bio::DB::SeqFeature or Bio::DB::GFF to load the fasta sequences (from a separate multifasta) file and then Bio::Tools::GFF to parse the feature info from a gff file . Then query the created database for the relevent GFF coordinates.... I will implement this. Thanks once again. Kanmani On Thu, Aug 26, 2010 at 7:20 AM, Chris Fields wrote: > Kammani, > > If you are using BioPerl, the best option currently available is to load a > database with all relevant information (GFF and FASTA), then use that > database for querying. The most commonly-used ones now are > Bio::DB::SeqFeature::Store and Bio::DB::GFF; the former is very > GFF3-centric, but I believe it can handle GFF/GTF, and it has various > database adaptors (MySQL, Pg, BDB, SQLite). > > chris > > On Aug 26, 2010, at 4:19 AM, Frank Schwach wrote: > > > Hi Kammani, > > > > While GFF files may contain DNA sequence data, most of them don't, so > > you will have to use the location information you get from the GFF > > annotation file in conjunction with, e.g., a local FASTA database of the > > genomic sequence you are working with or an online resource. > > > > > > Frank > > > > > > > > On Thu, 2010-08-26 at 01:29 -0700, kanmani radha wrote: > >> Hi All, > >> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF > >> module. I could get everything else but not the DNA seq. 
> >> > >> Can anyone help me to find this out, Please. I appreciate your help very > >> much. > >> thanks, > >> Kanmani > >> > >> #!/usr/bin/perl > >> > >> use strict; > >> use warnings; > >> use Bio::Tools::GFF; > >> > >> my $file = shift; > >> > >> my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3); > >> $gffio->features_attached_to_seqs(1); > >> > >> while (my $feat = $gffio->next_feature()){ > >> my $start = $feat->start; > >> my $end= $feat->end; > >> my $size = $end-$start+1; > >> my $strand = $feat->strand; > >> my $seqid = $feat->seq_id; > >> my $score = $feat->score; > >> my $frame = $feat->frame; > >> my $source = $feat->source_tag; > >> my $type = $feat->primary_tag; > >> my $gffstr = $gffio->gff_string($feat); > >> my @alltags = $feat->all_tags(); > >> my @ID_tag_value = $feat->each_tag_value("ID"); > >> > >> my $seq = $feat->seq(); > >> print "$seq\n"; > >> > >> if($type eq "gene"){ # > >> print "@ID_tag_value\t$size\t$type\t$start\t$end\n"; > >> } > >> } > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > -- > > The Wellcome Trust Sanger Institute is operated by Genome Research > > Limited, a charity registered in England with number 1021457 and a > > company registered in England with number 2742969, whose registered > > office is 215 Euston Road, London, NW1 2BE. 
> > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Thu Aug 26 17:08:56 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 26 Aug 2010 12:08:56 -0500 Subject: [Bioperl-l] getting DNA sequence for exon features from GFF In-Reply-To: References: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk> <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu> Message-ID: On Aug 26, 2010, at 11:22 AM, kanmani radha wrote: > Hi Everyone, > > Thanks very much for this clarification. Thanks a ton for every one who > spared their time to educate me. > > I see your points. Please correct me if I am wrong. > > I understand that, Its better to use use Bio::DB::SeqFeature or Bio::DB::GFF > to load the fasta sequences (from a separate multifasta) file and > then Bio::Tools::GFF to parse the feature info from a gff file . Then query > the created database for the relevent GFF coordinates.... > > I will implement this. > > Thanks once again. > Kanmani Yes, in general. I forgot to mention that you can have an in-memory database as well, but it's only suggested if you have a few thousand or so features and small sequences (I think bacterial chromosomes will work). chris From Havard.Aanes at nvh.no Wed Aug 25 15:47:12 2010 From: Havard.Aanes at nvh.no (=?iso-8859-1?Q?Aanes_H=E5vard?=) Date: Wed, 25 Aug 2010 17:47:12 +0200 Subject: [Bioperl-l] bpfetch.pl Message-ID: <897520BC3AAE754FA4E34E2FD26490A8021C61597B8D@A-EXMB1.veths.no> Hi, I am trying do obtain a set of mRNA sequences from a database, made by the bpindex script. I thought this should be a trivial task, but it appears not to be. 
I get the sequences if I do one by one, like this:

perl scripts/index/bpfetch.pl -dir ./ zebrafish:NM_201192 zebrafish:NM_212708

But I need hundreds of sequences, so my plan was to put the RefSeq IDs in a file and use that as an argument (or whatever it is called in perl). That does not work:

haavaaan at login2 ~/download/src/bioperl-1.2.3 $ perl scripts/index/bpfetch.pl -dir ./ zebrafish:./some_seqs

You are running bpindex.pl without installing bioperl.
You have done it from bioperl/scripts, and so we can find
the necessary information but it is much better to install bioperl

Please read the README in the bioperl distribution
Sequence %id in Database zebrafish is not present

Any suggestions on how to do this? Alternative approaches are also appreciated. I have no experience in perl, just started using linux, and for the moment there is no time to learn perl, so I would really be grateful for any help to solve this specific task.

Best regards

Håvard Aanes (M.Sc.)
Ph.D. student
Section for biochemistry and physiology
The Norwegian School of Veterinary Science
Telephone: +47 22597358

The new e-mail domain name for The Norwegian School of Veterinary Science is @nvh.no. The former domain address @veths.no will still be in use, but it will be discontinued within 1-2 years. Please update your e-mail records.

This message verifies that the e-mail has been scanned for virus, and deemed virus-free according to our scanengines.

From kanmaninradha at gmail.com Thu Aug 26 08:23:28 2010
From: kanmaninradha at gmail.com (kanmani)
Date: Thu, 26 Aug 2010 01:23:28 -0700 (PDT)
Subject: [Bioperl-l] Bio::Tools:GFF to get DNA sequences...
Message-ID: <9b7381d7-3596-4e60-a2ac-6c8c135d457d@s24g2000pri.googlegroups.com>

Hi

I am trying to get the DNA sequences for each exon feature. I have the following script. Everything works except getting sequences. Can some one correct me.....Thanks.
#!/usr/bin/perl use strict; use warnings; use Bio::Tools::GFF; my $file = shift; my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3); $gffio->features_attached_to_seqs(1); while (my $feat = $gffio->next_feature()){ my $start = $feat->start; my $end= $feat->end; my $size = $end-$start+1; my $strand = $feat->strand; my $seqid = $feat->seq_id; my $score = $feat->score; my $frame = $feat->frame; my $source = $feat->source_tag; my $type = $feat->primary_tag; my $gffstr = $gffio->gff_string($feat); my @alltags = $feat->all_tags(); my @ID_tag_value = $feat->each_tag_value("ID"); my $seq = $feat->seq(); print "$seq\n"; if($type eq "gene"){ print "@ID_tag_value\t$size\t$type\t$start\t$end\n"; } } From kanmaninradha at gmail.com Thu Aug 26 21:24:40 2010 From: kanmaninradha at gmail.com (kanmani radha) Date: Thu, 26 Aug 2010 14:24:40 -0700 Subject: [Bioperl-l] getting DNA sequence for exon features from GFF In-Reply-To: References: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk> <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu> Message-ID: Hi Chris and others, For a brief amount time i could get away using Bio::DB::Fasta to index fasta files and Bio::Tools::GFF to iterate thru GFF features. But, i hit the wall again. Looks like sequential access of GFF featuers is not sufficient, I want to have a random access to it. I see the only way to do that is by using Bio::DB::GFF as suggested by Chris. Here is my question. Is there any tutorial to configure Bioperl or this module in particular to work with MySQL/postgres. I will really appreciate it. And thanks for all your help. Kanmani On Thu, Aug 26, 2010 at 10:08 AM, Chris Fields wrote: > On Aug 26, 2010, at 11:22 AM, kanmani radha wrote: > > > Hi Everyone, > > > > Thanks very much for this clarification. Thanks a ton for every one who > > spared their time to educate me. > > > > I see your points. Please correct me if I am wrong. 
> > > > I understand that, Its better to use use Bio::DB::SeqFeature or > Bio::DB::GFF > > to load the fasta sequences (from a separate multifasta) file and > > then Bio::Tools::GFF to parse the feature info from a gff file . Then > query > > the created database for the relevent GFF coordinates.... > > > > I will implement this. > > > > Thanks once again. > > Kanmani > > Yes, in general. I forgot to mention that you can have an in-memory > database as well, but it's only suggested if you have a few thousand or so > features and small sequences (I think bacterial chromosomes will work). > > chris From kanmaninradha at gmail.com Thu Aug 26 22:04:20 2010 From: kanmaninradha at gmail.com (kanmani radha) Date: Thu, 26 Aug 2010 15:04:20 -0700 Subject: [Bioperl-l] getting DNA sequence for exon features from GFF In-Reply-To: References: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk> <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu> Message-ID: HI, I made some progress since then.... - Installing Bio::DB::DBI::mysql needed Biosql. - Downloaded and installed biosql follow the instruction as given in their INSTALL file - Created biosql db in my mysql server - loaded schema using script from biosql - installed DBI - Now, I have problem with DBD::mysql. That reminds me couple years back i had to struggle installing this driver on another machine. I thought i ask around this time. It fails with a bunch of error messages.....the first of it being.... dbdimp.h:22:49 error: mysql.h no such filer or directory But, My mysql installation has header file in "/usr/include/mysql3/mysql/mysql.h". Can anyone suggest how to move forward from that..... thanks, Kanmani On Thu, Aug 26, 2010 at 2:24 PM, kanmani radha wrote: > Hi Chris and others, > > For a brief amount time i could get away using Bio::DB::Fasta to index > fasta files and Bio::Tools::GFF to iterate thru GFF features. But, i hit the > wall again. 
Looks like sequential access of GFF featuers is not sufficient, > I want to have a random access to it. I see the only way to do that is by > using Bio::DB::GFF as suggested by Chris. > > Here is my question. Is there any tutorial to configure Bioperl or this > module in particular to work with MySQL/postgres. I will really appreciate > it. > > And thanks for all your help. > Kanmani > > > On Thu, Aug 26, 2010 at 10:08 AM, Chris Fields wrote: > >> On Aug 26, 2010, at 11:22 AM, kanmani radha wrote: >> >> > Hi Everyone, >> > >> > Thanks very much for this clarification. Thanks a ton for every one who >> > spared their time to educate me. >> > >> > I see your points. Please correct me if I am wrong. >> > >> > I understand that, Its better to use use Bio::DB::SeqFeature or >> Bio::DB::GFF >> > to load the fasta sequences (from a separate multifasta) file and >> > then Bio::Tools::GFF to parse the feature info from a gff file . Then >> query >> > the created database for the relevent GFF coordinates.... >> > >> > I will implement this. >> > >> > Thanks once again. >> > Kanmani >> >> Yes, in general. I forgot to mention that you can have an in-memory >> database as well, but it's only suggested if you have a few thousand or so >> features and small sequences (I think bacterial chromosomes will work). >> >> chris > > > From rafalucas.unicamp at gmail.com Thu Aug 26 22:11:07 2010 From: rafalucas.unicamp at gmail.com (Rafael Lucas) Date: Thu, 26 Aug 2010 19:11:07 -0300 Subject: [Bioperl-l] Help in algorithm Bio::Structure::IO::pdb Message-ID: Hi folks, How are you? I'm from Brazil and I was making an algorithm that Cryptographyc a data and then print the result in a pdb file. So I have a .fasta file and want to pass this file to .pdb file, if I use a program, like PyMol, it will take so much time, so I wanna use the Bio::Structure::IO::pdb to accelerate this process, could you help me in this problem? 
Thank you, Rafael Lucas Faculdade de Tecnologia em Analise e Desenvolvimento de Sistemas FT - UNICAMP +55 (19)9614-0533 From J.Christopher.Ellis at duke.edu Fri Aug 27 02:06:30 2010 From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis) Date: Thu, 26 Aug 2010 22:06:30 -0400 Subject: [Bioperl-l] standaloneblastplus blastn crash Message-ID: <55861.1282874790@duke.edu> When I run the standaloneblastplus I get the following error... ------------- EXCEPTION ------------- MSG: C:Program FilesNCBIblast-2.2.24+binblastn.exe call crashed: There was a problem running C:Program FilesNCBIblast-2.2.24+binblastn.exe :? at C:/Perl64/lib/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1001. STACK Bio::Tools::Run::WrapperBase::_run C:/Perl64/lib/Bio/Tools/Run/WrapperBase/CommandExts.pm:1006 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus.pm:1303 STACK Bio::Tools::Run::StandAloneBlastPlus::run C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:270 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus.pm:1301 STACK toplevel localBlast.pl:9 ------------------------------------- I have a sneaky suspicion that it is an easy fix but for the life of me I can not figure it out! 
:) Thanks in advance, Chris From indraniel at gmail.com Fri Aug 27 01:57:54 2010 From: indraniel at gmail.com (Indraniel) Date: Fri, 27 Aug 2010 01:57:54 +0000 (UTC) Subject: [Bioperl-l] How to convert SFF into Fastq References: Message-ID: A fourth option is the following tool, sff2fastq (written in C), described here: http://indraniel.wordpress.com/2010/04/23/sff2fastq/ and http://github.com/indraniel/sff2fastq Indraniel From David.Messina at sbc.su.se Fri Aug 27 07:41:21 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 27 Aug 2010 09:41:21 +0200 Subject: [Bioperl-l] [RFC] Interolog::Walk In-Reply-To: <4C6D0B50.4050902@sms.ed.ac.uk> References: <4C6BF4BD.5010200@sms.ed.ac.uk> <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se> <4C6D0B50.4050902@sms.ed.ac.uk> Message-ID: Hi Giuseppe, On Aug 19, 2010, at 12:45, Giuseppe Gallone wrote: > Bio::Orthology::InterologMap > Bio::Orthology::Interolog::Map, > just in case somebody else finds other interesting applications for the Interolog concept and would like to "plug in" their own contribution. Would this make any sense? Absolutely. I think either of the above is a good option, and I agree that the second is a little more flexible. Your POD looks great! Way better than most. Having seen the whole thing now, I think your description is fine as is. And if you have another tutorial and example scripts on top of it, that would really be terrific, above and beyond what most people would expect. So, time to unleash it on the world! 
:) Dave From David.Messina at sbc.su.se Fri Aug 27 07:58:12 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 27 Aug 2010 09:58:12 +0200 Subject: [Bioperl-l] standaloneblastplus blastn crash In-Reply-To: <55861.1282874790@duke.edu> References: <55861.1282874790@duke.edu> Message-ID: <9275A540-AE42-47B0-BA73-A906964C451B@sbc.su.se> Hi Chris, If you look at the error message, it says what the problem is: it's trying to call the blastn executable with no spaces in the path name. > MSG: C:Program FilesNCBIblast-2.2.24+binblastn.exe call crashed: There > was a problem running C:Program FilesNCBIblast-2.2.24+binblastn.exe Now, that could be a problem is BioPerl or it could be a problem in your code. It's hard to diagnose where the problem lies without your code, so please post your code. Dave From G.Gallone at sms.ed.ac.uk Fri Aug 27 11:07:57 2010 From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone) Date: Fri, 27 Aug 2010 12:07:57 +0100 Subject: [Bioperl-l] [RFC] Interolog::Walk In-Reply-To: References: <4C6BF4BD.5010200@sms.ed.ac.uk> <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se> <4C6D0B50.4050902@sms.ed.ac.uk> Message-ID: <4C779C8D.1090007@sms.ed.ac.uk> Hi Dave, thank you very much for your feedback :) . I will register the namespace right now. I think I will use 'homology' as the second level name though, because I plan to extend the module to work with paralogues as well. As for the category, which one of the following you reckon it will fit a Bio:: package better http://www.cpan.org/modules/by-category/ Regards Giuseppe On 27/08/10 08:41, Dave Messina wrote: > Hi Giuseppe, > > > On Aug 19, 2010, at 12:45, Giuseppe Gallone wrote: >> Bio::Orthology::InterologMap >> Bio::Orthology::Interolog::Map, > >> just in case somebody else finds other interesting applications for the Interolog concept and would like to "plug in" their own contribution. Would this make any sense? > > Absolutely. 
I think either of the above is a good option, and I agree that the second is a little more flexible. > > Your POD looks great! Way better than most. Having seen the whole thing now, I think your description is fine as is. And if you have another tutorial and example scripts on top of it, that would really be terrific, above and beyond what most people would expect. > > So, time to unleash it on the world! :) > > > Dave > > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From David.Messina at sbc.su.se Fri Aug 27 11:25:06 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 27 Aug 2010 13:25:06 +0200 Subject: [Bioperl-l] [RFC] Interolog::Walk In-Reply-To: <4C779C8D.1090007@sms.ed.ac.uk> References: <4C6BF4BD.5010200@sms.ed.ac.uk> <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se> <4C6D0B50.4050902@sms.ed.ac.uk> <4C779C8D.1090007@sms.ed.ac.uk> Message-ID: <80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se> Hi Giuseppe, > I think I will use 'homology' as the second level name though, because I plan to extend the module to work with paralogues as well. Sounds good. > As for the category, which one of the following you reckon it will fit a Bio:: package better > > http://www.cpan.org/modules/by-category/ Bio:: is in 23 - miscellaneous modules, so probably keeping with that makes sense. I don't know much about that stuff, though. Chris F. or other CPAN cognoscenti care to comment? 
Dave From cjfields at illinois.edu Fri Aug 27 13:26:51 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 27 Aug 2010 08:26:51 -0500 Subject: [Bioperl-l] [RFC] Interolog::Walk In-Reply-To: <80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se> References: <4C6BF4BD.5010200@sms.ed.ac.uk> <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se> <4C6D0B50.4050902@sms.ed.ac.uk> <4C779C8D.1090007@sms.ed.ac.uk> <80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se> Message-ID: <88BB7813-E892-4BEC-9C49-5FD22325BBF7@illinois.edu> On Aug 27, 2010, at 6:25 AM, Dave Messina wrote: > Hi Giuseppe, > > >> I think I will use 'homology' as the second level name though, because I plan to extend the module to work with paralogues as well. > > Sounds good. > > >> As for the category, which one of the following you reckon it will fit a Bio:: package better >> >> http://www.cpan.org/modules/by-category/ > > > Bio:: is in 23 - miscellaneous modules, so probably keeping with that makes sense. > > I don't know much about that stuff, though. Chris F. or other CPAN cognoscenti care to comment? > > > Dave That's probably the best spot, as we cover a fairly broad range (mainly due to core monolithic structure). Though it's terribly non-descript, sort of the junk drawer of CPAN. chris From adamkennedybackup at gmail.com Sun Aug 29 11:35:50 2010 From: adamkennedybackup at gmail.com (Adam Kennedy) Date: Sun, 29 Aug 2010 21:35:50 +1000 Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl 5.12.1? In-Reply-To: <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com> References: <78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com> <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com> Message-ID: http://strawberryperl.com/download/professional/strawberry-perl-professional-5.10.1.3-alpha-2.msi You get BioPerl installed out the box. Adam K On 20 August 2010 03:20, Christopher Fields wrote: > cc'ing list. Looks like the BioPerl PPM is possibly broken for perl 5.12.
Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling... > > chris > > On Aug 19, 2010, at 11:29 AM, han sun wrote: > >> v5.10 works,thanks. >> >> 2010/8/19 Christopher Fields >> Try using ActivePerl 5.10 instead of v5.12. It's very possible the PPM won't work for v5.12 yet. >> >> chris >> >> On Aug 19, 2010, at 9:25 AM, han sun wrote: >> >> > Hello everyone, >> > >> > I have used perl for several months,and I now want to feel the power of >> > bioperl. >> > But it seems that the installing is more difficult than I thought. >> > >> > I typed the commands. >> > >> > >> > >> > install-shell >> > >> > >> > rep add bioperl http://bioperl.org/DIST >> > >> > >> > rep add uwinnipeg >> > http://cpan.uwinnipeg.ca/PPMPackages/12xx/ >> > >> > >> > rep add trouchelle http://trouchelle.com/ppm12/ >> > >> > install BioPerl >> > >> > However,the installing failed, >> > >> > ppm install failed: >> > Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core >> > Can't find any package that provides PostScript::TextBlock for >> > Bundle-BioPerl-Core >> > Can't find any package that provides Ace:: for Bundle-BioPerl-Core >> > Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core >> > Can't find any package that provides Convert::Binary::C for >> > Bundle-BioPerl-Core >> > Can't find any package that provides XML::Twig for Bundle-BioPerl-Core >> > Can't find any package that provides DB_File:: for Bundle-BioPerl-Core >> > Can't find any package that provides IPC::Run for GraphViz >> > Can't find any package that provides XML-XPathEngine for XML-DOM-XPath >> > Can't find any package that provides List-MoreUtils for Moose >> > Can't find any package that provides List-MoreUtils for Class-MOP >> > >> > >> > then I tried >> > >> > install http://www.bribes.org/perl/ppm/GD.ppd >> > >> > and tried the installation again,but it still didn't help.
>> > >> > * >> > * >> > * >> > * >> > * >> > * >> > >> > >> > *Do you konw what's wrong with the problem?* >> > * >> > * >> > * >> > * >> > *Please help me,thanks very much.* >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields1 at gmail.com Sun Aug 29 15:58:50 2010 From: cjfields1 at gmail.com (Christopher Fields) Date: Sun, 29 Aug 2010 10:58:50 -0500 Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl 5.12.1? In-Reply-To: References: <78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com> <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com> Message-ID: Yes, and I am thinking of pointing more and more users that direction instead. Can't say maintaining PPM packages with ever-fluctuating specs is easy when I don't work with Windows anymore. chris On Aug 29, 2010, at 6:35 AM, Adam Kennedy wrote: > http://strawberryperl.com/download/professional/strawberry-perl-professional-5.10.1.3-alpha-2.msi > > You get BioPerl installed out the box. > > Adam K > > On 20 August 2010 03:20, Christopher Fields wrote: >> cc'ing list. Looks like the BioPerl PPM is possibly broken for perl 5.12. Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling... >> >> chris >> >> On Aug 19, 2010, at 11:29 AM, han sun wrote: >> >>> v5.10 works,thanks. >>> >>> 2010/8/19 Christopher Fields >>> Try using ActivePerl 5.10 instead of v5.12. It's very possible the PPM won't work for v5.12 yet. >>> >>> chris >>> >>> On Aug 19, 2010, at 9:25 AM, han sun wrote: >>> >>>> Hello everyone, >>>> >>>> I have used perl for several months,and I now want to feel the power of >>>> bioperl. >>>> But it seems that the installing is more difficult than I thought. 
>>>> >>>> I typed the commands. >>>> >>>> >>>> >>>> install-shell >>>> >>>> >>>> rep add bioperl http://bioperl.org/DIST >>>> >>>> >>>> rep add uwinnipeg >>>> http://cpan.uwinnipeg.ca/PPMPackages/12xx/ >>>> >>>> >>>> rep add trouchelle http://trouchelle.com/ppm12/ >>>> >>>> install BioPerl >>>> >>>> However,the installing failed, >>>> >>>> ppm install failed: >>>> Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core >>>> Can't find any package that provides PostScript::TextBlock for >>>> Bundle-BioPerl-Core >>>> Can't find any package that provides Ace:: for Bundle-BioPerl-Core >>>> Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core >>>> Can't find any package that provides Convert::Binary::C for >>>> Bundle-BioPerl-Core >>>> Can't find any package that provides XML::Twig for Bundle-BioPerl-Core >>>> Can't find any package that provides DB_File:: for Bundle-BioPerl-Core >>>> Can't find any package that provides IPC::Run for GraphViz >>>> Can't find any package that provides XML-XPathEngine for XML-DOM-XPath >>>> Can't find any package that provides List-MoreUtils for Moose >>>> Can't find any package that provides List-MoreUtils for Class-MOP >>>> >>>> >>>> then I tried >>>> >>>> install http://www.bribes.org/perl/ppm/GD.ppd >>>> >>>> and tried the installation again,but it still didn't help. 
>>>> >>>> * >>>> * >>>> * >>>> * >>>> * >>>> * >>>> >>>> >>>> *Do you konw what's wrong with the problem?* >>>> * >>>> * >>>> * >>>> * >>>> *Please help me,thanks very much.* >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From odclerck at gmail.com Fri Aug 27 07:44:14 2010 From: odclerck at gmail.com (odclerck) Date: Fri, 27 Aug 2010 00:44:14 -0700 (PDT) Subject: [Bioperl-l] fasta header replace Message-ID: <29550202.post@talk.nabble.com> Hi, Was wondering if someone had an easy script available that converts the headers of a fasta sequences to a value stored in a separate text file. Macrogen produces files with sequences that look more or less like this: >100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1 1012, 1000 bases, 0 checksum. I can filter out the position on the plate e.g. "A1" easily but would like to replace this with the name of the strain stored in a different text file, e.g. "A1_D1222". Realize this sounds pretty basic to most of you, but I'm pretty new at scripting. Olivier -- View this message in context: http://old.nabble.com/fasta-header-replace-tp29550202p29550202.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From J.Christopher.Ellis at duke.edu Mon Aug 30 12:55:04 2010 From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis) Date: Mon, 30 Aug 2010 08:55:04 -0400 Subject: [Bioperl-l] Taxonomy DB problem Message-ID: <51468.1283172904@duke.edu> Hi All, I am trying to extract the entire taxonomy of an organism including the classifications. 
Some thing like... Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error. My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db? Thanks for all your help in advance! Chris From rafalucas.unicamp at gmail.com Mon Aug 30 13:24:11 2010 From: rafalucas.unicamp at gmail.com (Rafael Lucas) Date: Mon, 30 Aug 2010 10:24:11 -0300 Subject: [Bioperl-l] help in algorithm Bio::Structure::IO::pdb Message-ID: Hi folks, How are you? I'm from Brazil and I was making an algorithm that Cryptographyc a data and then print the result in a pdb file. So I have a .fasta file and want to pass this file to .pdb file, if I use a program, like PyMol, it will take so much time, so I wanna use the Bio::Structure::IO::pdb to accelerate this process, could you help me in this problem? Thank you, Rafael Lucas Faculdade de Tecnologia em Analise e Desenvolvimento de Sistemas FT - UNICAMP +55 (19)9614-0533 From cjfields at illinois.edu Mon Aug 30 13:36:41 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 30 Aug 2010 08:36:41 -0500 Subject: [Bioperl-l] Taxonomy DB problem In-Reply-To: <51468.1283172904@duke.edu> References: <51468.1283172904@duke.edu> Message-ID: Chris, Regarding a fix for that script, we would have to see your modified script and the error. However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy. chris On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote: > Hi All, > > I am trying to extract the entire taxonomy of an organism including the > classifications. Some thing like...
> > Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia > > I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error. > > My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db? > > Thanks for all your help in advance! > > Chris > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From fs5 at sanger.ac.uk Mon Aug 30 15:11:06 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Mon, 30 Aug 2010 16:11:06 +0100 Subject: [Bioperl-l] fasta header replace In-Reply-To: <29550202.post@talk.nabble.com> References: <29550202.post@talk.nabble.com> Message-ID: <4C7BCA0A.70503@sanger.ac.uk> Hi Olivier, Do you know how to read a file and build a hash from the contents? This is what you will need to do, e.g. if your file is

A1 Strain_A
A2 Strain_A
A3 Strain_B

then you can do something like:

# open the mapping file for reading
open (INFILE, '<', $infile_path) or die "Cannot open $infile_path: $!";
my %well2strain;
while (<INFILE>){
    # capture the well (e.g. A1) and the strain name from each line
    my ($well, $strain) = ($_=~/^([A-Z]\d+)\s+(\w+)/);
    $well2strain{$well}=$strain;
}
close INFILE;

You can then use the values of the hash to set the sequence ID as you parse the FASTA file. The BioPerl SeqIO howto gives details about how to read and write the FASTA file (http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples). You can change the id of a sequence object with $some_seq_object->id( 'my new ID'); See http://doc.bioperl.org/releases/bioperl-1.0/Bio/Seq.html for details. Hope that helps to get you started.
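The two steps described above (read the well-to-strain table into a hash, then rewrite each FASTA header) can also be sketched in plain Perl without BioPerl. This is only a rough sketch under assumptions not stated in the thread: the table has one "well strain" pair per line, the well appears in the header as an underscore-delimited field such as "_A1_", plate rows run from A to H (so that "I01" in the Macrogen header is not mistaken for a well), and the new header should read "well_strain". The sub names read_well_map and rename_header are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a well -> strain lookup from lines such as "A1 D1222".
sub read_well_map {
    my ($fh) = @_;
    my %well2strain;
    while (my $line = <$fh>) {
        # well = one row letter plus digits, strain = next non-blank token
        if ($line =~ /^([A-H]\d{1,2})\s+(\S+)/) {
            $well2strain{$1} = $2;
        }
    }
    return \%well2strain;
}

# Rewrite a FASTA header line; every other line is returned unchanged.
sub rename_header {
    my ($line, $well2strain) = @_;
    # Assumption: the well is an underscore-delimited field, e.g. "_A1_"
    if ($line =~ /^>\S*?_([A-H]\d{1,2})_/) {
        my $well = $1;
        return ">${well}_$well2strain->{$well}\n"
            if exists $well2strain->{$well};
    }
    return $line;
}

# Small demonstration using in-memory "files" instead of real paths.
my $map_text   = "A1 D1222\nA2 D1223\n";
my $fasta_text = ">100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1 1012, 1000 bases, 0 checksum.\n"
               . "ACGTACGT\n";

open my $map_fh, '<', \$map_text or die "Cannot read mapping: $!";
my $map = read_well_map($map_fh);
close $map_fh;

open my $fa_fh, '<', \$fasta_text or die "Cannot read FASTA: $!";
while (my $line = <$fa_fh>) {
    print rename_header($line, $map);
}
close $fa_fh;
```

For real data, the two in-memory strings would be replaced by the paths of the mapping file and the FASTA file, and the output redirected to a new file. Headers whose well is missing from the table are left untouched, so nothing is silently renamed to an empty value.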
Frank odclerck wrote: > Hi, > Was wondering if someone had an easy script available that converts the > headers of a fasta sequences to a value stored in a separate text file. > > Macrogen produces files with sequences that look more or less like this: > >> 100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1 1012, 1000 bases, 0 checksum. >> > > I can filter out the position on the plate e.g. "A1" easily but would like > to replace this with the name of the strain stored in a different text file, > e.g. "A1_D1222". > > Realize this sounds pretty basic to most of you, but I'm pretty new at > scripting. > Olivier > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From jessica.sun at gmail.com Mon Aug 30 15:51:39 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Mon, 30 Aug 2010 11:51:39 -0400 Subject: [Bioperl-l] Git for the lazy In-Reply-To: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se> References: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se> Message-ID: I want to add sequence features with tags and tag values, I want to have them in my order, however somehow it seems it is in default alphabetically orders of the tags, does any one knows how to fix? thanks a lot in advance. From G.Gallone at sms.ed.ac.uk Tue Aug 31 11:52:57 2010 From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone) Date: Tue, 31 Aug 2010 12:52:57 +0100 Subject: [Bioperl-l] New CPAN Release - Bio::Homology::InterologWalk - A Perl Module to retrieve putative PPIs through Interologs Message-ID: <4C7CED19.80802@sms.ed.ac.uk> Dear Bioperl users, I would like to announce the release of Bio::Homology::InterologWalk, a module that retrieves, scores and visualizes putative Protein-Protein Interactions through the orthology-walk method. 
The project is available from the following link http://search.cpan.org/~ggallone/ and a description of the idea behind it is here http://search.cpan.org/~ggallone/Bio-Homology-InterologWalk-0.02/lib/Bio/Homology/InterologWalk.pm#DESCRIPTION The project is in a very early stage (currently ver. 0.02 alpha) and has currently been tested only on Linux environments. It has not been tested on Macs, but it should work fine, and I would appreciate any feedback from Mac users who try it. *Any* form of feedback will be extremely appreciated (bug, typos, syntactical errors, verbal abuse etc :) ). Best, Giuseppe -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From cjfields at illinois.edu Tue Aug 31 15:01:59 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 31 Aug 2010 10:01:59 -0500 Subject: [Bioperl-l] Taxonomy DB problem In-Reply-To: <56973.1283255847@duke.edu> References: <56973.1283255847@duke.edu> Message-ID: <7167CA86-857E-4E16-A3D6-BA45045CF892@illinois.edu> Yes, I see that one. It may be the ID hash that is being returned is empty. I'll look into it. -c On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote: > Hi Chris, > > The error is... > > "Use of uninitialized value $id in join or string at C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363." > > The script from http://bioperl.org/wiki/Species_names_from_accession_numbers is as follows.... 
>
> use Bio::DB::EUtilities;
>
> my (%taxa, @taxa);
> my (%names, %idmap);
>
> # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
> # (probably)
>
> my @ids = qw(1621261 89318838 68536103 20807972 730439);
>
> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>                                        -db => 'taxonomy',
>                                        -dbfrom => 'protein',
>                                        -correspondence => 1,
>                                        -id => \@ids);
>
> # iterate through the LinkSet objects
> while (my $ds = $factory->next_LinkSet) {
>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> }
>
> @taxa = @taxa{@ids};
>
> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>                                     -db => 'taxonomy',
>                                     -id => \@taxa );
>
> while (local $_ = $factory->next_DocSum) {
>     $names{($_->get_contents_by_name('TaxId'))[0]} =
>         ($_->get_contents_by_name('ScientificName'))[0];
> }
>
> foreach (@ids) {
>     $idmap{$_} = $names{$taxa{$_}};
> }
>
> # %idmap is
> # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> # 68536103 => 'Corynebacterium jeikeium K411'
> # 730439 => 'Bacillus caldolyticus'
> # 89318838 => undef (this record has been removed from the db)
>
> 1;
>
> Thanks, > > Chris > > On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent: > Chris, > > Regarding a fix for that script, we would have to see your modified script and the error. However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy. > > chris > > On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote: > > > Hi All, > > > > I am trying to extract the entire taxonomy of an organism including the > > classifications. Some thing like...
> > > > Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia > > > > I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error. > > > > My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db? > > > > Thanks for all your help in advance! > > > > Chris > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From J.Christopher.Ellis at duke.edu Tue Aug 31 11:57:27 2010 From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis) Date: Tue, 31 Aug 2010 07:57:27 -0400 Subject: [Bioperl-l] Taxonomy DB problem Message-ID: <56973.1283255847@duke.edu> Hi Chris, The error is... "Use of uninitialized value $id in join or string at C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363." The script from http://bioperl.org/wiki/Species_names_from_accession_numbers is as follows....

use Bio::DB::EUtilities;

my (%taxa, @taxa);
my (%names, %idmap);

# these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
# (probably)

my @ids = qw(1621261 89318838 68536103 20807972 730439);

my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
                                       -db => 'taxonomy',
                                       -dbfrom => 'protein',
                                       -correspondence => 1,
                                       -id => \@ids);

# iterate through the LinkSet objects
while (my $ds = $factory->next_LinkSet) {
    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
}

@taxa = @taxa{@ids};

$factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
                                    -db => 'taxonomy',
                                    -id => \@taxa );

while (local $_ = $factory->next_DocSum) {
    $names{($_->get_contents_by_name('TaxId'))[0]} =
        ($_->get_contents_by_name('ScientificName'))[0];
}

foreach (@ids) {
    $idmap{$_} = $names{$taxa{$_}};
}

# %idmap is
# 1621261 => 'Mycobacterium tuberculosis H37Rv'
# 20807972 => 'Thermoanaerobacter tengcongensis MB4'
# 68536103 => 'Corynebacterium jeikeium K411'
# 730439 => 'Bacillus caldolyticus'
# 89318838 => undef (this record has been removed from the db)

1;

Thanks, Chris On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent: Chris, Regarding a fix for that script, we would have to see your modified script and the error. However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy. chris On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote: > Hi All, > > I am trying to extract the entire taxonomy of an organism including the > classifications. Some thing like... > > Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia > > I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error. > > My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db? > > Thanks for all your help in advance! > > Chris > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l
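
On the second question (the full lineage with its ranks), the Bio::DB::Taxonomy route suggested in the thread might be sketched roughly as below. This is not a tested answer from the list, only an illustration under assumptions: BioPerl is installed, NCBI Entrez is reachable, and the NCBI taxon ID is already known (562, Escherichia coli, is used as an example). The sub names lineage_for_taxid and format_lineage are hypothetical, and the BioPerl modules are loaded lazily so that the formatting part can be tried on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return [rank, name] pairs from the root down to the
# taxon itself. Requires BioPerl and network access to NCBI Entrez.
sub lineage_for_taxid {
    my ($taxid) = @_;
    require Bio::DB::Taxonomy;
    require Bio::Tree::Tree;

    my $db    = Bio::DB::Taxonomy->new(-source => 'entrez');
    my $taxon = $db->get_taxon(-taxonid => $taxid)
        or die "No taxon found for id $taxid\n";

    # get_lineage_nodes gives the ancestors only, so the taxon is appended
    my $tree_functions = Bio::Tree::Tree->new();
    my @nodes = ($tree_functions->get_lineage_nodes($taxon), $taxon);

    return map { [ $_->rank, $_->scientific_name ] } @nodes;
}

# Hypothetical helper: keep only ranked levels and join them as "rank:name".
sub format_lineage {
    my @pairs = @_;
    return join ', ',
        map  { "$_->[0]:$_->[1]" }
        grep { defined $_->[0] && $_->[0] ne 'no rank' } @pairs;
}

# Usage (assumed file name): perl lineage.pl 562
if (@ARGV) {
    print format_lineage(lineage_for_taxid($ARGV[0])), "\n";
}
```

Unranked intermediate nodes are dropped by format_lineage so that the output resembles the "Phylum:..., Class:..." form asked for; removing the grep would print every level that NCBI returns.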