From rmb32 at cornell.edu  Sun Aug  1 15:17:14 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Sun, 01 Aug 2010 12:17:14 -0700
Subject: [Bioperl-l] GMOD Evo Hackathon Open Call for Participation
Message-ID: <4C55C83A.3060700@cornell.edu>

We are seeking participants for the GMOD Tools for Evolutionary Biology 
Hackathon, held November 8-12, 2010 at the US National Evolutionary 
Synthesis Center (NESCent) in Durham, NC.

This hackathon targets three critical gaps in the capabilities of the 
GMOD toolbox that currently limit its utility for evolutionary research:

  1. Visualization of comparative genomics data
  2. Visualization of phylogenetic data and trees
  3. Support for population diversity and phenotype data

If you are interested in these areas and have relevant expertise, you 
are strongly encouraged to apply. Relevant areas of expertise include 
more than just software development: if you are a GMOD power user, 
visualization guru, domain expert (comparative, phylogenetics, 
population, ...), or documentation wizard, then your skills are needed!

How To Apply:

Fill out the online application form at http://bit.ly/gmodevohack. 
Applications are due August 25.

About GMOD:

GMOD is an intercompatible suite of open-source software components for 
storing, managing, analyzing, and visualizing genome-scale data. GMOD 
includes many widely-used software components: GBrowse and JBrowse, both 
genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a 
generic and modular database schema; CMap, a comparative map viewer; as 
well as many other components including Apollo, MAKER, BioMart, 
InterMine, and Galaxy. We hope to extend the functionality of existing 
GMOD components, and integrate new components as well.

About Hackathons:

A hackathon is an intense event at which a group of programmers with 
different backgrounds and skills collaborate hands-on and face-to-face 
to develop working code that is of utility to the community as a whole. 
The mix of people will include domain experts and computer-savvy end-users.

More details about the event, its motivation, organization, procedures, 
and attendees, as well as URLs to the hackathon and related websites are 
included below.

Sincerely,

The GMOD EvoHack Organizing Committee (and project affiliations as
relevant):

Nicole Washington, Chair (LBNL, modENCODE, Phenote)

Robert Buels (SGN, Chado NatDiv)

Scott Cain (OICR, GMOD)

Dave Clements (NESCent, GMOD)

Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv)

Sheldon McKay (University of Arizona, iPlant, GBrowse_syn)


-----------------------------

About the GMOD Evo Hackathon

Overview

We are organizing a hackathon to fill critical gaps in the capabilities 
of the Generic Model Organism Database (GMOD) toolbox that currently 
limit its utility for evolutionary research. Specifically, we will focus 
on tools for

   1) viewing comparative genomics data;
   2) visualizing phylogenomic data; and
   3) supporting population diversity data and phenotype annotation.

The event will be hosted at NESCent and bring together a group of 20 or 
more software developers, end-user representatives, and documentation 
experts who would otherwise not meet. The participants will include key 
developers of GMOD components that currently lack features critical for 
emerging evolutionary biology research, developers of informatics tools 
in evolutionary research that lack GMOD integration, and 
informatics-savvy biologists who can represent end-user requirements.

The event will provide a unique opportunity to infuse the GMOD developer 
community with a heightened awareness of unmet needs in evolutionary 
biology that GMOD components have the potential to fill, and for tool 
developers in evolutionary biology to better understand how best to 
extend or integrate with already existing GMOD components.

Before the Event

Discussion of ideas and sometimes even design actually starts well 
before the hackathon, on mailing lists, wiki pages, and conference calls 
set up among accepted attendees.  This advance work lays the foundation 
for participants to be productive from the very first day.  This also 
means that participants should be willing to contribute some time in 
advance of the hackathon itself to participate in this preparatory 
discussion.

During the Event

Typically, hackathon participants use the morning of the first day of 
the event to organize themselves into working groups of between 3 and 6 
people, each with a focused implementation objective.  Ideas and 
objectives are discussed, and attendees coalesce around the projects in 
which they have the most experience or interest.


Deliverables / Event Results

The meeting's attendance, working groups, and outcomes will be fully 
logged and documented on the GMOD wiki (http://gmod.org). Each working 
group during the event will typically have its own wiki page, linked 
from the main EvoHack page, where it documents its minutes and design 
notes, and provides links to the code and documentation it produces. 
Also, since GMOD and NESCent are both committed to open source 
principles, all code and documentation produced by participants during 
the hackathon must be published under an OSI-approved open source 
license. As contributions to existing GMOD tools, all hackathon products 
will most likely satisfy this requirement automatically.

NESCent

This event is sponsored by the US National Evolutionary Synthesis Center 
(NESCent, http://www.nescent.org) through its Informatics Whitepapers 
program (http://www.nescent.org/informatics/whitepapers.php). NESCent 
promotes the synthesis of information, concepts and knowledge to address 
significant, emerging, or novel questions in evolutionary science and 
its applications. NESCent achieves this by supporting research and 
education across disciplinary, institutional, geographic, and 
demographic boundaries (see http://www.nescent.org/science/proposals.php).

Links

Main GMOD EvoHack page, and full proposal:
http://gmod.org/wiki/GMOD_Evo_Hackathon

NESCent: http://www.nescent.org/
GMOD: http://gmod.org/
Similar past NESCent events, see: http://hackathon.nescent.org/
GMOD hackathon application:  http://bit.ly/gmodevohack

-- 
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/GMOD_Europe_2010
http://gmod.org/wiki/Help_Desk_Feedback


From maj at fortinbras.us  Sun Aug  1 19:19:16 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 1 Aug 2010 19:19:16 -0400
Subject: [Bioperl-l] SOAP Eutilities
In-Reply-To: <AANLkTi=DSQ2vktjCghDscW6OyHv25HYNXqA96LXTz443@mail.gmail.com>
References: <AANLkTi=DSQ2vktjCghDscW6OyHv25HYNXqA96LXTz443@mail.gmail.com>
Message-ID: <627BEC8B2E624A69A0B11EEBC8C93B71@NewLife>

Turns out that module lives in bioperl-run; try 

git clone git://github.com/bioperl/bioperl-run.git

MAJ
----- Original Message ----- 
From: "Robson de Souza" <robfsouza at gmail.com>
To: <bioperl-l at bioperl.org>
Sent: Saturday, July 31, 2010 4:56 PM
Subject: [Bioperl-l] SOAP Eutilities


> Hi,
> 
> Bio::DB::SoapEUtilities, referred to in the HOWTO on EUtilities, seems to
> have disappeared from the Git repository.
> A simple
> 
> git clone git://github.com/bioperl/bioperl-live.git
> 
> does not download it. Any ideas why?
> Robson
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>

From David.Messina at sbc.su.se  Mon Aug  2 09:58:10 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 2 Aug 2010 15:58:10 +0200
Subject: [Bioperl-l] phyloxml and element order
In-Reply-To: <AANLkTimk5j3VfOvLNcN_c+FsgoVqpntB9xR5NfDopLPh@mail.gmail.com>
References: <AANLkTimk5j3VfOvLNcN_c+FsgoVqpntB9xR5NfDopLPh@mail.gmail.com>
Message-ID: <AB413C9E-ED42-48AF-A8AB-893771AD7067@sbc.su.se>

Hi Fred,

Thanks for letting us know about this - definitely sounds like a bug.

Would you please submit this to our bug tracker?

    http://bugzilla.open-bio.org


(You can just copy and paste your previous email.)
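
In the meantime, one possible workaround is to reorder the child elements of each <clade> before they are written. The following is only a minimal sketch in plain Perl, not the BioPerl writer itself: the order_children helper and the four-element order list are assumptions for illustration (taken from the ordering in the example quoted below, not from the full XSD), and elements not in the list are arbitrarily placed just before nested <clade> elements.

```perl
use strict;
use warnings;

# Assumed partial child order for <clade>; check the phyloXML XSD for the
# complete list before relying on this.
my @CLADE_ORDER = qw(name taxonomy sequence clade);
my %RANK;
@RANK{@CLADE_ORDER} = (0 .. $#CLADE_ORDER);

# Rank of an element name; names not in the list sort just before <clade>.
sub _rank {
    my ($name) = @_;
    return exists $RANK{$name} ? $RANK{$name} : $RANK{clade} - 0.5;
}

# Return the element names sorted by rank, keeping the original relative
# order of elements that share a rank (ties are broken by input index).
sub order_children {
    my @names = @_;
    my @idx = sort {
        _rank($names[$a]) <=> _rank($names[$b]) || $a <=> $b
    } 0 .. $#names;
    return @names[@idx];
}

print join(' ', order_children(qw(clade taxonomy))), "\n";
```

With the outer <clade> from the example, the nested clade would then come after <taxonomy>, which is the order forester asks for.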

Dave


On Jul 30, 2010, at 06:59, Frédéric Romagné wrote:

> Hi,
> 
> I'm using BioPerl to create phyloXML trees. After a few attempts, I got my
> tree with all the elements/attributes I want, but when I write the tree,
> the elements are not written in the order specified in the XSD schema.
> 
> For example, I got:
> 
> <clade>
>   <clade>
>      <name>Loxosceles intermedia</name>
>      <taxonomy>
>         <scientific_name>Araneomorphae Sicariidae</scientific_name>
>      </taxonomy>
>      <sequence>
>         <accession source="Arachnoserver">969</accession>
>         <mol_seq>HAAERADSRKPIWDIAHMVNDLELVD</mol_seq>
>      </sequence>
>   </clade>
>   <taxonomy>
>      <scientific_name>Araneomorphae Sicariidae</scientific_name>
>   </taxonomy>
> </clade>
> 
> The program forester complains that <taxonomy> should be written before the
> <clade> element.
> 
> According to
> http://phyloxml.wordpress.com/2009/11/25/order-of-elements-in-phyloxml this
> is what bioperl is supposed to do.
> 
> All my elements/attributes are set before writing the tree using the
> 'add_Annotation', 'add_tag_value' and 'sequence' methods of a
> Bio::Tree::AnnotatableNode object, so I think the error comes from the
> write_tree method.
> 
> Any help would be appreciated.
> 
> Thank you,
> Fred
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From shalabh.sharma7 at gmail.com  Mon Aug  2 15:44:35 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Aug 2010 15:44:35 -0400
Subject: [Bioperl-l] clustalw to maf format
Message-ID: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>

Hi,
    I am trying to convert clustalw to maf format.
I am trying to use AlignIO for that, but it's not working.

It's giving me the following error:

EXCEPTION Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented by
package Bio::AlignIO::maf.
This is not your fault - author of Bio::AlignIO::maf should be blamed!

STACK Bio::Root::RootI::throw_not_implemented
/Library/Perl/5.8.8/Bio/Root/RootI.pm:707
STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/
maf.pm:176
STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492
STACK toplevel msf2mafy.pl:11


Is there any other way I can convert clustalw to maf?

I would really appreciate it if anyone can help me out.

Thanks
Shalabh

From Russell.Smithies at agresearch.co.nz  Mon Aug  2 16:25:26 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 3 Aug 2010 08:25:26 +1200
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>

This might work if you only have a few:
http://www.ibi.vu.nl/programs/convertalignwww/

--Russell


=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From shalabh.sharma7 at gmail.com  Mon Aug  2 16:53:31 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Aug 2010 16:53:31 -0400
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
Message-ID: <AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>

Hi Russell,
            Thanks for the reply, but I have around 400 alignments, some of
them huge :(

Thanks
Shalabh



From biopython at maubp.freeserve.co.uk  Mon Aug  2 17:24:09 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 2 Aug 2010 22:24:09 +0100
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
Message-ID: <AANLkTikFJP0aZHWgcRVxfJ9dhg-8Aj+aRWLF2GJDseW3@mail.gmail.com>

On Mon, Aug 2, 2010 at 8:44 PM, shalabh sharma
<shalabh.sharma7 at gmail.com> wrote:
> Hi,
>     I am trying to convert clustalw to maf format.
> I am trying to use AlignIO for that but its not working.

Could you tell us why you have to use maf format?
I'm curious because all of the phylogenetics tools I've
had to work with personally will take some other format
which is more widely supported (e.g. FASTA, PFAM,
ClustalW, PHYLIP, ...).

Peter


From bernd.web at gmail.com  Mon Aug  2 17:25:52 2010
From: bernd.web at gmail.com (Bernd Web)
Date: Mon, 2 Aug 2010 23:25:52 +0200
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
	<AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>
Message-ID: <AANLkTimQe9fgO3jMeWR_y3E7gNskh26GUVVuEyfgtRJc@mail.gmail.com>

Hi Shalabh,

This ConvertAlign does not write maf either; it only reads it (I made
it). I found some other converters on the web, but they do not export
to maf format either...

http://biotechvana.uv.es/servers/afc/main.php
http://www.hiv.lanl.gov/content/sequence/FORMAT_CONVERSION/form.html

Galaxy has a MAF to Fasta converter:
http://main.g2.bx.psu.edu/root?tool_id=MAF_To_Fasta1


Regards,
Bernd




From cjfields at illinois.edu  Mon Aug  2 17:31:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 2 Aug 2010 16:31:20 -0500
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
	<AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>
Message-ID: <6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu>

No other format will work?  The main reason you see unimplemented methods like this is that there is no active interest in working with the format beyond getting the information stored in such files into objects and other commonly used formats.
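
If maf really is the only option, writing the blocks by hand is not much code. The following is a rough sketch in plain Perl, not an implementation of Bio::AlignIO::maf::write_aln. The maf_block helper is hypothetical, and since ClustalW files carry no coordinates it assumes that every sequence starts at 0 on the '+' strand and that the source length equals the ungapped length of the row; the score is simply set to 0.

```perl
use strict;
use warnings;

# Format one alignment as a MAF block. Each row is [ $name, $gapped_seq ].
# Assumptions (not taken from the input): start 0, strand '+', source
# length equal to the ungapped row length, score 0.
sub maf_block {
    my (@rows) = @_;

    # Pad names to a common width so the columns line up.
    my $width = 0;
    for my $row (@rows) {
        my $len = length $row->[0];
        $width = $len if $len > $width;
    }

    my $out = "a score=0\n";
    for my $row (@rows) {
        my ($name, $seq) = @$row;
        (my $text = $seq) =~ tr/./-/;      # MAF uses '-' for gaps
        my $size = ($text =~ tr/-//c);     # count the non-gap characters
        $out .= sprintf "s %-*s %d %d %s %d %s\n",
            $width, $name, 0, $size, '+', $size, $text;
    }
    return $out . "\n";                    # a blank line ends the block
}

print "##maf version=1\n";
print maf_block( [ seq1 => 'ACG-T' ], [ seq2 => 'AC.TT' ] );
```

In a real script the rows could be filled from a ClustalW file read with Bio::AlignIO, using display_id and seq on each sequence returned by each_seq. Whether a downstream tool accepts the placeholder coordinates would need to be checked.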

chris



From shalabh.sharma7 at gmail.com  Mon Aug  2 18:30:41 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Aug 2010 18:30:41 -0400
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
	<AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>
	<6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu>
Message-ID: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>

Hi All,
      Thanks for the replies.
Actually, I am working on a pipeline involving RNAz.
I had the impression that a converter must be available, as their webserver
accepts xmfa or maf format, but the standalone version only accepts maf.

I think I will use a program that can output xmfa and write to the RNAz
people to ask whether they can provide me with the converter.

Thanks
Shalabh



From chiragmatkarbioinfo at gmail.com  Tue Aug  3 03:47:37 2010
From: chiragmatkarbioinfo at gmail.com (chirag matkar)
Date: Tue, 3 Aug 2010 13:17:37 +0530
Subject: [Bioperl-l] Pubmed Parsing
Message-ID: <AANLkTim+qcBN_9kXVLAkessaHUY9e=gc4Ad5MVGWk-mF@mail.gmail.com>

Hello all,
I have a list of PubMed IDs.
I want to parse the articles to find specific SNP-related information.
Can I do this with a script?


-- 
Regards,
Chirag Matkar

From genehack at genehack.org  Tue Aug  3 05:03:35 2010
From: genehack at genehack.org (John Anderson)
Date: Tue, 3 Aug 2010 05:03:35 -0400
Subject: [Bioperl-l] Pubmed Parsing
In-Reply-To: <AANLkTim+qcBN_9kXVLAkessaHUY9e=gc4Ad5MVGWk-mF@mail.gmail.com>
References: <AANLkTim+qcBN_9kXVLAkessaHUY9e=gc4Ad5MVGWk-mF@mail.gmail.com>
Message-ID: <5E557C44-224B-4460-9C2C-E375555B8BE6@genehack.org>


On Aug 3, 2010, at 3:47 AM, chirag matkar wrote:

> I have a list of Pubmed Ids.
> I want to parse articles to find specific SNP related information.
> Can i work it out using a Script?

Can you provide a more specific example of what you'd like to do? For example, something along the lines of, "for PMID 1234, get ... about SNP 5678" (where '...' is replaced with whatever it is you're trying to get). Even describing how you would obtain this information using the website yourself will be helpful.

thanks,
john.


From gowthaman.ramasamy at seattlebiomed.org  Tue Aug  3 01:29:10 2010
From: gowthaman.ramasamy at seattlebiomed.org (Gowthaman Ramasamy)
Date: Mon, 2 Aug 2010 22:29:10 -0700
Subject: [Bioperl-l] Getting pileup consensus from BAM files using
	Bio::DB::Sam
In-Reply-To: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
Message-ID: <C87CF736.E5DB%gowthaman.ramasamy@sbri.org>

Hi List,
I am trying to get the consensus from a pileup via Bio::DB::Sam. With the following script I can extract the reference base and the read bases at each position, but I cannot find a method to derive the consensus, i.e. values similar to those produced by "samtools pileup -c -f xxxxxx.fasta yyyyyyy.bam".

The script I use now retrieves the reference base and the query bases for each position. How do I improve it to get the consensus?

Thanks very much in advance,
Gowthaman


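use strict;
use warnings;
use Bio::DB::Sam;

# Open the BAM file together with its reference FASTA.
my $bam = Bio::DB::Sam->new(-bam   => 'something.bam',
                            -fasta => 'something.fasta',
                           );

# Callback invoked once for each reference position covered by reads.
my $cb = sub {
    my ($seqid, $pos, $pileups) = @_;

    # Reference base at this (1-based) position.
    my $refBase = $bam->segment($seqid, $pos, $pos)->dna;
    print "\n$pos\t$refBase=>";

    for my $pileup (@$pileups) {
        my $al = $pileup->alignment;
        # qpos is the 0-based offset of this column within the read.
        my $qBase = substr($al->qseq, $pileup->qpos, 1);
        print "$qBase,";
    }
};

$bam->pileup('Lin.chr10i', $cb);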


From scott at scottcain.net  Tue Aug  3 06:32:59 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 3 Aug 2010 06:32:59 -0400
Subject: [Bioperl-l] Getting pileup consensus from BAM files using
	Bio::DB::Sam
In-Reply-To: <C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
Message-ID: <AANLkTi=vkM5rhy2x_s3p1jZKPtnLjq4wWD=ebGxxmaha@mail.gmail.com>

Hi Gowthaman,

I don't see a method to extract the consensus.  You are welcome to
submit a patch :-)

Scott




-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From lincoln.stein at gmail.com  Tue Aug  3 12:57:52 2010
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Tue, 3 Aug 2010 12:57:52 -0400
Subject: [Bioperl-l] Getting pileup consensus from BAM files using
	Bio::DB::Sam
In-Reply-To: <C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
Message-ID: <AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>

Samtools is running MAQ on the pileup. You could either implement MAQ in
perl, or come up with your own consensus caller.
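
For the second option, a minimal sketch of a home-grown caller using plain
majority rule. This is not the MAQ model: it ignores base and mapping
qualities, and the helper name majority_base is hypothetical (it is not part
of Bio::DB::Sam). It only assumes the read bases at one position have been
collected into an array.

```perl
use strict;
use warnings;

# Hypothetical helper, not a Bio::DB::Sam method: returns the most frequent
# base among the read bases covering one position, or 'N' when there is no
# usable coverage or the two most frequent bases tie.
sub majority_base {
    my ($bases) = @_;    # array ref such as ['A', 'a', 'G']
    my %count;
    # Count only unambiguous bases, case-insensitively (skips N and gaps).
    $count{ uc $_ }++ for grep { /^[ACGTacgt]$/ } @$bases;
    my @ranked = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    return 'N' unless @ranked;
    return 'N' if @ranked > 1 && $count{ $ranked[0] } == $count{ $ranked[1] };
    return $ranked[0];
}

print majority_base([qw(A A G a T)]), "\n";    # A has 3 of the 5 votes
```

In the pileup callback from the original post, each $qBase could be pushed
onto an array inside the loop and passed to majority_base(\@bases) once the
loop finishes.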

Lincoln

On Tue, Aug 3, 2010 at 1:29 AM, Gowthaman Ramasamy <
gowthaman.ramasamy at seattlebiomed.org> wrote:

> Hi List,
> I am trying to find out the consensus using pileup via Bio::DB::Sam. Using
> the following script I could parse out the ref_base and different bases from
> reads at that position. Though, I am not able to find a method to derive
> consensus. Similar to the values produced by "samtools pileup -c -f
> xxxxxx.fasta yyyyyyy.bam".
>
> The script I use now retrieves the ref base and query bases for each position. How
> do I improve it to get the consensus?
>
> Thanks very much in advance,
> Gowthaman
>
>
> use Bio::DB::Sam;
>
> my $bam = Bio::DB::Sam->new(-bam => 'something.bam',
>                            -fasta => 'something.fasta'
>                           );
>
> my $cb = sub {
>                        my ($seqid, $pos, $pileups) = @_;
>                        my $refBase = $bam->segment($seqid, $pos, $pos)->dna;
>                        print "\n$pos\t$refBase=>";
>                        for my $pileup (@$pileups){
>                                my $al = $pileup->alignment;
>                                my $qBase = substr($al->qseq, $pileup->qpos, 1);
>                                print "$qBase,";
>                                }
>                        };
>
> $bam->pileup('Lin.chr10i', $cb);
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>

From biopython at maubp.freeserve.co.uk  Tue Aug  3 13:06:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 3 Aug 2010 18:06:46 +0100
Subject: [Bioperl-l] Getting pileup consensus from BAM files using
	Bio::DB::Sam
In-Reply-To: <AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
	<AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
Message-ID: <AANLkTinoszFJNtDeEbh_DyFLp97aayv7bYVu6c=znq1h@mail.gmail.com>

On Tue, Aug 3, 2010 at 5:57 PM, Lincoln Stein <lincoln.stein at gmail.com> wrote:
> Samtools is running MAQ on the pileup. You could either implement MAQ in
> perl, or come up with your own consensus caller.
>
> Lincoln

See also: http://seqanswers.com/forums/showthread.php?t=6241

From gowthaman.ramasamy at seattlebiomed.org  Tue Aug  3 13:28:36 2010
From: gowthaman.ramasamy at seattlebiomed.org (Gowthaman Ramasamy)
Date: Tue, 3 Aug 2010 10:28:36 -0700
Subject: [Bioperl-l] Getting pileup consensus from BAM files using
 Bio::DB::Sam
In-Reply-To: <AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>,
	<AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
Message-ID: <89080953C3D300419AACB6E63A7EEFBA5C47613B34@mail02.sbri.org>

Hi Lincoln,
That's a good lead. I will try to use MAQ in perl rather than using my simple majority rule.

-gowtham
________________________________________
From: Lincoln Stein [lincoln.stein at gmail.com]
Sent: Tuesday, August 03, 2010 9:57 AM
To: Gowthaman Ramasamy
Cc: bioperl-l
Subject: Re: [Bioperl-l] Getting pileup consensus from BAM files using  Bio::DB::Sam

Samtools is running MAQ on the pileup. You could either implement MAQ in perl, or come up with your own consensus caller.

Lincoln

On Tue, Aug 3, 2010 at 1:29 AM, Gowthaman Ramasamy <gowthaman.ramasamy at seattlebiomed.org> wrote:
Hi List,
I am trying to find out the consensus using pileup via Bio::DB::Sam. Using the following script I could parse out the ref_base and different bases from reads at that position. Though, I am not able to find a method to derive consensus. Similar to the values produced by "samtools pileup -c -f xxxxxx.fasta yyyyyyy.bam".

The script I use now retrieves the ref base and query bases for each position. How do I improve it to get the consensus?

Thanks very much in advance,
Gowthaman


use Bio::DB::Sam;

my $bam = Bio::DB::Sam->new(-bam => 'something.bam',
                           -fasta => 'something.fasta'
                          );

my $cb = sub {
                       my ($seqid, $pos, $pileups) = @_;
                       my $refBase = $bam->segment($seqid, $pos, $pos)->dna;
                       print "\n$pos\t$refBase=>";
                       for my $pileup (@$pileups){
                               my $al = $pileup->alignment;
                               my $qBase = substr($al->qseq, $pileup->qpos, 1);
                               print "$qBase,";
                               }
                       };

$bam->pileup('Lin.chr10i', $cb);

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From stefan.kirov at bms.com  Tue Aug  3 16:22:35 2010
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 03 Aug 2010 16:22:35 -0400
Subject: [Bioperl-l] nmica parser
Message-ID: <4C587A8B.8090603@bms.com>

Has anyone written an nmica parser? If not, I may write one. It
should be straightforward, since the output is XML.
Stefan

From fs5 at sanger.ac.uk  Wed Aug  4 04:45:39 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Wed, 04 Aug 2010 09:45:39 +0100
Subject: [Bioperl-l] Pubmed Parsing
In-Reply-To: <AANLkTim+qcBN_9kXVLAkessaHUY9e=gc4Ad5MVGWk-mF@mail.gmail.com>
References: <AANLkTim+qcBN_9kXVLAkessaHUY9e=gc4Ad5MVGWk-mF@mail.gmail.com>
Message-ID: <1280911539.3499.46.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi Chirag,

Have a look at this earlier post:
http://bioperl.org/pipermail/bioperl-l/2009-April/029690.html

However, you won't be able to retrieve all full texts and it is quite a
task to parse natural language and get useful information about a gene,
protein, SNP etc out of a manuscript. 

Frank

On Tue, 2010-08-03 at 13:17 +0530, chirag matkar wrote:
> Hello all,
> I have a list of Pubmed Ids.
> I want to parse articles to find specific SNP related information.
> Can i work it out using a Script?
> 
> 
> 
> 
> 


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

From David.Messina at sbc.su.se  Thu Aug  5 08:16:17 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 5 Aug 2010 14:16:17 +0200
Subject: [Bioperl-l] call for a TreeIO volunteer
Message-ID: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>

Hi everybody,

We've got a couple of small open bugs related to the Bio::TreeIO modules, and we could really use someone to take a look at them. Ideally, that someone would have familiarity with TreeIO already.*

It'd help us to get the next release (1.6.2) out the door.

The bugs in question are:
- TreeIO::newick writes root node branch length incorrectly
http://bugzilla.open-bio.org/show_bug.cgi?id=3039

- Bio::TreeIO::nhx cannot parse empty [&&NHX] + round-trip failure
http://bugzilla.open-bio.org/show_bug.cgi?id=3007


Thanks,
Dave
on behalf of the core developers


* Even if you don't, though, if you've been looking for an opportunity to contribute to BioPerl, and this sounds like something you'd like to work on, by all means raise your hand.


From clements at nescent.org  Thu Aug  5 13:15:41 2010
From: clements at nescent.org (Dave Clements)
Date: Thu, 5 Aug 2010 10:15:41 -0700
Subject: [Bioperl-l] GMOD Europe 2010, 13-16 Sept, Cambridge, UK
In-Reply-To: <AANLkTinpd0pP9cBGUfnEd8PuV-VOcfqz6VKdCRp0d=uA@mail.gmail.com>
References: <AANLkTinpd0pP9cBGUfnEd8PuV-VOcfqz6VKdCRp0d=uA@mail.gmail.com>
Message-ID: <AANLkTi=BCjD3w0w4S+44qRb4ShW-P6DVBH0SZ+41k1Ah@mail.gmail.com>

GMOD Europe 2010
================
13-16 September 2010
Cambridge, UK
http://gmod.org/wiki/GMOD_Europe_2010


We are pleased to announce GMOD Europe 2010, four days of GMOD events being
held 13-16 September 2010, at the University of Cambridge. GMOD Europe 2010
includes:

1) GMOD Community Meeting, Monday & Tuesday:  Project updates, developer and
user presentations and best practices, project direction.

2) GMOD Satellite Meetings, Wednesday:  Special interest groups where GMOD
community members meet to discuss specific topics of interest.

3) InterMine Workshop, Wednesday:  A one day workshop on installing,
configuring and using the InterMine biological data warehouse system.

4) BioMart Workshop, Thursday:  A one day workshop on using the BioMart
biological data warehouse system, including accessing data through APIs.

Registration is now open for these events. There is a £50 registration fee
for the GMOD Meeting to cover catered lunches and other expenses.
Registration for all other events is free, but required, as space is
limited.  These events are open to all: GMOD users, developers, prospective
users, biologists, and computer scientists.  See
http://gmod.org/wiki/January_2010_GMOD_Meeting for an idea of what goes on
at GMOD meetings.

GMOD is a collection of interoperable open source software components for
managing, visualizing and annotating biological data.  GMOD incorporates
many widely used tools, including GBrowse and JBrowse for genome browsing,
InterMine and BioMart for data mining, Galaxy and Ergatis for workflow,
Chado for data management, GBrowse_syn and CMap for comparative genomics,
plus many other tools (Apollo, MAKER, Pathway Tools, Textpresso, ...).  GMOD
is also an active community of researchers and developers addressing common
challenges in exploiting their data.  If you are struggling to fully exploit
your data then please consider attending GMOD Europe 2010.

Please let us know if you have any questions, and we hope to see you in
Cambridge.

Thanks,

Scott Cain and Dave Clements
-- 
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/GMOD_Evo_Hackathon
http://gmod.org/wiki/GMOD_Europe_2010
http://gmod.org/wiki/Help_Desk_Feedback


From abhishek.vit at gmail.com  Thu Aug  5 18:15:56 2010
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Thu, 5 Aug 2010 18:15:56 -0400
Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl
Message-ID: <AANLkTi=rrPKSuuddK-+gTqPyo-wKQA0ZamDP59_+dUfi@mail.gmail.com>

Hi All

Just wondering if there is any Picard wrapper/s available in Bioperl.


Thanks!
-Abhi

-----------------------------
Abhishek Pratap
Bioinformatics Software Engineer II
Genomics Resource Center
Institute for Genome Sciences
School of Medicine, Univ of Maryland
801, W. Baltimore Street, Baltimore, MD 21209
Ph: (+1)-410-706-2296
www.igs.umaryland.edu/

From Russell.Smithies at agresearch.co.nz  Thu Aug  5 18:37:46 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 6 Aug 2010 10:37:46 +1200
Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl
In-Reply-To: <AANLkTi=rrPKSuuddK-+gTqPyo-wKQA0ZamDP59_+dUfi@mail.gmail.com>
References: <AANLkTi=rrPKSuuddK-+gTqPyo-wKQA0ZamDP59_+dUfi@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F02262E96@exchsth.agresearch.co.nz>

Might be part of the "Enterprise" package.
If not, some developer should "make it so".

:-)

--Russell
(I hate Fridays)

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Abhishek Pratap
> Sent: Friday, 6 August 2010 10:16 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl
> 
> Hi All
> 
> Just wondering if there is any Picard wrapper/s available in Bioperl.
> 
> 
> Thanks!
> -Abhi
> 
> -----------------------------
> Abhishek Pratap
> Bioinformatics Software Engineer II
> Genomics Resource Center
> Institute for Genome Sciences
> School of Medicine, Univ of Maryland
> 801, W. Baltimore Street, Baltimore, MD 21209
> Ph: (+1)-410-706-2296
> www.igs.umaryland.edu/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at illinois.edu  Thu Aug  5 19:10:16 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 5 Aug 2010 18:10:16 -0500
Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl
In-Reply-To: <AANLkTi=rrPKSuuddK-+gTqPyo-wKQA0ZamDP59_+dUfi@mail.gmail.com>
References: <AANLkTi=rrPKSuuddK-+gTqPyo-wKQA0ZamDP59_+dUfi@mail.gmail.com>
Message-ID: <26E3E5B6-47CF-4744-9687-199C218B5571@illinois.edu>

Picard uses samtools, which has a perl API:

http://search.cpan.org/dist/Bio-SamTools/

which uses BioPerl.  Ah, the circle of life...

chris

On Aug 5, 2010, at 5:15 PM, Abhishek Pratap wrote:

> Hi All
> 
> Just wondering if there is any Picard wrapper/s available in Bioperl.
> 
> 
> Thanks!
> -Abhi
> 
> -----------------------------
> Abhishek Pratap
> Bioinformatics Software Engineer II
> Genomics Resource Center
> Institute for Genome Sciences
> School of Medicine, Univ of Maryland
> 801, W. Baltimore Street, Baltimore, MD 21209
> Ph: (+1)-410-706-2296
> www.igs.umaryland.edu/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From dan.kortschak at adelaide.edu.au  Thu Aug  5 21:06:45 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Fri, 06 Aug 2010 10:36:45 +0930
Subject: [Bioperl-l] MUMmer parser work
Message-ID: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>

Hello Everyone,

I've just noticed the absence of a MUMmer parser and thought that it
might be a worthwhile contribution to bioperl-run (I won't be able to
start on this for a while, but given Mark's excellent work on
CommandExts, it shouldn't take too long to get up when I do have time). Has
anyone made any effort in this direction that I would be stepping on, or
if they have left it, that I could pick up to shorten the work time?

cheers
Dan


From cjfields at illinois.edu  Thu Aug  5 23:13:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 5 Aug 2010 22:13:51 -0500
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>

Dan,

Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in:

http://bugzilla.open-bio.org/show_bug.cgi?id=2701

It currently lacks significant tests, so feel free to chip in there as needed.

chris

On Aug 5, 2010, at 8:06 PM, Dan Kortschak wrote:

> Hello Everyone,
> 
> I've just noticed the absence of a MUMmer parser and thought that it
> might be a worthwhile contribution to bioperl-run (I won't be able to
> start on this for a while, but given Mark's excellent work on
> CommandExts, it shouldn't take too long to get up when I do have time). Has
> anyone made any effort in this direction that I would be stepping on, or
> if they have left it, that I could pick up to shorten the work time?
> 
> cheers
> Dan
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From greg at ebi.ac.uk  Fri Aug  6 05:47:21 2010
From: greg at ebi.ac.uk (Gregory Jordan)
Date: Fri, 6 Aug 2010 10:47:21 +0100
Subject: [Bioperl-l] call for a TreeIO volunteer
In-Reply-To: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>
References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>
Message-ID: <AANLkTiknuVWFiz6kmOYAsHaLnPxMZEBWsHeBtv0yfuCQ@mail.gmail.com>

I can help out with these. I'm pretty sure I've previously fought with (and
perhaps even come up with a fix for) bug 3039, and I can take a look at 3007
too.

Now lemme just see if I can get up and running with the Bioperl test suite.
I'll give a shout if I run into any problems.

Cheers,
 Greg

On 5 August 2010 13:16, Dave Messina <David.Messina at sbc.su.se> wrote:

> Hi everybody,
>
> We've got a couple of small open bugs related to the Bio::TreeIO modules,
> and we could really use someone to take a look at them. Ideally, that
> someone would have familiarity with TreeIO already.*
>
> It'd help us to get the next release (1.6.2) out the door.
>
> The bugs in question are:
> - TreeIO::newick writes root node branch length incorrectly
> http://bugzilla.open-bio.org/show_bug.cgi?id=3039
>
> - Bio::TreeIO::nhx cannot parse empty [&&NHX] + round-trip failure
> http://bugzilla.open-bio.org/show_bug.cgi?id=3007
>
>
> Thanks,
> Dave
> on behalf of the core developers
>
>
> * Even if you don't, though, if you've been looking for an opportunity to
> contribute to BioPerl, and this sounds like something you'd like to work on,
> by all means raise your hand.
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From jun.yin at ucd.ie  Fri Aug  6 06:52:14 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 06 Aug 2010 11:52:14 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
Message-ID: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>

Hi, all,

 
I am the Google Summer of Code student working on refactoring the Bio::Align
subsystem. I recently implemented several packages for retrieving online
alignment sequences. The aim of the packages is to provide convenient
methods for BioPerl users to retrieve online alignment sequences. The
alignment sequences are converted into a Bio::SimpleAlign object after
retrieval, which is easy to manipulate and write to local disk. The
packages currently support the Pfam, Rfam, Prosite and Entrez Protein Clusters databases.

 
Here is the structure of the packages:

Packages

Bio::DB::Align (interface, and calling other packages)

Bio::DB::Align::Pfam (retrieving alignment from Pfam)

Bio::DB::Align::Rfam (retrieving alignment from Rfam)

Bio::DB::Align::Prosite (retrieving alignment from Prosite)

Bio::DB::Align::ProtClustDB (retrieving alignment from Entrez Protein
Clusters Database)

 
Usually four methods are provided for each package:

Methods

get_Aln_by_id (retrieves an alignment by id and returns a Bio::SimpleAlign
object)

get_Aln_by_acc (retrieves an alignment by accession and returns a
Bio::SimpleAlign object) (Rfam and Prosite support only this method)

id2acc (id to accession conversion)

acc2id (accession to id conversion)

 
These packages depend on LWP::UserAgent, HTTP::Request and
Bio::DB::GenericWebAgent. Bio::DB::Align::ProtClustDB depends on
Bio::DB::EUtilities.

 
Calling the packages can be:

 
my $dbobj=Bio::DB::Align->new(-db=>"rfam");

Or, my $dbobj= Bio::DB::Align::Pfam->new();


my $aln=$dbobj->get_Aln_by_acc("RF00001");
my $aln2=$dbobj->get_Aln_by_acc(-accession=>"RF00001",-alignment=>"full");

print $aln->length();

foreach my $seq ($aln->each_Seq) {
#do something
}

 
I have done some tests on these packages. And, I will write them into
standard tests later. Any suggestions on these packages are welcome.

 
Cheers,

Jun Yin

Ph.D. student in U.C.D.

 
Bioinformatics Laboratory

Conway Institute

University College Dublin

 
From David.Messina at sbc.su.se  Fri Aug  6 08:59:19 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 6 Aug 2010 14:59:19 +0200
Subject: [Bioperl-l] call for a TreeIO volunteer
In-Reply-To: <AANLkTiknuVWFiz6kmOYAsHaLnPxMZEBWsHeBtv0yfuCQ@mail.gmail.com>
References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>
	<AANLkTiknuVWFiz6kmOYAsHaLnPxMZEBWsHeBtv0yfuCQ@mail.gmail.com>
Message-ID: <6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>


> I can help out with these. I'm pretty sure I've previously fought with (and perhaps even come up with a fix for) bug 3039, and I can take a look at 3007 too.

Awesome, thanks Greg!


> Now lemme just see if I can get up and running with the Bioperl test suite. I'll give a shout if I run into any problems.

Please do.


Dave


From David.Messina at sbc.su.se  Fri Aug  6 09:06:47 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 6 Aug 2010 15:06:47 +0200
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
Message-ID: <F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se>

Sounds great, Jun!

Did you happen to test your code on very large alignments? I know there's one in Pfam that's something like 100,000 sequences. An rRNA, I believe.


Dave


From jun.yin at ucd.ie  Fri Aug  6 09:11:41 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 06 Aug 2010 14:11:41 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
	<F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se>
Message-ID: <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>

Hi, Dave,

Thanks for reminding me of this. I will definitely try it.

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin


-----Original Message-----
From: Dave Messina [mailto:David.Messina at sbc.su.se] 
Sent: Friday, August 06, 2010 2:07 PM
To: Jun Yin
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences

Sounds great, Jun!

Did you happen to test your code on very large alignments? I know there's
one in Pfam that's something like 100,000 sequences. An rRNA, I believe.


Dave


__________ Information from ESET Smart Security, version of virus signature
database 5346 (20100806) __________

The message was checked by ESET Smart Security.

http://www.eset.com


 

From cjfields at illinois.edu  Fri Aug  6 09:19:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 6 Aug 2010 08:19:54 -0500
Subject: [Bioperl-l] call for a TreeIO volunteer
In-Reply-To: <6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>
References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>
	<AANLkTiknuVWFiz6kmOYAsHaLnPxMZEBWsHeBtv0yfuCQ@mail.gmail.com>
	<6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>
Message-ID: <8CB3DE9A-4C5C-42A3-94B4-8818D7143951@illinois.edu>

On Aug 6, 2010, at 7:59 AM, Dave Messina wrote:

> 
>> I can help out with these. I'm pretty sure I've previously fought with (and perhaps even come up with a fix for) bug 3039, and I can take a look at 3007 too.
> 
> Awesome, thanks Greg!
> 
> 
>> Now lemme just see if I can get up and running with the Bioperl test suite. I'll give a shout if I run into any problems.
> 
> Please do.
> 
> 
> 
> Dave

Agreed, and thanks for helping out!

chris

From dianabowley at gmail.com  Fri Aug  6 18:33:57 2010
From: dianabowley at gmail.com (DRBowley)
Date: Fri, 6 Aug 2010 15:33:57 -0700 (PDT)
Subject: [Bioperl-l] BioPerl install issues
Message-ID: <b70994fe-d6c3-4c58-8b45-dfe50b9a8fe5@t5g2000prd.googlegroups.com>

I'm new to both perl and bioperl and I'm having issues installing
bioperl.  I'm trying to install on a Mac OS 10.6.4, and I've already
installed perl (5.10.0).  I tried installing using the recommended
approach for Mac - via Fink...
"fink install bioperl-pm5100"

Looking back over the terminal window text it looks like the problem
is:
"This package requires Module::Build v0.2805 or greater to install
itself."

I tried doing "fink selfupdate" and that did not fix the problem.

Any suggestions?

Thanks!
Diana

From Kevin.M.Brown at asu.edu  Fri Aug  6 18:50:45 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 6 Aug 2010 15:50:45 -0700
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To: <b70994fe-d6c3-4c58-8b45-dfe50b9a8fe5@t5g2000prd.googlegroups.com>
References: <b70994fe-d6c3-4c58-8b45-dfe50b9a8fe5@t5g2000prd.googlegroups.com>
Message-ID: <1A4207F8295607498283FE9E93B775B406E44A05@EX02.asurite.ad.asu.edu>

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_THE_EASY_WAY_USING_Build.PL

Not sure why you had to install perl since it should have been part of
the stock OSX install (or at least it was last time I logged onto a
mac). Not sure why the Fink method has so many issues, but might try the
above which works for linux or bsd.
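
Before retrying, it may help to confirm which Module::Build version the perl
on your PATH actually sees, since a Fink-installed perl and the stock OS X
perl can have different module sets. A rough sketch; the helper name
module_meets_minimum is made up for illustration, and the 0.2805 minimum is
taken from the error message quoted below:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $module loads and reports at least version $min.
sub module_meets_minimum {
    my ($module, $min) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Module::Build -> Module/Build.pm
    # require dies if the module is missing; VERSION($min) dies if it is too old.
    return eval { require $file; $module->VERSION($min); 1 } ? 1 : 0;
}

my $min = '0.2805';    # minimum quoted in the Fink error message
print "Module::Build >= $min: ",
    ( module_meets_minimum('Module::Build', $min) ? "yes" : "no (install or upgrade it)" ),
    "\n";
print "checked with $^X\n";    # path of the perl binary that ran this check
```

If the answer is "no", upgrading Module::Build for that same perl should
clear the error; if it is "yes", Fink is probably building against a
different perl than the one that ran the check.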

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of DRBowley
Sent: Friday, August 06, 2010 3:34 PM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] BioPerl install issues

I'm new to both perl and bioperl and I'm having issues installing
bioperl.  I'm trying to install on a Mac OS 10.6.4, and I've already
installed perl (5.10.0).  I tried installing using the recommended
approach for Mac - via Fink...
"fink install bioperl-pm5100"

Looking back over the terminal window text it looks like the problem
is:
"This package requires Module::Build v0.2805 or greater to install
itself."

I tried doing "fink selfupdate" and that did not fix the problem.

Any suggestions?

Thanks!
Diana
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From skastu01 at students.poly.edu  Fri Aug  6 20:03:50 2010
From: skastu01 at students.poly.edu (Lakshmi Kastury)
Date: Sat, 7 Aug 2010 00:03:50 +0000
Subject: [Bioperl-l] BioPerl install issues
Message-ID: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>


Hi -
I went through several failed attempts on Mac OS X Snow Leopard, and fink was a dead end. I eventually managed to install it on Windows Vista using CPAN. I am not sure if this method will work with MACOS:

1. Opened command prompt.
2. Typed command: >perl -MCPAN -e "install Bundle::BioPerl"
3. Answered yes to the series of questions, which prompts install of several bundles and a compiler.

The instructions were in a link from:
http://bioperl.org/Core/Latest/INSTALL

All the best,
Lakshmi

> Date: Fri, 6 Aug 2010 15:33:57 -0700
> From: dianabowley at gmail.com
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] BioPerl install issues
> 
> I'm new to both perl and bioperl and I'm having issues installing
> bioperl.  I'm trying to install on a Mac OS 10.6.4, and I've already
> installed perl (5.10.0).  I tried installing using the recommended
> approach for Mac - via Fink...
> "fink install bioperl-pm5100"
> 
> Looking back over the terminal window text it looks like the problem
> is:
> "This package requires Module::Build v0.2805 or greater to install
> itself."
> 
> I tried doing "fink selfupdate" and that did not fix the problem.
> 
> Any suggestions?
> 
> Thanks!
> Diana
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
 		 	   		  

From David.Messina at sbc.su.se  Sat Aug  7 02:47:40 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 7 Aug 2010 08:47:40 +0200
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>
References: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>
Message-ID: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se>


On Aug 7, 2010, at 02:03 , Lakshmi Kastury wrote:

>  I am not sure if this method will work with MACOS:

It will. CPAN is cross-platform and is the best way to install BioPerl.


Dave


From cjfields at illinois.edu  Sat Aug  7 09:58:56 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 7 Aug 2010 08:58:56 -0500
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se>
References: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>
	<5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se>
Message-ID: <A21BBC5D-1D71-4534-B278-9FCFA0BB6DA8@illinois.edu>

It should work fine.  Even installing from trunk right now works w/o failing tests. 

chris

On Aug 7, 2010, at 1:47 AM, Dave Messina wrote:

> 
> On Aug 7, 2010, at 02:03 , Lakshmi Kastury wrote:
> 
>> I am not sure if this method will work with MACOS:
> 
> It will. CPAN is cross-platform and is the best way to install BioPerl.
> 
> 
> Dave
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From greg at ebi.ac.uk  Sat Aug  7 17:14:58 2010
From: greg at ebi.ac.uk (Gregory Jordan)
Date: Sat, 7 Aug 2010 22:14:58 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
	<F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se> 
	<00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>
Message-ID: <AANLkTimL938B1ovmOKC_FBNw1OwjipVpjOXZfN+P5Kf5@mail.gmail.com>

Maybe I'm just a bit naive here, but what is the expected difference between
accession and ID and why do we need a separate method for each? Seems to me
that one could just have a single method, get_Aln, which determines under
the hood whether the query string is an accession or ID.

It would be nice if the SimpleAlign object had its Annotation filled with
some extra metadata (such as accession, ID, database version number, URI,
etc.).

One other thing: have you thought about adding an Ensembl adaptor? Or maybe
something similar already exists in BioPerl...?

Sure Ensembl provides their own Perl API, but for someone who doesn't want
to go through the hassle of installing it from CVS (pardon my french, but
wtf!?! Who still uses CVS) and learning a whole new API, it might be
convenient to have a simple BioPerl module for quickly grabbing gene family
alignments from the public Ensembl MySQL databases. I'd be willing to help
write the necessary SQL queries for this.

greg

On 6 August 2010 14:11, Jun Yin <jun.yin at ucd.ie> wrote:

> Hi, Dave,
>
> Thx for reminding me this. I will definitely try it.
>
> Cheers,
> Jun Yin
> Ph.D. student in U.C.D.
>
> Bioinformatics Laboratory
> Conway Institute
> University College Dublin
>
>
> -----Original Message-----
> From: Dave Messina [mailto:David.Messina at sbc.su.se]
> Sent: Friday, August 06, 2010 2:07 PM
> To: Jun Yin
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences
>
> Sounds great, Jun!
>
> Did you happen to test your code on very large alignments? I know there's
> one in Pfam that's something like 100,000 sequences. An rRNA, I believe.
>
>
> Dave
>
>
> __________ Information from ESET Smart Security, version of virus signature
> database 5346 (20100806) __________
>
> The message was checked by ESET Smart Security.
>
> http://www.eset.com
>

From cjfields at illinois.edu  Sat Aug  7 18:07:39 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 7 Aug 2010 17:07:39 -0500
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <AANLkTimL938B1ovmOKC_FBNw1OwjipVpjOXZfN+P5Kf5@mail.gmail.com>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
	<F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se>
	<00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>
	<AANLkTimL938B1ovmOKC_FBNw1OwjipVpjOXZfN+P5Kf5@mail.gmail.com>
Message-ID: <21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu>

On Aug 7, 2010, at 4:14 PM, Gregory Jordan wrote:

> Maybe I'm just a bit naive here, but what is the expected difference between
> accession and ID and why do we need a separate method for each?

Depends on the remote service, but in many cases there is a difference.  With NCBI eutils you can have either an accession or the unique identifier (UID, or GI for nuc/protein seqs).  efetch can use both, but only the UID is guaranteed to retrieve a single sequence all the time; the accession can (very rarely) map to more than one sequence.  

The other eutils services require either a string (esearch) or a UID, but do not allow an accession.

> Seems to me
> that one could just have a single method, get_Aln, which determines under
> the hood whether the query string is an accession or ID.

A simpler method could be introduced, but I can see that being potentially brittle in the long run.  A naked alphanumeric string doesn't reveal much about what it is at face value w/o knowing database/service-specific behavior.  And then we're reliant on that behavior not changing, which we can't guarantee (this has bitten us in the past).  What would one do if NCBI (for instance) allowed accessions derived completely of digits, or conversely a unique ID with mixed alphanumerics?

Using methods specific for ID/acc at least guarantees a behavior on the backend w/o guessing, and if there is no danger of overlap (a service accepts either/or) one could simply be an alias of the other.
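
The trade-off described above can be sketched in plain Perl. This is only an illustration: the packages My::AlnFetcher and My::EitherFetcher, the method names, and the guess_query_type helper are hypothetical and are not part of BioPerl. The explicit methods make the caller say what kind of string is being passed, the subclass shows one method being an alias of the other for a service that accepts either form, and the guessing function shows the shape-based rule that breaks if a service changes its identifier format.

```perl
use strict;
use warnings;

# Hypothetical fetcher class, for illustration only; not a real BioPerl module.
package My::AlnFetcher;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# Explicit entry points: the caller states what kind of string it is passing,
# so nothing depends on what an identifier happens to look like.
sub get_Aln_by_id {
    my ($self, $id) = @_;
    return { type => 'id', query => $id };    # stand-in for a real remote request
}

sub get_Aln_by_acc {
    my ($self, $acc) = @_;
    return { type => 'acc', query => $acc };  # stand-in for a real remote request
}

# A service that accepts either form can make one method an alias of the other.
package My::EitherFetcher;
our @ISA = ('My::AlnFetcher');
{
    no warnings 'once';
    *get_Aln_by_acc = \&My::AlnFetcher::get_Aln_by_id;
}

package main;

# The brittle alternative: guess from the shape of the string. The rule used
# here (all digits means unique ID) is an assumption a service could invalidate.
sub guess_query_type {
    my ($query) = @_;
    return $query =~ /^\d+$/ ? 'id' : 'acc';
}

my $fetcher = My::EitherFetcher->new;
print $fetcher->get_Aln_by_acc('PF00001')->{type}, "\n";  # prints "id": the alias runs the ID method
print guess_query_type('12345'), "\n";                    # prints "id"
print guess_query_type('PF00001'), "\n";                  # prints "acc"
```

If a service ever issued all-digit accessions, guess_query_type would silently misroute them, while the explicit methods would keep working unchanged.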

> It would be nice if the SimpleAlign object had its Annotation filled with
> some extra metadata (such as accession, ID, database version number, URI,
> etc.).

According to the deobfuscator SimpleAlign does have accession() and id().  The others could be simple attributes, and can be added as simple getter/setters, or as annotation via Bio::Annotation (this is the way Stockholm annotation is currently handled).  Something to think about.

> One other thing: have you thought about adding an Ensembl adaptor? Or maybe
> something similar already exists in BioPerl...?

That's a good idea, though it might make more sense if this was done when mem-efficient (possibly DB-dependent) AlignI modules are present within bioperl, which is part of the GSoC (see below).  For instance, have a Bio::Align::AlignI with a backend ensembl DB adaptor that works lazily.

If using the Ensembl Perl API, a few possible roadblocks/problems might pop up. Ensembl currently requires bioperl (v1.2.3, but it works with the latest as well, at least when I've used it).  If using the ensembl perl API we would just need to ensure we aren't conflicting with ensembl code that pulls in bioperl classes expecting a v1.2.3 API when we only support the latest.  I don't foresee this being an issue, though (there is precedent for this, see Sendu's Ensembl module Bio::Tools::Run::Ensembl in bioperl-run).

> Sure Ensembl provides their own Perl API, but for someone who doesn't want
> to go through the hassle of installing it from CVS (pardon my french, but
> wtf!?! Who still uses CVS) and learning a whole new API, it might be
> convenient to have a simple BioPerl module for quickly grabbing gene family
> alignments from the public Ensembl MySQL databases. I'd be willing to help
> write the necessary SQL queries for this.
> 
> greg

The GSoC project on alignment subsystem refactoring will be finishing up this month, so I'm sure Jun can discuss ideas for initial DB-dependent implementations.  The more input and coders implementing the better, IMO.

As for writing up an adaptor to ensembl outside of its API, overall I don't think it's a bad idea, but if it's possible maybe start without reinventing things, then move to direct SQL.  Unless it's easier to use SQL.

chris

> On 6 August 2010 14:11, Jun Yin <jun.yin at ucd.ie> wrote:
> 
>> Hi, Dave,
>> 
>> Thx for reminding me this. I will definitely try it.
>> 
>> Cheers,
>> Jun Yin
>> Ph.D. student in U.C.D.
>> 
>> Bioinformatics Laboratory
>> Conway Institute
>> University College Dublin
>> 
>> 
>> -----Original Message-----
>> From: Dave Messina [mailto:David.Messina at sbc.su.se]
>> Sent: Friday, August 06, 2010 2:07 PM
>> To: Jun Yin
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences
>> 
>> Sounds great, Jun!
>> 
>> Did you happen to test your code on very large alignments? I know there's
>> one in Pfam that's something like 100,000 sequences. An rRNA, I believe.
>> 
>> 
>> Dave
>> 
>> 


From hartzell at alerce.com  Sat Aug  7 17:45:04 2010
From: hartzell at alerce.com (George Hartzell)
Date: Sat, 7 Aug 2010 14:45:04 -0700
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To: <A21BBC5D-1D71-4534-B278-9FCFA0BB6DA8@illinois.edu>
References: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>
	<5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se>
	<A21BBC5D-1D71-4534-B278-9FCFA0BB6DA8@illinois.edu>
Message-ID: <19549.54240.499140.501136@gargle.gargle.HOWL>

Chris Fields writes:
 > It should work fine.  Even installing from trunk right now works
 > w/o failing tests.  

As a slight aside, if you're looking to build a current perl binary
for your mac (e.g. 5.12.1) you should take a look at perlbrew
(http://search.cpan.org/dist/App-perlbrew/).  The three steps at the
top of the installation section of the README are all you need to get
going.  Even a manager can do it.

If you're using bash on the mac via terminal you'll probably want to
put the one-liner they prescribe into your .bash_profile instead of
your .bashrc, but everything else just flows right along.

Once you have that in place you have a nicely isolated system into
which you can install things to your heart's content without worrying
about PERL5LIB and local::lib and the rest.

g.

From cjfields at illinois.edu  Sat Aug  7 21:19:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 7 Aug 2010 20:19:54 -0500
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To: <19549.54240.499140.501136@gargle.gargle.HOWL>
References: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>
	<5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se>
	<A21BBC5D-1D71-4534-B278-9FCFA0BB6DA8@illinois.edu>
	<19549.54240.499140.501136@gargle.gargle.HOWL>
Message-ID: <EA5D5C26-7F3E-46B5-9CD0-F3D51B5F9511@illinois.edu>

On Aug 7, 2010, at 4:45 PM, George Hartzell wrote:

> Chris Fields writes:
>> It should work fine.  Even installing from trunk right now works
>> w/o failing tests.  
> 
> As a slight aside, if you're looking to build a current perl binary
> for your mac (e.g. 5.12.1) you should take a look at perlbrew
> (http://search.cpan.org/dist/App-perlbrew/).  The three steps at the
> top of the installation section of the README are all you need to get
> going.  Even a manager can do it.
> 
> If you're using bash on the mac via terminal you'll probably want to
> put the one-liner they prescribe into your .bash_profile instead of
> your .bashrc, but everything else just flows right along.
> 
> Once you have that in place you have a nicely isolated system into
> which you can install things to your heart's content without worrying
> about PERL5LIB and local::lib and the rest.
> 
> g.

Have to second using perlbrew, started using it for my local Ubuntu installation (don't have it running on my macbook yet, but it's in the plans).

chris


From greg at ebi.ac.uk  Sun Aug  8 02:12:41 2010
From: greg at ebi.ac.uk (Gregory Jordan)
Date: Sun, 8 Aug 2010 07:12:41 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
	<F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se> 
	<00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>
	<AANLkTimL938B1ovmOKC_FBNw1OwjipVpjOXZfN+P5Kf5@mail.gmail.com> 
	<21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu>
Message-ID: <AANLkTim9jkmKSGHm5bHPLOF3_xf+p9xMTN5Ha7bOMR7P@mail.gmail.com>

On 7 August 2010 23:07, Chris Fields <cjfields at illinois.edu> wrote:

>
> A simpler method could be introduced, but I can see that being potentially
> brittle in the long run.  A naked alphanumeric string doesn't reveal much
> about what it is at face value w/o knowing database/service-specific
> behavior.  And then we're reliant on that behavior not changing, which we
> can't guarantee (this has bitten us in the past).  What would one do if NCBI
> (for instance) allowed accessions derived completely of digits, or
> conversely a unique ID with mixed alphanumerics?
>
> Using methods specific for ID/acc at least guarantees a behavior on the
> backend w/o guessing, and if there is no danger of overlap (a service
> accepts either/or) one could simply be an alias of the other.
>

Thanks for the clarification on IDs vs accessions. As long as the behavior
and distinction are well-documented, I'm sure it won't make too much of a
difference.

My main concern was just that having two similar methods -- with no clearly
laid out distinction between the two and one of them only supported by half
of the implementing subclasses -- might confuse potential users.

As a point of reference: both Rfam and Pfam allow either an ID or an
accession in their front-page search interface (http://www.pfam.org /
http://www.rfam.org/). In fact, they seem to entirely hide the distinction
between ID and Accession from the end user; nowhere on the Rfam page for an
individual result is it clear which string is the accession and which is the
ID (http://rfam.sanger.ac.uk/family/snoZ107_R87).

Thus, a potential user of the Rfam module wouldn't know whether to call the
get_by_ID or get_by_Accession method, even after looking at the Rfam page
for his / her desired alignment!

As you can probably tell, I'm all in favor of a unified search whenever
feasible / possible. :-)


> As for writing up an adaptor to ensembl outside of its API, overall I
> don't think it's a bad idea, but if it's possible maybe start without
> reinventing things, then move to direct SQL.  Unless it's easier to use SQL.
>
>
For fetching Ensembl's gene family alignments, using the SQL will be
easiest. They don't tend to get unreasonably large in terms of memory  -- I
think the biggest tend to be ~700 sequences with a few thousand alignment
columns or so -- and it's a simple table join or two to get both the tree
and alignment from the database.
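
Once the rows are back from such a join, the alignment still has to be rebuilt from what the database stores. A minimal sketch, assuming each row yields an ungapped sequence plus a cigar-style string of M (residue) and D (gap) runs, which is my understanding of how Ensembl Compara stores alignments but should be checked against the schema for the release in use. The function name expand_cigar and the example strings are made up for illustration; no SQL or table names are shown because they would be guesses.

```perl
use strict;
use warnings;

# Expand an ungapped sequence into its aligned (gapped) form using a
# cigar-style string such as "3M2D4M": nM copies n residues, nD emits n gaps.
# A missing count means 1. The M/D convention is an assumption in this sketch.
sub expand_cigar {
    my ($seq, $cigar) = @_;
    my ($aligned, $pos) = ('', 0);
    while ($cigar =~ /(\d*)([MD])/g) {
        my $n = $1 eq '' ? 1 : $1;
        if ($2 eq 'M') {
            $aligned .= substr($seq, $pos, $n);   # copy the next $n residues
            $pos += $n;
        }
        else {
            $aligned .= '-' x $n;                 # insert $n gap characters
        }
    }
    return $aligned;
}

print expand_cigar('MGNCQAVDA', '3M2D4MD2M'), "\n";   # prints "MGN--CQAV-DA"
```

With a few hundred sequences per family, as mentioned above, expanding every row this way in memory should be cheap.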

For genomic alignments, I agree that a more memory-efficient and/or lazy
backend would be necessary. And it's pretty much impossible to get those
things out of the Ensembl tables without using their API.

--greg

From dan.kortschak at adelaide.edu.au  Sun Aug  8 20:53:43 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Mon, 09 Aug 2010 10:23:43 +0930
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
	<80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>
Message-ID: <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au>

Hi Chris,

Is that set of files planned to be included in the git repository on
bioperl-live? I don't want to push something that is being organised by
someone else.

cheers
Dan

On Thu, 2010-08-05 at 22:13 -0500, Chris Fields wrote:
> Dan,
> 
> Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in:
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2701
> 
> It currently lacks significant tests, so feel free to chip in there as needed.
> 
> chris


From genehack at genehack.org  Sun Aug  8 21:42:27 2010
From: genehack at genehack.org (John SJ Anderson)
Date: Sun, 8 Aug 2010 21:42:27 -0400
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
	<80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>
	<1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org>

I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. 

j.

On Aug 8, 2010, at 20:53 , Dan Kortschak wrote:

> Hi Chris,
> 
> Is that set of files planned to be included in the git repository on
> bioperl-live? I don't want to push something that is being organised by
> someone else.
> 
> cheers
> Dan
> 
> On Thu, 2010-08-05 at 22:13 -0500, Chris Fields wrote:
>> Dan,
>> 
>> Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in:
>> 
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2701
>> 
>> It currently lacks significant tests, so feel free to chip in there as needed.
>> 
>> chris
> 


From dan.kortschak at adelaide.edu.au  Sun Aug  8 22:03:52 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Mon, 09 Aug 2010 11:33:52 +0930
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
	<80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>
	<1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au>
	<5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org>
Message-ID: <1281319432.2414.49.camel@zoidberg.mbs.adelaide.edu.au>

Excellent. Thanks for that.

Dan

On Sun, 2010-08-08 at 21:42 -0400, John SJ Anderson wrote:
> I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. 
> 
> j.


From cjfields at illinois.edu  Mon Aug  9 22:40:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 9 Aug 2010 21:40:07 -0500
Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio
Message-ID: <DE527A62-E6E7-45B0-96A5-F94E7A7A137F@illinois.edu>

Any objections to moving the Bio directory to lib/Bio in bioperl-live?  It's a more standard location for code in most distributions; I have a branch (topic/cjfields_standard_lib) that has this working, though it's possible that it needs more work.
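
For anyone running scripts straight from a checkout, the practical effect of the move is that lib/ has to be on @INC instead of the checkout root. A minimal sketch, assuming a hypothetical layout in which the script sits in a scripts/ directory next to lib/ (that layout is an assumption for illustration, not something the branch prescribes):

```perl
use strict;
use warnings;
use FindBin qw($Bin);
use File::Spec;

# Hypothetical layout: this script lives in <checkout>/scripts/, and the
# modules live in <checkout>/lib/Bio/... after the proposed move.
use lib File::Spec->catdir($Bin, File::Spec->updir, 'lib');

# "use lib" prepends to @INC, so the checkout's lib/ is searched first.
print "First \@INC entry: $INC[0]\n";
```

From the top of a checkout, `perl -Ilib script.pl` gives the same effect without editing the script.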

chris

From genehack at genehack.org  Tue Aug 10 04:30:44 2010
From: genehack at genehack.org (John SJ Anderson)
Date: Tue, 10 Aug 2010 04:30:44 -0400
Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio
In-Reply-To: <DE527A62-E6E7-45B0-96A5-F94E7A7A137F@illinois.edu>
References: <DE527A62-E6E7-45B0-96A5-F94E7A7A137F@illinois.edu>
Message-ID: <B2C73D74-1F72-402B-A3F7-C4E3ECF7D3B6@genehack.org>


On Aug 9, 2010, at 22:40 , Chris Fields wrote:

> Any objections to moving the Bio directory to lib/Bio in bioperl-live?  

+1 on this idea. 

j.


From genehack at genehack.org  Tue Aug 10 07:21:51 2010
From: genehack at genehack.org (John Anderson)
Date: Tue, 10 Aug 2010 07:21:51 -0400
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
	<80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>
	<1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au>
	<5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org>
Message-ID: <7A4F93AB-1BF7-4775-BC0E-38E7B431ECC6@genehack.org>


On Aug 8, 2010, at 9:42 PM, John SJ Anderson wrote:

> I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. 

Okay, the files have been added to topic/bug-2701 -- see <http://github.com/bioperl/bioperl-live/commits/topic/bug-2701>.

Please note, these are just the files from the bug report, slotted into the appropriate spots. I haven't reviewed the code or done anything about the non-BioPerl-y tests or the general lack of test coverage. I hope to do something about that in the coming week, but if somebody beats me to it, that would be okay too.

j.


From maj at fortinbras.us  Tue Aug 10 19:52:05 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 10 Aug 2010 19:52:05 -0400
Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio
In-Reply-To: <DE527A62-E6E7-45B0-96A5-F94E7A7A137F@illinois.edu>
References: <DE527A62-E6E7-45B0-96A5-F94E7A7A137F@illinois.edu>
Message-ID: <1C55239986494A8D82BDC21A85B324E9@NewLife>

+1
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Monday, August 09, 2010 10:40 PM
Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio


> Any objections to moving the Bio directory to lib/Bio in bioperl-live?  It's a 
> more standard location for code in most distributions; I have a branch 
> (topic/cjfields_standard_lib) that has this working, though it's possible that 
> it needs more work.
>
> chris


From fayroz_farouk at yahoo.com  Sun Aug  8 04:24:31 2010
From: fayroz_farouk at yahoo.com (fayroz)
Date: Sun, 8 Aug 2010 01:24:31 -0700 (PDT)
Subject: [Bioperl-l] using HMMER
Message-ID: <603590.1072.qm@web112620.mail.gq1.yahoo.com>

I need your help. I am a new Perl user and want to use BioPerl modules to run
the HMMER program (hmmsearch). I have "model.hmm" and a FASTA file, and I want
to see which of the sequences are similar to the model.
I wrote this code, but there is a problem:

#!/usr/local/bin/perl W
use Bio::AlignIO;
use Bio::SearchIO;
use Bio::SeqIO;
use Bio::Tools::Run::Hmmer;

# run hmmsearch (similar for hmmpfam)
my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'h6_avian.hmm', -informat => 'fasta');
my $seq = Bio::SeqIO->new('-file' => "one_seq.fa", '-format' => 'Fasta');

# Pass the factory a Bio::Seq object or a file name, returns a Bio::SearchIO
my $searchio = $factory->hmmsearch($seq);

while (my $result = $searchio->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print join("\t", ( $result->query_name,
                               $hsp->query->start,
                               $hsp->query->end,
                               $hit->name,
                               $hsp->hit->start,
                               $hsp->hit->end,
                               $hsp->score,
                               $hsp->evalue,
                               $hsp->seq_str,
                             )), "\n";
        }
    }
}


exceptions:
MSG: Unknown kind of input 'Bio::SeqIO::fasta=HASH(0x329a504)'
STACK Bio::Tools::Run::Hmmer::_setinput D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:381
STACK Bio::Tools::Run::Hmmer::hmmsearch D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:352
STACK toplevel test_bioperl.pl:12

thank you

fayroz


From douglas.hoen at gmail.com  Tue Aug 10 21:54:53 2010
From: douglas.hoen at gmail.com (Douglas Hoen)
Date: Tue, 10 Aug 2010 21:54:53 -0400
Subject: [Bioperl-l] Bio::SeqFeature::SimilarityPair->from_searchResult()?
Message-ID: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com>

Hi,

I was wondering why the Synopsis in the docs for Bio::SeqFeature::SimilarityPair has the following:
$sim_pair = Bio::SeqFeature::SimilarityPair->from_searchResult($blastHit);

There doesn't actually seem to be a from_searchResult method. Am I missing something?
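
One quick way to check a suspicion like this is to ask the class itself with Perl's universal can() method, which also sees inherited methods. The sketch below is only a probe, not a statement about what the module contains: the helper name class_has_method is made up here, and if BioPerl is not installed both lines simply report "not found".

```perl
use strict;
use warnings;

# Return 1 if $class can be loaded and provides (or inherits) $method,
# 0 otherwise, including when the module itself cannot be loaded.
sub class_has_method {
    my ($class, $method) = @_;
    (my $file = "$class.pm") =~ s{::}{/}g;    # Bio::Foo -> Bio/Foo.pm
    return 0 unless eval { require $file; 1 };
    return $class->can($method) ? 1 : 0;
}

for my $method (qw(new from_searchResult)) {
    printf "%-20s %s\n", $method,
        class_has_method('Bio::SeqFeature::SimilarityPair', $method)
            ? 'found' : 'not found';
}
```

A method that shows up in the Synopsis but not here would point to stale documentation rather than a missing import.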

Thanks,
-- Doug

From zhaoy at mail.cbi.pku.edu.cn  Wed Aug 11 04:17:42 2010
From: zhaoy at mail.cbi.pku.edu.cn (zhaoy at mail.cbi.pku.edu.cn)
Date: Wed, 11 Aug 2010 16:17:42 +0800 (CST)
Subject: [Bioperl-l] About extracting sequence from genewise format result
Message-ID: <53663.162.105.250.100.1281514662.squirrel@mail.cbi.pku.edu.cn>

Dear authors:

Hello!

I have been trying to parse genewise-format results to extract the
nucleotide sequence using the "hit_string" method in the "SearchIO" module,
but the result is empty. Worse, the loop does not seem to work, because I
always get the last result. I'm confused.

My perl code is shown below:

#!/usr/bin/perl -w
use strict;
use warnings;

use Bio::SearchIO;
my $in = Bio::SearchIO->new(-format   => 'wise',
                            -wisetype => 'genewise',
                            -file     => 'test');
while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print "Query=",      $result->query_name, "\n",
                  "Length=",     $hsp->length('total'), "\n",
                  "hit_string:", $hsp->hit_string, "\n";
        }
    }
}

And one of the genewise format results is shown below:

genewise $Name: wise2-4-0alpha $ (unreleased release)
This program is freely distributed under a GPL. See source directory
Copyright (c) GRL limited: portions of the code are from separate copyright

Query protein:       Cpa_s110_24
Comp Matrix:         BLOSUM62.bla
Gap open:            12
Gap extension:       2
Start/End            global
Target Sequence      Bdi_chr3:38292015..38292302
Strand:              forward
Start/End (protein)  global
Gene Parameter file: gene.stat
Splice site model:   GT/AG only
Codon Table:         codon.table
Subs error:          1e-06
Indel error:         1e-06
Null model           syn
Algorithm            623

genewise output
Score 37.97 bits over entire alignment
Scores as bits over a synchronous coding model

Warning: The bits scores is not probablistically correct for single seqs
See WWW help for more info

Cpa_s110_24        1 MGNCQAVDAATLAIQHPS-GKVDRLYWPVSASEVMRTNPGHYVALLI--
                     MGNCQA DAA + IQHP+ GKV+RLYWP +A++VMR NPGHYVAL++
                     MGNCQAADAAAVVIQHPAEGKVERLYWPATAADVMRKNPGHYVALVVVH
Bdi_chr3:382920    1 agatcggggggggacccgggaggccttcgaggggacaacgctggcgggc
                     tgagaccaccctttaaccagatagtagcccccattgaacgaatctttta
                     gctcgggtggcggcgcgcgggcgcccggccgcccgcgcccccccccccc


Cpa_s110_24       47 ----STTLCPSNSNASNAESVRVTRIKLLRPTDTLVLGQVYRLITTQEV
                              P+ +    A + R+T++KLL+P DTL++GQVYRLIT+Q
                     VSGGAGETDPAVAGGGAAAAARITKVKLLKPRDTLLIGQVYRLITSQ--
Bdi_chr3:382920  148 gtgggggagcgggggggggggaaaagaccaccgaccagcgtccaatc
                     tcggcgacacctcgggcccccgtcatattacgactttgatagttcca
                     cctcctgtcccacaaaattccgccgcgccgcgctgcccgccccccca


Cpa_s110_24       92 MKGLWAKKCAKMKKYQEADHKDGLKPETIPGRRSGPERDTQVAKHERHR

                     -------------------------------------------------
Bdi_chr3:382920  289


Cpa_s110_24      141 SRVAASTNQAGLKSRTWQPSLKSISEAAS

                     -----------------------------
Bdi_chr3:382920  289


//
Gene 1
Gene 1 288
  Exon 1 288 phase 0
     Supporting 1 54 1 18
     Supporting 58 141 19 46
     Supporting 160 288 47 89
//

......


Part of the output of this code is shown below:
Query=Aly_481360
Length=0
hit_string:

Query=Aly_481360
Length=0
hit_string:

......

What's wrong with my code and how can I get the correct result? I'm
looking forward to your reply.

Thanks very much!

Best regards,
Zackaly


From roy.chaudhuri at gmail.com  Wed Aug 11 10:32:39 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 11 Aug 2010 15:32:39 +0100
Subject: [Bioperl-l] using HMMER
In-Reply-To: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
Message-ID: <4C62B487.9090103@gmail.com>

Hi Fayroz,

Your $seq variable contains a Bio::SeqIO object (a biological 
filehandle), not a Bio::Seq (sequence object).

You need to change that line to:
my $seqio = Bio::SeqIO->new(-file=>'one_seq.fa', -format=>'fasta');
my $seq=$seqio->next_seq;

If you have multiple sequences in the file, then you will need to loop 
over them:
while (my $seq=$seqio->next_seq) {
# Code to run Hmmer goes here
}

Also, I don't think you need to specify -informat for your 
Bio::Tools::Run::Hmmer object, since you're passing it a sequence 
object, not a filename.

Hope this helps.
Roy.

On 08/08/2010 09:24, fayroz wrote:
> i need your help, i am a new perl user and want to use bioperl modules to run
> HMMER program ( HMMsearch) i have" model.hmm" and a "fasta file" to see which of
> them are similar with the model
> i write this code but there is a problems
>
> #!/usr/local/bin/perl W
> use Bio::AlignIO;
> use Bio::SearchIO;
> use Bio::SeqIO ;
> use Bio::Tools::Run::Hmmer;
>
> # run hmmsearch (similar for hmmpfam)
> my $factory = Bio::Tools::Run::Hmmer->new(-hmm =>  'h6_avian.hmm',-informat =>
> 'fasta');
> my $seq = Bio::SeqIO->new('-file'=>  "one_seq.fa", '-format'=>'Fasta');
>
> # Pass the factory a Bio::Seq object or a file name, returns a Bio::SearchIO
> my $searchio = $factory->hmmsearch($seq);
>
> while (my $result = $searchio->next_result){
> while(my $hit = $result->next_hit){
> while (my $hsp = $hit->next_hsp){
> print join("\t", ( $result->query_name,
> $hsp->query->start,
> $hsp->query->end,
> $hit->name,
> $hsp->hit->start,
> $hsp->hit->end,
> $hsp->score,
> $hsp->evalue,
> $hsp->seq_str,
> )), "\n";
> }
> }
> }
>
>
> exceptions:
> MSG: Unknown kind of input 'Bio::SeqIO::fasta=HASH(0x329a504)'
> STACK Bio::Tools::Run::Hmmer::_setinput
> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:381
> STACK Bio::Tools::Run::Hmmer::hmmsearch
> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:352
>   STACK toplevel test_bioperl.pl:12
> thank you
>
> fayroz
>


From cjfields at illinois.edu  Wed Aug 11 11:07:36 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 10:07:36 -0500
Subject: [Bioperl-l] using HMMER
In-Reply-To: <4C62B487.9090103@gmail.com>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
	<4C62B487.9090103@gmail.com>
Message-ID: <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>

You might also want to check whether you are using hmmer2 or hmmer3.  I'm not sure the wrapper works with hmmer3.

chris

On Aug 11, 2010, at 9:32 AM, Roy Chaudhuri wrote:

> Hi Fayroz,
> 
> Your $seq variable contains a Bio::SeqIO object (a biological filehandle), not a Bio::Seq (sequence object).
> 
> You need to change that line to:
> my $seqio = Bio::SeqIO->new(-file=>'one_seq.fa', -format=>'fasta');
> my $seq=$seqio->next_seq;
> 
> If you have multiple sequences in the file, then you will need to loop over them:
> while (my $seq=$seqio->next_seq) {
> # Code to run Hmmer goes here
> }
> 
> Also, I don't think you need to specify -informat for your Bio::Tools::Run::Hmmer object, since you're passing it a sequence object, not a filename.
> 
> Hope this helps.
> Roy.
> 
> On 08/08/2010 09:24, fayroz wrote:
>> i need your help, i am a new perl user and want to use bioperl modules to run
>> HMMER program ( HMMsearch) i have" model.hmm" and a "fasta file" to see which of
>> them are similar with the model
>> i write this code but there is a problems
>> 
>> #!/usr/local/bin/perl W
>> use Bio::AlignIO;
>> use Bio::SearchIO;
>> use Bio::SeqIO ;
>> use Bio::Tools::Run::Hmmer;
>> 
>> # run hmmsearch (similar for hmmpfam)
>> my $factory = Bio::Tools::Run::Hmmer->new(-hmm =>  'h6_avian.hmm',-informat =>
>> 'fasta');
>> my $seq = Bio::SeqIO->new('-file'=>  "one_seq.fa", '-format'=>'Fasta');
>> 
>> # Pass the factory a Bio::Seq object or a file name, returns a Bio::SearchIO
>> my $searchio = $factory->hmmsearch($seq);
>> 
>> while (my $result = $searchio->next_result){
>> while(my $hit = $result->next_hit){
>> while (my $hsp = $hit->next_hsp){
>> print join("\t", ( $result->query_name,
>> $hsp->query->start,
>> $hsp->query->end,
>> $hit->name,
>> $hsp->hit->start,
>> $hsp->hit->end,
>> $hsp->score,
>> $hsp->evalue,
>> $hsp->seq_str,
>> )), "\n";
>> }
>> }
>> }
>> 
>> 
>> exceptions:
>> MSG: Unknown kind of input 'Bio::SeqIO::fasta=HASH(0x329a504)'
>> STACK Bio::Tools::Run::Hmmer::_setinput
>> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:381
>> STACK Bio::Tools::Run::Hmmer::hmmsearch
>> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:352
>>  STACK toplevel test_bioperl.pl:12
>> thank you
>> 
>> fayroz
>> 


From douglas.hoen at gmail.com  Wed Aug 11 15:13:49 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 12:13:49 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA in
	SeqFeature::Store database of the original DNA?
Message-ID: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>

Hi,

I am trying to store in a SeqFeature::Store database the results of
searches of translated DNA. The DB contains the original DNA
sequences. For instance, I have done HMMER searches of 6-frame
translations of the sequences stored in the DB. I want to store these
results "at" their (equivalent) DNA positions, which I can calculate.
Preferably, I would like to directly store the SeqFeature::Similarity
objects that I get from parsing these searches. But they are of course
located on different coordinate systems than the DNA, so I guess I
can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
DNA position and then store the Similarity objects as sub-SeqFeatures.

I could just set each Similarity object's position to the (calculated) DNA
coordinates, or alternatively make a new SeqFeature and copy in the
attributes I want. But is there a more elegant solution?
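
The coordinate calculation mentioned above can be sketched roughly as
follows. This is only a minimal sketch in plain Perl, assuming 1-based
inclusive coordinates, frames numbered +1..+3 and -1..-3, and reverse
frames defined as translations of the reverse complement starting at
offset abs(frame) - 1. The subroutine name aa_to_dna is hypothetical, and
the frame convention of whatever tool produced the translations should be
checked before relying on it.

```perl
use strict;
use warnings;

# Map a 1-based, inclusive amino-acid range on a six-frame translation
# back to 1-based, inclusive coordinates on the forward DNA strand.
# Assumption: frame is +1..+3 or -1..-3, and a negative frame means the
# reverse complement was translated starting at offset abs(frame) - 1.
sub aa_to_dna {
    my ($aa_start, $aa_end, $frame, $dna_length) = @_;
    die "frame must be +/-1..3\n" unless abs($frame) >= 1 && abs($frame) <= 3;

    my $offset = abs($frame) - 1;
    # Position on the translated strand (forward or reverse complement).
    my $start = $offset + 3 * ($aa_start - 1) + 1;
    my $end   = $offset + 3 * $aa_end;

    return ($start, $end, 1) if $frame > 0;

    # Flip reverse-complement positions back onto the forward strand.
    return ($dna_length - $end + 1, $dna_length - $start + 1, -1);
}

# Hypothetical example: residues 2..4 in frame -2 of a 30 bp sequence.
my ($start, $end, $strand) = aa_to_dna(2, 4, -2, 30);
print join("\t", $start, $end, $strand), "\n";
```

The returned start, end and strand are the values a feature placed on the
original DNA sequence would need.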

Thanks,
-- Doug

From douglas.hoen at gmail.com  Wed Aug 11 16:11:26 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 13:11:26 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
Message-ID: <f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>

One possible answer to my own question: use
Bio::SeqFeature::PositionProxy objects? Would this work?

On Aug 11, 3:13 pm, Doug <douglas.h... at gmail.com> wrote:
> Hi,
>
> I am trying to store in a SeqFeature::Store database the results of
> searches of translated DNA. The DB contains the original DNA
> sequences. For instance, I have done HMMER searches of 6-frame
> translations of the sequences stored in the DB. I want to store these
> results "at" their (equivalent) DNA positions, which I can calculate.
> Preferably, I would like to directly store the SeqFeature::Similarity
> objects that I get from parsing these searches. But they are of course
> located on different coordinate systems than the DNA, so I guess I
> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
> DNA position and then store the Similarity's as sub-SeqFeatures.
>
> I could just set the Similarity's position to the (calculated) DNA
> coordinates, or alternately make a new SeqFeature and copy in the
> attributes I want. But is there a more elegant solution?
>
> Thanks,
> -- Doug


From scott at scottcain.net  Wed Aug 11 16:16:22 2010
From: scott at scottcain.net (Scott Cain)
Date: Wed, 11 Aug 2010 16:16:22 -0400
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
Message-ID: <AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>

Hi Doug,

I don't know if any of the things you've thought of would work; I've
never tried it.  My inclination would be to express your data in GFF3
and use the standard loader.
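
As a rough illustration of the GFF3 route, one hit could be written as a
single nine-column line along these lines. This is a minimal sketch in
plain Perl: the subroutine names gff3_escape and gff3_line, the source tag
"HMMER3", the feature type "protein_match", and all example values are
assumptions chosen for illustration, not something the loader prescribes.

```perl
use strict;
use warnings;

# Percent-encode the characters that GFF3 reserves inside attribute values.
sub gff3_escape {
    my ($value) = @_;
    $value =~ s/([%;=&,\t\n\r])/sprintf('%%%02X', ord($1))/ge;
    return $value;
}

# Build one GFF3 line from a hash of hit properties. All values come from
# the caller; the coordinates must already be positions on the DNA.
sub gff3_line {
    my (%hit) = @_;
    my @pairs;
    for my $key (sort keys %{ $hit{attributes} }) {
        push @pairs, $key . '=' . gff3_escape($hit{attributes}{$key});
    }
    return join("\t",
        $hit{seq_id}, $hit{source}, $hit{type},
        $hit{start},  $hit{end},    $hit{score},
        $hit{strand} < 0 ? '-' : '+',
        '.',                        # phase applies only to CDS features
        join(';', @pairs),
    ) . "\n";
}

print "##gff-version 3\n";
print gff3_line(
    seq_id     => 'contig1',        # hypothetical reference sequence name
    source     => 'HMMER3',
    type       => 'protein_match',
    start      => 18,
    end        => 26,
    score      => 42.5,
    strand     => -1,
    attributes => { ID => 'hit1', Name => 'model A;v2' },
);
```

The seq_id column has to match the name of a sequence already in the
database for the feature to land on it.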

Scott


On Wed, Aug 11, 2010 at 4:11 PM, Doug <douglas.hoen at gmail.com> wrote:
> One possible answer to my own question: Use
> Bio::SeqFeature::PositionProxy's? Would this work?
>
> On Aug 11, 3:13 pm, Doug <douglas.h... at gmail.com> wrote:
>> Hi,
>>
>> I am trying to store in a SeqFeature::Store database the results of
>> searches of translated DNA. The DB contains the original DNA
>> sequences. For instance, I have done HMMER searches of 6-frame
>> translations of the sequences stored in the DB. I want to store these
>> results "at" their (equivalent) DNA positions, which I can calculate.
>> Preferably, I would like to directly store the SeqFeature::Similarity
>> objects that I get from parsing these searches. But they are of course
>> located on different coordinate systems than the DNA, so I guess I
>> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
>> DNA position and then store the Similarity's as sub-SeqFeatures.
>>
>> I could just set the Similarity's position to the (calculated) DNA
>> coordinates, or alternately make a new SeqFeature and copy in the
>> attributes I want. But is there a more elegant solution?
>>
>> Thanks,
>> -- Doug


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From douglas.hoen at gmail.com  Wed Aug 11 16:38:54 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 13:38:54 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com> 
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
Message-ID: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>

Hi Scott,

Good idea. Would you happen to know of an existing HMMER3 to GFF3
converter?

Thanks for your advice,
-- Doug

On Aug 11, 4:16 pm, Scott Cain <sc... at scottcain.net> wrote:
> Hi Doug,
>
> I don't know if any of the things you've thought of would work; I've
> never tried it.  My inclination would be to express your data in GFF3
> and use the standard loader.
>
> Scott
>
>
>
>
>
> On Wed, Aug 11, 2010 at 4:11 PM, Doug <douglas.h... at gmail.com> wrote:
> > One possible answer to my own question: Use
> > Bio::SeqFeature::PositionProxy's? Would this work?
>
> > On Aug 11, 3:13 pm, Doug <douglas.h... at gmail.com> wrote:
> >> Hi,
>
> >> I am trying to store in a SeqFeature::Store database the results of
> >> searches of translated DNA. The DB contains the original DNA
> >> sequences. For instance, I have done HMMER searches of 6-frame
> >> translations of the sequences stored in the DB. I want to store these
> >> results "at" their (equivalent) DNA positions, which I can calculate.
> >> Preferably, I would like to directly store the SeqFeature::Similarity
> >> objects that I get from parsing these searches. But they are of course
> >> located on different coordinate systems than the DNA, so I guess I
> >> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
> >> DNA position and then store the Similarity's as sub-SeqFeatures.
>
> >> I could just set the Similarity's position to the (calculated) DNA
> >> coordinates, or alternately make a new SeqFeature and copy in the
> >> attributes I want. But is there a more elegant solution?
>
> >> Thanks,
> >> -- Doug


From douglas.hoen at gmail.com  Wed Aug 11 16:53:35 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 13:53:35 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com> 
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com> 
	<6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
Message-ID: <a9d5aca2-3c28-49e8-bd76-119309c38c05@x21g2000yqa.googlegroups.com>

One more note: I did try using PositionProxy but it failed. It doesn't
implement seq_id() and so can't be stored in the DB:

------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::SeqFeatureI::seq_id" is not implemented by
package Bio::SeqFeature::PositionProxy.
This is not your fault - author of Bio::SeqFeature::PositionProxy
should be blamed!

...


On Aug 11, 4:38 pm, Doug <douglas.h... at gmail.com> wrote:
> Hi Scott,
>
> Good idea. Would you happen to know of an existing HMMER3 to GFF3
> converter?
>
> Thanks for your advice,
> -- Doug
>
> On Aug 11, 4:16 pm, Scott Cain <sc... at scottcain.net> wrote:
>
>
>
>
>
> > Hi Doug,
>
> > I don't know if any of the things you've thought of would work; I've
> > never tried it.  My inclination would be to express your data in GFF3
> > and use the standard loader.
>
> > Scott
>
> > On Wed, Aug 11, 2010 at 4:11 PM, Doug <douglas.h... at gmail.com> wrote:
> > > One possible answer to my own question: Use
> > > Bio::SeqFeature::PositionProxy's? Would this work?
>
> > > On Aug 11, 3:13 pm, Doug <douglas.h... at gmail.com> wrote:
> > >> Hi,
>
> > >> I am trying to store in a SeqFeature::Store database the results of
> > >> searches of translated DNA. The DB contains the original DNA
> > >> sequences. For instance, I have done HMMER searches of 6-frame
> > >> translations of the sequences stored in the DB. I want to store these
> > >> results "at" their (equivalent) DNA positions, which I can calculate.
> > >> Preferably, I would like to directly store the SeqFeature::Similarity
> > >> objects that I get from parsing these searches. But they are of course
> > >> located on different coordinate systems than the DNA, so I guess I
> > >> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
> > >> DNA position and then store the Similarity's as sub-SeqFeatures.
>
> > >> I could just set the Similarity's position to the (calculated) DNA
> > >> coordinates, or alternately make a new SeqFeature and copy in the
> > >> attributes I want. But is there a more elegant solution?
>
> > >> Thanks,
> > >> -- Doug


From cjfields at illinois.edu  Wed Aug 11 16:45:00 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 15:45:00 -0500
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
	<6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
Message-ID: <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>

HMMER3 is parsed by Bio::SearchIO now in bioperl-live, and I think there is a generic SearchIO->GFF3 script floating around the intertubes somewheres...

chris

On Aug 11, 2010, at 3:38 PM, Doug wrote:

> Hi Scott,
> 
> Good idea. Would you happen to know of an existing HMMER3 to GFF3
> converter?
> 
> Thanks for your advice,
> -- Doug
> 
> On Aug 11, 4:16 pm, Scott Cain <sc... at scottcain.net> wrote:
>> Hi Doug,
>> 
>> I don't know if any of the things you've thought of would work; I've
>> never tried it.  My inclination would be to express your data in GFF3
>> and use the standard loader.
>> 
>> Scott
>> 
>> 
>> 
>> 
>> 
>> On Wed, Aug 11, 2010 at 4:11 PM, Doug <douglas.h... at gmail.com> wrote:
>>> One possible answer to my own question: Use
>>> Bio::SeqFeature::PositionProxy's? Would this work?
>> 
>>> On Aug 11, 3:13 pm, Doug <douglas.h... at gmail.com> wrote:
>>>> Hi,
>> 
>>>> I am trying to store in a SeqFeature::Store database the results of
>>>> searches of translated DNA. The DB contains the original DNA
>>>> sequences. For instance, I have done HMMER searches of 6-frame
>>>> translations of the sequences stored in the DB. I want to store these
>>>> results "at" their (equivalent) DNA positions, which I can calculate.
>>>> Preferably, I would like to directly store the SeqFeature::Similarity
>>>> objects that I get from parsing these searches. But they are of course
>>>> located on different coordinate systems than the DNA, so I guess I
>>>> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
>>>> DNA position and then store the Similarity's as sub-SeqFeatures.
>> 
>>>> I could just set the Similarity's position to the (calculated) DNA
>>>> coordinates, or alternately make a new SeqFeature and copy in the
>>>> attributes I want. But is there a more elegant solution?
>> 
>>>> Thanks,
>>>> -- Doug


From scott at scottcain.net  Wed Aug 11 17:05:25 2010
From: scott at scottcain.net (Scott Cain)
Date: Wed, 11 Aug 2010 17:05:25 -0400
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
	<6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
	<190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
Message-ID: <AANLkTimY09-wo9R_ZbPmSG_9x7TZjVobTM95VO5fgCa4@mail.gmail.com>

Um, yeah, it's in bioperl: bp_search2gff.pl.

Scott


On Wed, Aug 11, 2010 at 4:45 PM, Chris Fields <cjfields at illinois.edu> wrote:
> HMMER3 is parsed by Bio::SearchIO now in bioperl-live, and I think there is a generic SearchIO->GFF3 script floating around the intertubes somewheres...
>
> chris
>
> On Aug 11, 2010, at 3:38 PM, Doug wrote:
>
>> Hi Scott,
>>
>> Good idea. Would you happen to know of an existing HMMER3 to GFF3
>> converter?
>>
>> Thanks for your advice,
>> -- Doug
>>
>> On Aug 11, 4:16 pm, Scott Cain <sc... at scottcain.net> wrote:
>>> Hi Doug,
>>>
>>> I don't know if any of the things you've thought of would work; I've
>>> never tried it.  My inclination would be to express your data in GFF3
>>> and use the standard loader.
>>>
>>> Scott
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Aug 11, 2010 at 4:11 PM, Doug <douglas.h... at gmail.com> wrote:
>>>> One possible answer to my own question: Use
>>>> Bio::SeqFeature::PositionProxy's? Would this work?
>>>
>>>> On Aug 11, 3:13 pm, Doug <douglas.h... at gmail.com> wrote:
>>>>> Hi,
>>>
>>>>> I am trying to store in a SeqFeature::Store database the results of
>>>>> searches of translated DNA. The DB contains the original DNA
>>>>> sequences. For instance, I have done HMMER searches of 6-frame
>>>>> translations of the sequences stored in the DB. I want to store these
>>>>> results "at" their (equivalent) DNA positions, which I can calculate.
>>>>> Preferably, I would like to directly store the SeqFeature::Similarity
>>>>> objects that I get from parsing these searches. But they are of course
>>>>> located on different coordinate systems than the DNA, so I guess I
>>>>> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
>>>>> DNA position and then store the Similarity's as sub-SeqFeatures.
>>>
>>>>> I could just set the Similarity's position to the (calculated) DNA
>>>>> coordinates, or alternately make a new SeqFeature and copy in the
>>>>> attributes I want. But is there a more elegant solution?
>>>
>>>>> Thanks,
>>>>> -- Doug


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Wed Aug 11 17:07:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 16:07:20 -0500
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <AANLkTimY09-wo9R_ZbPmSG_9x7TZjVobTM95VO5fgCa4@mail.gmail.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
	<6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
	<190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
	<AANLkTimY09-wo9R_ZbPmSG_9x7TZjVobTM95VO5fgCa4@mail.gmail.com>
Message-ID: <CCD1DE1D-867E-468D-941A-7C418C126FBE@illinois.edu>

For some reason I thought there was a more up-to-date one somewhere.  Ah well, can't keep track of all the code in bioperl :>

chris

On Aug 11, 2010, at 4:05 PM, Scott Cain wrote:

> Um, yeah, it's in bioperl: bp_search2gff.pl.
> 
> Scott
> 
> 
> On Wed, Aug 11, 2010 at 4:45 PM, Chris Fields <cjfields at illinois.edu> wrote:
>> HMMER3 is parsed by Bio::SearchIO now in bioperl-live, and I think there is a generic SearchIO->GFF3 script floating around the intertubes somewheres...
>> 
>> chris
>> 
>> On Aug 11, 2010, at 3:38 PM, Doug wrote:
>> 
>>> Hi Scott,
>>> 
>>> Good idea. Would you happen to know of an existing HMMER3 to GFF3
>>> converter?
>>> 
>>> Thanks for your advice,
>>> -- Doug
>>> 
>>> On Aug 11, 4:16 pm, Scott Cain <sc... at scottcain.net> wrote:
>>>> Hi Doug,
>>>> 
>>>> I don't know if any of the things you've thought of would work; I've
>>>> never tried it.  My inclination would be to express your data in GFF3
>>>> and use the standard loader.
>>>> 
>>>> Scott
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Wed, Aug 11, 2010 at 4:11 PM, Doug <douglas.h... at gmail.com> wrote:
>>>>> One possible answer to my own question: Use
>>>>> Bio::SeqFeature::PositionProxy's? Would this work?
>>>> 
>>>>> On Aug 11, 3:13 pm, Doug <douglas.h... at gmail.com> wrote:
>>>>>> Hi,
>>>> 
>>>>>> I am trying to store in a SeqFeature::Store database the results of
>>>>>> searches of translated DNA. The DB contains the original DNA
>>>>>> sequences. For instance, I have done HMMER searches of 6-frame
>>>>>> translations of the sequences stored in the DB. I want to store these
>>>>>> results "at" their (equivalent) DNA positions, which I can calculate.
>>>>>> Preferably, I would like to directly store the SeqFeature::Similarity
>>>>>> objects that I get from parsing these searches. But they are of course
>>>>>> located on different coordinate systems than the DNA, so I guess I
>>>>>> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
>>>>>> DNA position and then store the Similarity's as sub-SeqFeatures.
>>>> 
>>>>>> I could just set the Similarity's position to the (calculated) DNA
>>>>>> coordinates, or alternately make a new SeqFeature and copy in the
>>>>>> attributes I want. But is there a more elegant solution?
>>>> 
>>>>>> Thanks,
>>>>>> -- Doug


From douglas.hoen at gmail.com  Wed Aug 11 17:11:20 2010
From: douglas.hoen at gmail.com (Douglas Hoen)
Date: Wed, 11 Aug 2010 17:11:20 -0400
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <AANLkTimY09-wo9R_ZbPmSG_9x7TZjVobTM95VO5fgCa4@mail.gmail.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
	<6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
	<190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
	<AANLkTimY09-wo9R_ZbPmSG_9x7TZjVobTM95VO5fgCa4@mail.gmail.com>
Message-ID: <A8FFFBCC-4E4F-478B-B824-BB4249B11BA1@gmail.com>

Great, thanks so much for the info.

On 2010-08-11, at 5:05 PM, Scott Cain wrote:

> Um, yeah, it's in bioperl: bp_search2gff.pl.
> 
> Scott
> 
> 
> On Wed, Aug 11, 2010 at 4:45 PM, Chris Fields <cjfields at illinois.edu> wrote:
>> HMMER3 is parsed by Bio::SearchIO now in bioperl-live, and I think there is a generic SearchIO->GFF3 script floating around the intertubes somewheres...
>> 
>> chris
>> 
>> On Aug 11, 2010, at 3:38 PM, Doug wrote:
>> 
>>> Hi Scott,
>>> 
>>> Good idea. Would you happen to know of an existing HMMER3 to GFF3
>>> converter?
>>> 
>>> Thanks for your advice,
>>> -- Doug
>>> 
>>> On Aug 11, 4:16 pm, Scott Cain <sc... at scottcain.net> wrote:
>>>> Hi Doug,
>>>> 
>>>> I don't know if any of the things you've thought of would work; I've
>>>> never tried it.  My inclination would be to express your data in GFF3
>>>> and use the standard loader.
>>>> 
>>>> Scott
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Wed, Aug 11, 2010 at 4:11 PM, Doug <douglas.h... at gmail.com> wrote:
>>>>> One possible answer to my own question: Use
>>>>> Bio::SeqFeature::PositionProxy's? Would this work?
>>>> 
>>>>> On Aug 11, 3:13 pm, Doug <douglas.h... at gmail.com> wrote:
>>>>>> Hi,
>>>> 
>>>>>> I am trying to store in a SeqFeature::Store database the results of
>>>>>> searches of translated DNA. The DB contains the original DNA
>>>>>> sequences. For instance, I have done HMMER searches of 6-frame
>>>>>> translations of the sequences stored in the DB. I want to store these
>>>>>> results "at" their (equivalent) DNA positions, which I can calculate.
>>>>>> Preferably, I would like to directly store the SeqFeature::Similarity
>>>>>> objects that I get from parsing these searches. But they are of course
>>>>>> located on different coordinate systems than the DNA, so I guess I
>>>>>> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
>>>>>> DNA position and then store the Similarity's as sub-SeqFeatures.
>>>> 
>>>>>> I could just set the Similarity's position to the (calculated) DNA
>>>>>> coordinates, or alternately make a new SeqFeature and copy in the
>>>>>> attributes I want. But is there a more elegant solution?
>>>> 
>>>>>> Thanks,
>>>>>> -- Doug


From Russell.Smithies at agresearch.co.nz  Wed Aug 11 17:31:32 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 12 Aug 2010 09:31:32 +1200
Subject: [Bioperl-l] AlignIO  and Gbrowse_syn
In-Reply-To: <AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
	<AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>

I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague. 
If GBrowse_syn is using .maf format, does AlignIO need more work?
Any comments?

--Russell


I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output). GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce clustalw, I've tried converting LASTZ maf output to clustalw (and other alignment formats) using AlignIO; however, I run into the following issues:
* Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification).
* The coordinate system for reverse strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-rev complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result all reverse match coordinates in the resulting clustalw output file are incorrect.
* AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them.
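
The reverse strand issue in the second point can be illustrated with a small conversion. This is a minimal sketch in plain Perl, assuming the usual MAF convention for "s" lines (zero-based start, and for "-" strand rows a start counted on the reverse-complemented source). The subroutine name maf_to_forward is hypothetical and is not taken from AlignIO.

```perl
use strict;
use warnings;

# Convert the start/size/strand/srcSize fields of a MAF "s" line into
# 1-based, inclusive coordinates on the original (forward) sequence.
sub maf_to_forward {
    my ($start, $size, $strand, $src_size) = @_;
    if ($strand eq '-') {
        # Start is counted from the beginning of the reverse complement.
        return ($src_size - $start - $size + 1, $src_size - $start, -1);
    }
    return ($start + 1, $start + $size, 1);
}

# A "-" strand block starting at MAF position 0 on a 100 bp source
# ends at the last base of the original sequence.
my ($from, $to, $strand) = maf_to_forward(0, 10, '-', 100);
print "$from-$to ($strand)\n";
```

Carrying the strand alongside the converted coordinates is what would keep reverse matches from being read as positions at the start of the original sequence.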

I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned.


From cjfields at illinois.edu  Wed Aug 11 18:02:38 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 17:02:38 -0500
Subject: [Bioperl-l] AlignIO  and Gbrowse_syn
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
	<AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
Message-ID: <E53C66C1-E4F1-4E83-B5ED-631CE62D7DCE@illinois.edu>

Russell,

We have had very few requests to support .maf until recently, which is why there has been little done with it.  We welcome any help to improve it.  

chris

On Aug 11, 2010, at 4:31 PM, Smithies, Russell wrote:

> I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague. 
> If GBrowse_syn is using .maf format, does AlignIO need more work?
> Any comments?
> 
> --Russell
> 
> 
> I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output). GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce clustalw, I've tried converting LASTZ maf output to clustalw (and other alignment formats) using AlignIO, but I run into the following issues:
> * Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification).
> * The coordinate system for reverse strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse-complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-reverse-complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result, all reverse match coordinates in the resulting clustalw output file are incorrect.
> * AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them.
> 
> I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned.


From douglas.hoen at gmail.com  Thu Aug 12 01:59:37 2010
From: douglas.hoen at gmail.com (Doug Hoen)
Date: Wed, 11 Aug 2010 22:59:37 -0700 (PDT)
Subject: [Bioperl-l] HMMER3 to GFF3
Message-ID: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com>

Hi,

 I am trying to convert HMMER3 (hmmscan) output files into GFF3 files.
Based on previous advice (see the thread, "How to store results of
searches of translated DNA in SeqFeature::Store database of the
original DNA?"), I have installed bioperl-live for its new HMMER3
parsing capabilities (in SearchIO) and am trying to use
bp_search2gff.pl to do the file conversion.

The hmmscan was done on translated chromosome sequences with conserved
domain models. I want to get the GFF 'start' and 'end' columns to be
based on these coordinates, not those of the models. To do this (with
my files), it seems I need to use the option "--type hit". However,
this changes the "Target" sequence name from the model name to
chromosome name, and the model name does not appear anywhere in the
output (see below).

Could someone please confirm whether the results are incorrect and, if
so, perhaps suggest a fix? It may well be that this problem is due to
the unusual way I am using hmmscan, rather than a problem with HMMER3
parsing...?

Many thanks,
-- Doug


========================================================


Here's what it looks like if I do *not* use the "--type hit" option.
(RVT_2 is a conserved domain name. I need this in the output.)


COMMAND:
------------------
bp_search2gff.pl -i ../chr1-tesigsv2.hmmscan \
    -o chr1-tesigsv2-hmmscan-original-locations-v2.gff3 \
    --format hmmer3 --source HMMER3 --version 3 --component


OUTPUT:
------------------
==> chr1-tesigsv2-hmmscan-original-locations-v2.gff3 <==
##gff-version 3
Chr1_1	chromosome	Component	1	10142557	.	.	1	sequence=Chr1_1
Chr1_1	HMMER3	similarity	1	245	307.3	.	0	Target=Sequence:RVT_2 1898330 1898579
Chr1_1	HMMER3	similarity	1	244	329.5	.	0	Target=Sequence:RVT_2 2573551 2573796
Chr1_1	HMMER3	similarity	1	245	308.8	.	0	Target=Sequence:RVT_2 3159685 3159930
Chr1_1	HMMER3	similarity	1	102	108.2	.	0	Target=Sequence:RVT_2 3438684 3438791
Chr1_1	HMMER3	similarity	2	245	277.2	.	0	Target=Sequence:RVT_2 3566642 3566891
Chr1_1	HMMER3	similarity	13	213	251.4	.	0	Target=Sequence:RVT_2 4251160 4251373
Chr1_1	HMMER3	similarity	1	244	310.6	.	0	Target=Sequence:RVT_2 4252791 4253036
Chr1_1	HMMER3	similarity	6	99	94.2	.	0	Target=Sequence:RVT_2 4271555 4271653
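
In the listing above the names look right (chromosome in column 1, model in Target) but the two coordinate pairs appear to be swapped. A possible post-processing workaround, sketched in plain Perl, is to exchange them once any wrapped lines have been rejoined. This is an assumption about what the desired output is, not a fix to bp_search2gff.pl or the HMMER3 parser, and swap_target_coords is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Swap the start/end columns with the coordinates in the Target attribute,
# leaving names and all other columns untouched. Lines without a
# "Target=<name> <start> <end>" attribute are returned unchanged.
sub swap_target_coords {
    my ($line) = @_;
    my @col = split /\t/, $line, -1;
    return $line unless @col == 9;
    if ($col[8] =~ /^(Target=\S+) (\d+) (\d+)$/) {
        my ($target, $t_start, $t_end) = ($1, $2, $3);
        $col[8] = "$target $col[3] $col[4]";   # model coordinates go to Target
        @col[3, 4] = ($t_start, $t_end);       # sequence coordinates go to start/end
    }
    return join "\t", @col;
}

# First similarity line from the output above, rebuilt with tabs.
my $example = join "\t",
    qw(Chr1_1 HMMER3 similarity 1 245 307.3 . 0),
    'Target=Sequence:RVT_2 1898330 1898579';
print swap_target_coords($example), "\n";
```

The Component line passes through untouched because its last column carries no Target attribute.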


========================================================


And here's what it looks like if I *do* use the "--type hit" option.
The coordinates look good but the model name has disappeared (and the
Target=Sequence seems wrong).


COMMAND:
------------------
bp_search2gff.pl -i ../chr1-tesigsv2.hmmscan \
    -o chr1-tesigsv2-hmmscan-original-locations-v3.gff3 \
    --format hmmer3 --type hit --source HMMER3 --version 3 --component


OUTPUT:
------------------
==> chr1-tesigsv2-hmmscan-original-locations-v3.gff3 <==
##gff-version 3
RVT_2	HMMER3	similarity	1898330	1898579	307.3	.	0	Target=Sequence:Chr1_1 1 245
RVT_2	HMMER3	similarity	2573551	2573796	329.5	.	0	Target=Sequence:Chr1_1 1 244
RVT_2	HMMER3	similarity	3159685	3159930	308.8	.	0	Target=Sequence:Chr1_1 1 245
RVT_2	HMMER3	similarity	3438684	3438791	108.2	.	0	Target=Sequence:Chr1_1 1 102
RVT_2	HMMER3	similarity	3566642	3566891	277.2	.	0	Target=Sequence:Chr1_1 2 245
RVT_2	HMMER3	similarity	4251160	4251373	251.4	.	0	Target=Sequence:Chr1_1 13 213
RVT_2	HMMER3	similarity	4252791	4253036	310.6	.	0	Target=Sequence:Chr1_1 1 244
RVT_2	HMMER3	similarity	4271555	4271653	94.2	.	0	Target=Sequence:Chr1_1 6 99
RVT_2	HMMER3	similarity	4481232	4481477	281.5	.	0	Target=Sequence:Chr1_1 2 245


========================================================


And here's what the input HMMER3 result file looks like:


==> ../chr1-tesigsv2.hmmscan <==
# hmmscan :: search sequence(s) against a profile database
# HMMER 3.0rc1 (February 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- -
# query sequence file:             [...]/whole_chromosomes/translated/
chr1.pep
# target HMM database:             [...]/signatures/Pfam-A.hmm
# output directed to file:         chr1-tesigsv2.hmmscan
# model-specific thresholding:     TC cutoffs
# Max sensitivity mode:            on [all heuristic filters off]
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- -

Query:       Chr1_1  [L=10142557]
Description: CHROMOSOME dumped from ADB: Jun/20/09 14:53; last
updated: 2009-02-02
Scores for complete sequence (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N
Model           Description
    ------- ------ -----    ------- ------ -----   ---- --
--------        -----------
          0 3971.3  17.7   2.6e-101  329.5   0.6   19.4 17
RVT_2           Reverse transcriptase (RNA-dependent DNA pol
          0 3040.7  23.0     1e-206  678.6   0.1   12.2 10
ATHILA          ATHILA ORF-1 family
          0 1681.9  79.1    1.9e-46  149.9   0.4   28.0 21
RVT_1           Reverse transcriptase (RNA-dependent DNA pol
          0 1446.9  27.4    3.6e-95  309.1   0.2    7.6  5
Transposase_21  Transposase family tnp2
          0 1168.4  50.3    1.4e-29   94.4   0.3   21.5 18
rve             Integrase core domain
   9.1e-300  960.0  69.0    3.1e-20   64.0   0.0   28.8 20
Retrotrans_gag  Retrotransposon gag protein
   1.5e-180  577.0  31.6    1.6e-29   93.1   1.5    9.5  8
Transposase_23  TNP1/EN/SPM transposase
   4.4e-143  456.9  82.8    4.8e-18   56.4   0.1   12.9 11
MuDR            MuDR family transposase
   3.8e-116  371.4  19.6    1.2e-18   58.9   0.0   13.7  7
MULE            MULE transposase domain
   7.1e-106  344.1   5.6    2.7e-97  316.0   0.0    3.6  1
Plant_tran      Plant transposon protein
    9.2e-85  275.4  22.9    5.4e-60  194.4   0.3    6.4  3
Peptidase_C48   Ulp1 protease family, C-terminal catalytic d
    1.8e-77  249.8  24.8    4.4e-28   89.8   0.1   10.8  3
Transposase_24  Plant transposase (Ptta/En/Spm family)
    2.8e-47  150.1   1.2    5.5e-23   72.3   0.2    3.7  2
hATC            hAT family dimerisation domain
    5.7e-28   89.4   3.6    4.7e-13   41.1   0.0    6.5  1
RVP_2           Retroviral aspartyl protease
      1e-16   53.3   0.0    4.4e-07   22.1   0.0    6.8  1
RnaseH          RNase H
    1.5e-08   25.3   2.4    0.00016   12.1   0.0    4.9  0
Transposase_mut Transposase, Mutator family


Domain annotation for each model (and alignments):
>> RVT_2  Reverse transcriptase (RNA-dependent DNA polymerase)
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom
ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    -------
-------    ------- -------    ----
   1 !  307.3   0.0   5.3e-95   1.5e-94       1     245 [. 1898330
1898578 .. 1898330 1898579 .. 0.99
   2 !  329.5   0.6  8.9e-102  2.6e-101       1     244 [. 2573551
2573794 .. 2573551 2573796 .. 0.99
   3 !  308.8   0.0   1.8e-95   5.2e-95       1     245 [. 3159685
3159929 .. 3159685 3159930 .. 0.99
   4 !  108.2   0.1   3.4e-34   9.7e-34       1     102 [. 3438684
3438785 .. 3438684 3438791 .. 0.96
   5 !  277.2   0.0   8.1e-86   2.3e-85       2     245 .. 3566643
3566890 .. 3566642 3566891 .. 0.99
   6 !  251.4   0.0   6.2e-78   1.8e-77      13     213 .. 4251164
4251364 .. 4251160 4251373 .. 0.97
   7 !  310.6   0.0   5.1e-96   1.5e-95       1     244 [. 4252791
4253034 .. 4252791 4253036 .. 0.99
   8 !   94.2   0.1   6.1e-30   1.8e-29       6      99 .. 4271560
4271653 .. 4271555 4271653 .. 0.97
   9 !  281.5   0.9   3.9e-87   1.1e-86       2     245 .. 4481233
4481476 .. 4481232 4481477 .. 0.98
  10 !  248.2   0.0   5.9e-77   1.7e-76       1     190 [. 4521040
4521233 .. 4521040 4521237 .. 0.97
  11 !  314.6   0.1   3.2e-97   9.2e-97       1     244 [. 4652456
4652702 .. 4652456 4652704 .. 0.98
  12 !   40.7   0.0   1.3e-13   3.7e-13       2      92 .. 5219607
5219697 .. 5219606 5219701 .. 0.90
  13 !  221.0   0.0   1.2e-68   3.4e-68       2     245 .. 5241015
5241258 .. 5241014 5241259 .. 0.95
  14 !   81.2   0.0   5.6e-26   1.6e-25       2     115 .. 5501957
5502070 .. 5501956 5502080 .. 0.92
  15 !  272.4   0.0   2.3e-84   6.7e-84      30     245 .. 6483057
6483271 .. 6483050 6483272 .. 0.98
  16 !  178.5   0.0   1.2e-55   3.3e-55      81     244 .. 7250563
7250726 .. 7250552 7250728 .. 0.96
  17 !  313.7   0.0   5.9e-97   1.7e-96       2     245 .. 7707124
7707367 .. 7707123 7707368 .. 0.99

  Alignments for each domain:
  == domain 1    score: 307.3 bits;  conditional E-value: 5.3e-95
   RVT_2       1
nktwelvelpkgkkviglkWvfklKlnedgeierykARlVakGftqkegidyeetfspvvklesirlllalaaekkleleqlDvktaFLngelee
95
                 n tw +++lp gkk++g+kWv+k+Kln+dg++erykARlVakG+tq+eg+dy
+tfspv+kl++++ll+a+aa+k+++l+qlD+++aFLng+l+e
  Chr1_1 1898330
NGTWVVCSLPVGKKAVGCKWVYKIKLNADGSLERYKARLVAKGYTQTEGLDYVDTFSPVAKLTTVKLLIAVAAAKGWSLSQLDISNAFLNGSLDE
1898424
 
68*********************************************************************************************
PP

   RVT_2      96
evYvkqpeGfedkkk....enkvckLkkslYgLkqapraWyeklsevllklgfkkseadkclfvkkkeeeliivllYVDDlliagsskelieelk
186
                 e+Y++ p+G++ ++     +n vc+LkkslYgLkqa+r+Wy k+se l++lgf+
+s+ d++lf++k++++ ++vl+YVDD++ia+s +++ e l
  Chr1_1 1898425
EIYMTLPPGYSPRQGdsfpPNAVCRLKKSLYGLKQASRQWYLKFSESLKALGFTQSSGDHTLFTRKSKNSYMAVLVYVDDIIIASSCDRETELLR
1898519
 
***********998889999***************************************************************************
PP

   RVT_2     187
eeLkkefemkdlgelkyfLgleierkeegillsqekyvkkllkkfkmedakpvstplea 245
                 ++L+++ +++dlg+l+yfLglei+r+++gi+++q+ky+ +ll+++++  +k++s
+p+e+
  Chr1_1 1898520
DALQRSSKLRDLGTLRYFLGLEIARNTDGISICQRKYTLELLAETGLLGCKSSSVPMEP 1898578
 
*********************************************************97 PP

  == domain 2    score: 329.5 bits;  conditional E-value: 8.9e-102
   RVT_2       1
nktwelvelpkgkkviglkWvfklKlnedgeierykARlVakGftqkegidyeetfspvvklesirlllalaaekkleleqlDvktaFLngelee
95
                 n+twel++lp+g+k+ig+kWv+k K+n++ge+erykARlVakG++q++gidy+e
+f+pv++le++rl+++laa++k++++q+D k aFLng++ee
  Chr1_1 2573551
NDTWELTSLPNGHKAIGVKWVYKAKKNSKGEVERYKARLVAKGYSQRAGIDYDEVFAPVARLETVRLIISLAAQNKWKIHQMDFKLAFLNGDFEE
2573645
 
79*********************************************************************************************
PP

   RVT_2      96
evYvkqpeGfedkkkenkvckLkkslYgLkqapraWyeklsevllklgfkkseadkclfvkkkeeeliivllYVDDlliagsskelieelkeeLk
190
                 evY++qp+G+ +k++e+kv++Lkk+lYgLkqapraW++++++++++++f k+ +
+++l++k ++e+++i +lYVDDl+++g++ ++ ee+k+e++
  Chr1_1 2573646
EVYIEQPQGYIVKGEEDKVLRLKKALYGLKQAPRAWNTRIDKYFKEKDFIKCPYEHALYIKIQKEDILIACLYVDDLIFTGNNPSMFEEFKKEMT
2573740
 
***********************************************************************************************
PP

   RVT_2     191
kefemkdlgelkyfLgleierkeegillsqekyvkkllkkfkmedakpvstple 244
                 kefem+d+g ++y+Lg+e+++++++i+++qe y+k++lkkfkm+d++pv tp
+e
  Chr1_1 2573741
KEFEMTDIGLMSYYLGIEVKQEDNRIFITQEGYAKEVLKKFKMDDSNPVCTPME 2573794
 
****************************************************97 PP

From kai.blin at biotech.uni-tuebingen.de  Thu Aug 12 08:16:45 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 12 Aug 2010 14:16:45 +0200
Subject: [Bioperl-l] HMMER3 to GFF3
In-Reply-To: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com>
References: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com>
Message-ID: <20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de>

On Wed, 11 Aug 2010 22:59:37 -0700 (PDT)
Doug Hoen <douglas.hoen at gmail.com> wrote:

Hi Doug,

> Could someone please confirm whether the results are incorrect and, if
> so, perhaps suggest a fix? It may well be that this problem is due to
> the unusual way I am using hmmscan, rather than a problem with HMMER3
> parsing...?

Can you please attach your hmmer input file? Along the way something
inserted line breaks, making it unreadable.

It may well be that the HMMER3 parser still behaves a little
differently from the HMMER2 parser; I haven't tried that script.

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From kai.blin at biotech.uni-tuebingen.de  Thu Aug 12 08:09:00 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 12 Aug 2010 14:09:00 +0200
Subject: [Bioperl-l] using HMMER
In-Reply-To: <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
	<4C62B487.9090103@gmail.com>
	<62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
Message-ID: <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>

On Wed, 11 Aug 2010 10:07:36 -0500
Chris Fields <cjfields at illinois.edu> wrote:

> might also want to check whether you are using hmmer2 vs hmmer3.  not sure if the wrapper works for hmmer3.

It might if you initialize it using
my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 'hmmer3');

at least for the programs that still exist with the same name in
hmmer3. It won't support hmmer3 using the default options, though.

If I have some spare time, I'll look into this, no promises on the
timeframe, though.

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From cjfields at illinois.edu  Thu Aug 12 11:28:50 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 12 Aug 2010 10:28:50 -0500
Subject: [Bioperl-l] using HMMER
In-Reply-To: <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
	<4C62B487.9090103@gmail.com>
	<62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
	<20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>
Message-ID: <8129B813-5B15-4DDC-AB0D-5D95EFFCE78D@illinois.edu>

On Aug 12, 2010, at 7:09 AM, Kai Blin wrote:

> On Wed, 11 Aug 2010 10:07:36 -0500
> Chris Fields <cjfields at illinois.edu> wrote:
> 
>> might also want to check whether you are using hmmer2 vs hmmer3.  not sure if the wrapper works for hmmer3.
> 
> It might if you initialize it using
> my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 'hmmer3');
> 
> at least for the programs that still exist with the same name in
> hmmer3. It won't support hmmer3 using the default options, though.
> 
> If I have some spare time, I'll look into this, no promises on the
> timeframe, though.
> 
> Cheers,
> Kai
> 

Would be nice to convert this over (at some point) to use Mark's CommandExts.  I'm thinking of doing this with Infernal, so if I get that running it wouldn't be terribly difficult to get hmmer3 working as well.

chris

From cjfields at illinois.edu  Thu Aug 12 12:14:44 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 12 Aug 2010 11:14:44 -0500
Subject: [Bioperl-l] using HMMER
In-Reply-To: <857996.8184.qm@web112610.mail.gq1.yahoo.com>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
	<4C62B487.9090103@gmail.com>
	<62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
	<20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>
	<8129B813-5B15-4DDC-AB0D-5D95EFFCE78D@illinois.edu>
	<857996.8184.qm@web112610.mail.gq1.yahoo.com>
Message-ID: <43FD0A31-DB95-4AE9-B678-937EE6346BC2@illinois.edu>

Fayroz,

Please keep responses on-list.

It seems you need to update your local bioperl, as 'hmmer3' is a recent addition, after 1.6.1.  It will be in 1.6.2 if I can get the time to make a release :>
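
A quick way to see whether a local installation already ships the hmmer3 parser is to try loading it. The following is a minimal plain-Perl sketch; module_available is a hypothetical helper name, and the check only reports what the current @INC can find.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if the named module can be loaded from the current @INC, else 0.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::SearchIO -> Bio/SearchIO.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(Bio::SearchIO Bio::SearchIO::hmmer3)) {
    printf "%-25s %s\n", $module,
        module_available($module) ? 'found' : 'missing';
}
```

If Bio::SearchIO is found but Bio::SearchIO::hmmer3 is missing, the installed BioPerl predates the hmmer3 parser, which matches the "Can't locate Bio\SearchIO\hmmer3.pm" error quoted below.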

chris

On Aug 12, 2010, at 10:58 AM, fayroz wrote:

> Dear Chris,
> In the HMMER documentation I found this statement:
> "The HMMER programs must either be in your path, or you must set the environment
> variable HMMERDIR to point to their location." 
> Will that solve the problem?
> How can I do it, please? I work on the Windows 7 platform.
> 
> 
> When I applied this line with hmmer3
> my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 'hmmer3');
> 
> this output appears:
> 
> Bio::SearchIO: hmmer3 cannot be found
> 
> and when I try with hmmer2 the same output appears:
> 
> Exception
> ------------- EXCEPTION -------------
> MSG: Failed to load module Bio::SearchIO::hmmer3. Can't locate 
> Bio\SearchIO\hmmer3.pm in @INC (@INC contains: D:\Perl\bin\ D:/Perl/site/lib 
> D:/Perl/lib .) at D:/Perl/site/lib/Bio/Root/Root.pm line 439, <GEN0> line 1.
> STACK Bio::Root::Root::_load_module D:/Perl/site/lib/Bio/Root/Root.pm:441
> STACK (eval) D:/Perl/site/lib/Bio/SearchIO.pm:446
> STACK Bio::SearchIO::_load_format_module D:/Perl/site/lib/Bio/SearchIO.pm:445
> STACK Bio::SearchIO::new D:/Perl/site/lib/Bio/SearchIO.pm:189
> STACK Bio::Tools::Run::Hmmer::_run D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:431
> STACK Bio::Tools::Run::Hmmer::hmmsearch 
> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:353
> STACK toplevel C:\Users\Khaled\AppData\Local\Temp\dzprltmp.pl:13
> -------------------------------------
> For more information about the SearchIO system please see the SearchIO docs.
> This includes ways of checking for formats at compile time, not run time
> '--informat' is not recognized as an internal or external command,
> operable program or batch file.
> Can't call method "next_result" on an undefined value at 
> C:\Users\Khaled\AppData\Local\Temp\dzprltmp.pl line 15, <GEN0> line 1.
> 
> 
> 
> ----- Original Message ----
> From: Chris Fields <cjfields at illinois.edu>
> To: Kai Blin <kai.blin at biotech.uni-tuebingen.de>
> Cc: fayroz <fayroz_farouk at yahoo.com>; bioperl-l at bioperl.org
> Sent: Thu, August 12, 2010 6:28:50 PM
> Subject: Re: [Bioperl-l] using HMMER
> 
> On Aug 12, 2010, at 7:09 AM, Kai Blin wrote:
> 
>> On Wed, 11 Aug 2010 10:07:36 -0500
>> Chris Fields <cjfields at illinois.edu> wrote:
>> 
>>> might also want to check whether you are using hmmer2 vs hmmer3.  not sure if 
>>> the wrapper works for hmmer3.
>> 
>> It might if you initialize it using
>> my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 
>> 'hmmer3');
>> 
>> at least for the programs that still exist with the same name in
>> hmmer3. It won't support hmmer3 using the default options, though.
>> 
>> If I have some spare time, I'll look into this, no promises on the
>> timeframe, though.
>> 
>> Cheers,
>> Kai
>> 
> 
> Would be nice to convert this over (at some point) to use Mark's CommandExts.  
> I'm thinking of doing this with Infernal, so if I get that running it wouldn't 
> be terribly difficult to get hmmer3 working as well.
> 
> chris
> 
> 
> 


From jason at bioperl.org  Thu Aug 12 14:37:11 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 12 Aug 2010 11:37:11 -0700
Subject: [Bioperl-l] Other: Script for editing alignments?
In-Reply-To: <20100812061811.4D92468539@evol.biology.mcmaster.ca>
References: <20100812061811.4D92468539@evol.biology.mcmaster.ca>
Message-ID: <4C643F57.3040408@bioperl.org>

Hi Si -

This is pretty straightforward with Bioperl. Here's one solution:

#!/usr/bin/perl -w
use strict;
use Bio::AlignIO;

my $in  = Bio::AlignIO->new(-format => 'fasta', -file => shift @ARGV);
my $out = Bio::AlignIO->new(-format => 'fasta');

while( my $aln = $in->next_aln ) {
  for my $seq ( $aln->each_seq ) {
    my $str = $seq->seq;
    if( $str =~ /^(-+)/ ) {
      my $rep = length($1);
      # replace from the 5' end
      substr($str, 0, $rep, 'N' x $rep);
    }
    if( $str =~ /(-+)$/ ) {
      my $rep = length($1);
      # replace from the 3' end
      substr($str, -1 * $rep, length($str), 'N' x $rep);
    }
    $seq->seq($str);
  }
  # don't print the /start-end info in the FASTA ID
  $aln->set_displayname_flat(1);
  $out->write_aln($aln);
}
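
The same replacement can also be sketched without BioPerl, for sequences that have already been joined onto one line each. This is an illustrative alternative under that assumption, and mask_terminal_gaps is a hypothetical helper name; the test strings are the two taxa from the question quoted below.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace leading and trailing runs of '-' with the same number of 'N's;
# gaps inside the sequence are left alone.
sub mask_terminal_gaps {
    my ($seq) = @_;
    $seq =~ s/\A(-+)/'N' x length($1)/e;   # 5' end
    $seq =~ s/(-+)\z/'N' x length($1)/e;   # 3' end
    return $seq;
}

print mask_terminal_gaps('-----ATGCTG--TGACTG----TGACT---'), "\n";
print mask_terminal_gaps('---GTATGTTG--TGACTGCT--TGACCGTC'), "\n";
```

Going through AlignIO as above has the advantage of handling sequences that span several lines, which a line-by-line regex would not.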

-jason

evoldir at evol.biology.mcmaster.ca wrote, On 8/11/10 11:18 PM:
> Dear All
>
> Alignment programs like MUSCLE and Clustal often output alignments with
> "-" symbols indicating indels (real events) within sequence alignments,
> but also "-" symbols at the 5' and 3' ends of sequences. The latter
> however, are not real evolutionary events and really should be Ns
> (missing data), depending on the sort of analytical framework you use.
>
> If there is sufficient heterogeneity and signal within the 5' and 3'
> ends of sequences, the "-"s can be manually edited in a text editor to
> Ns with no problem, if the alignment is small. If it is large (e.g. 2000
> seqs), or there are lots of alignments, it becomes a lengthy task.
>
> I'm investigating such alignments presently and so was wondering if
> anyone had a clever way of implementing sed, or had a Perl script that
> would perform such a task. Simply put, it would require replacing the 5'
> and 3' "-" below only with Ns and leaving the within sequence "-"s
> alone. The sequences naturally may span more than one line.
>
>   >Taxon 1
> -----ATGCTG--TGACTG----TGACT---
>   >Taxon 2
> ---GTATGTTG--TGACTGCT--TGACCGTC
>
> to
>
>   >Taxon 1
> NNNNNATGCTG--TGACTG----TGACTNNN
>   >Taxon 2
> NNNGTATGTTG--TGACTGCT--TGACCGTC
>
> It's a simple task, but I haven't seen any scripts out there to do the job.
>
> If there are any scripters out there who can help, or if someone knows
> of an application that would help, it would be great to hear from you.
>
> With best wishes and thanks
>
> Si Creer
>
>    

From genehack at genehack.org  Thu Aug 12 20:32:07 2010
From: genehack at genehack.org (John SJ Anderson)
Date: Thu, 12 Aug 2010 20:32:07 -0400
Subject: [Bioperl-l]
	Bio::SeqFeature::SimilarityPair->from_searchResult()?
In-Reply-To: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com>
References: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com>
Message-ID: <ABCC813F-9FF8-465E-B5AF-E95BD8291D95@genehack.org>


On Aug 10, 2010, at 21:54 , Douglas Hoen wrote:

> I was wondering why the Synopsis in the docs for Bio::SeqFeature::SimilarityPair has the following:
> $sim_pair = Bio::SeqFeature::SimilarityPair->from_searchResult($blastHit);
> 
> There doesn't actually seem to be a from_searchResult method. Am I missing something?

No, it looks like that method got removed back in 2002 as a part of moving to Bio::SearchIO (which was removed still later...):

  <http://github.com/bioperl/bioperl-live/commit/5e3bdc11eb0ceffcd8e8966299a6367e792f2fd1>

Unfortunately, the commit didn't update the documentation. From the tiny little bit I've looked at the code, it looks like you should just be calling the 'new()' method instead (note that it takes a set of arguments, not just a BLAST hit object).

Hope this helps -- if you should happen to have the tuits, a patch to update the documentation to reflect the current interface would be awesome...

chrs,
john.


From david.breimann at gmail.com  Fri Aug 13 09:01:10 2010
From: david.breimann at gmail.com (David Breimann)
Date: Fri, 13 Aug 2010 16:01:10 +0300
Subject: [Bioperl-l] Problem executing bp_genbank2gff3.pl from another perl
	script
Message-ID: <AANLkTikqTXynSe4dTqw1Tz5GOOyoDOZTC5C-HJWLKfaL@mail.gmail.com>

Hi,
I am trying to run bp_genbank2gff3.pl from another Perl script that
gets a GenBank file as its argument.

This does not work (no output files are generated):
    my $command = "bp_genbank2gff3.pl -y -o /tmp $ARGV[0]";

    open( my $command_out, "-|", $command );
    close $command_out;

but this does

    open( my $command_out, "-|", $command );
    sleep 3; # why do I need to sleep?
    close $command_out;

Why?

I thought that close is supposed to block until the command is done:

Closing any piped filehandle causes the parent process to wait for the
child to finish... (see http://perldoc.perl.org/functions/open.html).
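
One plausible explanation, not confirmed in this thread, is that closing the read end straight away leaves the child writing to a pipe nobody reads, so it can be killed by SIGPIPE before it finishes. A minimal sketch of the usual pattern, reading the output to end-of-file before close, is below. run_and_capture is a hypothetical helper name, and the stand-in command is the running perl itself so the example works anywhere.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a command through a read pipe, drain its output, then close the
# handle and report the exit status. Reading to end-of-file before close
# means the child is never left writing to a pipe with no reader.
sub run_and_capture {
    my (@command) = @_;
    open(my $out, '-|', @command) or die "Cannot start @command: $!";
    my @lines = <$out>;
    close $out;                          # waits for the child and sets $?
    return (join('', @lines), $? >> 8);
}

# Stand-in command; substitute the real bp_genbank2gff3.pl invocation.
my ($output, $exit) = run_and_capture($^X, '-e', 'print "line $_\n" for 1 .. 3');
print "exit status $exit, output:\n$output";
```

If the output is not needed at all, system() is a simpler alternative, since it leaves the child's STDOUT attached to the parent's.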

Thanks
Dave

From jun.yin at ucd.ie  Fri Aug 13 09:36:34 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 13 Aug 2010 14:36:34 +0100
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
Message-ID: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>

Hi, all,

 
I am the Google Summer of Code student working on refactoring the
Bio::Align subsystem. The code (Bio::SimpleAlign) I re-implemented now passes
nearly all the tests, except a few on seq/start-end checking. But here
comes a problem, which may be an old issue: the Bio::LocatableSeq end
assignment and checking are inconsistent.

 
The current end checking method is based on:

$end=$seq->_ungapped_len+$seq->start-1

However, this checking may not fit the real world case.

 
The inconsistency usually happens when a few columns of the sequence are
removed.

 
For example:

my $a = Bio::LocatableSeq->new(
    -id     => 'a',
    -strand => 1,
    -seq    => '-tcgatc-atcgatcg',
    -start  => 30,
    -end    => 43
);

 
If we remove the 1st, 8th and the last columns

 
$a->seq() will be 'tcgatcatcgatc'

$a->_ungapped_len==12

 
Actually, in the real world, the first residue will still be 30 (the old
$seq->start), and the last residue is the residue before the 43 (the old
$seq->end), thus 42.

 
But if you call a validation, the calculation is
$a->_ungapped_len+$a->start-1=12+30-1=41

So the reassignment of the $seq->end will not pass the validation.
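
The end check can be reproduced in plain Perl to see where it and the original coordinates diverge. This sketch is not the Bio::LocatableSeq code: it assumes that the ungapped length is the count of characters other than '-' and '.', and ungapped_length and expected_end are hypothetical helper names. It uses the case of removing a column that holds an interior residue.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count residues, treating '-' and '.' as gap characters (an assumption;
# the real class has its own gap-symbol handling).
sub ungapped_length {
    my ($seq) = @_;
    return ($seq =~ tr/-.//c);
}

# The end implied by the check: ungapped length + start - 1.
sub expected_end {
    my ($seq, $start) = @_;
    return ungapped_length($seq) + $start - 1;
}

my $original = '-tcgatc-atcgatcg';
print "original: end must be ", expected_end($original, 30), "\n";

# Remove alignment column 4, which holds an interior residue.
(my $edited = $original) =~ s/\A(.{3})./$1/;
print "edited:   end must be ", expected_end($edited, 30), "\n";
```

For the original sequence the check gives 43, matching -end. After the interior column is removed it demands 42, although the last residue shown is still the one that sat at position 43 in the source sequence.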

 
So unless you save the information to a new sequence object, the original
position information will be lost anyway. But in some cases, we have to
change the sequence in its original sequence object ..

 
What is your suggestion on this issue? 

A. pass the test and lose the information      #convenient in coding but the
start-end annotation is not right any more

B. keep the information and forget the test   #the object will still
remember where the last residue was in the original sequence. But is it
really meaningful at all? Because all the other residues may come from
nowhere

C. Neither of above #any other suggestions?

 
Cheers,

Jun Yin

Ph.D. student in U.C.D.

 
Bioinformatics Laboratory

Conway Institute

University College Dublin

 
From jessica.sun at gmail.com  Fri Aug 13 11:06:46 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 13 Aug 2010 11:06:46 -0400
Subject: [Bioperl-l] Add sequence feature
Message-ID: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>

Does anyone know how to open a GenBank file, add a new feature, and then
save a new GenBank file with the new feature added, in BioPerl?

thx

-- 
Jessica Jingping Sun

From jessica.sun at gmail.com  Fri Aug 13 11:27:10 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 13 Aug 2010 11:27:10 -0400
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <4C6562E0.7090008@gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
Message-ID: <AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>

Unfortunately, I want to add the feature to the sequence object I got from
the GenBank file. I do not mind saving a new GenBank file, but the new
file should contain the original GenBank format and information plus the
new feature tags I need to add. Is there a quick solution for this?

thx

Jessica


On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri <roy.chaudhuri at gmail.com> wrote:

> Hi Jessica.
>
> You need to use Bio::SeqIO to read in the GenBank file to a BioPerl
> sequence object, and to write your new GenBank file:
> http://www.bioperl.org/wiki/HOWTO:SeqIO
>
> To add a new feature follow the instructions here:
>
> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences
>
> (except that you are adding the feature to the sequence object you got from
> the Genbank file, not a new Bio::Seq object).
>
> Cheers.
> Roy.


-- 
Jessica Jingping Sun

From roy.chaudhuri at gmail.com  Fri Aug 13 11:21:04 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 13 Aug 2010 16:21:04 +0100
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
Message-ID: <4C6562E0.7090008@gmail.com>

Hi Jessica.

You need to use Bio::SeqIO to read in the GenBank file to a BioPerl 
sequence object, and to write your new GenBank file:
http://www.bioperl.org/wiki/HOWTO:SeqIO

To add a new feature follow the instructions here:
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences

(except that you are adding the feature to the sequence object you got 
from the Genbank file, not a new Bio::Seq object).
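
As a rough illustration of the steps above, here is a minimal sketch. The
sub name add_feature_to_genbank, the file names, the coordinates 20..40 and
the 'misc_feature' type are all placeholders chosen for the example, not
anything taken from the HOWTOs.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;

# Copy a GenBank file, adding one new feature to every record.
# $infile, $outfile and the feature details are illustrative placeholders.
sub add_feature_to_genbank {
    my ($infile, $outfile) = @_;

    my $in  = Bio::SeqIO->new(-file => $infile,     -format => 'genbank');
    my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'genbank');

    while (my $seq = $in->next_seq) {
        my $feat = Bio::SeqFeature::Generic->new(
            -start       => 20,
            -end         => 40,
            -strand      => 1,
            -primary_tag => 'misc_feature',
            -tag         => { note => 'this is a note' },
        );
        $seq->add_SeqFeature($feat);   # features already on $seq are kept
        $out->write_seq($seq);
    }
    return;
}

# Usage: perl add_feature.pl in.gb out.gb
add_feature_to_genbank(@ARGV) if @ARGV == 2;
```

The new feature is attached to the sequence object returned by next_seq, so
whatever annotation was parsed from the input should be written out again
alongside it.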

Cheers.
Roy.

On 13/08/2010 16:06, Jessica Sun wrote:
> Does anyone knows how to open a genbank file, add new feature and then save
> a new genbank
> file with new feature added in bioperl ?
>
> thx
>


From roy.chaudhuri at gmail.com  Fri Aug 13 11:37:20 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 13 Aug 2010 16:37:20 +0100
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
Message-ID: <4C6566B0.60706@gmail.com>

I'm not sure I understand, do you mean that you want to load just the 
sequence from the GenBank file (ignoring the existing annotation), then 
add your own features? There are instructions on how to do that here:
http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder

On 13/08/2010 16:27, Jessica Sun wrote:
> unfortunately. I want to add the feature to the sequence object I got
> from the Genbank file, I do not mind to save a new genbank file but
> these new genbank file contains the original genbank format and info I
> got plus the new feature tags I need to added to. Any quick solution to
> this?
>
> thx
>
> Jessica


From roy.chaudhuri at gmail.com  Fri Aug 13 11:57:27 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 13 Aug 2010 16:57:27 +0100
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>	<4C6562E0.7090008@gmail.com>	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
Message-ID: <4C656B67.5020402@gmail.com>

Please remember to copy replies to the mailing list.

You can loop over the features in your Bio::Seq object:
for my $feat ($seq->get_SeqFeatures) {
    # do something
}

And once you have found the feature you want to modify, you can add a 
tag using something like:
$feat->add_tag_value('note',"this is a note");

When you're finished you can write out the modified sequence object to a 
new GenBank file.
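
Put together, that could look something like the sketch below. The sub name
annotate_features and its arguments (file names, which primary tag to match,
the note text) are made up for illustration; adjust the test inside the loop
to pick out the features you actually want.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Add a /note to every feature of one type, then write the records out.
# All four arguments are illustrative placeholders.
sub annotate_features {
    my ($infile, $outfile, $primary_tag, $note) = @_;

    my $in  = Bio::SeqIO->new(-file => $infile,     -format => 'genbank');
    my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'genbank');

    while (my $seq = $in->next_seq) {
        for my $feat ($seq->get_SeqFeatures) {
            next unless $feat->primary_tag eq $primary_tag;
            $feat->add_tag_value('note', $note);
        }
        $out->write_seq($seq);   # write the Bio::Seq object itself
    }
    return;
}

# Usage: perl annotate.pl in.gb out.gb CDS "this is a note"
annotate_features(@ARGV) if @ARGV == 4;
```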

On 13/08/2010 16:40, Jessica Sun wrote:
> No, I want to load the GenBank file with its existing features, add
> some new feature tags to the existing ones, and then save an updated
> GenBank file for local use. I am just not sure how to merge the two
> steps you recommended into one in a neat way.
>
> thx


From jessica.sun at gmail.com  Fri Aug 13 13:06:32 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 13 Aug 2010 13:06:32 -0400
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <4C656B67.5020402@gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
	<4C656B67.5020402@gmail.com>
Message-ID: <AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>

Thanks. I somehow get these error messages.

--------------------- WARNING ---------------------
MSG:  Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module.
Attempting to dump, but may fail!
---------------------------------------------------
Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
/Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, <GEN0> line 447.

by doing this,

my $feat = new Bio::SeqFeature::Generic(-start       => 20,
                                        -end         => $40,
                                        -primary_tag => 'newfeature' );
$feat->add_tag_value("note","this is notes");
$f->add_SeqFeature($feat); ## f is original feature pointer
$io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" );

$io->write_seq($seqio_object);

On Fri, Aug 13, 2010 at 11:57 AM, Roy Chaudhuri <roy.chaudhuri at gmail.com>wrote:

> Please remember to copy replies to the mailing list.
>
> You can loop over the features in your Bio::Seq object:
> for my $feat ($seq->get_SeqFeatures) {
>     # do something
> }
>
> And once you have found the feature you want to modify, you can add a tag
> using something like:
> $feat->add_tag_value('note',"this is a note");
>
> When you're finished you can write out the modified sequence object to a
> new GenBank file.


-- 
Jessica Jingping Sun

From drummike at gmail.com  Fri Aug 13 13:41:55 2010
From: drummike at gmail.com (Mike Williams)
Date: Fri, 13 Aug 2010 13:41:55 -0400
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
	<4C656B67.5020402@gmail.com>
	<AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
Message-ID: <AANLkTi=SuCgDmDZ1qQW0-mUQJxigteO4GPnSQD09oB90@mail.gmail.com>

On Fri, Aug 13, 2010 at 1:06 PM, Jessica Sun <jessica.sun at gmail.com> wrote:

> Thanks. I somehow get these error messages.
> by doing this,
>
> my $feat = new Bio::SeqFeature::Generic(-start                 =>20,
>                                        -end         => $40,
>                                        -primary_tag => 'newfeature' );
>                                     $feat->add_tag_value("note","this is
> notes");
>

That $40 looks fishy.  Try deleting the dollar sign.  You did mean just 40,
right?
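
To see why the dollar sign matters, here is a tiny stand-alone check (the
variable names are just for illustration). Perl reads $40 as the 40th regex
capture variable, which is undef when no such capture has been made, and
"use strict" does not object to it:

```perl
use strict;
use warnings;

# "$40" is the 40th regex capture variable, so it is undef here;
# the bare number 40 is what a feature coordinate needs.
my $typo_end  = $40;
my $fixed_end = 40;

printf "typo:  %s\n", defined $typo_end ? $typo_end : 'undef';
printf "fixed: %s\n", $fixed_end;
```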

Mike

From MEC at stowers.org  Fri Aug 13 13:37:50 2010
From: MEC at stowers.org (Cook, Malcolm)
Date: Fri, 13 Aug 2010 12:37:50 -0500
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
	<4C656B67.5020402@gmail.com>
	<AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
Message-ID: <BD62CBAC4395B94096109020651BE2EC1312232E24@EXCHMB-02.stowers-institute.org>

Jessica,

Show more code!

In particular, where did $f get set?

--Malcolm

 
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun
Sent: Friday, August 13, 2010 12:07 PM
To: Roy Chaudhuri
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Add sequence feature

Thanks. I somehow get these error messages.

--------------------- WARNING ---------------------
MSG:  Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module.
Attempting to dump, but may fail!
---------------------------------------------------
Can't locate object method "seq" via package "Bio::SeqIO::genbank" at /Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, <GEN0> line 447.

by doing this,

my $feat = new Bio::SeqFeature::Generic(-start       => 20,
                                        -end         => $40,
                                        -primary_tag => 'newfeature' );
$feat->add_tag_value("note","this is notes");
$f->add_SeqFeature($feat); ## f is original feature pointer
$io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" );

$io->write_seq($seqio_object);

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From Kevin.M.Brown at asu.edu  Fri Aug 13 13:53:50 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 13 Aug 2010 10:53:50 -0700
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com><4C6562E0.7090008@gmail.com><AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com><4C6566B0.60706@gmail.com><AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com><4C656B67.5020402@gmail.com>
	<AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu>

If I'm reading your sample code correctly, then you are mistakenly
trying to output the input SeqIO object and not the actual Bio::Seq
object that was read in by SeqIO.

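# $infile and $outfile are placeholders for your own file names
my $seqio = Bio::SeqIO->new(-file => $infile, -format => 'genbank');
my $seq = $seqio->next_seq;   # the Bio::Seq object, not the stream

#manipulate $seq

my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'genbank');
$out->write_seq($seq);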
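
If it helps to catch this kind of mix-up early, one option is a small guard
like the sketch below. write_checked is a hypothetical helper written for
this example, not part of BioPerl; it only relies on sequence objects
implementing Bio::SeqI, which the SeqIO stream objects do not.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);
use Bio::SeqIO;

# Hypothetical helper: refuse anything that is not a sequence object
# before handing it to write_seq.
sub write_checked {
    my ($out, $obj) = @_;
    die 'write_seq needs a Bio::SeqI object, got '
        . (ref($obj) || 'a plain scalar') . "\n"
        unless blessed($obj) && $obj->isa('Bio::SeqI');
    return $out->write_seq($obj);
}
```

Calling write_checked($out, $seqio) would then die with a message naming the
Bio::SeqIO::genbank class, rather than falling through to the "Can't locate
object method" error.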

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun
Sent: Friday, August 13, 2010 10:07 AM
To: Roy Chaudhuri
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Add sequence feature

Thanks. I somehow get these error messages.

--------------------- WARNING ---------------------
MSG:  Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module.
Attempting to dump, but may fail!
---------------------------------------------------
Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
/Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, <GEN0> line 447.

by doing this,

my $feat = new Bio::SeqFeature::Generic(-start                 =>20,
                                        -end         => $40,
                                        -primary_tag => 'newfeature' );
                                    $feat->add_tag_value("note","this is
notes");
  $f->add_SeqFeature($feat); ## f is original feature pointer
$io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" );

    $io->write_seq($seqio_object);



From jessica.sun at gmail.com  Fri Aug 13 15:16:51 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 13 Aug 2010 15:16:51 -0400
Subject: [Bioperl-l] Fwd:  Add sequence feature
In-Reply-To: <AANLkTim6MBPBbRr2bEkCgCL+6NMXGqJ0wWoz3-JPRKyG@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
	<4C656B67.5020402@gmail.com>
	<AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
	<1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu>
	<AANLkTim6MBPBbRr2bEkCgCL+6NMXGqJ0wWoz3-JPRKyG@mail.gmail.com>
Message-ID: <AANLkTimFO1Yn-n7vqmmvAF5smQeGadEW_fs_a0U-7ej4@mail.gmail.com>

---------- Forwarded message ----------
From: Jessica Sun <jessica.sun at gmail.com>
Date: Fri, Aug 13, 2010 at 3:16 PM
Subject: Re: [Bioperl-l] Add sequence feature
To: Kevin Brown <Kevin.M.Brown at asu.edu>


Yes, I changed that, but somehow the output still does not include the
added features.


On Fri, Aug 13, 2010 at 1:53 PM, Kevin Brown <Kevin.M.Brown at asu.edu> wrote:

> If I'm reading your sample code correctly, then you are mistakenly
> trying to output the input SeqIO object and not the actual Bio::Seq
> object that was read in by SeqIO.
>
> My $seqio = Bio::SeqIO->new;
> My $seq = $seqio->next_seq;
>
> #manipulate $seq
>
> My $out = Bio::SeqIO->new;
> $out->write_seq($seq);


-- 
Jessica Jingping Sun



From MEC at stowers.org  Fri Aug 13 15:56:09 2010
From: MEC at stowers.org (Cook, Malcolm)
Date: Fri, 13 Aug 2010 14:56:09 -0500
Subject: [Bioperl-l] Fwd:  Add sequence feature
In-Reply-To: <AANLkTimFO1Yn-n7vqmmvAF5smQeGadEW_fs_a0U-7ej4@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
	<4C656B67.5020402@gmail.com>
	<AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
	<1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu>
	<AANLkTim6MBPBbRr2bEkCgCL+6NMXGqJ0wWoz3-JPRKyG@mail.gmail.com>
	<AANLkTimFO1Yn-n7vqmmvAF5smQeGadEW_fs_a0U-7ej4@mail.gmail.com>
Message-ID: <BD62CBAC4395B94096109020651BE2EC1312232E46@EXCHMB-02.stowers-institute.org>

If you show all your code, we might not have to guess at what the problem is.
 

Malcolm Cook
Stowers Institute for Medical Research -  Bioinformatics
Kansas City, Missouri  USA
 

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun
Sent: Friday, August 13, 2010 2:17 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Fwd: Add sequence feature

---------- Forwarded message ----------
From: Jessica Sun <jessica.sun at gmail.com>
Date: Fri, Aug 13, 2010 at 3:16 PM
Subject: Re: [Bioperl-l] Add sequence feature
To: Kevin Brown <Kevin.M.Brown at asu.edu>


yes, I change that, somehow it still did not take the added features in.


On Fri, Aug 13, 2010 at 1:53 PM, Kevin Brown <Kevin.M.Brown at asu.edu> wrote:

> If I'm reading your sample code correctly, then you are mistakenly 
> trying to output the input SeqIO object and not the actual Bio::Seq 
> object that was read in by SeqIO.
>
> My $seqio = Bio::SeqIO->new;
> My $seq = $seqio->next_seq;
>
> #manipulate $seq
>
> My $out = Bio::SeqIO->new;
> $out->write_seq($seq);
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun
> Sent: Friday, August 13, 2010 10:07 AM
> To: Roy Chaudhuri
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Add sequence feature
>
> Thanks. I somehow get these error messages.
>
> --------------------- WARNING ---------------------
> MSG:  Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module.
> Attempting to dump, but may fail!
> ---------------------------------------------------
> Can't locate object method "seq" via package "Bio::SeqIO::genbank" at 
> /Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, <GEN0> line 447.
>
> by doing this,
>
> my $feat = Bio::SeqFeature::Generic->new(-start       => 20,
>                                          -end         => 40,
>                                          -primary_tag => 'newfeature' );
> $feat->add_tag_value("note","this is notes");
> $f->add_SeqFeature($feat); ## f is original feature pointer
> $io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" );
>
>    $io->write_seq($seqio_object);
>
> On Fri, Aug 13, 2010 at 11:57 AM, Roy Chaudhuri
> <roy.chaudhuri at gmail.com>wrote:
>
> > Please remember to copy replies to the mailing list.
> >
> > You can loop over the features in your Bio::Seq object:
> > for my $feat ($seq->get_SeqFeatures) {
> >     # do something
> > }
> >
> > And once you have found the feature you want to modify, you can add a tag
> > using something like:
> > $feat->add_tag_value('note',"this is a note");
> >
> > When you're finished you can write out the modified sequence object to a
> > new GenBank file.
> >
> >
> > On 13/08/2010 16:40, Jessica Sun wrote:
> >
> >> No, I want to load the GenBank file with its existing features, add
> >> some new feature tags to the existing ones, and then save the result
> >> to a new, updated GenBank file for local use. I'm just not sure how
> >> to merge the two steps you recommended into one in a neat way.
> >>
> >> thx
> >>
> >>
> >> On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri
> >> <roy.chaudhuri at gmail.com> wrote:
> >>
> >>    I'm not sure I understand, do you mean that you want to load just
> >>    the sequence from the GenBank file (ignoring the existing
> >>    annotation), then add your own features? There are instructions on
> >>    how to do that here:
> >>
> >>    http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder
> >>
> >>
> >>    On 13/08/2010 16:27, Jessica Sun wrote:
> >>
> >>        Unfortunately. I want to add the feature to the sequence object
> >>        I got from the GenBank file. I do not mind saving a new GenBank
> >>        file, as long as it contains the original GenBank format and
> >>        information plus the new feature tags I need to add. Is there a
> >>        quick solution to this?
> >>
> >>        thx
> >>
> >>        Jessica
> >>
> >>
> >>
> >>        On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri
> >>        <roy.chaudhuri at gmail.com> wrote:
> >>
> >>            Hi Jessica.
> >>
> >>            You need to use Bio::SeqIO to read in the GenBank file to a
> >>            BioPerl sequence object, and to write your new GenBank file:
> >>            http://www.bioperl.org/wiki/HOWTO:SeqIO
> >>
> >>            To add a new feature follow the instructions here:
> >>
> >>            http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences
> >>
> >>            (except that you are adding the feature to the sequence
> >>            object you got from the Genbank file, not a new Bio::Seq
> >>            object).
> >>
> >>            Cheers.
> >>            Roy.
> >>
> >>
> >>            On 13/08/2010 16:06, Jessica Sun wrote:
> >>
> >>                Does anyone know how to open a GenBank file, add a new
> >>                feature, and then save a new GenBank file with the new
> >>                feature added in BioPerl?
> >>
> >>                thx
> >>
> >>
> >>
> >>
> >>
> >>        --
> >>        Jessica Jingping Sun
> >>
> >>
> >>
> >>
> >>
> >> --
> >> Jessica Jingping Sun
> >>
> >
> >
>
>
> --
> Jessica Jingping Sun
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


--
Jessica Jingping Sun
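
The steps discussed in this thread (read the GenBank record, attach a feature to the Bio::Seq object, write that same object back out) can be put together roughly as follows. This is only a minimal sketch: the sub name add_feature_to_genbank, the coordinates, the 'misc_feature' primary tag and the note text are placeholders chosen for illustration, not anything taken from the original code. To tag an existing feature instead, loop over $seq->get_SeqFeatures and call add_tag_value on the one you want, as Roy describes.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;

# Copy every record from $infile to $outfile, adding one new feature to each.
# Coordinates, primary tag and note text are illustrative placeholders.
sub add_feature_to_genbank {
    my ($infile, $outfile) = @_;

    my $in  = Bio::SeqIO->new(-file => $infile,     -format => 'genbank');
    my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'genbank');

    my $count = 0;
    while (my $seq = $in->next_seq) {    # $seq is a Bio::Seq, not the SeqIO stream
        my $feat = Bio::SeqFeature::Generic->new(
            -start       => 20,
            -end         => 40,
            -primary_tag => 'misc_feature',
            -tag         => { note => 'this is a note' },
        );
        $seq->add_SeqFeature($feat);     # attach the feature to the sequence
        $out->write_seq($seq);           # write the sequence object itself
        $count++;
    }
    return $count;                       # number of records written
}

# Command-line use: perl add_feature.pl in.gbk out.gbk
add_feature_to_genbank(@ARGV) if @ARGV == 2;
```

The point to notice is that write_seq receives $seq, the object returned by next_seq, rather than the SeqIO stream; passing the stream is what produces the "is not a SeqI compliant module" warning quoted above.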




From cjfields at illinois.edu  Mon Aug 16 14:02:15 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 16 Aug 2010 13:02:15 -0500
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
Message-ID: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>

All,

This is in reference to a bug report I filed a while back.  In the test script below, two features with the same start/end are compared.  If the features have the same seq_id(), overlap succeeds.  If the seq_id is changed (e.g. the feature is on another chromosome), the overlap still succeeds.

The question is: is this a bug?  My vote would be 'yes', but there have been various arguments to say it's not.  

chris

(maybe I'll make this a regular thing on the list, just to hash out some of the edge cases I run into periodically)

=========================================

#!/usr/bin/perl -w

use strict;
use warnings;
use Test::More;
use Bio::SeqFeature::Generic;

my ( $feat1, $feat2 );

$feat1 = Bio::SeqFeature::Generic->new(
    -start  => 40,
    -end    => 80,
    -strand => 1,
    -seq_id => 'ABC123',
);

is $feat1->start,  40,       'start of feature location';
is $feat1->end,    80,       'end of feature location';
is $feat1->seq_id, 'ABC123', 'seq_id';

$feat2 = Bio::SeqFeature::Generic->new(
    -start  => 40,
    -end    => 80,
    -strand => 1,
    -seq_id => 'ABC123',
);

is $feat2->start,  40,       'start of feature location';
is $feat2->end,    80,       'end of feature location';
is $feat2->seq_id, 'ABC123', 'seq_id';

# Generic features with same Seq ID should overlap
ok( $feat2->overlaps($feat1), 'feat2 overlaps feat1' );

# Generic features with different Seq IDs shouldn't overlap
is( $feat2->seq_id('XYZ678'), 'XYZ678', 'change seq_id' );

# this currently fails
ok( !$feat2->overlaps($feat1), "feat2 doesn't overlap feat1" );

done_testing();


From David.Messina at sbc.su.se  Mon Aug 16 14:51:54 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 16 Aug 2010 20:51:54 +0200
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
Message-ID: <A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>

> The question is: is this a bug?

Hmm, tricky.

Genomic start and end positions with differing IDs shouldn't overlap, but can't SeqFeatures apply to proteins and other molecules where one would want to compare positions without regard to ID?


Dave


From cjfields at illinois.edu  Mon Aug 16 21:39:00 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 16 Aug 2010 20:39:00 -0500
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
Message-ID: <E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>

On Aug 16, 2010, at 1:51 PM, Dave Messina wrote:

>> The question is: is this a bug?
> 
> Hmm, tricky.
> 
> Genomic start and end positions with differing IDs shouldn't overlap, but can't SeqFeatures apply to proteins and other molecules where one would want to compare positions without regard to ID?
> 
> Dave

Good point; it's probably the context in which the methods are used that matters.  So, maybe just a documentation clarification?

chris

From David.Messina at sbc.su.se  Tue Aug 17 05:06:05 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 17 Aug 2010 11:06:05 +0200
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
	<E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
Message-ID: <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>

> Good point; it's probably the context the methods are used that matters.  So, maybe just a document clarification?

That's always good, but it really doesn't solve the issue you're describing.

I mean, who would expect to get overlaps for features on different chromosomes?

To me, that's a clear violation of reasonable user expectations. You shouldn't have to read the docs about something like that.

So what's the solution for these duelling use cases? I haven't thought about it much, but a first approximation might be to add a -genomic boolean flag that, when true, would do the right thing and check the ID when doing overlaps or other positional comparisons.

(Maybe -genomic is too obscure. Maybe it should be -same_id_for_overlaps or something like that.)

And maybe having to know to set a flag is effectively the same thing as having to read the docs to understand SeqFeature's overlap behavior.

What do the rest of you out there think?


Dave


From scott at scottcain.net  Tue Aug 17 08:45:27 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 17 Aug 2010 08:45:27 -0400
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
	<E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
	<83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
Message-ID: <B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>

Hi Dave and Chris,

It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison and if somebody is doing the protein space comparison and not getting the expected results, they'll probably read the docs to find out why.

Scott

--
Scott Cain, Ph. D.
scott at scottcain dot net
Ontario Institute for Cancer Research
http://gmod.org/
216 392 3087 



From david.breimann at gmail.com  Tue Aug 17 09:44:08 2010
From: david.breimann at gmail.com (David Breimann)
Date: Tue, 17 Aug 2010 16:44:08 +0300
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
Message-ID: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>

Hello,

The following GenBank file has a gene that runs over the "end" of the
chromosome and into its "beginning", and the script generates an
error.

ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk

NC_005707 Unflattening error:
Details:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: PROBLEM, SEVERITY==2
Ranges not in correct order. Strange ensembl genbank entry? Range:
[207497,208369] [1,687]
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473
STACK: Bio::SeqFeature::Tools::Unflattener::problem
/usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952
STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent
/usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842
STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS
/usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713
STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq
/usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532
STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023
STACK: /usr/local/bin/bp_genbank2gff3.pl:506
-----------------------------------------------------------

Best,
Dave

From cjfields at illinois.edu  Tue Aug 17 09:51:02 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 08:51:02 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
Message-ID: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>

I think Chris Mungall has a branch set up for this in bioperl:

http://github.com/bioperl/bioperl-live/tree/circular

Is that correct?  Should we merge that code into the master branch?

chris



From David.Messina at sbc.su.se  Tue Aug 17 09:52:11 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 17 Aug 2010 15:52:11 +0200
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
	<E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
	<83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
	<B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>
Message-ID: <EA0C23FB-8C2F-4C04-B0E8-4207409916DC@sbc.su.se>

> It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison

Yep, agreed.

And such a flag should be named for the non-default behavior, then, like: -ignore_IDs_for_overlaps


Dave
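
Until something like the proposed flag exists, the behaviour discussed in this thread can be approximated in user code. The sketch below is an assumption-laden illustration: overlaps_on_same_seq and its ignore_ids option are hypothetical names, not part of BioPerl, and the choice to treat an undefined seq_id as "different" is arbitrary. It only relies on the seq_id() and overlaps() methods already used in the test script above.

```perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;

# Hypothetical helper (not a BioPerl method): overlap test that also requires
# both features to be on the same sequence, unless ignore_ids is true.
sub overlaps_on_same_seq {
    my ($feat_a, $feat_b, %opt) = @_;

    unless ($opt{ignore_ids}) {
        my ($id_a, $id_b) = ($feat_a->seq_id, $feat_b->seq_id);
        # Assumption: a missing seq_id never matches anything.
        return 0 unless defined $id_a && defined $id_b && $id_a eq $id_b;
    }
    return $feat_a->overlaps($feat_b) ? 1 : 0;
}

my $feat1 = Bio::SeqFeature::Generic->new(
    -start => 40, -end => 80, -strand => 1, -seq_id => 'ABC123',
);
my $feat2 = Bio::SeqFeature::Generic->new(
    -start => 40, -end => 80, -strand => 1, -seq_id => 'XYZ678',
);

# Default: genomic-style comparison, seq_id must match.
print "same-seq overlap: ", overlaps_on_same_seq($feat1, $feat2), "\n";

# Opt-out for the protein-style use case where only positions matter.
print "position-only overlap: ",
    overlaps_on_same_seq($feat1, $feat2, ignore_ids => 1), "\n";
```

Making the ID check the default and the opt-out explicit follows the naming suggestion above (a flag named for the non-default behaviour).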


From douglas.hoen at gmail.com  Thu Aug 12 10:24:27 2010
From: douglas.hoen at gmail.com (Douglas Hoen)
Date: Thu, 12 Aug 2010 10:24:27 -0400
Subject: [Bioperl-l] HMMER3 to GFF3
In-Reply-To: <20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de>
References: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com>
	<20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de>
Message-ID: <A1AA9B70-69B9-4AA6-BB5F-FB0D0FDD0491@gmail.com>

Hi Kai,

Here it is.

Thanks,
-- Doug


-------------- next part --------------
A non-text attachment was scrubbed...
Name: chr1-tesigsv2.hmmscan
Type: application/octet-stream
Size: 676132 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100812/7818b4a4/attachment-0001.obj>
-------------- next part --------------


On 2010-08-12, at 8:16 AM, Kai Blin wrote:

> On Wed, 11 Aug 2010 22:59:37 -0700 (PDT)
> Doug Hoen <douglas.hoen at gmail.com> wrote:
> 
> Hi Doug,
> 
>> Could someone please confirm whether the results are incorrect and, if
>> so, perhaps suggest a fix? It may well be that this problem is due to
>> the unusual way I am using hmmscan, rather than a problem with HMMER3
>> parsing...?
> 
> Can you please attach your hmmer input file? Along the way something
> inserted line breaks, making it unreadable.
> 
> It might well be possible that the HMMer3 parser still handles a little
> different from the HMMer2 parser, I haven't tried that script.
> 
> Cheers,
> Kai
> 
> -- 
> Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
> D-72076 Tübingen                        Fax :   ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From CJMungall at lbl.gov  Tue Aug 17 11:53:15 2010
From: CJMungall at lbl.gov (Chris Mungall)
Date: Tue, 17 Aug 2010 08:53:15 -0700
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
	<8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
Message-ID: <D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>


You can merge this in. It should allow David to proceed.

I haven't kept up on synchrony between bioperl and GFF on circular
genomes. The above fix is conservative in that it essentially preserves
the GenBank coordinates even when the origin is crossed:

	http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf

However, if this is to conform to GFF3, then the resulting coordinates
that cross the origin should have start/end incremented by the genome
length.
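
As a rough illustration of that adjustment, here is a small pure-Perl sketch. It follows my reading of the GFF3 convention for circular landmarks, where a feature crossing the origin keeps its start and gets end = end + landmark length so that start <= end still holds. The sub name gff3_span_across_origin is made up for this example, the sequence length of 208369 is an assumption inferred from the first range in David's error message, and the logic only covers forward-strand joins listed in 5'-to-3' order.

```perl
use strict;
use warnings;

# Collapse the ranges of a join(...) location into one GFF3 start/end pair.
# @ranges holds [start, end] pairs in the order they appear in the join.
# If the last range ends before the first one starts, the feature is taken
# to wrap around the origin and the end is shifted by the sequence length.
sub gff3_span_across_origin {
    my ($seq_len, @ranges) = @_;
    my $start = $ranges[0][0];
    my $end   = $ranges[-1][1];
    $end += $seq_len if $end < $start;   # crossed the origin
    return ($start, $end);
}

# Ranges from the Unflattener error above; 208369 is an assumed length.
my ($start, $end) = gff3_span_across_origin(208369, [207497, 208369], [1, 687]);
print "$start..$end\n";   # prints 207497..209056
```

A feature that does not cross the origin passes through unchanged, so the same helper could be applied to every location on a circular landmark.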



From cjfields at illinois.edu  Tue Aug 17 15:24:23 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 14:24:23 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
	<8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
	<D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>
Message-ID: <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>

On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote:

> You can merge this in. It should allow David to proceed.

Will do.  I'll go ahead and delete the remote branch as well.

> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that essentially preserves the genbank coordinates even when the origin is crossed:
> 
> 	http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf
> 
> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length

Yes, that is a problem that needs to be addressed.  Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174.

chris



From sheldon.mckay at gmail.com  Tue Aug 17 16:42:50 2010
From: sheldon.mckay at gmail.com (Sheldon McKay)
Date: Tue, 17 Aug 2010 16:42:50 -0400
Subject: [Bioperl-l] AlignIO and Gbrowse_syn
In-Reply-To: <E53C66C1-E4F1-4E83-B5ED-631CE62D7DCE@illinois.edu>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
	<AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
	<E53C66C1-E4F1-4E83-B5ED-631CE62D7DCE@illinois.edu>
Message-ID: <AANLkTikYi9TGag3poS=xB73iGxqX_-ThZS9wU1TC2JDH@mail.gmail.com>

The GBrowse_syn dev team is pretty small (n=1) right now, so any
patches would be welcome.

Sheldon


On Wed, Aug 11, 2010 at 6:02 PM, Chris Fields <cjfields at illinois.edu> wrote:
> Russell,
>
> We have had very few requests to support .maf until recently, which is why there has been little done with it. We welcome any help to improve it.
>
> chris
>
> On Aug 11, 2010, at 4:31 PM, Smithies, Russell wrote:
>
>> I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague.
>> If GBrowse_syn is using .maf format, does AlignIO need more work?
>> Any comments?
>>
>> --Russell
>>
>>
>> I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output). GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce clustalw, I've tried parsing LASTZ maf output to clustalw (and other alignment formats) using AlignIO, however I run into the following issues:
>> * Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification)
>> * The coordinate system for reverse strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-rev complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result all reverse match coordinates in the resulting clustalw output file are incorrect.
>> * AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them
>>
>> I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned.
>
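
For anyone patching around the reverse-strand issue described in the quoted enquiry, the coordinate conversion itself is small. The sketch below assumes the usual MAF "s" line fields (zero-based start, size, strand, and srcSize, with minus-strand starts counted on the reverse-complemented source) and returns 1-based coordinates on the original sequence. The sub name maf_to_forward is made up for this example and is not an AlignIO method.

```perl
use strict;
use warnings;

# Convert the start/size/strand/srcSize fields of a MAF "s" line into
# 1-based, inclusive coordinates on the original (forward) sequence.
sub maf_to_forward {
    my ($start, $size, $strand, $src_size) = @_;
    if ($strand eq '-') {
        # Minus-strand starts are relative to the reverse complement,
        # so mirror the interval around the sequence length.
        return ($src_size - $start - $size + 1, $src_size - $start);
    }
    return ($start + 1, $start + $size);
}

# The first 10 bases of the reverse complement of a 100 bp sequence
# are the last 10 bases of the original sequence.
my ($from, $to) = maf_to_forward(0, 10, '-', 100);
print "$from..$to\n";   # prints 91..100
```

The strand still has to be carried alongside the converted coordinates, since clustalw output has no field for it.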


From hxu.hong at gmail.com  Tue Aug 17 16:50:43 2010
From: hxu.hong at gmail.com (Hong Xu)
Date: Tue, 17 Aug 2010 16:50:43 -0400
Subject: [Bioperl-l] Bio::Tools::Primer3 question
Message-ID: <AANLkTi=NcuvzepGaqw_TUTr5MM6F2K_b8PT8Fa3qrZg2@mail.gmail.com>

Hello all,

I'm working to parse the Primer3 release 2.2.2-beta result. I made the
necessary changes to make Bio::Tools::Primer3 work with the new output
tags of Primer3 release 2.2.2. But when I tried to get the primer Tm,
I found that Bio::Tools::Primer3 gave different Tm from Primer3 result
file. Then I learned that the Tm was calculated by
Bio::SeqFeature::Primer module, not from parsing Primer3 result. If I
want to get data from parsing Primer3 result, should I write my own
Primer3 parser instead of Bio::Tools::Primer3?

thanks a lot,
Hong

From cjfields at illinois.edu  Tue Aug 17 17:14:02 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 16:14:02 -0500
Subject: [Bioperl-l] Bio::Tools::Primer3 question
In-Reply-To: <AANLkTi=NcuvzepGaqw_TUTr5MM6F2K_b8PT8Fa3qrZg2@mail.gmail.com>
References: <AANLkTi=NcuvzepGaqw_TUTr5MM6F2K_b8PT8Fa3qrZg2@mail.gmail.com>
Message-ID: <E039C425-80C3-4F18-B589-AE98896A1175@illinois.edu>

Already ahead of you there, unfortunately.  I wrote a complete reimplementation of both the Primer3 parser and the Primer3 wrapper that handles both v1 and v2 of primer3_core.  A lack of tuits lately has prevented me from getting tests written up, so for the time being it's sitting in bioperl-dev:

http://github.com/bioperl/bioperl-dev

They are Bio::Tools::Primer3Redux (parser) and Bio::Tools::Run::Primer3Redux (wrapper).

I rewrote those because I found the original modules inadequate in many ways for my purposes at the time (the newer version uses simple features or feature pairs instead of the primer features, for the same reasons you mention re: Tm).  You're more than welcome to hack on the code a bit.  I'm planning on pulling it out into my own github repo for separate submission to CPAN.

chris
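
If all that is needed is the Tm exactly as primer3_core reported it, the values can also be pulled straight out of the output without going through either module. The sketch below assumes Boulder-IO style output (one TAG=value per line, a lone "=" ending each record) and assumes the 2.x tag naming PRIMER_LEFT_<n>_TM / PRIMER_RIGHT_<n>_TM; check these against your own result file. The sub names and the sample values are invented for illustration.

```perl
use strict;
use warnings;

# Read one Boulder-IO record (TAG=value lines up to a lone "=") into a hash.
sub read_boulder_record {
    my ($fh) = @_;
    my %record;
    while (my $line = <$fh>) {
        chomp $line;
        last if $line eq '=';                 # end-of-record marker
        my ($tag, $value) = split /=/, $line, 2;
        next unless defined $value;           # skip lines without "="
        $record{$tag} = $value;
    }
    return \%record;
}

# Collect the reported melting temperatures, keyed by primer pair index.
# Tag pattern is an assumption based on Primer3 2.x output naming.
sub primer_tms {
    my ($record) = @_;
    my %tm;
    for my $tag (keys %$record) {
        next unless $tag =~ /^PRIMER_(LEFT|RIGHT)_(\d+)_TM$/;
        $tm{$2}{ lc $1 } = $record->{$tag};
    }
    return \%tm;
}

# Made-up sample record, read from an in-memory filehandle.
my $sample = "SEQUENCE_ID=example\nPRIMER_LEFT_0_TM=59.96\nPRIMER_RIGHT_0_TM=60.10\n=\n";
open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";
my $tm = primer_tms(read_boulder_record($fh));
for my $pair (sort { $a <=> $b } keys %$tm) {
    print "pair $pair: left $tm->{$pair}{left}, right $tm->{$pair}{right}\n";
}
```

Values are kept as the strings Primer3 wrote, so nothing is recalculated or rounded along the way.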



From cjfields at illinois.edu  Tue Aug 17 23:42:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 22:42:59 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
	<8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
	<D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>
	<8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>
Message-ID: <D1CC1B9C-36A7-4427-9100-AE5C85C5E965@illinois.edu>

Chris, David, 

The branch is now merged back to trunk.  David, let us know if this helps.

chris (f)

On Aug 17, 2010, at 2:24 PM, Chris Fields wrote:

> On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote:
> 
>> You can merge this in. It should allow David to proceed.
> 
> Will do.  I'll go ahead and delete the remote branch as well.
> 
>> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that it essentially preserves the genbank coordinates even when the origin is crossed:
>> 
>> 	http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf
>> 
>> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length
> 
> Yes, that is a problem that needs to be addressed.  Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174.
> 
> chris
> 
>> On Aug 17, 2010, at 6:51 AM, Chris Fields wrote:
>> 
>>> I think Chris Mungall has a branch set up for this in bioperl:
>>> 
>>> http://github.com/bioperl/bioperl-live/tree/circular
>>> 
>>> Is that correct?  Should we merge that code into the master branch?
>>> 
>>> chris
>>> 
>>> On Aug 17, 2010, at 8:44 AM, David Breimann wrote:
>>> 
>>>> Hello,
>>>> 
>>>> The following genbank has a gene that runs over the "end" of the
>>>> chromosome and into its "beginning", and the script generates an
>>>> error.
>>>> 
>>>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk
>>>> 
>>>> NC_005707 Unflattening error:
>>>> Details:
>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> MSG: PROBLEM, SEVERITY==2
>>>> Ranges not in correct order. Strange ensembl genbank entry? Range:
>>>> [207497,208369] [1,687]
>>>> STACK: Error::throw
>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473
>>>> STACK: Bio::SeqFeature::Tools::Unflattener::problem
>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952
>>>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent
>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842
>>>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS
>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713
>>>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq
>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532
>>>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023
>>>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506
>>>> -----------------------------------------------------------
>>>> 
>>>> Best,
>>>> Dave
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Wed Aug 18 00:48:55 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 23:48:55 -0500
Subject: [Bioperl-l] Bio::Tools::Primer3 question
In-Reply-To: <E039C425-80C3-4F18-B589-AE98896A1175@illinois.edu>
References: <AANLkTi=NcuvzepGaqw_TUTr5MM6F2K_b8PT8Fa3qrZg2@mail.gmail.com>
	<E039C425-80C3-4F18-B589-AE98896A1175@illinois.edu>
Message-ID: <C4B91FBD-1705-4045-9D98-F5ABEA80C038@illinois.edu>

Hong,

The latest code, along with working tests, is present here:

http://github.com/cjfields/Bio-Tools-Primer3Redux

It needs a few more tests but the initial wrapper tests work fine for primer3 v2.2.1 on both Mac and Linux.  Will try getting this onto CPAN after a bit more cleanup.

chris

On Aug 17, 2010, at 4:14 PM, Chris Fields wrote:

> Already ahead of you there, unfortunately.  I wrote a complete reimplementation of both the Primer3 parser and the Primer3 wrapper that handles both v1 and v2 of primer3_core.  Lack of tuits lately have prevented me from getting tests written up, so for the time being it's sitting in bioperl-dev:
> 
> http://github.com/bioperl/bioperl-dev
> 
> They are Bio::Tools::Primer3Redux (parser) and Bio::Tools::Run::Primer3Redux (wrapper).
> 
> I rewrote those b/c I found the original modules not adequate enough in many ways for my purposes then (the newer version uses simple features or feature pairs instead of the primer features, for the same reasons you mention re: Tm).  You're more than welcome to hack on the code a bit.  I'm planning on pulling it out into my own github repo for separate submission to CPAN.  
> 
> chris
> 
> On Aug 17, 2010, at 3:50 PM, Hong Xu wrote:
> 
>> Hello all,
>> 
>> I'm working to parse the Primer3 release 2.2.2-beta result. I made the
>> necessary changes to make Bio::Tools::Primer3 work with the new output
>> tags of Primer3 release 2.2.2. But when I tried to get the primer Tm,
>> I found that Bio::Tools::Primer3 gave different Tm from Primer3 result
>> file. Then I learned that the Tm was calculated by
>> Bio::SeqFeature::Primer module, not from parsing Primer3 result. If I
>> want to get data from parsing Primer3 result, should I write my own
>> Primer3 parser instead of Bio::Tools::Primer3?
>> 
>> thanks a lot,
>> Hong
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From david.breimann at gmail.com  Wed Aug 18 02:46:58 2010
From: david.breimann at gmail.com (David Breimann)
Date: Wed, 18 Aug 2010 09:46:58 +0300
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <D1CC1B9C-36A7-4427-9100-AE5C85C5E965@illinois.edu>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
	<8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
	<D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>
	<8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>
	<D1CC1B9C-36A7-4427-9100-AE5C85C5E965@illinois.edu>
Message-ID: <AANLkTinsqQCpybg6MUzTwqNuKMn=kJMV4pL64GXwAOkG@mail.gmail.com>

Dear Chris's,

I tested the updated version on multiple genomes that previously
returned errors (for future reference: NC_005707, NC_006578,
NC_007103, NC_007104, NC_007106, NC_007107, NC_008573, NC_008762,
NC_008763, NC_008785, NC_009457, NC_012040). The script now ends
normally on all of them. However, as you mentioned, the resulting GFF3
file does not comply with the GFF3 specification for circular genomes.
This in turn causes some unexpected results in other applications.
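
To illustrate the coordinate convention in question, here is a rough sketch of how a feature crossing the origin could be mapped to a single GFF3 range, with the end incremented by the sequence length. The sub name is hypothetical, strand handling is ignored, and the sequence length of 208369 is an assumption inferred from the end of the first segment in the error report quoted in this thread.

```perl
use strict;
use warnings;

# Collapse the GenBank-style segments of one feature on a circular
# sequence into a single start/end pair. If the last segment ends
# before the first one starts, the feature is assumed to cross the
# origin and its end is shifted by the sequence length.
# Hypothetical helper; strand is not considered.
sub gff3_circular_coords {
    my ($seq_len, @segments) = @_;      # each segment is [start, end]
    my $start = $segments[0][0];
    my $end   = $segments[-1][1];
    $end += $seq_len if $end < $start;  # wrapped past the origin
    return ($start, $end);
}

# Segments [207497,208369] [1,687] from the error report;
# 208369 as the sequence length is an assumption.
my ($start, $end) = gff3_circular_coords(208369, [207497, 208369], [1, 687]);
print "start=$start end=$end\n";
```

A feature that does not cross the origin passes through unchanged.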

Best,
Dave

On Wed, Aug 18, 2010 at 6:42 AM, Chris Fields <cjfields at illinois.edu> wrote:
> Chris, David,
>
> The branch is now merged back to trunk.  David, let us know if this helps.
>
> chris (f)
>
> On Aug 17, 2010, at 2:24 PM, Chris Fields wrote:
>
>> On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote:
>>
>>> You can merge this in. It should allow David to proceed.
>>
>> Will do.  I'll go ahead and delete the remote branch as well.
>>
>>> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that it essentially preserves the genbank coordinates even when the origin is crossed:
>>>
>>>      http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf
>>>
>>> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length
>>
>> Yes, that is a problem that needs to be addressed.  Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174.
>>
>> chris
>>
>>> On Aug 17, 2010, at 6:51 AM, Chris Fields wrote:
>>>
>>>> I think Chris Mungall has a branch set up for this in bioperl:
>>>>
>>>> http://github.com/bioperl/bioperl-live/tree/circular
>>>>
>>>> Is that correct?  Should we merge that code into the master branch?
>>>>
>>>> chris
>>>>
>>>> On Aug 17, 2010, at 8:44 AM, David Breimann wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> The following genbank has a gene that runs over the "end" of the
>>>>> chromosome and into its "beginning", and the script generates an
>>>>> error.
>>>>>
>>>>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk
>>>>>
>>>>> NC_005707 Unflattening error:
>>>>> Details:
>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>> MSG: PROBLEM, SEVERITY==2
>>>>> Ranges not in correct order. Strange ensembl genbank entry? Range:
>>>>> [207497,208369] [1,687]
>>>>> STACK: Error::throw
>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473
>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::problem
>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952
>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent
>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842
>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS
>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713
>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq
>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532
>>>>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023
>>>>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506
>>>>> -----------------------------------------------------------
>>>>>
>>>>> Best,
>>>>> Dave
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From G.Gallone at sms.ed.ac.uk  Wed Aug 18 10:57:01 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Wed, 18 Aug 2010 15:57:01 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
Message-ID: <4C6BF4BD.5010200@sms.ed.ac.uk>

Hello BioPerl community - I've written a new module called 
Interolog::Walk that I'm planning to put on CPAN. I would be grateful if 
you could take a look at the brief description below and tell me 
what you think. I'll be more than happy to post further details should 
the module be of interest to anyone.

Also, I am not sure I have chosen the right name for it. This 
is my first module and it would be great if you could advise on naming 
it appropriately. Hopefully the following description will give an idea 
of what it does.

===================


NAME
     Interolog::Walk - Retrieve, score and visualize putative
     Protein-Protein Interactions through the orthology-walk method

DESCRIPTION
     A common activity in computational biology is to mine 
protein-protein interactions from publicly available databases in order 
to build Protein-Protein Interaction (PPI) datasets.
In many instances, however, experimentally obtained, annotated PPIs 
are scarce, and it would be helpful to enrich the 
experimental dataset with high-quality, computationally inferred PPIs. 
Such computationally obtained datasets can extend, support or enrich 
experimental PPI datasets, and are of crucial importance in 
high-throughput gene prioritization studies, i.e. to drive hypotheses 
and restrict the dimensionality of many gene functional discovery problems.
This Perl Module, Interolog::Walk, is aimed at building putative PPI 
datasets on the basis of a number of comparative biology paradigms: the 
module implements a collection of computational biology algorithms based 
on the concept of "orthology projection". If interacting proteins A and 
B in organism X have orthologs A' and B' in organism Y, under certain 
conditions one can assume that the interaction will be conserved in 
organism Y, i.e. the A-B interaction can be "projected through the 
orthologies" to obtain a putative A'-B' interaction. The pair of 
interactions (A-B) and (A'-B') are named "Interologs" (see for instance 
[1] and [2]).
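
The projection idea can be sketched in a few lines of plain Perl. This is only an illustration of the concept, not the module's actual code or API: the sub name and the toy identifiers are hypothetical, and the "certain conditions" under which an interaction is assumed to be conserved are reduced here to "both partners have an ortholog".

```perl
use strict;
use warnings;

# Project interactions from organism X onto organism Y through an
# ortholog map. An interaction yields putative interologs only when
# both partners have at least one ortholog in Y.
# Hypothetical helper, for illustration only.
sub project_interactions {
    my ($interactions, $orthologs) = @_;
    my @putative;
    for my $pair (@$interactions) {
        my ($p, $q) = @$pair;
        for my $p_orth (@{ $orthologs->{$p} || [] }) {
            for my $q_orth (@{ $orthologs->{$q} || [] }) {
                push @putative, [$p_orth, $q_orth];
            }
        }
    }
    return \@putative;
}

# Toy data: A-B and A-C interact in X; C has no ortholog in Y.
my $interactions = [ ['A', 'B'], ['A', 'C'] ];
my $orthologs    = { A => ["A'"], B => ["B'"] };

my $putative = project_interactions($interactions, $orthologs);
print join('-', @$_), "\n" for @$putative;
```

One-to-many orthologies produce one putative pair per combination, which is where a confidence score would help rank the results.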

Interolog::Walk collects, analyses and collates gene orthology data 
provided by the Ensembl Consortium (www.ensembl.org) as well as PPI data 
provided by EBI Intact (http://www.ebi.ac.uk/intact/). It provides the 
user with the possibility of rating the quality and reliability of the 
putative interactions collected, by means of confidence scores, and 
optionally outputs network representations of the datasets, compatible 
with Cytoscape, a widely used biological network visualization platform.

USAGE
In order to carry out an interolog walk we start with a set of gene 
identifiers in one organism of interest. We query those ids against a 
number of comparative biology databases to retrieve a list of 
orthologues for each gene id of interest, in one or more species.
In the following step we rely on PPI databases to retrieve the list of 
available interactors for the protein ids obtained. The output at this 
stage consists of a list of interactors of the orthologues of the 
initial gene set, plus several fields of ancillary data.
In the last step of the process we project the interactions - again 
using orthology data - back to the original species of interest. The 
output of the process is a list of PUTATIVE INTERACTORS of the initial 
gene set, plus several fields of ancillary data.
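
The three steps above can be sketched with in-memory lookup tables standing in for the orthology and PPI databases. Everything here (the sub name, the hash layout, the identifiers) is hypothetical and only meant to show the data flow; it is not the interface of the module.

```perl
use strict;
use warnings;

# Interolog walk over in-memory tables (hypothetical layout):
#   $fwd_orth  : gene id in species of interest => [orthologs elsewhere]
#   $ppi       : ortholog id => [known interactors of that ortholog]
#   $back_orth : interactor id => [orthologs back in species of interest]
sub interolog_walk {
    my ($genes, $fwd_orth, $ppi, $back_orth) = @_;
    my %putative;    # gene id => { putative interactor => 1 }
    for my $gene (@$genes) {
        for my $orth (@{ $fwd_orth->{$gene} || [] }) {             # step 1
            for my $partner (@{ $ppi->{$orth} || [] }) {            # step 2
                for my $back (@{ $back_orth->{$partner} || [] }) {  # step 3
                    $putative{$gene}{$back} = 1;
                }
            }
        }
    }
    my %result;
    $result{$_} = [ sort keys %{ $putative{$_} } ] for keys %putative;
    return \%result;
}

# Toy data: orthC has no ortholog back in the species of interest.
my $fwd_orth  = { geneA => ['orthA'] };
my $ppi       = { orthA => ['orthB', 'orthC'] };
my $back_orth = { orthB => ['geneB'] };

my $result = interolog_walk(['geneA'], $fwd_orth, $ppi, $back_orth);
for my $gene (sort keys %$result) {
    print "$gene\t", join(',', @{ $result->{$gene} }), "\n";
}
```

In a real run each table lookup would be a database query, and the ancillary data fields would travel alongside the identifiers.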

====================

Given the scope and the focus of the project, I would imagine that 
viable alternatives for the namespace might be

Bio::Orthology::InterologWalk
Bio::InterologMap

or maybe
Interolog::Map
Orthology::Map
Orthology::InterologMap

There are no similar projects as far as I could see so I shouldn't run 
the risk of overlapping namespaces. Still I would love to know your 
informed opinion about it.

best,
Giuseppe


REFERENCES
[1] Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, 
Vidal M, Gerstein M. Annotation transfer between genomes: 
protein-protein interologs and protein-DNA regulogs. Genome Research 
2004 Jun;14(6):1107-18.

[2] Wiles AM, Doderer M, Ruan J, Gu T-T, Ravi D, Blackman BA, Bishop AJR. 
"Building and Analyzing Protein Interactome Networks by Cross-species 
Comparisons." BMC Systems Biology 2010, 4:36 - PMID: 20353594

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

From David.Messina at sbc.su.se  Wed Aug 18 12:52:58 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 18 Aug 2010 18:52:58 +0200
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <4C6BF4BD.5010200@sms.ed.ac.uk>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
Message-ID: <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>

Hi Giuseppe,

Sounds really interesting - thanks for posting this.

> Bio::Orthology::InterologWalk

I vote for this name, or in any case something with Bio:: as the top-level namespace since it's a biology-related package.

I like that you're providing a lot of background and information about the project in the documentation. However, the USAGE section should give information about how to use the module, with example code. You can look at other modules on CPAN (or in BioPerl) to see the conventions for writing documentation.

Also, from what you wrote, it sounds like this might be a pipeline or a script rather than a module per se, or perhaps a script and a set of modules. It would be helpful to clarify in your documentation (if you haven't already) how exactly things are organized (and of course example code will help with that, too).


Hope that's helpful, and let us know when you've got it up on CPAN so we can try it out!


Dave


From cjfields at illinois.edu  Wed Aug 18 14:24:16 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 18 Aug 2010 13:24:16 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <AANLkTinsqQCpybg6MUzTwqNuKMn=kJMV4pL64GXwAOkG@mail.gmail.com>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
	<8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
	<D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>
	<8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>
	<D1CC1B9C-36A7-4427-9100-AE5C85C5E965@illinois.edu>
	<AANLkTinsqQCpybg6MUzTwqNuKMn=kJMV4pL64GXwAOkG@mail.gmail.com>
Message-ID: <C385563A-9724-4045-B5A2-7F28A5CB897A@illinois.edu>

Okay, will file this as a bug.  Thanks!

chris

On Aug 18, 2010, at 1:46 AM, David Breimann wrote:

> Dear Chris's,
> 
> I tested the updated version on multiple genomes that previously
> returned errors (for future reference: NC_005707, NC_006578,
> NC_007103, NC_007104, NC_007106, NC_007107, NC_008573, NC_008762,
> NC_008763, NC_008785, NC_009457, NC_012040). The script now ends
> normally on all of them. However, as you mentioned, the result GFF3
> file does not comply with GFF3 specifications for circular genomes.
> This in turn causes some unexpected results in other applications.
> 
> Best,
> Dave
> 
> On Wed, Aug 18, 2010 at 6:42 AM, Chris Fields <cjfields at illinois.edu> wrote:
>> Chris, David,
>> 
>> The branch is now merged back to trunk.  David, let us know if this helps.
>> 
>> chris (f)
>> 
>> On Aug 17, 2010, at 2:24 PM, Chris Fields wrote:
>> 
>>> On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote:
>>> 
>>>> You can merge this in. It should allow David to proceed.
>>> 
>>> Will do.  I'll go ahead and delete the remote branch as well.
>>> 
>>>> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that it essentially preserves the genbank coordinates even when the origin is crossed:
>>>> 
>>>>      http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf
>>>> 
>>>> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length
>>> 
>>> Yes, that is a problem that needs to be addressed.  Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174.
>>> 
>>> chris
>>> 
>>>> On Aug 17, 2010, at 6:51 AM, Chris Fields wrote:
>>>> 
>>>>> I think Chris Mungall has a branch set up for this in bioperl:
>>>>> 
>>>>> http://github.com/bioperl/bioperl-live/tree/circular
>>>>> 
>>>>> Is that correct?  Should we merge that code into the master branch?
>>>>> 
>>>>> chris
>>>>> 
>>>>> On Aug 17, 2010, at 8:44 AM, David Breimann wrote:
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> The following genbank has a gene that runs over the "end" of the
>>>>>> chromosome and into its "beginning", and the script generates an
>>>>>> error.
>>>>>> 
>>>>>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk
>>>>>> 
>>>>>> NC_005707 Unflattening error:
>>>>>> Details:
>>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>>> MSG: PROBLEM, SEVERITY==2
>>>>>> Ranges not in correct order. Strange ensembl genbank entry? Range:
>>>>>> [207497,208369] [1,687]
>>>>>> STACK: Error::throw
>>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473
>>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::problem
>>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952
>>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent
>>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842
>>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS
>>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713
>>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq
>>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532
>>>>>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023
>>>>>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506
>>>>>> -----------------------------------------------------------
>>>>>> 
>>>>>> Best,
>>>>>> Dave
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>> 
>>>> 
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cdavis at bcm.tmc.edu  Wed Aug 18 15:19:53 2010
From: cdavis at bcm.tmc.edu (Caleb Davis)
Date: Wed, 18 Aug 2010 14:19:53 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq question
Message-ID: <4C6C3259.4060304@bcm.tmc.edu>

Hello, thank you for bioperl!

I am getting discrepancies between the online bl2seq 
(www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi) and bioperl's 
implementation, and I'm not sure why. I'm seeing a desired behavior 
through the web interface but can't replicate it locally. Specifically, 
online bl2seq aligns across a 1 bp insertion in the subject whereas the 
local bl2seq just reports a shorter alignment.

Any ideas? Thanks again,
--Caleb

The desired parameter differences from default are -F F -W 7 (turn 
complexity filter off, word size = 7). Below I present the online and 
local results given the following input sequences:

 >consensus
GAGGATCCAGAATTCTC
 >FVFTF6N01A86BR
AACCCAATGTAAGGAAGCTAAGAACCTTGAAAAGAGGATACCAGAATTCTC

Here are the parameters and result I'm getting online:
Blast4-request ::= {
  body queue-search {
    program "blastn",
    service "plain",
    queries bioseq-set {
      seq-set {
        seq {
          id {
            local id 26297
          },
          descr {
            title "consensus",
            user {
              type str "CFastaReader",
              data {
                {
                  label str "DefLine",
                  data str ">consensus"
                }
              }
            }
          },
          inst {
            repr raw,
            mol na,
            length 17,
            seq-data ncbi2na '8A3520F740'H
          }
        }
      }
    },
    subject sequences {
      {
        id {
          local id 26299
        },
        descr {
          title "FVFTF6N01A86BR",
          user {
            type str "CFastaReader",
            data {
              {
                label str "DefLine",
                data str ">FVFTF6N01A86BR"
              }
            }
          }
        },
        inst {
          repr raw,
          mol na,
          length 51,
          seq-data ncbi2na '0543B0A09C205F80228C520F74'H
        }
      }
    },
    algorithm-options {
      {
        name "EvalueThreshold",
        value cutoff e-value { 1, 10, 1 }
      },
      {
        name "UngappedMode",
        value boolean FALSE
      },
      {
        name "PercentIdentity",
        value real { 0, 10, 0 }
      },
      {
        name "HitlistSize",
        value integer 100
      },
      {
        name "EffectiveSearchSpace",
        value big-integer 0
      },
      {
        name "DbLength",
        value big-integer 0
      },
      {
        name "WindowSize",
        value integer 0
      },
      {
        name "DustFiltering",
        value boolean FALSE
      },
      {
        name "RepeatFiltering",
        value boolean FALSE
      },
      {
        name "MaskAtHash",
        value boolean TRUE
      },
      {
        name "MismatchPenalty",
        value integer -3
      },
      {
        name "MatchReward",
        value integer 2
      },
      {
        name "GapOpeningCost",
        value integer 5
      },
      {
        name "GapExtensionCost",
        value integer 2
      },
      {
        name "StrandOption",
        value strand-type both-strands
      },
      {
        name "WordSize",
        value integer 7
      }
    },
    format-options {
      {
        name "Web_JobTitle",
        value string "consensus"
      },
      {
        name "Web_BlastSpecialPage",
        value string "blast2seq"
      }
    }
  }
}

 >lcl|30439 FVFTF6N01A86BR
Length=51


 Score = 24.7 bits (26),  Expect = 2e-05
 Identities = 17/18 (94%), Gaps = 1/18 (5%)
 Strand=Plus/Plus

Query  1   GAGGAT-CCAGAATTCTC  17
           |||||| |||||||||||
Sbjct  34  GAGGATACCAGAATTCTC  51

Here's the output from a local search (I changed the expect to 5.0 just 
to prove to myself that some parameters are getting through OK):
my @params = (-program => 'blastn', -outfile => 'bl2seq.out',
              -FILTER => 'F', -WORDSIZE => 7, -expect => 5.0);
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
# consensus vs. FVFTF6N01A86BR
my $bl2seq_report = $factory->bl2seq($cons_seqobj, $single_seqobj);
print Dumper $bl2seq_report->next_result;

$VAR1 = bless( {
  '_inclusion_threshold' => undef,
  '_queryacc' => 'adapter_consensus',
  '_iteration_index' => 0,
  '_iteration_count' => 1,
  '_hits' => [],
  '_hitindex' => 0,
  '_querylength' => '17',
  '_querydesc' => '',
  '_iterations' => [
    bless( {
      '_oldhits_not_below_threshold' => [],
      '_newhits_unclassified' => [],
      '_number' => 1,
      '_oldhits_newly_below_threshold' => [],
      '_hit_factory' => bless( {
        'interface' => 'Bio::Search::Hit::HitI',
        'type' => 'Bio::Search::Hit::BlastHit',
        '_loaded_types' => {
          'Bio::Search::Hit::BlastHit' => 1
        },
        '_root_verbose' => 0
      }, 'Bio::Factory::ObjectFactory' ),
      '_newhits_below_threshold' => [
        {
          '-algorithm' => 'BLASTN',
          '-description' => '',
          '-length' => '51',
          '-query_len' => '17',
          '-hsp_factory' => bless( {
            'interface' => 'Bio::Search::HSP::HSPI',
            'type' => 'Bio::Search::HSP::GenericHSP',
            '_loaded_types' => {
              'Bio::Search::HSP::GenericHSP' => 1
            },
            '_root_verbose' => 0
          }, 'Bio::Factory::ObjectFactory' ),
          '-name' => 'FVFTF6N01A86BR',
          '-rank' => 1,
          '-hsps' => [
            {
              '-query_start' => '7',
              '-algorithm' => 'BLASTN',
              '-hit_seq' => 'ccagaattctc',
              '-hit_length' => '51',
              '-query_length' => '17',
              '-query_desc' => '',
              '-query_frame' => 0,
              '-rank' => 1,
              '-hit_desc' => '',
              '-query_end' => '17',
              '-hit_name' => 'FVFTF6N01A86BR',
              '-identical' => '11',
              '-query_name' => 'adapter_consensus',
              '-evalue' => '1e-04',
              '-score' => '11',
              '-conserved' => '11',
              '-hit_frame' => 0,
              '-hsp_length' => '11',
              '-query_seq' => 'ccagaattctc',
              '-hit_start' => '41',
              '-homology_seq' => '|||||||||||',
              '-hit_end' => '51',
              '-bits' => '22.3'
            },
            {
              '-query_start' => '9',
              '-algorithm' => 'BLASTN',
              '-hit_seq' => 'agaattct',
              '-hit_length' => '51',
              '-query_length' => '17',
              '-query_desc' => '',
              '-query_frame' => 0,
              '-rank' => 2,
              '-hit_desc' => '',
              '-query_end' => '16',
              '-hit_name' => 'FVFTF6N01A86BR',
              '-identical' => '8',
              '-query_name' => 'adapter_consensus',
                                                                                              
'-evalue' => '0.007',
                                                                                              
'-score' => '8',
                                                                                              
'-conserved' => '8',
                                                                                              
'-hit_frame' => 0,
                                                                                              
'-hsp_length' => '8',
                                                                                              
'-query_seq' => 'agaattct',
                                                                                              
'-hit_start' => '50',
                                                                                              
'-homology_seq' => '||||||||',
                                                                                              
'-hit_end' => '43',
                                                                                              
'-bits' => '16.4'
                                                                                            
}
                                                                                          
],
                                                                               
'-accession' => 'FVFTF6N01A86BR',
                                                                               
'-significance' => '1e-04'
                                                                             
}
                                                                           
],
                                             '_root_verbose' => 0,
                                             '_newhits_not_below_threshold' => [],
                                             '_oldhits_below_threshold' => []
                                           }, 'Bio::Search::Iteration::GenericIteration' )
                                  ],
                 '_hit_factory' => $VAR1->{'_iterations'}[0]{'_hit_factory'},
                 '_statistics' => bless( {
                                           'stats' => {
                                                        'S1' => '4',
                                                        'S1_bits' => '8.4',
                                                        'kappa_gapped' => '0.711',
                                                        'X3_bits' => '99.1',
                                                        'X1' => '4',
                                                        'lambda_gapped' => '1.37',
                                                        'X2' => '15',
                                                        'S2' => '4',
                                                        'seqs_better_than_cutoff' => '1',
                                                        'Hits_to_DB' => '5',
                                                        'num_extensions' => '2',
                                                        'num_successful_extensions' => '2',
                                                        'X1_bits' => '7.9',
                                                        'X3' => '50',
                                                        'dbentries' => '1',
                                                        'entropy_gapped' => '1.31',
                                                        'X2_bits' => '29.7',
                                                        'S2_bits' => '8.4'
                                                      }
                                         }, 'Bio::Search::GenericStatistics' ),
                 '_algorithm' => 'BLASTN',
                 '_parameters' => bless( {
                                           'params' => {
                                                         'gapext' => '2',
                                                         'matrix' => 'blastn matrix:1 -3',
                                                         'expect' => '5.0',
                                                         'allowgaps' => 'yes',
                                                         'gapopen' => '5'
                                                       }
                                         }, 'Bio::Tools::Run::GenericParameters' ),
                 '_root_verbose' => 0,
                 '_queryname' => 'adapter_consensus'
               }, 'Bio::Search::Result::BlastResult' );


From David.Messina at sbc.su.se  Wed Aug 18 18:32:37 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 19 Aug 2010 00:32:37 +0200
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq
	question
In-Reply-To: <4C6C3259.4060304@bcm.tmc.edu>
References: <4C6C3259.4060304@bcm.tmc.edu>
Message-ID: <E8F0F7A7-BC33-4E37-8AAB-75A9470E82A5@sbc.su.se>

Hi Caleb,

The first thing I would do is take BioPerl out of the equation and test your local bl2seq on the command line. If you get the same output locally as on the web version, then there is a problem with BioPerl. If you're still seeing a discrepancy between the web and your local run, then this isn't a problem with BioPerl.

Just to be clear, BioPerl doesn't "implement" any of the BLAST programs; it is simply a wrapper around the programs that you download from NCBI. That doesn't mean BioPerl isn't at fault, of course, just that it's important to isolate the problem carefully.

The most common reasons for these discrepancies are:

- different version numbers of BLAST

2.2.21? 2.2.22? Is it the same on the web as locally?

- similarly, different implementations of BLAST

NCBI's old BLAST suite is now deprecated and replaced with BLAST+. All of the online BLAST web queries are BLAST+ now; are you running BLAST+ locally? (There's also a separate BioPerl wrapper for BLAST+ called Bio::Tools::Run::BlastPlus.)

- hidden "default" parameters

Even though you're only changing a handful of parameters, the defaults (particularly on the web version) may be different than what you expect.

In your case, it looks like on the web version, match score is 2 and mismatch is -3. However, in the local version I believe match score is 1 and a mismatch is -3.

See this line in the params block near the end of your post:

	'matrix' => 'blastn matrix:1 -3',
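
To make the effect of those defaults concrete, here is a small sketch in plain Perl (no BioPerl needed) that recomputes the raw score of an ungapped nucleotide alignment under both scoring schemes. The function name ungapped_raw_score is made up for this example, and it deliberately covers only the raw score: bit scores and E-values also depend on the statistical parameters, which this sketch does not model. The sequence is the 11-base HSP from the dump, whose homology line is all '|'.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Raw score of an ungapped nucleotide alignment:
#   (number of matches * reward) + (number of mismatches * penalty)
# Hypothetical helper, not part of BioPerl or BLAST.
sub ungapped_raw_score {
    my ($query, $hit, $reward, $penalty) = @_;
    die "aligned strings differ in length\n"
        if length($query) != length($hit);
    my $score = 0;
    for my $i (0 .. length($query) - 1) {
        my $same = lc(substr($query, $i, 1)) eq lc(substr($hit, $i, 1));
        $score += $same ? $reward : $penalty;
    }
    return $score;
}

# The 11-base HSP from the dump: every position is identical.
my $query_seq = 'ccagaattctc';
my $hit_seq   = 'ccagaattctc';

# Local default reported in the dump: match 1, mismatch -3.
printf "reward 1, penalty -3: %d\n",
    ungapped_raw_score($query_seq, $hit_seq, 1, -3);   # 11, as in '-score'

# Scheme that the web form appears to use: match 2, mismatch -3.
printf "reward 2, penalty -3: %d\n",
    ungapped_raw_score($query_seq, $hit_seq, 2, -3);   # 22
```

The same alignment scores twice as high under the second scheme, which is enough to change which short hits are reported.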


Dave


From sidd.basu at gmail.com  Wed Aug 18 20:28:32 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Wed, 18 Aug 2010 19:28:32 -0500
Subject: [Bioperl-l]  Re: [RFC] Interolog::Walk
In-Reply-To: <4C6BF4BD.5010200@sms.ed.ac.uk>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
Message-ID: <20100819002830.GA366@Macintosh-235.local>

Hi, 

On Wed, 18 Aug 2010, Giuseppe Gallone wrote:

> Hello BioPerl community - I've written a new module called Interolog::Walk 
> that I'm planning to put on CPAN. I would be grateful if you might take a 
> look at the brief description I attached and tell me what you think. I'll 
> be more than happy to post further details should the module be of some 
> interest for someone.
>
> Also, I am not totally sure I have the correct name for it. This is 
> my first module and it would be great if you could advise on naming it 
> appropriately. Hopefully the following description will give an idea of 
> what it does.
>
> ===================
>
>
> NAME
>     Interolog::Walk - Retrieve, score and visualize putative 
> Protein-Protein Interactions through the orthology-walk method
>
> DESCRIPTION
>     A common activity in computational biology is to mine protein-protein 
> interactions from publicly available databases in order to build 
> Protein-Protein Interaction (PPI) datasets.
> In many instances, however, the number of experimentally obtained, annotated 
> PPIs is very small and it would be helpful to enrich the experimental 
> dataset with high-quality, computationally inferred PPIs. Such 
> computationally obtained datasets can extend, support or enrich experimental 
> PPI datasets, and are of crucial importance in high-throughput gene 
> prioritization studies, i.e. to drive hypotheses and restrict the 
> dimensionality of many gene functional discovery problems.
> This Perl Module, Interolog::Walk, is aimed at building putative PPI 
> datasets on the basis of a number of comparative biology paradigms: the 
> module implements a collection of computational biology algorithms based on 
> the concept of "orthology projection". If interacting proteins A and B in 
> organism X have orthologs A' and B' in organism Y, under certain conditions 
> one can assume that the interaction will be conserved in organism Y, i.e. 
> the A-B interaction can be "projected through the orthologies" to obtain a 
> putative A'-B' interaction. The pair of interactions (A-B) and (A'-B') are 
> named "Interologs" (see for instance [1] and [2]).
>
> Interolog::Walk collects, analyses and collates gene orthology data 
> provided by the Ensembl Consortium (www.ensembl.org) as well as PPI data 
> provided by EBI Intact (http://www.ebi.ac.uk/intact/). It provides the user 
> with the possibility of rating the quality and reliability of the putative 
> interactions collected, by means of confidence scores, and optionally 
> outputs network representations of the datasets, compatible with the 
> biological network representation standard, Cytoscape.

Sounds interesting. I am currently playing around with a Perl-based webapp for displaying interactomes
using cytoscapeweb. Depending on how your design pans out, I would be happy to
use your module as a backend analysis layer. On a related note, you
might want to have a look at bioperl-network; if there is any overlap, it
might be worth contributing there.

-siddhartha

>
> USAGE
> In order to carry out an interolog walk we start with a set of gene 
> identifiers in one organism of interest. We query those ids against a 
> number of comparative biology databases to retrieve a list of orthologues 
> for each gene id of interest, in one or more species.
> In the following step we rely  on PPI databases to retrieve the list of 
> available interactors for the protein ids obtained. The output at this 
> stage consists of a list of interactors of the orthologues of the initial 
> gene set, plus several fields of ancillary data.
> In the last step of the process we  project the interactions - again using 
> orthology data - back to the original species of interest. The output of 
> the process is a list of PUTATIVE INTERACTORS of the initial gene set, plus 
> several fields of ancillary data.
>
> ====================
>
> Given the scope and the focus of the project, I would imagine that viable 
> alternatives for the namespace might be
>
> Bio::Orthology::InterologWalk
> Bio::InterologMap
>
> or maybe
> Interolog::Map
> Orthology::Map
> Orthology::InterologMap
>
> There are no similar projects as far as I could see so I shouldn't run the 
> risk of overlapping namespaces. Still I would love to know your informed 
> opinion about it.
>
> best,
> Giuseppe
>
>
>
> REFERENCES
> [1] Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, 
> Vidal M, Gerstein M. Annotation transfer between genomes: protein-protein 
> interologs and protein-DNA regulogs. Genome Research 2004 
> Jun;14(6):1107-18.
>
> [2]Wiles AM, Doderer M, Ruan J, Gu T-T, Ravi D, Blackman BA, Bishop AJR. 
> "Building and Analyzing Protein Interactome Networks by Cross-species 
> Comparisons." BMC Systems Biology 2010, 4:36 - PMID: 20353594
>
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From dan.kortschak at adelaide.edu.au  Wed Aug 18 22:15:03 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 19 Aug 2010 11:45:03 +0930
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
Message-ID: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>

Hi Everyone,

I'm wanting to set up a persistent data store for some of my work and am
in the process of choosing parts for my system. From my brief look
around I think I'd like to use BioSQL (next best choice being Chado -
but BioPerl bindings in bioperl-db for BioSQL being the decider here),
but have noticed comments some time back that bioperl-db and PostgreSQL
8.3 (my preferred engine - though MySQL is possible, but makes the whole
system messier) don't play well together.

What is the status of the casting expectation conflict between
bioperl-db and Pg8.3? The scripts are run with safe data, so
placeholders aren't strictly crucial (though speed may be an issue?) and
`$dbh->{pg_server_prepare} = 0;' seems like it could be an option.
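
For reference, a minimal sketch of that option, assuming a plain DBI/DBD::Pg connection. The helper name disable_server_prepare and the BIOSQL_DSN, BIOSQL_USER and BIOSQL_PASS environment variables are made up for illustration; pg_server_prepare itself is a DBD::Pg handle attribute. How bioperl-db obtains or accepts its handle is not shown here.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Switch off server-side prepared statements on a DBD::Pg handle, so that
# placeholders are interpolated by the driver before the SQL is sent.
# Hypothetical helper; it only sets the attribute named in the text above.
sub disable_server_prepare {
    my ($dbh) = @_;
    $dbh->{pg_server_prepare} = 0;
    return $dbh;
}

# Only attempt a connection when a DSN is supplied, for example:
#   BIOSQL_DSN='dbi:Pg:dbname=biosql' perl this_script.pl
if (my $dsn = $ENV{BIOSQL_DSN}) {
    require DBI;
    my $dbh = DBI->connect($dsn, $ENV{BIOSQL_USER}, $ENV{BIOSQL_PASS},
                           { RaiseError => 1, AutoCommit => 1 });
    disable_server_prepare($dbh);
    print "server-side prepare is ",
          ($dbh->{pg_server_prepare} ? "on" : "off"), "\n";
    $dbh->disconnect;
}
```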

Can anybody provide any advice on this issue?

thanks
Dan Kortschak


From cjfields at illinois.edu  Wed Aug 18 23:29:36 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 18 Aug 2010 22:29:36 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq
	question
In-Reply-To: <E8F0F7A7-BC33-4E37-8AAB-75A9470E82A5@sbc.su.se>
References: <4C6C3259.4060304@bcm.tmc.edu>
	<E8F0F7A7-BC33-4E37-8AAB-75A9470E82A5@sbc.su.se>
Message-ID: <194D43EC-A44C-450A-B57B-EC379DBCB935@illinois.edu>

Wouldn't surprise me too much if the parameters are not set the same; IIRC the main BLAST URL API and the online NCBI Web-BLAST have different default settings.

chris

On Aug 18, 2010, at 5:32 PM, Dave Messina wrote:

> Hi Caleb,
> 
> The first thing I would do is take BioPerl out of the equation and test your local bl2seq on the command line. If you get the same output locally as on the web version, then there is a problem with BioPerl. If you're still seeing a discrepancy between the web and your local run, then this isn't a problem with BioPerl.
> 
> Just to be clear, BioPerl doesn't "implement" any of the BLAST programs; it is simply a wrapper around the programs that you download from NCBI. That doesn't mean BioPerl isn't at fault, of course, just that it's important to isolate the problem carefully.
> 
> The most common reasons for these discrepancies are:
> 
> - different version numbers of BLAST
> 
> 2.2.21? 2.2.22? Is it the same on the web as locally?
> 
> - similarly, different implementations of BLAST
> 
> NCBI's old BLAST suite is now deprecated and replaced with BLAST+. All of the online BLAST web queries are BLAST+ now; are you running BLAST+ locally? (There's also a separate BioPerl wrapper for BLAST+ called Bio::Tools::Run::BlastPlus.)
> 
> - hidden "default" parameters
> 
> Even though you're only changing a handful of parameters, the defaults (particularly on the web version) may be different than what you expect.
> 
> In your case, it looks like on the web version, match score is 2 and mismatch is -3. However, in the local version I believe match score is 1 and a mismatch is -3.
> 
> See this line in the params block near the end of your post:
> 
> 	'matrix' => 'blastn matrix:1 -3',
> 
> 
> 
> Dave
> 
> 


From hlapp at drycafe.net  Thu Aug 19 01:48:19 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 01:48:19 -0400
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>

Hi Dan,

the casting isn't an issue anymore, I think. (And even if it were,  
there is actually a small script that brings back the casts that were  
built into 8.2.) Have you found an example where it still is?

	-hilmar

On Aug 18, 2010, at 10:15 PM, Dan Kortschak wrote:

> Hi Everyone,
>
> I'm wanting to set up a persistent data store for some of my work and am
> in the process of choosing parts for my system. From my brief look
> around I think I'd like to use BioSQL (next best choice being Chado -
> but BioPerl bindings in bioperl-db for BioSQL being the decider here),
> but have noticed comments some time back that bioperl-db and PostgreSQL
> 8.3 (my preferred engine - though MySQL is possible, but makes the whole
> system messier) don't play well together.
>
> What is the status of the casting expectation conflict between
> bioperl-db and Pg8.3? The scripts are run with safe data, so
> placeholders aren't strictly crucial (though speed may be an issue?) and
> `$dbh->{pg_server_prepare} = 0;' seems like it could be an option.
>
> Can anybody provide any advice on this issue?
>
> thanks
> Dan Kortschak
>

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From dan.kortschak at adelaide.edu.au  Thu Aug 19 01:54:03 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 19 Aug 2010 15:24:03 +0930
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
Message-ID: <1282197243.14127.27.camel@zoidberg.mbs.adelaide.edu.au>

Hi Hilmar,

No, I haven't found any problems, just hoping to avoid them by prior
research.

thanks
Dan

On Thu, 2010-08-19 at 01:48 -0400, Hilmar Lapp wrote:
> Hi Dan,
> 
> the casting isn't an issue anymore, I think. (And even if it were,
> there is actually a small script that brings back the casts that were
> built into 8.2.) Have you found an example where it still is?
> 
>         -hilmar


From biopython at maubp.freeserve.co.uk  Thu Aug 19 06:01:03 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 19 Aug 2010 11:01:03 +0100
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
Message-ID: <AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>

On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp <hlapp at drycafe.net> wrote:
> Hi Dan,
>
> the casting isn't an issue anymore, I think. (And even if it were, there is
> actually a small script that brings back the casts that were built into
> 8.2.) Have you found an example where it still is?
>
>        -hilmar

Hi Hilmar,

Do the bioperl-db bindings for BioSQL on PostgreSQL still require those
extra rules in the schema?
http://bugzilla.open-bio.org/show_bug.cgi?id=2839

Peter


From G.Gallone at sms.ed.ac.uk  Thu Aug 19 06:45:36 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Thu, 19 Aug 2010 11:45:36 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
Message-ID: <4C6D0B50.4050902@sms.ed.ac.uk>

Hi Dave,

thank you very much for your helpful comments.

Regarding the module name: I will follow your advice and avoid 
proposing a new root namespace during the module registration. As for the 
second level, I haven't been able to find anything related to 
homology/orthology, therefore I'm not sure whether I should go for

Bio::Orthology::InterologMap
or
Bio::Homology::InterologMap

The first one being maybe a bit more specific. I might also expand 
further as in

Bio::Orthology::Interolog::Map,

just in case somebody else finds other interesting applications for the 
Interolog concept and would like to "plug in" their own contribution. 
Would this make any sense?

I also appreciate your comments on the documentation. The one I provided 
is actually not the full pod I was planning to include, but rather an 
extract. What I have at the moment is a description, for each method, in 
the following form:

=====================================
    remove_duplicate_rows
      Usage     : $RC = InterologMap::remove_duplicate_rows(
                      input_handle  => $dbh,
                      output_handle => $out_data,
                      header        => 'standard',
                  );
      Purpose   : Removes duplicate entries from a TSV data file. Occasionally,
                  Intact can return duplicate entries; this routine makes sure
                  no such duplicates are kept. A new data file is built and the
                  number of unique data rows is updated.
      Returns   : success/error
      Argument  : database handle to the input file, filehandle to the output
                  file, header type. Header type is one of the following:
                  - "standard": when the routine is used to clean up an
                    interolog walk file (the header will be longer)
                  - "direct":   when the routine is used to clean up a
                    file of real db interactions (the header is shorter)
                  - no field provided: default is standard
      Throws    : -
      Comment   : Sample

      See Also  :
=======================================
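
To illustrate what the clean-up amounts to, here is a rough stand-alone sketch of removing duplicate rows while keeping the header and the original row order. It works on an in-memory list of lines rather than on a DBI handle, so it is not the routine documented above; the function name unique_rows and the column names are invented for the example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the header line followed by the unique data rows.
# The first occurrence of each row is kept, in its original position.
sub unique_rows {
    my @lines  = @_;
    my $header = shift @lines;
    my %seen;
    my @unique = grep { !$seen{$_}++ } @lines;
    return ($header, @unique);
}

# Invented sample data: three data rows, one of them repeated.
my @tsv = (
    "id_a\tid_b\tsource",
    "P1\tP2\tintact",
    "P3\tP4\tintact",
    "P1\tP2\tintact",
);

my ($header, @rows) = unique_rows(@tsv);
printf "%d unique data rows\n", scalar @rows;   # prints "2 unique data rows"
```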

On top of that, there is a DESCRIPTION, USAGE, and SYNOPSIS. The 
synopsis has some code with an example of typical usage of the module. 
Please take a look at this (attached below) and tell me what you think.

You mention that the description contains a lot of background 
information. Would you recommend reducing it, or placing it elsewhere?
I was considering writing a short tutorial in LaTeX as soon as 
possible anyway, to provide a "centralised" source of information for 
getting familiar with the module. Is that compatible with the CPAN rules?

As for your question on the structure of the module: you are indeed 
right, the idea when running the "orthology walk" is to create a 
pipeline of subroutines: there's a core set of subroutines meant to be 
called in strict sequence.
Each of these subroutines expects, as input, the output of the previous 
one. The input/output dataset is currently in the form of a TSV text 
file, which I process with the help of the DBI module (to be more 
specific, I use DBD::CSV).

While there's a certain flexibility regarding how to use the module, one 
core idea remains: in order to get the set of putative interactors, the 
user would have to call at least three basic routines:

(A)
=================
1) get_forward_orthologies(): this queries the initial gene list against 
one or more Ensembl dbs (using the Ensembl Perl API) and retrieves their 
orthologues, plus a number of ancillary data fields (mainly conservation 
data, e.g. dN/dS ratio, distance from ancestor, orthology type, etc.)

2) get_interactors(): this queries the orthology list built in the 
previous stage against a PSICQUIC-enabled PPI db using REST (at the 
moment I only query the EBI Intact DB, but it should be easy to expand 
this and query all PSICQUIC-compatible PPI dbs transparently). This step 
will "fatten" the dataset built in (1) with the interactors of those 
orthologues, plus ancillary data (including lots of parameters 
describing the quality, nature and origin of the annotated interaction)

3) get_backward_orthologies(): this queries the interactor list built in 
the previous stage against one or more Ensembl dbs to find orthologues 
*back* in the original species. It also adds a number of supplementary 
data fields, just like in (1).
==================

At the end of this procedure the user will have a TSV file where each 
row contains a binary putative interaction plus (currently) 37 
supplementary data fields.
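
As a toy illustration of the three steps in (A), here is a sketch that replaces the Ensembl and Intact queries with small in-memory hashes. All identifiers (geneA, orthA, ...) and the function name interolog_walk are invented for the example; the real routines work on TSV files and carry many more data fields.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Invented lookup tables standing in for the database queries.
my %forward  = ( geneA => ['orthA'], geneB => ['orthB'] );   # species X -> Y
my %interact = ( orthA => ['orthB', 'orthC'] );              # PPIs in species Y
my %backward = ( orthB => ['geneB'], orthC => ['geneC'] );   # species Y -> X

# Walk: gene -> orthologue -> interactor of orthologue -> orthologue back
# in the original species. Returns a list of [gene, putative_interactor].
sub interolog_walk {
    my ($genes, $forward, $interact, $backward) = @_;
    my @putative;
    for my $gene (@$genes) {
        for my $orth (@{ $forward->{$gene} // [] }) {              # step 1
            for my $partner (@{ $interact->{$orth} // [] }) {      # step 2
                for my $back (@{ $backward->{$partner} // [] }) {  # step 3
                    push @putative, [ $gene, $back ];
                }
            }
        }
    }
    return @putative;
}

my @ppis = interolog_walk(['geneA'], \%forward, \%interact, \%backward);
print join("\t", @$_), "\n" for @ppis;
```

With this toy data the walk starting from geneA yields two putative partners, geneB and geneC; geneC was not in the starting set, which is the kind of newly discovered id mentioned below.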

One can then scan these results to check for duplicates, to compute 
counts, to see if we have discovered new gene ids that were not present 
in the original dataset (hopefully we have :) ).

Most importantly, one can then further process these results to do one 
or more of the following:
(B) compute a global confidence score to assess the reliability of 
each binary putative interaction
(C) extract the binary putative PPIs from the dataset and save them in a 
format compatible with Cytoscape, which makes the result easier to 
inspect visually: one can then apply network analysis tools to discover 
motifs, clusters, etc. The format I currently use is .SIF + attributes, 
as detailed in
http://cytoscape.wodaklab.org/wiki/Cytoscape_User_Manual/Network_Formats
(D) given the same initial gene list, build a dataset of REAL, 
experimentally obtained PPIs (without mapping through orthologies 
in other species). One can then compare this dataset with the putative 
dataset to see if and where the two overlap, what the intersection and 
the differences are, etc.
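
For step (C), a minimal sketch of turning binary interactions into SIF lines, assuming the simple "source, relationship type, target" layout of that format. The function name to_sif, the 'pp' edge label and the choice to treat A-B and B-A as the same edge are assumptions made for this example, not a description of what the module does.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Format binary interactions as tab-separated SIF lines:
#   <source>  <relationship type>  <target>
# Pairs are treated as undirected, so A-B and B-A give a single line.
sub to_sif {
    my ($pairs, $type) = @_;
    $type //= 'pp';
    my (%seen, @lines);
    for my $pair (@$pairs) {
        my ($x, $y) = sort @$pair;
        next if $seen{"$x\t$y"}++;
        push @lines, join("\t", $x, $type, $y);
    }
    return @lines;
}

# Invented identifiers; the second pair repeats the first in reverse.
my @pairs = ( ['geneA', 'geneB'], ['geneB', 'geneA'], ['geneA', 'geneC'] );
print "$_\n" for to_sif(\@pairs);
```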


To suggest ways of using the module I have written 4 sample 
scripts, which I will include in the distribution. Each script uses the 
module and chains its subroutines in a pipeline fashion, and does the 
following:

1) doInterologWalk.pl: runs the basic pipeline in (A)
2) doScores.pl: computes and adds confidence scores as explained in (B)
3) doNetworks.pl: computes SIF network + attributes as in (C)
4) getRealInteractions.pl: runs a pipeline to obtain real PPIs from the 
initial gene set, as in (D)

Hope I didn't make this too confusing. I would love to hear back from 
you and from anybody else that would like to provide feedback.

Cheers
Giuseppe

On 18/08/10 17:52, Dave Messina wrote:
> Hi Giuseppe,
>
> Sounds really interesting, thanks for posting this.
>
>> Bio::Orthology::InterologWalk
>
> I vote for this name, or in any case something with Bio:: as the top-level namespace since it's a biology-related package.
>
> I like that you're providing a lot of background and information about the project in the documentation. However, the USAGE section should give information about how to use the module, with example code. You can look at other modules on CPAN (or in BioPerl) to see the conventions for writing documentation.
>
> Also, from what you wrote, it sounds like this might be a pipeline or a script rather than a module per se, or perhaps a script and a set of modules. It would be helpful to clarify in your documentation (if you haven't already) how exactly things are organized (and of course example code will help with that, too).
>
>
> Hope that's helpful, and let us know when you've got it up on CPAN so we can try it out!
>
>
> Dave
>
>

NAME
     Interolog::Walk - Retrieve, score and visualize putative Protein-Protein
     Interactions through the orthology-walk method

SYNOPSIS
       use Interolog::Walk;

     First, obtain Intact Interactions for the dataset (see example in
     "getDirectInteractions.pl"):

       #get a registry from Ensembl
       my $registry = InterologMap::setup_ensembl_adaptor(
           connect_to_db  => $ensembl_db,
           source_species => $sourceorg,
           verbose        => 1,
       );


       #query actual interactions
       $RC = InterologMap::Direct::get_direct_interactions(
           registry       => $registry,
           source_species => $sourceorg,
           input_path     => $in_path,
           output_path    => $out_path,
           url            => $url,
       );

     do some postprocessing (see "do_counts()" and "extract_unseen_ids()" )
     and then do the actual interolog walk on the dataset with the following
     sequence of three methods.

     get orthologues of starting set:

       $RC = InterologMap::get_forward_orthologies(
           registry    => $registry,
           ensembl_db  => $ensembl_db,
           input_path  => $in_path,
           output_path => $out_path,
           source_org  => $sourceorg,
           dest_org    => $destorg,
       );

     add interactors of orthologues found by "get_forward_orthologies()":

       $RC = InterologMap::get_interactions(input_path    => $in_path,
                                            output_path   => $out_path,
                                            url           => $url,
                                            url_global    => $url_global,
                                            );

     add orthologues of interactors found by "get_interactions()":

       $RC = InterologMap::get_backward_orthologies(
           registry    => $registry,
           ensembl_db  => $ensembl_db,
           input_path  => $in_path,
           output_path => $out_path,
           error_path  => $err_path,
           source_org  => $sourceorg,
       );

     do some postprocessing (see "remove_duplicate_rows()", "do_counts()",
     "extract_unseen_ids()") and then optionally compute a composite score
     for the putative interactions obtained:

        $RC = InterologMap::Scores::compute_scores(
            input_path      => $in_path,
            score_path      => $score_path,
            output_path     => $out_path,
            term_graph      => $onto_graph,
            M_IT_SCORE      => $M_IT,
            M_DM_SCORE      => $M_DM,
            M_ME_DM_SCORE   => $M_MDM,
            M_ME_TAXA_SCORE => $M_MTAXA,
        );

     get some networks and network attributes which you can then visualise
     with Cytoscape:

        $RC = InterologMap::Networks::do_network(registry         => $registry,
                                                 db               => $ensembl_db,
                                                 input_path       => $in_path,
                                                 output_path      => $out_path,
                                                 source_org       => $sourceorg,
                                                 orthology_type   => $orthtype,
                                                 );

        $RC = InterologMap::Networks::do_attributes(registry      => $registry,
                                                    input_path    => $in_path,
                                                    output_path   => $out_path,
                                                    source_org    => $sourceorg,
                                                    label_type    => 'external name'
                                                    );

     *The synopsis above only lists the major methods and parameters.*

DESCRIPTION
     A common activity in computational biology is to mine protein-protein
     interactions from publicly available databases to build *Protein-Protein
     Interaction* (PPI) datasets. In many instances, however, experimentally
     obtained, annotated PPIs are scarce, and it would be helpful to enrich
     the experimental dataset with high-quality, computationally inferred
     PPIs. Such computationally obtained datasets can extend, support or
     enrich experimental PPI datasets, and are of crucial importance in
     high-throughput gene prioritization studies, i.e. to drive hypotheses
     and restrict the dimensionality of functional discovery problems. This
     Perl module, Interolog::Walk, is aimed at building putative PPI
     datasets on the basis of a number of comparative biology paradigms: the
     module implements a collection of computational biology algorithms
     based on the concept of "orthology projection". If interacting proteins
     A and B in organism X have orthologs A' and B' in organism Y, under
     certain conditions one can assume that the interaction will be
     conserved in organism Y, i.e. the A-B interaction can be "projected
     through the orthologies" to obtain a putative A'-B' interaction. The
     pair of interactions (A-B) and (A'-B') are named "Interologs".

     Interolog::Walk collects, analyses and collates gene orthology data
     provided by the Ensembl Consortium as well as PPI data provided by EBI
     IntAct. It lets the user rate the quality and reliability of the
     putative interactions collected by means of confidence scores, and
     optionally outputs network representations of the datasets that are
     compatible with Cytoscape, a widely used biological network
     visualisation tool.

BASIC USAGE
   Rationale behind "Interolog::Walk".
                                   \EBI Intact API/
              .--------------.            |             .-------------.
          (2) | A(e.g. mouse)|<------------------------>|   B(mouse)  |  (3)
              `--------------'          <PPI>           `-------------'
                     ^                                         |
        /Ensembl\    | <Orthology>                 <Orthology> | \ Ensembl /
       / Compara \   |                                         |  \Compara/
      /    Api    \  |                                         |   \ Api /
                     |                                         |
              .--------------.                           .-------------.
          (1) | A'(e.g. fly) |. . . . . . . . . . . . .  |   B'(fly)   | (4)
              `--------------'     [SCORED]PUTATIVE PPI  `-------------'
                              (Output of Interolog::Walk)

     In order to carry out an interolog walk we start with a set of gene
     identifiers in one organism of interest (1). We query those ids against
     a number of comparative biology databases to retrieve a list of
     orthologues for each gene id of interest, in one or more species (2).
     In the next step we rely instead on PPI databases to retrieve the list
     of available interactors for the protein ids obtained in (2). The
     output at this stage consists of a list of interactors of the
     orthologues of the initial gene set, plus several fields of ancillary
     data (whose importance will be explained later) (3). In the last step
     of this process we project the interactions in (3), again using
     orthology data, back to the original species of interest (4). The
     output of the process is a list of PUTATIVE INTERACTORS of the initial
     gene set, plus several fields of ancillary data.
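
     The four steps above can be sketched on toy data. This is only an
     illustration of the idea, not the module's implementation: the
     identifiers, the lookup tables and the interolog_walk() helper are all
     invented here, and in-memory hashes stand in for the Ensembl Compara
     and IntAct queries.

```perl
use strict;
use warnings;

# Toy data, invented for illustration only (not real Ensembl or IntAct records).
# Orthology table: fly gene id => list of mouse orthologue ids.
my %fly_to_mouse = (
    flyA => ['mouseA'],
    flyB => ['mouseB'],
    flyC => ['mouseC1', 'mouseC2'],
);

# Known PPIs in mouse: id => list of interactors (stored one way only,
# to keep the example short).
my %mouse_ppi = (
    mouseA  => ['mouseB', 'mouseC2'],
    mouseC1 => ['mouseX'],              # mouseX has no fly orthologue
);

# Invert the orthology table for the backward step.
my %mouse_to_fly;
for my $fly (keys %fly_to_mouse) {
    push @{ $mouse_to_fly{$_} }, $fly for @{ $fly_to_mouse{$fly} };
}

# Hypothetical helper: walk from seed ids to putative interactors.
sub interolog_walk {
    my @seed = @_;
    my %putative;
    for my $gene (@seed) {                                          # step (1)
        for my $orth (@{ $fly_to_mouse{$gene} || [] }) {            # step (2)
            for my $partner (@{ $mouse_ppi{$orth} || [] }) {        # step (3)
                for my $back (@{ $mouse_to_fly{$partner} || [] }) { # step (4)
                    # Remember which known interaction supports the pair.
                    $putative{"$gene\t$back"} = "$orth-$partner";
                }
            }
        }
    }
    return \%putative;
}

my $result = interolog_walk('flyA', 'flyC');
for my $pair (sort keys %$result) {
    print "$pair\tvia $result->{$pair}\n";
}
```

     Interactions whose partner has no orthologue in the species of
     interest (mouseX here) simply drop out at step (4).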

     "Interolog::Walk" provides three main functions to carry out the basic
     walk, "get_forward_orthologies()", "get_interactions()" and
     "get_backward_orthologies()". These functions must be called strictly
     sequentially in your script, as they process, analyse and attach data
     to the output in a pipeline-like fashion, i.e. each one processes the
     output of the preceding function.
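
     A minimal sketch of why the order matters, assuming each stage reads
     the file written by the previous one and appends a column to every
     row. The run_stage() helper, the file names and the annotation
     callbacks are hypothetical stand-ins, not part of the module:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical stand-in for one walk stage: reads tab-separated rows from
# $in, appends one column produced by $annotate, and writes them to $out.
sub run_stage {
    my ($in, $out, $annotate) = @_;
    open my $ifh, '<', $in  or die "Cannot read $in: $!";
    open my $ofh, '>', $out or die "Cannot write $out: $!";
    while (my $row = <$ifh>) {
        chomp $row;
        print {$ofh} join("\t", $row, $annotate->($row)), "\n";
    }
    close $ofh or die "Cannot close $out: $!";
    close $ifh;
    return 1;
}

my $dir   = tempdir(CLEANUP => 1);
my @files = map { File::Spec->catfile($dir, $_) }
            qw(ids.txt forward.tsv interactions.tsv backward.tsv);

# Seed file with a single made-up gene id.
open my $fh, '>', $files[0] or die "Cannot write $files[0]: $!";
print {$fh} "geneA\n";
close $fh or die "Cannot close $files[0]: $!";

# Each callback annotates a row using a column added by an earlier stage.
my @stages = (
    sub { 'orthologue_of_' . (split /\t/, $_[0])[0]  },
    sub { 'interactor_of_' . (split /\t/, $_[0])[-1] },
    sub { 'orthologue_of_' . (split /\t/, $_[0])[-1] },
);

# Stage $i consumes file $i and produces file $i + 1.
for my $i (0 .. $#stages) {
    run_stage($files[$i], $files[$i + 1], $stages[$i]) or die "Stage $i failed";
}

open my $rfh, '<', $files[-1] or die "Cannot read $files[-1]: $!";
my $final = <$rfh>;
close $rfh;
chomp $final;
print "$final\n";
```

     Running a later stage first would fail here because its input file
     would not exist yet, which mirrors the constraint described above.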

     get_forward_orthologies
     get_interactions
     get_backward_orthologies

SCORING THE PUTATIVE INTERACTIONS
BUILDING PUTATIVE INTERACTION NETWORKS
BUGS
     Please report any you find

SUPPORT
     TODO

AUTHOR
     Giuseppe Gallone <ggallone at cpan.org>

     CPAN ID: GGALLONE

     University of Edinburgh

COPYRIGHT
     The Interolog::Walk module is Copyright (c) 2010 Giuseppe Gallone All
     rights reserved.

     You may distribute under the terms of either the GNU General Public
     License or the Artistic License, as specified in the Perl 5.10.0 README
     file.

SEE ALSO


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

From G.Gallone at sms.ed.ac.uk  Thu Aug 19 08:42:28 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Thu, 19 Aug 2010 13:42:28 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <20100819002830.GA366@Macintosh-235.local>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<20100819002830.GA366@Macintosh-235.local>
Message-ID: <4C6D26B4.5090702@sms.ed.ac.uk>

Dear Siddhartha,

glad to hear this might be helpful. As for the bioperl-network package 
you mention, thank you for pointing it out. I had a quick look at its 
documentation, and it looks like a much deeper and more complex effort 
than what I have in my package. I've actually been using the Graph 
package, on which it seems to be based, a lot and have found it very 
helpful.

I'm not sure if the network routines in my module overlap with it, 
though: all I do in my package is parse the dataset, extracting only 
what is needed to build a Cytoscape SIF file and optionally some 
Cytoscape NOA attribute files, as described by the Cytoscape 
specification at

http://cytoscape.wodaklab.org/wiki/Cytoscape_User_Manual/Network_Formats

Instead, it looks like bioperl-network actually builds some kind of 
internal representation of the network for further manipulation in Perl, 
if I understand it correctly?
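
To give an idea of the kind of output meant here, below is a rough 
sketch of writing SIF and NOA text from a list of putative interactions. 
It is not the module's code: the to_sif() and to_noa() helpers, the 
record layout, the identifiers and the 'ExternalName' attribute name are 
all made up for the example. The only things taken from the Cytoscape 
format description are the "node, interaction type, node" layout of a 
SIF line and the "name = value" layout of a NOA line under a header 
naming the attribute.

```perl
use strict;
use warnings;

# Toy putative interactions, invented for illustration.
my @interactions = (
    { a => 'idA', b => 'idB', name_a => 'geneA', name_b => 'geneB' },
    { a => 'idA', b => 'idC', name_a => 'geneA', name_b => 'geneC' },
);

# Hypothetical helper: one SIF line per interaction,
# "source <TAB> interaction type <TAB> target".
sub to_sif {
    my ($rows, $type) = @_;
    return join '', map { join("\t", $_->{a}, $type, $_->{b}) . "\n" } @$rows;
}

# Hypothetical helper: a NOA node attribute file, i.e. a header line with
# the attribute name followed by one "node = value" line per node.
sub to_noa {
    my ($rows, $attr) = @_;
    my %label;
    for my $r (@$rows) {
        $label{ $r->{a} } = $r->{name_a};
        $label{ $r->{b} } = $r->{name_b};
    }
    return join '', "$attr\n",
           map { sprintf("%s = %s\n", $_, $label{$_}) } sort keys %label;
}

print to_sif(\@interactions, 'pp');
print to_noa(\@interactions, 'ExternalName');
```

Nothing is kept in memory as a graph object; the dataset rows are just 
turned into text that Cytoscape can import.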

Kind regards
Giuseppe

On 19/08/10 01:28, Siddhartha Basu wrote:

> Sounds interesting. I am currently playing around with a perl based webapp for displaying interactome
> using cytoscapeweb. Depending how your design pans out,  would be happy to
> use your module as a backend analysis layer. And on a related note,  you
> might want to have a look at bioperl-network and if there is any overlap
> might be worth contributing.
>
> -siddhartha
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

From xupeng86 at gmail.com  Thu Aug 19 04:02:48 2010
From: xupeng86 at gmail.com (xupeng)
Date: Thu, 19 Aug 2010 16:02:48 +0800
Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl"
	when use biosql database?
Message-ID: <201008191602.49068.xupeng86@gmail.com>

 	I've downloaded biosql-1.0.1.tar.gz and it works well, but I 
can't find 'load_seqdatabase.pl' when I try to import GenBank files 
into the BioSQL database. 
	Can anyone give me a copy of that file? 
Many thanks! 

From sunhanifk at gmail.com  Thu Aug 19 10:25:38 2010
From: sunhanifk at gmail.com (han sun)
Date: Thu, 19 Aug 2010 22:25:38 +0800
Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl
	5.12.1?
Message-ID: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>

Hello everyone,

I have used Perl for several months, and I now want to feel the power of
BioPerl.
But it seems that the installation is more difficult than I thought.

I typed the following commands:


install-shell


rep add bioperl http://bioperl.org/DIST


rep add uwinnipeg
http://cpan.uwinnipeg.ca/PPMPackages/12xx/<http://cpan.uwinnipeg.ca/PPMPackages/10xx/>


rep add trouchelle http://trouchelle.com/ppm12/

install BioPerl

However, the installation failed:

ppm install failed:
Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
Can't find any package that provides PostScript::TextBlock for
Bundle-BioPerl-Core
Can't find any package that provides Ace:: for Bundle-BioPerl-Core
Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
Can't find any package that provides Convert::Binary::C for
Bundle-BioPerl-Core
Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
Can't find any package that provides IPC::Run for GraphViz
Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
Can't find any package that provides List-MoreUtils for Moose
Can't find any package that provides List-MoreUtils for Class-MOP


Then I tried

install http://www.bribes.org/perl/ppm/GD.ppd

and tried the installation again, but it still didn't help.

Do you know what's wrong?

Please help me, thanks very much.

From cjfields1 at gmail.com  Thu Aug 19 10:33:26 2010
From: cjfields1 at gmail.com (Christopher Fields)
Date: Thu, 19 Aug 2010 09:33:26 -0500
Subject: [Bioperl-l] Could I install BioPerl on Windows with the
	ActivePerl 5.12.1?
In-Reply-To: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>
References: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>
Message-ID: <78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com>

Try using ActivePerl 5.10 instead of v5.12.  It's very possible the PPM won't work for v5.12 yet.

chris

On Aug 19, 2010, at 9:25 AM, han sun wrote:

> Hello everyone,
> 
> I have used perl for several months,and I now want to feel the power of
> bioperl.
> But it seems that the installing is more difficult than I thought.
> 
> I typed the commands.
> 
> 
> 
> install-shell
> 
> 
> rep add bioperl http://bioperl.org/DIST
> 
> 
> rep add uwinnipeg
> http://cpan.uwinnipeg.ca/PPMPackages/12xx/<http://cpan.uwinnipeg.ca/PPMPackages/10xx/>
> 
> 
> rep add trouchelle http://trouchelle.com/ppm12/
> 
> install BioPerl
> 
> However,the installing failed,
> 
> ppm install failed:
> Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
> Can't find any package that provides PostScript::TextBlock for
> Bundle-BioPerl-Core
> Can't find any package that provides Ace:: for Bundle-BioPerl-Core
> Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
> Can't find any package that provides Convert::Binary::C for
> Bundle-BioPerl-Core
> Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
> Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
> Can't find any package that provides IPC::Run for GraphViz
> Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
> Can't find any package that provides List-MoreUtils for Moose
> Can't find any package that provides List-MoreUtils for Class-MOP
> 
> 
> then I tried
> 
> install http://www.bribes.org/perl/ppm/GD.ppd
> 
> and tried the installation again,but it still didn't help.
> 
> Do you know what's wrong?
> 
> Please help me, thanks very much.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hlapp at drycafe.net  Thu Aug 19 10:53:22 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 10:53:22 -0400
Subject: [Bioperl-l] Why I can't find the perl script
	"load_seqdatabase.pl" when use biosql database?
In-Reply-To: <201008191602.49068.xupeng86@gmail.com>
References: <201008191602.49068.xupeng86@gmail.com>
Message-ID: <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>

The file comes with Bioperl-db, not BioSQL. That is so because it  
depends on BioPerl and on Bioperl-db, and so you will need to have  
both installed.

	-hilmar

On Aug 19, 2010, at 4:02 AM, xupeng wrote:

> 	I've downloaded the biosql-1.0.1.tar.gz. It works well. But I
> can't find the 'load_seqdatabase.pl' when I try to import the
> Genbank files into biosql databsase.
> 	Can anyone give me a copy of that file?
> many thanks !
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From hlapp at drycafe.net  Thu Aug 19 10:58:46 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 10:58:46 -0400
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
Message-ID: <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>

Yes, unfortunately they do. The feature for obviating them (namely
nested transactions) is there in Pg 8.2+, but Bioperl-db doesn't use
them yet ... I have to learn more about Class::DBIx first to decide
whether it's better to first implement nested transactions in the
home-grown ORM that Bioperl-db in essence is, or whether it's better to
reimplement everything in Class::DBIx instead.

There are new datatypes in Bioperl, and relations in BioSQL that could  
hold them, and so I need to decide what's the way forward.

	-hilmar

On Aug 19, 2010, at 6:01 AM, Peter wrote:

> On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp <hlapp at drycafe.net>  
> wrote:
>> Hi Dan,
>>
>> the casting isn't an issue anymore, I think. (And even if it were,  
>> there is
>> actually a small script that brings back the casts that were built  
>> into
>> 8.2.) Have you found an example where it still is?
>>
>>        -hilmar
>
> Hi Hilmar,
>
> Do the bioperl-db bindings for BioSQL on PostgreSQL still require  
> those
> extra rules in the schema?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2839
>
> Peter

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From mmuratet at hudsonalpha.org  Thu Aug 19 11:00:52 2010
From: mmuratet at hudsonalpha.org (Michael Muratet)
Date: Thu, 19 Aug 2010 10:00:52 -0500
Subject: [Bioperl-l] Why I can't find the perl script
	"load_seqdatabase.pl" when use biosql database?
In-Reply-To: <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
References: <201008191602.49068.xupeng86@gmail.com>
	<14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
Message-ID: <C6FECD93-E599-465B-A93A-BD1F2CDFBE9C@hudsonalpha.org>


On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote:

> The file comes with Bioperl-db, not BioSQL. That is so because it  
> depends on BioPerl and on Bioperl-db, and so you will need to have  
> both installed.

Is load_seqdatabase.pl still the best method? I vaguely remember a  
post that said that load_seqdatabase was deprecated, but I can't find  
it in the archives.

Mike

>
> 	-hilmar
>
> On Aug 19, 2010, at 4:02 AM, xupeng wrote:
>
>> 	I've downloaded the biosql-1.0.1.tar.gz. It works well. But I
>> can't find the 'load_seqdatabase.pl' when I try to import the
>> Genbank files into biosql databsase.
>> 	Can anyone give me a copy of that file?
>> many thanks !
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> -- 
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806


From hlapp at drycafe.net  Thu Aug 19 11:29:31 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 11:29:31 -0400
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
Message-ID: <5F77404A-086D-4D0C-B3A5-F5119FCF878A@drycafe.net>


On Aug 19, 2010, at 11:09 AM, Chris Fields wrote:

> DBIx::Class


Did I have this in the wrong order :-) More coffee, please.
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From hlapp at drycafe.net  Thu Aug 19 11:30:26 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 11:30:26 -0400
Subject: [Bioperl-l] Why I can't find the perl script
	"load_seqdatabase.pl" when use biosql database?
In-Reply-To: <C6FECD93-E599-465B-A93A-BD1F2CDFBE9C@hudsonalpha.org>
References: <201008191602.49068.xupeng86@gmail.com>
	<14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
	<C6FECD93-E599-465B-A93A-BD1F2CDFBE9C@hudsonalpha.org>
Message-ID: <C5FD4B85-25B3-4D76-AA99-B3DBE42400C7@drycafe.net>

It's not deprecated. Unless I'm again mixing up something?

	-hilmar

On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote:

>
> On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote:
>
>> The file comes with Bioperl-db, not BioSQL. That is so because it  
>> depends on BioPerl and on Bioperl-db, and so you will need to have  
>> both installed.
>
> Is load_seqdatabase.pl still the best method? I vaguely remember a  
> post that said that load_seqdatabase was deprecated, but I can't  
> find it in the archives.
>
> Mike
>
>>
>> 	-hilmar
>>
>> On Aug 19, 2010, at 4:02 AM, xupeng wrote:
>>
>>> 	I've downloaded the biosql-1.0.1.tar.gz. It works well. But I
>>> can't find the 'load_seqdatabase.pl' when I try to import the
>>> Genbank files into biosql databsase.
>>> 	Can anyone give me a copy of that file?
>>> many thanks !
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> -- 
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
>> ===========================================================
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Michael Muratet, Ph.D.
> Senior Scientist
> HudsonAlpha Institute for Biotechnology
> mmuratet at hudsonalpha.org
> (256) 327-0473 (p)
> (256) 327-0966 (f)
>
> Room 4005
> 601 Genome Way
> Huntsville, Alabama 35806
>
>
>
>
>

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From cjfields at illinois.edu  Thu Aug 19 11:09:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 19 Aug 2010 10:09:13 -0500
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
Message-ID: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>

I think it's worth exploring having a DBIx::Class-based middle-ware approach similar to what Rob Buels has done for Chado.  That would be fairly easy to get started using DBIx::Class::Schema::Loader.  

After that it would require optimization and tweaking, which is potentially more complex than Rob's setup as Chado is very Pg-specific, but maybe Rob can elaborate...

chris

On Aug 19, 2010, at 9:58 AM, Hilmar Lapp wrote:

> Yes, unfortunately they do. The feature for obviating them (namely nested transactions) is there in Pg 8.2+, but Bioperl-db doesn't use them yet ... I have to learn more about Class::DBIx first to decide whether it's better to first implement nested transactions in the home-grown ORM that Bioperl-db in essence is, or whether it's better to reimplement everything in Class::DBIx instead.
> 
> There are new datatypes in Bioperl, and relations in BioSQL that could hold them, and so I need to decide what's the way forward.
> 
> 	-hilmar
> 
> On Aug 19, 2010, at 6:01 AM, Peter wrote:
> 
>> On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp <hlapp at drycafe.net> wrote:
>>> Hi Dan,
>>> 
>>> the casting isn't an issue anymore, I think. (And even if it were, there is
>>> actually a small script that brings back the casts that were built into
>>> 8.2.) Have you found an example where it still is?
>>> 
>>>       -hilmar
>> 
>> Hi Hilmar,
>> 
>> Do the bioperl-db bindings for BioSQL on PostgreSQL still require those
>> extra rules in the schema?
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2839
>> 
>> Peter
> 
> -- 
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Thu Aug 19 11:37:39 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 19 Aug 2010 10:37:39 -0500
Subject: [Bioperl-l] Why I can't find the perl script
	"load_seqdatabase.pl" when use biosql database?
In-Reply-To: <C5FD4B85-25B3-4D76-AA99-B3DBE42400C7@drycafe.net>
References: <201008191602.49068.xupeng86@gmail.com>
	<14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
	<C6FECD93-E599-465B-A93A-BD1F2CDFBE9C@hudsonalpha.org>
	<C5FD4B85-25B3-4D76-AA99-B3DBE42400C7@drycafe.net>
Message-ID: <68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu>

I don't recall this either.  So, can't blame it on lack of coffee :)

chris

On Aug 19, 2010, at 10:30 AM, Hilmar Lapp wrote:

> It's not deprecated. Unless I'm again mixing up something?
> 
> 	-hilmar
> 
> On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote:
> 
>> 
>> On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote:
>> 
>>> The file comes with Bioperl-db, not BioSQL. That is so because it depends on BioPerl and on Bioperl-db, and so you will need to have both installed.
>> 
>> Is load_seqdatabase.pl still the best method? I vaguely remember a post that said that load_seqdatabase was deprecated, but I can't find it in the archives.
>> 
>> Mike
>> 
>>> 
>>> 	-hilmar
>>> 
>>> On Aug 19, 2010, at 4:02 AM, xupeng wrote:
>>> 
>>>> 	I've downloaded the biosql-1.0.1.tar.gz. It works well. But I
>>>> can't find the 'load_seqdatabase.pl' when I try to import the
>>>> Genbank files into biosql databsase.
>>>> 	Can anyone give me a copy of that file?
>>>> many thanks !
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>>> -- 
>>> ===========================================================
>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
>>> ===========================================================
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> Michael Muratet, Ph.D.
>> Senior Scientist
>> HudsonAlpha Institute for Biotechnology
>> mmuratet at hudsonalpha.org
>> (256) 327-0473 (p)
>> (256) 327-0966 (f)
>> 
>> Room 4005
>> 601 Genome Way
>> Huntsville, Alabama 35806
>> 
>> 
>> 
>> 
>> 
> 
> -- 
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From mmuratet at hudsonalpha.org  Thu Aug 19 11:40:02 2010
From: mmuratet at hudsonalpha.org (Michael Muratet)
Date: Thu, 19 Aug 2010 10:40:02 -0500
Subject: [Bioperl-l] Why I can't find the perl script
	"load_seqdatabase.pl" when use biosql database?
In-Reply-To: <68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu>
References: <201008191602.49068.xupeng86@gmail.com>
	<14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
	<C6FECD93-E599-465B-A93A-BD1F2CDFBE9C@hudsonalpha.org>
	<C5FD4B85-25B3-4D76-AA99-B3DBE42400C7@drycafe.net>
	<68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu>
Message-ID: <A0AD0D4E-89EC-4FA0-8625-FF0A2EFB5669@hudsonalpha.org>


On Aug 19, 2010, at 10:37 AM, Chris Fields wrote:

> I don't recall this either.  So, can't blame it on lack of coffee :)

Thanks. I'll keep using it!

Mike
>
> chris
>
> On Aug 19, 2010, at 10:30 AM, Hilmar Lapp wrote:
>
>> It's not deprecated. Unless I'm again mixing up something?
>>
>> 	-hilmar
>>
>> On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote:
>>
>>>
>>> On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote:
>>>
>>>> The file comes with Bioperl-db, not BioSQL. That is so because it  
>>>> depends on BioPerl and on Bioperl-db, and so you will need to  
>>>> have both installed.
>>>
>>> Is load_seqdatabase.pl still the best method? I vaguely remember a  
>>> post that said that load_seqdatabase was deprecated, but I can't  
>>> find it in the archives.
>>>
>>> Mike
>>>
>>>>
>>>> 	-hilmar
>>>>
>>>> On Aug 19, 2010, at 4:02 AM, xupeng wrote:
>>>>
>>>>> 	I've downloaded the biosql-1.0.1.tar.gz. It works well. But I
>>>>> can't find the 'load_seqdatabase.pl' when I try to import the
>>>>> Genbank files into biosql databsase.
>>>>> 	Can anyone give me a copy of that file?
>>>>> many thanks !
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> -- 
>>>> ===========================================================
>>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
>>>> ===========================================================
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> Michael Muratet, Ph.D.
>>> Senior Scientist
>>> HudsonAlpha Institute for Biotechnology
>>> mmuratet at hudsonalpha.org
>>> (256) 327-0473 (p)
>>> (256) 327-0966 (f)
>>>
>>> Room 4005
>>> 601 Genome Way
>>> Huntsville, Alabama 35806
>>>
>>>
>>>
>>>
>>>
>>
>> -- 
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
>> ===========================================================
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806


From cjfields at illinois.edu  Thu Aug 19 11:55:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 19 Aug 2010 10:55:54 -0500
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <EA0C23FB-8C2F-4C04-B0E8-4207409916DC@sbc.su.se>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
	<E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
	<83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
	<B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>
	<EA0C23FB-8C2F-4C04-B0E8-4207409916DC@sbc.su.se>
Message-ID: <5611499B-FA63-4A52-8279-99B554418374@illinois.edu>

On Aug 17, 2010, at 8:52 AM, Dave Messina wrote:

>> It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison
> 
> Yep, agreed.
> 
> And such a flag should be named for the non-default behavior, then, like: -ignore_IDs_for_overlaps
> 
> Dave

Probably would just be -ignore_ids as this behavior would have to be consistent across the various Bio::RangeI methods (overlaps, contains, etc).  The params are case-insensitive IIRC, so the _IDs would just be lc().

RangeI doesn't define a seq_id(), though, so we either use can() in RangeI (which is dirtier IMO) or define this in the appropriate class, probably LocationI or SeqFeatureI.
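
To make the two options concrete, here is a rough sketch of the can() 
variant on a toy class. ToyFeature is hypothetical and deliberately not a 
BioPerl class, and -ignore_ids is only the flag name proposed in this 
thread, not an existing parameter:

```perl
use strict;
use warnings;

package ToyFeature;   # hypothetical minimal class, not part of BioPerl

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}
sub start  { $_[0]{start} }
sub end    { $_[0]{end} }
sub seq_id { $_[0]{seq_id} }

# Overlap test that compares seq_id by default; pass -ignore_ids => 1
# to fall back to a purely positional comparison.
sub overlaps {
    my ($self, $other, %args) = @_;

    # Treat parameter names case-insensitively.
    my %opt;
    $opt{ lc $_ } = $args{$_} for keys %args;

    unless ($opt{-ignore_ids}) {
        # Only compare ids when both objects actually provide seq_id().
        if ($self->can('seq_id') && $other->can('seq_id')) {
            my ($id1, $id2) = ($self->seq_id, $other->seq_id);
            return 0 if defined $id1 && defined $id2 && $id1 ne $id2;
        }
    }
    return ($self->start <= $other->end && $other->start <= $self->end) ? 1 : 0;
}

package main;

my $f1 = ToyFeature->new(seq_id => 'chr1', start => 100, end => 200);
my $f2 = ToyFeature->new(seq_id => 'chr2', start => 150, end => 250);

print "default: ",    $f1->overlaps($f2), "\n";
print "ignore_ids: ", $f1->overlaps($f2, -ignore_IDs => 1), "\n";
```

With the id check on by default, the two features above do not overlap 
because they sit on different sequences; with the flag set they do. 
Defining the check in LocationI or SeqFeatureI instead would remove the 
need for the can() calls.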

chris


From cjfields at illinois.edu  Thu Aug 19 11:56:11 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 19 Aug 2010 10:56:11 -0500
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
	<E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
	<83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
	<B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>
Message-ID: <7CF700A0-C7A0-4BD2-9757-50B693B3B614@illinois.edu>

Makes sense.  

chris

On Aug 17, 2010, at 7:45 AM, Scott Cain wrote:

> Hi Dave and Chris,
> 
> It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison and if somebody is doing the protein space comparison and not getting the expected results, they'll probably read the docs to find out why. 
> 
> Scott
> 
> --
> Scott Cain, Ph. D.
> scott at scottcain dot net
> Ontario Institute for Cancer Research
> http://gmod.org/
> 216 392 3087 
> 
> Sent from my iPhone.
> 
> On Aug 17, 2010, at 5:06 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
> 
>>> Good point; it's probably the context the methods are used that matters.  So, maybe just a document clarification?
>> 
>> That's always good, but it really doesn't solve the issue you're describing.
>> 
>> I mean, who would expect to get overlaps for features on different chromosomes?
>> 
>> To me, that's a clear violation of reasonable user expectations. You shouldn't have to read the docs about something like that.
>> 
>> So what's the solution for these duelling use cases? I haven't thought about it much, but a first approximation might be to add a -genomic boolean flag that, when true, would do the right thing and check the ID when doing overlaps or other positional comparisons.
>> 
>> (Maybe -genomic is too obscure. Maybe it should be -same_id_for_overlaps or something like that.)
>> 
>> And maybe having to know to set a flag is effectively the same thing as having to read the docs to understand SeqFeature's overlap behavior.
>> 
>> What do the rest of you out there think?
>> 
>> 
>> Dave
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From David.Messina at sbc.su.se  Thu Aug 19 12:54:23 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 19 Aug 2010 18:54:23 +0200
Subject: [Bioperl-l]  Bug? Features with similar ranges,
	different IDs are considered overlapping
References: <83299B71-0F73-440D-A9C5-DC1DA2AFF605@davemessina.com>
Message-ID: <1EFB951F-AEE1-4B2A-9E29-114E40B25D21@sbc.su.se>

[Ccing list for real this time]

On Aug 19, 2010, at 17:55, Chris Fields <cjfields at illinois.edu> wrote:

> Probably would just be -ignore_ids

You're right, that's the way to go. 


> define this in the appropriate class, probably LocationI or 

Yep, that's cleaner.

Thanks!


Dave


From cjfields1 at gmail.com  Thu Aug 19 13:20:32 2010
From: cjfields1 at gmail.com (Christopher Fields)
Date: Thu, 19 Aug 2010 12:20:32 -0500
Subject: [Bioperl-l] Could I install BioPerl on Windows with the
	ActivePerl 5.12.1?
In-Reply-To: <AANLkTimBPL6Sr2kmg+f0t1j8pk_9nBAoqubKzY4AJoxo@mail.gmail.com>
References: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>
	<78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com>
	<AANLkTimBPL6Sr2kmg+f0t1j8pk_9nBAoqubKzY4AJoxo@mail.gmail.com>
Message-ID: <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com>

cc'ing list.  Looks like the BioPerl PPM is possibly broken for perl 5.12.  Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling...
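
A quick way to see which of the prerequisites are actually absent from a given Perl installation is to try loading each one. This is only a sketch: missing_modules() is a hypothetical helper, and the module names are copied from the ppm error output quoted below.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of the modules that cannot be loaded.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        # Turn Foo::Bar into Foo/Bar.pm so require can take it as a string.
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

# Names taken from the "Can't find any package that provides ..." lines.
my @wanted = qw(List::MoreUtils PostScript::TextBlock SOAP::Lite
                Convert::Binary::C XML::Twig IPC::Run);

print "missing: $_\n" for missing_modules(@wanted);
```

What it prints depends entirely on the local installation, so no output is shown here.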

chris

On Aug 19, 2010, at 11:29 AM, han sun wrote:

> v5.10 works,thanks.
> 
> 2010/8/19 Christopher Fields <cjfields1 at gmail.com>
> Try using ActivePerl 5.10 instead of v5.12.  It's very possible the PPM won't work for v5.12 yet.
> 
> chris
> 
> On Aug 19, 2010, at 9:25 AM, han sun wrote:
> 
> > Hello everyone,
> >
> > I have used perl for several months,and I now want to feel the power of
> > bioperl.
> > But it seems that the installing is more difficult than I thought.
> >
> > I typed the commands.
> >
> >
> >
> > install-shell
> >
> >
> > rep add bioperl http://bioperl.org/DIST
> >
> >
> > rep add uwinnipeg
> > http://cpan.uwinnipeg.ca/PPMPackages/12xx/<http://cpan.uwinnipeg.ca/PPMPackages/10xx/>
> >
> >
> > rep add trouchelle http://trouchelle.com/ppm12/
> >
> > install BioPerl
> >
> > However,the installing failed,
> >
> > ppm install failed:
> > Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
> > Can't find any package that provides PostScript::TextBlock for
> > Bundle-BioPerl-Core
> > Can't find any package that provides Ace:: for Bundle-BioPerl-Core
> > Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
> > Can't find any package that provides Convert::Binary::C for
> > Bundle-BioPerl-Core
> > Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
> > Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
> > Can't find any package that provides IPC::Run for GraphViz
> > Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
> > Can't find any package that provides List-MoreUtils for Moose
> > Can't find any package that provides List-MoreUtils for Class-MOP
> >
> >
> > then I tried
> >
> > install http://www.bribes.org/perl/ppm/GD.ppd
> >
> > and tried the installation again,but it still didn't help.
> >
> >
> > Do you know what is wrong here?
> >
> > Please help me, thanks very much.
> 
> 


From rmb32 at cornell.edu  Thu Aug 19 13:09:45 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 19 Aug 2010 10:09:45 -0700
Subject: [Bioperl-l] reminder: Aug 25 deadline for GMOD Hackathon application
Message-ID: <4C6D6559.3080809@cornell.edu>

Hi all,

This is your one-week reminder: the deadline for open applications to 
the GMOD Evo hackathon is Wednesday, August 25th.

Rob

========================================

We are seeking participants for the GMOD Tools for Evolutionary Biology
Hackathon, held November 8-12, 2010 at the US National Evolutionary
Synthesis Center (NESCent) in Durham, NC.

This hackathon targets three critical gaps in the capabilities of the
GMOD toolbox that currently limit its utility for evolutionary research:

  1. Visualization of comparative genomics data
  2. Visualization of phylogenetic data and trees
  3. Support for population diversity and phenotype data

If you are interested in these areas and have relevant expertise, you
are strongly encouraged to apply. Relevant areas of expertise include
more than just software development: if you are a GMOD power user,
visualization guru, domain expert (comparative, phylogenetics,
population, ...), or documentation wizard, then your skills are needed!

How To Apply:

Fill out the online application form at http://bit.ly/gmodevohack.
Applications are due August 25.

About GMOD:

GMOD is an intercompatible suite of open-source software components for
storing, managing, analyzing, and visualizing genome-scale data. GMOD
includes many widely-used software components: GBrowse and JBrowse, both
genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a
generic and modular database schema; CMap, a comparative map viewer; as
well as many other components including Apollo, MAKER, BioMart,
InterMine, and Galaxy. We hope to extend the functionality of existing
GMOD components, and integrate new components as well.

About Hackathons:

A hackathon is an intense event at which a group of programmers with
different backgrounds and skills collaborate hands-on and face-to-face
to develop working code that is of utility to the community as a whole.
The mix of people will include domain experts and computer-savvy end-users.

More details about the event, its motivation, organization, procedures,
and attendees, as well as URLs to the hackathon and related websites are
included below.

Sincerely,

The GMOD EvoHack Organizing Committee (and project affiliations as
relevant):

Nicole Washington, Chair (LBNL, modENCODE, Phenote)

Robert Buels (SGN, Chado NatDiv)

Scott Cain (OICR, GMOD)

Dave Clements (NESCent, GMOD)

Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv)

Sheldon McKay (University of Arizona, iPlant, GBrowse_syn)


-----------------------------

About the GMOD Evo Hackathon

Overview

We are organizing a hackathon to fill critical gaps in the capabilities
of the Generic Model Organism Database (GMOD) toolbox that currently
limit its utility for evolutionary research. Specifically, we will focus
on tools for

   1) viewing comparative genomics data;
   2) visualizing phylogenomic data; and
   3) supporting population diversity data and phenotype annotation.

The event will be hosted at NESCent and bring together a group of about
20+ software developers, end-user representatives, and documentation
experts who would otherwise not meet. The participants will include key
developers of GMOD components that currently lack features critical for
emerging evolutionary biology research, developers of informatics tools
in evolutionary research that lack GMOD integration, and
informatics-savvy biologists who can represent end-user requirements.

The event will provide a unique opportunity to infuse the GMOD developer
community with a heightened awareness of unmet needs in evolutionary
biology that GMOD components have the potential to fill, and for tool
developers in evolutionary biology to better understand how best to
extend or integrate with already existing GMOD components.

Before the Event

Discussion of ideas and sometimes even design actually starts well
before the hackathon, on mailing lists, wiki pages, and conference calls
set up among accepted attendees.  This advance work lays the foundation
for participants to be productive from the very first day.  This also
means that participants should be willing to contribute some time in
advance of the hackathon itself to participate in this preparatory
discussion.

During the Event

Typically, hackathon participants use the morning of the first day of
the event to organize themselves into working groups of between 3 and 6
people, each with a focused implementation objective.  Ideas and
objectives are discussed, and attendees coalesce around the projects in
which they have the most experience or interest.


Deliverables / Event Results

The meeting's attendance, working groups, and outcomes will be fully
logged and documented on the GMOD wiki (http://gmod.org). Each working
group during the event will typically have its own wiki page, linked
from the main EvoHack page, where it documents its minutes and design
notes, and provides links to the code and documentation it produces.
Also, since GMOD and NESCent are both committed to open source
principles, all code and documentation produced by participants during
the hackathon must be published under an OSI-approved open source
license. As contributions to existing GMOD tools, all hackathon products
will most likely satisfy this requirement automatically.

NESCent

This event is sponsored by the US National Evolutionary Synthesis Center
(NESCent, http://www.nescent.org) through its Informatics Whitepapers
program (http://www.nescent.org/informatics/whitepapers.php). NESCent
promotes the synthesis of information, concepts and knowledge to address
significant, emerging, or novel questions in evolutionary science and
its applications. NESCent achieves this by supporting research and
education across disciplinary, institutional, geographic, and
demographic boundaries (see http://www.nescent.org/science/proposals.php).

Links

Main GMOD EvoHack page, and full proposal:
http://gmod.org/wiki/GMOD_Evo_Hackathon

NESCent: http://www.nescent.org/
GMOD: http://gmod.org/
Similar past NESCent events, see: http://hackathon.nescent.org/
GMOD hackathon application:  http://bit.ly/gmodevohack

-- 
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/GMOD_Europe_2010
http://gmod.org/wiki/Help_Desk_Feedback


From David.Messina at sbc.su.se  Thu Aug 19 14:55:50 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 19 Aug 2010 20:55:50 +0200
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq
	question
In-Reply-To: <4C6D7123.9080908@bcm.tmc.edu>
References: <4C6C3259.4060304@bcm.tmc.edu>
	<E8F0F7A7-BC33-4E37-8AAB-75A9470E82A5@sbc.su.se>
	<4C6D7123.9080908@bcm.tmc.edu>
Message-ID: <4E977318-05AC-4D8E-9A39-8C07A2419198@sbc.su.se>


Glad I could help, Caleb.

Dave


On Aug 19, 2010, at 20:00, Caleb Davis <cdavis at bcm.tmc.edu> wrote:

> Hi Dave,
> 
> Thank you so much for your detailed response! Fixing the reward parameter replicated the online result for me.  All of the other factors you brought up will help me track down any future problems. Thanks again.
> 
> --Caleb
> 


From rmb32 at cornell.edu  Thu Aug 19 18:19:11 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 19 Aug 2010 15:19:11 -0700
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
Message-ID: <4C6DADDF.1000103@cornell.edu>

Chris Fields wrote:
> I think it's worth exploring having a DBIx::Class-based middle-ware approach similar to what Rob Buels has done for Chado.  That would be fairly easy to get started using DBIx::Class::Schema::Loader.
> 
> After that it would require optimization and tweaking, which is potentially more complex than Rob's setup as Chado is very Pg-specific, but maybe Rob can elaborate...

Elaborating on how Bio::Chado::Schema is developed:

The vast majority of the code and POD in BCS is autogenerated by 
DBIx::Class::Schema::Loader.  DBICSL gives you a baseline set of 
DBIx::Class classes that covers all the tables, views, columns, unique 
constraints, and foreign key relationships.

Beyond that, you have to add on yourself.  In BCS, we have mostly done 
things like:

   * make better-named aliases for some of the autogenerated
     relationships (though DBICSL does a surprisingly good job of naming
     relationships automatically most of the time)
   * add a tiny bit of bioperl compatibility (this needs a lot more work
     by somebody, volunteers needed!)
   * add convenience methods for using some of the Chado property tables
   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
     traversing phylogenetic tree relationships

Regarding DB backend specificity, BCS isn't Pg-specific at all, because 
DBIx::Class itself goes to great lengths to be compatible (and 
performant!) with just about every relational database out there.  In 
fact, the BCS test suite deploys a Chado schema into a temporary SQLite 
database using DBIC::Schema's deploy() method, and runs all of its tests 
on that.  Very handy.
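
A sketch of that deploy-into-SQLite approach, assuming Bio::Chado::Schema, DBD::SQLite and SQL::Translator are installed. deploy_temp_chado() is a hypothetical wrapper written for this illustration; connect(), deploy() and resultset() are the standard DBIx::Class::Schema calls. It returns undef instead of dying when something is unavailable:

```perl
use strict;
use warnings;

# Hypothetical wrapper: deploy Chado into an in-memory SQLite database and
# return the row count of the (empty) organism table, or undef on failure.
sub deploy_temp_chado {
    my $count = eval {
        require Bio::Chado::Schema;
        my $schema = Bio::Chado::Schema->connect('dbi:SQLite:dbname=:memory:');
        $schema->deploy;    # creates the tables from the DBIx::Class definitions
        $schema->resultset('Organism::Organism')->count;
    };
    return $count;          # undef if a module is missing or the deploy failed
}

my $n = deploy_temp_chado();
print defined $n
    ? "organism rows after deploy: $n\n"
    : "Bio::Chado::Schema (or one of its dependencies) is not usable here\n";
```

Nothing Pg-specific is involved at this level; only server-side functions would need PostgreSQL.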

Chado's Pg-specific server-side functions can of course be called 
through BCS if they are present, but it's perfectly possible to use 
Chado without any of the server-side functions, which is mostly how I use it.

Rob

From David.Messina at sbc.su.se  Fri Aug 20 05:19:14 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 20 Aug 2010 11:19:14 +0200
Subject: [Bioperl-l] Git for the lazy
Message-ID: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se>

Hi everyone,

If you're like me and still getting up to speed with Git, you might find this helpful:

	http://www.spheredev.org/wiki/Git_for_the_lazy


Dave


From bgs500 at york.ac.uk  Fri Aug 20 09:07:50 2010
From: bgs500 at york.ac.uk (Ben Saville)
Date: Fri, 20 Aug 2010 14:07:50 +0100
Subject: [Bioperl-l] Problem Parsing BLAST output
Message-ID: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>

Hi Everyone,

I'm very much new to the world of sequence data analysis (and this  
mailing list!), and have reached a roadblock.

I have BLASTed some contigs against a series of databases that I  
created. From this I would like to parse through the data and separate  
it before extracting the information of interest at a later point. I  
would like to separate the data by query ID. I found the following  
Bioperl script;

#!/usr/bin/perl

use Bio::Search::Result::BlastResult;
use Bio::SearchIO;

my $report = Bio::SearchIO->new( -file=>'All_BCM_results.bls', -format => blast);
my $result = $report->next_result;
my %hits_by_query;
while (my $hit = $result->next_hit) {
   push @{$hits_by_query{$hit->name}}, $hit;
}

foreach my $qid ( keys %hits_by_query ) {
   my $result = Bio::Search::Result::BlastResult->new();
   $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
   my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' );
   $blio->write_result($result);
}

running this script resulted in the following error;

BlastResult::new(): Not adding iterations.

------------- EXCEPTION: Bio::Root::NoSuchThing -------------
MSG: No such iteration number: 0. Valid range=1-0
VALUE: The number zero (0)
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Search::Result::BlastResult::iteration /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:328
STACK: Bio::Search::Result::BlastResult::add_hit /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:258
STACK: /Users/bsaville/Desktop/Parsing_BLAST_by_query.pl:15
-------------------------------------------------------------

So I added a 1 to the constructor call on the line shown above, as the
message told me this was within the valid range:

my $result = Bio::Search::Result::BlastResult->new(1);

This produced the following error;

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Must define arrayref of Iterations when initializing a Bio::Search::Result::BlastResult

STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Search::Result::BlastResult::new /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:128
STACK: /Users/bsaville/Desktop/Parsing_BLAST_by_query.pl:14
-----------------------------------------------------------

I know that it is my inexperience that is causing this problem, but I  
really can't figure this out.

Regards
Ben Saville


From David.Messina at sbc.su.se  Fri Aug 20 09:48:28 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 20 Aug 2010 15:48:28 +0200
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
Message-ID: <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>

Hi Ben,

I would not use the script you posted; I don't think it does what you want.

If you haven't already, you should take a look at the beginners' HOWTO

	http://www.bioperl.org/wiki/HOWTO:Beginners


the SearchIO HOWTO

	http://www.bioperl.org/wiki/HOWTO:SearchIO


and the example scripts included with BioPerl:

	http://www.bioperl.org/wiki/Scripts


Incidentally, it's a lot of fiddly data processing to parse blast reports for many contigs against multiple databases and then go back and collate the results by query. I'm not sure exactly what you want to do once you've separated by query; if you provide some more information, we could suggest ways to best get you where you want to go.

I will mention, though, that BLAST has the ability to search multiple separate databases in one go and collate the results for you. So that's something to consider.
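
For the separate-by-query step itself, here is a rough sketch of the usual SearchIO loop. In Bio::SearchIO each call to next_result returns the report for one query, so the grouping falls out of the outer loop, whereas the posted script reads only the first result and keys on $hit->name. hits_by_query() is a hypothetical helper written for this example; it accepts any object with next_result, and the report file name is taken from the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: collect hit names per query from a SearchIO-like
# object (anything offering next_result / query_name / next_hit / name).
sub hits_by_query {
    my ($searchio) = @_;
    my %hits;
    while (my $result = $searchio->next_result) {    # one result per query
        my $query = $result->query_name;
        while (my $hit = $result->next_hit) {
            push @{ $hits{$query} }, $hit->name;
        }
    }
    return \%hits;
}

# Only touch BioPerl when a report file is given on the command line.
if (@ARGV) {
    require Bio::SearchIO;
    my $in   = Bio::SearchIO->new(-file => $ARGV[0], -format => 'blast');
    my $hits = hits_by_query($in);
    for my $query (sort keys %$hits) {
        print "$query\t", join(',', @{ $hits->{$query} }), "\n";
    }
}
```

Queries without any hits do not appear in the returned hash in this sketch.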


Dave


From bernd.web at gmail.com  Fri Aug 20 11:17:05 2010
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 20 Aug 2010 17:17:05 +0200
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
In-Reply-To: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
References: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
Message-ID: <AANLkTim2MyJ1XKmvYHr+8gX-j9h9z81==e5suTW09PWs@mail.gmail.com>

Hi Yin,

I am not quite sure if the following is also related to your gapped
length issue but I found I had to adapt the calculation of
ungapped_len in   Bio::LocatableSeq. If my slices did not contain any
letters or a new gap char I used, SimpleAlign could not find the
sequences when outputting the alignment. This was due to a difference
in length calculation:

SimpleAlign: uses \W:  $slice_seq =~ s/\W//g;
Bio::LocatableSeq::ungapped_len uses  "$string =~ s/[\.\-]+//g;"

I had to include '~' (for my local sequences) in the ungapped_len;
otherwise I would run into the end issues with SimpleAlign.
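
The mismatch can be reproduced with the two substitutions alone. This is a small sketch; the example string 'AC~~GT--' is made up for illustration:

```perl
use strict;
use warnings;

my $slice_seq = 'AC~~GT--';

# SimpleAlign-style: strip every non-word character.
(my $by_nonword = $slice_seq) =~ s/\W//g;

# LocatableSeq-style, as quoted above: strip only '.' and '-'.
(my $by_gapchars = $slice_seq) =~ s/[\.\-]+//g;

# prints "AC~~GT--: 4 vs 6"
printf "%s: %d vs %d\n", $slice_seq, length($by_nonword), length($by_gapchars);
```

The '~' characters count as residues in one calculation and as gaps in the other, which is where the two lengths diverge.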


Kind regards,
Bernd


On Fri, Aug 13, 2010 at 3:36 PM, Jun Yin <jun.yin at ucd.ie> wrote:
> Hi, all,
>
>
>
> I am the google summer of code student working on Bio::Align subsystem
> refactoring. The code (Bio::SimpleAlign) I re-implemented now has passed
> nearly all the test, except a few tests on seq/start-end testing. But here
> comes a problem. This may be an old issue, that the Bio::LocatableSeq end
> assignment and checking are inconsistent.
>
>
>
> The current end checking method is based on:
>
> $end=$seq->_ungapped_len+$seq->start-1
>
> However, this checking may not fit the real world case.
>
>
>
> The inconsistency usually happens when a few columns of the sequence are
> removed.
>
>
>
> For example:
>
> my $a = Bio::LocatableSeq->new(
>     -id     => 'a',
>     -strand => 1,
>     -seq    => '-tcgatc-atcgatcg',
>     -start  => 30,
>     -end    => 43
> );
>
>
>
> If we remove the 1st, 8th and the last columns
>
>
>
> $a->seq() will be 'tcgatcatcgatc'
>
> $a->_ungapped_len==12
>
>
>
> Actually, in the real world, the first residue will still be 30 (the old
> $seq->start), and the last residue is the residue before the 43 (the old
> $seq->end), thus 42.
>
>
>
> But if you call a validation, the calculation is
> $a->_ungapped_len+$a->start-1=12+30-1=41
>
> So the reassignment of the $seq->end will not pass the validation.
>
>
>
> So unless you save the information to a new sequence object, the original
> position information will be lost anyway. But in some cases, we have to
> change the sequence in its original sequence object ..
>
>
>
> What is your suggestion on this issue?
>
> A. pass the test and lose the information      #convenient in coding but the
> start-end annotation is not right any more
>
> B. keep the information and forget the test   #the object will still
> remember where the last residue was in the original sequence. But is it
> really meaningful at all? Because all the other residues may come from
> nowhere
>
> C. Neither of above #any other suggestions?
>
>
>
> Cheers,
>
> Jun Yin
>
> Ph.D. student in U.C.D.
>
>
>
> Bioinformatics Laboratory
>
> Conway Institute
>
> University College Dublin
>
>
>


From sidd.basu at gmail.com  Fri Aug 20 11:59:59 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Fri, 20 Aug 2010 10:59:59 -0500
Subject: [Bioperl-l]  Re: bioperl-db and postgres8.3 - status query
In-Reply-To: <4C6DADDF.1000103@cornell.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
	<4C6DADDF.1000103@cornell.edu>
Message-ID: <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>

Hi, 

On Thu, 19 Aug 2010, Robert Buels wrote:

> Chris Fields wrote:
> > I think it's worth exploring having a DBIx::Class-based middle-ware 
> > approach similar to what Rob Buels has done for Chado.  That would be 
> > fairly easy to get started using DBIx::Class::Schema::Loader.
> > After that it would require optimization and tweaking, which is 
> > potentially more complex than Rob's setup as Chado is very Pg-specific, 
> > but maybe Rob can elaborate...
>
> Elaborating on how Bio::Chado::Schema is developed:
>
> The vast majority of the code and POD in BCS is autogenerated by 
> DBIx::Class::Schema::Loader.  DBICSL gives you a baseline set of 
> DBIx::Class classes that covers all the tables, views, columns, unique 
> constraints, and foreign key relationships.
>
> Beyond that, you have to add on yourself.  In BCS, we have mostly done 
> things like:
>
>   * make better-named aliases for some of the autogenerated
>     relationships (though DBICSL does a surprisingly good job of naming
>     relationships automatically most of the time)
>   * add a tiny bit of bioperl compatibility (this needs a lot more work
>     by somebody, volunteers needed!)
>   * add convenience methods for using some of the Chado property tables
>   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
>     traversing phylogenetic tree relationships
>
> Regarding DB backend specificity, BCS isn't Pg-specific at all, because 
> DBIx::Class itself goes to great lengths to be compatible (and performant!) 
> with just about every relational database out there.  

I would vouch for that, at least as far as Chado in Oracle is concerned.
So far, BCS works flawlessly with our Oracle Chado instance at
dictybase. Quite a chunk of BCS-based code is also active in a couple of
our Mojo-based webapps. The part which I still couldn't use directly is
the 'synonym' table, as it clashes with Oracle-specific reserved keywords.
Overall, however, it seems to be quite cross-RDBMS compatible and is highly
recommended.

-siddhartha


>In fact, the BCS test 
> suite deploys a Chado schema into a temporary SQLite database using 
> DBIC::Schema's deploy() method, and runs all of its tests on that.  Very 
> handy.
>
> Chado's Pg-specific server-side functions can of course be called through 
> BCS if they are present, but it's perfectly possible to use Chado without 
> any of the server-side functions, and mostly the way I use it.
>
> Rob

From jun.yin at ucd.ie  Fri Aug 20 12:17:33 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 20 Aug 2010 17:17:33 +0100
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
In-Reply-To: <AANLkTim2MyJ1XKmvYHr+8gX-j9h9z81==e5suTW09PWs@mail.gmail.com>
References: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
	<AANLkTim2MyJ1XKmvYHr+8gX-j9h9z81==e5suTW09PWs@mail.gmail.com>
Message-ID: <000b01cb4083$31f98280$95ec8780$%yin@ucd.ie>

Hi, Bernd,

Thx for your input. 

Yes, this is one of the old bugs in Bio::SimpleAlign.  $aln->slice simply
uses $slice_seq =~ s/\W//g to calculate the ungapped length.
But $seq->_ungapped_len uses $string =~
s{[$GAP_SYMBOLS$FRAMESHIFT_SYMBOLS]+}{}g;
which is '\-\.=~\\\/ ', to calculate the ungapped length.

To solve this problem, first, I now use
$nonres = join("",$self->gap_char, $self->match_char,$self->missing_char);
which is '-\.&', to remove the non-residue chars in the alignment sequence
(though using '=', '~', '\' or '/' will still cause problems).

Secondly, I have merged slice, remove_columns and remove_gaps, using the
same internal function. Thus it is easier to debug.

These changes will be merged into the main BioPerl branch after the next version.

But anyway, the conflict is still there, because the non-residue chars are
defined as:
In Bio::SimpleAlign, $aln->gap_char, $aln->missing_char, $aln->match_char
In Bio::LocatableSeq   
$GAP_SYMBOLS = '\-\.=~';
$FRAMESHIFT_SYMBOLS = '\\\/';

so try to use '-' or '.' for your gap char at the moment, otherwise you may
encounter end warnings in calculation.
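
To see which gap characters the LocatableSeq side strips, the two symbol sets quoted above can be tried on their own. This is a sketch: ungapped_len() here is a stand-alone stand-in written for this example, not the actual method, and the test strings are made up.

```perl
use strict;
use warnings;

# Symbol sets as quoted above.
my $GAP_SYMBOLS        = '\-\.=~';
my $FRAMESHIFT_SYMBOLS = '\\\/';

# Stand-alone stand-in: length after removing gap and frameshift symbols.
sub ungapped_len {
    my ($string) = @_;
    $string =~ s{[$GAP_SYMBOLS$FRAMESHIFT_SYMBOLS]+}{}g;
    return length $string;
}

# '-', '.' and '~' are stripped (length 4); '*' is not (length 6).
for my $seq ('AC--GT', 'AC..GT', 'AC~~GT', 'AC**GT') {
    printf "%-8s ungapped length %d\n", $seq, ungapped_len($seq);
}
```

A gap character that only one of the two symbol sets knows about is what produces the disagreeing lengths, hence the advice to stay with '-' or '.'.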

And, if you want to keep gap only sequences, you can call the method as:
$aln2 = $aln->slice(20,30,1)
The last parameter is to keep gap only sequence.

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin



From cjfields at illinois.edu  Fri Aug 20 12:23:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 20 Aug 2010 11:23:07 -0500
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
	<4C6DADDF.1000103@cornell.edu>
	<20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
Message-ID: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>

On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote:
> Hi, 
> 
> On Thu, 19 Aug 2010, Robert Buels wrote:
> 
> > Chris Fields wrote:
> > > I think it's worth exploring having a DBIx::Class-based middle-ware 
> > > approach similar to what Rob Buels has done for Chado.  That would be 
> > > fairly easy to get started using DBIx::Class::Schema::Loader.
> > > After that it would require optimization and tweaking, which is 
> > > potentially more complex than Rob's setup as Chado is very Pg-specific, 
> > > but maybe Rob can elaborate...
> >
> > Elaborating on how Bio::Chado::Schema is developed:
> >
> > The vast majority of the code and POD in BCS is autogenerated by 
> > DBIx::Class::Schema::Loader.  DBICSL gives you a baseline set of 
> > DBIx::Class classes that covers all the tables, views, columns, unique 
> > constraints, and foreign key relationships.
> >
> > Beyond that, you have to add on yourself.  In BCS, we have mostly done 
> > things like:
> >
> >   * make better-named aliases for some of the autogenerated
> >     relationships (though DBICSL does a surprisingly good job of naming
> >     relationships automatically most of the time)
> >   * add a tiny bit of bioperl compatibility (this needs a lot more work
> >     by somebody, volunteers needed!)
> >   * add convenience methods for using some of the Chado property tables
> >   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
> >     traversing phylogenetic tree relationships
> >
> > Regarding DB backend specificity, BCS isn't Pg-specific at all, because 
> > DBIx::Class itself goes to great lengths to be compatible (and performant!) 
> > with just about every relational database out there.  
> I would vouch for that at least as far as chado in oracle is concerned.
> So,  far BCS works out flawlessly with our oracle chado instance at
> dictybase. Quite a chunk of BCS based code is also active in couple of
> our Mojo based webapps. The part which i still couldn't use directly is
> the 'synonym' table as it clashes with oracle specific reserved keywords. 
> However,  overall it seems to quite cross-RDMS compatible and highly
> recommended.
> 
> -siddhartha

Just to point out, I didn't say BCS is Pg-specific, but that Chado is
(that was the DBMS it was designed for).  Maybe that should be amended
to 'was' now :)

I recall seeing a page on this somewhere on the GMOD website along the
lines of "MySQL has problems so we chose Pg", and that Chado support
would focus on Pg.  I'm guessing that's no longer the case?  Or is only
the server-side stuff Pg-specific?

> >In fact, the BCS test 
> > suite deploys a Chado schema into a temporary SQLite database using 
> > DBIC::Schema's deploy() method, and runs all of its tests on that.  Very 
> > handy.
> >
> > Chado's Pg-specific server-side functions can of course be called through 
> > BCS if they are present, but it's perfectly possible to use Chado without 
> > any of the server-side functions, and mostly the way I use it.
> >
> > Rob

I think this opens up the possibility of starting a DBIx::Class-based
middleware solution.  Hilmar, did you want to take that on?

chris


From sidd.basu at gmail.com  Fri Aug 20 13:39:44 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Fri, 20 Aug 2010 12:39:44 -0500
Subject: [Bioperl-l]  Re: bioperl-db and postgres8.3 - status query
In-Reply-To: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
	<4C6DADDF.1000103@cornell.edu>
	<20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
	<1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
Message-ID: <20100820173942.GC400@vpn-165-124-164-118.vpn.northwestern.edu>

On Fri, 20 Aug 2010, Chris Fields wrote:

> On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote:
> > Hi, 
> > 
> > On Thu, 19 Aug 2010, Robert Buels wrote:
> > 
> > > Chris Fields wrote:
> > > > I think it's worth exploring having a DBIx::Class-based middle-ware 
> > > > approach similar to what Rob Buels has done for Chado.  That would be 
> > > > fairly easy to get started using DBIx::Class::Schema::Loader.
> > > > After that it would require optimization and tweaking, which is 
> > > > potentially more complex than Rob's setup as Chado is very Pg-specific, 
> > > > but maybe Rob can elaborate...
> > >
> > > Elaborating on how Bio::Chado::Schema is developed:
> > >
> > > The vast majority of the code and POD in BCS is autogenerated by 
> > > DBIx::Class::Schema::Loader.  DBICSL gives you a baseline set of 
> > > DBIx::Class classes that covers all the tables, views, columns, unique 
> > > constraints, and foreign key relationships.
> > >
> > > Beyond that, you have to add on yourself.  In BCS, we have mostly done 
> > > things like:
> > >
> > >   * make better-named aliases for some of the autogenerated
> > >     relationships (though DBICSL does a surprisingly good job of naming
> > >     relationships automatically most of the time)
> > >   * add a tiny bit of bioperl compatibility (this needs a lot more work
> > >     by somebody, volunteers needed!)
> > >   * add convenience methods for using some of the Chado property tables
> > >   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
> > >     traversing phylogenetic tree relationships
> > >
> > > Regarding DB backend specificity, BCS isn't Pg-specific at all, because 
> > > DBIx::Class itself goes to great lengths to be compatible (and performant!) 
> > > with just about every relational database out there.  
> > I would vouch for that at least as far as chado in oracle is concerned.
> > So,  far BCS works out flawlessly with our oracle chado instance at
> > dictybase. Quite a chunk of BCS based code is also active in couple of
> > our Mojo based webapps. The part which i still couldn't use directly is
> > the 'synonym' table as it clashes with oracle specific reserved keywords. 
> > However,  overall it seems to quite cross-RDMS compatible and highly
> > recommended.
> > 
> > -siddhartha
> 
> Just to point out, I didn't say BCS is Pg-specific, but that Chado is
> (that was the DBMS it was designed for).  Maybe that should be amended
> to 'was' now :)
> 
> I recall seeing a page on this somewhere on the GMOD website along the
> lines of "MySQL has problems so we chose Pg", and that Chado support
> would focus on Pg.  
As far as I understand, GMOD strongly recommends Pg, and it is the most
popular backend for Chado. However, my point was that if anybody wants to
use or try out the Chado schema on a different backend, or already has an
existing setup, tools like DBIx::Class, and BCS in particular, make that
much easier. Code developed on top of them also becomes quite robust and
portable.

-siddhartha 

>I'm guessing that's no longer the case?  Or is only
> the server-side stuff Pg-specific.
> 
> > >In fact, the BCS test 
> > > suite deploys a Chado schema into a temporary SQLite database using 
> > > DBIC::Schema's deploy() method, and runs all of its tests on that.  Very 
> > > handy.
> > >
> > > Chado's Pg-specific server-side functions can of course be called through 
> > > BCS if they are present, but it's perfectly possible to use Chado without 
> > > any of the server-side functions, and mostly the way I use it.
> > >
> > > Rob
> 
> I think this opens up the possibility of starting a DBIx::Class-based
> middleware solution.  Hilmar, did you want to take that on?
> 
> chris
> 
> 

From buiduyminh at gmail.com  Fri Aug 20 17:29:00 2010
From: buiduyminh at gmail.com (Minh Bui)
Date: Fri, 20 Aug 2010 17:29:00 -0400
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
Message-ID: <AANLkTinsyOMPJxpks_pqMwLpW8gx0VRihhJsLDnF53mu@mail.gmail.com>

Hi,
I am trying to load my GFF file into a MySQL database, but I get this error
when I run bp_seqfeature_load.pl (BioPerl 1.6.1 on a Mac):

[BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
contains: /sw/lib/perl5 /sw/lib/perl5/darwin
/System/Library/Perl/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/5.8.6
/Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
/Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
/Network/Library/Perl/5.8.6 /Network/Library/Perl
/System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
line 3.
Perhaps the DBD::mysql perl module hasn't been fully installed,
or perhaps the capitalisation of 'mysql' isn't right.
Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
 at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212

I am using Mac OS X version 10.4.10 and MAMP. Isn't
"/Library/Perl/5.8.6" already in @INC? What am I missing?
I have been googling this error for a few hours. I also installed
BioPerl and reinstalled DBD::mysql using CPAN. It still doesn't work.

Here is my $PERL5LIB:  /sw/lib/perl5:/sw/lib/perl5/darwin/

I really need help on this.
Thank you,

From awitney at sgul.ac.uk  Sat Aug 21 06:39:10 2010
From: awitney at sgul.ac.uk (Adam Witney)
Date: Sat, 21 Aug 2010 11:39:10 +0100
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
In-Reply-To: <AANLkTinsyOMPJxpks_pqMwLpW8gx0VRihhJsLDnF53mu@mail.gmail.com>
References: <AANLkTinsyOMPJxpks_pqMwLpW8gx0VRihhJsLDnF53mu@mail.gmail.com>
Message-ID: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>


On 20 Aug 2010, at 22:29, Minh Bui wrote:

> Hi,,
> I am trying to load my GFF file to mysql database but I got this error
> when I ran the bp_seqfeature_load.pl ( bioperl 1.6.1 on  MAC)
> 
> [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
> install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
> contains: /sw/lib/perl5 /sw/lib/perl5/darwin
> /System/Library/Perl/5.8.6/darwin-thread-multi-2level
> /System/Library/Perl/5.8.6
> /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
> /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
> /Network/Library/Perl/5.8.6 /Network/Library/Perl
> /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
> /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
> line 3.
> Perhaps the DBD::mysql perl module hasn't been fully installed,
> or perhaps the capitalisation of 'mysql' isn't right.
> Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
> at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212
> 
> I am using MAC OSX version 10.4.10 and MAMP? Isnt it the
> "/Library/Perl/5.8.6" already in @INC? What am I missing?
> I have been googling this error for a few hours. I also install
> Bioperl and reinstall DBD::mysql using CPAN. It still doesnt work..
> 
> Here is my $PERL5LIB:  /sw/lib/perl5:/sw/lib/perl5/darwin/


Where did DBD::mysql get installed? Can you verify that DBD/mysql.pm is actually in one of the directories listed above?
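
One way to answer that question is to walk @INC and look for the module
file directly. The following is a minimal sketch, not part of the original
reply; the helper name find_module_in_inc is made up for illustration, and
the script must be run with the same perl binary that runs
bp_seqfeature_load.pl for the result to be meaningful.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Return every file under @INC that "require $module" could load.
# (find_module_in_inc is a hypothetical helper name.)
sub find_module_in_inc {
    my ($module) = @_;
    # Foo::Bar becomes Foo/Bar.pm
    (my $rel_path = "$module.pm") =~ s{::}{/}g;
    my @found;
    for my $dir (@INC) {
        next if ref $dir;    # skip code-ref hooks that may sit in @INC
        my $candidate = File::Spec->catfile($dir, split m{/}, $rel_path);
        push @found, $candidate if -f $candidate;
    }
    return @found;
}

# Module name from the command line, defaulting to the one in question.
my $module = shift(@ARGV) || 'DBD::mysql';
my @hits   = find_module_in_inc($module);
if (@hits) {
    print "$module found at:\n", map { "  $_\n" } @hits;
}
else {
    print "$module not found in any \@INC directory\n";
}
```

If this reports nothing while CPAN claims the module is installed, one
possible explanation (an assumption, not confirmed in this thread) is that
the cpan shell belongs to a different perl than the one running the script.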


From i.hatethispart at ymail.com  Sat Aug 21 10:07:28 2010
From: i.hatethispart at ymail.com (keiko)
Date: Sat, 21 Aug 2010 07:07:28 -0700 (PDT)
Subject: [Bioperl-l] clustalw.exe
In-Reply-To: <3612399.post@talk.nabble.com>
References: <3612399.post@talk.nabble.com>
Message-ID: <29499435.post@talk.nabble.com>


Katrin wrote:
> 
> hello, I am a new Perl/Bioperl-User and first I must excuse me for my
> really bad english, but I hope everybody will understand me. I have the
> following problem: In my Perl-skript is the following system call:
> $y=exec("C:\\Programme\\xampp-win32-1.5.1\\xampp\\perl\\clustalw.exe
> C:\\Programme\\xampp-win32-1.5.1\\xampp\\htdocs\\gene\\clustal.fasta"); If
> I call this Script with the Shell (cmd.exe) everything works correctly.
> But if I call this script with PHP I get the following error message:
> Error: unknown option
> /C:\Programme\xampp-win32-1.5.1\xampp\htdocs\gene\clustal.fasta. I tried
> also system and qx. And I tested the environment variables: I wrote a
> bat-file with the definition of all environment-variables and the system
> call, but this did not work, too. The same problem is in php. The
> PHP-Scipt is called from html and I worked under WindowsXP with xampp. I
> hope, somebody can help me. greetings Katrin
> 

Hi. I also have a problem with this one. I want to call clustalw using php.
Can I ask what you included in your bat-file and where did you download your
clustal? thanks a lot!
-- 
View this message in context: http://old.nabble.com/clustalw.exe-tp3612399p29499435.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From jason at bioperl.org  Sun Aug 22 14:29:30 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 22 Aug 2010 11:29:30 -0700
Subject: [Bioperl-l] Enquiry on Bio::DB::Taxonomy
In-Reply-To: <AANLkTik9qpKSQV9dRKzxSrt_q5qq=g6X6eop8LTqkRVm@mail.gmail.com>
References: <AANLkTik9qpKSQV9dRKzxSrt_q5qq=g6X6eop8LTqkRVm@mail.gmail.com>
Message-ID: <4C716C8A.3010000@bioperl.org>

Hi Amali -

This is how I'd print out the full classification by using the Tree 
methods (with probably a different way of initializing the $db object to 
your flatfiles location).

#!/usr/bin/perl -w
use strict;
use Bio::DB::Taxonomy;
use Bio::Tree::Tree;

my $db= Bio::DB::Taxonomy->new(-source => 'flatfile',
                    -nodesfile => 'taxonomy/nodes.dmp',
                    -namesfile => 'taxonomy/names.dmp');

my $taxonid = $db->get_taxonid('Homo sapiens');
my $taxon = $db->get_taxon(-taxonid => $taxonid);
my $tree = Bio::Tree::Tree->new(-node => $taxon);
my @taxa = $tree->get_nodes;
print join(",", map { $_->scientific_name } @taxa), "\n";
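
To pick out specific levels such as phylum or class from that lineage, each
node's rank can be used as a key. This is only a sketch: names_by_rank is a
made-up helper, and My::FakeTaxon is a hypothetical stand-in so the example
runs without the taxonomy dump files. With the script above, the call would
be names_by_rank($tree->get_nodes), relying on the rank() and
scientific_name() methods of Bio::Taxon.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map each named rank (phylum, class, ...) to a scientific name.
# Accepts any objects that offer rank() and scientific_name().
sub names_by_rank {
    my @taxa = @_;
    my %name_for;
    for my $taxon (@taxa) {
        my $rank = $taxon->rank;
        # NCBI labels unranked clades 'no rank'; skip those
        next if !defined $rank || $rank eq 'no rank';
        $name_for{$rank} = $taxon->scientific_name;
    }
    return %name_for;
}

# Hypothetical stand-in for Bio::Taxon, for demonstration only.
{
    package My::FakeTaxon;
    sub new             { my ($class, %args) = @_; return bless {%args}, $class }
    sub rank            { return $_[0]{rank} }
    sub scientific_name { return $_[0]{name} }
}

my @lineage;
for my $pair (
    [ 'no rank', 'cellular organisms' ],
    [ 'phylum',  'Chordata' ],
    [ 'class',   'Mammalia' ],
    [ 'order',   'Primates' ],
    [ 'family',  'Hominidae' ],
    [ 'genus',   'Homo' ],
    [ 'species', 'Homo sapiens' ],
) {
    push @lineage, My::FakeTaxon->new(rank => $pair->[0], name => $pair->[1]);
}

my %name_for = names_by_rank(@lineage);
for my $rank (qw(phylum class order family genus species)) {
    my $name = defined $name_for{$rank} ? $name_for{$rank} : 'n/a';
    print "$rank\t$name\n";
}
```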

-jason

Amali Thrimawithana wrote, On 8/18/10 3:56 PM:
> Dear Dr Stajich,
>
> I am a Masters student at Auckland university and my research is on
> identifying yeast species present in wine by the use of 454 sequencing. In
> order to carry out this research, a pipeline is being built in which at the
> final step each representative OTU need to be classified at different
> taxonomic levels (ie: at Phylum, family, class, genus and species) by using
> the results from BLAST. To identify the sequences at each taxonomic level, I
> have been trying out the Bio::DB::Taxonomy module in bioperl. Using this
> module, I am able to get the genus and species level by splitting the
> scientific name returned by the Bio::taxon object. But unfortunately I am
> uncertain on how to get the information for the other levels of the rank. I
> have tried several commands including "my @class = $node->classification;",
> but it does not work. Hence, could you please let me know how I might be
> able to get the higher levels of taxonomy such as class and phylum using
> bioperl?
>
> Look forward to hearing from you soon
>
> Thanking You
>
> Amali
>    

From cjfields at illinois.edu  Sun Aug 22 15:56:58 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 22 Aug 2010 14:56:58 -0500
Subject: [Bioperl-l] clustalw.exe
In-Reply-To: <29499435.post@talk.nabble.com>
References: <3612399.post@talk.nabble.com> <29499435.post@talk.nabble.com>
Message-ID: <E6C6AE4B-A6AB-4B90-8D81-74DE14B165BD@illinois.edu>

On Aug 21, 2010, at 9:07 AM, keiko wrote:

> Katrin wrote:
>> 
>> hello, I am a new Perl/Bioperl-User and first I must excuse me for my
>> really bad english, but I hope everybody will understand me. I have the
>> following problem: In my Perl-skript is the following system call:
>> $y=exec("C:\\Programme\\xampp-win32-1.5.1\\xampp\\perl\\clustalw.exe
>> C:\\Programme\\xampp-win32-1.5.1\\xampp\\htdocs\\gene\\clustal.fasta"); If
>> I call this Script with the Shell (cmd.exe) everything works correctly.
>> But if I call this script with PHP I get the following error message:
>> Error: unknown option
>> /C:\Programme\xampp-win32-1.5.1\xampp\htdocs\gene\clustal.fasta. I tried
>> also system and qx. And I tested the environment variables: I wrote a
>> bat-file with the definition of all environment-variables and the system
>> call, but this did not work, too. The same problem is in php. The
>> PHP-Scipt is called from html and I worked under WindowsXP with xampp. I
>> hope, somebody can help me. greetings Katrin
>> 
> 
> Hi. I also have a problem with this one. I want to call clustalw using php.
> Can I ask what you included in your bat-file and where did you download your
> clustal? thanks a lot!

Not sure, but what does this have to do with BioPerl?

chris


From jason at bioperl.org  Mon Aug 23 11:56:47 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 23 Aug 2010 08:56:47 -0700
Subject: [Bioperl-l] a problem when using the Bioperl modules
In-Reply-To: <AANLkTinZYJC6JwP776K3phzbAmtjiKMi_K_VTH=B6oeC@mail.gmail.com>
References: <AANLkTinZYJC6JwP776K3phzbAmtjiKMi_K_VTH=B6oeC@mail.gmail.com>
Message-ID: <4C729A3F.7080304@bioperl.org>

Wei -

Please ask your questions on the bioperl mailing list, I cannot answer 
questions directly for all requests.
Your problem has been answered by me on the list before so I urge you to 
use the list archives as a starting point.

The sequence lines in your FASTA file aren't all the same length.

You need to run this:
bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW
mv NEW ORIGINAL

or with sreformat
sreformat fasta ORIGINAL > NEW
mv NEW ORIGINAL


Guifeng Wei wrote, On 8/23/10 4:57 AM:
> Dear professor Stajich,
> So sorry to interrupt you. i came across a problem when i use the 
> Bio::DB::Fasta modules of BioPerl.  The aim i want to arrive at is to 
> extract the subsequences accoording to the *.bed files which are the 
> C.elegans genomic sequnece annotation.  The code i programed is in the 
> attached file.
> The genomic sequences file contains sequences from 6 chromosomes of 
> C.elegans.
> when i run this program in the command line, the following error 
> warnings was coming.
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
>     Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw 
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::Fasta::calculate_offsets 
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> STACK: Bio::DB::Fasta::index_file 
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:680
> STACK: Bio::DB::Fasta::new 
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:491
> STACK: bed_to_fasta.pl:14
> -----------------------------------------------------------
> indexing was interrupted, so unlinking 
> /home/wgf/WORM_DATA/elegans.WS190.dna.fa.index at 
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053.
>
> and therefore i write to you in hope that you can help me solve this 
> problem,as well as, give me some suggestion about how to learn Bioperl 
> well.
> thank you very very much.
> yours sincerely
> Wei Guifeng

From jason.stajich at ucr.edu  Mon Aug 23 11:58:07 2010
From: jason.stajich at ucr.edu (Jason Stajich)
Date: Mon, 23 Aug 2010 08:58:07 -0700
Subject: [Bioperl-l] a problem when using the Bioperl modules
In-Reply-To: <AANLkTinrqwQCho_obj-_9MvQAyLEBVvaFA+HzJpFKovS@mail.gmail.com>
References: <AANLkTinZYJC6JwP776K3phzbAmtjiKMi_K_VTH=B6oeC@mail.gmail.com>
	<AANLkTinrqwQCho_obj-_9MvQAyLEBVvaFA+HzJpFKovS@mail.gmail.com>
Message-ID: <4C729A8F.1070506@ucr.edu>

You haven't defined the variable $db. You need to keep the part
that initializes the Bio::DB::Fasta object, which you had previously asked
about.
Please send all your future queries to the mailing list.


Guifeng Wei wrote, On 8/23/10 8:14 AM:
> Dear professor,
> after that, i revised my scripts, which is that i divide the genomic 
> sequences into 7 single file, every file contains the sequence from a 
> chromosome.
> however, when i try to run the scripts, the following error was coming.
> Can't call method "seq" on an undefined value at bed_to_fasta.pl
> line 29, <IN> line 1.
> while(<IN>){
>         chomp $_;
>         my @bed=split(/\s+/, $_ );
>     #print length($db->seq('chrI'));
>         my $chr_id=$bed[0];
>         my $start=$bed[1];
>         my $end=$bed[2];
>         my $seq_name=$bed[3];
>         my $strand=$bed[5];
> my $segment =  $db ->seq($chr_id,$start=>$end);
>         print ">",$seq_name,"_",$chr_id,":",$start=>$end;
>         print "$segment\n";
> }
> the blue line is .
> why?

-- 
Jason E. Stajich, PhD
Assistant Professor
Department of Plant Pathology & Microbiology
University of California
Riverside, CA 92521
jason.stajich at ucr.edu
office: 951.827.2363

http://lab.stajich.org/
http://twitter.com/stajichlab
http://fungalgenomes.org/blog/

http://plantpathology.ucr.edu/
http://genomics.ucr.edu/
http://cepceb.ucr.edu/


From guifengwei at gmail.com  Mon Aug 23 22:44:57 2010
From: guifengwei at gmail.com (Guifeng Wei)
Date: Tue, 24 Aug 2010 10:44:57 +0800
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
Message-ID: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>

Hi,

I ran into a problem when using the Bio::DB::Fasta module of
BioPerl. My aim is to extract subsequences according to *.bed files
that hold the C. elegans genomic sequence annotation.

When I tried to run the script I wrote, I got the following error
message:

Can't call method "seq" on an undefined value at bed_to_fasta.pl line 28,
<IN> line 1.

So I am asking for help to solve this problem.
Here is my Perl script.

#!/usr/bin/perl -w
# Purpose: extract sequences from genomic sequences
use strict;
use Bio::DB::Fasta;
open(IN, '<', $ARGV[0]) || die "sorry, the program cannot open the .bed file, please check it.\n";
my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' );
# The dir ...../elegans190.dna/ includes 6 files: chrI, chrII, chrIII, chrIV, chrV, chrX;
# each holds the sequence of the corresponding chromosome.

while(<IN>){
        chomp $_;
        my @bed=split(/\s+/, $_ );

        my $chr_id=$bed[0];
        my $start=$bed[1];
        my $end=$bed[2];
        my $seq_name=$bed[3];
        my $strand=$bed[5];

        my $segment =  $db->seq( $chr_id, $start=>$end );

        print ">", $seq_name, "_", $chr_id, ":", $start, "-", $end, "\n";
        print "$segment\n";

}

close(IN);
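
A side note on the coordinates in a script like this: BED start positions
are 0-based with an exclusive end, while Bio::DB::Fasta expects 1-based
inclusive positions, so passing BED values straight through shifts every
segment by one base. The sketch below is an illustration under that
assumption about the input; parse_bed_line is a made-up helper name, and it
uses plain Perl only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one BED line. Returns a hash ref with 1-based inclusive
# coordinates, or nothing for headers, comments and malformed lines.
# (parse_bed_line is a hypothetical helper name.)
sub parse_bed_line {
    my ($line) = @_;
    return if $line =~ /^\s*(?:#|track\b|browser\b)/;
    # split ' ' ignores leading/trailing whitespace, including the newline
    my ($chr, $start0, $end, $name, $score, $strand) = split ' ', $line;
    return unless defined $end
        && $start0 =~ /^\d+$/
        && $end    =~ /^\d+$/
        && $end > $start0;
    my $start = $start0 + 1;    # 0-based start becomes 1-based; end stays
    return {
        chr    => $chr,
        start  => $start,
        end    => $end,
        name   => defined $name ? $name : "$chr:$start-$end",
        strand => (defined $strand && $strand eq '-') ? '-' : '+',
    };
}

# Two made-up example records, only to show the conversion.
for my $line ("chrI\t0\t100\tgeneA\t0\t+\n", "chrII\t49\t60\tgeneB\t0\t-\n") {
    my $bed = parse_bed_line($line) or next;
    # With a Bio::DB::Fasta object in $db, the segment would be fetched as
    # $db->seq($bed->{chr}, $bed->{start}, $bed->{end}).
    print "$bed->{name}\t$bed->{chr}:$bed->{start}-$bed->{end} ($bed->{strand})\n";
}
```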

From florent.angly at gmail.com  Tue Aug 24 01:06:21 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 24 Aug 2010 15:06:21 +1000
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
References: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
Message-ID: <4C73534D.6080607@gmail.com>

  Hi Guifeng,

 From the Bio::DB::Fasta documentation:
>        $db = Bio::DB::Fasta->new($fasta_path [,%options])
>          Create a new Bio::DB::Fasta object from the Fasta file or files
>          indicated by $fasta_path.  Indexing will be performed automatically
>          if needed.  If successful, new() will return the database accessor
>          object.  Otherwise it will return undef.

Hence, after you create the database object $db, you should check that 
it was successful, e.g.:
> my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' );
> if (not defined $db) {
>   die "There was a problem creating the database\n";
> }
A problem creating the database would explain the message you get.

If the extension of the FASTA files in the directory path that you gave 
as input is not fa, fasta, fast, FA, FASTA, FAST or dna, then you should 
use the -glob option when constructing your database object. From the 
documentation:
>           -glob         Glob expression to use    *.{fa,fasta,fast,FA,FASTA,FAST,dna}
>                         for searching for Fasta
>                         files in directories.


Florent


On 24/08/10 12:44, Guifeng Wei wrote:
> Hi,
>
> i came across a problem when i use the Bio::DB::Fasta modules of
> BioPerl. The aim i want to arrive at is to extract the subsequences
> accoording to the *.bed files which are the C.elegans genomic sequnece
> annotation.
>
> when i tried to run the scripts i wrote, the error message was coming, as
> follows:
>
> Can't call method "seq" on an undefined value at bed_to_fasta.pl line 28,
> <IN>  line 1.
>
> so, ask for favor to slove this problem.
> Here is my perl scripts.
>
> #!/usr/bin/perl -w
> # Purpose: extract sequences from genomic sequences
> use strict;
> use Bio::DB::Fasta;
> open(IN,$ARGV[0]) || die "sorry, the program cannot open the .bed file, plea
> check it. \n";
> my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' );
> # The dir ...../elegans190.dna/ includes 6
> files:chrI,chrII,chrIII,chrIV,chrV,chrX,
> #each stands for the sequences from the coressponding chromosome.
>
> while(<IN>){
>          chomp $_;
>          my @bed=split(/\s+/, $_ );
>
>          my $chr_id=$bed[0];
>          my $start=$bed[1];
>          my $end=$bed[2];
>          my $seq_name=$bed[3];
>          my $strand=$bed[5];
>
>          my $segment =  $db->seq( $chr_id, $start=>$end );
>
>          print ">",$seq_name,"_",$chr_id,":",$start=>$end;
>          print "$segment\n";
>
> }
>
> close(IN);
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From guifengwei at gmail.com  Tue Aug 24 07:28:16 2010
From: guifengwei at gmail.com (Guifeng Wei)
Date: Tue, 24 Aug 2010 19:28:16 +0800
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
References: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
Message-ID: <AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>

Hi,

I have revised my script according to the previous email from Florent.
However, there are still some errors, which frustrate me a lot.

The errors are as follows:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Each line of the fasta entry must be the same length except the last.
    Line above #301451 '
..' is 22 != 51 chars.
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::DB::Fasta::calculate_offsets
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
STACK: Bio::DB::Fasta::index_dir
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
STACK: Bio::DB::Fasta::new
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
STACK: bed2fasta.pl:13
-----------------------------------------------------------
indexing was interrupted, so unlinking
/home/wgf/elegans190.dna//directory.index at
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053
But the directory /home/wgf/elegans190.dna/ contains 6 files, each
holding the complete sequence of one chromosome in FASTA format. The
extension of the FASTA files is .fa. Every file starts with
">chromosoemeXXX" followed by thousands of sequence lines.

So it warns me that "Each line of the fasta entry must be the
same length except the last" and "indexing was interrupted, so unlinking
/home/wgf/elegans190.dna//directory".

I am very confused about this, so I am asking for help.

Wei Guifeng

From biopython at maubp.freeserve.co.uk  Tue Aug 24 09:28:33 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 24 Aug 2010 14:28:33 +0100
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To: <AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>
References: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
	<AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>
Message-ID: <AANLkTi=Nn7m1_6mPoiUcmJNsBoFu4eh-pO9QJaVipOU0@mail.gmail.com>

On Tue, Aug 24, 2010 at 12:28 PM, Guifeng Wei <guifengwei at gmail.com> wrote:
> Hi,
>
> i have revised my scripts according to the previous email from Florent.
> However, there were still some errors which frustrated me so much.
>
> The errors are as follows:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
>    Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::Fasta::calculate_offsets
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> STACK: Bio::DB::Fasta::index_dir
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
> STACK: Bio::DB::Fasta::new
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
> STACK: bed2fasta.pl:13
> -----------------------------------------------------------
> indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory.index at
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053
> But in the directory /home/wgf/elegans190.dna/ , it concludes 6 files,
> each contains the complete sequences from one single chromosome, the format
> is fasta. The extension of the FASTA files is .fa. Every single file is
> started as ">chromosoemeXXX" followed by the thousands of sequences.
>
> and therefore, it warn me that "Each line of the fasta entry must be the
> same length except the last". and "indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory".
>
> i was much confused about this. so for help.
>
> Wei Guifeng

Hi Wei,

It sounds like there is inconsistent line wrapping in your FASTA file.
This is often not a problem at all, but the DB indexing system (and
indeed other indexing tools like the samtools fasta index) requires
that all the lines of an entry have the same wrapping.

e.g. This is a valid FASTA file but would not be suitable for indexing:

>Test
ACGTACGT
ACGTACGT
ACGTACGT
ACGT
ACGT
T

Ignoring the final line (a special case, here of length one), that file
uses a mixture of line lengths, 8 and 4. If you had used this instead,
it should be fine:

>Test
ACGTACGT
ACGTACGT
ACGTACGT
ACGTACGT
T

All the lines are now wrapped at length 8 (and the final line is
less than or equal to length 8).

Of course, in a real file wrapping at 60 or 80 characters is more
common ;)
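
To locate the offending lines in a large file, the rule above can be
checked directly. This is a rough sketch in plain Perl, not the check
Bio::DB::Fasta itself performs; find_uneven_lines is a made-up name, and
the first sequence line of each record is assumed to set the expected
width.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Scan a FASTA filehandle and return [line_number, length, expected]
# for every sequence line that is not the last line of its record
# and differs in length from the record's first sequence line.
sub find_uneven_lines {
    my ($fh) = @_;
    my ($width, $prev_len, $prev_no, @bad);
    my $line_no = 0;
    while (my $line = <$fh>) {
        $line_no++;
        $line =~ s/[\r\n]+\z//;
        if ($line =~ /^>/) {
            # New record: the previous line was a last line, so it is exempt
            ($width, $prev_len) = (undef, undef);
            next;
        }
        if (defined $prev_len) {
            # The previous line is followed by more sequence, so it must
            # match the width set by the first line of the record
            $width = $prev_len unless defined $width;
            push @bad, [ $prev_no, $prev_len, $width ] if $prev_len != $width;
        }
        ($prev_len, $prev_no) = (length $line, $line_no);
    }
    return @bad;
}

# Check every file named on the command line.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    for my $bad (find_uneven_lines($fh)) {
        printf "%s line %d: %d chars, expected %d\n", $file, @$bad;
    }
    close $fh;
}
```

Run on the first example above, this would flag the two ACGT lines; the
second example passes cleanly.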

Peter


From cjfields at illinois.edu  Tue Aug 24 09:38:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 08:38:45 -0500
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To: <AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>
References: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
	<AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>
Message-ID: <995BCF30-99B2-46C2-A4E8-681F9E2A0BB5@illinois.edu>

Guifeng,

Did you follow Jason's advice yesterday about converting the FASTA over to a more consistent length?  Or checking the database itself?  These are both things reiterated by Florent and Peter.

From Jason's last response:

-------------------------
Wei -

Please ask your questions on the bioperl mailing list, I cannot answer questions directly for all requests.
Your problem has been answered by me on the list before so I urge you to use the list archives as a starting point.

The line lengths of the fasta file sequence aren't the same length.

you need to run this
bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW
mv NEW ORIGINAL

or with sreformat
sreformat fasta ORIGINAL > NEW
mv NEW ORIGINAL
-------------------------

chris


On Aug 24, 2010, at 6:28 AM, Guifeng Wei wrote:

> Hi,
> 
> i have revised my scripts according to the previous email from Florent.
> However, there were still some errors which frustrated me so much.
> 
> The errors are as follows:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
>   Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::Fasta::calculate_offsets
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> STACK: Bio::DB::Fasta::index_dir
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
> STACK: Bio::DB::Fasta::new
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
> STACK: bed2fasta.pl:13
> -----------------------------------------------------------
> indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory.index at
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053
> But in the directory /home/wgf/elegans190.dna/ , it concludes 6 files,
> each contains the complete sequences from one single chromosome, the format
> is fasta. The extension of the FASTA files is .fa. Every single file is
> started as ">chromosoemeXXX" followed by the thousands of sequences.
> 
> and therefore, it warn me that "Each line of the fasta entry must be the
> same length except the last". and "indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory".
> 
> i was much confused about this. so for help.
> 
> Wei Guifeng
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From scott at scottcain.net  Tue Aug 24 11:01:47 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 24 Aug 2010 11:01:47 -0400
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
	<4C6DADDF.1000103@cornell.edu>
	<20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
	<1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
Message-ID: <AANLkTin01uf32_1G2+d8PA2YEtw3UfB5FK+CVPnLCD81@mail.gmail.com>

Hi Chris,

GMOD still only supports Chado with Postgres (for example, the GFF
loader assumes a Postgres database), but when I reengineered the GFF
loader a few years ago, I designed it with subclassing in mind so that
it could be adapted to work with other RDBMSs.

Scott


On Fri, Aug 20, 2010 at 12:23 PM, Chris Fields <cjfields at illinois.edu> wrote:
> On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote:
>> Hi,
>>
>> On Thu, 19 Aug 2010, Robert Buels wrote:
>>
>> > Chris Fields wrote:
>> > > I think it's worth exploring having a DBIx::Class-based middle-ware
>> > > approach similar to what Rob Buels has done for Chado. That would be
>> > > fairly easy to get started using DBIx::Class::Schema::Loader.
>> > > After that it would require optimization and tweaking, which is
>> > > potentially more complex than Rob's setup as Chado is very Pg-specific,
>> > > but maybe Rob can elaborate...
>> >
>> > Elaborating on how Bio::Chado::Schema is developed:
>> >
>> > The vast majority of the code and POD in BCS is autogenerated by
>> > DBIx::Class::Schema::Loader. DBICSL gives you a baseline set of
>> > DBIx::Class classes that covers all the tables, views, columns, unique
>> > constraints, and foreign key relationships.
>> >
>> > Beyond that, you have to add on yourself. In BCS, we have mostly done
>> > things like:
>> >
>> >   * make better-named aliases for some of the autogenerated
>> >     relationships (though DBICSL does a surprisingly good job of naming
>> >     relationships automatically most of the time)
>> >   * add a tiny bit of bioperl compatibility (this needs a lot more work
>> >     by somebody, volunteers needed!)
>> >   * add convenience methods for using some of the Chado property tables
>> >   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
>> >     traversing phylogenetic tree relationships
>> >
>> > Regarding DB backend specificity, BCS isn't Pg-specific at all, because
>> > DBIx::Class itself goes to great lengths to be compatible (and performant!)
>> > with just about every relational database out there.
>> I would vouch for that at least as far as chado in oracle is concerned.
>> So far, BCS works out flawlessly with our oracle chado instance at
>> dictybase. Quite a chunk of BCS based code is also active in couple of
>> our Mojo based webapps. The part which i still couldn't use directly is
>> the 'synonym' table as it clashes with oracle specific reserved keywords.
>> However, overall it seems to be quite cross-RDBMS compatible and highly
>> recommended.
>>
>> -siddhartha
>
> Just to point out, I didn't say BCS is Pg-specific, but that Chado is
> (that was the DBMS it was designed for). Maybe that should be amended
> to 'was' now :)
>
> I recall seeing a page on this somewhere on the GMOD website along the
> lines of "MySQL has problems so we chose Pg", and that Chado support
> would focus on Pg. I'm guessing that's no longer the case? Or is only
> the server-side stuff Pg-specific.
>
>> >In fact, the BCS test
>> > suite deploys a Chado schema into a temporary SQLite database using
>> > DBIC::Schema's deploy() method, and runs all of its tests on that. Very
>> > handy.
>> >
>> > Chado's Pg-specific server-side functions can of course be called through
>> > BCS if they are present, but it's perfectly possible to use Chado without
>> > any of the server-side functions, and mostly the way I use it.
>> >
>> > Rob
>
> I think this opens up the possibility of starting a DBIx::Class-based
> middleware solution. Hilmar, did you want to take that on?
>
> chris
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                            scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                         216-392-3087
Ontario Institute for Cancer Research


From bgs500 at york.ac.uk  Tue Aug 24 11:35:53 2010
From: bgs500 at york.ac.uk (Ben Saville)
Date: Tue, 24 Aug 2010 16:35:53 +0100
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
	<0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>
Message-ID: <34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk>

Sorry for the delay in replying; 454 data analysis is very time
consuming.

Please see http://seqanswers.com/forums/showthread.php?t=6484
for a discussion of this problem and how we solved it.

Thanks for the reply though, much appreciated!

Regards
Ben Saville


On 20 Aug 2010, at 14:48, Dave Messina wrote:

> Hi Ben,
>
> I would not use the script you posted; I don't think it does what
> you want.
>
> If you haven't already, you should take a look at the beginners' HOWTO
>
> 	http://www.bioperl.org/wiki/HOWTO:Beginners
>
>
> the SearchIO HOWTO
>
> 	http://www.bioperl.org/wiki/HOWTO:SearchIO
>
>
> and the example scripts included with BioPerl:
>
> 	http://www.bioperl.org/wiki/Scripts
>
>
>
> Incidentally, it's a lot of fiddly data processing to parse blast  
> reports for many contigs against multiple databases and then go back  
> and collate the results by query. I'm not sure exactly what you want  
> to do once you've separated by query; if you provide some more
> information, we could suggest ways to best get you where you want to  
> go.
>
> I will mention, though, that BLAST has the ability to search  
> multiple separate databases in one go and collate the results for  
> you. So that's something to consider.
>
>
>
> Dave
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Tue Aug 24 11:54:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 10:54:20 -0500
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To: <AANLkTi=7_fFU4Q53S1onRZpFaVoS6ndNNq68ZSHMDoe3@mail.gmail.com>
References: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
	<AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>
	<995BCF30-99B2-46C2-A4E8-681F9E2A0BB5@illinois.edu>
	<AANLkTi=7_fFU4Q53S1onRZpFaVoS6ndNNq68ZSHMDoe3@mail.gmail.com>
Message-ID: <B269BA3E-C0E7-4FEA-BA78-E164F4D2B787@illinois.edu>

Please keep all responses on-list.  

Regarding sreformat:

http://tinyurl.com/28q75rr
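
If neither sreformat nor bp_sreformat is at hand, the rewrapping can be
approximated with plain awk. This is a sketch under a few assumptions:
rewrap_fasta is a made-up name, the width of 60 is only a default, and
each entry is buffered in memory before it is printed, which is fine
for chromosome-sized records on most machines but is not a streaming
solution.

```shell
# Hypothetical helper: rewrap_fasta WIDTH < in.fa > out.fa
rewrap_fasta() {
  awk -v width="${1:-60}" '
    # Print the buffered sequence in chunks of at most "width" characters.
    function emit(   i) {
      for (i = 1; i <= length(seq); i += width)
        print substr(seq, i, width)
      seq = ""
    }
    /^>/ { sub(/\r$/, ""); emit(); print; next }   # header: emit previous entry first
    { gsub(/[ \t\r]/, ""); seq = seq $0 }          # collect sequence, drop blanks and CRs
    END { emit() }
  '
}
```

Usage would be along the lines of `rewrap_fasta 60 < ORIGINAL > NEW`
followed by `mv NEW ORIGINAL`, mirroring the commands above. Blank
lines inside or after an entry disappear as a side effect.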

Judging by the stack traces below, you are also running off a UNIX-like system.  To concatenate files, use 'cat'.  So, for all files ending with .fa:

cat *.fa >> all.fa
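
Two things to watch with the plain cat approach: a file that does not
end in a newline will have its last line glued to the next header, and
running the command a second time appends the same sequences again. A
slightly more defensive sketch (merge_fasta and the all.fasta name are
made up for illustration):

```shell
# Hypothetical helper: merge_fasta OUTPUT INPUT...
merge_fasta() {
  out=$1; shift
  for f in "$@"; do
    # awk re-emits every non-empty line with a trailing newline, so a
    # missing final newline cannot join a sequence line to a header.
    awk '{ sub(/\r$/, ""); if ($0 != "") print }' "$f"
  done > "$out"                       # ">" truncates, so reruns do not duplicate
}
```

For example, `merge_fasta all.fasta *.fa` writes to a name that the
*.fa pattern does not match, so the output is never read back as input.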

chris

On Aug 24, 2010, at 8:54 AM, Guifeng Wei wrote:

> Hello Fields,
>  
> I have checked the FASTA files. I found that the last line is a blank line, and the second-to-last line is shorter than the others.
>  
> I am not able to run the command line from Jason's advice because I have no knowledge of "sreformat".
>  
> I also want to ask one more question: I want to merge the several single-chromosome sequence files into one. Is that OK?
>  
> thank you very much.
>  
> Wei Guifeng
> 2010/8/24 Chris Fields <cjfields at illinois.edu>
> Guifeng,
> 
> Did you follow Jason's advice yesterday about converting the FASTA over to a more consistent length?  Or checking the database itself?  These are both things reiterated by Florent and Peter.
> 
> From Jason's last response:
> 
> -------------------------
> Wei -
> 
> Please ask your questions on the bioperl mailing list, I cannot answer questions directly for all requests.
> Your problem has been answered by me on the list before so I urge you to use the list archives as a starting point.
> 
> The line lengths of the fasta file sequence aren't the same length.
> 
> you need to run this
> bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW
> mv NEW ORIGINAL
> 
> or with sreformat
> sreformat fasta ORIGINAL > NEW
> mv NEW ORIGINAL
> -------------------------
> 
> chris
> 
> 
> On Aug 24, 2010, at 6:28 AM, Guifeng Wei wrote:
> 
> > Hi,
> >
> > i have revised my scripts according to the previous email from Florent.
> > However, there were still some errors which frustrated me so much.
> >
> > The errors are as follows:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: Each line of the fasta entry must be the same length except the last.
> >   Line above #301451 '
> > ..' is 22 != 51 chars.
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw
> > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> > STACK: Bio::DB::Fasta::calculate_offsets
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> > STACK: Bio::DB::Fasta::index_dir
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
> > STACK: Bio::DB::Fasta::new
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
> > STACK: bed2fasta.pl:13
> > -----------------------------------------------------------
> > indexing was interrupted, so unlinking
> > /home/wgf/elegans190.dna//directory.index at
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053
> > But in the directory /home/wgf/elegans190.dna/ , it concludes 6 files,
> > each contains the complete sequences from one single chromosome, the format
> > is fasta. The extension of the FASTA files is .fa. Every single file is
> > started as ">chromosoemeXXX" followed by the thousands of sequences.
> >
> > and therefore, it warn me that "Each line of the fasta entry must be the
> > same length except the last". and "indexing was interrupted, so unlinking
> > /home/wgf/elegans190.dna//directory".
> >
> > i was much confused about this. so for help.
> >
> > Wei Guifeng
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> 
> 
> -- 
> Wei Guifeng
> 
> 
> 


From cjfields at illinois.edu  Tue Aug 24 12:14:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 11:14:51 -0500
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
	<0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>
	<34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk>
Message-ID: <69C47A74-09C7-4024-9303-A3893658A2A8@illinois.edu>

Just in case anyone needs it, there is a way to index these as well (both BLAST and the two tabular BLAST versions) for fast lookups of specific reports, if needed.  See Bio::Index::Blast and Bio::Index::BlastTable in BioPerl.

Caveat: I believe there is a bug with BLAST+ text output indexing (it chops the header off subsequent reports).  I haven't investigated it enough yet, but I'll try looking into it today.

chris

On Aug 24, 2010, at 10:35 AM, Ben Saville wrote:

> Sorry for the Delay in replying, 454 data analysis is very time consuming.
> 
> please see http://seqanswers.com/forums/showthread.php?t=6484
> For a discussion about this problem, and how we solved the issue.
> 
> Thanks for the reply though, much appreciated!
> 
> Regards
> Ben Saville
> 
> 
> 
> 
> 
> On 20 Aug 2010, at 14:48, Dave Messina wrote:
> 
>> Hi Ben,
>> 
>> I would not use the script you posted; I don't think it does what you want.
>> 
>> If you haven't already, you should take a look at the beginners' HOWTO
>> 
>> 	http://www.bioperl.org/wiki/HOWTO:Beginners
>> 
>> 
>> the SearchIO HOWTO
>> 
>> 	http://www.bioperl.org/wiki/HOWTO:SearchIO
>> 
>> 
>> and the example scripts included with BioPerl:
>> 
>> 	http://www.bioperl.org/wiki/Scripts
>> 
>> 
>> 
>> Incidentally, it's a lot of fiddly data processing to parse blast reports for many contigs against multiple databases and then go back and collate the results by query. I'm not sure exactly what you want to do once you've separated by query; if you provide some more information, we could suggest ways to best get you where you want to go.
>> 
>> I will mention, though, that BLAST has the ability to search multiple separate databases in one go and collate the results for you. So that's something to consider.
>> 
>> 
>> 
>> Dave
>> 
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Tue Aug 24 12:17:17 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 11:17:17 -0500
Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release
	announcement
References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov>
Message-ID: <A26B0224-CFDD-4D2B-A5B0-4275693416FD@illinois.edu>

FYI,

Very interesting additions to BLAST+ (archive format).  

chris

Begin forwarded message:

> From: mcginnis <mcginnis at ncbi.nlm.nih.gov>
> Date: August 24, 2010 10:46:50 AM CDT
> To: NLM/NCBI List blast-announce <blast-announce at ncbi.nlm.nih.gov>
> Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement
> 
> A new version of the stand-alone applications is available.
>  
> Users are encouraged to use the BLAST+ applications available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
> 
> This release includes a number of bug fixes as well as new features for the BLAST+ applications:
>  
> * Introduce BLAST Archive format to permit reformatting of stand-alone BLAST searches with the blast_formatter(see BLAST+ user manual) 
> * Added the blast_formatter application (see BLAST+ user manual)
> * Added support for translated subject soft masking in the BLAST databases
> * Added support for the BLAST Trace-back operations (btop) output format
> * Added command line options to blastdbcmd for listing available BLAST databases
> * Improved performance of formatting of remote BLAST searches
> * Use a consistent exit code for out of memory conditions
> * Fixed bug in indexed megablast with multiple space-separated BLAST databases
> * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb
> * Fixed Windows installer for 64-bit installations
>  
> BLAST+ applications, as well as the legacy C applications (e.g. blastall), may be downloaded from http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download


From David.Messina at sbc.su.se  Tue Aug 24 13:00:14 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 24 Aug 2010 19:00:14 +0200
Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release
	announcement
In-Reply-To: <A26B0224-CFDD-4D2B-A5B0-4275693416FD@illinois.edu>
References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov>
	<A26B0224-CFDD-4D2B-A5B0-4275693416FD@illinois.edu>
Message-ID: <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se>

Here's a link to the manual:
ftp://ftp.ncbi.nlm.nih.gov//blast/executables/blast%2B/2.2.24/user_manual.pdf

(Is it on the NCBI website somewhere? Strange to have only a downloadable PDF.) The section on the new archive format is on page 27.

It seems like a nice idea to have the flexibility, but I wonder about the time cost of using this format.

One of the big gains from using tab-delimited output is that BLAST doesn't have to do all the post-processing to generate the alignment views. By doing the archive format, which if I understand it correctly is ASN.1, you're always paying the full price in time (and space, for that matter).


Dave


On Aug 24, 2010, at 18:17 , Chris Fields wrote:

> FYI,
> 
> Very interesting additions to BLAST+ (archive format).  
> 
> chris
> 
> Begin forwarded message:
> 
>> From: mcginnis <mcginnis at ncbi.nlm.nih.gov>
>> Date: August 24, 2010 10:46:50 AM CDT
>> To: NLM/NCBI List blast-announce <blast-announce at ncbi.nlm.nih.gov>
>> Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement
>> 
>> A new version of the stand-alone applications is available.
>> 
>> Users are encouraged to use the BLAST+ applications available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
>> 
>> This release includes a number of bug fixes as well as new features for the BLAST+ applications:
>> 
>> * Introduce BLAST Archive format to permit reformatting of stand-alone BLAST searches with the blast_formatter(see BLAST+ user manual) 
>> * Added the blast_formatter application (see BLAST+ user manual)
>> * Added support for translated subject soft masking in the BLAST databases
>> * Added support for the BLAST Trace-back operations (btop) output format
>> * Added command line options to blastdbcmd for listing available BLAST databases
>> * Improved performance of formatting of remote BLAST searches
>> * Use a consistent exit code for out of memory conditions
>> * Fixed bug in indexed megablast with multiple space-separated BLAST databases
>> * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb
>> * Fixed Windows installer for 64-bit installations
>> 
>> BLAST+ applications, as well as the legacy C applications (e.g. blastall), may be downloaded from http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Tue Aug 24 13:26:49 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 12:26:49 -0500
Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release
	announcement
In-Reply-To: <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se>
References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov>
	<A26B0224-CFDD-4D2B-A5B0-4275693416FD@illinois.edu>
	<27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se>
Message-ID: <D84DD1C8-6CBE-40F1-8CF9-F9482F0E4B18@illinois.edu>

It's probably more applicable from the viewpoint of a cluster admin who would want to add the flexibility of having a single archive and allowing any format (as opposed to re-running the analysis). I'm just wondering if there is anything to glean there for possible alignment archiving purposes (ala SAM/BAM), but if it's ASN.1, likely not.

chris

On Aug 24, 2010, at 12:00 PM, Dave Messina wrote:

> Here's a link to the manual:
> ftp://ftp.ncbi.nlm.nih.gov//blast/executables/blast%2B/2.2.24/user_manual.pdf
> 
> (Is it on the NCBI website somewhere? Strange to have only a downloadable PDF.) The section on the new archive format is on page 27.
> 
> It seems like a nice idea to have the flexibility, but I wonder about the time cost of using this format.
> 
> One of the big gains from using tab-delimited output is that BLAST doesn't have to do all the post-processing to generate the alignment views. By doing the archive format, which if I understand it correctly is ASN.1, you're always paying the full price in time (and space, for that matter).
> 
> 
> 
> Dave
> 
> 
> 
> 
> On Aug 24, 2010, at 18:17 , Chris Fields wrote:
> 
>> FYI,
>> 
>> Very interesting additions to BLAST+ (archive format).  
>> 
>> chris
>> 
>> Begin forwarded message:
>> 
>>> From: mcginnis <mcginnis at ncbi.nlm.nih.gov>
>>> Date: August 24, 2010 10:46:50 AM CDT
>>> To: NLM/NCBI List blast-announce <blast-announce at ncbi.nlm.nih.gov>
>>> Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement
>>> 
>>> A new version of the stand-alone applications is available.
>>> 
>>> Users are encouraged to use the BLAST+ applications available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
>>> 
>>> This release includes a number of bug fixes as well as new features for the BLAST+ applications:
>>> 
>>> * Introduce BLAST Archive format to permit reformatting of stand-alone BLAST searches with the blast_formatter(see BLAST+ user manual) 
>>> * Added the blast_formatter application (see BLAST+ user manual)
>>> * Added support for translated subject soft masking in the BLAST databases
>>> * Added support for the BLAST Trace-back operations (btop) output format
>>> * Added command line options to blastdbcmd for listing available BLAST databases
>>> * Improved performance of formatting of remote BLAST searches
>>> * Use a consistent exit code for out of memory conditions
>>> * Fixed bug in indexed megablast with multiple space-separated BLAST databases
>>> * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb
>>> * Fixed Windows installer for 64-bit installations
>>> 
>>> BLAST+ applications, as well as the legacy C applications (e.g. blastall), may be downloaded from http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From David.Messina at sbc.su.se  Tue Aug 24 14:45:29 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 24 Aug 2010 20:45:29 +0200
Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release
	announcement
In-Reply-To: <D84DD1C8-6CBE-40F1-8CF9-F9482F0E4B18@illinois.edu>
References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov>
	<A26B0224-CFDD-4D2B-A5B0-4275693416FD@illinois.edu>
	<27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se>
	<D84DD1C8-6CBE-40F1-8CF9-F9482F0E4B18@illinois.edu>
Message-ID: <00C04DF9-F3C2-4574-B1E4-A3BF28EE953F@sbc.su.se>

> It's probably more applicable from the viewpoint of a cluster admin who would want to add the flexibility of having a single archive and allowing any format (as opposed to re-running the analysis).

Good point.


> I'm just wondering if there is anything to glean there for possible alignment archiving purposes (ala SAM/BAM), but if it's ASN.1, likely not.

To be honest, I didn't look that closely at it. It may be worth considering nevertheless.


Dave


From buiduyminh at gmail.com  Tue Aug 24 14:56:43 2010
From: buiduyminh at gmail.com (Minh Bui)
Date: Tue, 24 Aug 2010 14:56:43 -0400
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
In-Reply-To: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>
References: <AANLkTinsyOMPJxpks_pqMwLpW8gx0VRihhJsLDnF53mu@mail.gmail.com>
	<491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>
Message-ID: <AANLkTimOe=T9FrpMPqMy8yyrfz8Sf7QJ5Rr5YYFjicJb@mail.gmail.com>

How can I find out where DBD::mysql is installed on my Mac? I am very
new to the Mac, sorry.

I just checked, and mysql.pm is in
/Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm


On 8/21/10, Adam Witney <awitney at sgul.ac.uk> wrote:
>
>  On 20 Aug 2010, at 22:29, Minh Bui wrote:
>
>  > Hi,,
>  > I am trying to load my GFF file to mysql database but I got this error
>  > when I ran the bp_seqfeature_load.pl ( bioperl 1.6.1 on MAC)
>  >
>  > [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
>  > install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
>  > contains: /sw/lib/perl5 /sw/lib/perl5/darwin
>  > /System/Library/Perl/5.8.6/darwin-thread-multi-2level
>  > /System/Library/Perl/5.8.6
>  > /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
>  > /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
>  > /Network/Library/Perl/5.8.6 /Network/Library/Perl
>  > /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
>  > /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
>  > line 3.
>  > Perhaps the DBD::mysql perl module hasn't been fully installed,
>  > or perhaps the capitalisation of 'mysql' isn't right.
>  > Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
>  > at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212
>  >
>  > I am using MAC OSX version 10.4.10 and MAMP? Isnt it the
>  > "/Library/Perl/5.8.6" already in @INC? What am I missing?
>  > I have been googling this error for a few hours. I also install
>  > Bioperl and reinstall DBD::mysql using CPAN. It still doesnt work..
>  >
>  > Here is my $PERL5LIB: /sw/lib/perl5:/sw/lib/perl5/darwin/
>
>
>
> Where did DBD:mysql get installed? can you verify that DBD/mysql.pm is actually in one of those directories listed above?
>
>


From scott at scottcain.net  Tue Aug 24 15:04:04 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 24 Aug 2010 15:04:04 -0400
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
In-Reply-To: <AANLkTimOe=T9FrpMPqMy8yyrfz8Sf7QJ5Rr5YYFjicJb@mail.gmail.com>
References: <AANLkTinsyOMPJxpks_pqMwLpW8gx0VRihhJsLDnF53mu@mail.gmail.com>
	<491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>
	<AANLkTimOe=T9FrpMPqMy8yyrfz8Sf7QJ5Rr5YYFjicJb@mail.gmail.com>
Message-ID: <AANLkTimPapxSzwVxCBMw1J0+x88K80SJ_6OH9LBkS3Jn@mail.gmail.com>

Hi Minh,

The file you found is not DBD::mysql though; it is
Bio::DB::SeqFeature::Store::DBI::mysql, which was installed along with
BioPerl.  How did you find that file?  The same method presumably
would turn up DBD::mysql if it existed.  I would use a command like
this:

  locate mysql.pm

which would locate all of the instances of files named mysql.pm on your
computer.  I would expect it to be located in
/Library/Perl/5.8.6/darwin-thread-multi-2level/DBD/ if it was
installed in a "normal" way (that is, not involving macports or fink
or MAMP).
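
If locate's database is not set up, another way to check is to look for
the file under each directory Perl actually searches. A small sketch,
assuming nothing beyond a POSIX shell; find_in_dirs is a made-up helper
name, and the directories would come from the @INC list in the error
message:

```shell
# Hypothetical helper: find_in_dirs RELATIVE_PATH DIR...
# Prints every DIR/RELATIVE_PATH that exists; returns 1 if none does.
find_in_dirs() {
  rel=$1; shift
  status=1
  for dir in "$@"; do
    if [ -f "$dir/$rel" ]; then
      printf '%s\n' "$dir/$rel"
      status=0
    fi
  done
  return "$status"
}

# Possible usage (breaks on directory names containing spaces):
#   find_in_dirs DBD/mysql.pm $(perl -e 'print "$_\n" for @INC')
```

No output means DBD/mysql.pm is not in any directory on the list, which
would match the "Can't locate DBD/mysql.pm in @INC" message.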

Scott


On Tue, Aug 24, 2010 at 2:56 PM, Minh Bui <buiduyminh at gmail.com> wrote:
> How can I know where DBD:mysql PATH on my MAC? I am very new to MAC sorry.
>
> I just check and mysql.pm is in
> /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm
>
>
>
> On 8/21/10, Adam Witney <awitney at sgul.ac.uk> wrote:
>>
>>  On 20 Aug 2010, at 22:29, Minh Bui wrote:
>>
>>  > Hi,,
>>  > I am trying to load my GFF file to mysql database but I got this error
>>  > when I ran the bp_seqfeature_load.pl ( bioperl 1.6.1 on MAC)
>>  >
>>  > [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
>>  > install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
>>  > contains: /sw/lib/perl5 /sw/lib/perl5/darwin
>>  > /System/Library/Perl/5.8.6/darwin-thread-multi-2level
>>  > /System/Library/Perl/5.8.6
>>  > /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
>>  > /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
>>  > /Network/Library/Perl/5.8.6 /Network/Library/Perl
>>  > /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
>>  > /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
>>  > line 3.
>>  > Perhaps the DBD::mysql perl module hasn't been fully installed,
>>  > or perhaps the capitalisation of 'mysql' isn't right.
>>  > Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
>>  > at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212
>>  >
>>  > I am using MAC OSX version 10.4.10 and MAMP? Isnt it the
>>  > "/Library/Perl/5.8.6" already in @INC? What am I missing?
>>  > I have been googling this error for a few hours. I also install
>>  > Bioperl and reinstall DBD::mysql using CPAN. It still doesnt work..
>>  >
>>  > Here is my $PERL5LIB: /sw/lib/perl5:/sw/lib/perl5/darwin/
>>
>>
>>
>> Where did DBD:mysql get installed? can you verify that DBD/mysql.pm is actually in one of those directories listed above?
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                            scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                         216-392-3087
Ontario Institute for Cancer Research


From jason at bioperl.org  Wed Aug 25 00:33:45 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 24 Aug 2010 21:33:45 -0700
Subject: [Bioperl-l] Enquiry on gi_taxid_nucl.dmp.gz
In-Reply-To: <AANLkTi=BrV0ODdF+sEQdAmtAMxRk6y2xGgRMOgbnZz-6@mail.gmail.com>
References: <AANLkTi=BrV0ODdF+sEQdAmtAMxRk6y2xGgRMOgbnZz-6@mail.gmail.com>
Message-ID: <4C749D29.3040003@bioperl.org>

hi - please keep questions on list.


I think one of your problems is that your first use of $gi2taxidfile is
wrong. When you call tie, you want to specify the dbfile in which to
store the index.
So call it "/tmp/gi2taxid.idx" or something like that.

In my code here 
http://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/classify_hits_kingdom.PLS
you will see on line 97 we construct the name of the index file to be 
the folder, plus 'idx', plus the name gi2taxid which will be the name of 
index file.

Also, it would be safer for the split to match on whitespace, since you
want only the first two columns from the file.  Doing this would
eliminate the need for the chomp on the line above.

  my ($gi, $taxid) = split(/\s+/, $_);

instead of

  chomp;
  my ($gi, $taxid) = split(" ", $_,2);

There may be other problems but these should be fixed first -- and 
please send queries to the mailing list rather than to me directly so 
that others can answer questions.
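
As a cross-check on whatever the tied hash returns, a single GI can
also be looked up straight from the unzipped dump with awk. This is
only a sketch for spot checks: taxid_for_gi is a made-up helper name,
it assumes the first two whitespace-separated columns are GI and taxon
ID as discussed above, and it scans the file linearly, so it is far too
slow for bulk lookups.

```shell
# Hypothetical helper: taxid_for_gi GI DUMPFILE
# Prints the second column of the first line whose first column is GI;
# returns non-zero if the GI is not present.
taxid_for_gi() {
  awk -v gi="$1" '
    $1 == gi { print $2; hit = 1; exit }
    END { exit !hit }
  ' "$2"
}
```

For instance, `taxid_for_gi 183397240 gi_taxid_nucl.dmp` should print
the same taxon ID that `$taxid4gi{'183397240'}` gives once the index is
built correctly.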

-jason
Amali Thrimawithana wrote, On 8/24/10 8:13 PM:
> Dear Jason
>
> Thank you very much for the information. I manage to get the information on
> different taxonomic  levels with the help of one of your example code
> "local_taxonomydb_query". However I am having trouble with creating a local
> index file of the gi_taxid_nucl.dmp so that I am able to get the taxonomic
> id given the GI number of NCBI. At the moment I am using the tie() function
> with DB_file and then storing the detail into a hash. However when I try to
> retrieve a taxonomic ID given the GI number, it is not returning any thing
> but an error. Below is part of the code (borrowed from the example code
> classify kingdom), can you please let me know where I am going wrong?
> ...
> my $dbh2 = tie(%taxid4gi, 'DB_File', $gi2taxidfile);
>
> if( ! $done ) {
>      my $fh;
>     open(GI2TAXID, "$gi2taxidfile") or die $!; #here passing the unzipped
> gi_taxid_nucl.dmp
>     my$i=0;
>      while (<GI2TAXID>) {
>        chomp;
>         my ($gi, $taxid) = split(" ", $_, 2);
>         $taxid4gi{$gi} = $taxid
>         if exists $taxid4gi{$gi};
>         $i++;
>       unless( $DEBUG&&  $i % 100000  ) {
>          warn "$i\n";
>      }
>      }
>      $dbh2->sync;
> }
> my $gi2='183397240';
> my $taxd2=$taxid4gi{$gi2};
>   print $taxd2, " \n";
>
> Any help would be much appreciated
>
> Thanking you
> Amali
>
> On 23 August 2010 06:29, Jason Stajich<jason at bioperl.org>  wrote:
>
>    
>> Hi Amali -
>>
>> This is how I'd print out the full classification by using the Tree methods
>> (with probably a different way of initializing the $db object to your
>> flatfiles location).
>>
>> #!/usr/bin/perl -w
>> use strict;
>> use Bio::DB::Taxonomy;
>>
>> my $db= Bio::DB::Taxonomy->new(-source =>  'flatfile',
>>                    -nodesfile =>  'taxonomy/nodes.dmp',
>>                    -namesfile =>  'taxonomy/names.dmp');
>>
>> my $taxonid = $db->get_taxonid('Homo sapiens');
>> my $taxon = $db->get_taxon(-taxonid =>  $taxonid);
>> my $tree = Bio::Tree::Tree->new(-node =>  $taxon);
>> my @taxa = $tree->get_nodes;
>> print join(",", map { $_->scientific_name } @taxa), "\n";
>>
>> -jason
>>
>> Amali Thrimawithana wrote, On 8/18/10 3:56 PM:
>>
>>   Dear Dr Stajich,
>>      
>>> I am a Masters student at Auckland university and my research is on
>>> identifying yeast species present in wine by the use of 454 sequencing. In
>>> order to carry out this research, a pipeline is being built in which at
>>> the
>>> final step each representative OTU need to be classified at different
>>> taxonomic levels (ie: at Phylum, family, class, genus and species) by
>>> using
>>> the results from BLAST. To identify the sequences at each taxonomic level,
>>> I
>>> have been trying out the Bio::DB::Taxonomy module in bioperl. Using this
>>> module, I am able to get the genus and species level by splitting the
>>> scientific name returned by the Bio::taxon object. But unfortunately I am
>>> uncertain on how to get the information for the other levels of the rank.
>>> I
>>> have tried several commands including "my @class =
>>> $node->classification;",
>>> but it does not work. Hence, could you please let me know how I might be
>>> able to get the higher levels of taxonomy such as class and phylum using
>>> bioperl?
>>>
>>> Look forward to hearing from you soon
>>>
>>> Thanking You
>>>
>>> Amali
>>>
>>>
>>>        

From roy.chaudhuri at gmail.com  Wed Aug 25 07:12:15 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 25 Aug 2010 12:12:15 +0100
Subject: [Bioperl-l] Enquiry on gi_taxid_nucl.dmp.gz
In-Reply-To: <4C749D29.3040003@bioperl.org>
References: <AANLkTi=BrV0ODdF+sEQdAmtAMxRk6y2xGgRMOgbnZz-6@mail.gmail.com>
	<4C749D29.3040003@bioperl.org>
Message-ID: <4C74FA8F.3080506@gmail.com>

> Also it would be safer for the split to be whitespace matching, since
> you want the first two columns from the file.  Doing this would
> eliminate the need for the chomp on the line above.
>
>    my ($gi, $taxid) = split(/\s+/, $_);
>
> instead of
>
>    chomp;
>    my ($gi, $taxid) = split(" ", $_,2);

Sorry to be pedantic, but according to perldoc -f split: "As a special 
case, specifying a PATTERN of space (' ') will split on white space just 
as "split" with no arguments does"

The only difference between patterns of " " and /\s+/ is that the latter 
will return an initial null field if there is leading white space, which 
may or may not be what you want.

$ perl -e 'print join("-", split(" ", " 1\t2  3")), "\n"'
1-2-3
$ perl -e 'print join("-", split(/\s+/, " 1\t2  3")), "\n"'
-1-2-3
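
As far as I can tell, the chomp in the original code matters because of the LIMIT argument rather than the pattern. A small sketch on one line shaped like the dump file (the GI is the one from this thread; the taxid value is made up for illustration):

```perl
use strict;
use warnings;

# One line shaped like gi_taxid_nucl.dmp: two tab-separated columns and a
# newline. The GI comes from the thread; the taxid is a made-up value.
my $line = "183397240\t9606\n";

my @limited   = split(' ', $line, 2);   # LIMIT 2: field 2 is the rest of the line
my @unlimited = split(' ', $line);      # no LIMIT: trailing empty field is dropped
my @regex     = split(/\s+/, $line);    # same two fields, as nothing leads the line

for my $case (['split(" ", $_, 2)', \@limited],
              ['split(" ", $_)',    \@unlimited],
              ['split(/\s+/, $_)',  \@regex]) {
    my ($label, $fields) = @$case;
    my $taxid = $fields->[1];
    printf "%-18s taxid %s a trailing newline\n", $label,
        ($taxid =~ /\n\z/ ? 'keeps' : 'has no');
}
```

With a LIMIT of 2 the second field is whatever remains of the line, newline included, so the chomp is needed there; without a LIMIT either pattern yields two clean fields for this input.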

Cheers.
Roy.

From kanmaninradha at gmail.com  Thu Aug 26 04:29:08 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 01:29:08 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
Message-ID: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>

Hi All,
I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
module. I could get everything else but not the DNA seq.

Can anyone help me to find this out, Please. I appreciate your help very
much.
thanks,
Kanmani

#!/usr/bin/perl

use strict;
use warnings;
use Bio::Tools::GFF;

my $file = shift;

my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
$gffio->features_attached_to_seqs(1);

while (my $feat = $gffio->next_feature()){
    my $start = $feat->start;
    my $end= $feat->end;
    my $size = $end-$start+1;
    my $strand = $feat->strand;
    my $seqid = $feat->seq_id;
    my $score = $feat->score;
    my $frame = $feat->frame;
    my $source = $feat->source_tag;
    my $type = $feat->primary_tag;
    my $gffstr = $gffio->gff_string($feat);
    my @alltags = $feat->all_tags();
    my @ID_tag_value = $feat->each_tag_value("ID");

    my  $seq = $feat->seq();
    print "$seq\n";

     if($type eq "gene"){     #
       print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
    }
}

From David.Messina at sbc.su.se  Thu Aug 26 04:53:48 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 26 Aug 2010 10:53:48 +0200
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
Message-ID: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>

Admittedly I'm not up on the latest uses of GFF, but as far as I know, GFF is an annotation format only; it does not contain the actual sequence.

Have you looked in your GFF file to see if there are nucleotides in there?

Dave


On Aug 26, 2010, at 10:29, kanmani radha wrote:

> Hi All,
> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
> module. I could get everything else but not the DNA seq.


From biopython at maubp.freeserve.co.uk  Thu Aug 26 05:02:53 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Aug 2010 10:02:53 +0100
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
Message-ID: <AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>

On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
>
> Admittedly i'm not up on the latest uses of GFF, but as far as I know, GFF
> is an annotation format only; it does not contain the actual sequence.
>
> Have you looked in your GFF file to see if there are nucleotides in there?
>
> Dave

Actually a GFF file can optionally include a FASTA format sequence
at the end of the file, although it seems to be more common to just
supply separate GFF and FASTA files and cross reference by ID.

Peter


From David.Messina at sbc.su.se  Thu Aug 26 05:08:20 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 26 Aug 2010 11:08:20 +0200
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
	<AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>
Message-ID: <C7C28E1D-7BAC-4D06-9EC6-71EA95F06776@sbc.su.se>

Aha, great, thanks for clarifying, Peter.

And if I had bothered to look at the Bio::Tools::GFF documentation before answering :), I would have seen this:

    http://doc.bioperl.org/bioperl-live/Bio/Tools/GFF.html#General

which describes how you can use

    $gffio->get_seqs()


and related methods to pull out the sequence data.


Dave


On Aug 26, 2010, at 11:02, Peter wrote:

> On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
>> 
>> Admittedly i'm not up on the latest uses of GFF, but as far as I know, GFF
>> is an annotation format only; it does not contain the actual sequence.
>> 
>> Have you looked in your GFF file to see if there are nucleotides in there?
>> 
>> Dave
> 
> Actually a GFF file can optionally include a FASTA format sequence
> at the end of the file, although it seems to be more common to just
> supply separate GFF and FASTA files and cross reference by ID.
> 
> Peter


From David.Messina at sbc.su.se  Thu Aug 26 05:18:25 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 26 Aug 2010 11:18:25 +0200
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <C7C28E1D-7BAC-4D06-9EC6-71EA95F06776@sbc.su.se>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
	<AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>
	<C7C28E1D-7BAC-4D06-9EC6-71EA95F06776@sbc.su.se>
Message-ID: <984552CF-01F3-4D29-932F-DD030CCC1448@sbc.su.se>

So, just to finish the thought:

Kanmani,

Apologies for my sloppy and uninformed answer. The following is only slightly less sloppy and uninformed, but may actually answer your question.

I think you need to call 

   $gffio->get_seqs()

probably as

  my @seq_objects = $gffio->get_seqs();


and then loop through those something like:

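	foreach my $seq_object (@seq_objects) {
		my $seq = $seq_object->seq();    # the sequence as a plain string
    
		# the features hang off the sequence object, not off the string
		foreach my $feat ($seq_object->get_SeqFeatures) {
			# do your feature processing here
		}
	}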


Note that I haven't tested the above code.


Dave


From fs5 at sanger.ac.uk  Thu Aug 26 05:19:44 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 26 Aug 2010 10:19:44 +0100
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
Message-ID: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi Kanmani,

While GFF files may contain DNA sequence data, most of them don't, so
you will have to use the location information you get from the GFF
annotation file in conjunction with, e.g., a local FASTA database of the
genomic sequence you are working with or an online resource.
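
A rough, untested sketch of that approach, assuming a GFF3 file plus a FASTA file whose sequence IDs match column 1 of the GFF. The helper name fetch_coords and the file names in the usage message are made up for illustration. Bio::DB::Fasta returns the reverse complement when the start is larger than the end, so the helper swaps the coordinates for minus-strand features.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Order the coordinates for Bio::DB::Fasta: start > end asks it for the
# reverse complement, which is what a minus-strand feature needs.
sub fetch_coords {
    my ($start, $end, $strand) = @_;
    return (defined $strand && $strand < 0) ? ($end, $start) : ($start, $end);
}

# Usage: perl exon_seqs.pl features.gff3 genome.fasta
if (@ARGV == 2) {
    require Bio::Tools::GFF;
    require Bio::DB::Fasta;
    my ($gff_file, $fasta_file) = @ARGV;

    my $fasta = Bio::DB::Fasta->new($fasta_file);    # indexes the file on first use
    my $gffio = Bio::Tools::GFF->new(-file => $gff_file, -gff_version => 3);

    while (my $feat = $gffio->next_feature) {
        next unless $feat->primary_tag eq 'exon';
        my ($from, $to) = fetch_coords($feat->start, $feat->end, $feat->strand);
        my $dna = $fasta->seq($feat->seq_id, $from, $to);
        next unless defined $dna;    # seq_id not present in the FASTA file
        my ($id) = $feat->has_tag('ID') ? $feat->get_tag_values('ID') : ('.');
        print join("\t", $id, $feat->seq_id, $feat->start, $feat->end, $dna), "\n";
    }
}
```

This still walks the GFF sequentially; only the sequence lookup is random access.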


Frank


On Thu, 2010-08-26 at 01:29 -0700, kanmani radha wrote:
> Hi All,
> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
> module. I could get everything else but not the DNA seq.
> 
> Can anyone help me to find this out, Please. I appreciate your help very
> much.
> thanks,
> Kanmani


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

From cjfields at illinois.edu  Thu Aug 26 10:20:48 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 26 Aug 2010 09:20:48 -0500
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>

Kanmani,

If you are using BioPerl, the best option currently available is to load a database with all relevant information (GFF and FASTA), then use that database for querying.  The most commonly-used ones now are Bio::DB::SeqFeature::Store and Bio::DB::GFF; the former is very GFF3-centric, but I believe it can handle GFF/GTF, and it has various database adaptors (MySQL, Pg, BDB, SQLite).

chris

On Aug 26, 2010, at 4:19 AM, Frank Schwach wrote:

> Hi Kammani,
> 
> While GFF files may contain DNA sequence data, most of them don't, so
> you will have to use the location information you get from the GFF
> annotation file in conjunction with, e.g., a local FASTA database of the
> genomic sequence you are working with or an online resource.
> 
> 
> Frank


From cjfields at illinois.edu  Thu Aug 26 10:31:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 26 Aug 2010 09:31:59 -0500
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
	<AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>
Message-ID: <DD36A578-4156-4911-8432-84BD5ECB3AB8@illinois.edu>

On Aug 26, 2010, at 4:02 AM, Peter wrote:

> On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
>> 
>> Admittedly i'm not up on the latest uses of GFF, but as far as I know, GFF
>> is an annotation format only; it does not contain the actual sequence.
>> 
>> Have you looked in your GFF file to see if there are nucleotides in there?
>> 
>> Dave
> 
> Actually a GFF file can optionally include a FASTA format sequence
> at the end of the file, although it seems to be more common to just
> supply separate GFF and FASTA files and cross reference by ID.
> 
> Peter

IIRC, optionally including FASTA sequence is specified only in the GFF3 spec; use of FASTA isn't explicitly mentioned in earlier versions.  We only support it with earlier GFF due to convergence of the various GFF parsers.  

The original GFF spec proposed allowing sequence, but it's in the form of meta information and I have never seen it used in practice (as you mention, the FASTA is normally loaded separately).

chris

From kanmaninradha at gmail.com  Thu Aug 26 12:22:14 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 09:22:14 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
	<6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
Message-ID: <AANLkTinxcoKBHqU7bnfyNA6bi5qBjNAYR54c6K+Pg7rz@mail.gmail.com>

Hi Everyone,

Thanks very much for this clarification.  Thanks a ton to everyone who
spared their time to educate me.

I see your points.  Please correct me if I am wrong.

I understand that it's better to use Bio::DB::SeqFeature or Bio::DB::GFF
to load the fasta sequences (from a separate multifasta file) and
then Bio::Tools::GFF to parse the feature info from a gff file. Then query
the created database for the relevant GFF coordinates....

I will implement this.

Thanks once again.
Kanmani

On Thu, Aug 26, 2010 at 7:20 AM, Chris Fields <cjfields at illinois.edu> wrote:

> Kanmani,
>
> If you are using BioPerl, the best option currently available is to load a
> database with all relevant information (GFF and FASTA), then use that
> database for querying.  The most commonly-used ones now are
> Bio::DB::SeqFeature::Store and Bio::DB::GFF; the former is very
> GFF3-centric, but I believe it can handle GFF/GTF, and it has various
> database adaptors (MySQL, Pg, BDB, SQLite).
>
> chris

From cjfields at illinois.edu  Thu Aug 26 13:08:56 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 26 Aug 2010 12:08:56 -0500
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTinxcoKBHqU7bnfyNA6bi5qBjNAYR54c6K+Pg7rz@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
	<6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
	<AANLkTinxcoKBHqU7bnfyNA6bi5qBjNAYR54c6K+Pg7rz@mail.gmail.com>
Message-ID: <EF1B137F-94A7-45E1-B8FB-0E20142F0A7F@illinois.edu>

On Aug 26, 2010, at 11:22 AM, kanmani radha wrote:

> Hi Everyone,
> 
> Thanks very much for this clarification.  Thanks a ton to everyone who
> spared their time to educate me.
> 
> I see your points.  Please correct me if I am wrong.
> 
> I understand that it's better to use Bio::DB::SeqFeature or Bio::DB::GFF
> to load the fasta sequences (from a separate multifasta file) and
> then Bio::Tools::GFF to parse the feature info from a gff file. Then query
> the created database for the relevant GFF coordinates....
> 
> I will implement this.
> 
> Thanks once again.
> Kanmani

Yes, in general.  I forgot to mention that you can have an in-memory database as well, but it's only suggested if you have a few thousand or so features and small sequences (I think bacterial chromosomes will work).  
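
An untested sketch of the in-memory route, assuming a directory that holds both the GFF3 file and the matching FASTA file (the directory argument and the helper fasta_record are illustrative only). As I understand the memory adaptor, -dsn can point at such a directory and everything in it is loaded at startup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Format one feature as a FASTA record (plain string handling only).
sub fasta_record {
    my ($name, $seq_id, $start, $end, $dna) = @_;
    return ">$name $seq_id:$start-$end\n$dna\n";
}

# Usage: perl exons_from_store.pl /path/to/dir_with_gff3_and_fasta
if (@ARGV == 1) {
    require Bio::DB::SeqFeature::Store;

    my $db = Bio::DB::SeqFeature::Store->new(
        -adaptor => 'memory',
        -dsn     => $ARGV[0],    # directory of GFF3 and FASTA files (assumed layout)
    );

    # Any feature type can be fetched directly, with no pass over the whole file.
    for my $exon ($db->get_features_by_type('exon')) {
        my $dna  = $exon->seq->seq;    # sequence string for the exon
        my $name = $exon->display_name || $exon->primary_id;
        print fasta_record($name, $exon->seq_id, $exon->start, $exon->end, $dna);
    }
}
```

Moving to a persistent backend later should mostly mean changing the -adaptor and -dsn arguments, since the query calls stay the same.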

chris

From Havard.Aanes at nvh.no  Wed Aug 25 11:47:12 2010
From: Havard.Aanes at nvh.no (Aanes Håvard)
Date: Wed, 25 Aug 2010 17:47:12 +0200
Subject: [Bioperl-l] bpfetch.pl
Message-ID: <897520BC3AAE754FA4E34E2FD26490A8021C61597B8D@A-EXMB1.veths.no>


Hi,

I am trying to obtain a set of mRNA sequences from a database made by the bpindex script. I thought this would be a trivial task, but it appears not to be. I get the sequences if I fetch them one by one, like this:

perl scripts/index/bpfetch.pl -dir ./ zebrafish:NM_201192 zebrafish:NM_212708

But I need hundreds of sequences, so my plan was to put the RefSeq IDs in a file and use that as an argument (or whatever it is called in perl). That does not work:

haavaaan at login2 ~/download/src/bioperl-1.2.3 $ perl scripts/index/bpfetch.pl -dir ./ zebrafish:./some_seqs

You are running bpindex.pl without installing bioperl.
You have done it from bioperl/scripts, and so we can find the necessary information
but it is much better to install bioperl

Please read the README in the bioperl distribution

Sequence %id in Database zebrafish is not present


Any suggestions on how to do this? Alternative approaches are also appreciated.

I have no experience in perl, just started using linux, and for the moment there is no time to learn perl, so I would really be grateful for any help to solve this specific task.

Best regards

Håvard Aanes (M.Sc.)
Ph.D. student
Section for biochemistry and physiology
The Norwegian School of Veterinary Science
Telephone: +47 22597358


The new e-mail domain name for The Norwegian School of Veterinary Science is @nvh.no.
The former domain address @veths.no will still be in use, but it will be discontinued within 1-2 years.
Please update your e-mail records.




From kanmaninradha at gmail.com  Thu Aug 26 04:23:28 2010
From: kanmaninradha at gmail.com (kanmani)
Date: Thu, 26 Aug 2010 01:23:28 -0700 (PDT)
Subject: [Bioperl-l] Bio::Tools:GFF to get DNA sequences...
Message-ID: <9b7381d7-3596-4e60-a2ac-6c8c135d457d@s24g2000pri.googlegroups.com>

Hi I am trying to get the DNA sequences for each exon feature. I have
the following script. Everything works except getting sequences. Can
some one correct me.....Thanks.

#!/usr/bin/perl

use strict;
use warnings;
use Bio::Tools::GFF;


my $file = shift;
my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
$gffio->features_attached_to_seqs(1);

while (my $feat = $gffio->next_feature()){
    my $start = $feat->start;
    my $end= $feat->end;
    my $size = $end-$start+1;
    my $strand = $feat->strand;
    my $seqid = $feat->seq_id;
    my $score = $feat->score;
    my $frame = $feat->frame;
    my $source = $feat->source_tag;
    my $type = $feat->primary_tag;
    my $gffstr = $gffio->gff_string($feat);
    my @alltags = $feat->all_tags();
    my @ID_tag_value = $feat->each_tag_value("ID");

   my  $seq = $feat->seq();
   print "$seq\n";

  if($type eq "gene"){
       print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
    }
}

From kanmaninradha at gmail.com  Thu Aug 26 17:24:40 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 14:24:40 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <EF1B137F-94A7-45E1-B8FB-0E20142F0A7F@illinois.edu>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
	<6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
	<AANLkTinxcoKBHqU7bnfyNA6bi5qBjNAYR54c6K+Pg7rz@mail.gmail.com>
	<EF1B137F-94A7-45E1-B8FB-0E20142F0A7F@illinois.edu>
Message-ID: <AANLkTikUxFLLAduO7M1QzSToewA_AgPPELKPVYq0+JKk@mail.gmail.com>

Hi Chris and others,

For a brief amount of time I could get away with using Bio::DB::Fasta to index
fasta files and Bio::Tools::GFF to iterate through GFF features. But I hit the
wall again. It looks like sequential access to GFF features is not sufficient;
I want random access to them. As far as I can see, the only way to do that is
by using Bio::DB::GFF as suggested by Chris.

Here is my question. Is there any tutorial to configure Bioperl  or this
module in particular to work with MySQL/postgres. I will really appreciate
it.

And thanks for all your help.
Kanmani

On Thu, Aug 26, 2010 at 10:08 AM, Chris Fields <cjfields at illinois.edu>wrote:

> On Aug 26, 2010, at 11:22 AM, kanmani radha wrote:
>
> > Hi Everyone,
> >
> > Thanks very much for this clarification.  Thanks a ton to everyone who
> > spared their time to educate me.
> >
> > I see your points.  Please correct me if I am wrong.
> >
> > I understand that it's better to use Bio::DB::SeqFeature or Bio::DB::GFF
> > to load the fasta sequences (from a separate multifasta file) and
> > then Bio::Tools::GFF to parse the feature info from a gff file. Then query
> > the created database for the relevant GFF coordinates....
> >
> > I will implement this.
> >
> > Thanks once again.
> > Kanmani
>
> Yes, in general.  I forgot to mention that you can have an in-memory
> database as well, but it's only suggested if you have a few thousand or so
> features and small sequences (I think bacterial chromosomes will work).
>
> chris

From kanmaninradha at gmail.com  Thu Aug 26 18:04:20 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 15:04:20 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTikUxFLLAduO7M1QzSToewA_AgPPELKPVYq0+JKk@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
	<6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
	<AANLkTinxcoKBHqU7bnfyNA6bi5qBjNAYR54c6K+Pg7rz@mail.gmail.com>
	<EF1B137F-94A7-45E1-B8FB-0E20142F0A7F@illinois.edu>
	<AANLkTikUxFLLAduO7M1QzSToewA_AgPPELKPVYq0+JKk@mail.gmail.com>
Message-ID: <AANLkTimTU87G1dajASCzHm5=pjHCKx8W5X8AR9TKLmU4@mail.gmail.com>

Hi, I have made some progress since then....
- Installing  Bio::DB::DBI::mysql needed Biosql.

- Downloaded and installed biosql, following the instructions given in their
INSTALL file
- Created biosql db in my mysql server
- loaded schema using script from biosql

- installed DBI
- Now I have a problem with DBD::mysql. That reminds me that a couple of years
back I had to struggle to install this driver on another machine. I thought I
would ask around this time.

It fails with a bunch of error messages, the first of them being:
dbdimp.h:22:49 error: mysql.h no such file or directory

But my mysql installation has the header file in
"/usr/include/mysql3/mysql/mysql.h". Can anyone suggest how to move forward
from there?

thanks,
Kanmani

On Thu, Aug 26, 2010 at 2:24 PM, kanmani radha <kanmaninradha at gmail.com>wrote:

> Hi Chris and others,
>
> For a brief amount of time I could get away with using Bio::DB::Fasta to index
> fasta files and Bio::Tools::GFF to iterate through GFF features. But I hit the
> wall again. It looks like sequential access to GFF features is not sufficient;
> I want random access to them. As far as I can see, the only way to do that is
> by using Bio::DB::GFF as suggested by Chris.
>
> Here is my question. Is there any tutorial to configure Bioperl  or this
> module in particular to work with MySQL/postgres. I will really appreciate
> it.
>
> And thanks for all your help.
> Kanmani

From rafalucas.unicamp at gmail.com  Thu Aug 26 18:11:07 2010
From: rafalucas.unicamp at gmail.com (Rafael Lucas)
Date: Thu, 26 Aug 2010 19:11:07 -0300
Subject: [Bioperl-l] Help in algorithm Bio::Structure::IO::pdb
Message-ID: <AANLkTi=zWPKeY1NpRA9TBSEnsbGH1W9F0y0QQ0+um7Yq@mail.gmail.com>

Hi folks,

How are you? I'm from Brazil and I am writing an algorithm that encrypts
data and then prints the result to a PDB file. So I have a .fasta file and
want to convert it to a .pdb file. If I use a program like PyMol, it takes
too much time, so I want to use Bio::Structure::IO::pdb to speed up this
process. Could you help me with this problem?

Thank you,

Rafael Lucas
Faculdade de Tecnologia em Analise e Desenvolvimento de Sistemas
FT - UNICAMP
+55 (19)9614-0533

From J.Christopher.Ellis at duke.edu  Thu Aug 26 22:06:30 2010
From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis)
Date: Thu, 26 Aug 2010 22:06:30 -0400
Subject: [Bioperl-l] standaloneblastplus blastn crash
Message-ID: <55861.1282874790@duke.edu>

 When I run the standaloneblastplus I get the following error...

 ------------- EXCEPTION -------------
 MSG: C:Program FilesNCBIblast-2.2.24+binblastn.exe call crashed: There
was a problem running C:Program FilesNCBIblast-2.2.24+binblastn.exe :? at
C:/Perl64/lib/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1001.

 STACK Bio::Tools::Run::WrapperBase::_run
C:/Perl64/lib/Bio/Tools/Run/WrapperBase/CommandExts.pm:1006
 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus.pm:1303
 STACK Bio::Tools::Run::StandAloneBlastPlus::run
C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:270
 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus.pm:1301
 STACK toplevel localBlast.pl:9
 -------------------------------------

 I have a sneaking suspicion that it is an easy fix, but for the life of me I
cannot figure it out! :)

 Thanks in advance,
 Chris
 

From indraniel at gmail.com  Thu Aug 26 21:57:54 2010
From: indraniel at gmail.com (Indraniel)
Date: Fri, 27 Aug 2010 01:57:54 +0000 (UTC)
Subject: [Bioperl-l] How to convert SFF into Fastq
References: <COL102-W14F3F0CDA966B9ECE0BE1BFABB0@phx.gbl>
	<AANLkTilN3rsgWEjvmyMq9IjC8p5MzBdGGe-Xtfd6XoZF@mail.gmail.com>
	<AANLkTikC-I0JFvWqptlA69qrKnKrWSNyNPAwHQKSLluJ@mail.gmail.com>
Message-ID: <loom.20100827T035104-821@post.gmane.org>

A fourth option is the following tool, sff2fastq (written in C), described here:

http://indraniel.wordpress.com/2010/04/23/sff2fastq/

and 

http://github.com/indraniel/sff2fastq

Indraniel


From David.Messina at sbc.su.se  Fri Aug 27 03:41:21 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 27 Aug 2010 09:41:21 +0200
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <4C6D0B50.4050902@sms.ed.ac.uk>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
	<4C6D0B50.4050902@sms.ed.ac.uk>
Message-ID: <A5AACD38-0396-4221-B6F7-5740FBBD83E0@sbc.su.se>

Hi Giuseppe,


On Aug 19, 2010, at 12:45, Giuseppe Gallone wrote:
> Bio::Orthology::InterologMap
> Bio::Orthology::Interolog::Map,

> just in case somebody else finds other interesting applications for the Interolog concept and would like to "plug in" their own contribution. Would this make any sense?

Absolutely. I think either of the above is a good option, and I agree that the second is a little more flexible.

Your POD looks great! Way better than most. Having seen the whole thing now, I think your description is fine as is. And if you have another tutorial and example scripts on top of it, that would really be terrific, above and beyond what most people would expect.

So, time to unleash it on the world! :)


Dave


From David.Messina at sbc.su.se  Fri Aug 27 03:58:12 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 27 Aug 2010 09:58:12 +0200
Subject: [Bioperl-l] standaloneblastplus blastn crash
In-Reply-To: <55861.1282874790@duke.edu>
References: <55861.1282874790@duke.edu>
Message-ID: <9275A540-AE42-47B0-BA73-A906964C451B@sbc.su.se>

Hi Chris,

If you look at the error message, it says what the problem is: it's trying to call the blastn executable with no backslashes in the path name.

> MSG: C:Program FilesNCBIblast-2.2.24+binblastn.exe call crashed: There
> was a problem running C:Program FilesNCBIblast-2.2.24+binblastn.exe


Now, that could be a problem in BioPerl or it could be a problem in your code. It's hard to diagnose where the problem lies without your code, so please post your code.
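
As an illustration only: one way backslashes can go missing is a Windows path written in a double-quoted Perl string, where an unrecognised escape such as \P loses its backslash. Whether that is what happened here is an assumption, since the calling script was not posted. The path below is simply copied from the error message, and count_backslashes is a hypothetical helper used only for the demonstration.

```perl
use strict;
use warnings;

# Path copied from the error message above (assumed install location).
# Single quotes keep every backslash literally.
my $single = 'C:\Program Files\NCBI\blast-2.2.24+\bin\blastn.exe';

# Forward slashes avoid the escaping question entirely.
my $forward = 'C:/Program Files/NCBI/blast-2.2.24+/bin/blastn.exe';

# In a double-quoted string an unrecognised escape such as \P is passed
# through without its backslash (Perl only warns about it).
my $double;
{
    no warnings 'misc';
    $double = "C:\Program Files";
}

# Hypothetical helper: count the backslashes left in a path string.
sub count_backslashes {
    my ($path) = @_;
    my $count = () = $path =~ /\\/g;
    return $count;
}

printf "%-8s %d backslashes\n", 'single',  count_backslashes($single);
printf "%-8s %d backslashes\n", 'forward', count_backslashes($forward);
printf "%-8s %d backslashes\n", 'double',  count_backslashes($double);
```

Writing the path in single quotes or with forward slashes keeps it intact on the Perl side; whether the wrapper then passes it on unchanged is a separate question.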


Dave


From G.Gallone at sms.ed.ac.uk  Fri Aug 27 07:07:57 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Fri, 27 Aug 2010 12:07:57 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <A5AACD38-0396-4221-B6F7-5740FBBD83E0@sbc.su.se>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
	<4C6D0B50.4050902@sms.ed.ac.uk>
	<A5AACD38-0396-4221-B6F7-5740FBBD83E0@sbc.su.se>
Message-ID: <4C779C8D.1090007@sms.ed.ac.uk>

Hi Dave,

thank you very much for your feedback :) . I will register the namespace 
right now. I think I will use 'homology' as the second level name 
though, because I plan to extend the module to work with paralogues as well.

As for the category, which of the following do you reckon would best 
fit a Bio:: package?

http://www.cpan.org/modules/by-category/

Regards
Giuseppe

On 27/08/10 08:41, Dave Messina wrote:
> Hi Giuseppe,
>
>
> On Aug 19, 2010, at 12:45, Giuseppe Gallone wrote:
>> Bio::Orthology::InterologMap
>> Bio::Orthology::Interolog::Map,
>
>> just in case somebody else finds other interesting applications for the Interolog concept and would like to "plug in" their own contribution. Would this make any sense?
>
> Absolutely. I think either of the above is a good option, and I agree that the second is a little more flexible.
>
> Your POD looks great! Way better than most. Having seen the whole thing now, I think your description is fine as is. And if you have another tutorial and example scripts on top of it, that would really be terrific, above and beyond what most people would expect.
>
> So, time to unleash it on the world! :)
>
>
> Dave
>
>

-- 

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

From David.Messina at sbc.su.se  Fri Aug 27 07:25:06 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 27 Aug 2010 13:25:06 +0200
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <4C779C8D.1090007@sms.ed.ac.uk>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
	<4C6D0B50.4050902@sms.ed.ac.uk>
	<A5AACD38-0396-4221-B6F7-5740FBBD83E0@sbc.su.se>
	<4C779C8D.1090007@sms.ed.ac.uk>
Message-ID: <80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se>

Hi Giuseppe,


> I think I will use 'homology' as the second level name though, because I plan to extend the module to work with paralogues as well.

Sounds good.


> As for the category, which one of the following you reckon it will fit a Bio:: package better
> 
> http://www.cpan.org/modules/by-category/


Bio:: is in 23 - miscellaneous modules, so probably keeping with that makes sense.

I don't know much about that stuff, though. Chris F. or other CPAN cognoscenti care to comment?


Dave


From cjfields at illinois.edu  Fri Aug 27 09:26:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 27 Aug 2010 08:26:51 -0500
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
	<4C6D0B50.4050902@sms.ed.ac.uk>
	<A5AACD38-0396-4221-B6F7-5740FBBD83E0@sbc.su.se>
	<4C779C8D.1090007@sms.ed.ac.uk>
	<80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se>
Message-ID: <88BB7813-E892-4BEC-9C49-5FD22325BBF7@illinois.edu>

On Aug 27, 2010, at 6:25 AM, Dave Messina wrote:

> Hi Giuseppe,
> 
> 
>> I think I will use 'homology' as the second level name though, because I plan to extend the module to work with paralogues as well.
> 
> Sounds good.
> 
> 
>> As for the category, which one of the following you reckon it will fit a Bio:: package better
>> 
>> http://www.cpan.org/modules/by-category/
> 
> 
> Bio:: is in 23 - miscellaneous modules, so probably keeping with that makes sense.
> 
> I don't know much about that stuff, though. Chris F. or other CPAN cognoscenti care to comment?
> 
> 
> Dave

That's probably the best spot, as we cover a fairly broad range (mainly due to the monolithic structure of the core).  Though it's terribly nondescript, sort of the junk drawer of CPAN.

chris

From adamkennedybackup at gmail.com  Sun Aug 29 07:35:50 2010
From: adamkennedybackup at gmail.com (Adam Kennedy)
Date: Sun, 29 Aug 2010 21:35:50 +1000
Subject: [Bioperl-l] Could I install BioPerl on Windows with the
 ActivePerl 5.12.1?
In-Reply-To: <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com>
References: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>
	<78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com>
	<AANLkTimBPL6Sr2kmg+f0t1j8pk_9nBAoqubKzY4AJoxo@mail.gmail.com>
	<5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com>
Message-ID: <AANLkTinSp6GCOQvCFYOUk1Ad8EjKdU=dQbe5GpbLiLr1@mail.gmail.com>

http://strawberryperl.com/download/professional/strawberry-perl-professional-5.10.1.3-alpha-2.msi

You get BioPerl installed out of the box.

Adam K

On 20 August 2010 03:20, Christopher Fields <cjfields1 at gmail.com> wrote:
> cc'ing list.  Looks like the BioPerl PPM is possibly broken for perl 5.12.  Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling...
>
> chris
>
> On Aug 19, 2010, at 11:29 AM, han sun wrote:
>
>> v5.10 works,thanks.
>>
>> 2010/8/19 Christopher Fields <cjfields1 at gmail.com>
>> Try using ActivePerl 5.10 instead of v5.12.  It's very possible the PPM won't work for v5.12 yet.
>>
>> chris
>>
>> On Aug 19, 2010, at 9:25 AM, han sun wrote:
>>
>> > Hello everyone,
>> >
>> > I have used perl for several months,and I now want to feel the power of
>> > bioperl.
>> > But it seems that the installing is more difficult than I thought.
>> >
>> > I typed the commands.
>> >
>> >
>> >
>> > install-shell
>> >
>> >
>> > rep add bioperl http://bioperl.org/DIST
>> >
>> >
>> > rep add uwinnipeg
>> > http://cpan.uwinnipeg.ca/PPMPackages/12xx/<http://cpan.uwinnipeg.ca/PPMPackages/10xx/>
>> >
>> >
>> > rep add trouchelle http://trouchelle.com/ppm12/
>> >
>> > install BioPerl
>> >
>> > However,the installing failed,
>> >
>> > ppm install failed:
>> > Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
>> > Can't find any package that provides PostScript::TextBlock for
>> > Bundle-BioPerl-Core
>> > Can't find any package that provides Ace:: for Bundle-BioPerl-Core
>> > Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
>> > Can't find any package that provides Convert::Binary::C for
>> > Bundle-BioPerl-Core
>> > Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
>> > Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
>> > Can't find any package that provides IPC::Run for GraphViz
>> > Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
>> > Can't find any package that provides List-MoreUtils for Moose
>> > Can't find any package that provides List-MoreUtils for Class-MOP
>> >
>> >
>> > then I tried
>> >
>> > install http://www.bribes.org/perl/ppm/GD.ppd
>> >
>> > and tried the installation again,but it still didn't help.
>> >
>> > Do you know what is causing the problem?
>> >
>> > Please help me, thanks very much.
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields1 at gmail.com  Sun Aug 29 11:58:50 2010
From: cjfields1 at gmail.com (Christopher Fields)
Date: Sun, 29 Aug 2010 10:58:50 -0500
Subject: [Bioperl-l] Could I install BioPerl on Windows with the
	ActivePerl 5.12.1?
In-Reply-To: <AANLkTinSp6GCOQvCFYOUk1Ad8EjKdU=dQbe5GpbLiLr1@mail.gmail.com>
References: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>
	<78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com>
	<AANLkTimBPL6Sr2kmg+f0t1j8pk_9nBAoqubKzY4AJoxo@mail.gmail.com>
	<5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com>
	<AANLkTinSp6GCOQvCFYOUk1Ad8EjKdU=dQbe5GpbLiLr1@mail.gmail.com>
Message-ID: <A1B60C18-E144-466B-9630-21A88EF2CECB@gmail.com>

Yes, and I am thinking of pointing more and more users that direction instead.  Can't say maintaining PPM packages with ever-fluctuating specs is easy when I don't work with Windows anymore.

chris

On Aug 29, 2010, at 6:35 AM, Adam Kennedy wrote:

> http://strawberryperl.com/download/professional/strawberry-perl-professional-5.10.1.3-alpha-2.msi
> 
> You get BioPerl installed out the box.
> 
> Adam K
> 
> On 20 August 2010 03:20, Christopher Fields <cjfields1 at gmail.com> wrote:
>> cc'ing list.  Looks like the BioPerl PPM is possibly broken for perl 5.12.  Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling...
>> 
>> chris
>> 
>> On Aug 19, 2010, at 11:29 AM, han sun wrote:
>> 
>>> v5.10 works,thanks.
>>> 
>>> 2010/8/19 Christopher Fields <cjfields1 at gmail.com>
>>> Try using ActivePerl 5.10 instead of v5.12.  It's very possible the PPM won't work for v5.12 yet.
>>> 
>>> chris
>>> 
>>> On Aug 19, 2010, at 9:25 AM, han sun wrote:
>>> 
>>>> Hello everyone,
>>>> 
>>>> I have used perl for several months,and I now want to feel the power of
>>>> bioperl.
>>>> But it seems that the installing is more difficult than I thought.
>>>> 
>>>> I typed the commands.
>>>> 
>>>> 
>>>> 
>>>> install-shell
>>>> 
>>>> 
>>>> rep add bioperl http://bioperl.org/DIST
>>>> 
>>>> 
>>>> rep add uwinnipeg
>>>> http://cpan.uwinnipeg.ca/PPMPackages/12xx/<http://cpan.uwinnipeg.ca/PPMPackages/10xx/>
>>>> 
>>>> 
>>>> rep add trouchelle http://trouchelle.com/ppm12/
>>>> 
>>>> install BioPerl
>>>> 
>>>> However,the installing failed,
>>>> 
>>>> ppm install failed:
>>>> Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
>>>> Can't find any package that provides PostScript::TextBlock for
>>>> Bundle-BioPerl-Core
>>>> Can't find any package that provides Ace:: for Bundle-BioPerl-Core
>>>> Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
>>>> Can't find any package that provides Convert::Binary::C for
>>>> Bundle-BioPerl-Core
>>>> Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
>>>> Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
>>>> Can't find any package that provides IPC::Run for GraphViz
>>>> Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
>>>> Can't find any package that provides List-MoreUtils for Moose
>>>> Can't find any package that provides List-MoreUtils for Class-MOP
>>>> 
>>>> 
>>>> then I tried
>>>> 
>>>> install http://www.bribes.org/perl/ppm/GD.ppd
>>>> 
>>>> and tried the installation again,but it still didn't help.
>>>> 
>>>> Do you know what is causing the problem?
>>>> 
>>>> Please help me, thanks very much.
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>>> 
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From odclerck at gmail.com  Fri Aug 27 03:44:14 2010
From: odclerck at gmail.com (odclerck)
Date: Fri, 27 Aug 2010 00:44:14 -0700 (PDT)
Subject: [Bioperl-l]  fasta header replace
Message-ID: <29550202.post@talk.nabble.com>


Hi,
Was wondering if someone had an easy script available that converts the
headers of FASTA sequences to values stored in a separate text file.

Macrogen produces files with sequences that look more or less like this:
>100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1	1012, 1000 bases, 0 checksum.

I can filter out the position on the plate e.g. "A1" easily but would like
to replace this with the name of the strain stored in a different text file,
e.g. "A1_D1222".

Realize this sounds pretty basic to most of you, but I'm pretty new at
scripting.
Olivier

-- 
View this message in context: http://old.nabble.com/fasta-header-replace-tp29550202p29550202.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From J.Christopher.Ellis at duke.edu  Mon Aug 30 08:55:04 2010
From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis)
Date: Mon, 30 Aug 2010 08:55:04 -0400
Subject: [Bioperl-l] Taxonomy DB problem
Message-ID: <51468.1283172904@duke.edu>

 Hi All,

 I am trying to extract the entire taxonomy of an organism including the
classifications. Something like...

Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia

I am not worried about format, just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point, so I copied it and tried to run it but got an error.

My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?

Thanks for all your help in advance!

Chris


From rafalucas.unicamp at gmail.com  Mon Aug 30 09:24:11 2010
From: rafalucas.unicamp at gmail.com (Rafael Lucas)
Date: Mon, 30 Aug 2010 10:24:11 -0300
Subject: [Bioperl-l] help in algorithm Bio::Structure::IO::pdb
Message-ID: <AANLkTimNHcjCRqYrhH8=Q=Dqqjtj35NNqMqP+Q2P1oPU@mail.gmail.com>

Hi folks,

How are you? I'm from Brazil and I am writing an algorithm that encrypts
data and then prints the result to a PDB file. I have a .fasta file and
want to convert it to a .pdb file. Doing this through a program such as
PyMOL takes too long, so I would like to use Bio::Structure::IO::pdb to
speed up the process. Could you help me with this problem?

Thank you,

Rafael Lucas
Faculdade de Tecnologia em Analise e Desenvolvimento de Sistemas
FT - UNICAMP
+55 (19)9614-0533

From cjfields at illinois.edu  Mon Aug 30 09:36:41 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 30 Aug 2010 08:36:41 -0500
Subject: [Bioperl-l] Taxonomy DB problem
In-Reply-To: <51468.1283172904@duke.edu>
References: <51468.1283172904@duke.edu>
Message-ID: <B93CF33A-0FA5-4A19-AF5A-BE203AA26E38@illinois.edu>

Chris,

Regarding a fix for that script, we would have to see your modified script and the error.  However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy.

chris

On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:

> Hi All,
> 
> I am trying to extract the entire taxonomy of an organism including the
> classifications. Some thing like...
> 
> Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
> 
> I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error.
> 
> My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?
> 
> Thanks for all your help in advance!
> 
> Chris 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From fs5 at sanger.ac.uk  Mon Aug 30 11:11:06 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Mon, 30 Aug 2010 16:11:06 +0100
Subject: [Bioperl-l] fasta header replace
In-Reply-To: <29550202.post@talk.nabble.com>
References: <29550202.post@talk.nabble.com>
Message-ID: <4C7BCA0A.70503@sanger.ac.uk>

Hi Olivier,

Do you know how to read a file and build a hash from the contents? This 
is what you will need to do,
e.g. if your file is
A1 Strain_A
A2 Strain_A
A3 Strain_B

then you can do something like:

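# Open the mapping file for reading ('<'); '>' would truncate it.
open(my $in, '<', $infile_path) or die "Cannot open $infile_path: $!";
my %well2strain;
while (<$in>) {
    # Capture the well (e.g. A1) and the strain name; skip other lines.
    my ($well, $strain) = /^([A-Z]\d+)\s+(\w+)/ or next;
    $well2strain{$well} = $strain;
}
close $in;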

You can then use the values of the hash to set the sequence ID as you 
parse the FASTA file. The BioPerl SeqIO howto gives details about how to 
read and write the FASTA file 
(http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples).
You can change the id of a sequence object with
$some_seq_object->id( 'my new ID');

See http://doc.bioperl.org/releases/bioperl-1.0/Bio/Seq.html for details.
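
A minimal sketch of how the lookup and the renaming could fit together, in plain Perl with no BioPerl dependency so it can be tried on its own. The sub names build_well_map and rename_header are hypothetical, and the sketch assumes the plate position is always the second-to-last underscore-separated field of the ID (as in the example header); that is a guess about the Macrogen naming, not something confirmed.

```perl
use strict;
use warnings;

# Build a well -> strain lookup from lines such as "A1 D1222".
# Lines that do not look like a mapping are skipped.
sub build_well_map {
    my (@lines) = @_;
    my %well2strain;
    for my $line (@lines) {
        my ($well, $strain) = $line =~ /^([A-Z]\d+)\s+(\w+)/ or next;
        $well2strain{$well} = $strain;
    }
    return \%well2strain;
}

# Return the header with the well field extended by its strain name,
# e.g. "..._A1_psbAF.ab1" becomes "..._A1_D1222_psbAF.ab1".
# Assumption: the well is the second-to-last "_"-separated field of the ID.
sub rename_header {
    my ($header, $well2strain) = @_;
    my ($id, $description) = split /\s+/, $header, 2;
    return $header unless defined $id;

    my @fields = split /_/, $id;
    return $header if @fields < 2;

    my $well = $fields[-2];
    return $header unless exists $well2strain->{$well};

    $fields[-2] = $well . '_' . $well2strain->{$well};
    my $new_id = join '_', @fields;

    # The ID and description are rejoined with a single space.
    return defined $description ? "$new_id $description" : $new_id;
}

my $map    = build_well_map('A1 D1222', 'A2 D1300');
my $header = "100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1\t1012, 1000 bases, 0 checksum.";
print rename_header($header, $map), "\n";
```

When reading the file through Bio::SeqIO instead, the same lookup could be applied to the ID string before it is passed to the id call mentioned above.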

Hope that helps to get you started.

Frank

 
odclerck wrote:
> Hi,
> Was wondering if someone had an easy script available that converts the
> headers of a fasta sequences to a value stored in a separate text file.
>
> Macrogen produces files with sequences that look more or less like this:
>   
>> 100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1	1012, 1000 bases, 0 checksum.
>>     
>
> I can filter out the position on the plate e.g. "A1" easily but would like
> to replace this with the name of the strain stored in a different text file,
> e.g. "A1_D1222".
>
> Realize this sounds pretty basic to most of you, but I'm pretty new at
> scripting.
> Olivier
>
>   


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

From jessica.sun at gmail.com  Mon Aug 30 11:51:39 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Mon, 30 Aug 2010 11:51:39 -0400
Subject: [Bioperl-l] Git for the lazy
In-Reply-To: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se>
References: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se>
Message-ID: <AANLkTikzkPL-WN7XUNPcfNhqqnOYUR15br-YzrwsE5tL@mail.gmail.com>

I want to add sequence features with tags and tag values, and I want the
tags to appear in the order I add them. However, they seem to come out in
alphabetical order by default. Does anyone know how to fix this? Thanks a
lot in advance.

From G.Gallone at sms.ed.ac.uk  Tue Aug 31 07:52:57 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Tue, 31 Aug 2010 12:52:57 +0100
Subject: [Bioperl-l] New CPAN Release - Bio::Homology::InterologWalk - A
 Perl Module to retrieve putative PPIs through Interologs
Message-ID: <4C7CED19.80802@sms.ed.ac.uk>

Dear Bioperl users,

I would like to announce the release of Bio::Homology::InterologWalk, a
module that retrieves, scores and visualizes putative Protein-Protein 
Interactions through the orthology-walk method.

The project is available from the following link

http://search.cpan.org/~ggallone/

and a description of the idea behind it is here

http://search.cpan.org/~ggallone/Bio-Homology-InterologWalk-0.02/lib/Bio/Homology/InterologWalk.pm#DESCRIPTION

The project is in a very early stage (currently ver. 0.02 alpha) and has 
currently been tested only on Linux environments. It has not been tested 
on Macs, but it should work fine, and I would appreciate any feedback 
from Mac users who try it.

*Any* form of feedback will be extremely appreciated (bugs, typos,
syntactical errors, verbal abuse etc :) ).


Best,
Giuseppe


-- 

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From cjfields at illinois.edu  Tue Aug 31 11:01:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 31 Aug 2010 10:01:59 -0500
Subject: [Bioperl-l] Taxonomy DB problem
In-Reply-To: <56973.1283255847@duke.edu>
References: <56973.1283255847@duke.edu>
Message-ID: <7167CA86-857E-4E16-A3D6-BA45045CF892@illinois.edu>

Yes, I see that one.  It may be that the ID hash being returned is empty.  I'll look into it.

-c 

On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote:

> Hi Chris,
> 
> The error is...
> 
> "Use of uninitialized value $id in join or string at C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363."
> 
> The script from http://bioperl.org/wiki/Species_names_from_accession_numbers is as follows....
> 
> use Bio::DB::EUtilities;
> 
> my (%taxa, @taxa);
> my (%names, %idmap);
> 
> # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
> # (probably)
> 
> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> 
> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>                                        -db => 'taxonomy',
>                                        -dbfrom => 'protein',
>                                        -correspondence => 1,
>                                        -id => \@ids);
> 
> # iterate through the LinkSet objects
> while (my $ds = $factory->next_LinkSet) {
>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> }
> 
> @taxa = @taxa{@ids};
> 
> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>                                     -db => 'taxonomy',
>                                     -id => \@taxa );
> 
> while (local $_ = $factory->next_DocSum) {
>     $names{($_->get_contents_by_name('TaxId'))[0]} =
>         ($_->get_contents_by_name('ScientificName'))[0];
> }
> 
> foreach (@ids) {
>     $idmap{$_} = $names{$taxa{$_}};
> }
> 
> # %idmap is
> # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> # 68536103 => 'Corynebacterium jeikeium K411'
> # 730439 => 'Bacillus caldolyticus'
> # 89318838 => undef (this record has been removed from the db)
> 
> 1;
> 
> Thanks,
> 
> Chris
> 
> 
> On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
> Chris,
> 
> Regarding a fix for that script, we would have to see your modified script and the error. However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy.
> 
> chris
> 
> On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:
> 
> > Hi All,
> > 
> > I am trying to extract the entire taxonomy of an organism including the
> > classifications. Some thing like...
> > 
> > Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
> > 
> > I am not worried about format just that I get the information and the associated level of hierarchy. The script found athttp://bioperl.org/wiki/Species_names_from_accession_numbers">http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error.
> > 
> > My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?
> > 
> > Thanks for all your help in advance!
> > 
> > Chris 
> > 
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l">http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 


From J.Christopher.Ellis at duke.edu  Tue Aug 31 07:57:27 2010
From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis)
Date: Tue, 31 Aug 2010 07:57:27 -0400
Subject: [Bioperl-l] Taxonomy DB problem
Message-ID: <56973.1283255847@duke.edu>

 Hi Chris,

 The error is...

 "Use of uninitialized value $id in join or string at
C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363."

 The script from
http://bioperl.org/wiki/Species_names_from_accession_numbers is as
follows....

use Bio::DB::EUtilities;

my (%taxa, @taxa);
my (%names, %idmap);

# these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
# (probably)

my @ids = qw(1621261 89318838 68536103 20807972 730439);

my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
                                       -db => 'taxonomy',
                                       -dbfrom => 'protein',
                                       -correspondence => 1,
                                       -id => \@ids);

# iterate through the LinkSet objects
while (my $ds = $factory->next_LinkSet) {
    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
}

@taxa = @taxa{@ids};

$factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
                                    -db => 'taxonomy',
                                    -id => \@taxa );

while (local $_ = $factory->next_DocSum) {
    $names{($_->get_contents_by_name('TaxId'))[0]} =
        ($_->get_contents_by_name('ScientificName'))[0];
}

foreach (@ids) {
    $idmap{$_} = $names{$taxa{$_}};
}

# %idmap is
# 1621261 => 'Mycobacterium tuberculosis H37Rv'
# 20807972 => 'Thermoanaerobacter tengcongensis MB4'
# 68536103 => 'Corynebacterium jeikeium K411'
# 730439 => 'Bacillus caldolyticus'
# 89318838 => undef (this record has been removed from the db)

1;

Thanks,

Chris

 On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
 Chris,

 Regarding a fix for that script, we would have to see your modified
script and the error. However, there are modules within BioPerl to
essentially do what you want, in particular, Bio::DB::Taxonomy.

 chris

 On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:

 > Hi All,
 > 
 > I am trying to extract the entire taxonomy of an organism including the
 > classifications. Some thing like...
 > 
 > Phylum:Proteobacteria, Class:Gammaproteobacteria,
Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
 > 
 > I am not worried about format just that I get the information and the
associated level of hierarchy. The script found at
http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a
good starting point so I copied it and tried run it but got an error.
 > 
 > My first question is "Is there a known fix for this?" and my second
question is how do I get the full hierarchical information (as seen above)
with the taxonomy db?
 > 
 > Thanks for all your help in advance!
 > 
 > Chris 
 > 
 > 
 > _______________________________________________
 > Bioperl-l mailing list
 > Bioperl-l at lists.open-bio.org
 > http://lists.open-bio.org/mailman/listinfo/bioperl-l

 
From rmb32 at cornell.edu  Sun Aug  1 15:17:14 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Sun, 01 Aug 2010 12:17:14 -0700
Subject: [Bioperl-l] GMOD Evo Hackathon Open Call for Participation
Message-ID: <4C55C83A.3060700@cornell.edu>

We are seeking participants for the GMOD Tools for Evolutionary Biology 
Hackathon, held November 8-12, 2010 at the US National Evolutionary 
Synthesis Center (NESCent) in Durham, NC.

This hackathon targets three critical gaps in the capabilities of the 
GMOD toolbox that currently limit its utility for evolutionary research:

  1. Visualization of comparative genomics data
  2. Visualization of phylogenetic data and trees
  3. Support for population diversity and phenotype data

If you are interested in these areas and have relevant expertise, you 
are strongly encouraged to apply. Relevant areas of expertise include 
more than just software development: if you are a GMOD power user, 
visualization guru, domain expert (comparative, phylogenetics, 
population, ...), or documentation wizard, then your skills are needed!

How To Apply:

Fill out the online application form at http://bit.ly/gmodevohack. 
Applications are due August 25.

About GMOD:

GMOD is an intercompatible suite of open-source software components for 
storing, managing, analyzing, and visualizing genome-scale data. GMOD 
includes many widely-used software components: GBrowse and JBrowse, both 
genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a 
generic and modular database schema; CMap, a comparative map viewer; as 
well as many other components including Apollo, MAKER, BioMart, 
InterMine, and Galaxy. We hope to extend the functionality of existing 
GMOD components, and integrate new components as well.

About Hackathons:

A hackathon is an intense event at which a group of programmers with 
different backgrounds and skills collaborate hands-on and face-to-face 
to develop working code that is of utility to the community as a whole. 
The mix of people will include domain experts and computer-savvy end-users.

More details about the event, its motivation, organization, procedures, 
and attendees, as well as URLs to the hackathon and related websites are 
included below.

Sincerely,

The GMOD EvoHack Organizing Committee (and project affiliations as
relevant):

Nicole Washington, Chair (LBNL, modENCODE, Phenote)

Robert Buels (SGN, Chado NatDiv)

Scott Cain (OICR, GMOD)

Dave Clements (NESCent, GMOD)

Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv)

Sheldon McKay (University of Arizona, iPlant, GBrowse_syn)


-----------------------------

About the GMOD Evo Hackathon

Overview

We are organizing a hackathon to fill critical gaps in the capabilities 
of the Generic Model Organism Database (GMOD) toolbox that currently 
limit its utility for evolutionary research. Specifically, we will focus 
on tools for

   1) viewing comparative genomics data;
   2) visualizing phylogenomic data; and
   3) supporting population diversity data and phenotype annotation.

The event will be hosted at NESCent and bring together a group of 20 or 
more software developers, end-user representatives, and documentation 
experts who would otherwise not meet. The participants will include key 
developers of GMOD components that currently lack features critical for 
emerging evolutionary biology research, developers of informatics tools 
in evolutionary research that lack GMOD integration, and 
informatics-savvy biologists who can represent end-user requirements.

The event will provide a unique opportunity to infuse the GMOD developer 
community with a heightened awareness of unmet needs in evolutionary 
biology that GMOD components have the potential to fill, and for tool 
developers in evolutionary biology to better understand how best to 
extend or integrate with already existing GMOD components.

Before the Event

Discussion of ideas and sometimes even design actually starts well 
before the hackathon, on mailing lists, wiki pages, and conference calls 
set up among accepted attendees.  This advance work lays the foundation 
for participants to be productive from the very first day.  This also 
means that participants should be willing to contribute some time in 
advance of the hackathon itself to participate in this preparatory 
discussion.

During the Event

Typically, hackathon participants use the morning of the first day of 
the event to organize themselves into working groups of between 3 and 6 
people, each with a focused implementation objective.  Ideas and 
objectives are discussed, and attendees coalesce around the projects in 
which they have the most experience or interest.


Deliverables / Event Results

The meeting's attendance, working groups, and outcomes will be fully 
logged and documented on the GMOD wiki (http://gmod.org). Each working 
group during the event will typically have its own wiki page, linked 
from the main EvoHack page, where it documents its minutes and design 
notes, and provides links to the code and documentation it produces. 
Also, since GMOD and NESCent are both committed to open source 
principles, all code and documentation produced by participants during 
the hackathon must be published under an OSI-approved open source 
license. As contributions to existing GMOD tools, all hackathon products 
will most likely satisfy this requirement automatically.

NESCent

This event is sponsored by the US National Evolutionary Synthesis Center 
(NESCent, http://www.nescent.org) through its Informatics Whitepapers 
program (http://www.nescent.org/informatics/whitepapers.php). NESCent 
promotes the synthesis of information, concepts and knowledge to address 
significant, emerging, or novel questions in evolutionary science and 
its applications. NESCent achieves this by supporting research and 
education across disciplinary, institutional, geographic, and 
demographic boundaries (see http://www.nescent.org/science/proposals.php).

Links

Main GMOD EvoHack page, and full proposal:
http://gmod.org/wiki/GMOD_Evo_Hackathon

NESCent: http://www.nescent.org/
GMOD: http://gmod.org/
Similar past NESCent events, see: http://hackathon.nescent.org/
GMOD hackathon application:  http://bit.ly/gmodevohack

-- 
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/GMOD_Europe_2010
http://gmod.org/wiki/Help_Desk_Feedback


From maj at fortinbras.us  Sun Aug  1 19:19:16 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 1 Aug 2010 19:19:16 -0400
Subject: [Bioperl-l] SOAP Eutilities
In-Reply-To: <AANLkTi=DSQ2vktjCghDscW6OyHv25HYNXqA96LXTz443@mail.gmail.com>
References: <AANLkTi=DSQ2vktjCghDscW6OyHv25HYNXqA96LXTz443@mail.gmail.com>
Message-ID: <627BEC8B2E624A69A0B11EEBC8C93B71@NewLife>

Turns out that module lives in bioperl-run; try 

git clone git://github.com/bioperl/bioperl-run.git

MAJ
----- Original Message ----- 
From: "Robson de Souza" <robfsouza at gmail.com>
To: <bioperl-l at bioperl.org>
Sent: Saturday, July 31, 2010 4:56 PM
Subject: [Bioperl-l] SOAP Eutilities


> Hi,
> 
> Bio::DB::SoapEUtilities, referred to in the HOWTO on EUtilities, seems to
> have disappeared from the Git repository.
> A simple
> 
> git clone git://github.com/bioperl/bioperl-live.git
> 
> does not download it. Any ideas why?
> Robson
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>


From David.Messina at sbc.su.se  Mon Aug  2 09:58:10 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 2 Aug 2010 15:58:10 +0200
Subject: [Bioperl-l] phyloxml and element order
In-Reply-To: <AANLkTimk5j3VfOvLNcN_c+FsgoVqpntB9xR5NfDopLPh@mail.gmail.com>
References: <AANLkTimk5j3VfOvLNcN_c+FsgoVqpntB9xR5NfDopLPh@mail.gmail.com>
Message-ID: <AB413C9E-ED42-48AF-A8AB-893771AD7067@sbc.su.se>

Hi Fred,

Thanks for letting us know about this - definitely sounds like a bug.

Would you please submit this to our bug tracker?

    http://bugzilla.open-bio.org


(You can just copy and paste your previous email.)

Dave


On Jul 30, 2010, at 06:59, Frédéric Romagné wrote:

> Hi,
> 
> I'm using bioperl to create phyloxml trees. After a few attempts, I got my
> tree with all the elements/attributes I want, but when I write the tree,
> elements are not written in the order specified in the XSD schema.
> 
> For example, I got:
> 
> <clade>
>   <clade>
>      <name>Loxosceles intermedia</name>
>      <taxonomy>
>         <scientific_name>Araneomorphae Sicariidae</scientific_name>
>      </taxonomy>
>      <sequence>
>         <accession source="Arachnoserver">969</accession>
>         <mol_seq>HAAERADSRKPIWDIAHMVNDLELVD</mol_seq>
>      </sequence>
>   </clade>
>   <taxonomy>
>      <scientific_name>Araneomorphae Sicariidae</scientific_name>
>   </taxonomy>
> </clade>
> 
> The program forester complains that <taxonomy> should be written before the
> <clade> element.
> 
> According to
> http://phyloxml.wordpress.com/2009/11/25/order-of-elements-in-phyloxml this
> is what bioperl is supposed to do.
> 
> All my elements/attributes are set before writing the tree using the
> 'add_Annotation', 'add_tag_value' and 'sequence' methods from a
> Bio::Tree::AnnotatableNode object, so I think the error comes from the
> write_tree method.
> 
> Any help would be appreciated.
> 
> Thank you,
> Fred


From shalabh.sharma7 at gmail.com  Mon Aug  2 15:44:35 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Aug 2010 15:44:35 -0400
Subject: [Bioperl-l] clustalw to maf format
Message-ID: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>

Hi,
    I am trying to convert clustalw to maf format.
I am trying to use AlignIO for that, but it's not working.

It's giving me the following error:

EXCEPTION Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented by
package Bio::AlignIO::maf.
This is not your fault - author of Bio::AlignIO::maf should be blamed!

STACK Bio::Root::RootI::throw_not_implemented /Library/Perl/5.8.8/Bio/Root/RootI.pm:707
STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/maf.pm:176
STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492
STACK toplevel msf2mafy.pl:11


Is there any other way I can convert clustalw to maf?

I would really appreciate it if anyone can help me out.

Thanks
Shalabh


From Russell.Smithies at agresearch.co.nz  Mon Aug  2 16:25:26 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 3 Aug 2010 08:25:26 +1200
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>

This might work if you only have a few:
http://www.ibi.vu.nl/programs/convertalignwww/

--Russell


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of shalabh sharma
> Sent: Tuesday, 3 August 2010 7:45 a.m.
> To: bioperl-l
> Subject: [Bioperl-l] clustalw to maf format
> 
> Hi,
>     I am trying to convert clustalw to maf format.
> I am trying to use AlignIO for that but its not working.
> 
> Its giving me the following error:
> 
> EXCEPTION Bio::Root::NotImplemented -------------
> MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented by
> package Bio::AlignIO::maf.
> This is not your fault - author of Bio::AlignIO::maf should be blamed!
> 
> STACK Bio::Root::RootI::throw_not_implemented
> /Library/Perl/5.8.8/Bio/Root/RootI.pm:707
> STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/
> maf.pm:176
> STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492
> STACK toplevel msf2mafy.pl:11
> 
> 
> Is there any other way i can convert clustalw to maf?
> 
> I would really appreciate if anyone can help me out.
> 
> Thanks
> Shalabh
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From shalabh.sharma7 at gmail.com  Mon Aug  2 16:53:31 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Aug 2010 16:53:31 -0400
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
Message-ID: <AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>

Hi Russell,
            Thanks for the reply, but I have around 400 alignments, and some
are huge :(

Thanks
Shalabh


On Mon, Aug 2, 2010 at 4:25 PM, Smithies, Russell <
Russell.Smithies at agresearch.co.nz> wrote:

> This might work if you only have a few:
> http://www.ibi.vu.nl/programs/convertalignwww/
>
> --Russell
>


From biopython at maubp.freeserve.co.uk  Mon Aug  2 17:24:09 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 2 Aug 2010 22:24:09 +0100
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
Message-ID: <AANLkTikFJP0aZHWgcRVxfJ9dhg-8Aj+aRWLF2GJDseW3@mail.gmail.com>

On Mon, Aug 2, 2010 at 8:44 PM, shalabh sharma
<shalabh.sharma7 at gmail.com> wrote:
> Hi,
>     I am trying to convert clustalw to maf format.
> I am trying to use AlignIO for that but its not working.

Could you tell us why you have to use maf format?
I'm curious because all of the phylogenetics tools I've
had to work with personally will take some other format
which is more widely supported (e.g. FASTA, PFAM,
ClustalW, PHYLIP, ...).

Peter


From bernd.web at gmail.com  Mon Aug  2 17:25:52 2010
From: bernd.web at gmail.com (Bernd Web)
Date: Mon, 2 Aug 2010 23:25:52 +0200
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
	<AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>
Message-ID: <AANLkTimQe9fgO3jMeWR_y3E7gNskh26GUVVuEyfgtRJc@mail.gmail.com>

Hi Shalabh,

This ConvertAlign does not write maf either; it only reads it (I made
it). I found some other converters on the web, but they do not export
to maf format either...

http://biotechvana.uv.es/servers/afc/main.php
http://www.hiv.lanl.gov/content/sequence/FORMAT_CONVERSION/form.html

Galaxy has a MAF to Fasta converter:
http://main.g2.bx.psu.edu/root?tool_id=MAF_To_Fasta1


Regards,
Bernd


On Mon, Aug 2, 2010 at 10:53 PM, shalabh sharma
<shalabh.sharma7 at gmail.com> wrote:
> Hi Russell,
>            Thanks for the reply, but i  have around 400 alignments and some
> huge ones :(
>
> Thanks
> Shalabh
>


From cjfields at illinois.edu  Mon Aug  2 17:31:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 2 Aug 2010 16:31:20 -0500
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
	<AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>
Message-ID: <6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu>

No other format will work?  The main reason you see unimplemented methods like this is that there has been no active interest in working with this format beyond getting the information stored in such files into objects and other commonly-used formats.
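
Since Bio::AlignIO::maf has no write_aln, one workaround is to print the MAF block by hand. The following is only a minimal sketch in plain Perl, not part of BioPerl: the maf_block function and the row layout (name, seq) are hypothetical, and it assumes each aligned row covers its whole source sequence on the + strand, so start is 0 and srcSize equals the ungapped length. Whether a given tool such as RNAz accepts these simplified coordinates has not been checked here.

```perl
use strict;
use warnings;

# Build one MAF alignment block from aligned rows.
# Each row is a hash ref: { name => display id, seq => gapped sequence }.
# Assumption: every row spans its entire source sequence on the + strand,
# so start is 0 and srcSize is simply the ungapped length.
sub maf_block {
    my ($rows) = @_;
    my $out = "a score=0\n";
    for my $row (@$rows) {
        (my $text = $row->{seq}) =~ tr/./-/;   # MAF uses '-' for gaps
        my $size = ($text =~ tr/-//c);         # number of non-gap characters
        $out .= join(' ', 's', $row->{name}, 0, $size, '+', $size, $text) . "\n";
    }
    return $out . "\n";                        # a blank line ends the block
}

my @rows = (
    { name => 'seqA', seq => 'ACGT-ACGT' },
    { name => 'seqB', seq => 'ACGTTAC--' },
);
print "##maf version=1\n";
print maf_block(\@rows);
```

With BioPerl installed, the rows could be filled from a ClustalW file read through Bio::AlignIO (format 'clustalw'), taking display_id and seq from each sequence returned by each_seq on the alignment object.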

chris

On Aug 2, 2010, at 3:53 PM, shalabh sharma wrote:

> Hi Russell,
>            Thanks for the reply, but i  have around 400 alignments and some
> huge ones :(
> 
> Thanks
> Shalabh
> 


From shalabh.sharma7 at gmail.com  Mon Aug  2 18:30:41 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Aug 2010 18:30:41 -0400
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
	<AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>
	<6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu>
Message-ID: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>

Hi All,
      Thanks for the replies.
Actually, I am working on a pipeline involving RNAz.
I had the impression that a converter must be available, since their web server
accepts xmfa or maf format, but the standalone version only accepts maf.

I think I will use a program that can output xmfa and write to the RNAz
authors to ask whether they can provide me with the converter.

Thanks
Shalabh


On Mon, Aug 2, 2010 at 5:31 PM, Chris Fields <cjfields at illinois.edu> wrote:

> No other format will work?  The main reason you see unimplemented methods
> like this is there is no active interest in working with this format beyond
> getting the information stored within them into objects and other
> commonly-used formats.
>
> chris
>


From chiragmatkarbioinfo at gmail.com  Tue Aug  3 03:47:37 2010
From: chiragmatkarbioinfo at gmail.com (chirag matkar)
Date: Tue, 3 Aug 2010 13:17:37 +0530
Subject: [Bioperl-l] Pubmed Parsing
Message-ID: <AANLkTim+qcBN_9kXVLAkessaHUY9e=gc4Ad5MVGWk-mF@mail.gmail.com>

Hello all,
I have a list of PubMed IDs.
I want to parse the articles to find specific SNP-related information.
Can I do this with a script?


-- 
Regards,
Chirag Matkar


From genehack at genehack.org  Tue Aug  3 05:03:35 2010
From: genehack at genehack.org (John Anderson)
Date: Tue, 3 Aug 2010 05:03:35 -0400
Subject: [Bioperl-l] Pubmed Parsing
In-Reply-To: <AANLkTim+qcBN_9kXVLAkessaHUY9e=gc4Ad5MVGWk-mF@mail.gmail.com>
References: <AANLkTim+qcBN_9kXVLAkessaHUY9e=gc4Ad5MVGWk-mF@mail.gmail.com>
Message-ID: <5E557C44-224B-4460-9C2C-E375555B8BE6@genehack.org>


On Aug 3, 2010, at 3:47 AM, chirag matkar wrote:

> I have a list of Pubmed Ids.
> I want to parse articles to find specific SNP related information.
> Can i work it out using a Script?

Can you provide a more specific example of what you'd like to do? For example, something along the lines of, "for PMID 1234, get ... about SNP 5678" (where '...' is replaced with whatever it is you're trying to get). Even describing how you would obtain this information using the website yourself will be helpful.

thanks,
john.


From gowthaman.ramasamy at seattlebiomed.org  Tue Aug  3 01:29:10 2010
From: gowthaman.ramasamy at seattlebiomed.org (Gowthaman Ramasamy)
Date: Mon, 2 Aug 2010 22:29:10 -0700
Subject: [Bioperl-l] Getting pileup consensus from BAM files using
	Bio::DB::Sam
In-Reply-To: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
Message-ID: <C87CF736.E5DB%gowthaman.ramasamy@sbri.org>

Hi List,
I am trying to find the consensus using pileup via Bio::DB::Sam. Using the following script I can extract the reference base and the different read bases at each position. However, I cannot find a method to derive the consensus, similar to the values produced by "samtools pileup -c -f xxxxxx.fasta yyyyyyy.bam".

The script I use now retrieves the reference base and the query bases for each position. How do I improve it to get the consensus?

Thanks very much in advance,
Gowthaman


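use strict;
use warnings;
use Bio::DB::Sam;

# Open the BAM file together with its reference FASTA.
my $bam = Bio::DB::Sam->new(-bam   => 'something.bam',
                            -fasta => 'something.fasta');

# Callback invoked once for each reference position in the pileup.
my $cb = sub {
    my ($seqid, $pos, $pileups) = @_;
    # Reference base at this position.
    my $refBase = $bam->segment($seqid, $pos, $pos)->dna;
    print "\n$pos\t$refBase=>";
    for my $pileup (@$pileups) {
        my $al = $pileup->alignment;
        # Base of this read at the current column.
        my $qBase = substr($al->qseq, $pileup->qpos, 1);
        print "$qBase,";
    }
};

$bam->pileup('Lin.chr10i', $cb);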


From scott at scottcain.net  Tue Aug  3 06:32:59 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 3 Aug 2010 06:32:59 -0400
Subject: [Bioperl-l] Getting pileup consensus from BAM files using
	Bio::DB::Sam
In-Reply-To: <C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
Message-ID: <AANLkTi=vkM5rhy2x_s3p1jZKPtnLjq4wWD=ebGxxmaha@mail.gmail.com>

Hi Gowthaman,

I don't see a method to extract the consensus.  You are welcome to
submit a patch :-)

Scott


On Tue, Aug 3, 2010 at 1:29 AM, Gowthaman Ramasamy
<gowthaman.ramasamy at seattlebiomed.org> wrote:
> Hi List,
> I am trying to find out the consensus using pileup via Bio::DB::Sam. Using the following script I could parse out the ref_base and different bases from reads at that position. Though, I am not able to find a method to derive consensus. Similar to the values produced by "samtools pileup -c -f xxxxxx.fasta yyyyyyy.bam".
>
> The script I use now retrives ref base, query bases for each position. How do I improve it to get the consensus?
>
> Thanks very much in advance,
> Gowthaman
>
>
> use Bio::DB::Sam;
>
> my $bam = Bio::DB::Sam->new(-bam => 'something.bam',
>                            -fasta => 'something.fasta'
>                           );
>
> my $cb = sub {
>                        my ($seqid, $pos, $pileups) = @_;
>                        my $refBase = $bam->segment($seqid, $pos, $pos)->dna;
>                        print "\n$pos\t$refBase=>";
>                        for my $pileup (@$pileups){
>                                my $al = $pileup->alignment;
>                                my $qBase = substr($al->qseq, $pileup->qpos, 1);
>                                print "$qBase,";
>                                }
>                        };
>
> $bam->pileup('Lin.chr10i', $cb);


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From lincoln.stein at gmail.com  Tue Aug  3 12:57:52 2010
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Tue, 3 Aug 2010 12:57:52 -0400
Subject: [Bioperl-l] Getting pileup consensus from BAM files using
	Bio::DB::Sam
In-Reply-To: <C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
Message-ID: <AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>

Samtools is running MAQ on the pileup. You could either implement MAQ in
perl, or come up with your own consensus caller.
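
As an illustration of the "own consensus caller" route, here is a minimal majority-rule sketch in plain Perl. It is not the MAQ model and it ignores base and mapping qualities; the function name column_consensus and the 0.5 threshold are hypothetical choices made for this example. It takes the read bases collected for one pileup column and returns the base that exceeds the threshold, or 'N' otherwise.

```perl
use strict;
use warnings;

# Majority-rule consensus for a single pileup column.
# $bases:    array ref of single-character base calls from the reads.
# $min_frac: hypothetical threshold; the winning base must exceed this
#            fraction of the called bases, otherwise 'N' is returned.
sub column_consensus {
    my ($bases, $min_frac) = @_;
    $min_frac = 0.5 unless defined $min_frac;
    my %count;
    my $total = 0;
    for my $base (map { uc $_ } @$bases) {
        next if $base eq 'N';              # ignore uncalled bases
        $count{$base}++;
        $total++;
    }
    return 'N' unless $total;
    # Highest count first; ties broken alphabetically so the result is stable.
    my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    return $count{$best} / $total > $min_frac ? $best : 'N';
}

print column_consensus([qw(A A a G N)]), "\n";   # prints A (3 of 4 called bases)
print column_consensus([qw(A C)]), "\n";         # prints N (no base exceeds 50%)
```

Inside a Bio::DB::Sam pileup callback, the query bases extracted for each position could be pushed onto an array and passed to this function once per column.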

Lincoln

On Tue, Aug 3, 2010 at 1:29 AM, Gowthaman Ramasamy <
gowthaman.ramasamy at seattlebiomed.org> wrote:

> Hi List,
> I am trying to find out the consensus using pileup via Bio::DB::Sam. Using
> the following script I could parse out the ref_base and different bases from
> reads at that position. Though, I am not able to find a method to derive
> consensus. Similar to the values produced by "samtools pileup -c -f
> xxxxxx.fasta yyyyyyy.bam".
>
> The script I use now retrives ref base, query bases for each position. How
> do I improve it to get the consensus?
>
> Thanks very much in advance,
> Gowthaman
>
>
> use Bio::DB::Sam;
>
> my $bam = Bio::DB::Sam->new(-bam => 'something.bam',
>                            -fasta => 'something.fasta'
>                           );
>
> my $cb = sub {
>                        my ($seqid, $pos, $pileups) = @_;
>                        my $refBase = $bam->segment($seqid, $pos, $pos)->dna;
>                        print "\n$pos\t$refBase=>";
>                        for my $pileup (@$pileups){
>                                my $al = $pileup->alignment;
>                                my $qBase = substr($al->qseq, $pileup->qpos, 1);
>                                print "$qBase,";
>                                }
>                        };
>
> $bam->pileup('Lin.chr10i', $cb);
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From biopython at maubp.freeserve.co.uk  Tue Aug  3 13:06:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 3 Aug 2010 18:06:46 +0100
Subject: [Bioperl-l] Getting pileup consensus from BAM files using
	Bio::DB::Sam
In-Reply-To: <AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
	<AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
Message-ID: <AANLkTinoszFJNtDeEbh_DyFLp97aayv7bYVu6c=znq1h@mail.gmail.com>

On Tue, Aug 3, 2010 at 5:57 PM, Lincoln Stein <lincoln.stein at gmail.com> wrote:
> Samtools is running MAQ on the pileup. You could either implement MAQ in
> perl, or come up with your own consensus caller.
>
> Lincoln

See also: http://seqanswers.com/forums/showthread.php?t=6241


From gowthaman.ramasamy at seattlebiomed.org  Tue Aug  3 13:28:36 2010
From: gowthaman.ramasamy at seattlebiomed.org (Gowthaman Ramasamy)
Date: Tue, 3 Aug 2010 10:28:36 -0700
Subject: [Bioperl-l] Getting pileup consensus from BAM files using
 Bio::DB::Sam
In-Reply-To: <AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>,
	<AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
Message-ID: <89080953C3D300419AACB6E63A7EEFBA5C47613B34@mail02.sbri.org>

Hi Lincoln,
That's a good lead. I will try to use MAQ in perl rather than using my simple majority rule.

-gowtham
________________________________________
From: Lincoln Stein [lincoln.stein at gmail.com]
Sent: Tuesday, August 03, 2010 9:57 AM
To: Gowthaman Ramasamy
Cc: bioperl-l
Subject: Re: [Bioperl-l] Getting pileup consensus from BAM files using  Bio::DB::Sam

Samtools is running MAQ on the pileup. You could either implement MAQ in perl, or come up with your own consensus caller.

Lincoln

On Tue, Aug 3, 2010 at 1:29 AM, Gowthaman Ramasamy <gowthaman.ramasamy at seattlebiomed.org> wrote:
Hi List,
I am trying to find out the consensus using pileup via Bio::DB::Sam. Using the following script I could parse out the ref_base and different bases from reads at that position. However, I am not able to find a method to derive a consensus similar to the values produced by "samtools pileup -c -f xxxxxx.fasta yyyyyyy.bam".

The script I use now retrieves the ref base and query bases for each position. How do I improve it to get the consensus?

Thanks very much in advance,
Gowthaman


use Bio::DB::Sam;

my $bam = Bio::DB::Sam->new(-bam => 'something.bam',
                           -fasta => 'something.fasta'
                          );

my $cb = sub {
    my ($seqid, $pos, $pileups) = @_;
    # Reference base at this position
    my $refBase = $bam->segment($seqid, $pos, $pos)->dna;
    print "\n$pos\t$refBase=>";
    for my $pileup (@$pileups) {
        my $al = $pileup->alignment;
        # Base of this read aligned to the current position
        my $qBase = substr($al->qseq, $pileup->qpos, 1);
        print "$qBase,";
    }
};

$bam->pileup('Lin.chr10i', $cb);

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From stefan.kirov at bms.com  Tue Aug  3 16:22:35 2010
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 03 Aug 2010 16:22:35 -0400
Subject: [Bioperl-l] nmica parser
Message-ID: <4C587A8B.8090603@bms.com>

Has anyone written an NMICA parser? If not, I will perhaps do that. It
should be straightforward: the output is XML.
Stefan

From fs5 at sanger.ac.uk  Wed Aug  4 04:45:39 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Wed, 04 Aug 2010 09:45:39 +0100
Subject: [Bioperl-l] Pubmed Parsing
In-Reply-To: <AANLkTim+qcBN_9kXVLAkessaHUY9e=gc4Ad5MVGWk-mF@mail.gmail.com>
References: <AANLkTim+qcBN_9kXVLAkessaHUY9e=gc4Ad5MVGWk-mF@mail.gmail.com>
Message-ID: <1280911539.3499.46.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi Chirag,

have a look at this earlier post:
http://bioperl.org/pipermail/bioperl-l/2009-April/029690.html

However, you won't be able to retrieve the full text of every article, and
it is quite a task to parse natural language and get useful information
about a gene, protein, SNP, etc. out of a manuscript.
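
As a very small first step short of real natural-language parsing, SNP identifiers can be pulled out of abstract text with a pattern match. The sketch below assumes the SNPs are cited by dbSNP-style rs numbers; the function name find_rs_ids and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Pull dbSNP-style rs identifiers (e.g. rs12345) out of free text.
# Returns each identifier once, lower-cased, in order of first appearance.
sub find_rs_ids {
    my ($text) = @_;
    my %seen;
    return grep { !$seen{$_}++ } map { lc $_ } $text =~ /\b(rs\d+)\b/gi;
}

# Illustrative text only, not taken from a real abstract.
my $abstract = 'We genotyped rs1801133 and RS4680; rs1801133 was associated with the trait.';
print join(',', find_rs_ids($abstract)), "\n";
```

This only finds identifiers that are spelled out; it says nothing about what the article claims about them, which is the hard part.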

Frank

On Tue, 2010-08-03 at 13:17 +0530, chirag matkar wrote:
> Hello all,
> I have a list of Pubmed Ids.
> I want to parse articles to find specific SNP related information.
> Can I work it out using a script?
> 
> 
> 
> 
> 


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From David.Messina at sbc.su.se  Thu Aug  5 08:16:17 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 5 Aug 2010 14:16:17 +0200
Subject: [Bioperl-l] call for a TreeIO volunteer
Message-ID: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>

Hi everybody,

We've got a couple of small open bugs related to the Bio::TreeIO modules, and we could really use someone to take a look at them. Ideally, that someone would have familiarity with TreeIO already.*

It'd help us to get the next release (1.6.2) out the door.

The bugs in question are:
- TreeIO::newick writes root node branch length incorrectly
http://bugzilla.open-bio.org/show_bug.cgi?id=3039

- Bio::TreeIO::nhx cannot parse empty [&&NHX] + round-trip failure
http://bugzilla.open-bio.org/show_bug.cgi?id=3007
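
For anyone sizing up the second bug, here is a rough sketch of the expected behaviour: an empty [&&NHX] comment should parse to an empty tag set rather than fail. This is not the Bio::TreeIO::nhx code; parse_nhx_tags is a hypothetical helper, and the pattern assumes the usual [&&NHX:key=value:key=value] layout.

```perl
use strict;
use warnings;

# Parse one NHX comment such as "[&&NHX:S=human:D=N]" into a hash ref.
# An empty "[&&NHX]" yields an empty hash; anything else returns undef.
sub parse_nhx_tags {
    my ($comment) = @_;
    return unless $comment =~ /^\[&&NHX((?::[^:\]=]+=[^:\]]*)*)\]$/;
    my %tags;
    # $1 looks like ":S=human:D=N"; the leading empty field is dropped.
    for my $pair (grep { length } split /:/, $1) {
        my ($key, $value) = split /=/, $pair, 2;
        $tags{$key} = $value;
    }
    return \%tags;
}

my $tags = parse_nhx_tags('[&&NHX:S=human:D=N]');
print join(',', map { "$_=$tags->{$_}" } sort keys %$tags), "\n";
```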


Thanks,
Dave
on behalf of the core developers


* Even if you don't, though, if you've been looking for an opportunity to contribute to BioPerl, and this sounds like something you'd like to work on, by all means raise your hand.


From clements at nescent.org  Thu Aug  5 13:15:41 2010
From: clements at nescent.org (Dave Clements)
Date: Thu, 5 Aug 2010 10:15:41 -0700
Subject: [Bioperl-l] GMOD Europe 2010, 13-16 Sept, Cambridge, UK
In-Reply-To: <AANLkTinpd0pP9cBGUfnEd8PuV-VOcfqz6VKdCRp0d=uA@mail.gmail.com>
References: <AANLkTinpd0pP9cBGUfnEd8PuV-VOcfqz6VKdCRp0d=uA@mail.gmail.com>
Message-ID: <AANLkTi=BCjD3w0w4S+44qRb4ShW-P6DVBH0SZ+41k1Ah@mail.gmail.com>

GMOD Europe 2010
================
13-16 September 2010
Cambridge, UK
http://gmod.org/wiki/GMOD_Europe_2010


We are pleased to announce GMOD Europe 2010, four days of GMOD events being
held 13-16 September 2010, at the University of Cambridge. GMOD Europe 2010
includes:

1) GMOD Community Meeting, Monday & Tuesday:  Project updates, developer and
user presentations and best practices, project direction.

2) GMOD Satellite Meetings, Wednesday:  Special interest groups where GMOD
community members meet to discuss specific topics of interest.

3) InterMine Workshop, Wednesday:  A one day workshop on installing,
configuring and using the InterMine biological data warehouse system.

4) BioMart Workshop, Thursday:  A one day workshop on using the BioMart
biological data warehouse system, including accessing data through APIs.

Registration is now open for these events. There is a £50 registration fee
for the GMOD Meeting to cover catered lunches and other expenses.
Registration for all other events is free, but required, as space is
limited.  These events are open to all: GMOD users, developers, prospective
users, biologists, and computer scientists.  See
http://gmod.org/wiki/January_2010_GMOD_Meeting for an idea of what goes on
at GMOD meetings.

GMOD is a collection of interoperable open source software components for
managing, visualizing and annotating biological data.  GMOD incorporates
many widely used tools, including GBrowse and JBrowse for genome browsing,
InterMine and BioMart for data mining, Galaxy and Ergatis for workflow,
Chado for data management, GBrowse_syn and CMap for comparative genomics,
plus many other tools (Apollo, MAKER, Pathway Tools, Textpresso, ...).  GMOD
is also an active community of researchers and developers addressing common
challenges in exploiting their data.  If you are struggling to fully exploit
your data then please consider attending GMOD Europe 2010.

Please let us know if you have any questions, and we hope to see you in
Cambridge.

Thanks,

Scott Cain and Dave Clements
-- 
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/GMOD_Evo_Hackathon
http://gmod.org/wiki/GMOD_Europe_2010
http://gmod.org/wiki/Help_Desk_Feedback


From abhishek.vit at gmail.com  Thu Aug  5 18:15:56 2010
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Thu, 5 Aug 2010 18:15:56 -0400
Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl
Message-ID: <AANLkTi=rrPKSuuddK-+gTqPyo-wKQA0ZamDP59_+dUfi@mail.gmail.com>

Hi All

Just wondering if there is any Picard wrapper/s available in Bioperl.


Thanks!
-Abhi

-----------------------------
Abhishek Pratap
Bioinformatics Software Engineer II
Genomics Resource Center
Institute for Genome Sciences
School of Medicine, Univ of Maryland
801, W. Baltimore Street, Baltimore, MD 21209
Ph: (+1)-410-706-2296
www.igs.umaryland.edu/


From Russell.Smithies at agresearch.co.nz  Thu Aug  5 18:37:46 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 6 Aug 2010 10:37:46 +1200
Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl
In-Reply-To: <AANLkTi=rrPKSuuddK-+gTqPyo-wKQA0ZamDP59_+dUfi@mail.gmail.com>
References: <AANLkTi=rrPKSuuddK-+gTqPyo-wKQA0ZamDP59_+dUfi@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F02262E96@exchsth.agresearch.co.nz>

Might be part of the "Enterprise" package.
If not, some developer should "make it so".

:-)

--Russell
(I hate Fridays)

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Abhishek Pratap
> Sent: Friday, 6 August 2010 10:16 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl
> 
> Hi All
> 
> Just wondering if there is any Picard wrapper/s available in Bioperl.
> 
> 
> Thanks!
> -Abhi
> 
> -----------------------------
> Abhishek Pratap
> Bioinformatics Software Engineer II
> Genomics Resource Center
> Institute for Genome Sciences
> School of Medicine, Univ of Maryland
> 801, W. Baltimore Street, Baltimore, MD 21209
> Ph: (+1)-410-706-2296
> www.igs.umaryland.edu/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at illinois.edu  Thu Aug  5 19:10:16 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 5 Aug 2010 18:10:16 -0500
Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl
In-Reply-To: <AANLkTi=rrPKSuuddK-+gTqPyo-wKQA0ZamDP59_+dUfi@mail.gmail.com>
References: <AANLkTi=rrPKSuuddK-+gTqPyo-wKQA0ZamDP59_+dUfi@mail.gmail.com>
Message-ID: <26E3E5B6-47CF-4744-9687-199C218B5571@illinois.edu>

Picard works with the same SAM/BAM formats as samtools, which has a perl API:

http://search.cpan.org/dist/Bio-SamTools/

which uses BioPerl.  Ah, the circle of life...

chris

On Aug 5, 2010, at 5:15 PM, Abhishek Pratap wrote:

> Hi All
> 
> Just wondering if there is any Picard wrapper/s available in Bioperl.
> 
> 
> Thanks!
> -Abhi
> 
> -----------------------------
> Abhishek Pratap
> Bioinformatics Software Engineer II
> Genomics Resource Center
> Institute for Genome Sciences
> School of Medicine, Univ of Maryland
> 801, W. Baltimore Street, Baltimore, MD 21209
> Ph: (+1)-410-706-2296
> www.igs.umaryland.edu/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From dan.kortschak at adelaide.edu.au  Thu Aug  5 21:06:45 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Fri, 06 Aug 2010 10:36:45 +0930
Subject: [Bioperl-l] MUMmer parser work
Message-ID: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>

Hello Everyone,

I've just noticed the absence of a MUMmer parser and thought that it
might be a worthwhile contribution to bioperl-run (I won't be able to
start on this for a while, but given Mark's excellent work on
CommandExts, it shouldn't take too long to get up when I do have time). Has
anyone made any effort in this direction that I would be stepping on, or
if they have left it, that I could pick up to shorten the work time?

cheers
Dan


From cjfields at illinois.edu  Thu Aug  5 23:13:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 5 Aug 2010 22:13:51 -0500
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>

Dan,

Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in:

http://bugzilla.open-bio.org/show_bug.cgi?id=2701

It currently lacks significant tests, so feel free to chip in there as needed.

chris

On Aug 5, 2010, at 8:06 PM, Dan Kortschak wrote:

> Hello Everyone,
> 
> I've just noticed the absence of a MUMmer parser and thought that it
> might be a worthwhile contribution to bioperl-run (I won't be able to
> start on this for a while, but given Mark's excellent work on
> CommandExts, it shouldn't take too long to get up when I do have time). Has
> anyone made any effort in this direction that I would be stepping on, or
> if they have left it, that I could pick up to shorten the work time?
> 
> cheers
> Dan
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From greg at ebi.ac.uk  Fri Aug  6 05:47:21 2010
From: greg at ebi.ac.uk (Gregory Jordan)
Date: Fri, 6 Aug 2010 10:47:21 +0100
Subject: [Bioperl-l] call for a TreeIO volunteer
In-Reply-To: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>
References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>
Message-ID: <AANLkTiknuVWFiz6kmOYAsHaLnPxMZEBWsHeBtv0yfuCQ@mail.gmail.com>

I can help out with these. I'm pretty sure I've previously fought with (and
perhaps even come up with a fix for) bug 3039, and I can take a look at 3007
too.

Now lemme just see if I can get up and running with the Bioperl test suite.
I'll give a shout if I run into any problems.

Cheers,
 Greg

On 5 August 2010 13:16, Dave Messina <David.Messina at sbc.su.se> wrote:

> Hi everybody,
>
> We've got a couple of small open bugs related to the Bio::TreeIO modules,
> and we could really use someone to take a look at them. Ideally, that
> someone would have familiarity with TreeIO already.*
>
> It'd help us to get the next release (1.6.2) out the door.
>
> The bugs in question are:
> - TreeIO::newick writes root node branch length incorrectly
> http://bugzilla.open-bio.org/show_bug.cgi?id=3039
>
> - Bio::TreeIO::nhx cannot parse empty [&&NHX] + round-trip failure
> http://bugzilla.open-bio.org/show_bug.cgi?id=3007
>
>
> Thanks,
> Dave
> on behalf of the core developers
>
>
> * Even if you don't, though, if you've been looking for an opportunity to
> contribute to BioPerl, and this sounds like something you'd like to work on,
> by all means raise your hand.
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From jun.yin at ucd.ie  Fri Aug  6 06:52:14 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 06 Aug 2010 11:52:14 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
Message-ID: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>

Hi, all,

 
I am the Google Summer of Code student working on refactoring the Bio::Align
subsystem. I recently implemented several packages for retrieving online
alignment sequences. The aim of the packages is to provide convenient
methods for BioPerl users to retrieve online alignments. After retrieval, the
alignments are converted into Bio::SimpleAlign objects, which are easy to
manipulate and write to local disk. The packages currently support the Pfam,
Rfam, Prosite and Entrez Protein Clusters databases.

 
Here is the structure of the packages:

Packages

Bio::DB::Align (interface, and calling other packages)

Bio::DB::Align::Pfam (retrieving alignment from Pfam)

Bio::DB::Align::Rfam (retrieving alignment from Rfam)

Bio::DB::Align::Prosite (retrieving alignment from Prosite)

Bio::DB::Align::ProtClustDB (retrieving alignment from Entrez Protein
Clusters Database)

 
Usually four methods are provided for each package:

Methods

get_Aln_by_id (retrieving alignment by id and returns Bio::SimpleAlign
object)

get_Aln_by_acc (retrieving alignment by accession and returns
Bio::SimpleAlign object) (Rfam and Prosite only support this method)

id2acc (id to accession conversion)

acc2id (accession to id conversion)

 
These packages depend on LWP::UserAgent, HTTP::Request and
Bio::DB::GenericWebAgent. Bio::DB::Align::ProtClustDB depends on
Bio::DB::EUtilities.

 
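Calling the packages can look like this:

# Create the database object through the generic interface ...
my $dbobj = Bio::DB::Align->new(-db => "rfam");

# ... or, for a specific database, through its own class:
# my $dbobj = Bio::DB::Align::Pfam->new();

# Retrieve by accession, with positional or named arguments
my $aln  = $dbobj->get_Aln_by_acc("RF0001");
my $aln2 = $dbobj->get_Aln_by_acc(-accession => "RF0001", -alignment => "full");

print $aln->length();

foreach my $seq ($aln->each_Seq) {
    # do something
}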

 
I have done some tests on these packages and will write them up as
standard tests later. Any suggestions on these packages are welcome.

 
Cheers,

Jun Yin

Ph.D. student in U.C.D.

 
Bioinformatics Laboratory

Conway Institute

University College Dublin

 
From David.Messina at sbc.su.se  Fri Aug  6 08:59:19 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 6 Aug 2010 14:59:19 +0200
Subject: [Bioperl-l] call for a TreeIO volunteer
In-Reply-To: <AANLkTiknuVWFiz6kmOYAsHaLnPxMZEBWsHeBtv0yfuCQ@mail.gmail.com>
References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>
	<AANLkTiknuVWFiz6kmOYAsHaLnPxMZEBWsHeBtv0yfuCQ@mail.gmail.com>
Message-ID: <6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>


> I can help out with these. I'm pretty sure I've previously fought with (and perhaps even come up with a fix for) bug 3039, and I can take a look at 3007 too.

Awesome, thanks Greg!


> Now lemme just see if I can get up and running with the Bioperl test suite. I'll give a shout if I run into any problems.

Please do.


Dave


From David.Messina at sbc.su.se  Fri Aug  6 09:06:47 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 6 Aug 2010 15:06:47 +0200
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
Message-ID: <F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se>

Sounds great, Jun!

Did you happen to test your code on very large alignments? I know there's one in Pfam that's something like 100,000 sequences. An rRNA, I believe.


Dave


From jun.yin at ucd.ie  Fri Aug  6 09:11:41 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 06 Aug 2010 14:11:41 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
	<F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se>
Message-ID: <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>

Hi, Dave,

Thx for reminding me of this. I will definitely try it.

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin


-----Original Message-----
From: Dave Messina [mailto:David.Messina at sbc.su.se] 
Sent: Friday, August 06, 2010 2:07 PM
To: Jun Yin
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences

Sounds great, Jun!

Did you happen to test your code on very large alignments? I know there's
one in Pfam that's something like 100,000 sequences. An rRNA, I believe.


Dave


__________ Information from ESET Smart Security, version of virus signature
database 5346 (20100806) __________

The message was checked by ESET Smart Security.

http://www.eset.com



From cjfields at illinois.edu  Fri Aug  6 09:19:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 6 Aug 2010 08:19:54 -0500
Subject: [Bioperl-l] call for a TreeIO volunteer
In-Reply-To: <6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>
References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>
	<AANLkTiknuVWFiz6kmOYAsHaLnPxMZEBWsHeBtv0yfuCQ@mail.gmail.com>
	<6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>
Message-ID: <8CB3DE9A-4C5C-42A3-94B4-8818D7143951@illinois.edu>

On Aug 6, 2010, at 7:59 AM, Dave Messina wrote:

> 
>> I can help out with these. I'm pretty sure I've previously fought with (and perhaps even come up with a fix for) bug 3039, and I can take a look at 3007 too.
> 
> Awesome, thanks Greg!
> 
> 
>> Now lemme just see if I can get up and running with the Bioperl test suite. I'll give a shout if I run into any problems.
> 
> Please do.
> 
> 
> 
> Dave

Agreed, and thanks for helping out!

chris


From dianabowley at gmail.com  Fri Aug  6 18:33:57 2010
From: dianabowley at gmail.com (DRBowley)
Date: Fri, 6 Aug 2010 15:33:57 -0700 (PDT)
Subject: [Bioperl-l] BioPerl install issues
Message-ID: <b70994fe-d6c3-4c58-8b45-dfe50b9a8fe5@t5g2000prd.googlegroups.com>

I'm new to both perl and bioperl and I'm having issues installing
bioperl.  I'm trying to install on a Mac OS 10.6.4, and I've already
installed perl (5.10.0).  I tried installing using the recommended
approach for Mac - via Fink...
"fink install bioperl-pm5100"

Looking back over the terminal window text it looks like the problem
is:
"This package requires Module::Build v0.2805 or greater to install
itself."

I tried doing "fink selfupdate" and that did not fix the problem.

Any suggestions?

Thanks!
Diana


From Kevin.M.Brown at asu.edu  Fri Aug  6 18:50:45 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 6 Aug 2010 15:50:45 -0700
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To: <b70994fe-d6c3-4c58-8b45-dfe50b9a8fe5@t5g2000prd.googlegroups.com>
References: <b70994fe-d6c3-4c58-8b45-dfe50b9a8fe5@t5g2000prd.googlegroups.com>
Message-ID: <1A4207F8295607498283FE9E93B775B406E44A05@EX02.asurite.ad.asu.edu>

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_THE_EASY_WAY_USING_Build.PL

Not sure why you had to install perl, since it should have been part of
the stock OS X install (or at least it was last time I logged onto a
Mac). Not sure why the Fink method has so many issues, but you might try
the above, which works for Linux or BSD.

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of DRBowley
Sent: Friday, August 06, 2010 3:34 PM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] BioPerl install issues

I'm new to both perl and bioperl and I'm having issues installing
bioperl.  I'm trying to install on a Mac OS 10.6.4, and I've already
installed perl (5.10.0).  I tried installing using the recommended
approach for Mac - via Fink...
"fink install bioperl-pm5100"

Looking back over the terminal window text it looks like the problem
is:
"This package requires Module::Build v0.2805 or greater to install
itself."

I tried doing "fink selfupdate" and that did not fix the problem.

Any suggestions?

Thanks!
Diana
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From skastu01 at students.poly.edu  Fri Aug  6 20:03:50 2010
From: skastu01 at students.poly.edu (Lakshmi Kastury)
Date: Sat, 7 Aug 2010 00:03:50 +0000
Subject: [Bioperl-l] BioPerl install issues
Message-ID: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>


Hi -
I went through several failed attempts on MACOS Snow Leopard, and fink was a dead end. Eventually I succeeded in installing on Windows Vista using CPAN. I am not sure if this method will work with MACOS:

1. Opened command prompt.
2. Typed command: >perl -MCPAN -e "install Bundle::BioPerl"
3. Answered yes to the series of questions, which prompts install of several bundles and a compiler.
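
Before running the CPAN step above, it may help to check whether the Module::Build requirement quoted in the original report is already met. The sketch below is one way to do that from Perl itself; module_at_least is a hypothetical helper, and 0.2805 is simply the version named in the error message.

```perl
use strict;
use warnings;

# Return 1 if $module can be loaded and is at least $min_version, else 0.
sub module_at_least {
    my ($module, $min_version) = @_;
    # Turn Module::Build into Module/Build.pm so require can find the file.
    (my $file = "$module.pm") =~ s{::}{/}g;
    return 0 unless eval { require $file; 1 };
    # UNIVERSAL::VERSION dies if the installed version is lower than requested.
    return eval { $module->VERSION($min_version); 1 } ? 1 : 0;
}

for my $check ( [ 'Module::Build', '0.2805' ] ) {
    my ($module, $min) = @$check;
    printf "%s >= %s: %s\n", $module, $min,
        module_at_least($module, $min) ? 'ok' : 'missing or too old';
}
```

If the check reports the module as missing or too old, installing or upgrading Module::Build through CPAN first is the obvious thing to try.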

The instructions were in a link from:
http://bioperl.org/Core/Latest/INSTALL

All the best,
Lakshmi

> Date: Fri, 6 Aug 2010 15:33:57 -0700
> From: dianabowley at gmail.com
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] BioPerl install issues
> 
> I'm new to both perl and bioperl and I'm having issues installing
> bioperl.  I'm trying to install on a Mac OS 10.6.4, and I've already
> installed perl (5.10.0).  I tried installing using the recommended
> approach for Mac - via Fink...
> "fink install bioperl-pm5100"
> 
> Looking back over the terminal window text it looks like the problem
> is:
> "This package requires Module::Build v0.2805 or greater to install
> itself."
> 
> I tried doing "fink selfupdate" and that did not fix the problem.
> 
> Any suggestions?
> 
> Thanks!
> Diana
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
 		 	   		  

From David.Messina at sbc.su.se  Sat Aug  7 02:47:40 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 7 Aug 2010 08:47:40 +0200
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>
References: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>
Message-ID: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se>


On Aug 7, 2010, at 02:03 , Lakshmi Kastury wrote:

>  I am not sure if this method will work with MACOS:

It will. CPAN is cross-platform and is the best way to install BioPerl.


Dave


From cjfields at illinois.edu  Sat Aug  7 09:58:56 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 7 Aug 2010 08:58:56 -0500
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se>
References: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>
	<5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se>
Message-ID: <A21BBC5D-1D71-4534-B278-9FCFA0BB6DA8@illinois.edu>

It should work fine.  Even installing from trunk right now works w/o failing tests. 

chris

On Aug 7, 2010, at 1:47 AM, Dave Messina wrote:

> 
> On Aug 7, 2010, at 02:03 , Lakshmi Kastury wrote:
> 
>> I am not sure if this method will work with MACOS:
> 
> It will. CPAN is cross-platform and is the best way to install BioPerl.
> 
> 
> Dave
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From greg at ebi.ac.uk  Sat Aug  7 17:14:58 2010
From: greg at ebi.ac.uk (Gregory Jordan)
Date: Sat, 7 Aug 2010 22:14:58 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
	<F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se> 
	<00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>
Message-ID: <AANLkTimL938B1ovmOKC_FBNw1OwjipVpjOXZfN+P5Kf5@mail.gmail.com>

Maybe I'm just a bit naive here, but what is the expected difference between
accession and ID and why do we need a separate method for each? Seems to me
that one could just have a single method, get_Aln, which determines under
the hood whether the query string is an accession or ID.

It would be nice if the SimpleAlign object had its Annotation filled with
some extra metadata (such as accession, ID, database version number, URI,
etc.).

One other thing: have you thought about adding an Ensembl adaptor? Or maybe
something similar already exists in BioPerl...?

Sure, Ensembl provides its own Perl API, but for someone who doesn't want
to go through the hassle of installing it from CVS (pardon my French, but
wtf!?! Who still uses CVS?) and learning a whole new API, it might be
convenient to have a simple BioPerl module for quickly grabbing gene family
alignments from the public Ensembl MySQL databases. I'd be willing to help
write the necessary SQL queries for this.

greg

On 6 August 2010 14:11, Jun Yin <jun.yin at ucd.ie> wrote:

> Hi, Dave,
>
> Thx for reminding me of this. I will definitely try it.
>
> Cheers,
> Jun Yin
> Ph.D. student in U.C.D.
>
> Bioinformatics Laboratory
> Conway Institute
> University College Dublin
>
>
> -----Original Message-----
> From: Dave Messina [mailto:David.Messina at sbc.su.se]
> Sent: Friday, August 06, 2010 2:07 PM
> To: Jun Yin
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences
>
> Sounds great, Jun!
>
> Did you happen to test your code on very large alignments? I know there's
> one in Pfam that's something like 100,000 sequences. An rRNA, I believe.
>
>
> Dave
>
>
> __________ Information from ESET Smart Security, version of virus signature
> database 5346 (20100806) __________
>
> The message was checked by ESET Smart Security.
>
> http://www.eset.com
>
>
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at illinois.edu  Sat Aug  7 18:07:39 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 7 Aug 2010 17:07:39 -0500
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <AANLkTimL938B1ovmOKC_FBNw1OwjipVpjOXZfN+P5Kf5@mail.gmail.com>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
	<F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se>
	<00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>
	<AANLkTimL938B1ovmOKC_FBNw1OwjipVpjOXZfN+P5Kf5@mail.gmail.com>
Message-ID: <21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu>

On Aug 7, 2010, at 4:14 PM, Gregory Jordan wrote:

> Maybe I'm just a bit naive here, but what is the expected difference between
> accession and ID and why do we need a separate method for each?

Depends on the remote service, but in many cases there is a difference.  With NCBI eutils you can have either an accession or the unique identifier (UID, or GI for nuc/protein seqs).  efetch can use both, but only the UID is guaranteed to retrieve a single sequence all the time; the accession can (very rarely) map to more than one sequence.

The other eutils services require either a string (esearch) or a UID, but do not allow an accession.

> Seems to me
> that one could just have a single method, get_Aln, which determines under
> the hood whether the query string is an accession or ID.

A simpler method could be introduced, but I can see that being potentially brittle in the long run.  A naked alphanumeric string doesn't reveal much about what it is at face value w/o knowing database/service-specific behavior.  And then we're reliant on that behavior not changing, which we can't guarantee (this has bitten us in the past).  What would one do if NCBI (for instance) allowed accessions composed entirely of digits, or conversely a unique ID with mixed alphanumerics?

Using methods specific for ID/acc at least guarantees a behavior on the backend w/o guessing, and if there is no danger of overlap (a service accepts either/or) one could simply be an alias of the other.
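
To make the brittleness concrete, here is a minimal sketch in plain Perl. The names guess_kind, get_Aln, get_Aln_by_acc and get_Aln_by_id are made up for illustration and are not part of any existing BioPerl module; the pattern only encodes the current convention that Pfam/Rfam accessions look like PF or RF followed by five digits.

```perl
use strict;
use warnings;

# Hypothetical guesser: treats "PF"/"RF" + 5 digits as an accession and
# anything else as an ID. It encodes today's naming convention only.
sub guess_kind {
    my ($query) = @_;
    return $query =~ /^[PR]F\d{5}$/ ? 'accession' : 'id';
}

# Hypothetical explicit entry points: the caller states what the string is,
# so nothing has to be inferred from its shape.
sub get_Aln_by_acc { my ($q) = @_; return "accession lookup for $q" }
sub get_Aln_by_id  { my ($q) = @_; return "id lookup for $q" }

# Hypothetical single method that dispatches on the guess.
sub get_Aln {
    my ($q) = @_;
    return guess_kind($q) eq 'accession' ? get_Aln_by_acc($q)
                                         : get_Aln_by_id($q);
}

print get_Aln('RF00001'), "\n";        # routed as an accession
print get_Aln('snoZ107_R87'), "\n";    # routed as an ID
# An all-digit accession (the scenario above) is silently routed as an ID:
print get_Aln('12345'), "\n";
```

With the explicit methods the last call cannot go wrong, because the caller says which kind of string it is; with the guesser, the code has to be revisited whenever a service changes its naming scheme.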

> It would be nice if the SimpleAlign object had its Annotation filled with
> some extra metadata (such as accession, ID, database version number, URI,
> etc.).

According to the deobfuscator SimpleAlign does have accession() and id().  The others could be simple attributes, and can be added as simple getter/setters, or as annotation via Bio::Annotation (this is the way Stockholm annotation is currently handled).  Something to think about.

> One other thing: have you thought about adding an Ensembl adaptor? Or maybe
> something similar already exists in BioPerl...?

That's a good idea, though it might make more sense if this was done when mem-efficient (possibly DB-dependent) AlignI modules are present within bioperl, which is part of the GSoC (see below).  For instance, have a Bio::Align::AlignI with a backend ensembl DB adaptor that works lazily.
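
A rough sketch of what "works lazily" could mean, in plain Perl. LazyAlign and its fetcher callback are hypothetical names for illustration; this is not Bio::Align::AlignI or any existing adaptor, and the in-memory hash merely stands in for a database.

```perl
use strict;
use warnings;

package LazyAlign;

# The constructor takes the list of sequence names and a code ref that
# fetches one aligned row on demand (a DB query in real use; an assumption
# here).
sub new {
    my ($class, %args) = @_;
    my $self = {
        names   => $args{names},
        fetcher => $args{fetcher},
        cache   => {},
        fetches => 0,
    };
    return bless $self, $class;
}

# Number of rows is known up front without loading any sequence data.
sub num_sequences { return scalar @{ $_[0]{names} } }

# Return one aligned row, hitting the backend only the first time.
sub row {
    my ($self, $name) = @_;
    unless (exists $self->{cache}{$name}) {
        $self->{cache}{$name} = $self->{fetcher}->($name);
        $self->{fetches}++;
    }
    return $self->{cache}{$name};
}

# How many times the backend has been asked for a row.
sub fetches { return $_[0]{fetches} }

package main;

my %fake_db = (seq1 => 'ACG-T', seq2 => 'AC-GT', seq3 => 'A-CGT');
my $aln = LazyAlign->new(
    names   => [ sort keys %fake_db ],
    fetcher => sub { $fake_db{ $_[0] } },
);

print $aln->num_sequences, " rows known, ", $aln->fetches, " fetched\n";
print $aln->row('seq2'), "\n";
print $aln->row('seq2'), "\n";    # second request is served from the cache
print $aln->fetches, " fetched after two requests\n";
```

The point of the design is that only the rows actually requested are ever pulled from the backend, which is what matters for alignments too large to hold in memory.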

If using the Ensembl Perl API, a few possible roadblocks/problems might pop up. Ensembl currently requires bioperl (v1.2.3, but it works with the latest as well, at least when I've used it).  If using the ensembl perl API we would just need to ensure we aren't conflicting with ensembl code that pulls in bioperl classes expecting a v1.2.3 API when we only support the latest.  I don't foresee this being an issue, though (there is precedent for this, see Sendu's Ensembl module Bio::Tools::Run::Ensembl in bioperl-run).

> Sure Ensembl provides their own Perl API, but for someone who doesn't want
> to go through the hassle of installing it from CVS (pardon my french, but
> wtf!?! Who still uses CVS) and learning a whole new API, it might be
> convenient to have a simple BioPerl module for quickly grabbing gene family
> alignments from the public Ensembl MySQL databases. I'd be willing to help
> write the necessary SQL queries for this.
> 
> greg

The GSoC project on alignment subsystem refactoring will be finishing up this month, so I'm sure Jun can discuss ideas for initial DB-dependent implementations.  The more input and coders implementing the better, IMO.

As for writing up an adaptor to ensembl outside of its API, overall I don't think it's a bad idea, but if it's possible maybe start without reinventing things, then move to direct SQL.  Unless it's easier to use SQL.

chris



From hartzell at alerce.com  Sat Aug  7 17:45:04 2010
From: hartzell at alerce.com (George Hartzell)
Date: Sat, 7 Aug 2010 14:45:04 -0700
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To: <A21BBC5D-1D71-4534-B278-9FCFA0BB6DA8@illinois.edu>
References: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>
	<5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se>
	<A21BBC5D-1D71-4534-B278-9FCFA0BB6DA8@illinois.edu>
Message-ID: <19549.54240.499140.501136@gargle.gargle.HOWL>

Chris Fields writes:
 > It should work fine.  Even installing from trunk right now works
 > w/o failing tests.  

As a slight aside, if you're looking to build a current perl binary
for your mac (e.g. 5.12.1) you should take a look at perlbrew
(http://search.cpan.org/dist/App-perlbrew/).  The three steps at the
top of the installation section of the README are all you need to get
going.  Even a manager can do it.

If you're using bash on the mac via terminal you'll probably want to
put the one-liner they prescribe into your .bash_profile instead of
your .bashrc, but everything else just flows right along.

Once you have that in place you have a nicely isolated system into
which you can install things to your hearts content without worrying
about PERL5LIB and local::lib and the rest.

g.


From cjfields at illinois.edu  Sat Aug  7 21:19:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 7 Aug 2010 20:19:54 -0500
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To: <19549.54240.499140.501136@gargle.gargle.HOWL>
References: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>
	<5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se>
	<A21BBC5D-1D71-4534-B278-9FCFA0BB6DA8@illinois.edu>
	<19549.54240.499140.501136@gargle.gargle.HOWL>
Message-ID: <EA5D5C26-7F3E-46B5-9CD0-F3D51B5F9511@illinois.edu>

On Aug 7, 2010, at 4:45 PM, George Hartzell wrote:

> Chris Fields writes:
>> It should work fine.  Even installing from trunk right now works
>> w/o failing tests.  
> 
> As a slight aside, if you're looking to build a current perl binary
> for your mac (e.g. 5.12.1) you should take a look at perlbrew
> (http://search.cpan.org/dist/App-perlbrew/).  The three steps at the
> top of the installation section of the README are all you need to get
> going.  Even a manager can do it.
> 
> If you're using bash on the mac via terminal you'll probably want to
> put the one-liner they prescribe into your .bash_profile instead of
> your .bashrc, but everything else just flows right along.
> 
> Once you have that in place you have a nicely isolated system into
> which you can install things to your hearts content without worrying
> about PERL5LIB and local::lib and the rest.
> 
> g.

Have to second using perlbrew, started using it for my local Ubuntu installation (don't have it running on my macbook yet, but it's in the plans).

chris


From greg at ebi.ac.uk  Sun Aug  8 02:12:41 2010
From: greg at ebi.ac.uk (Gregory Jordan)
Date: Sun, 8 Aug 2010 07:12:41 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
	<F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se> 
	<00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>
	<AANLkTimL938B1ovmOKC_FBNw1OwjipVpjOXZfN+P5Kf5@mail.gmail.com> 
	<21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu>
Message-ID: <AANLkTim9jkmKSGHm5bHPLOF3_xf+p9xMTN5Ha7bOMR7P@mail.gmail.com>

On 7 August 2010 23:07, Chris Fields <cjfields at illinois.edu> wrote:

>
> A simpler method could be introduced, but I can see that being potentially
> brittle in the long run.  A naked alphanumeric string doesn't reveal much
> about what it is at face value w/o knowing database/service-specific
> behavior.  And then we're reliant on that behavior not changing, which we
> can't guarantee (this has bitten us in the past).  What would one do if NCBI
> (for instance) allowed accessions derived completely of digits, or
> conversely a unique ID with mixed alphanumerics?
>
> Using methods specific for ID/acc at least guarantees a behavior on the
> backend w/o guessing, and if there is no danger of overlap (a service
> accepts either/or) one could simply be an alias of the other.
>

Thanks for the clarification on IDs vs accessions. As long as the behavior
and distinction are well-documented, I'm sure it won't make too much of a
difference.

My main concern was just that having two similar methods -- with no clearly
laid out distinction between the two and one of them only supported by half
of the implementing subclasses -- might confuse potential users.

As a point of reference: both Rfam and Pfam allow either an ID or an
accession in their front-page search interface (http://www.pfam.org /
http://www.rfam.org/). In fact, they seem to entirely hide the distinction
between ID and Accession from the end user; nowhere on the Rfam page for an
individual result is it clear which string is the accession and which is the
ID (http://rfam.sanger.ac.uk/family/snoZ107_R87).

Thus, a potential user of the Rfam module wouldn't know whether to call the
get_by_ID or get_by_Accession method, even after looking at the Rfam page
for his / her desired alignment!

As you can probably tell, I'm all in favor of a unified search whenever
feasible / possible. :-)
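
A unified lookup need not guess from the shape of the string: it can try one index and fall back to the other. A minimal sketch, with made-up in-memory tables standing in for the remote service. The get_Aln function here is hypothetical, and the accession 'RF99999' is a placeholder, not the real accession of this family.

```perl
use strict;
use warnings;

# Placeholder data: the accession is invented for the example.
my %by_accession = ('RF99999'     => 'alignment for snoZ107_R87');
my %by_id        = ('snoZ107_R87' => 'alignment for snoZ107_R87');

# Hypothetical unified getter: try the accession index first, then the ID
# index, and return undef when neither knows the query.
sub get_Aln {
    my ($query) = @_;
    return $by_accession{$query} if exists $by_accession{$query};
    return $by_id{$query}        if exists $by_id{$query};
    return;
}

for my $q ('RF99999', 'snoZ107_R87', 'no_such_family') {
    my $aln = get_Aln($q);
    print "$q => ", (defined($aln) ? $aln : 'not found'), "\n";
}
```

The cost is a second lookup whenever the first one misses, and it still needs a documented rule for a string that happens to match both an accession and an ID.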


> As for writing up an adaptor to ensembl outside of its API, overall I
> don't think it's a bad idea, but if it's possible maybe start without
> reinventing things, then move to direct SQL.  Unless it's easier to use SQL.
>
>
For fetching Ensembl's gene family alignments, using the SQL will be
easiest. They don't tend to get unreasonably large in terms of memory  -- I
think the biggest tend to be ~700 sequences with a few thousand alignment
columns or so -- and it's a simple table join or two to get both the tree
and alignment from the database.

For genomic alignments, I agree that a more memory-efficient and/or lazy
backend would be necessary. And it's pretty much impossible to get those
things out of the Ensembl tables without using their API.

--greg


From dan.kortschak at adelaide.edu.au  Sun Aug  8 20:53:43 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Mon, 09 Aug 2010 10:23:43 +0930
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
	<80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>
Message-ID: <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au>

Hi Chris,

Is that set of files planned to be included in the git repository on
bioperl-live? I don't want to push something that is being organised by
someone else.

cheers
Dan

On Thu, 2010-08-05 at 22:13 -0500, Chris Fields wrote:
> Dan,
> 
> Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in:
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2701
> 
> It currently lacks significant tests, so feel free to chip in there as needed.
> 
> chris


From genehack at genehack.org  Sun Aug  8 21:42:27 2010
From: genehack at genehack.org (John SJ Anderson)
Date: Sun, 8 Aug 2010 21:42:27 -0400
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
	<80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>
	<1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org>

I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. 

j.

On Aug 8, 2010, at 20:53 , Dan Kortschak wrote:

> Hi Chris,
> 
> Is that set of files planned to be included in the git repository on
> bioperl-live? I don't want to push something that is being organised by
> someone else.
> 
> cheers
> Dan
> 
> On Thu, 2010-08-05 at 22:13 -0500, Chris Fields wrote:
>> Dan,
>> 
>> Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in:
>> 
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2701
>> 
>> It currently lacks significant tests, so feel free to chip in there as needed.
>> 
>> chris


From dan.kortschak at adelaide.edu.au  Sun Aug  8 22:03:52 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Mon, 09 Aug 2010 11:33:52 +0930
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
	<80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>
	<1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au>
	<5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org>
Message-ID: <1281319432.2414.49.camel@zoidberg.mbs.adelaide.edu.au>

Excellent. Thanks for that.

Dan

On Sun, 2010-08-08 at 21:42 -0400, John SJ Anderson wrote:
> I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. 
> 
> j.


From cjfields at illinois.edu  Mon Aug  9 22:40:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 9 Aug 2010 21:40:07 -0500
Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio
Message-ID: <DE527A62-E6E7-45B0-96A5-F94E7A7A137F@illinois.edu>

Any objections to moving the Bio directory to lib/Bio in bioperl-live?  It's a more standard location for code in most distributions; I have a branch (topic/cjfields_standard_lib) that has this working, though it's possible that it needs more work.

chris


From genehack at genehack.org  Tue Aug 10 04:30:44 2010
From: genehack at genehack.org (John SJ Anderson)
Date: Tue, 10 Aug 2010 04:30:44 -0400
Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio
In-Reply-To: <DE527A62-E6E7-45B0-96A5-F94E7A7A137F@illinois.edu>
References: <DE527A62-E6E7-45B0-96A5-F94E7A7A137F@illinois.edu>
Message-ID: <B2C73D74-1F72-402B-A3F7-C4E3ECF7D3B6@genehack.org>


On Aug 9, 2010, at 22:40 , Chris Fields wrote:

> Any objections to moving the Bio directory to lib/Bio in bioperl-live?  

+1 on this idea. 

j.


From genehack at genehack.org  Tue Aug 10 07:21:51 2010
From: genehack at genehack.org (John Anderson)
Date: Tue, 10 Aug 2010 07:21:51 -0400
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
	<80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>
	<1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au>
	<5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org>
Message-ID: <7A4F93AB-1BF7-4775-BC0E-38E7B431ECC6@genehack.org>


On Aug 8, 2010, at 9:42 PM, John SJ Anderson wrote:

> I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. 

Okay, the files have been added to topic/bug-2701 -- see <http://github.com/bioperl/bioperl-live/commits/topic/bug-2701>.

Please note, these are just the files from the bug report, slotted into the appropriate spots. I haven't reviewed the code or done anything about the non-BioPerl-y tests or the general lack of test coverage. I hope to do something about that in the coming week, but if somebody beats me to it, that would be okay too.

j.


From maj at fortinbras.us  Tue Aug 10 19:52:05 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 10 Aug 2010 19:52:05 -0400
Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio
In-Reply-To: <DE527A62-E6E7-45B0-96A5-F94E7A7A137F@illinois.edu>
References: <DE527A62-E6E7-45B0-96A5-F94E7A7A137F@illinois.edu>
Message-ID: <1C55239986494A8D82BDC21A85B324E9@NewLife>

+1
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Monday, August 09, 2010 10:40 PM
Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio


> Any objections to moving the Bio directory to lib/Bio in bioperl-live?  It's a 
> more standard location for code in most distributions; I have a branch 
> (topic/cjfields_standard_lib) that has this working, though it's possible that 
> it needs more work.
>
> chris


From fayroz_farouk at yahoo.com  Sun Aug  8 04:24:31 2010
From: fayroz_farouk at yahoo.com (fayroz)
Date: Sun, 8 Aug 2010 01:24:31 -0700 (PDT)
Subject: [Bioperl-l] using HMMER
Message-ID: <603590.1072.qm@web112620.mail.gq1.yahoo.com>

I need your help. I am a new Perl user and want to use BioPerl modules to run
the HMMER program (hmmsearch). I have "model.hmm" and a FASTA file, and I want
to see which of the sequences are similar to the model.
I wrote this code, but there is a problem:

#!/usr/local/bin/perl -w
use Bio::AlignIO;
use Bio::SearchIO;
use Bio::SeqIO ;
use Bio::Tools::Run::Hmmer;

# run hmmsearch (similar for hmmpfam)
my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'h6_avian.hmm',-informat => 
'fasta');
my $seq = Bio::SeqIO->new('-file'=> "one_seq.fa", '-format'=>'Fasta');

# Pass the factory a Bio::Seq object or a file name, returns a Bio::SearchIO
my $searchio = $factory->hmmsearch($seq);

while (my $result = $searchio->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print join("\t", ( $result->query_name,
                               $hsp->query->start,
                               $hsp->query->end,
                               $hit->name,
                               $hsp->hit->start,
                               $hsp->hit->end,
                               $hsp->score,
                               $hsp->evalue,
                               $hsp->seq_str,
                             )), "\n";
        }
    }
}


exceptions:
MSG: Unknown kind of input 'Bio::SeqIO::fasta=HASH(0x329a504)'
STACK Bio::Tools::Run::Hmmer::_setinput D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:381
STACK Bio::Tools::Run::Hmmer::hmmsearch D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:352
STACK toplevel test_bioperl.pl:12
thank you

fayroz


From douglas.hoen at gmail.com  Tue Aug 10 21:54:53 2010
From: douglas.hoen at gmail.com (Douglas Hoen)
Date: Tue, 10 Aug 2010 21:54:53 -0400
Subject: [Bioperl-l] Bio::SeqFeature::SimilarityPair->from_searchResult()?
Message-ID: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com>

Hi,

I was wondering why the Synopsis in the docs for Bio::SeqFeature::SimilarityPair has the following:
$sim_pair = Bio::SeqFeature::SimilarityPair->from_searchResult($blastHit);

There doesn't actually seem to be a from_searchResult method. Am I missing something?

Thanks,
-- Doug


From zhaoy at mail.cbi.pku.edu.cn  Wed Aug 11 04:17:42 2010
From: zhaoy at mail.cbi.pku.edu.cn (zhaoy at mail.cbi.pku.edu.cn)
Date: Wed, 11 Aug 2010 16:17:42 +0800 (CST)
Subject: [Bioperl-l] About extracting sequence from genewise format result
Message-ID: <53663.162.105.250.100.1281514662.squirrel@mail.cbi.pku.edu.cn>

Dear authors:

Hello!

I have recently been trying to parse genewise-format results to extract
the nucleotide sequence using the "hit_string" method in the "SearchIO"
module; however, the result is empty. Worse, the loop does not seem to
work, because I always get the last result. I'm confused.

My perl code is shown below:

#!/usr/bin/perl -w
use strict;
use warnings;

use Bio::SearchIO;
my $in = Bio::SearchIO->new(-format   => 'wise',
                            -wisetype => 'genewise',
                            -file     => 'test');
while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print "Query=",      $result->query_name, "\n",
                  "Length=",     $hsp->length('total'), "\n",
                  "hit_string:", $hsp->hit_string, "\n";
        }
    }
}

And one of the genewise format results is shown below:

genewise $Name: wise2-4-0alpha $ (unreleased release)
This program is freely distributed under a GPL. See source directory
Copyright (c) GRL limited: portions of the code are from separate copyright

Query protein:       Cpa_s110_24
Comp Matrix:         BLOSUM62.bla
Gap open:            12
Gap extension:       2
Start/End            global
Target Sequence      Bdi_chr3:38292015..38292302
Strand:              forward
Start/End (protein)  global
Gene Parameter file: gene.stat
Splice site model:   GT/AG only
Codon Table:         codon.table
Subs error:          1e-06
Indel error:         1e-06
Null model           syn
Algorithm            623

genewise output
Score 37.97 bits over entire alignment
Scores as bits over a synchronous coding model

Warning: The bits scores is not probablistically correct for single seqs
See WWW help for more info

Cpa_s110_24        1 MGNCQAVDAATLAIQHPS-GKVDRLYWPVSASEVMRTNPGHYVALLI--
                     MGNCQA DAA + IQHP+ GKV+RLYWP +A++VMR NPGHYVAL++
                     MGNCQAADAAAVVIQHPAEGKVERLYWPATAADVMRKNPGHYVALVVVH
Bdi_chr3:382920    1 agatcggggggggacccgggaggccttcgaggggacaacgctggcgggc
                     tgagaccaccctttaaccagatagtagcccccattgaacgaatctttta
                     gctcgggtggcggcgcgcgggcgcccggccgcccgcgcccccccccccc


Cpa_s110_24       47 ----STTLCPSNSNASNAESVRVTRIKLLRPTDTLVLGQVYRLITTQEV
                              P+ +    A + R+T++KLL+P DTL++GQVYRLIT+Q
                     VSGGAGETDPAVAGGGAAAAARITKVKLLKPRDTLLIGQVYRLITSQ--
Bdi_chr3:382920  148 gtgggggagcgggggggggggaaaagaccaccgaccagcgtccaatc
                     tcggcgacacctcgggcccccgtcatattacgactttgatagttcca
                     cctcctgtcccacaaaattccgccgcgccgcgctgcccgccccccca


Cpa_s110_24       92 MKGLWAKKCAKMKKYQEADHKDGLKPETIPGRRSGPERDTQVAKHERHR

                     -------------------------------------------------
Bdi_chr3:382920  289


Cpa_s110_24      141 SRVAASTNQAGLKSRTWQPSLKSISEAAS

                     -----------------------------
Bdi_chr3:382920  289


//
Gene 1
Gene 1 288
  Exon 1 288 phase 0
     Supporting 1 54 1 18
     Supporting 58 141 19 46
     Supporting 160 288 47 89
//

......


Part of the output of this code is shown below:
Query=Aly_481360
Length=0
hit_string:

Query=Aly_481360
Length=0
hit_string:

......

What's wrong with my code and how can I get the correct result? I'm
looking forward to your reply.

Thanks very much!

Best regards,
Zackaly


From roy.chaudhuri at gmail.com  Wed Aug 11 10:32:39 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 11 Aug 2010 15:32:39 +0100
Subject: [Bioperl-l] using HMMER
In-Reply-To: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
Message-ID: <4C62B487.9090103@gmail.com>

Hi Fayroz,

Your $seq variable contains a Bio::SeqIO object (a biological 
filehandle), not a Bio::Seq (sequence object).

You need to change that line to:
my $seqio = Bio::SeqIO->new(-file=>'one_seq.fa', -format=>'fasta');
my $seq=$seqio->next_seq;

If you have multiple sequences in the file, then you will need to loop 
over them:
while (my $seq=$seqio->next_seq) {
# Code to run Hmmer goes here
}

Also, I don't think you need to specify -informat for your 
Bio::Tools::Run::Hmmer object, since you're passing it a sequence 
object, not a filename.

Hope this helps.
Roy.

On 08/08/2010 09:24, fayroz wrote:
> i need your help, i am a new perl user and want to use bioperl modules to run
> HMMER program ( HMMsearch) i have" model.hmm" and a "fasta file" to see which of
> them are similar with the model
> i write this code but there is a problems
>
> #!/usr/local/bin/perl W
> use Bio::AlignIO;
> use Bio::SearchIO;
> use Bio::SeqIO ;
> use Bio::Tools::Run::Hmmer;
>
> # run hmmsearch (similar for hmmpfam)
> my $factory = Bio::Tools::Run::Hmmer->new(-hmm =>  'h6_avian.hmm',-informat =>
> 'fasta');
> my $seq = Bio::SeqIO->new('-file'=>  "one_seq.fa", '-format'=>'Fasta');
>
> # Pass the factory a Bio::Seq object or a file name, returns a Bio::SearchIO
> my $searchio = $factory->hmmsearch($seq);
>
> while (my $result = $searchio->next_result){
> while(my $hit = $result->next_hit){
> while (my $hsp = $hit->next_hsp){
> print join("\t", ( $result->query_name,
> $hsp->query->start,
> $hsp->query->end,
> $hit->name,
> $hsp->hit->start,
> $hsp->hit->end,
> $hsp->score,
> $hsp->evalue,
> $hsp->seq_str,
> )), "\n";
> }
> }
> }
>
>
> exceptions:
> MSG: Unknown kind of input 'Bio::SeqIO::fasta=HASH(0x329a504)'
> STACK Bio::Tools::Run::Hmmer::_setinput
> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:381
> STACK Bio::Tools::Run::Hmmer::hmmsearch
> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:352
>   STACK toplevel test_bioperl.pl:12
> thank you
>
> fayroz


From cjfields at illinois.edu  Wed Aug 11 11:07:36 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 10:07:36 -0500
Subject: [Bioperl-l] using HMMER
In-Reply-To: <4C62B487.9090103@gmail.com>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
	<4C62B487.9090103@gmail.com>
Message-ID: <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>

might also want to check whether you are using hmmer2 vs hmmer3.  not sure if the wrapper works for hmmer3.

chris

On Aug 11, 2010, at 9:32 AM, Roy Chaudhuri wrote:

> Hi Fayroz,
> 
> Your $seq variable contains a Bio::SeqIO object (a biological filehandle), not a Bio::Seq (sequence object).
> 
> You need to change that line to:
> my $seqio = Bio::SeqIO->new(-file=>'one_seq.fa', -format=>'fasta');
> my $seq=$seqio->next_seq;
> 
> If you have multiple sequences in the file, then you will need to loop over them:
> while (my $seq=$seqio->next_seq) {
> # Code to run Hmmer goes here
> }
> 
> Also, I don't think you need to specify -informat for your Bio::Tools::Run::Hmmer object, since you're passing it a sequence object, not a filename.
> 
> Hope this helps.
> Roy.


From douglas.hoen at gmail.com  Wed Aug 11 15:13:49 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 12:13:49 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA in
	SeqFeature::Store database of the original DNA?
Message-ID: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>

Hi,

I am trying to store in a SeqFeature::Store database the results of
searches of translated DNA. The DB contains the original DNA
sequences. For instance, I have done HMMER searches of 6-frame
translations of the sequences stored in the DB. I want to store these
results "at" their (equivalent) DNA positions, which I can calculate.
Preferably, I would like to directly store the SeqFeature::Similarity
objects that I get from parsing these searches. But they are of course
located on different coordinate systems than the DNA, so I guess I
can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
DNA position and then store the Similarity's as sub-SeqFeatures.

I could just set the Similarity's position to the (calculated) DNA
coordinates, or alternately make a new SeqFeature and copy in the
attributes I want. But is there a more elegant solution?
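
For reference, the coordinate calculation mentioned above can be sketched as follows. This assumes 1-based inclusive coordinates and the convention that frames +1..+3 start at DNA bases 1..3, while frames -1..-3 start at the last, second-to-last and third-to-last bases of the forward strand. Some tools number reverse frames differently, so this needs checking against whatever did the translation. The function name aa_to_dna is made up for the example.

```perl
use strict;
use warnings;

# Map a 1-based inclusive amino-acid range in a translated frame back to
# 1-based inclusive coordinates on the forward DNA strand.
# $frame is one of 1, 2, 3, -1, -2, -3; $dna_len is the DNA length in bases.
sub aa_to_dna {
    my ($aa_start, $aa_end, $frame, $dna_len) = @_;
    die "bad frame '$frame'\n" unless $frame =~ /^-?[123]$/;
    my $offset = abs($frame);

    # Position within the translated strand (forward, or reverse complement).
    my $s = $offset + 3 * ($aa_start - 1);
    my $e = $offset + 3 * $aa_end - 1;
    return ($s, $e, '+') if $frame > 0;

    # Reverse frames: flip back onto forward-strand coordinates.
    return ($dna_len - $e + 1, $dna_len - $s + 1, '-');
}

printf "%d..%d (%s)\n", aa_to_dna(1, 18, 1, 288);    # residues 1-18, frame +1
printf "%d..%d (%s)\n", aa_to_dna(1, 1, -1, 288);    # first residue, frame -1
```

The returned start, end and strand are what would go into the stored feature, whichever of the storage options above is used.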

Thanks,
-- Doug


From douglas.hoen at gmail.com  Wed Aug 11 16:11:26 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 13:11:26 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
Message-ID: <f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>

One possible answer to my own question: Use
Bio::SeqFeature::PositionProxy's? Would this work?

On Aug 11, 3:13 pm, Doug <douglas.h... at gmail.com> wrote:
> Hi,
>
> I am trying to store in a SeqFeature::Store database the results of
> searches of translated DNA. The DB contains the original DNA
> sequences. For instance, I have done HMMER searches of 6-frame
> translations of the sequences stored in the DB. I want to store these
> results "at" their (equivalent) DNA positions, which I can calculate.
> Preferably, I would like to directly store the SeqFeature::Similarity
> objects that I get from parsing these searches. But they are of course
> located on different coordinate systems than the DNA, so I guess I
> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
> DNA position and then store the Similarity's as sub-SeqFeatures.
>
> I could just set the Similarity's position to the (calculated) DNA
> coordinates, or alternately make a new SeqFeature and copy in the
> attributes I want. But is there a more elegant solution?
>
> Thanks,
> -- Doug


From scott at scottcain.net  Wed Aug 11 16:16:22 2010
From: scott at scottcain.net (Scott Cain)
Date: Wed, 11 Aug 2010 16:16:22 -0400
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
Message-ID: <AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>

Hi Doug,

I don't know if any of the things you've thought of would work; I've
never tried it.  My inclination would be to express your data in GFF3
and use the standard loader.
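
As a rough illustration of what "express your data in GFF3" could look like for a domain hit, here is a minimal Python sketch that formats one nine-column GFF3 line. It is not the output of any BioPerl script. The sequence ID, source tag, feature ID and model name in the usage example are hypothetical placeholders, and the helper names are made up for this sketch. The Target attribute follows the GFF3 convention "target_id start end".

```python
def _escape(value):
    """Percent-encode characters that are reserved in GFF3 attribute values."""
    reserved = "%;=&,\t\n\r"
    return "".join(f"%{ord(ch):02X}" if ch in reserved else ch
                   for ch in str(value))


def gff3_line(seqid, source, ftype, start, end, score, strand, attributes):
    """Build one tab-separated GFF3 feature line.

    start/end are 1-based inclusive positions on seqid (here, the DNA).
    score may be None, which is written as '.'; phase is always '.'
    because this sketch is not meant for CDS features.
    """
    if start > end:
        raise ValueError("GFF3 requires start <= end")
    attrs = ";".join(f"{key}={_escape(val)}" for key, val in attributes.items())
    columns = [seqid, source, ftype, str(start), str(end),
               "." if score is None else str(score), strand, ".", attrs]
    return "\t".join(columns)


# Hypothetical hit: model "Pkinase" (model positions 1-120) on Chr1:1201-1560
line = gff3_line("Chr1", "hmmscan", "protein_match", 1201, 1560, 45.3, "+",
                 {"ID": "hit1", "Name": "Pkinase", "Target": "Pkinase 1 120"})
```

Keeping the model name in Name and Target leaves columns 4 and 5 free to hold the DNA coordinates.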

Scott


On Wed, Aug 11, 2010 at 4:11 PM, Doug <douglas.hoen at gmail.com> wrote:
> One possible answer to my own question: use
> Bio::SeqFeature::PositionProxy objects? Would this work?


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From douglas.hoen at gmail.com  Wed Aug 11 16:38:54 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 13:38:54 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com> 
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
Message-ID: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>

Hi Scott,

Good idea. Would you happen to know of an existing HMMER3 to GFF3
converter?

Thanks for your advice,
-- Doug

On Aug 11, 4:16 pm, Scott Cain <sc... at scottcain.net> wrote:
> Hi Doug,
>
> I don't know if any of the things you've thought of would work; I've
> never tried it.  My inclination would be to express your data in GFF3
> and use the standard loader.
>
> Scott


From douglas.hoen at gmail.com  Wed Aug 11 16:53:35 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 13:53:35 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com> 
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com> 
	<6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
Message-ID: <a9d5aca2-3c28-49e8-bd76-119309c38c05@x21g2000yqa.googlegroups.com>

One more note: I did try using PositionProxy but it failed. It doesn't
implement seq_id() and so can't be stored in the DB:

------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::SeqFeatureI::seq_id" is not implemented by
package Bio::SeqFeature::PositionProxy.
This is not your fault - author of Bio::SeqFeature::PositionProxy
should be blamed!

...


On Aug 11, 4:38 pm, Doug <douglas.h... at gmail.com> wrote:
> Hi Scott,
>
> Good idea. Would you happen to know of an existing HMMER3 to GFF3
> converter?
>
> Thanks for your advice,
> -- Doug


From cjfields at illinois.edu  Wed Aug 11 16:45:00 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 15:45:00 -0500
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
	<6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
Message-ID: <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>

HMMER3 is parsed by Bio::SearchIO now in bioperl-live, and I think there is a generic SearchIO->GFF3 script floating around the intertubes somewheres...

chris

On Aug 11, 2010, at 3:38 PM, Doug wrote:

> Hi Scott,
> 
> Good idea. Would you happen to know of an existing HMMER3 to GFF3
> converter?
> 
> Thanks for your advice,
> -- Doug


From scott at scottcain.net  Wed Aug 11 17:05:25 2010
From: scott at scottcain.net (Scott Cain)
Date: Wed, 11 Aug 2010 17:05:25 -0400
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
	<6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
	<190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
Message-ID: <AANLkTimY09-wo9R_ZbPmSG_9x7TZjVobTM95VO5fgCa4@mail.gmail.com>

Um, yeah, it's in bioperl: bp_search2gff.pl.

Scott


On Wed, Aug 11, 2010 at 4:45 PM, Chris Fields <cjfields at illinois.edu> wrote:
> HMMER3 is parsed by Bio::SearchIO now in bioperl-live, and I think there is a generic SearchIO->GFF3 script floating around the intertubes somewheres...
>
> chris


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Wed Aug 11 17:07:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 16:07:20 -0500
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <AANLkTimY09-wo9R_ZbPmSG_9x7TZjVobTM95VO5fgCa4@mail.gmail.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
	<6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
	<190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
	<AANLkTimY09-wo9R_ZbPmSG_9x7TZjVobTM95VO5fgCa4@mail.gmail.com>
Message-ID: <CCD1DE1D-867E-468D-941A-7C418C126FBE@illinois.edu>

For some reason I thought there was a more up-to-date one somewhere.  Ah well, can't keep track of all the code in bioperl :>

chris

On Aug 11, 2010, at 4:05 PM, Scott Cain wrote:

> Um, yeah, it's in bioperl: bp_search2gff.pl.
> 
> Scott


From douglas.hoen at gmail.com  Wed Aug 11 17:11:20 2010
From: douglas.hoen at gmail.com (Douglas Hoen)
Date: Wed, 11 Aug 2010 17:11:20 -0400
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <AANLkTimY09-wo9R_ZbPmSG_9x7TZjVobTM95VO5fgCa4@mail.gmail.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
	<6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
	<190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
	<AANLkTimY09-wo9R_ZbPmSG_9x7TZjVobTM95VO5fgCa4@mail.gmail.com>
Message-ID: <A8FFFBCC-4E4F-478B-B824-BB4249B11BA1@gmail.com>

Great, thanks so much for the info.

On 2010-08-11, at 5:05 PM, Scott Cain wrote:

> Um, yeah, it's in bioperl: bp_search2gff.pl.
> 
> Scott


From Russell.Smithies at agresearch.co.nz  Wed Aug 11 17:31:32 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 12 Aug 2010 09:31:32 +1200
Subject: [Bioperl-l] AlignIO  and Gbrowse_syn
In-Reply-To: <AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
	<AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>

I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague. 
If GBrowse_syn is using .maf format, does AlignIO need more work?
Any comments?

--Russell


I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output). GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce clustalw, I've tried converting LASTZ maf output to clustalw (and other alignment formats) using AlignIO; however, I run into the following issues:
* Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification).
* The coordinate system for reverse-strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse-complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-reverse-complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result, all reverse-match coordinates in the resulting clustalw output file are incorrect.
* AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them.
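
The reverse-strand coordinate issue in the second point can be illustrated with a small Python sketch. It is not AlignIO code and not a proposed patch, and the function name is hypothetical. It relies on the MAF "s" line fields as documented by UCSC: start is zero-based, size counts non-gap bases, and for strand "-" the start is measured on the reverse-complemented source of length srcSize. The output is 1-based inclusive and always on the original (forward) sequence.

```python
def maf_to_forward_coords(start, size, strand, src_size):
    """Convert MAF 's' line coordinates to 1-based inclusive coordinates
    on the forward strand of the source sequence.

    start    -- zero-based start as written in the MAF file
    size     -- number of non-gap bases in the aligned block
    strand   -- '+' or '-'
    src_size -- full length of the source sequence
    Returns (start, end, strand) with start <= end.
    """
    if size < 1 or start < 0 or start + size > src_size:
        raise ValueError("block does not fit inside the source sequence")
    if strand == "+":
        return start + 1, start + size, "+"
    if strand == "-":
        # start counts from the beginning of the reverse complement,
        # i.e. from the END of the original sequence, so flip it back.
        return src_size - (start + size) + 1, src_size - start, "-"
    raise ValueError("strand must be '+' or '-'")
```

With this mapping, a one-base block at the very start of the reverse complement of a 100 bp sequence comes out as position 100 on the minus strand, which is the behaviour described above for the last base of the original sequence.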

I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned.
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at illinois.edu  Wed Aug 11 18:02:38 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 17:02:38 -0500
Subject: [Bioperl-l] AlignIO  and Gbrowse_syn
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
	<AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
Message-ID: <E53C66C1-E4F1-4E83-B5ED-631CE62D7DCE@illinois.edu>

Russell,

We have had very few requests to support .maf until recently, which is why there has been little done with it.  We welcome any help to improve it.  

chris

On Aug 11, 2010, at 4:31 PM, Smithies, Russell wrote:

> I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague. 
> If GBrowse_syn is using .maf format, does AlignIO need more work?
> Any comments?
> 
> --Russell
> 
> 
> I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output). GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce clustalw, I've tried converting LASTZ maf output to clustalw (and other alignment formats) using AlignIO; however, I run into the following issues:
> * Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification).
> * The coordinate system for reverse-strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse-complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-reverse-complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result, all reverse-match coordinates in the resulting clustalw output file are incorrect.
> * AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them.
> 
> I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned.
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
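
The reverse-strand coordinate issue raised in the quoted message can be
sketched in plain Perl. This is only an illustration under the usual MAF
conventions ("s" lines carry src, start, size, strand, srcSize; start is
zero-based and, on the '-' strand, counted on the reverse-complemented
source). The helper name maf_to_forward_coords and the example line are
made up for the sketch; it is not AlignIO code and not a GBrowse_syn loader.

```perl
use strict;
use warnings;

# Convert one MAF "s" line to 1-based, inclusive coordinates on the
# forward strand of the source sequence.
# MAF fields: s <src> <start> <size> <strand> <srcSize> <text>
# 'start' is zero-based; on the '-' strand it counts from the start of
# the reverse-complemented source.
sub maf_to_forward_coords {
    my ($line) = @_;
    my ($tag, $src, $start, $size, $strand, $src_size) = split ' ', $line;
    die "not a MAF 's' line: $line\n" unless defined $tag && $tag eq 's';

    my ($fwd_start, $fwd_end);
    if ($strand eq '-') {
        # flip the interval back onto the original sequence
        $fwd_start = $src_size - ($start + $size) + 1;
        $fwd_end   = $src_size - $start;
    }
    else {
        $fwd_start = $start + 1;
        $fwd_end   = $start + $size;
    }
    return ($src, $fwd_start, $fwd_end, $strand);
}

# Hypothetical example line: 10 aligned bases at the very start of the
# reverse complement of a 100-base sequence.
my ($src, $s, $e, $strand) =
    maf_to_forward_coords('s chrA 0 10 - 100 ACGTACGTAC');
print "$src:$s-$e ($strand)\n";
```

Keeping the strand alongside the converted interval is the point of the
sketch: once both are known, the strand can be carried in the sequence ID
in whatever form the downstream loader expects.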


From douglas.hoen at gmail.com  Thu Aug 12 01:59:37 2010
From: douglas.hoen at gmail.com (Doug Hoen)
Date: Wed, 11 Aug 2010 22:59:37 -0700 (PDT)
Subject: [Bioperl-l] HMMER3 to GFF3
Message-ID: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com>

Hi,

 I am trying to convert HMMER3 (hmmscan) output files into GFF3 files.
Based on previous advice (see the thread, "How to store results of
searches of translated DNA in SeqFeature::Store database of the
original DNA?"), I have installed bioperl-live for its new HMMER3
parsing capabilities (in SearchIO) and am trying to use
bp_search2gff.pl to do the file conversion.

The hmmscan was done on translated chromosome sequences with conserved
domain models. I want to get the GFF 'start' and 'end' columns to be
based on these coordinates, not those of the models. To do this (with
my files), it seems I need to use the option "--type hit". However,
this changes the "Target" sequence name from the model name to
chromosome name, and the model name does not appear anywhere in the
output (see below).

Could someone please confirm whether the results are incorrect and, if
so, perhaps suggest a fix? It may well be that this problem is due to
the unusual way I am using hmmscan, rather than a problem with HMMER3
parsing...?

Many thanks,
-- Doug


========================================================


Here's what it looks like if I do *not* use the "--type hit" option.
(RVT_2 is a conserved domain name. I need this in the output.)


COMMAND:
------------------
bp_search2gff.pl -i ../chr1-tesigsv2.hmmscan -o chr1-tesigsv2-hmmscan-
original-locations-v2.gff3 --format hmmer3 --source HMMER3 --version 3
--component


OUTPUT:
------------------
==> chr1-tesigsv2-hmmscan-original-locations-v2.gff3 <==
##gff-version 3
Chr1_1	chromosome	Component	1	10142557	.	.	1	sequence=Chr1_1
Chr1_1	HMMER3	similarity	1	245	307.3	.	0	Target=Sequence:RVT_2 1898330
1898579
Chr1_1	HMMER3	similarity	1	244	329.5	.	0	Target=Sequence:RVT_2 2573551
2573796
Chr1_1	HMMER3	similarity	1	245	308.8	.	0	Target=Sequence:RVT_2 3159685
3159930
Chr1_1	HMMER3	similarity	1	102	108.2	.	0	Target=Sequence:RVT_2 3438684
3438791
Chr1_1	HMMER3	similarity	2	245	277.2	.	0	Target=Sequence:RVT_2 3566642
3566891
Chr1_1	HMMER3	similarity	13	213	251.4	.	0	Target=Sequence:RVT_2
4251160 4251373
Chr1_1	HMMER3	similarity	1	244	310.6	.	0	Target=Sequence:RVT_2 4252791
4253036
Chr1_1	HMMER3	similarity	6	99	94.2	.	0	Target=Sequence:RVT_2 4271555
4271653


========================================================


And here's what it looks like if I *do* use the "--type hit" option.
The coordinates look good but the model name has disappeared (and the
Target=Sequence seems wrong).


COMMAND:
------------------
bp_search2gff.pl -i ../chr1-tesigsv2.hmmscan -o chr1-tesigsv2-hmmscan-
original-locations-v3.gff3 --format hmmer3 --type hit --source HMMER3
--version 3 --component


OUTPUT:
------------------
==> chr1-tesigsv2-hmmscan-original-locations-v3.gff3 <==
##gff-version 3
RVT_2	HMMER3	similarity	1898330	1898579	307.3	.	0
Target=Sequence:Chr1_1 1 245
RVT_2	HMMER3	similarity	2573551	2573796	329.5	.	0
Target=Sequence:Chr1_1 1 244
RVT_2	HMMER3	similarity	3159685	3159930	308.8	.	0
Target=Sequence:Chr1_1 1 245
RVT_2	HMMER3	similarity	3438684	3438791	108.2	.	0
Target=Sequence:Chr1_1 1 102
RVT_2	HMMER3	similarity	3566642	3566891	277.2	.	0
Target=Sequence:Chr1_1 2 245
RVT_2	HMMER3	similarity	4251160	4251373	251.4	.	0
Target=Sequence:Chr1_1 13 213
RVT_2	HMMER3	similarity	4252791	4253036	310.6	.	0
Target=Sequence:Chr1_1 1 244
RVT_2	HMMER3	similarity	4271555	4271653	94.2	.	0
Target=Sequence:Chr1_1 6 99
RVT_2	HMMER3	similarity	4481232	4481477	281.5	.	0
Target=Sequence:Chr1_1 2 245


========================================================


And here's what the input HMMER3 result file looks like:


==> ../chr1-tesigsv2.hmmscan <==
# hmmscan :: search sequence(s) against a profile database
# HMMER 3.0rc1 (February 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- -
# query sequence file:             [...]/whole_chromosomes/translated/
chr1.pep
# target HMM database:             [...]/signatures/Pfam-A.hmm
# output directed to file:         chr1-tesigsv2.hmmscan
# model-specific thresholding:     TC cutoffs
# Max sensitivity mode:            on [all heuristic filters off]
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- -

Query:       Chr1_1  [L=10142557]
Description: CHROMOSOME dumped from ADB: Jun/20/09 14:53; last
updated: 2009-02-02
Scores for complete sequence (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N
Model           Description
    ------- ------ -----    ------- ------ -----   ---- --
--------        -----------
          0 3971.3  17.7   2.6e-101  329.5   0.6   19.4 17
RVT_2           Reverse transcriptase (RNA-dependent DNA pol
          0 3040.7  23.0     1e-206  678.6   0.1   12.2 10
ATHILA          ATHILA ORF-1 family
          0 1681.9  79.1    1.9e-46  149.9   0.4   28.0 21
RVT_1           Reverse transcriptase (RNA-dependent DNA pol
          0 1446.9  27.4    3.6e-95  309.1   0.2    7.6  5
Transposase_21  Transposase family tnp2
          0 1168.4  50.3    1.4e-29   94.4   0.3   21.5 18
rve             Integrase core domain
   9.1e-300  960.0  69.0    3.1e-20   64.0   0.0   28.8 20
Retrotrans_gag  Retrotransposon gag protein
   1.5e-180  577.0  31.6    1.6e-29   93.1   1.5    9.5  8
Transposase_23  TNP1/EN/SPM transposase
   4.4e-143  456.9  82.8    4.8e-18   56.4   0.1   12.9 11
MuDR            MuDR family transposase
   3.8e-116  371.4  19.6    1.2e-18   58.9   0.0   13.7  7
MULE            MULE transposase domain
   7.1e-106  344.1   5.6    2.7e-97  316.0   0.0    3.6  1
Plant_tran      Plant transposon protein
    9.2e-85  275.4  22.9    5.4e-60  194.4   0.3    6.4  3
Peptidase_C48   Ulp1 protease family, C-terminal catalytic d
    1.8e-77  249.8  24.8    4.4e-28   89.8   0.1   10.8  3
Transposase_24  Plant transposase (Ptta/En/Spm family)
    2.8e-47  150.1   1.2    5.5e-23   72.3   0.2    3.7  2
hATC            hAT family dimerisation domain
    5.7e-28   89.4   3.6    4.7e-13   41.1   0.0    6.5  1
RVP_2           Retroviral aspartyl protease
      1e-16   53.3   0.0    4.4e-07   22.1   0.0    6.8  1
RnaseH          RNase H
    1.5e-08   25.3   2.4    0.00016   12.1   0.0    4.9  0
Transposase_mut Transposase, Mutator family


Domain annotation for each model (and alignments):
>> RVT_2  Reverse transcriptase (RNA-dependent DNA polymerase)
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom
ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    -------
-------    ------- -------    ----
   1 !  307.3   0.0   5.3e-95   1.5e-94       1     245 [. 1898330
1898578 .. 1898330 1898579 .. 0.99
   2 !  329.5   0.6  8.9e-102  2.6e-101       1     244 [. 2573551
2573794 .. 2573551 2573796 .. 0.99
   3 !  308.8   0.0   1.8e-95   5.2e-95       1     245 [. 3159685
3159929 .. 3159685 3159930 .. 0.99
   4 !  108.2   0.1   3.4e-34   9.7e-34       1     102 [. 3438684
3438785 .. 3438684 3438791 .. 0.96
   5 !  277.2   0.0   8.1e-86   2.3e-85       2     245 .. 3566643
3566890 .. 3566642 3566891 .. 0.99
   6 !  251.4   0.0   6.2e-78   1.8e-77      13     213 .. 4251164
4251364 .. 4251160 4251373 .. 0.97
   7 !  310.6   0.0   5.1e-96   1.5e-95       1     244 [. 4252791
4253034 .. 4252791 4253036 .. 0.99
   8 !   94.2   0.1   6.1e-30   1.8e-29       6      99 .. 4271560
4271653 .. 4271555 4271653 .. 0.97
   9 !  281.5   0.9   3.9e-87   1.1e-86       2     245 .. 4481233
4481476 .. 4481232 4481477 .. 0.98
  10 !  248.2   0.0   5.9e-77   1.7e-76       1     190 [. 4521040
4521233 .. 4521040 4521237 .. 0.97
  11 !  314.6   0.1   3.2e-97   9.2e-97       1     244 [. 4652456
4652702 .. 4652456 4652704 .. 0.98
  12 !   40.7   0.0   1.3e-13   3.7e-13       2      92 .. 5219607
5219697 .. 5219606 5219701 .. 0.90
  13 !  221.0   0.0   1.2e-68   3.4e-68       2     245 .. 5241015
5241258 .. 5241014 5241259 .. 0.95
  14 !   81.2   0.0   5.6e-26   1.6e-25       2     115 .. 5501957
5502070 .. 5501956 5502080 .. 0.92
  15 !  272.4   0.0   2.3e-84   6.7e-84      30     245 .. 6483057
6483271 .. 6483050 6483272 .. 0.98
  16 !  178.5   0.0   1.2e-55   3.3e-55      81     244 .. 7250563
7250726 .. 7250552 7250728 .. 0.96
  17 !  313.7   0.0   5.9e-97   1.7e-96       2     245 .. 7707124
7707367 .. 7707123 7707368 .. 0.99

  Alignments for each domain:
  == domain 1    score: 307.3 bits;  conditional E-value: 5.3e-95
   RVT_2       1
nktwelvelpkgkkviglkWvfklKlnedgeierykARlVakGftqkegidyeetfspvvklesirlllalaaekkleleqlDvktaFLngelee
95
                 n tw +++lp gkk++g+kWv+k+Kln+dg++erykARlVakG+tq+eg+dy
+tfspv+kl++++ll+a+aa+k+++l+qlD+++aFLng+l+e
  Chr1_1 1898330
NGTWVVCSLPVGKKAVGCKWVYKIKLNADGSLERYKARLVAKGYTQTEGLDYVDTFSPVAKLTTVKLLIAVAAAKGWSLSQLDISNAFLNGSLDE
1898424
 
68*********************************************************************************************
PP

   RVT_2      96
evYvkqpeGfedkkk....enkvckLkkslYgLkqapraWyeklsevllklgfkkseadkclfvkkkeeeliivllYVDDlliagsskelieelk
186
                 e+Y++ p+G++ ++     +n vc+LkkslYgLkqa+r+Wy k+se l++lgf+
+s+ d++lf++k++++ ++vl+YVDD++ia+s +++ e l
  Chr1_1 1898425
EIYMTLPPGYSPRQGdsfpPNAVCRLKKSLYGLKQASRQWYLKFSESLKALGFTQSSGDHTLFTRKSKNSYMAVLVYVDDIIIASSCDRETELLR
1898519
 
***********998889999***************************************************************************
PP

   RVT_2     187
eeLkkefemkdlgelkyfLgleierkeegillsqekyvkkllkkfkmedakpvstplea 245
                 ++L+++ +++dlg+l+yfLglei+r+++gi+++q+ky+ +ll+++++  +k++s
+p+e+
  Chr1_1 1898520
DALQRSSKLRDLGTLRYFLGLEIARNTDGISICQRKYTLELLAETGLLGCKSSSVPMEP 1898578
 
*********************************************************97 PP

  == domain 2    score: 329.5 bits;  conditional E-value: 8.9e-102
   RVT_2       1
nktwelvelpkgkkviglkWvfklKlnedgeierykARlVakGftqkegidyeetfspvvklesirlllalaaekkleleqlDvktaFLngelee
95
                 n+twel++lp+g+k+ig+kWv+k K+n++ge+erykARlVakG++q++gidy+e
+f+pv++le++rl+++laa++k++++q+D k aFLng++ee
  Chr1_1 2573551
NDTWELTSLPNGHKAIGVKWVYKAKKNSKGEVERYKARLVAKGYSQRAGIDYDEVFAPVARLETVRLIISLAAQNKWKIHQMDFKLAFLNGDFEE
2573645
 
79*********************************************************************************************
PP

   RVT_2      96
evYvkqpeGfedkkkenkvckLkkslYgLkqapraWyeklsevllklgfkkseadkclfvkkkeeeliivllYVDDlliagsskelieelkeeLk
190
                 evY++qp+G+ +k++e+kv++Lkk+lYgLkqapraW++++++++++++f k+ +
+++l++k ++e+++i +lYVDDl+++g++ ++ ee+k+e++
  Chr1_1 2573646
EVYIEQPQGYIVKGEEDKVLRLKKALYGLKQAPRAWNTRIDKYFKEKDFIKCPYEHALYIKIQKEDILIACLYVDDLIFTGNNPSMFEEFKKEMT
2573740
 
***********************************************************************************************
PP

   RVT_2     191
kefemkdlgelkyfLgleierkeegillsqekyvkkllkkfkmedakpvstple 244
                 kefem+d+g ++y+Lg+e+++++++i+++qe y+k++lkkfkm+d++pv tp
+e
  Chr1_1 2573741
KEFEMTDIGLMSYYLGIEVKQEDNRIFITQEGYAKEVLKKFKMDDSNPVCTPME 2573794
 
****************************************************97 PP
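
For reference, the kind of GFF3 line the question is after (chromosome
coordinates in the start/end columns, model name kept in the Target
attribute) can be written out by hand. This is a sketch only: the helper
gff3_similarity_line is hypothetical and is not part of bp_search2gff.pl,
and the values are copied from domain 1 of the RVT_2 table above (env
coordinates), with strand and phase left as '.'.

```perl
use strict;
use warnings;

# Build one tab-separated GFF3 line for a domain hit.
# Columns: seqid, source, type, start, end, score, strand, phase, attributes.
sub gff3_similarity_line {
    my (%h) = @_;
    return join("\t",
        $h{seqid}, $h{source}, 'similarity',
        $h{start}, $h{end}, $h{score},
        '.',    # strand: not taken from the hmmscan report here
        '.',    # phase: only meaningful for CDS features
        "Target=$h{model} $h{hmm_from} $h{hmm_to}",
    );
}

print gff3_similarity_line(
    seqid => 'Chr1_1', source   => 'HMMER3',
    start => 1898330,  end      => 1898579, score  => 307.3,
    model => 'RVT_2',  hmm_from => 1,       hmm_to => 245,
), "\n";
```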


From kai.blin at biotech.uni-tuebingen.de  Thu Aug 12 08:16:45 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 12 Aug 2010 14:16:45 +0200
Subject: [Bioperl-l] HMMER3 to GFF3
In-Reply-To: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com>
References: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com>
Message-ID: <20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de>

On Wed, 11 Aug 2010 22:59:37 -0700 (PDT)
Doug Hoen <douglas.hoen at gmail.com> wrote:

Hi Doug,

> Could someone please confirm whether the results are incorrect and, if
> so, perhaps suggest a fix? It may well be that this problem is due to
> the unusual way I am using hmmscan, rather than a problem with HMMER3
> parsing...?

Can you please attach your hmmer input file? Along the way something
inserted line breaks, making it unreadable.

It may well be that the HMMER3 parser still behaves a little
differently from the HMMER2 parser; I haven't tried that script.

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From kai.blin at biotech.uni-tuebingen.de  Thu Aug 12 08:09:00 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 12 Aug 2010 14:09:00 +0200
Subject: [Bioperl-l] using HMMER
In-Reply-To: <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
	<4C62B487.9090103@gmail.com>
	<62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
Message-ID: <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>

On Wed, 11 Aug 2010 10:07:36 -0500
Chris Fields <cjfields at illinois.edu> wrote:

> might also want to check whether you are using hmmer2 vs hmmer3.  not sure if the wrapper works for hmmer3.

It might if you initialize it using
my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 'hmmer3');

at least for the programs that still exist with the same name in
hmmer3. It won't support hmmer3 using the default options, though.

If I have some spare time, I'll look into this, no promises on the
timeframe, though.

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From cjfields at illinois.edu  Thu Aug 12 11:28:50 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 12 Aug 2010 10:28:50 -0500
Subject: [Bioperl-l] using HMMER
In-Reply-To: <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
	<4C62B487.9090103@gmail.com>
	<62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
	<20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>
Message-ID: <8129B813-5B15-4DDC-AB0D-5D95EFFCE78D@illinois.edu>

On Aug 12, 2010, at 7:09 AM, Kai Blin wrote:

> On Wed, 11 Aug 2010 10:07:36 -0500
> Chris Fields <cjfields at illinois.edu> wrote:
> 
>> might also want to check whether you are using hmmer2 vs hmmer3.  not sure if the wrapper works for hmmer3.
> 
> It might if you initialize it using
> my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 'hmmer3');
> 
> at least for the programs that still exist with the same name in
> hmmer3. It won't support hmmer3 using the default options, though.
> 
> If I have some spare time, I'll look into this, no promises on the
> timeframe, though.
> 
> Cheers,
> Kai
> 
> -- 
> Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
> D-72076 Tübingen                        Fax :   ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

Would be nice to convert this over (at some point) to use Mark's CommandExts.  I'm thinking of doing this with Infernal, so if I get that running it wouldn't be terribly difficult to get hmmer3 working as well.

chris


From cjfields at illinois.edu  Thu Aug 12 12:14:44 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 12 Aug 2010 11:14:44 -0500
Subject: [Bioperl-l] using HMMER
In-Reply-To: <857996.8184.qm@web112610.mail.gq1.yahoo.com>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
	<4C62B487.9090103@gmail.com>
	<62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
	<20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>
	<8129B813-5B15-4DDC-AB0D-5D95EFFCE78D@illinois.edu>
	<857996.8184.qm@web112610.mail.gq1.yahoo.com>
Message-ID: <43FD0A31-DB95-4AE9-B678-937EE6346BC2@illinois.edu>

Fayroz,

Please keep responses on-list.

It seems you need to update your local bioperl, as 'hmmer3' is a recent addition, after 1.6.1.  It will be in 1.6.2 if I can get the time to make a release :>

chris
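
A quick way to confirm whether a given BioPerl install has the hmmer3
parser is to try loading it. The sketch below uses only core Perl; the
helper name module_available is made up for illustration, and the check
says nothing about whether the HMMER binaries themselves can be found.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded from @INC, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module ('Bio::SearchIO::hmmer', 'Bio::SearchIO::hmmer3') {
    printf "%-24s %s\n", $module,
        module_available($module) ? 'found' : 'not installed';
}
```

If Bio::SearchIO::hmmer3 is reported as not installed, that matches the
"Can't locate Bio\SearchIO\hmmer3.pm in @INC" message quoted below.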

On Aug 12, 2010, at 10:58 AM, fayroz wrote:

> Dear Chris,
> In the HMMER documentation I found this statement:
> "The HMMER programs must either be in your path, or you must set the environment
> variable HMMERDIR to point to their location." 
> Will this solve the problem?
> How can I do it, please? I work on the Windows 7 platform.
> 
> 
> When I applied this line with hmmer3:
> my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 'hmmer3');
> 
> this output appears: 
> 
> Bio::SearchIO: hmmer3 cannot be found
> 
> and when I try with hmmer2, the same output appears: 
> 
> Exception
> ------------- EXCEPTION -------------
> MSG: Failed to load module Bio::SearchIO::hmmer3. Can't locate 
> Bio\SearchIO\hmmer3.pm in @INC (@INC contains: D:\Perl\bin\ D:/Perl/site/lib 
> D:/Perl/lib .) at D:/Perl/site/lib/Bio/Root/Root.pm line 439, <GEN0> line 1.
> STACK Bio::Root::Root::_load_module D:/Perl/site/lib/Bio/Root/Root.pm:441
> STACK (eval) D:/Perl/site/lib/Bio/SearchIO.pm:446
> STACK Bio::SearchIO::_load_format_module D:/Perl/site/lib/Bio/SearchIO.pm:445
> STACK Bio::SearchIO::new D:/Perl/site/lib/Bio/SearchIO.pm:189
> STACK Bio::Tools::Run::Hmmer::_run D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:431
> STACK Bio::Tools::Run::Hmmer::hmmsearch 
> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:353
> STACK toplevel C:\Users\Khaled\AppData\Local\Temp\dzprltmp.pl:13
> -------------------------------------
> For more information about the SearchIO system please see the SearchIO docs.
> This includes ways of checking for formats at compile time, not run time
> '--informat' is not recognized as an internal or external command,
> operable program or batch file.
> Can't call method "next_result" on an undefined value at 
> C:\Users\Khaled\AppData\Local\Temp\dzprltmp.pl line 15, <GEN0> line 1.
> 
> 
> 


From jason at bioperl.org  Thu Aug 12 14:37:11 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 12 Aug 2010 11:37:11 -0700
Subject: [Bioperl-l] Other: Script for editing alignments?
In-Reply-To: <20100812061811.4D92468539@evol.biology.mcmaster.ca>
References: <20100812061811.4D92468539@evol.biology.mcmaster.ca>
Message-ID: <4C643F57.3040408@bioperl.org>

Hi Si -

This is pretty straightforward with Bioperl. Here's one solution:

#!/usr/bin/perl -w
use strict;
use Bio::AlignIO;

my $in  = Bio::AlignIO->new(-format => 'fasta', -file => shift @ARGV);
my $out = Bio::AlignIO->new(-format => 'fasta');

while ( my $aln = $in->next_aln ) {
  for my $seq ( $aln->each_seq ) {
    my $str = $seq->seq;
    if ( $str =~ /^(-+)/ ) {
      # replace leading gaps at the 5' end
      my $rep = length($1);
      substr($str, 0, $rep, 'N' x $rep);
    }
    if ( $str =~ /(-+)$/ ) {
      # replace trailing gaps at the 3' end
      my $rep = length($1);
      substr($str, -$rep, $rep, 'N' x $rep);
    }
    $seq->seq($str);
  }
  # don't print the /start-end info in the FASTA ID
  $aln->set_displayname_flat(1);
  $out->write_aln($aln);
}

-jason
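
The same replacement can also be sketched without BioPerl, as two anchored
substitutions on a sequence string. This assumes each sequence has already
been joined into a single string (multi-line FASTA records would need to
be concatenated first); the helper name mask_terminal_gaps is made up for
the example, and the test strings are the two taxa from the original
question.

```perl
use strict;
use warnings;

# Replace leading and trailing gap runs with the same number of Ns,
# leaving internal gaps untouched.
sub mask_terminal_gaps {
    my ($str) = @_;
    $str =~ s/^(-+)/'N' x length($1)/e;   # 5' end
    $str =~ s/(-+)$/'N' x length($1)/e;   # 3' end
    return $str;
}

print mask_terminal_gaps('-----ATGCTG--TGACTG----TGACT---'), "\n";
print mask_terminal_gaps('---GTATGTTG--TGACTGCT--TGACCGTC'), "\n";
```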

evoldir at evol.biology.mcmaster.ca wrote, On 8/11/10 11:18 PM:
> Dear All
>
> Alignment programs like MUSCLE and Clustal often output alignments with
> "-" symbols indicating indels (real events) within sequence alignments,
> but also "-" symbols at the 5' and 3' ends of sequences. The latter
> however, are not real evolutionary events and really should be Ns
> (missing data), depending on the sort of analytical framework you use.
>
> If there is sufficient heterogeneity and signal within the 5' and 3'
> ends of sequences, the "-"s can be manually edited in a text editor to
> Ns with no problem, if the alignment is small. If it is large (e.g. 2000
> seqs), or there are lots of alignments, it becomes a lengthy task.
>
> I'm investigating such alignments presently and so was wondering if
> anyone had a clever way of implementing sed, or had a Perl script that
> would perform such a task. Simply put, it would require replacing the 5'
> and 3' "-" below only with Ns and leaving the within sequence "-"s
> alone. The sequences naturally may span more than one line.
>
>   >Taxon 1
> -----ATGCTG--TGACTG----TGACT---
>   >Taxon 2
> ---GTATGTTG--TGACTGCT--TGACCGTC
>
> to
>
>   >Taxon 1
> NNNNNATGCTG--TGACTG----TGACTNNN
>   >Taxon 2
> NNNGTATGTTG--TGACTGCT--TGACCGTC
>
> It's a simple task, but I haven't seen any scripts out there to do the job.
>
> If there are any scripters out there who can help, or if someone knows
> of an application that would help, it would be great to hear from you.
>
> With best wishes and thanks
>
> Si Creer
>
>    


From genehack at genehack.org  Thu Aug 12 20:32:07 2010
From: genehack at genehack.org (John SJ Anderson)
Date: Thu, 12 Aug 2010 20:32:07 -0400
Subject: [Bioperl-l]
	Bio::SeqFeature::SimilarityPair->from_searchResult()?
In-Reply-To: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com>
References: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com>
Message-ID: <ABCC813F-9FF8-465E-B5AF-E95BD8291D95@genehack.org>


On Aug 10, 2010, at 21:54 , Douglas Hoen wrote:

> I was wondering why the Synopsis in the docs for Bio::SeqFeature::SimilarityPair has the following:
> $sim_pair = Bio::SeqFeature::SimilarityPair->from_searchResult($blastHit);
> 
> There doesn't actually seem to be a from_searchResult method. Am I missing something?

No, it looks like that method got removed back in 2002 as a part of moving to Bio::SearchIO (which was removed still later...):

  <http://github.com/bioperl/bioperl-live/commit/5e3bdc11eb0ceffcd8e8966299a6367e792f2fd1>

Unfortunately, the commit didn't update the documentation. From the tiny little bit I've looked at the code, it looks like you should just be calling the 'new()' method instead (note that it takes a set of arguments, not just a BLAST hit object).

Hope this helps -- if you should happen to have the tuits, a patch to update the documentation to reflect the current interface would be awesome...

chrs,
john.


From david.breimann at gmail.com  Fri Aug 13 09:01:10 2010
From: david.breimann at gmail.com (David Breimann)
Date: Fri, 13 Aug 2010 16:01:10 +0300
Subject: [Bioperl-l] Problem executing bp_genbank2gff3.pl from another perl
	script
Message-ID: <AANLkTikqTXynSe4dTqw1Tz5GOOyoDOZTC5C-HJWLKfaL@mail.gmail.com>

Hi,
I am trying to run bp_genbank2gff3.pl from another Perl script that
takes a GenBank file as its argument.

This does not work  (no output files are generated):
    my $command = "bp_genbank2gff3.pl -y -o /tmp $ARGV[0]";

    open( my $command_out, "-|", $command );
    close $command_out;

but this does

    open( my $command_out, "-|", $command );
    sleep 3; # why do I need to sleep?
    close $command_out;

Why?

I thought that close is supposed to block until the command is done:

Closing any piped filehandle causes the parent process to wait for the
child to finish... (see http://perldoc.perl.org/functions/open.html).

Thanks
Dave
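
One possible explanation, offered as a sketch rather than a diagnosis:
close() on a piped handle closes the parent's read end first and only then
waits for the child. If nothing has read from the pipe, a child that
prints to STDOUT can be killed by SIGPIPE on its first write, before it
has written its output files; the sleep simply gives it time to finish.
Reading the child's output to end-of-file before closing avoids that
race. The helper name run_and_capture is made up, and the child here is
the running perl itself as a stand-in for bp_genbank2gff3.pl.

```perl
use strict;
use warnings;

# Run a command, read everything it prints, then close the pipe.
# Returns the captured output and the child's exit code.
sub run_and_capture {
    my (@cmd) = @_;
    open(my $fh, '-|', @cmd) or die "cannot run $cmd[0]: $!";
    local $/;                      # slurp mode: read to end-of-file
    my $output = <$fh>;
    $output = '' unless defined $output;
    close($fh);                    # the child has finished writing by now
    return ($output, $? >> 8);
}

# Stand-in child process printing two lines.
my ($out, $exit) = run_and_capture($^X, '-e', 'print "one\ntwo\n"');
print "exit code $exit, output:\n$out";
```

When the child's output is not needed at all, calling system() with the
same argument list is simpler and also waits for the command to finish.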


From jun.yin at ucd.ie  Fri Aug 13 09:36:34 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 13 Aug 2010 14:36:34 +0100
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
Message-ID: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>

Hi all,

I am the Google Summer of Code student working on refactoring the
Bio::Align subsystem. The code (Bio::SimpleAlign) I re-implemented now
passes nearly all the tests, except for a few on seq/start-end checking.
But here comes a problem. This may be an old issue: the Bio::LocatableSeq
end assignment and end checking are inconsistent.

The current end checking method is based on:

$end=$seq->_ungapped_len+$seq->start-1

However, this check may not fit real-world cases.

The inconsistency usually happens when a few columns of the sequence are
removed.

For example:

my $a = Bio::LocatableSeq->new(
    -id    => 'a',
    -strand => 1,
    -seq   => '-tcgatc-atcgatcg',
    -start => 30,
    -end   => 43
);

If we remove the 1st, 8th and the last columns

$a->seq() will be 'tcgatcatcgatc'
$a->_ungapped_len==12

Actually, in the real world, the first residue will still be 30 (the old
$seq->start), and the last residue is the residue before 43 (the old
$seq->end), thus 42.

But if you call a validation, the calculation is
$a->_ungapped_len+$a->start-1=12+30-1=41

So the reassignment of the $seq->end will not pass the validation.
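
The arithmetic behind this check can be reproduced with a few lines of
plain Perl. This is a sketch, not Bio::LocatableSeq code: implied_end is a
made-up helper, and it treats only '-' and '.' as gap characters. Counting
the example string as written gives 13 residues after the three columns
are removed, so for that exact string the formula yields 42. The third
call removes an internal residue column (column 4) instead, which is the
kind of edit where the formula and the original coordinates disagree: the
last residue is still the old position 43, but the formula gives 42.

```perl
use strict;
use warnings;

# End coordinate implied by the check: ungapped length + start - 1.
# Only '-' and '.' are treated as gap characters here (an assumption).
sub implied_end {
    my ($seq, $start) = @_;
    my $ungapped = length($seq) - ($seq =~ tr/-.//);
    return $ungapped + $start - 1;
}

printf "original:            %d\n", implied_end('-tcgatc-atcgatcg', 30);
printf "cols 1, 8, 16 cut:   %d\n", implied_end('tcgatcatcgatc',    30);
printf "col 4 (residue) cut: %d\n", implied_end('-tcatc-atcgatcg',  30);
```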

 
So unless you save the information to a new sequence object, the original
position information will be lost anyway. But in some cases, we have to
change the sequence in its original sequence object.

What is your suggestion on this issue?

A. Pass the test and lose the information      # convenient in coding, but the
start-end annotation is no longer right

B. Keep the information and forget the test   # the object will still
remember where the last residue was in the original sequence. But is that
really meaningful at all? All the other residues may come from
nowhere

C. Neither of the above # any other suggestions?

Cheers,

Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

 
From jessica.sun at gmail.com  Fri Aug 13 11:06:46 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 13 Aug 2010 11:06:46 -0400
Subject: [Bioperl-l] Add sequence feature
Message-ID: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>

Does anyone know how to open a GenBank file, add a new feature, and then
save a new GenBank file with the new feature added, in BioPerl?

thx

-- 
Jessica Jingping Sun


From jessica.sun at gmail.com  Fri Aug 13 11:27:10 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 13 Aug 2010 11:27:10 -0400
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <4C6562E0.7090008@gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
Message-ID: <AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>

Unfortunately, I want to add the feature to the sequence object I got from
the GenBank file. I do not mind saving a new GenBank file, but the new
file should contain the original GenBank format and information plus the
new feature tags I need to add. Any quick solution to this?

thx

Jessica


On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri <roy.chaudhuri at gmail.com>wrote:

> Hi Jessica.
>
> You need to use Bio::SeqIO to read in the GenBank file to a BioPerl
> sequence object, and to write your new GenBank file:
> http://www.bioperl.org/wiki/HOWTO:SeqIO
>
> To add a new feature follow the instructions here:
>
> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences
>
> (except that you are adding the feature to the sequence object you got from
> the Genbank file, not a new Bio::Seq object).
>
> Cheers.
> Roy.
>
>
> On 13/08/2010 16:06, Jessica Sun wrote:
>
>> Does anyone knows how to open a genbank file, add new feature and then
>> save
>> a new genbank
>> file with new feature added in bioperl ?
>>
>> thx
>>
>>
>


-- 
Jessica Jingping Sun


From roy.chaudhuri at gmail.com  Fri Aug 13 11:21:04 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 13 Aug 2010 16:21:04 +0100
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
Message-ID: <4C6562E0.7090008@gmail.com>

Hi Jessica.

You need to use Bio::SeqIO to read in the GenBank file to a BioPerl 
sequence object, and to write your new GenBank file:
http://www.bioperl.org/wiki/HOWTO:SeqIO

To add a new feature follow the instructions here:
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences

(except that you are adding the feature to the sequence object you got 
from the Genbank file, not a new Bio::Seq object).

Cheers.
Roy.
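
The steps above can be sketched roughly as follows. This assumes BioPerl
is installed; the helper names add_note_feature and annotate_genbank, the
coordinates, the 'misc_feature' type and the note text are all
placeholders chosen for illustration. Features and annotation already
present on the sequence object are left in place, so they should appear
in the output file along with the new feature.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;

# Add one feature to an existing sequence object; existing features
# and annotation on $seq are not touched.
sub add_note_feature {
    my ($seq, $start, $end, $note) = @_;
    my $feat = Bio::SeqFeature::Generic->new(
        -start       => $start,
        -end         => $end,
        -strand      => 1,
        -primary_tag => 'misc_feature',
        -tag         => { note => $note },
    );
    $seq->add_SeqFeature($feat);
    return $feat;
}

# Read every record from one GenBank file, add a feature, write to another.
sub annotate_genbank {
    my ($in_file, $out_file) = @_;
    my $in  = Bio::SeqIO->new(-file => $in_file,     -format => 'genbank');
    my $out = Bio::SeqIO->new(-file => ">$out_file", -format => 'genbank');
    while (my $seq = $in->next_seq) {
        add_note_feature($seq, 10, 50, 'added by script');
        $out->write_seq($seq);
    }
}

# Usage: perl add_feature.pl input.gb output.gb  (file names are placeholders)
annotate_genbank(@ARGV[0, 1]) if @ARGV == 2;
```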

On 13/08/2010 16:06, Jessica Sun wrote:
> Does anyone knows how to open a genbank file, add new feature and then save
> a new genbank
> file with new feature added in bioperl ?
>
> thx
>


From roy.chaudhuri at gmail.com  Fri Aug 13 11:37:20 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 13 Aug 2010 16:37:20 +0100
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
Message-ID: <4C6566B0.60706@gmail.com>

I'm not sure I understand, do you mean that you want to load just the 
sequence from the GenBank file (ignoring the existing annotation), then 
add your own features? There are instructions on how to do that here:
http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder

On 13/08/2010 16:27, Jessica Sun wrote:
> Unfortunately, I want to add the feature to the sequence object I got
> from the GenBank file. I do not mind saving a new GenBank file, but
> the new file should contain the original GenBank format and info
> plus the new feature tags I need to add. Any quick solution to
> this?
>
> thx
>
> Jessica


From roy.chaudhuri at gmail.com  Fri Aug 13 11:57:27 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 13 Aug 2010 16:57:27 +0100
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>	<4C6562E0.7090008@gmail.com>	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
Message-ID: <4C656B67.5020402@gmail.com>

Please remember to copy replies to the mailing list.

You can loop over the features in your Bio::Seq object:
for my $feat ($seq->get_SeqFeatures) {
    # do something with $feat
}

And once you have found the feature you want to modify, you can add a 
tag using something like:
$feat->add_tag_value('note',"this is a note");

When you're finished you can write out the modified sequence object to a 
new GenBank file.
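
Put together, the whole thing might look like the sketch below. The file names and the choice to tag only 'CDS' features are assumptions made for the example; pick whatever test identifies the features you want to modify.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Placeholder file names; substitute your own.
my ($infile, $outfile) = ('input.gb', 'output.gb');

my $in  = Bio::SeqIO->new(-file => $infile,     -format => 'genbank');
my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'genbank');

while (my $seq = $in->next_seq) {
    for my $feat ($seq->get_SeqFeatures) {
        # Illustrative filter: only touch CDS features.
        next unless $feat->primary_tag eq 'CDS';

        # Adds a /note qualifier to the existing feature in place.
        $feat->add_tag_value('note', 'this is a note');
    }

    # The modified features are still attached to $seq, so writing
    # $seq out preserves the original record plus the new tags.
    $out->write_seq($seq);
}
```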

On 13/08/2010 16:40, Jessica Sun wrote:
> No, I want to load the GenBank file with its existing features, add
> some new feature tags to the existing ones, and then save an updated
> GenBank file for local use. I am just not sure how to easily merge
> the two steps you recommended into one in a neat way.
>
> thx


From jessica.sun at gmail.com  Fri Aug 13 13:06:32 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 13 Aug 2010 13:06:32 -0400
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <4C656B67.5020402@gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
	<4C656B67.5020402@gmail.com>
Message-ID: <AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>

Thanks. I somehow get these error messages.

--------------------- WARNING ---------------------
MSG:  Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module.
Attempting to dump, but may fail!
---------------------------------------------------
Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
/Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, <GEN0> line 447.

by doing this,

my $feat = new Bio::SeqFeature::Generic(-start       => 20,
                                        -end         => $40,
                                        -primary_tag => 'newfeature' );
$feat->add_tag_value("note","this is notes");
$f->add_SeqFeature($feat); ## f is original feature pointer
$io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" );

$io->write_seq($seqio_object);



-- 
Jessica Jingping Sun


From drummike at gmail.com  Fri Aug 13 13:41:55 2010
From: drummike at gmail.com (Mike Williams)
Date: Fri, 13 Aug 2010 13:41:55 -0400
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
	<4C656B67.5020402@gmail.com>
	<AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
Message-ID: <AANLkTi=SuCgDmDZ1qQW0-mUQJxigteO4GPnSQD09oB90@mail.gmail.com>

On Fri, Aug 13, 2010 at 1:06 PM, Jessica Sun <jessica.sun at gmail.com> wrote:

> Thanks. I somehow get these error messages.
> by doing this,
>
> my $feat = new Bio::SeqFeature::Generic(-start       => 20,
>                                         -end         => $40,
>                                         -primary_tag => 'newfeature' );
> $feat->add_tag_value("note","this is notes");
>

That $40 looks fishy.  Try deleting the dollar sign.  You did mean just 40,
right?
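
For anyone wondering why the dollar sign matters: in Perl, $40 is the 40th regex capture variable rather than the number 40, so it is normally undefined. A small standalone sketch (the variable names are made up for the example):

```perl
use strict;
use warnings;

# $40 is Perl's 40th regex capture variable, not the number 40.
# Without a prior successful match that had 40 capture groups,
# it holds undef.
my $wrong = $40;
my $right = 40;

print defined $wrong ? "wrong is defined\n" : "wrong is undef\n";
print "right is $right\n";
```

So a feature built with -end => $40 would be given an undefined end coordinate.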

Mike


From MEC at stowers.org  Fri Aug 13 13:37:50 2010
From: MEC at stowers.org (Cook, Malcolm)
Date: Fri, 13 Aug 2010 12:37:50 -0500
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
	<4C656B67.5020402@gmail.com>
	<AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
Message-ID: <BD62CBAC4395B94096109020651BE2EC1312232E24@EXCHMB-02.stowers-institute.org>

Jessica,

Show more code!

In particular, where did $f get set?

--Malcolm

 
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From Kevin.M.Brown at asu.edu  Fri Aug 13 13:53:50 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 13 Aug 2010 10:53:50 -0700
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com><4C6562E0.7090008@gmail.com><AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com><4C6566B0.60706@gmail.com><AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com><4C656B67.5020402@gmail.com>
	<AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu>

If I'm reading your sample code correctly, then you are mistakenly
trying to output the input SeqIO object and not the actual Bio::Seq
object that was read in by SeqIO.

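my $seqio = Bio::SeqIO->new(-file => $infile, -format => 'genbank');  # $infile: your input path
my $seq   = $seqio->next_seq;

# manipulate $seq

my $out = Bio::SeqIO->new(-file => ">$newoutfile", -format => 'genbank');
$out->write_seq($seq);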
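
One way to catch this kind of mix-up early is to check the object before writing it. The sketch below is only an illustration: write_checked is a hypothetical helper, not part of BioPerl. It relies on the fact that sequence objects implement Bio::SeqI (the interface named in the warning above), while a SeqIO stream does not.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: refuse to write anything that is not a
# Bio::SeqI sequence object (for example, a Bio::SeqIO stream).
sub write_checked {
    my ($out, $obj) = @_;
    die "expected a Bio::SeqI object, got "
        . (ref($obj) || 'a plain scalar') . "\n"
        unless blessed($obj) && $obj->isa('Bio::SeqI');
    return $out->write_seq($obj);
}

# Usage, with $out and $seq created as in the lines above:
#   write_checked($out, $seq);     # writes the record
#   write_checked($out, $seqio);   # dies with a readable message
```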



From jessica.sun at gmail.com  Fri Aug 13 15:16:51 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 13 Aug 2010 15:16:51 -0400
Subject: [Bioperl-l] Fwd:  Add sequence feature
In-Reply-To: <AANLkTim6MBPBbRr2bEkCgCL+6NMXGqJ0wWoz3-JPRKyG@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
	<4C656B67.5020402@gmail.com>
	<AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
	<1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu>
	<AANLkTim6MBPBbRr2bEkCgCL+6NMXGqJ0wWoz3-JPRKyG@mail.gmail.com>
Message-ID: <AANLkTimFO1Yn-n7vqmmvAF5smQeGadEW_fs_a0U-7ej4@mail.gmail.com>

---------- Forwarded message ----------
From: Jessica Sun <jessica.sun at gmail.com>
Date: Fri, Aug 13, 2010 at 3:16 PM
Subject: Re: [Bioperl-l] Add sequence feature
To: Kevin Brown <Kevin.M.Brown at asu.edu>


Yes, I changed that, but somehow the output still does not include the added features.




-- 
Jessica Jingping Sun


From MEC at stowers.org  Fri Aug 13 15:56:09 2010
From: MEC at stowers.org (Cook, Malcolm)
Date: Fri, 13 Aug 2010 14:56:09 -0500
Subject: [Bioperl-l] Fwd:  Add sequence feature
In-Reply-To: <AANLkTimFO1Yn-n7vqmmvAF5smQeGadEW_fs_a0U-7ej4@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
	<4C656B67.5020402@gmail.com>
	<AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
	<1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu>
	<AANLkTim6MBPBbRr2bEkCgCL+6NMXGqJ0wWoz3-JPRKyG@mail.gmail.com>
	<AANLkTimFO1Yn-n7vqmmvAF5smQeGadEW_fs_a0U-7ej4@mail.gmail.com>
Message-ID: <BD62CBAC4395B94096109020651BE2EC1312232E46@EXCHMB-02.stowers-institute.org>

If you show all your code, we might not have to guess at what the problem is.
 

Malcolm Cook
Stowers Institute for Medical Research -  Bioinformatics
Kansas City, Missouri  USA
 

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun
Sent: Friday, August 13, 2010 2:17 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Fwd: Add sequence feature

---------- Forwarded message ----------
From: Jessica Sun <jessica.sun at gmail.com>
Date: Fri, Aug 13, 2010 at 3:16 PM
Subject: Re: [Bioperl-l] Add sequence feature
To: Kevin Brown <Kevin.M.Brown at asu.edu>


yes, I change that, somehow it still did not take the added features in.


On Fri, Aug 13, 2010 at 1:53 PM, Kevin Brown <Kevin.M.Brown at asu.edu> wrote:

> If I'm reading your sample code correctly, then you are mistakenly 
> trying to output the input SeqIO object and not the actual Bio::Seq 
> object that was read in by SeqIO.
>
> My $seqio = Bio::SeqIO->new;
> My $seq = $seqio->next_seq;
>
> #manipulate $seq
>
> My $out = Bio::SeqIO->new;
> $out->write_seq($seq);
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun
> Sent: Friday, August 13, 2010 10:07 AM
> To: Roy Chaudhuri
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Add sequence feature
>
> Thanks. I somehow get these error messages.
>
> --------------------- WARNING ---------------------
> MSG:  Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module.
> Attempting to dump, but may fail!
> ---------------------------------------------------
> Can't locate object method "seq" via package "Bio::SeqIO::genbank" at 
> /Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, <GEN0> line 447.
>
> by doing this,
>
> my $feat = new Bio::SeqFeature::Generic(-start                 =>20,
>                                        -end         => $40,
>                                        -primary_tag => 'newfeature' );
>                                    $feat->add_tag_value("note","this 
> is notes");  $f->add_SeqFeature($feat); ## f is original feature 
> pointer $io = Bio::SeqIO->new(-format => "genbank", -file => 
> ">$newoutfile" );
>
>    $io->write_seq($seqio_object);
>
> On Fri, Aug 13, 2010 at 11:57 AM, Roy Chaudhuri
> <roy.chaudhuri at gmail.com>wrote:
>
> > Please remember to copy replies to the mailing list.
> >
> > You can loop over the features in your Bio::Seq object:
> > for my $feat ($seq->get_SeqFeatures) { # do something }
> >
> > And once you have found the feature you want to modify, you can add 
> > a
> tag
> > using something like:
> > $feat->add_tag_value('note',"this is a note");
> >
> > When you're finished you can write out the modified sequence object 
> > to
> a
> > new GenBank file.
> >
> >
> > On 13/08/2010 16:40, Jessica Sun wrote:
> >
> >> no i want to load the genbank file with existing features and I 
> >> need
> to
> >> add some new feature tags to the existing ones and then save to a 
> >> new update genbank file for local usage. I just not quite good on 
> >> how to easily merge the two steps you recommended into one in a neat way.
> >>
> >> thx
> >>
> >>
> >> On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri
> <roy.chaudhuri at gmail.com
> >> <mailto:roy.chaudhuri at gmail.com>> wrote:
> >>
> >>    I'm not sure I understand, do you mean that you want to load just
> >>    the sequence from the GenBank file (ignoring the existing
> >>    annotation), then add your own features? There are instructions on
> >>    how to do that here:
> >>
> http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder
> >>
> >>
> >>    On 13/08/2010 16:27, Jessica Sun wrote:
> >>
> >>        unfortunately. I want to add the feature to the sequence
> object
> >>        I got
> >>        from the Genbank file, I do not mind to save a new genbank
> file but
> >>        these new genbank file contains the original genbank format
> and
> >>        info I
> >>        got plus the new feature tags I need to added to. Any quick
> >>        solution to
> >>        this?
> >>
> >>        thx
> >>
> >>        Jessica
> >>
> >>
> >>
> >>        On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri
> >>        <roy.chaudhuri at gmail.com <mailto:roy.chaudhuri at gmail.com>
> >>        <mailto:roy.chaudhuri at gmail.com
> >>        <mailto:roy.chaudhuri at gmail.com>>> wrote:
> >>
> >>            Hi Jessica.
> >>
> >>            You need to use Bio::SeqIO to read in the GenBank file 
> >> to
> a
> >>        BioPerl
> >>            sequence object, and to write your new GenBank file:
> >>        http://www.bioperl.org/wiki/HOWTO:SeqIO
> >>
> >>            To add a new feature follow the instructions here:
> >>
> >>
> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own
> _S
> equences
> >>
> >>            (except that you are adding the feature to the sequence
> >>        object you
> >>            got from the Genbank file, not a new Bio::Seq object).
> >>
> >>            Cheers.
> >>            Roy.
> >>
> >>
> >>            On 13/08/2010 16:06, Jessica Sun wrote:
> >>
> >>                Does anyone knows how to open a genbank file, add new
> >>        feature
> >>                and then save
> >>                a new genbank
> >>                file with new feature added in bioperl ?
> >>
> >>                thx
> >>
> >>
> >>
> >>
> >>
> >>        --
> >>        Jessica Jingping Sun
> >>
> >>
> >>
> >>
> >>
> >> --
> >> Jessica Jingping Sun
> >>
> >
> >
>
>
> --
> Jessica Jingping Sun
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
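
For the archive, the steps discussed in this thread (read the GenBank file, add tags or features to the sequence object you got back, write it out again) can be put together roughly as below. This is only a minimal sketch: the 'CDS' filter, the note text, the 'newfeature' type, the coordinates and the sub name annotate_seq are placeholders I made up for illustration, and I have not run it against your file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;

# Add a note to existing features and attach one new feature to the same
# sequence object. All literal values here are illustrative placeholders.
sub annotate_seq {
    my ($seq) = @_;

    # Add a tag to the existing features of interest.
    for my $feat ( $seq->get_SeqFeatures ) {
        next unless $feat->primary_tag eq 'CDS';
        $feat->add_tag_value( 'note', 'this is a note' );
    }

    # Build a new top-level feature and add it to the sequence itself.
    my $new = Bio::SeqFeature::Generic->new(
        -start       => 10,
        -end         => 20,
        -strand      => 1,
        -primary_tag => 'newfeature',
        -tag         => { note => 'this is a new feature' },
    );
    $seq->add_SeqFeature($new);

    return $seq;
}

# Usage: perl add_note.pl in.gbk out.gbk
if ( @ARGV == 2 ) {
    my ( $infile, $outfile ) = @ARGV;
    my $in  = Bio::SeqIO->new( -file => $infile,     -format => 'genbank' );
    my $out = Bio::SeqIO->new( -file => ">$outfile", -format => 'genbank' );

    # write_seq() takes the sequence object, not the SeqIO object.
    while ( my $seq = $in->next_seq ) {
        $out->write_seq( annotate_seq($seq) );
    }
}
```

Note that the new feature is added to the sequence object rather than to another feature, so it is written out as its own entry in the feature table.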


-- 
Jessica Jingping Sun
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Mon Aug 16 14:02:15 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 16 Aug 2010 13:02:15 -0500
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
Message-ID: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>

All,

This is in reference to a bug report I filed a while back.  In the test script below, two features with the same start/end are compared.  If the features have the same seq_id(), overlaps() succeeds.  If the seq_id is changed (e.g. the feature is on another chromosome), overlaps() still succeeds.  

The question is: is this a bug?  My vote would be 'yes', but there have been various arguments to say it's not.  

chris

(maybe I'll make this a regular thing on the list, just to hash out some of the edge cases I run into periodically)

=========================================

#!/usr/bin/perl -w

use strict;
use warnings;
use Test::More;
use Bio::SeqFeature::Generic;

my ( $feat1, $feat2 );

$feat1 = Bio::SeqFeature::Generic->new(
    -start  => 40,
    -end    => 80,
    -strand => 1,
    -seq_id => 'ABC123',
);

is $feat1->start,  40,       'start of feature location';
is $feat1->end,    80,       'end of feature location';
is $feat1->seq_id, 'ABC123', 'seq_id';

$feat2 = Bio::SeqFeature::Generic->new(
    -start  => 40,
    -end    => 80,
    -strand => 1,
    -seq_id => 'ABC123',
);

is $feat2->start,  40,       'start of feature location';
is $feat2->end,    80,       'end of feature location';
is $feat2->seq_id, 'ABC123', 'seq_id';

# Generic features with same Seq ID should overlap
ok( $feat2->overlaps($feat1), 'feat2 overlaps feat1' );

# Generic features with different Seq IDs shouldn't overlap
is( $feat2->seq_id('XYZ678'), 'XYZ678', 'change seq_id' );

# this currently fails
ok( !$feat2->overlaps($feat1), 'feat2 doesn\'t overlap feat1' );

done_testing();


From David.Messina at sbc.su.se  Mon Aug 16 14:51:54 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 16 Aug 2010 20:51:54 +0200
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
Message-ID: <A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>

> The question is: is this a bug?

Hmm, tricky.

Genomic start and end positions with differing IDs shouldn't overlap, but can't SeqFeatures apply to proteins and other molecules where one would want to compare positions without regard to ID?


Dave


From cjfields at illinois.edu  Mon Aug 16 21:39:00 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 16 Aug 2010 20:39:00 -0500
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
Message-ID: <E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>

On Aug 16, 2010, at 1:51 PM, Dave Messina wrote:

>> The question is: is this a bug?
> 
> Hmm, tricky.
> 
> Genomic start and end positions with differing IDs shouldn't overlap, but can't SeqFeatures apply to proteins and other molecules where one would want to compare positions without regard to ID?
> 
> Dave

Good point; it's probably the context in which the methods are used that matters.  So, maybe just a documentation clarification?

chris


From David.Messina at sbc.su.se  Tue Aug 17 05:06:05 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 17 Aug 2010 11:06:05 +0200
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
	<E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
Message-ID: <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>

> Good point; it's probably the context the methods are used that matters.  So, maybe just a document clarification?

That's always good, but it really doesn't solve the issue you're describing.

I mean, who would expect to get overlaps for features on different chromosomes?

To me, that's a clear violation of reasonable user expectations. You shouldn't have to read the docs about something like that.

So what's the solution for these duelling use cases? I haven't thought about it much, but a first approximation might be to add a -genomic boolean flag that, when true, would do the right thing and check the ID when doing overlaps or other positional comparisons.

(Maybe -genomic is too obscure. Maybe it should be -same_id_for_overlaps or something like that.)

And maybe having to know to set a flag is effectively the same thing as having to read the docs to understand SeqFeature's overlap behavior.

What do the rest of you out there think?


Dave


From scott at scottcain.net  Tue Aug 17 08:45:27 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 17 Aug 2010 08:45:27 -0400
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
	<E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
	<83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
Message-ID: <B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>

Hi Dave and Chris,

It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison, and if somebody is doing the protein space comparison and not getting the expected results, they'll probably read the docs to find out why. 

Scott

--
Scott Cain, Ph. D.
scott at scottcain dot net
Ontario Institute for Cancer Research
http://gmod.org/
216 392 3087 

Snet from my iPhone.

On Aug 17, 2010, at 5:06 AM, Dave Messina <David.Messina at sbc.su.se> wrote:

>> Good point; it's probably the context the methods are used that matters.  So, maybe just a document clarification?
> 
> That's always good, but it really doesn't solve the issue you're describing.
> 
> I mean, who would expect to get overlaps for features on different chromosomes?
> 
> To me, that's a clear violation of reasonable user expectations. You shouldn't have to read the docs about something like that.
> 
> So what's the solution for these duelling use cases? I haven't thought about it much, but a first approximation might be to add a -genomic boolean flag that, when true, would do the right thing and check the ID when doing overlaps or other positional comparisons.
> 
> (Maybe -genomic is too obscure. Maybe it should be -same_id_for_overlaps or something like that.)
> 
> And maybe having to know to set a flag is effectively the same thing as having to read the docs to understand SeqFeature's overlap behavior.
> 
> What do the rest of you out there think?
> 
> 
> Dave
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From david.breimann at gmail.com  Tue Aug 17 09:44:08 2010
From: david.breimann at gmail.com (David Breimann)
Date: Tue, 17 Aug 2010 16:44:08 +0300
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
Message-ID: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>

Hello,

The following GenBank file has a gene that runs over the "end" of the
chromosome and into its "beginning", and the script generates an
error.

ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk

NC_005707 Unflattening error:
Details:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: PROBLEM, SEVERITY==2
Ranges not in correct order. Strange ensembl genbank entry? Range:
[207497,208369] [1,687]
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473
STACK: Bio::SeqFeature::Tools::Unflattener::problem
/usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952
STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent
/usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842
STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS
/usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713
STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq
/usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532
STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023
STACK: /usr/local/bin/bp_genbank2gff3.pl:506
-----------------------------------------------------------

Best,
Dave


From cjfields at illinois.edu  Tue Aug 17 09:51:02 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 08:51:02 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
Message-ID: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>

I think Chris Mungall has a branch set up for this in bioperl:

http://github.com/bioperl/bioperl-live/tree/circular

Is that correct?  Should we merge that code into the master branch?

chris

On Aug 17, 2010, at 8:44 AM, David Breimann wrote:

> Hello,
> 
> The following genbank has a gene that runs over the 'end" of the
> chromosome and into its "beginning", and the script generates an
> error.
> 
> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk
> 
> NC_005707 Unflattening error:
> Details:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: PROBLEM, SEVERITY==2
> Ranges not in correct order. Strange ensembl genbank entry? Range:
> [207497,208369] [1,687]
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473
> STACK: Bio::SeqFeature::Tools::Unflattener::problem
> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952
> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent
> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842
> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS
> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713
> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq
> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532
> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023
> STACK: /usr/local/bin/bp_genbank2gff3.pl:506
> -----------------------------------------------------------
> 
> Best,
> Dave
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From David.Messina at sbc.su.se  Tue Aug 17 09:52:11 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 17 Aug 2010 15:52:11 +0200
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
	<E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
	<83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
	<B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>
Message-ID: <EA0C23FB-8C2F-4C04-B0E8-4207409916DC@sbc.su.se>

> It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison

Yep, agreed.

And such a flag should be named for the non-default behavior, then, like: -ignore_IDs_for_overlaps
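
A minimal sketch of those semantics, to make the proposal concrete. It uses plain hash references instead of Bio::SeqFeature objects, and both the function name features_overlap and the ignore_ids option are hypothetical; nothing like this exists in BioPerl as far as this thread goes. Strand is ignored here.

```perl
use strict;
use warnings;

# Features are plain hash references (seq_id, start, end). This only
# illustrates the proposed behaviour; it is not BioPerl's overlaps().
sub features_overlap {
    my ( $f1, $f2, %opt ) = @_;

    # Default (genomic) behaviour: features on different seq_ids never overlap.
    unless ( $opt{ignore_ids} ) {
        return 0 if $f1->{seq_id} ne $f2->{seq_id};
    }

    # Closed intervals overlap when neither ends before the other starts.
    return ( $f1->{start} <= $f2->{end} && $f2->{start} <= $f1->{end} ) ? 1 : 0;
}

my $feat1 = { seq_id => 'ABC123', start => 40, end => 80 };
my $feat2 = { seq_id => 'XYZ678', start => 40, end => 80 };

# Default: the IDs differ, so this prints "no overlap".
print features_overlap( $feat1, $feat2 ) ? "overlap\n" : "no overlap\n";

# Opting out of the ID check: prints "overlap".
print features_overlap( $feat1, $feat2, ignore_ids => 1 ) ? "overlap\n" : "no overlap\n";
```

Naming the option after the non-default behaviour keeps the common genomic case free of any extra arguments.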


Dave


From douglas.hoen at gmail.com  Thu Aug 12 10:24:27 2010
From: douglas.hoen at gmail.com (Douglas Hoen)
Date: Thu, 12 Aug 2010 10:24:27 -0400
Subject: [Bioperl-l] HMMER3 to GFF3
In-Reply-To: <20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de>
References: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com>
	<20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de>
Message-ID: <A1AA9B70-69B9-4AA6-BB5F-FB0D0FDD0491@gmail.com>

Hi Kai,

Here it is.

Thanks,
-- Doug


-------------- next part --------------
A non-text attachment was scrubbed...
Name: chr1-tesigsv2.hmmscan
Type: application/octet-stream
Size: 676132 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100812/7818b4a4/attachment-0002.obj>
-------------- next part --------------


On 2010-08-12, at 8:16 AM, Kai Blin wrote:

> On Wed, 11 Aug 2010 22:59:37 -0700 (PDT)
> Doug Hoen <douglas.hoen at gmail.com> wrote:
> 
> Hi Doug,
> 
>> Could someone please confirm whether the results are incorrect and, if
>> so, perhaps suggest a fix? It may well be that this problem is due to
>> the unusual way I am using hmmscan, rather than a problem with HMMER3
>> parsing...?
> 
> Can you please attach your hmmer input file? Along the way something
> inserted line breaks, making it unreadable.
> 
> It might well be possible that the HMMer3 parser still handles a little
> different from the HMMer2 parser, I haven't tried that script.
> 
> Cheers,
> Kai
> 
> -- 
> Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
> D-72076 Tübingen                        Fax :   ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From CJMungall at lbl.gov  Tue Aug 17 11:53:15 2010
From: CJMungall at lbl.gov (Chris Mungall)
Date: Tue, 17 Aug 2010 08:53:15 -0700
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
	<8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
Message-ID: <D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>


You can merge this in. It should allow David to proceed.

I haven't kept up on synchrony between bioperl and GFF on circular
genomes. The fix in that branch is conservative in that it essentially
preserves the genbank coordinates even when the origin is crossed:

	http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf

However, if this is to conform to GFF3, then the resulting coordinates
that cross the origin should have start/end incremented by the genome
length.
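
To illustrate the GFF3-style adjustment, here is a minimal sketch in plain Perl. It is not what the branch does. The sub name unwrap_circular_ranges is made up, it only handles forward-strand features whose sub-ranges are listed in order, and the sequence length of 208369 is an assumption taken from the end of the first range in David's error message.

```perl
use strict;
use warnings;

# Shift every sub-range that lies past the origin by the sequence length,
# so that start <= end holds for the feature as a whole.
# Assumes forward-strand ranges listed in order, crossing the origin once.
sub unwrap_circular_ranges {
    my ( $seq_length, @ranges ) = @_;
    my $offset = 0;
    my $prev_start;
    my @shifted;
    for my $r (@ranges) {
        my ( $s, $e ) = @$r;

        # A sub-range that starts before its predecessor has wrapped around.
        $offset = $seq_length if defined $prev_start && $s < $prev_start;
        push @shifted, [ $s + $offset, $e + $offset ];
        $prev_start = $s;
    }
    return @shifted;
}

# The ranges from the error message: [207497,208369] [1,687]
my @parts = unwrap_circular_ranges( 208369, [ 207497, 208369 ], [ 1, 687 ] );
printf "%d..%d\n", @$_ for @parts;
# prints 207497..208369 and 208370..209056
```

With that adjustment the whole feature would span 207497..209056 rather than appearing to run backwards.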

On Aug 17, 2010, at 6:51 AM, Chris Fields wrote:

> I think Chris Mungall has a branch set up for this in bioperl:
>
> http://github.com/bioperl/bioperl-live/tree/circular
>
> Is that correct?  Should we merge that code into the master branch?
>
> chris
>
> On Aug 17, 2010, at 8:44 AM, David Breimann wrote:
>
>> Hello,
>>
>> The following genbank has a gene that runs over the 'end" of the
>> chromosome and into its "beginning", and the script generates an
>> error.
>>
>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk
>>
>> NC_005707 Unflattening error:
>> Details:
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: PROBLEM, SEVERITY==2
>> Ranges not in correct order. Strange ensembl genbank entry? Range:
>> [207497,208369] [1,687]
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/ 
>> Root.pm:473
>> STACK: Bio::SeqFeature::Tools::Unflattener::problem
>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952
>> STACK:  
>> Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent
>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842
>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS
>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713
>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq
>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532
>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023
>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506
>> -----------------------------------------------------------
>>
>> Best,
>> Dave
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at illinois.edu  Tue Aug 17 15:24:23 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 14:24:23 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
	<8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
	<D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>
Message-ID: <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>

On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote:

> You can merge this in. It should allow David to proceed.

Will do.  I'll go ahead and delete the remote branch as well.

> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that essentially preserves the genbank coordinates even when the origin is crossed:
> 
> 	http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf
> 
> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length

Yes, that is a problem that needs to be addressed.  Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174.

chris

> On Aug 17, 2010, at 6:51 AM, Chris Fields wrote:
> 
>> I think Chris Mungall has a branch set up for this in bioperl:
>> 
>> http://github.com/bioperl/bioperl-live/tree/circular
>> 
>> Is that correct?  Should we merge that code into the master branch?
>> 
>> chris
>> 
>> On Aug 17, 2010, at 8:44 AM, David Breimann wrote:
>> 
>>> Hello,
>>> 
>>> The following genbank has a gene that runs over the 'end" of the
>>> chromosome and into its "beginning", and the script generates an
>>> error.
>>> 
>>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk
>>> 
>>> NC_005707 Unflattening error:
>>> Details:
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: PROBLEM, SEVERITY==2
>>> Ranges not in correct order. Strange ensembl genbank entry? Range:
>>> [207497,208369] [1,687]
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473
>>> STACK: Bio::SeqFeature::Tools::Unflattener::problem
>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952
>>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent
>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842
>>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS
>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713
>>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq
>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532
>>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023
>>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506
>>> -----------------------------------------------------------
>>> 
>>> Best,
>>> Dave
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From sheldon.mckay at gmail.com  Tue Aug 17 16:42:50 2010
From: sheldon.mckay at gmail.com (Sheldon McKay)
Date: Tue, 17 Aug 2010 16:42:50 -0400
Subject: [Bioperl-l] AlignIO and Gbrowse_syn
In-Reply-To: <E53C66C1-E4F1-4E83-B5ED-631CE62D7DCE@illinois.edu>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
	<AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
	<E53C66C1-E4F1-4E83-B5ED-631CE62D7DCE@illinois.edu>
Message-ID: <AANLkTikYi9TGag3poS=xB73iGxqX_-ThZS9wU1TC2JDH@mail.gmail.com>

The GBrowse_syn dev team is pretty small (n=1) right now, so any
patches would be welcome.
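
As a starting point for the reverse-strand coordinate issue described in the quoted message below, here is a minimal sketch of the conversion. It assumes the field meanings given in the UCSC MAF description as I read it (zero-based start, and for '-' strand lines a start counted on the reverse-complemented source, with srcSize giving the source length). The sub name maf_to_forward is made up, and none of this is taken from AlignIO or GBrowse_syn code.

```perl
use strict;
use warnings;

# Convert the start, size, strand and srcSize fields of a MAF "s" line
# into 1-based, inclusive coordinates on the forward strand.
sub maf_to_forward {
    my ( $start, $size, $strand, $src_size ) = @_;

    if ( $strand eq '-' ) {
        # Positions were counted from the far end of the source sequence.
        return ( $src_size - ( $start + $size ) + 1, $src_size - $start );
    }

    # Forward strand: only the zero-based offset needs adjusting.
    return ( $start + 1, $start + $size );
}

# A 10 bp block at the start of the reverse complement of a 100 bp
# sequence covers the last 10 bases of the original: prints 91-100.
printf "%d-%d\n", maf_to_forward( 0, 10, '-', 100 );

# The same block on the forward strand: prints 1-10.
printf "%d-%d\n", maf_to_forward( 0, 10, '+', 100 );
```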

Sheldon


On Wed, Aug 11, 2010 at 6:02 PM, Chris Fields <cjfields at illinois.edu> wrote:
> Russell,
>
> We have had very few requests to support .maf until recently, which is why there has been little done with it. We welcome any help to improve it.
>
> chris
>
> On Aug 11, 2010, at 4:31 PM, Smithies, Russell wrote:
>
>> I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague.
>> If GBrowse_syn is using .maf format, does AlignIO need more work?
>> Any comments?
>>
>> --Russell
>>
>>
>> I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output). GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce clustalw, I've tried converting LASTZ maf output to clustalw (and other alignment formats) using AlignIO; however, I run into the following issues:
>> *Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification)
>> *The coordinate system for reverse strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-rev complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result all reverse match coordinates in the resulting clustalw output file are incorrect.
>> *AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them
>>
>> I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned.
>> =======================================================================
>> Attention: The information contained in this message and/or attachments
>> from AgResearch Limited is intended only for the persons or entities
>> to which it is addressed and may contain confidential and/or privileged
>> material. Any review, retransmission, dissemination or other use of, or
>> taking of any action in reliance upon, this information by persons or
>> entities other than the intended recipients is prohibited by AgResearch
>> Limited. If you have received this message in error, please notify the
>> sender immediately.
>> =======================================================================
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From hxu.hong at gmail.com  Tue Aug 17 16:50:43 2010
From: hxu.hong at gmail.com (Hong Xu)
Date: Tue, 17 Aug 2010 16:50:43 -0400
Subject: [Bioperl-l] Bio::Tools::Primer3 question
Message-ID: <AANLkTi=NcuvzepGaqw_TUTr5MM6F2K_b8PT8Fa3qrZg2@mail.gmail.com>

Hello all,

I'm working to parse the Primer3 release 2.2.2-beta result. I made the
necessary changes to make Bio::Tools::Primer3 work with the new output
tags of Primer3 release 2.2.2. But when I tried to get the primer Tm,
I found that Bio::Tools::Primer3 gave different Tm from Primer3 result
file. Then I learned that the Tm was calculated by
Bio::SeqFeature::Primer module, not from parsing Primer3 result. If I
want to get data from parsing Primer3 result, should I write my own
Primer3 parser instead of Bio::Tools::Primer3?

thanks a lot,
Hong


From cjfields at illinois.edu  Tue Aug 17 17:14:02 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 16:14:02 -0500
Subject: [Bioperl-l] Bio::Tools::Primer3 question
In-Reply-To: <AANLkTi=NcuvzepGaqw_TUTr5MM6F2K_b8PT8Fa3qrZg2@mail.gmail.com>
References: <AANLkTi=NcuvzepGaqw_TUTr5MM6F2K_b8PT8Fa3qrZg2@mail.gmail.com>
Message-ID: <E039C425-80C3-4F18-B589-AE98896A1175@illinois.edu>

Already ahead of you there, unfortunately.  I wrote a complete reimplementation of both the Primer3 parser and the Primer3 wrapper that handles both v1 and v2 of primer3_core.  Lack of tuits lately has prevented me from getting tests written up, so for the time being it's sitting in bioperl-dev:

http://github.com/bioperl/bioperl-dev

They are Bio::Tools::Primer3Redux (parser) and Bio::Tools::Run::Primer3Redux (wrapper).

I rewrote those b/c I found the original modules not adequate enough in many ways for my purposes then (the newer version uses simple features or feature pairs instead of the primer features, for the same reasons you mention re: Tm).  You're more than welcome to hack on the code a bit.  I'm planning on pulling it out into my own github repo for separate submission to CPAN.  
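
If all you need in the meantime is the Tm exactly as Primer3 reported it, a minimal sketch of reading the values straight from the output follows. It assumes the Boulder-IO layout (TAG=VALUE lines, each record closed by a line containing only "=") and v2-style tag names such as PRIMER_LEFT_0_TM; v1 output names its tags differently. The sub name parse_boulder_record and the sample values are made up for illustration, and this is not the Primer3Redux code.

```perl
use strict;
use warnings;

# Parse a single Boulder-IO record into a hash of TAG => VALUE.
# Reading stops at the record terminator, a line holding only "=".
sub parse_boulder_record {
    my ($text) = @_;
    my %rec;
    for my $line ( split /\n/, $text ) {
        last if $line eq '=';
        my ( $tag, $value ) = split /=/, $line, 2;
        next unless defined $value;    # skip lines without TAG=VALUE
        $rec{$tag} = $value;
    }
    return \%rec;
}

# Made-up sample output, not real Primer3 results.
my $output = <<'END';
SEQUENCE_ID=example
PRIMER_LEFT_0_SEQUENCE=ACGTACGTACGTACGTACGT
PRIMER_LEFT_0_TM=59.875
PRIMER_RIGHT_0_TM=60.112
=
END

my $rec = parse_boulder_record($output);
printf "left Tm: %s, right Tm: %s\n",
    $rec->{PRIMER_LEFT_0_TM}, $rec->{PRIMER_RIGHT_0_TM};
```

The values stay as the strings Primer3 wrote, so nothing is recalculated along the way.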

chris

On Aug 17, 2010, at 3:50 PM, Hong Xu wrote:

> Hello all,
> 
> I'm working to parse the Primer3 release 2.2.2-beta result. I made the
> necessary changes to make Bio::Tools::Primer3 work with the new output
> tags of Primer3 release 2.2.2. But when I tried to get the primer Tm,
> I found that Bio::Tools::Primer3 gave different Tm from Primer3 result
> file. Then I learned that the Tm was calculated by
> Bio::SeqFeature::Primer module, not from parsing Primer3 result. If I
> want to get data from parsing Primer3 result, should I write my own
> Primer3 parser instead of Bio::Tools::Primer3?
> 
> thanks a lot,
> Hong
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Tue Aug 17 23:42:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 22:42:59 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
	<8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
	<D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>
	<8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>
Message-ID: <D1CC1B9C-36A7-4427-9100-AE5C85C5E965@illinois.edu>

Chris, David, 

The branch is now merged back to trunk.  David, let us know if this helps.

chris (f)

On Aug 17, 2010, at 2:24 PM, Chris Fields wrote:

> On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote:
> 
>> You can merge this in. It should allow David to proceed.
> 
> Will do.  I'll go ahead and delete the remote branch as well.
> 
>> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that essentially preserves the genbank coordinates even when the origin is crossed:
>> 
>> 	http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf
>> 
>> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length
> 
> Yes, that is a problem that needs to be addressed.  Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174.
> 
> chris
> 
>> On Aug 17, 2010, at 6:51 AM, Chris Fields wrote:
>> 
>>> I think Chris Mungall has a branch set up for this in bioperl:
>>> 
>>> http://github.com/bioperl/bioperl-live/tree/circular
>>> 
>>> Is that correct?  Should we merge that code into the master branch?
>>> 
>>> chris
>>> 
>>> On Aug 17, 2010, at 8:44 AM, David Breimann wrote:
>>> 
>>>> Hello,
>>>> 
>>>> The following genbank has a gene that runs over the 'end" of the
>>>> chromosome and into its "beginning", and the script generates an
>>>> error.
>>>> 
>>>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk
>>>> 
>>>> NC_005707 Unflattening error:
>>>> Details:
>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> MSG: PROBLEM, SEVERITY==2
>>>> Ranges not in correct order. Strange ensembl genbank entry? Range:
>>>> [207497,208369] [1,687]
>>>> STACK: Error::throw
>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473
>>>> STACK: Bio::SeqFeature::Tools::Unflattener::problem
>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952
>>>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent
>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842
>>>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS
>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713
>>>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq
>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532
>>>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023
>>>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506
>>>> -----------------------------------------------------------
>>>> 
>>>> Best,
>>>> Dave
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Wed Aug 18 00:48:55 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 23:48:55 -0500
Subject: [Bioperl-l] Bio::Tools::Primer3 question
In-Reply-To: <E039C425-80C3-4F18-B589-AE98896A1175@illinois.edu>
References: <AANLkTi=NcuvzepGaqw_TUTr5MM6F2K_b8PT8Fa3qrZg2@mail.gmail.com>
	<E039C425-80C3-4F18-B589-AE98896A1175@illinois.edu>
Message-ID: <C4B91FBD-1705-4045-9D98-F5ABEA80C038@illinois.edu>

Hong,

The latest code, along with working tests, is present here:

http://github.com/cjfields/Bio-Tools-Primer3Redux

It needs a few more tests, but the initial wrapper tests work fine for primer3 v2.2.1 on both Mac and Linux.  I will try getting this onto CPAN after a bit more cleanup.
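
For the underlying question of reading Tm values as Primer3 reported them, rather than recomputing them, here is a minimal pure-Perl sketch that does not use either module. It assumes Primer3's Boulder-IO output (TAG=VALUE lines, with a lone "=" ending each record) and the v2-style tag names PRIMER_LEFT_<n>_TM and PRIMER_RIGHT_<n>_TM. The helper name parse_boulder_record and the sample record, including its Tm values, are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one Boulder-IO record into a hash.
# Lines look like TAG=VALUE; a line holding only "=" ends the record.
sub parse_boulder_record {
    my ($text) = @_;
    my %record;
    for my $line (split /\n/, $text) {
        last if $line eq '=';
        my ($tag, $value) = split /=/, $line, 2;
        next unless defined $value;    # skip lines without "="
        $record{$tag} = $value;
    }
    return \%record;
}

# Invented sample record; the values are not real Primer3 output.
my $sample = <<'END';
SEQUENCE_ID=example
PRIMER_LEFT_0_SEQUENCE=ACGTACGTACGTACGTACGT
PRIMER_LEFT_0_TM=59.957
PRIMER_RIGHT_0_SEQUENCE=TGCATGCATGCATGCATGCA
PRIMER_RIGHT_0_TM=60.112
=
END

my $record = parse_boulder_record($sample);

# Report each Tm exactly as it appears in the record.
for my $tag (sort keys %$record) {
    next unless $tag =~ /^PRIMER_(LEFT|RIGHT)_(\d+)_TM$/;
    print "pair $2, $1 primer: Tm = $record->{$tag}\n";
}
```

Keeping the values as strings avoids any rounding, so what is printed matches the result file character for character.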

chris

On Aug 17, 2010, at 4:14 PM, Chris Fields wrote:

> Already ahead of you there, unfortunately.  I wrote a complete reimplementation of both the Primer3 parser and the Primer3 wrapper that handles both v1 and v2 of primer3_core.  A lack of tuits lately has prevented me from getting tests written up, so for the time being it's sitting in bioperl-dev:
> 
> http://github.com/bioperl/bioperl-dev
> 
> They are Bio::Tools::Primer3Redux (parser) and Bio::Tools::Run::Primer3Redux (wrapper).
> 
> I rewrote those b/c I found the original modules inadequate in many ways for my purposes at the time (the newer version uses simple features or feature pairs instead of the primer features, for the same reasons you mention re: Tm).  You're more than welcome to hack on the code a bit.  I'm planning on pulling it out into my own github repo for separate submission to CPAN.  
> 
> chris
> 
> On Aug 17, 2010, at 3:50 PM, Hong Xu wrote:
> 
>> Hello all,
>> 
>> I'm working on parsing the output of Primer3 release 2.2.2-beta. I made
>> the necessary changes to make Bio::Tools::Primer3 work with the new
>> output tags of Primer3 release 2.2.2. But when I tried to get the primer
>> Tm, I found that Bio::Tools::Primer3 gave a different Tm from the one in
>> the Primer3 result file. Then I learned that the Tm is calculated by the
>> Bio::SeqFeature::Primer module, not parsed from the Primer3 result. If I
>> want to get the data by parsing the Primer3 result, should I write my
>> own Primer3 parser instead of using Bio::Tools::Primer3?
>> 
>> thanks a lot,
>> Hong


From david.breimann at gmail.com  Wed Aug 18 02:46:58 2010
From: david.breimann at gmail.com (David Breimann)
Date: Wed, 18 Aug 2010 09:46:58 +0300
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <D1CC1B9C-36A7-4427-9100-AE5C85C5E965@illinois.edu>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
	<8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
	<D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>
	<8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>
	<D1CC1B9C-36A7-4427-9100-AE5C85C5E965@illinois.edu>
Message-ID: <AANLkTinsqQCpybg6MUzTwqNuKMn=kJMV4pL64GXwAOkG@mail.gmail.com>

Dear Chris's,

I tested the updated version on multiple genomes that previously
returned errors (for future reference: NC_005707, NC_006578,
NC_007103, NC_007104, NC_007106, NC_007107, NC_008573, NC_008762,
NC_008763, NC_008785, NC_009457, NC_012040). The script now finishes
normally on all of them. However, as you mentioned, the resulting GFF3
file does not comply with the GFF3 specification for circular genomes.
This in turn causes some unexpected results in other applications.
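
To make the GFF3 convention concrete, here is a rough Perl sketch of the coordinate adjustment for a feature that wraps past the origin: the wrapped part is shifted by the genome length so that the feature becomes one span whose end exceeds the sequence length. The helper gff3_circular_span is hypothetical and is not part of BioPerl or bp_genbank2gff3.pl. It assumes plus-strand sub-ranges given in join() order, and it assumes a replicon length of 208369, inferred only from the end of the first range in the error message above.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): collapse the ordered
# sub-ranges of a plus-strand feature into one GFF3-style start/end.
# A sub-range that starts before the end reached so far is taken to have
# wrapped past the origin and is shifted by the genome length.
sub gff3_circular_span {
    my ($genome_length, @ranges) = @_;    # each range is [start, end], 1-based
    my ($start, $end) = @{ $ranges[0] };
    for my $range (@ranges[1 .. $#ranges]) {
        my ($s, $e) = @$range;
        if ($s < $end) {                  # wrapped around the origin
            $s += $genome_length;
            $e += $genome_length;
        }
        $end = $e if $e > $end;
    }
    return ($start, $end);
}

# Ranges from the NC_005707 error message; the length is an assumption.
my ($start, $end) = gff3_circular_span(208369, [207497, 208369], [1, 687]);
print "start=$start end=$end\n";
```

With these inputs the span is 207497 to 209056 (687 + 208369). The GFF3 specification also expects the landmark feature to carry Is_circular=true so that consumers know an end beyond the sequence length is intentional.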

Best,
Dave

On Wed, Aug 18, 2010 at 6:42 AM, Chris Fields <cjfields at illinois.edu> wrote:
> Chris, David,
>
> The branch is now merged back to trunk.  David, let us know if this helps.
>
> chris (f)
>
> On Aug 17, 2010, at 2:24 PM, Chris Fields wrote:
>
>> On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote:
>>
>>> You can merge this in. It should allow David to proceed.
>>
>> Will do.  I'll go ahead and delete the remote branch as well.
>>
>>> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that it essentially preserves the GenBank coordinates even when the origin is crossed:
>>>
>>>      http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf
>>>
>>> However, if this is to conform to GFF3, then the resulting coordinates that cross the origin should have start/end incremented by the genome length.
>>
>> Yes, that is a problem that needs to be addressed.  Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174.
>>
>> chris
>>
>>> On Aug 17, 2010, at 6:51 AM, Chris Fields wrote:
>>>
>>>> I think Chris Mungall has a branch set up for this in bioperl:
>>>>
>>>> http://github.com/bioperl/bioperl-live/tree/circular
>>>>
>>>> Is that correct?  Should we merge that code into the master branch?
>>>>
>>>> chris
>>>>
>>>> On Aug 17, 2010, at 8:44 AM, David Breimann wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> The following GenBank file has a gene that runs over the "end" of the
>>>>> chromosome and into its "beginning", and the script generates an
>>>>> error.
>>>>>
>>>>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk
>>>>>
>>>>> NC_005707 Unflattening error:
>>>>> Details:
>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>> MSG: PROBLEM, SEVERITY==2
>>>>> Ranges not in correct order. Strange ensembl genbank entry? Range:
>>>>> [207497,208369] [1,687]
>>>>> STACK: Error::throw
>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473
>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::problem
>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952
>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent
>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842
>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS
>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713
>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq
>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532
>>>>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023
>>>>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506
>>>>> -----------------------------------------------------------
>>>>>
>>>>> Best,
>>>>> Dave


From G.Gallone at sms.ed.ac.uk  Wed Aug 18 10:57:01 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Wed, 18 Aug 2010 15:57:01 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
Message-ID: <4C6BF4BD.5010200@sms.ed.ac.uk>

Hello BioPerl community - I've written a new module called 
Interolog::Walk that I'm planning to put on CPAN. I would be grateful if 
you could take a look at the brief description below and tell me 
what you think. I'll be more than happy to post further details should 
the module be of interest to anyone.

Also, I am not sure I have chosen the right name for it. This 
is my first module and it would be great if you could advise on naming 
it appropriately. Hopefully the following description will give an idea 
of what it does.

===================


NAME
     Interolog::Walk - Retrieve, score and visualize putative
     Protein-Protein Interactions through the orthology-walk method

DESCRIPTION
     A common activity in computational biology is to mine 
protein-protein interactions from publicly available databases in order 
to build Protein-Protein Interaction (PPI) datasets.
In many instances, however, experimentally obtained, annotated PPIs are 
scarce, and it would be helpful to enrich the experimental dataset with 
high-quality, computationally inferred PPIs. Such computationally 
obtained datasets can extend, support or enrich experimental PPI 
datasets, and are of crucial importance in high-throughput gene 
prioritization studies, i.e. to drive hypotheses and restrict the 
dimensionality of many gene functional discovery problems.
This Perl module, Interolog::Walk, is aimed at building putative PPI 
datasets on the basis of a number of comparative biology paradigms: the 
module implements a collection of computational biology algorithms based 
on the concept of "orthology projection". If interacting proteins A and 
B in organism X have orthologs A' and B' in organism Y, under certain 
conditions one can assume that the interaction will be conserved in 
organism Y, i.e. the A-B interaction can be "projected through the 
orthologies" to obtain a putative A'-B' interaction. The pair of 
interactions (A-B) and (A'-B') are named "interologs" (see for instance 
[1] and [2]).

Interolog::Walk collects, analyses and collates gene orthology data 
provided by the Ensembl Consortium (www.ensembl.org) as well as PPI data 
provided by EBI IntAct (http://www.ebi.ac.uk/intact/). It lets the user 
rate the quality and reliability of the putative interactions collected 
by means of confidence scores, and optionally outputs network 
representations of the datasets that can be loaded into Cytoscape, a 
widely used biological network visualization tool.

USAGE
In order to carry out an interolog walk we start with a set of gene 
identifiers in one organism of interest. We query those ids against a 
number of comparative biology databases to retrieve a list of 
orthologues for each gene id of interest, in one or more species.
In the following step we rely on PPI databases to retrieve the list of 
available interactors for the protein ids obtained. The output at this 
stage consists of a list of interactors of the orthologues of the 
initial gene set, plus several fields of ancillary data.
In the last step of the process we project the interactions - again 
using orthology data - back to the original species of interest. The 
output of the process is a list of PUTATIVE INTERACTORS of the initial 
gene set, plus several fields of ancillary data.
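
The three steps above can be sketched in plain Perl with toy lookup tables. This is only an illustration of the walk itself and does not show the module's actual interface: the function name interolog_walk, the hash layouts and every gene identifier are invented, and the real data would come from Ensembl and IntAct rather than from hard-coded hashes.

```perl
use strict;
use warnings;

# Toy data; all identifiers are made up for illustration.
# Step 1 lookup: orthologues of species-X genes in species Y.
my %orthologs_x_to_y = (
    geneA => ['geneA_y'],
    geneB => ['geneB_y'],
);
# Step 2 lookup: known interactors in species Y.
my %interactions_y = (
    geneA_y => ['geneC_y', 'geneD_y'],
    geneB_y => ['geneC_y'],
);
# Step 3 lookup: orthologues of species-Y genes back in species X.
# geneD_y has no orthologue in X, so it yields no putative interaction.
my %orthologs_y_to_x = (
    geneC_y => ['geneC'],
);

# Hypothetical function: returns [gene, putative_interactor] pairs.
sub interolog_walk {
    my ($orth_fwd, $ppi, $orth_back, @genes) = @_;
    my %putative;
    for my $gene (@genes) {
        for my $ortholog (@{ $orth_fwd->{$gene} || [] }) {          # step 1
            for my $partner (@{ $ppi->{$ortholog} || [] }) {        # step 2
                for my $back (@{ $orth_back->{$partner} || [] }) {  # step 3
                    $putative{$gene}{$back} = 1;
                }
            }
        }
    }
    my @pairs;
    for my $gene (sort keys %putative) {
        push @pairs, [$gene, $_] for sort keys %{ $putative{$gene} };
    }
    return @pairs;
}

my @pairs = interolog_walk(\%orthologs_x_to_y, \%interactions_y,
                           \%orthologs_y_to_x, qw(geneA geneB));
print join("\t", @$_), "\n" for @pairs;
```

With this toy data the walk proposes geneA-geneC and geneB-geneC. A confidence score, as described above, could be attached to each pair at the point where it is recorded.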

====================

Given the scope and the focus of the project, I would imagine that 
viable alternatives for the namespace might be

Bio::Orthology::InterologWalk
Bio::InterologMap

or maybe
Interolog::Map
Orthology::Map
Orthology::InterologMap

There are no similar projects as far as I could see so I shouldn't run 
the risk of overlapping namespaces. Still I would love to know your 
informed opinion about it.

best,
Giuseppe


REFERENCES
[1] Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, 
Vidal M, Gerstein M. Annotation transfer between genomes: 
protein-protein interologs and protein-DNA regulogs. Genome Research 
2004 Jun;14(6):1107-18.

[2] Wiles AM, Doderer M, Ruan J, Gu T-T, Ravi D, Blackman BA, Bishop AJR. 
Building and Analyzing Protein Interactome Networks by Cross-species 
Comparisons. BMC Systems Biology 2010, 4:36. PMID: 20353594

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From David.Messina at sbc.su.se  Wed Aug 18 12:52:58 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 18 Aug 2010 18:52:58 +0200
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <4C6BF4BD.5010200@sms.ed.ac.uk>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
Message-ID: <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>

Hi Giuseppe,

Sounds really interesting - thanks for posting this.

> Bio::Orthology::InterologWalk

I vote for this name, or in any case something with Bio:: as the top-level namespace since it's a biology-related package.

I like that you're providing a lot of background and information about the project in the documentation. However, the USAGE section should give information about how to use the module, with example code. You can look at other modules on CPAN (or in BioPerl) to see the conventions for writing documentation.

Also, from what you wrote, it sounds like this might be a pipeline or a script rather than a module per se, or perhaps a script and a set of modules. It would be helpful to clarify in your documentation (if you haven't already) how exactly things are organized (and of course example code will help with that, too).


Hope that's helpful, and let us know when you've got it up on CPAN so we can try it out!


Dave


From cjfields at illinois.edu  Wed Aug 18 14:24:16 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 18 Aug 2010 13:24:16 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <AANLkTinsqQCpybg6MUzTwqNuKMn=kJMV4pL64GXwAOkG@mail.gmail.com>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
	<8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
	<D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>
	<8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>
	<D1CC1B9C-36A7-4427-9100-AE5C85C5E965@illinois.edu>
	<AANLkTinsqQCpybg6MUzTwqNuKMn=kJMV4pL64GXwAOkG@mail.gmail.com>
Message-ID: <C385563A-9724-4045-B5A2-7F28A5CB897A@illinois.edu>

Okay, will file this as a bug.  Thanks!

chris



From cdavis at bcm.tmc.edu  Wed Aug 18 15:19:53 2010
From: cdavis at bcm.tmc.edu (Caleb Davis)
Date: Wed, 18 Aug 2010 14:19:53 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq question
Message-ID: <4C6C3259.4060304@bcm.tmc.edu>

Hello, thank you for bioperl!

I am getting discrepancies between the online bl2seq 
(www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi) and the local bl2seq 
run through bioperl, and I'm not sure why. I'm seeing the desired 
behavior through the web interface but can't replicate it locally. 
Specifically, online bl2seq aligns across a 1 bp insertion in the 
subject, whereas the local bl2seq just reports a shorter alignment.

Any ideas? Thanks again,
--Caleb

The desired parameter differences from default are -F F -W 7 (turn 
complexity filter off, word size = 7). Below I present the online and 
local results given the following input sequences:

>consensus
GAGGATCCAGAATTCTC
>FVFTF6N01A86BR
AACCCAATGTAAGGAAGCTAAGAACCTTGAAAAGAGGATACCAGAATTCTC

Here are the parameters and result I'm getting online:
Blast4-request ::= {
  body queue-search {
    program "blastn",
    service "plain",
    queries bioseq-set {
      seq-set {
        seq {
          id {
            local id 26297
          },
          descr {
            title "consensus",
            user {
              type str "CFastaReader",
              data {
                {
                  label str "DefLine",
                  data str ">consensus"
                }
              }
            }
          },
          inst {
            repr raw,
            mol na,
            length 17,
            seq-data ncbi2na '8A3520F740'H
          }
        }
      }
    },
    subject sequences {
      {
        id {
          local id 26299
        },
        descr {
          title "FVFTF6N01A86BR",
          user {
            type str "CFastaReader",
            data {
              {
                label str "DefLine",
                data str ">FVFTF6N01A86BR"
              }
            }
          }
        },
        inst {
          repr raw,
          mol na,
          length 51,
          seq-data ncbi2na '0543B0A09C205F80228C520F74'H
        }
      }
    },
    algorithm-options {
      {
        name "EvalueThreshold",
        value cutoff e-value { 1, 10, 1 }
      },
      {
        name "UngappedMode",
        value boolean FALSE
      },
      {
        name "PercentIdentity",
        value real { 0, 10, 0 }
      },
      {
        name "HitlistSize",
        value integer 100
      },
      {
        name "EffectiveSearchSpace",
        value big-integer 0
      },
      {
        name "DbLength",
        value big-integer 0
      },
      {
        name "WindowSize",
        value integer 0
      },
      {
        name "DustFiltering",
        value boolean FALSE
      },
      {
        name "RepeatFiltering",
        value boolean FALSE
      },
      {
        name "MaskAtHash",
        value boolean TRUE
      },
      {
        name "MismatchPenalty",
        value integer -3
      },
      {
        name "MatchReward",
        value integer 2
      },
      {
        name "GapOpeningCost",
        value integer 5
      },
      {
        name "GapExtensionCost",
        value integer 2
      },
      {
        name "StrandOption",
        value strand-type both-strands
      },
      {
        name "WordSize",
        value integer 7
      }
    },
    format-options {
      {
        name "Web_JobTitle",
        value string "consensus"
      },
      {
        name "Web_BlastSpecialPage",
        value string "blast2seq"
      }
    }
  }
}

 >lcl|30439 FVFTF6N01A86BR
Length=51


 Score = 24.7 bits (26),  Expect = 2e-05
 Identities = 17/18 (94%), Gaps = 1/18 (5%)
 Strand=Plus/Plus

Query  1   GAGGAT-CCAGAATTCTC  17
           |||||| |||||||||||
Sbjct  34  GAGGATACCAGAATTCTC  51

Here's the output from a local search (I changed the expect to 5.0 just 
to prove to myself that some parameters are getting through OK):
my @params = (-program => 'blastn', -outfile => 'bl2seq.out',
              -FILTER => 'F', -WORDSIZE => 7, -expect => 5.0);
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
# consensus vs. FVFTF6N01A86BR
my $bl2seq_report = $factory->bl2seq($cons_seqobj, $single_seqobj);
print Dumper $bl2seq_report->next_result;
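
One possible reading of the discrepancy, offered as a guess and not a diagnosis: the web job lists MatchReward 2 with gap costs 5/2, while the first local HSP in the dump below reports a score of 11 for 11 identical positions, which suggests a match reward of 1. The sketch below works out how much an alignment gains by opening the 1 bp gap to reach the six matching bases (GAGGAT) on its far side. The function name is hypothetical, and the local reward of 1 and local gap costs of 5/2 are assumptions, not values confirmed by the local run.

```perl
use strict;
use warnings;

# Hypothetical helper: net score change from extending an alignment across
# a gap to pick up additional matching bases, under affine gap costs
# (gap_open + gap_len * gap_extend).
sub gain_from_crossing_gap {
    my (%p) = @_;
    my $gap_cost = $p{gap_open} + $p{gap_len} * $p{gap_extend};
    return $p{matches} * $p{reward} - $gap_cost;
}

# Six matching bases (GAGGAT) lie on the far side of the 1 bp insertion.
my %common = (matches => 6, gap_len => 1, gap_open => 5, gap_extend => 2);

my $web   = gain_from_crossing_gap(%common, reward => 2);  # reward from the web job
my $local = gain_from_crossing_gap(%common, reward => 1);  # assumed local reward

print "web gain:   $web\n";
print "local gain: $local\n";
```

Under these assumptions crossing the gap adds 5 to the score with a reward of 2 but subtracts 1 with a reward of 1, so the shorter ungapped alignment would score higher locally. This simple arithmetic does not reproduce the reported raw score of 26 exactly, so treat it as a rough illustration. If the local reward is indeed the cause, matching the web job's reward and gap settings locally would be the thing to try.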

$VAR1 = bless( {
                 '_inclusion_threshold' => undef,
                 '_queryacc' => 'adapter_consensus',
                 '_iteration_index' => 0,
                 '_iteration_count' => 1,
                 '_hits' => [],
                 '_hitindex' => 0,
                 '_querylength' => '17',
                 '_querydesc' => '',
                 '_iterations' => [
                                    bless( {
                                             
'_oldhits_not_below_threshold' => [],
                                             '_newhits_unclassified' => [],
                                             '_number' => 1,
                                             
'_oldhits_newly_below_threshold' => [],
                                             '_hit_factory' => bless( {
                                                                        
'interface' => 'Bio::Search::Hit::HitI',
                                                                        
'type' => 'Bio::Search::Hit::BlastHit',
                                                                        
'_loaded_types' => {
                                                                                             
'Bio::Search::Hit::BlastHit' => 1
                                                                                           
},
                                                                        
'_root_verbose' => 0
                                                                      }, 
'Bio::Factory::ObjectFactory' ),
                                             '_newhits_below_threshold' => [
                                                                             
{
                                                                               
'-algorithm' => 'BLASTN',
                                                                               
'-description' => '',
                                                                               
'-length' => '51',
                                                                               
'-query_len' => '17',
                                                                               
'-hsp_factory' => bless( {
                                                                                                          
'interface' => 'Bio::Search::HSP::HSPI',
                                                                                                          
'type' => 'Bio::Search::HSP::GenericHSP',
                                                                                                          
'_loaded_types' => {
                                                                                                                               
'Bio::Search::HSP::GenericHSP' => 1
                                                                                                                             
},
                                                                                                          
'_root_verbose' => 0
                                                                                                        
}, 'Bio::Factory::ObjectFactory' ),
                                                                               
'-name' => 'FVFTF6N01A86BR',
                                                                               
'-rank' => 1,
                                                                               
'-hsps' => [
                                                                                            
{
                                                                                              
'-query_start' => '7',
                                                                                              
'-algorithm' => 'BLASTN',
                                                                                              
'-hit_seq' => 'ccagaattctc',
                                                                                              
'-hit_length' => '51',
                                                                                              
'-query_length' => '17',
                                                                                              
'-query_desc' => '',
                                                                                              
'-query_frame' => 0,
                                                                                              
'-rank' => 1,
                                                                                              
'-hit_desc' => '',
                                                                                              
'-query_end' => '17',
                                                                                              
'-hit_name' => 'FVFTF6N01A86BR',
                                                                                              
'-identical' => '11',
                                                                                              
'-query_name' => 'adapter_consensus',
                                                                                              
'-evalue' => '1e-04',
                                                                                              
'-score' => '11',
                                                                                              
'-conserved' => '11',
                                                                                              
'-hit_frame' => 0,
                                                                                              
'-hsp_length' => '11',
                                                                                              
'-query_seq' => 'ccagaattctc',
                                                                                              
'-hit_start' => '41',
                                                                                              
'-homology_seq' => '|||||||||||',
                                                                                              
'-hit_end' => '51',
                                                                                              
'-bits' => '22.3'
                                                                                            
},
                                                                                            
{
                                                                                              
'-query_start' => '9',
                                                                                              
'-algorithm' => 'BLASTN',
                                                                                              
'-hit_seq' => 'agaattct',
                                                                                              
'-hit_length' => '51',
                                                                                              
'-query_length' => '17',
                                                                                              
'-query_desc' => '',
                                                                                              
'-query_frame' => 0,
                                                                                              
'-rank' => 2,
                                                                                              
'-hit_desc' => '',
                                                                                              
'-query_end' => '16',
                                                                                              
'-hit_name' => 'FVFTF6N01A86BR',
                                                                                              
'-identical' => '8',
                                                                                              
'-query_name' => 'adapter_consensus',
                                                                                              
'-evalue' => '0.007',
                                                                                              
'-score' => '8',
                                                                                              
'-conserved' => '8',
                                                                                              
'-hit_frame' => 0,
                                                                                              
'-hsp_length' => '8',
                                                                                              
'-query_seq' => 'agaattct',
                                                                                              
'-hit_start' => '50',
                                                                                              
'-homology_seq' => '||||||||',
                                                                                              
'-hit_end' => '43',
                                                                                              
'-bits' => '16.4'
                                                                                            
}
                                                                                          
],
                                                                               
'-accession' => 'FVFTF6N01A86BR',
                                                                               
'-significance' => '1e-04'
                                                                             
}
                                                                           
],
                                             '_root_verbose' => 0,
                                             '_newhits_not_below_threshold' => [],
                                             '_oldhits_below_threshold' => []
                                           }, 'Bio::Search::Iteration::GenericIteration' )
                                  ],
                 '_hit_factory' => $VAR1->{'_iterations'}[0]{'_hit_factory'},
                 '_statistics' => bless( {
                                           'stats' => {
                                                        'S1' => '4',
                                                        'S1_bits' => '8.4',
                                                        'kappa_gapped' => '0.711',
                                                        'X3_bits' => '99.1',
                                                        'X1' => '4',
                                                        'lambda_gapped' => '1.37',
                                                        'X2' => '15',
                                                        'S2' => '4',
                                                        'seqs_better_than_cutoff' => '1',
                                                        'Hits_to_DB' => '5',
                                                        'num_extensions' => '2',
                                                        'num_successful_extensions' => '2',
                                                        'X1_bits' => '7.9',
                                                        'X3' => '50',
                                                        'dbentries' => '1',
                                                        'entropy_gapped' => '1.31',
                                                        'X2_bits' => '29.7',
                                                        'S2_bits' => '8.4'
                                                      }
                                         }, 'Bio::Search::GenericStatistics' ),
                 '_algorithm' => 'BLASTN',
                 '_parameters' => bless( {
                                           'params' => {
                                                         'gapext' => '2',
                                                         'matrix' => 'blastn matrix:1 -3',
                                                         'expect' => '5.0',
                                                         'allowgaps' => 'yes',
                                                         'gapopen' => '5'
                                                       }
                                         }, 'Bio::Tools::Run::GenericParameters' ),
                 '_root_verbose' => 0,
                 '_queryname' => 'adapter_consensus'
               }, 'Bio::Search::Result::BlastResult' );


From David.Messina at sbc.su.se  Wed Aug 18 18:32:37 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 19 Aug 2010 00:32:37 +0200
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq
	question
In-Reply-To: <4C6C3259.4060304@bcm.tmc.edu>
References: <4C6C3259.4060304@bcm.tmc.edu>
Message-ID: <E8F0F7A7-BC33-4E37-8AAB-75A9470E82A5@sbc.su.se>

Hi Caleb,

The first thing I would do is take BioPerl out of the equation and test your local bl2seq on the command line. If you get the same output locally as on the web version, then there is a problem with BioPerl. If you're still seeing a discrepancy between the web and your local run, then this isn't a problem with BioPerl.

Just to be clear, BioPerl doesn't "implement" any of the BLAST programs; it is simply a wrapper around the programs that you download from NCBI. That doesn't mean BioPerl isn't at fault, of course, just that it's important to isolate the problem carefully.

The most common reasons for these discrepancies are:

- different version numbers of BLAST

2.2.21? 2.2.22? Is it the same on the web as locally?

- similarly, different implementations of BLAST

NCBI's old BLAST suite is now deprecated and replaced with BLAST+. All of the online BLAST web queries are BLAST+ now - are you running BLAST+ locally? (There's also a separate BioPerl wrapper for BLAST+ called Bio::Tools::Run::BlastPlus.)

- hidden "default" parameters

Even though you're only changing a handful of parameters, the defaults (particularly on the web version) may be different than what you expect.

In your case, it looks like on the web version, match score is 2 and mismatch is -3. However, in the local version I believe match score is 1 and a mismatch is -3.

See this line in the params block near the end of your post:

	'matrix' => 'blastn matrix:1 -3',
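
To make the effect of the two scoring schemes concrete, here is a small sketch in plain Perl (no BioPerl or BLAST needed). The helper names parse_blastn_matrix and ungapped_score are made up for illustration; only the 'matrix' string and the 8-base HSP ('agaattct') are taken from the dump. It simply recomputes the raw score of an ungapped alignment under each reward/penalty pair, and is not how BLAST itself is implemented.

```perl
use strict;
use warnings;

# Parse a BLASTN 'matrix' parameter string such as 'blastn matrix:1 -3'
# into its match reward and mismatch penalty.
sub parse_blastn_matrix {
    my ($matrix) = @_;
    my ($reward, $penalty) = $matrix =~ /matrix:\s*(-?\d+)\s+(-?\d+)/
        or die "Unrecognised matrix string: $matrix\n";
    return ($reward, $penalty);
}

# Raw score of an ungapped alignment of two equal-length strings.
sub ungapped_score {
    my ($query, $hit, $reward, $penalty) = @_;
    die "Sequences differ in length\n" unless length($query) == length($hit);
    my $score = 0;
    for my $i (0 .. length($query) - 1) {
        $score += substr($query, $i, 1) eq substr($hit, $i, 1) ? $reward : $penalty;
    }
    return $score;
}

# Local run: values reported in the params block of the dump.
my ($reward, $penalty) = parse_blastn_matrix('blastn matrix:1 -3');
my $local = ungapped_score('agaattct', 'agaattct', $reward, $penalty);

# Web run: match 2 / mismatch -3, as it appears to be on the web version.
my $web = ungapped_score('agaattct', 'agaattct', 2, -3);

print "local (+$reward/$penalty): $local, web (+2/-3): $web\n";
```

With match 1 the eight identities give a raw score of 8, which agrees with the '-score' => '8' in the HSP dump; with match 2 the same HSP would score 16, so bit scores and E-values will differ between the two runs even though the alignment is identical.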


Dave


From sidd.basu at gmail.com  Wed Aug 18 20:28:32 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Wed, 18 Aug 2010 19:28:32 -0500
Subject: [Bioperl-l]  Re: [RFC] Interolog::Walk
In-Reply-To: <4C6BF4BD.5010200@sms.ed.ac.uk>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
Message-ID: <20100819002830.GA366@Macintosh-235.local>

Hi, 

On Wed, 18 Aug 2010, Giuseppe Gallone wrote:

> Hello BioPerl community - I've written a new module called Interolog::Walk 
> that I'm planning to put on CPAN. I would be grateful if you might take a 
> look at the brief description I attached and tell me what you think. I'll 
> be more than happy to post further details should the module be of some 
> interest for someone.
>
> Also, I am not totally sure about having the correct name for it. This is 
> my first module and It would be great if you could advise on naming it 
> appropriately. Hopefully the following description will give an idea on 
> what it does.
>
> ===================
>
>
> NAME
>     Interolog::Walk - Retrieve, score and visualize putative 
> Protein-Protein Interactions through the orthology-walk method
>
> DESCRIPTION
>     A common activity in computational biology is to mine protein-protein 
> interactions from publicly available databases in order to build 
> Protein-Protein Interaction (PPI) datasets.
> In many instances, however, the number of experimentally obtained annotated 
> PPIs is very scarce and it would be helpful to enrich the experimental 
> dataset with high-quality, computationally-inferred PPIs. Such 
> computationally-obtained dataset can extend, support or enrich experimental 
> PPI datasets, and are of crucial importance in high-throughput gene 
> prioritization studies, i.e. to drive hypotheses and restrict the 
> dimensionality of many gene functional discovery problems.
> This Perl Module, Interolog::Walk, is aimed at building putative PPI 
> datasets on the basis of a number of comparative biology paradigms: the 
> module implements a collection of computational biology algorithms based on 
> the concept of "orthology projection". If interacting proteins A and B in 
> organism X have orthologs A' and B' in organism Y, under certain conditions 
> one can assume that the interaction will be conserved in organism Y, i.e. 
> the A-B interaction can be "projected through the orthologies" to obtain a 
> putative A'-B' interaction. The pair of interactions (A-B) and (A'-B') are 
> named "Interologs" (see for instance [1] and [2]).
>
> Interolog::Walk collects, analyses and collates gene orthology data 
> provided by the Ensembl Consortium (www.ensembl.org) as well as PPI data 
> provided by EBI Intact (http://www.ebi.ac.uk/intact/). It provides the user 
> with the possibility of rating the quality and reliability of the putative 
> interactions collected, by means of confidence scores, and optionally 
> outputs network representations of the datasets, compatible with the 
> biological network representation standard, Cytoscape.

Sounds interesting. I am currently playing around with a Perl-based webapp for displaying interactomes
using Cytoscape Web. Depending on how your design pans out, I would be happy to
use your module as a backend analysis layer. On a related note, you
might want to have a look at bioperl-network; if there is any overlap, it
might be worth contributing there.

-siddhartha

>
> USAGE
> In order to carry out an interolog walk we start with a set of gene 
> identifiers in one organism of interest. We query those ids against a 
> number of comparative biology databases to retrieve a list of orthologues 
> for each gene id of interest, in one or more species.
> In the following step we rely  on PPI databases to retrieve the list of 
> available interactors for the protein ids obtained. The output at this 
> stage consists of a list of interactors of the orthologues of the initial 
> gene set, plus several fields of ancillary data.
> In the last step of the process we  project the interactions - again using 
> orthology data - back to the original species of interest. The output of 
> the process is a list of PUTATIVE INTERACTORS of the initial gene set, plus 
> several fields of ancillary data.
>
> ====================
>
> Given the scope and the focus of the project, I would imagine that viable 
> alternatives for the namespace might be
>
> Bio::Orthology::InterologWalk
> Bio::InterologMap
>
> or maybe
> Interolog::Map
> Orthology::Map
> Orthology::InterologMap
>
> There are no similar projects as far as I could see so I shouldn't run the 
> risk of overlapping namespaces. Still I would love to know your informed 
> opinion about it.
>
> best,
> Giuseppe
>
>
>
> REFERENCES
> [1] Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, 
> Vidal M, Gerstein M. Annotation transfer between genomes: protein-protein 
> interologs and protein-DNA regulogs. Genome Research 2004 
> Jun;14(6):1107-18.
>
> [2]Wiles AM, Doderer M, Ruan J, Gu T-T, Ravi D, Blackman BA, Bishop AJR. 
> "Building and Analyzing Protein Interactome Networks by Cross-species 
> Comparisons." BMC Systems Biology 2010, 4:36 - PMID: 20353594
>
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From dan.kortschak at adelaide.edu.au  Wed Aug 18 22:15:03 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 19 Aug 2010 11:45:03 +0930
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
Message-ID: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>

Hi Everyone,

I'm wanting to set up a persistent data store for some of my work and am
in the process of choosing parts for my system. From my brief look
around I think I'd like to use BioSQL (next best choice being Chado -
but BioPerl bindings in bioperl-db for BioSQL being the decider here),
but have noticed comments some time back that bioperl-db and PostgreSQL
8.3 (my preferred engine - though MySQL is possible, but makes the whole
system messier) don't play well together.

What is the status of the casting expectation conflict between
bioperl-db and Pg8.3? The scripts are run with safe data, so
placeholders aren't strictly crucial (though speed may be an issue?) and
`$dbh->{pg_server_prepare} = 0;' seems like it could be an option.
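
For reference, a minimal sketch of what switching off server-side prepares might look like when the handle is created. pg_server_prepare is a documented DBD::Pg attribute; the database name, host, user and password below are placeholders, and biosql_connect_args is a hypothetical helper, not part of bioperl-db. As far as I know, DBD::Pg then fills in placeholders on the client side, so existing code that uses placeholders should keep working.

```perl
use strict;
use warnings;

# Build the argument list for DBI->connect with server-side prepares disabled.
# All connection details passed in are placeholders for illustration.
sub biosql_connect_args {
    my (%opt) = @_;
    my $dsn  = sprintf 'dbi:Pg:dbname=%s;host=%s', $opt{dbname}, $opt{host};
    my %attr = (
        RaiseError        => 1,
        AutoCommit        => 1,
        pg_server_prepare => 0,    # DBD::Pg: do not use server-side prepared statements
    );
    return ($dsn, $opt{user}, $opt{password}, \%attr);
}

my @args = biosql_connect_args(
    dbname   => 'biosql',
    host     => 'localhost',
    user     => 'biosql_user',
    password => 'secret',
);

# With DBI and DBD::Pg installed one would then call:
#   my $dbh = DBI->connect(@args);
# or, on an existing handle, set it as in the message above:
#   $dbh->{pg_server_prepare} = 0;
print "DSN: $args[0], pg_server_prepare: $args[3]{pg_server_prepare}\n";
```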

Can anybody provide any advice on this issue?

thanks
Dan Kortschak


From cjfields at illinois.edu  Wed Aug 18 23:29:36 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 18 Aug 2010 22:29:36 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq
	question
In-Reply-To: <E8F0F7A7-BC33-4E37-8AAB-75A9470E82A5@sbc.su.se>
References: <4C6C3259.4060304@bcm.tmc.edu>
	<E8F0F7A7-BC33-4E37-8AAB-75A9470E82A5@sbc.su.se>
Message-ID: <194D43EC-A44C-450A-B57B-EC379DBCB935@illinois.edu>

Wouldn't surprise me too much if the parameters are not set the same; IIRC the main BLAST URL API and the online NCBI Web-BLAST have different default settings.

chris

On Aug 18, 2010, at 5:32 PM, Dave Messina wrote:

> Hi Caleb,
> 
> The first thing I would do is take BioPerl out of the equation and test your local bl2seq on the command line. If you get the same output locally as on the web version, then there is a problem with BioPerl. If you're still seeing a discrepancy between the web and your local run, then this isn't a problem with BioPerl.
> 
> Just to be clear, BioPerl doesn't "implement" any of the BLAST programs; it is simply a wrapper around the programs that you download from NCBI. That doesn't mean BioPerl isn't at fault, of course, just that it's important to isolate the problem carefully.
> 
> The most common reasons for these discrepancies are:
> 
> - different version numbers of BLAST
> 
> 2.2.21? 2.2.22? Is it the same on the web as locally?
> 
> - similarly, different implementations of BLAST
> 
> NCBI's old BLAST suite is now deprecated and replaced with BLAST+. All of the online BLAST web queries are BLAST+ now - are you running BLAST+ locally? (There's also a separate BioPerl wrapper for BLAST+ called Bio::Tools::Run::BlastPlus.)
> 
> - hidden "default" parameters
> 
> Even though you're only changing a handful of parameters, the defaults (particularly on the web version) may be different than what you expect.
> 
> In your case, it looks like on the web version, match score is 2 and mismatch is -3. However, in the local version I believe match score is 1 and a mismatch is -3.
> 
> See this line in the params block near the end of your post:
> 
> 	'matrix' => 'blastn matrix:1 -3',
> 
> 
> 
> Dave
> 
> 


From hlapp at drycafe.net  Thu Aug 19 01:48:19 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 01:48:19 -0400
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>

Hi Dan,

the casting isn't an issue anymore, I think. (And even if it were,  
there is actually a small script that brings back the casts that were  
built into 8.2.) Have you found an example where it still is?

	-hilmar

On Aug 18, 2010, at 10:15 PM, Dan Kortschak wrote:

> Hi Everyone,
>
> I'm wanting to set up a persistent data store for some of my work and am
> in the process of choosing parts for my system. From my brief look
> around I think I'd like to use BioSQL (next best choice being Chado -
> but BioPerl bindings in bioperl-db for BioSQL being the decider here),
> but have noticed comments some time back that bioperl-db and PostgreSQL
> 8.3 (my preferred engine - though MySQL is possible, but makes the whole
> system messier) don't play well together.
>
> What is the status of the casting expectation conflict between
> bioperl-db and Pg8.3? The scripts are run with safe data, so
> placeholders aren't strictly crucial (though speed may be an issue?) and
> `$dbh->{pg_server_prepare} = 0;' seems like it could be an option.
>
> Can anybody provide any advice on this issue?
>
> thanks
> Dan Kortschak
>

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From dan.kortschak at adelaide.edu.au  Thu Aug 19 01:54:03 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 19 Aug 2010 15:24:03 +0930
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
Message-ID: <1282197243.14127.27.camel@zoidberg.mbs.adelaide.edu.au>

Hi Hilmar,

No, I haven't found any problems, just hoping to avoid them by prior
research.

thanks
Dan

On Thu, 2010-08-19 at 01:48 -0400, Hilmar Lapp wrote:
> Hi Dan,
> 
> the casting isn't an issue anymore, I think. (And even if it were,
> there is actually a small script that brings back the casts that were
> built into 8.2.) Have you found an example where it still is?
> 
>         -hilmar


From biopython at maubp.freeserve.co.uk  Thu Aug 19 06:01:03 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 19 Aug 2010 11:01:03 +0100
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
Message-ID: <AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>

On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp <hlapp at drycafe.net> wrote:
> Hi Dan,
>
> the casting isn't an issue anymore, I think. (And even if it were, there is
> actually a small script that brings back the casts that were built into
> 8.2.) Have you found an example where it still is?
>
>         -hilmar

Hi Hilmar,

Do the bioperl-db bindings for BioSQL on PostgreSQL still require those
extra rules in the schema?
http://bugzilla.open-bio.org/show_bug.cgi?id=2839

Peter


From G.Gallone at sms.ed.ac.uk  Thu Aug 19 06:45:36 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Thu, 19 Aug 2010 11:45:36 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
Message-ID: <4C6D0B50.4050902@sms.ed.ac.uk>

Hi Dave,

thank you very much for your helpful comments.

Regarding the module name: I will follow your advice and avoid 
proposing a new root namespace when I register the module. As for the 
second level, I haven't been able to find anything related to 
homology/orthology, so I'm not sure whether I should go for

Bio::Orthology::InterologMap
or
Bio::Homology::InterologMap

The first one being maybe a bit more specific. I might also expand 
further as in

Bio::Orthology::Interolog::Map,

just in case somebody else finds other interesting applications for the 
Interolog concept and would like to "plug in" their own contribution. 
Would this make any sense?

I also appreciate your comments on the documentation. The one I provided 
is actually not the full pod I was planning to include, but rather an 
extract. What I have at the moment is a description, for each method, in 
the following form:

=====================================
    remove_duplicate_rows
      Usage     : $RC = InterologMap::remove_duplicate_rows(input_handle    => $dbh,
                                                            output_handle   => $out_data,
                                                            header          => 'standard',
                                                            );
      Purpose   : This is used to clean up a TSV data file of duplicate
                  entries. Occasionally, Intact can return duplicate
                  entries. This routine will make sure no such duplicates
                  are kept. A new data file is built.
                  The number of unique data rows is updated.
      Returns   : success/error
      Argument  : database handle to input file, filehandle to output file,
                  header type. Header type is one of the following:
                  - "standard": when the routine is used to clean up an
                    interolog walk file (the header will be longer)
                  - "direct":   when the routine is used to clean up a
                    file of real db interactions (the header is shorter)
                  - no field provided: default is standard
      Throws    : -
      Comment   : Sample

      See Also  :
=======================================
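
To illustrate the kind of clean-up this method describes, here is a minimal sketch in plain Perl. It is not the module's code: the real routine takes a DBI handle (via DBD::CSV) and a header type, whereas this hypothetical dedupe_tsv works on ordinary filehandles and assumes a single header line. The column names and rows are invented.

```perl
use strict;
use warnings;

# Copy TSV lines from $in to $out, keeping the header line and the first
# copy of each data row. Returns the number of unique data rows written.
sub dedupe_tsv {
    my ($in, $out) = @_;
    my $header = <$in>;
    return 0 unless defined $header;
    print {$out} $header;

    my %seen;
    my $unique = 0;
    while (my $row = <$in>) {
        (my $key = $row) =~ s/\r?\n\z//;    # compare rows without their line ending
        next if $seen{$key}++;
        print {$out} "$key\n";
        $unique++;
    }
    return $unique;
}

# Toy input held in memory; the third data row duplicates the first.
my $tsv = "id_a\tid_b\nA\tB\nA\tC\nA\tB\n";
open my $in, '<', \$tsv or die $!;
my $clean = '';
open my $out, '>', \$clean or die $!;

my $n = dedupe_tsv($in, $out);
close $out;
close $in;
print "$n unique rows\n";
```

Keeping the first occurrence and preserving row order means the output can be fed straight into the next stage of a pipeline that expects the same header.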

On top of that, there are DESCRIPTION, USAGE and SYNOPSIS sections. The 
synopsis has some code with an example of typical usage of the module. 
Please take a look at this (attached below) and tell me what you think.

You mention that the description contains a lot of background 
information. Would you recommend reducing it, or placing it elsewhere?
I was considering writing a little tutorial in LaTeX as soon as 
possible anyway, to provide a "centralised" source of information for 
getting familiar with the module. Would this be in line with CPAN conventions?

As for your question on the structure of the module: you are indeed 
right, the idea when running the "orthology walk" is to create a 
pipeline of subroutines: there's a core set of subroutines meant to run 
in strict sequence.
Each of these subroutines expects, as input, the output of the previous 
one. The input/output dataset is currently in the form of a TSV text 
file, which I process with the help of the DBI module (to be more 
specific, I use DBD::CSV).

While there's a certain flexibility regarding how to use the module, one 
core idea remains: in order to get the set of putative interactors, the 
user would have to call at least three basic routines:

(A)
=================
1) get_forward_orthologies(): this queries the initial gene list against 
one or more Ensembl dbs (using the Ensembl Perl API) and retrieves their 
orthologues, plus a number of ancillary data fields (mainly conservation 
data, e.g. dn/ds ratio, distance from ancestor, orthology type, etc.)

2) get_interactors(): this queries the orthology list built in the 
previous stage against a PSICQUIC-enabled PPI db using REST (at the 
moment I only query the EBI Intact DB, but it should be easy to expand 
this and query all PSICQUIC-compatible PPI dbs transparently). This step 
will "fatten" the dataset built in (1) with the interactors of those 
orthologues, plus ancillary data (including lots of parameters 
describing the quality, nature and origin of the annotated interaction)

3) get_backward_orthologies(): this queries the interactor list built in 
the previous stage against one or more Ensembl dbs to find orthologues 
*back* in the original species. It also adds supplementary 
information, just like in (1).
==================
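
The three steps above can be sketched on toy data as follows. This is only an illustration of the logic, under loud assumptions: the three hashes stand in for Ensembl Compara and Intact lookups, all identifiers are invented, and everything happens in memory in one pass, whereas the module as described passes TSV files from one routine to the next.

```perl
use strict;
use warnings;

# Toy lookup tables standing in for the remote databases (all ids are made up).
my %ortholog_of = (            # step 1: fly gene -> mouse orthologues
    flyA => ['mouseA'],
    flyC => ['mouseC'],
);
my %interactors_of = (         # step 2: mouse protein -> mouse interactors
    mouseA => ['mouseB'],
);
my %back_ortholog_of = (       # step 3: mouse gene -> fly orthologues
    mouseB => ['flyB'],
);

# Walk: gene -> orthologue -> interactor -> orthologue back in the source species.
# Each result row is [source gene, putative interactor, orthologue, its interactor].
sub interolog_walk {
    my ($genes, $fwd, $ppi, $back) = @_;
    my @putative;
    for my $gene (@$genes) {
        for my $orth (@{ $fwd->{$gene} || [] }) {
            for my $partner (@{ $ppi->{$orth} || [] }) {
                for my $back_orth (@{ $back->{$partner} || [] }) {
                    push @putative, [$gene, $back_orth, $orth, $partner];
                }
            }
        }
    }
    return @putative;
}

my @ppis = interolog_walk(['flyA', 'flyC'],
                          \%ortholog_of, \%interactors_of, \%back_ortholog_of);
print join("\t", @$_), "\n" for @ppis;
```

Here flyC drops out because its orthologue has no known interactor, and flyA ends up with the putative partner flyB, supported by the mouseA-mouseB interaction.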

At the end of this procedure the user will have a TSV file where each 
row contains a binary putative interaction plus (currently) 37 
supplementary data fields.

One can then scan these results to check for duplicates, to compute 
counts, to see if we have discovered new gene ids that were not present 
in the original dataset (hopefully we have :) ).

Most importantly, one can then further process these results to do one 
or more of the following:
(B) compute a global confidence score to assess the reliability of 
each binary putative interaction
(C) extract the binary putative PPIs from the dataset and save them in a 
format compatible with Cytoscape: this makes it possible to visualise 
the result and then apply network analysis tools to discover 
motifs, clusters, etc. The format I use is currently .SIF + attributes, 
as detailed in
http://cytoscape.wodaklab.org/wiki/Cytoscape_User_Manual/Network_Formats
(D) given the same initial gene list, one can also build a dataset of 
REAL, experimentally-obtained PPIs (without mapping through orthologies 
in other species). One can then compare this dataset with the putative 
dataset to see if/where the two overlap, what the intersection or the 
differences are, etc.
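
As an illustration of item (C), a SIF file is just one line per edge of the form "source, interaction type, target", and 'pp' is the usual label for a protein-protein interaction. A minimal sketch, with a hypothetical helper to_sif and invented ids; it does not write the attribute files and is not the module's own routine:

```perl
use strict;
use warnings;

# Turn [source, target] pairs into tab-delimited Cytoscape SIF lines.
# Edges are treated as undirected, so A-B and B-A are written only once.
sub to_sif {
    my (@pairs) = @_;
    my %seen;
    my @lines;
    for my $pair (@pairs) {
        my ($x, $y) = sort @$pair;
        next if $seen{"$x\t$y"}++;
        push @lines, "$x\tpp\t$y";
    }
    return @lines;
}

my @sif = to_sif(['flyA', 'flyB'], ['flyB', 'flyA'], ['flyA', 'flyD']);
print "$_\n" for @sif;
```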


In order to suggest ways of using the module I have written 4 sample 
scripts and I will include them in the module. Each script utilises the 
module and uses/reuses subroutines in a pipeline fashion, and does the 
following:

1) doInterologWalk.pl: runs the basic pipeline in (A)
2) doScores.pl: computes and adds confidence scores as explained in (B)
3) doNetworks.pl: computes SIF network + attributes as in (C)
4) getRealInteractions.pl: runs a pipeline to obtain real PPIs from the 
initial gene set, as in (D).

Hope I didn't make this too confusing. I would love to hear back from 
you and from anybody else that would like to provide feedback.

Cheers
Giuseppe

On 18/08/10 17:52, Dave Messina wrote:
> Hi Giuseppe,
>
> Sounds really interesting - thanks for posting this.
>
>> Bio::Orthology::InterologWalk
>
> I vote for this name, or in any case something with Bio:: as the top-level namespace since it's a biology-related package.
>
> I like that you're providing a lot of background and information about the project in the documentation. However, the USAGE section should give information about how to use the module, with example code. You can look at other modules on CPAN (or in BioPerl) to see the conventions for writing documentation.
>
> Also, from what you wrote, it sounds like this might be a pipeline or a script rather than a module per se, or perhaps a script and a set of modules. It would be helpful to clarify in your documentation (if you haven't already) how exactly things are organized (and of course example code will help with that, too).
>
>
> Hope that's helpful, and let us know when you've got it up on CPAN so we can try it out!
>
>
> Dave
>
>

NAME
     Interolog::Walk - Retrieve, score and visualize putative Protein-Protein
     Interactions through the orthology-walk method

SYNOPSIS
       use Interolog::Walk;

     First, obtain Intact Interactions for the dataset (see example in
     "getDirectInteractions.pl"):

       #get a registry from Ensembl
       my $registry = InterologMap::setup_ensembl_adaptor(connect_to_db  => $ensembl_db,
                                                          source_species => $sourceorg,
                                                          verbose        => 1
                                                          );


       #query actual interactions
       $RC = InterologMap::Direct::get_direct_interactions(registry         => $registry,
                                                           source_species   => $sourceorg,
                                                           input_path       => $in_path,
                                                           output_path      => $out_path,
                                                           url              => $url,
                                                           );

     do some postprocessing (see "do_counts()" and "extract_unseen_ids()" )
     and then do the actual interolog walk on the dataset with the following
     sequence of three methods.

     get orthologues of starting set:

       $RC = InterologMap::get_forward_orthologies(registry        => $registry,
                                                   ensembl_db      => $ensembl_db,
                                                   input_path      => $in_path,
                                                   output_path     => $out_path,
                                                   source_org      => $sourceorg,
                                                   dest_org        => $destorg,
                                                   );

     add interactors of orthologues found by "get_forward_orthologies()":

       $RC = InterologMap::get_interactions(input_path    => $in_path,
                                            output_path   => $out_path,
                                            url           => $url,
                                            url_global    => $url_global,
                                            );

     add orthologues of interactors found by "get_interactions()":

       $RC = InterologMap::get_backward_orthologies(registry    => $registry,
                                                    ensembl_db  => $ensembl_db,
                                                    input_path  => $in_path,
                                                    output_path => $out_path,
                                                    error_path  => $err_path,
                                                    source_org  => $sourceorg,
                                                    );

     do some postprocessing (see "remove_duplicate_rows()", "do_counts()",
     "extract_unseen_ids()") and then optionally compute a composite score
     for the putative interactions obtained:

        $RC = InterologMap::Scores::compute_scores(input_path      => $in_path,
                                                   score_path      => $score_path,
                                                   output_path     => $out_path,
                                                   term_graph      => $onto_graph,
                                                   M_IT_SCORE      => $M_IT,
                                                   M_DM_SCORE      => $M_DM,
                                                   M_ME_DM_SCORE   => $M_MDM,
                                                   M_ME_TAXA_SCORE => $M_MTAXA
                                                   );

     get some networks and network attributes which you can then visualise
     with cytoscape

        $RC = InterologMap::Networks::do_network(registry         => $registry,
                                                 db               => $ensembl_db,
                                                 input_path       => $in_path,
                                                 output_path      => $out_path,
                                                 source_org       => $sourceorg,
                                                 orthology_type   => $orthtype,
                                                 );

        $RC = InterologMap::Networks::do_attributes(registry      => $registry,
                                                    input_path    => $in_path,
                                                    output_path   => $out_path,
                                                    source_org    => $sourceorg,
                                                    label_type    => 'external name'
                                                    );

     *The synopsis above only lists the major methods and parameters.*

DESCRIPTION
     A common activity in computational biology is to mine protein-protein
     interactions from publicly available databases to build *Protein-Protein
     Interaction* (PPI) datasets. In many instances, however, experimentally
     obtained, annotated PPIs are scarce, and it would be helpful to enrich
     the experimental dataset with high-quality, computationally inferred
     PPIs. Such computationally obtained datasets can extend, support or
     enrich experimental PPI datasets, and are of crucial importance in
     high-throughput gene prioritization studies, i.e. to drive hypotheses
     and restrict the dimensionality of functional discovery problems. This
     Perl module, Interolog::Walk, is aimed at building putative PPI datasets
     on the basis of a number of comparative biology paradigms: the module
     implements a collection of computational biology algorithms based on the
     concept of "orthology projection". If interacting proteins A and B in
     organism X have orthologs A' and B' in organism Y, under certain
     conditions one can assume that the interaction will be conserved in
     organism Y, i.e. the A-B interaction can be "projected through the
     orthologies" to obtain a putative A'-B' interaction. The pair of
     interactions (A-B) and (A'-B') are named "Interologs".

     Interolog::Walk collects, analyses and collates gene orthology data
     provided by the Ensembl Consortium as well as PPI data provided by EBI
     IntAct. It lets the user rate the quality and reliability of the
     putative interactions collected, by means of confidence scores, and
     optionally outputs network representations of the datasets that are
     compatible with Cytoscape, the biological network visualisation
     platform.

BASIC USAGE
   Rationale behind "Interolog::Walk".
                                   \EBI Intact API/
              .--------------.            |             .-------------.
          (2) | A(e.g. mouse)|<------------------------>|   B(mouse)  |  (3)
              `--------------'          <PPI>           `-------------'
                     ^                                         |
        /Ensembl\    | <Orthology>                 <Orthology> | \ Ensembl /
       / Compara \   |                                         |  \Compara/
      /    Api    \  |                                         |   \ Api /
                     |                                         |
              .--------------.                           .-------------.
          (1) | A'(e.g. fly) |. . . . . . . . . . . . .  |   B'(fly)   | (4)
              `--------------'     [SCORED]PUTATIVE PPI  `-------------'
                              (Output of Interolog::Walk)

     In order to carry out an interolog walk we start with a set of gene
     identifiers in one organism of interest (1). We query those ids against
     a number of comparative biology databases to retrieve a list of
     orthologues for the gene id of interest, in one or more species (2). In
     the next step we rely instead on PPI databases to retrieve the list of
     available interactors for the protein ids obtained in (2). The output at
     this stage consists of a list of interactors of the orthologues of the
     initial gene set, plus several fields of ancillary data (whose
     importance will be explained later) (3). In the last step of this
     process we will need to project the interactions in (3) - again using
     orthology data - back to the original species of interest. The output of
     the process is a list of PUTATIVE INTERACTORS of the initial gene set,
     plus several fields of ancillary data.
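
     The four steps above can be illustrated with a small, self-contained
     sketch. It is not the module's implementation: the lookup tables, the
     gene names and the function name interolog_walk() are all invented for
     illustration, and plain hashes stand in for the Ensembl Compara and
     IntAct queries. No scoring or ancillary data is modelled.

```perl
use strict;
use warnings;

# Toy data, invented for illustration only.
# Orthologies between the species of interest (fly) and a second species (mouse).
my %fly_to_mouse = ( flyA => ['mouseA'], flyB => ['mouseB'] );
# Known mouse interactions, as a PPI database might report them.
my %mouse_ppi = ( mouseA => [ 'mouseB', 'mouseC' ] );

# Invert the orthology table so interactors can be projected back to fly.
my %mouse_to_fly;
for my $fly ( keys %fly_to_mouse ) {
    push @{ $mouse_to_fly{$_} }, $fly for @{ $fly_to_mouse{$fly} };
}

# Steps (1)-(4): fly gene -> mouse ortholog -> mouse interactor -> fly ortholog.
sub interolog_walk {
    my (@fly_ids) = @_;
    my @putative;
    for my $fly_id (@fly_ids) {                                          # (1)
        for my $ortholog ( @{ $fly_to_mouse{$fly_id} || [] } ) {         # (2)
            for my $interactor ( @{ $mouse_ppi{$ortholog} || [] } ) {    # (3)
                for my $back ( @{ $mouse_to_fly{$interactor} || [] } ) { # (4)
                    push @putative, [ $fly_id, $back ];
                }
            }
        }
    }
    return @putative;
}

my @putative = interolog_walk('flyA');
print join( ' - ', @$_ ), "\n" for @putative;
```

     In this toy example mouseC has no fly orthologue, so only the
     flyA - flyB pair survives the projection back to the species of
     interest.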

     "Interolog::Walk" provides three main functions to carry out the basic
     walk, "get_forward_orthologies()", "get_interactions()" and
     "get_backward_orthologies()". These functions must be called strictly
     sequentially in your script, as they process, analyse and attach data to
     the output in a pipeline-like fashion, i.e. each one processes the output
     of the preceding function.

     get_forward_orthologies
     get_interactions
     get_backward_orthologies

SCORING THE PUTATIVE INTERACTIONS
BUILDING PUTATIVE INTERACTION NETWORKS
BUGS
     Please report any you find

SUPPORT
     TODO

AUTHOR
     Giuseppe Gallone <ggallone at cpan.org>

     CPAN ID: GGALLONE

     University of Edinburgh

COPYRIGHT
     The Interolog::Walk module is Copyright (c) 2010 Giuseppe Gallone All
     rights reserved.

     You may distribute under the terms of either the GNU General Public
     License or the Artistic License, as specified in the Perl 5.10.0 README
     file.

SEE ALSO


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From G.Gallone at sms.ed.ac.uk  Thu Aug 19 08:42:28 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Thu, 19 Aug 2010 13:42:28 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <20100819002830.GA366@Macintosh-235.local>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<20100819002830.GA366@Macintosh-235.local>
Message-ID: <4C6D26B4.5090702@sms.ed.ac.uk>

Dear Siddhartha,

glad to hear this might be helpful. As for the bioperl-network package
you mention, thank you for pointing it out. I had a quick look at its
documentation and it looks like a much deeper and more complex effort
than what I have in my package. I've actually been using the Graph
package, on which it seems to be based, a lot and found it very helpful.

I'm not sure if the network routines in my module overlap with it
though: all I do in my package is parse the dataset, keeping only
what is requested to build a Cytoscape SIF file and optionally some
Cytoscape NOA attribute files, as required by the Cytoscape
specification in

http://cytoscape.wodaklab.org/wiki/Cytoscape_User_Manual/Network_Formats

Instead, it looks like bioperl-network actually builds some kind of
internal representation of the network for further manipulation in Perl,
if I understand it correctly?
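
To give an idea of how little is involved: a SIF file is plain text with
one edge per line, in the form "source <TAB> interaction type <TAB> target".
A minimal sketch of producing such lines follows. The node names, the 'pp'
edge type and the helper name sif_lines() are made up for illustration and
are not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical putative interactions, as pairs of node identifiers.
my @putative_ppis = ( [ 'geneA', 'geneB' ], [ 'geneA', 'geneC' ] );

# Turn each pair into one tab-delimited SIF line: source, edge type, target.
sub sif_lines {
    my ( $type, @pairs ) = @_;
    return map { join( "\t", $_->[0], $type, $_->[1] ) } @pairs;
}

my @lines = sif_lines( 'pp', @putative_ppis );
print "$_\n" for @lines;
```

Redirecting the output to a file ending in .sif should be enough for
Cytoscape to import it as a network.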

Kind regards
Giuseppe

On 19/08/10 01:28, Siddhartha Basu wrote:

> Sounds interesting. I am currently playing around with a perl based webapp for displaying interactome
> using cytoscapeweb. Depending how your design pans out,  would be happy to
> use your module as a backend analysis layer. And on a related note,  you
> might want to have a look at bioperl-network and if there is any overlap
> might be worth contributing.
>
> -siddhartha
>




From xupeng86 at gmail.com  Thu Aug 19 04:02:48 2010
From: xupeng86 at gmail.com (xupeng)
Date: Thu, 19 Aug 2010 16:02:48 +0800
Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl"
	when use biosql database?
Message-ID: <201008191602.49068.xupeng86@gmail.com>

I've downloaded biosql-1.0.1.tar.gz. It works well, but I can't find
'load_seqdatabase.pl' when I try to import GenBank files into the
BioSQL database. Can anyone give me a copy of that file?

Many thanks!


From sunhanifk at gmail.com  Thu Aug 19 10:25:38 2010
From: sunhanifk at gmail.com (han sun)
Date: Thu, 19 Aug 2010 22:25:38 +0800
Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl
	5.12.1?
Message-ID: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>

Hello everyone,

I have used Perl for several months, and I now want to feel the power of
BioPerl. But it seems that installing it is more difficult than I thought.

I typed the following commands:

install-shell
rep add bioperl http://bioperl.org/DIST
rep add uwinnipeg http://cpan.uwinnipeg.ca/PPMPackages/12xx/
rep add trouchelle http://trouchelle.com/ppm12/
install BioPerl

However, the installation failed:

ppm install failed:
Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
Can't find any package that provides PostScript::TextBlock for Bundle-BioPerl-Core
Can't find any package that provides Ace:: for Bundle-BioPerl-Core
Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
Can't find any package that provides Convert::Binary::C for Bundle-BioPerl-Core
Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
Can't find any package that provides IPC::Run for GraphViz
Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
Can't find any package that provides List-MoreUtils for Moose
Can't find any package that provides List-MoreUtils for Class-MOP


Then I tried

install http://www.bribes.org/perl/ppm/GD.ppd

and tried the installation again, but it still didn't help.

Do you know what's wrong?

Please help me, thanks very much.


From cjfields1 at gmail.com  Thu Aug 19 10:33:26 2010
From: cjfields1 at gmail.com (Christopher Fields)
Date: Thu, 19 Aug 2010 09:33:26 -0500
Subject: [Bioperl-l] Could I install BioPerl on Windows with the
	ActivePerl 5.12.1?
In-Reply-To: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>
References: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>
Message-ID: <78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com>

Try using ActivePerl 5.10 instead of v5.12.  It's very possible the PPM won't work for v5.12 yet.

chris



From hlapp at drycafe.net  Thu Aug 19 10:53:22 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 10:53:22 -0400
Subject: [Bioperl-l] Why I can't find the perl script
	"load_seqdatabase.pl" when use biosql database?
In-Reply-To: <201008191602.49068.xupeng86@gmail.com>
References: <201008191602.49068.xupeng86@gmail.com>
Message-ID: <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>

The file comes with Bioperl-db, not BioSQL. That is so because it  
depends on BioPerl and on Bioperl-db, and so you will need to have  
both installed.

	-hilmar

On Aug 19, 2010, at 4:02 AM, xupeng wrote:

> 	I've downloaded the biosql-1.0.1.tar.gz. It works well. But I
> can't find the 'load_seqdatabase.pl' when I try to import the
> Genbank files into biosql database.
> 	Can anyone give me a copy of that file?
> many thanks !

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From hlapp at drycafe.net  Thu Aug 19 10:58:46 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 10:58:46 -0400
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
Message-ID: <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>

Yes, unfortunately they do. The feature for obviating them (namely  
nested transactions) is there in Pg 8.2+, but Bioperl-db doesn't use  
them yet ... I have to learn more about Class::DBIx first to decide  
whether it's better to first implement nested transactions in the home- 
grown ORM that Bioperl-db in essence is, or whether it's better to  
reimplement everything in Class::DBIx instead.

There are new datatypes in Bioperl, and relations in BioSQL that could  
hold them, and so I need to decide what's the way forward.

	-hilmar

On Aug 19, 2010, at 6:01 AM, Peter wrote:

> On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp <hlapp at drycafe.net> wrote:
>> Hi Dan,
>>
>> the casting isn't an issue anymore, I think. (And even if it were, there is
>> actually a small script that brings back the casts that were built into
>> 8.2.) Have you found an example where it still is?
>>
>>        -hilmar
>
> Hi Hilmar,
>
> Do the bioperl-db bindings for BioSQL on PostgreSQL still require those
> extra rules in the schema?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2839
>
> Peter

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From mmuratet at hudsonalpha.org  Thu Aug 19 11:00:52 2010
From: mmuratet at hudsonalpha.org (Michael Muratet)
Date: Thu, 19 Aug 2010 10:00:52 -0500
Subject: [Bioperl-l] Why I can't find the perl script
	"load_seqdatabase.pl" when use biosql database?
In-Reply-To: <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
References: <201008191602.49068.xupeng86@gmail.com>
	<14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
Message-ID: <C6FECD93-E599-465B-A93A-BD1F2CDFBE9C@hudsonalpha.org>


On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote:

> The file comes with Bioperl-db, not BioSQL. That is so because it  
> depends on BioPerl and on Bioperl-db, and so you will need to have  
> both installed.

Is load_seqdatabase.pl still the best method? I vaguely remember a  
post that said that load_seqdatabase was deprecated, but I can't find  
it in the archives.

Mike

Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806


From hlapp at drycafe.net  Thu Aug 19 11:29:31 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 11:29:31 -0400
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
Message-ID: <5F77404A-086D-4D0C-B3A5-F5119FCF878A@drycafe.net>


On Aug 19, 2010, at 11:09 AM, Chris Fields wrote:

> DBIx::Class


Did I have this in the wrong order :-) More coffee, please.
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From hlapp at drycafe.net  Thu Aug 19 11:30:26 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 11:30:26 -0400
Subject: [Bioperl-l] Why I can't find the perl script
	"load_seqdatabase.pl" when use biosql database?
In-Reply-To: <C6FECD93-E599-465B-A93A-BD1F2CDFBE9C@hudsonalpha.org>
References: <201008191602.49068.xupeng86@gmail.com>
	<14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
	<C6FECD93-E599-465B-A93A-BD1F2CDFBE9C@hudsonalpha.org>
Message-ID: <C5FD4B85-25B3-4D76-AA99-B3DBE42400C7@drycafe.net>

It's not deprecated. Unless I'm again mixing up something?

	-hilmar

On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote:

> Is load_seqdatabase.pl still the best method? I vaguely remember a
> post that said that load_seqdatabase was deprecated, but I can't
> find it in the archives.
>
> Mike

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From cjfields at illinois.edu  Thu Aug 19 11:09:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 19 Aug 2010 10:09:13 -0500
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
Message-ID: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>

I think it's worth exploring having a DBIx::Class-based middle-ware approach similar to what Rob Buels has done for Chado.  That would be fairly easy to get started using DBIx::Class::Schema::Loader.  

After that it would require optimization and tweaking, which is potentially more complex than Rob's setup as Chado is very Pg-specific, but maybe Rob can elaborate...

chris

On Aug 19, 2010, at 9:58 AM, Hilmar Lapp wrote:

> Yes, unfortunately they do. The feature for obviating them (namely nested transactions) is there in Pg 8.2+, but Bioperl-db doesn't use them yet ... I have to learn more about Class::DBIx first to decide whether it's better to first implement nested transactions in the home-grown ORM that Bioperl-db in essence is, or whether it's better to reimplement everything in Class::DBIx instead.
> 
> There are new datatypes in Bioperl, and relations in BioSQL that could hold them, and so I need to decide what's the way forward.
> 
> 	-hilmar


From cjfields at illinois.edu  Thu Aug 19 11:37:39 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 19 Aug 2010 10:37:39 -0500
Subject: [Bioperl-l] Why I can't find the perl script
	"load_seqdatabase.pl" when use biosql database?
In-Reply-To: <C5FD4B85-25B3-4D76-AA99-B3DBE42400C7@drycafe.net>
References: <201008191602.49068.xupeng86@gmail.com>
	<14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
	<C6FECD93-E599-465B-A93A-BD1F2CDFBE9C@hudsonalpha.org>
	<C5FD4B85-25B3-4D76-AA99-B3DBE42400C7@drycafe.net>
Message-ID: <68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu>

I don't recall this either.  So, can't blame it on lack of coffee :)

chris

On Aug 19, 2010, at 10:30 AM, Hilmar Lapp wrote:

> It's not deprecated. Unless I'm again mixing up something?
> 
> 	-hilmar


From mmuratet at hudsonalpha.org  Thu Aug 19 11:40:02 2010
From: mmuratet at hudsonalpha.org (Michael Muratet)
Date: Thu, 19 Aug 2010 10:40:02 -0500
Subject: [Bioperl-l] Why I can't find the perl script
	"load_seqdatabase.pl" when use biosql database?
In-Reply-To: <68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu>
References: <201008191602.49068.xupeng86@gmail.com>
	<14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
	<C6FECD93-E599-465B-A93A-BD1F2CDFBE9C@hudsonalpha.org>
	<C5FD4B85-25B3-4D76-AA99-B3DBE42400C7@drycafe.net>
	<68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu>
Message-ID: <A0AD0D4E-89EC-4FA0-8625-FF0A2EFB5669@hudsonalpha.org>


On Aug 19, 2010, at 10:37 AM, Chris Fields wrote:

> I don't recall this either.  So, can't blame it on lack of coffee :)

Thanks. I'll keep using it!

Mike

Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806


From cjfields at illinois.edu  Thu Aug 19 11:55:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 19 Aug 2010 10:55:54 -0500
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <EA0C23FB-8C2F-4C04-B0E8-4207409916DC@sbc.su.se>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
	<E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
	<83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
	<B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>
	<EA0C23FB-8C2F-4C04-B0E8-4207409916DC@sbc.su.se>
Message-ID: <5611499B-FA63-4A52-8279-99B554418374@illinois.edu>

On Aug 17, 2010, at 8:52 AM, Dave Messina wrote:

>> It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison
> 
> Yep, agreed.
> 
> And such a flag should be named for the non-default behavior, then, like: -ignore_IDs_for_overlaps
> 
> Dave

Probably would just be -ignore_ids as this behavior would have to be consistent across the various Bio::RangeI methods (overlaps, contains, etc).  The params are case-insensitive IIRC, so the _IDs would just be lc().

RangeI doesn't define a seq_id(), though, so we either use can() in RangeI (which is dirtier IMO) or define this in the appropriate class, probably LocationI or SeqFeatureI.
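
A rough sketch of the behaviour being discussed, using plain hashes rather than real Bio::RangeI objects. The function name ranges_overlap() and the hash keys are hypothetical, and -ignore_ids is only the flag name proposed in this thread, not an existing BioPerl option. IDs are checked by default and only compared when both ranges carry one.

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper, not the Bio::RangeI API: ranges are plain
# hashes with seq_id, start and end keys.
sub ranges_overlap {
    my ( $r1, $r2, %opt ) = @_;

    # Match parameter names case-insensitively, with an optional leading dash.
    my %arg = map { my $k = lc $_; $k =~ s/^-//; ( $k => $opt{$_} ) } keys %opt;

    unless ( $arg{ignore_ids} ) {
        my ( $id1, $id2 ) = ( $r1->{seq_id}, $r2->{seq_id} );

        # Only compare IDs when both ranges actually carry one.
        return 0 if defined $id1 && defined $id2 && $id1 ne $id2;
    }

    # Closed-interval coordinate overlap test.
    return ( $r1->{start} <= $r2->{end} && $r2->{start} <= $r1->{end} ) ? 1 : 0;
}

my $f1 = { seq_id => 'chr1', start => 100, end => 200 };
my $f2 = { seq_id => 'chr2', start => 150, end => 250 };

print ranges_overlap( $f1, $f2 ), "\n";                      # IDs differ
print ranges_overlap( $f1, $f2, -ignore_IDs => 1 ), "\n";    # coordinates only
```

With the default, features on different sequences never overlap; passing the flag falls back to the purely positional comparison.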

chris


From cjfields at illinois.edu  Thu Aug 19 11:56:11 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 19 Aug 2010 10:56:11 -0500
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
	<E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
	<83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
	<B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>
Message-ID: <7CF700A0-C7A0-4BD2-9757-50B693B3B614@illinois.edu>

Makes sense.  

chris

On Aug 17, 2010, at 7:45 AM, Scott Cain wrote:

> Hi Dave and Chris,
> 
> It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison and if somebody is doing the protein space comparison and not getting the expected results, they'll probably read the docs to find out why. 
> 
> Scott
> 
> --
> Scott Cain, Ph. D.
> scott at scottcain dot net
> Ontario Institute for Cancer Research
> http://gmod.org/
> 216 392 3087 
> 
> Snet from my iPhone.
> 
> On Aug 17, 2010, at 5:06 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
> 
>>> Good point; it's probably the context the methods are used that matters.  So, maybe just a document clarification?
>> 
>> That's always good, but it really doesn't solve the issue you're describing.
>> 
>> I mean, who would expect to get overlaps for features on different chromosomes?
>> 
>> To me, that's a clear violation of reasonable user expectations. You shouldn't have to read the docs about something like that.
>> 
>> So what's the solution for these duelling use cases? I haven't thought about it much, but a first approximation might be to add a -genomic boolean flag that, when true, would do the right thing and check the ID when doing overlaps or other positional comparisons.
>> 
>> (Maybe -genomic is too obscure. Maybe it should be -same_id_for_overlaps or something like that.)
>> 
>> And maybe having to know to set a flag is effectively the same thing as having to read the docs to understand SeqFeature's overlap behavior.
>> 
>> What do the rest of you out there think?
>> 
>> 
>> Dave


From David.Messina at sbc.su.se  Thu Aug 19 12:54:23 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 19 Aug 2010 18:54:23 +0200
Subject: [Bioperl-l]  Bug? Features with similar ranges,
	different IDs are considered overlapping
References: <83299B71-0F73-440D-A9C5-DC1DA2AFF605@davemessina.com>
Message-ID: <1EFB951F-AEE1-4B2A-9E29-114E40B25D21@sbc.su.se>

[Ccing list for real this time]

On Aug 19, 2010, at 17:55, Chris Fields <cjfields at illinois.edu> wrote:

> Probably would just be -ignore_ids

You're right, that's the way to go. 


> define this in the appropriate class, probably LocationI or 

Yep, that's cleaner.

Thanks!


Dave


From cjfields1 at gmail.com  Thu Aug 19 13:20:32 2010
From: cjfields1 at gmail.com (Christopher Fields)
Date: Thu, 19 Aug 2010 12:20:32 -0500
Subject: [Bioperl-l] Could I install BioPerl on Windows with the
	ActivePerl 5.12.1?
In-Reply-To: <AANLkTimBPL6Sr2kmg+f0t1j8pk_9nBAoqubKzY4AJoxo@mail.gmail.com>
References: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>
	<78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com>
	<AANLkTimBPL6Sr2kmg+f0t1j8pk_9nBAoqubKzY4AJoxo@mail.gmail.com>
Message-ID: <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com>

cc'ing list.  Looks like the BioPerl PPM is possibly broken for perl 5.12.  Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling...

chris

On Aug 19, 2010, at 11:29 AM, han sun wrote:

> v5.10 works,thanks.
> 
> 2010/8/19 Christopher Fields <cjfields1 at gmail.com>
> Try using ActivePerl 5.10 instead of v5.12.  It's very possible the PPM won't work for v5.12 yet.
> 
> chris
> 
> On Aug 19, 2010, at 9:25 AM, han sun wrote:
> 
> > Hello everyone,
> >
> > I have used Perl for several months, and I now want to feel the power of
> > BioPerl.
> > But it seems that installing it is more difficult than I thought.
> >
> > I typed the commands:
> >
> > install-shell
> >
> > rep add bioperl http://bioperl.org/DIST
> >
> > rep add uwinnipeg
> > http://cpan.uwinnipeg.ca/PPMPackages/12xx/<http://cpan.uwinnipeg.ca/PPMPackages/10xx/>
> >
> > rep add trouchelle http://trouchelle.com/ppm12/
> >
> > install BioPerl
> >
> > However, the installation failed:
> >
> > ppm install failed:
> > Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
> > Can't find any package that provides PostScript::TextBlock for
> > Bundle-BioPerl-Core
> > Can't find any package that provides Ace:: for Bundle-BioPerl-Core
> > Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
> > Can't find any package that provides Convert::Binary::C for
> > Bundle-BioPerl-Core
> > Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
> > Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
> > Can't find any package that provides IPC::Run for GraphViz
> > Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
> > Can't find any package that provides List-MoreUtils for Moose
> > Can't find any package that provides List-MoreUtils for Class-MOP
> >
> >
> > Then I tried
> >
> > install http://www.bribes.org/perl/ppm/GD.ppd
> >
> > and tried the installation again, but it still didn't help.
> >
> > Do you know what's wrong?
> >
> > Please help me, thanks very much.
> 
> 


From rmb32 at cornell.edu  Thu Aug 19 13:09:45 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 19 Aug 2010 10:09:45 -0700
Subject: [Bioperl-l] reminder: Aug 25 deadline for GMOD Hackathon application
Message-ID: <4C6D6559.3080809@cornell.edu>

Hi all,

This is your one-week reminder: the deadline for open applications to 
the GMOD Evo hackathon is Wednesday, August 25th.

Rob

========================================

We are seeking participants for the GMOD Tools for Evolutionary Biology
Hackathon, held November 8-12, 2010 at the US National Evolutionary
Synthesis Center (NESCent) in Durham, NC.

This hackathon targets three critical gaps in the capabilities of the
GMOD toolbox that currently limit its utility for evolutionary research:

  1. Visualization of comparative genomics data
  2. Visualization of phylogenetic data and trees
  3. Support for population diversity and phenotype data

If you are interested in these areas and have relevant expertise, you
are strongly encouraged to apply. Relevant areas of expertise include
more than just software development: if you are a GMOD power user,
visualization guru, domain expert (comparative, phylogenetics,
population, ...), or documentation wizard, then your skills are needed!

How To Apply:

Fill out the online application form at http://bit.ly/gmodevohack.
Applications are due August 25.

About GMOD:

GMOD is an intercompatible suite of open-source software components for
storing, managing, analyzing, and visualizing genome-scale data. GMOD
includes many widely-used software components: GBrowse and JBrowse, both
genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a
generic and modular database schema; CMap, a comparative map viewer; as
well as many other components including Apollo, MAKER, BioMart,
InterMine, and Galaxy. We hope to extend the functionality of existing
GMOD components, and integrate new components as well.

About Hackathons:

A hackathon is an intense event at which a group of programmers with
different backgrounds and skills collaborate hands-on and face-to-face
to develop working code that is of utility to the community as a whole.
The mix of people will include domain experts and computer-savvy end-users.

More details about the event, its motivation, organization, procedures,
and attendees, as well as URLs to the hackathon and related websites are
included below.

Sincerely,

The GMOD EvoHack Organizing Committee (and project affiliations as
relevant):

Nicole Washington, Chair (LBNL, modENCODE, Phenote)

Robert Buels (SGN, Chado NatDiv)

Scott Cain (OICR, GMOD)

Dave Clements (NESCent, GMOD)

Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv)

Sheldon McKay (University of Arizona, iPlant, GBrowse_syn)


-----------------------------

About the GMOD Evo Hackathon

Overview

We are organizing a hackathon to fill critical gaps in the capabilities
of the Generic Model Organism Database (GMOD) toolbox that currently
limit its utility for evolutionary research. Specifically, we will focus
on tools for

   1) viewing comparative genomics data;
   2) visualizing phylogenomic data; and
   3) supporting population diversity data and phenotype annotation.

The event will be hosted at NESCent and bring together a group of about
20+ software developers, end-user representatives, and documentation
experts who would otherwise not meet. The participants will include key
developers of GMOD components that currently lack features critical for
emerging evolutionary biology research, developers of informatics tools
in evolutionary research that lack GMOD integration, and
informatics-savvy biologists who can represent end-user requirements.

The event will provide a unique opportunity to infuse the GMOD developer
community with a heightened awareness of unmet needs in evolutionary
biology that GMOD components have the potential to fill, and for tool
developers in evolutionary biology to better understand how best to
extend or integrate with already existing GMOD components.

Before the Event

Discussion of ideas and sometimes even design actually starts well
before the hackathon, on mailing lists, wiki pages, and conference calls
set up among accepted attendees.  This advance work lays the foundation
for participants to be productive from the very first day.  This also
means that participants should be willing to contribute some time in
advance of the hackathon itself to participate in this preparatory
discussion.

During the Event

Typically, hackathon participants use the morning of the first day of
the event to organize themselves into working groups of between 3 and 6
people, each with a focused implementation objective.  Ideas and
objectives are discussed, and attendees coalesce around the projects in
which they have the most experience or interest.


Deliverables / Event Results

The meeting's attendance, working groups, and outcomes will be fully
logged and documented on the GMOD wiki (http://gmod.org). Each working
group during the event will typically have its own wiki page, linked
from the main EvoHack page, where it documents its minutes and design
notes, and provides links to the code and documentation it produces.
Also, since GMOD and NESCent are both committed to open source
principles, all code and documentation produced by participants during
the hackathon must be published under an OSI-approved open source
license. As contributions to existing GMOD tools, all hackathon products
will most likely satisfy this requirement automatically.

NESCent

This event is sponsored by the US National Evolutionary Synthesis Center
(NESCent, http://www.nescent.org) through its Informatics Whitepapers
program (http://www.nescent.org/informatics/whitepapers.php). NESCent
promotes the synthesis of information, concepts and knowledge to address
significant, emerging, or novel questions in evolutionary science and
its applications. NESCent achieves this by supporting research and
education across disciplinary, institutional, geographic, and
demographic boundaries (see http://www.nescent.org/science/proposals.php).

Links

Main GMOD EvoHack page, and full proposal:
http://gmod.org/wiki/GMOD_Evo_Hackathon

NESCent: http://www.nescent.org/
GMOD: http://gmod.org
Similar past NESCent events, see: http://hackathon.nescent.org/
GMOD hackathon application:  http://bit.ly/gmodevohack

-- 
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/GMOD_Europe_2010
http://gmod.org/wiki/Help_Desk_Feedback


From David.Messina at sbc.su.se  Thu Aug 19 14:55:50 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 19 Aug 2010 20:55:50 +0200
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq
	question
In-Reply-To: <4C6D7123.9080908@bcm.tmc.edu>
References: <4C6C3259.4060304@bcm.tmc.edu>
	<E8F0F7A7-BC33-4E37-8AAB-75A9470E82A5@sbc.su.se>
	<4C6D7123.9080908@bcm.tmc.edu>
Message-ID: <4E977318-05AC-4D8E-9A39-8C07A2419198@sbc.su.se>


Glad I could help, Caleb.

Dave


On Aug 19, 2010, at 20:00, Caleb Davis <cdavis at bcm.tmc.edu> wrote:

> Hi Dave,
> 
> Thank you so much for your detailed response! Fixing the reward parameter replicated the online result for me.  All of the other factors you brought up will help me track down any future problems. Thanks again.
> 
> --Caleb
> 


From rmb32 at cornell.edu  Thu Aug 19 18:19:11 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 19 Aug 2010 15:19:11 -0700
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
Message-ID: <4C6DADDF.1000103@cornell.edu>

Chris Fields wrote:
> I think it's worth exploring having a DBIx::Class-based middle-ware approach similar to what Rob Buels has done for Chado.  That would be fairly easy to get started using DBIx::Class::Schema::Loader.
> 
> After that it would require optimization and tweaking, which is potentially more complex than Rob's setup as Chado is very Pg-specific, but maybe Rob can elaborate...

Elaborating on how Bio::Chado::Schema is developed:

The vast majority of the code and POD in BCS is autogenerated by 
DBIx::Class::Schema::Loader.  DBICSL gives you a baseline set of 
DBIx::Class classes that covers all the tables, views, columns, unique 
constraints, and foreign key relationships.

Beyond that, you have to add on yourself.  In BCS, we have mostly done 
things like:

   * make better-named aliases for some of the autogenerated
     relationships (though DBICSL does a surprisingly good job of naming
     relationships automatically most of the time)
   * add a tiny bit of bioperl compatibility (this needs a lot more work
     by somebody, volunteers needed!)
   * add convenience methods for using some of the Chado property tables
   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
     traversing phylogenetic tree relationships

Regarding DB backend specificity, BCS isn't Pg-specific at all, because 
DBIx::Class itself goes to great lengths to be compatible (and 
performant!) with just about every relational database out there.  In 
fact, the BCS test suite deploys a Chado schema into a temporary SQLite 
database using DBIC::Schema's deploy() method, and runs all of its tests 
on that.  Very handy.

Chado's Pg-specific server-side functions can of course be called 
through BCS if they are present, but it's perfectly possible to use 
Chado without any of the server-side functions, and that is mostly the 
way I use it.

Rob


From David.Messina at sbc.su.se  Fri Aug 20 05:19:14 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 20 Aug 2010 11:19:14 +0200
Subject: [Bioperl-l] Git for the lazy
Message-ID: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se>

Hi everyone,

If you're like me and still getting up to speed with Git, you might find this helpful:

	http://www.spheredev.org/wiki/Git_for_the_lazy


Dave


From bgs500 at york.ac.uk  Fri Aug 20 09:07:50 2010
From: bgs500 at york.ac.uk (Ben Saville)
Date: Fri, 20 Aug 2010 14:07:50 +0100
Subject: [Bioperl-l] Problem Parsing BLAST output
Message-ID: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>

Hi Everyone,

I'm very much new to the world of sequence data analysis (and this  
mailing list!), and have reached a roadblock.

I have BLASTed some contigs against a series of databases that I  
created. From this I would like to parse through the data and separate  
it before extracting the information of interest at a later point. I  
would like to separate the data by query ID. I found the following  
Bioperl script;

#!/usr/bin/perl

use Bio::Search::Result::BlastResult;
use Bio::SearchIO;

my $report = Bio::SearchIO->new( -file=>'All_BCM_results.bls', -format => blast);
my $result = $report->next_result;
my %hits_by_query;
while (my $hit = $result->next_hit) {
   push @{$hits_by_query{$hit->name}}, $hit;
}

foreach my $qid ( keys %hits_by_query ) {
   my $result = Bio::Search::Result::BlastResult->new();
   $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
   my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' );
   $blio->write_result($result);
}

running this script resulted in the following error;

BlastResult::new(): Not adding iterations.

------------- EXCEPTION: Bio::Root::NoSuchThing -------------
MSG: No such iteration number: 0. Valid range=1-0
VALUE: The number zero (0)
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Search::Result::BlastResult::iteration /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:328
STACK: Bio::Search::Result::BlastResult::add_hit /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:258
STACK: /Users/bsaville/Desktop/Parsing_BLAST_by_query.pl:15
-------------------------------------------------------------

So I added a 1 to the constructor call on the line shown above, as the
message told me this was within the valid range:

my $result = Bio::Search::Result::BlastResult->new(1);

This produced the following error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Must define arrayref of Iterations when initializing a Bio::Search::Result::BlastResult

STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Search::Result::BlastResult::new /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:128
STACK: /Users/bsaville/Desktop/Parsing_BLAST_by_query.pl:14
-----------------------------------------------------------

I know that it is my inexperience that is causing this problem, but I  
really can't figure this out.

Regards
Ben Saville


From David.Messina at sbc.su.se  Fri Aug 20 09:48:28 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 20 Aug 2010 15:48:28 +0200
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
Message-ID: <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>

Hi Ben,

I would not use the script you posted; I don't think it does what you want.

If you haven't already, you should take a look at the beginners' HOWTO

	http://www.bioperl.org/wiki/HOWTO:Beginners


the SearchIO HOWTO

	http://www.bioperl.org/wiki/HOWTO:SearchIO


and the example scripts included with BioPerl:

	http://www.bioperl.org/wiki/Scripts


Incidentally, it's a lot of fiddly data processing to parse blast reports for many contigs against multiple databases and then go back and collate the results by query. I'm not sure exactly what you want to do once you've separated by query; if you provide some more information, we could suggest ways to best get you where you want to go.

I will mention, though, that BLAST has the ability to search multiple separate databases in one go and collate the results for you. So that's something to consider.
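
As a rough illustration of collating by query, here is a minimal sketch. It assumes the report parses with Bio::SearchIO, where each query in a report comes back as its own result object, so looping over next_result already separates the data by query. The helper name hits_by_query is hypothetical; next_result, query_name, next_hit and name are the usual SearchIO methods. The file name is taken from the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect hit names under the query they belong to. $searchio is any object
# with a next_result method, for example a Bio::SearchIO parser.
sub hits_by_query {
    my ($searchio) = @_;
    my %hits;
    while ( my $result = $searchio->next_result ) {
        my $query = $result->query_name;
        $hits{$query} ||= [];    # keep queries that have no hits at all
        while ( my $hit = $result->next_hit ) {
            push @{ $hits{$query} }, $hit->name;
        }
    }
    return \%hits;
}

# Run as: perl hits_by_query.pl All_BCM_results.bls
if (@ARGV) {
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new( -format => 'blast', -file => $ARGV[0] );
    my $hits = hits_by_query($in);
    for my $query ( sort keys %$hits ) {
        # One line per query: its name and the number of hits found for it.
        print join( "\t", $query, scalar @{ $hits->{$query} } ), "\n";
    }
}
```

Because the hash is keyed by query name, the same query seen in several results (one per database, say) ends up with all of its hits in one list.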


Dave


From bernd.web at gmail.com  Fri Aug 20 11:17:05 2010
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 20 Aug 2010 17:17:05 +0200
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
In-Reply-To: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
References: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
Message-ID: <AANLkTim2MyJ1XKmvYHr+8gX-j9h9z81==e5suTW09PWs@mail.gmail.com>

Hi Yin,

I am not quite sure if the following is also related to your gapped
length issue but I found I had to adapt the calculation of
ungapped_len in Bio::LocatableSeq. If my slices did not contain any
letters or a new gap char I used, SimpleAlign could not find the
sequences when outputting the alignment. This was due to a difference
in length calculation:

SimpleAlign: uses \W:  $slice_seq =~ s/\W//g;
Bio::LocatableSeq::ungapped_len uses  "$string =~ s/[\.\-]+//g;"

I had to include '~' (for my local sequences) in the ungapped_len;
otherwise I would run into the end issues with SimpleAlign.


Kind regards,
Bernd


On Fri, Aug 13, 2010 at 3:36 PM, Jun Yin <jun.yin at ucd.ie> wrote:
> Hi, all,
>
> I am the Google Summer of Code student working on Bio::Align subsystem
> refactoring. The code (Bio::SimpleAlign) I re-implemented now has passed
> nearly all the tests, except a few tests on seq/start-end testing. But here
> comes a problem. This may be an old issue, that the Bio::LocatableSeq end
> assignment and checking are inconsistent.
>
> The current end checking method is based on:
>
> $end=$seq->_ungapped_len+$seq->start-1
>
> However, this checking may not fit the real world case.
>
> The inconsistency usually happens when a few columns of the sequence are
> removed.
>
> For example:
>
> my $a = Bio::LocatableSeq->new(
>     -id     => 'a',
>     -strand => 1,
>     -seq    => '-tcgatc-atcgatcg',
>     -start  => 30,
>     -end    => 43
> );
>
> If we remove the 1st, 8th and the last columns
>
> $a->seq() will be 'tcgatcatcgatc'
> $a->_ungapped_len==12
>
> Actually, in the real world, the first residue will still be 30 (the old
> $seq->start), and the last residue is the residue before the 43 (the old
> $seq->end), thus 42.
>
> But if you call a validation, the calculation is
> $a->_ungapped_len+$a->start-1=12+30-1=41
>
> So the reassignment of the $seq->end will not pass the validation.
>
> So unless you save the information to a new sequence object, the original
> position information will be lost anyway. But in some cases, we have to
> change the sequence in its original sequence object ..
>
> What is your suggestion on this issue?
>
> A. pass the test and lose the information    #convenient in coding but the
> start-end annotation is not right any more
>
> B. keep the information and forget the test  #the object will still
> remember where the last residue was in the original sequence. But is it
> really meaningful at all? Because all the other residues may come from
> nowhere
>
> C. Neither of above #any other suggestions?
>
> Cheers,
>
> Jun Yin
> Ph.D. student in U.C.D.
>
> Bioinformatics Laboratory
> Conway Institute
> University College Dublin


From sidd.basu at gmail.com  Fri Aug 20 11:59:59 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Fri, 20 Aug 2010 10:59:59 -0500
Subject: [Bioperl-l]  Re: bioperl-db and postgres8.3 - status query
In-Reply-To: <4C6DADDF.1000103@cornell.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
	<4C6DADDF.1000103@cornell.edu>
Message-ID: <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>

Hi, 

On Thu, 19 Aug 2010, Robert Buels wrote:

> Chris Fields wrote:
> > I think it's worth exploring having a DBIx::Class-based middle-ware 
> > approach similar to what Rob Buels has done for Chado.  That would be 
> > fairly easy to get started using DBIx::Class::Schema::Loader.
> > After that it would require optimization and tweaking, which is 
> > potentially more complex than Rob's setup as Chado is very Pg-specific, 
> > but maybe Rob can elaborate...
>
> Elaborating on how Bio::Chado::Schema is developed:
>
> The vast majority of the code and POD in BCS is autogenerated by 
> DBIx::Class::Schema::Loader.  DBICSL gives you a baseline set of 
> DBIx::Class classes that covers all the tables, views, columns, unique 
> constraints, and foreign key relationships.
>
> Beyond that, you have to add on yourself.  In BCS, we have mostly done 
> things like:
>
>   * make better-named aliases for some of the autogenerated
>     relationships (though DBICSL does a surprisingly good job of naming
>     relationships automatically most of the time)
>   * add a tiny bit of bioperl compatibility (this needs a lot more work
>     by somebody, volunteers needed!)
>   * add convenience methods for using some of the Chado property tables
>   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
>     traversing phylogenetic tree relationships
>
> Regarding DB backend specificity, BCS isn't Pg-specific at all, because 
> DBIx::Class itself goes to great lengths to be compatible (and performant!) 
> with just about every relational database out there.  
I would vouch for that at least as far as chado in oracle is concerned.
So far, BCS works flawlessly with our oracle chado instance at
dictybase. Quite a chunk of BCS based code is also active in a couple of
our Mojo based webapps. The part which I still couldn't use directly is
the 'synonym' table as it clashes with oracle specific reserved keywords. 
However, overall it seems to be quite cross-RDBMS compatible and highly
recommended.

-siddhartha


>In fact, the BCS test 
> suite deploys a Chado schema into a temporary SQLite database using 
> DBIC::Schema's deploy() method, and runs all of its tests on that.  Very 
> handy.
>
> Chado's Pg-specific server-side functions can of course be called through 
> BCS if they are present, but it's perfectly possible to use Chado without 
> any of the server-side functions, and mostly the way I use it.
>
> Rob


From jun.yin at ucd.ie  Fri Aug 20 12:17:33 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 20 Aug 2010 17:17:33 +0100
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
In-Reply-To: <AANLkTim2MyJ1XKmvYHr+8gX-j9h9z81==e5suTW09PWs@mail.gmail.com>
References: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
	<AANLkTim2MyJ1XKmvYHr+8gX-j9h9z81==e5suTW09PWs@mail.gmail.com>
Message-ID: <000b01cb4083$31f98280$95ec8780$%yin@ucd.ie>

Hi, Bernd,

Thx for your input. 

Yes, this is one of the old bugs in Bio::SimpleAlign. $aln->slice simply
uses $slice_seq =~ s/\W//g to calculate the ungapped length.
But $seq->_ungapped_len uses $string =~
s{[$GAP_SYMBOLS$FRAMESHIFT_SYMBOLS]+}{}g;
with the symbols '\-\.=~\\\/', to calculate the ungapped length.
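
To make the difference between the two substitutions above concrete, here is a small sketch in plain Perl. It does not call BioPerl; the helper names len_nonword and len_symbols are hypothetical, and the symbol strings are simply copied from the values quoted in this thread.

```perl
use strict;
use warnings;

# Symbol strings as quoted in this thread for Bio::LocatableSeq.
my $GAP_SYMBOLS        = '\-\.=~';
my $FRAMESHIFT_SYMBOLS = '\\\/';

# Length after stripping every non-word character (the \W approach).
sub len_nonword {
    my ($seq) = @_;
    ( my $copy = $seq ) =~ s/\W//g;
    return length $copy;
}

# Length after stripping only the listed gap and frameshift symbols.
sub len_symbols {
    my ($seq) = @_;
    ( my $copy = $seq ) =~ s{[$GAP_SYMBOLS$FRAMESHIFT_SYMBOLS]+}{}g;
    return length $copy;
}

# A nucleotide row with '-' gaps, and a protein row containing a '*'.
for my $seq ( 'ac--gt', 'MK*LV--' ) {
    printf "%-8s \\W: %d  symbols: %d\n",
        $seq, len_nonword($seq), len_symbols($seq);
}
```

For 'ac--gt' both approaches give 4, but for 'MK*LV--' the \W version gives 4 while the symbol list gives 5, because '*' is a non-word character that is not in either symbol string.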

To solve this problem, I now first use 
$nonres = join("",$self->gap_char, $self->match_char,$self->missing_char);
which is '-\.&', to remove the non-residue chars in the alignment sequence
(though using '=', '~', '\' or '/' will still cause problems).

Secondly, I have merged slice, remove_columns and remove_gaps, using the
same internal function. Thus it is easier to debug.

These changes will be merged into the main BioPerl branch after the next
version.

But anyway, the conflict is still there, because the non-residue chars are
defined as:
In Bio::SimpleAlign, $aln->gap_char, $aln->missing_char, $aln->match_char
In Bio::LocatableSeq   
$GAP_SYMBOLS = '\-\.=~';
$FRAMESHIFT_SYMBOLS = '\\\/';

So try to use '-' or '.' for your gap char at the moment, otherwise you may
encounter end warnings in calculation.
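
As a sketch of where such end warnings come from, the check from the original post (end = start + ungapped length - 1) can be written in plain Perl. The helper expected_end and the '~' row are hypothetical and only illustrate the arithmetic; this is not the BioPerl implementation.

```perl
use strict;
use warnings;

# Expected end coordinate under the check end = start + ungapped length - 1.
# $gap_class is the body of a regex character class naming the symbols
# that do not count as residues.
sub expected_end {
    my ( $seq, $start, $gap_class ) = @_;
    ( my $residues = $seq ) =~ s/[$gap_class]+//g;
    return $start + length($residues) - 1;
}

# The sequence from the original post: 14 residues starting at 30.
print expected_end( '-tcgatc-atcgatcg', 30, '\-\.' ), "\n";

# Hypothetical alignment row that uses '~' as its gap character.
my $row = 'tcga~~tc';
print expected_end( $row, 30, '\-\.' ),  "\n";    # '~' counted as residues
print expected_end( $row, 30, '\-\.~' ), "\n";    # '~' treated as a gap
```

The first call gives 43, matching the -end in the original example. For the '~' row the two calls give 37 and 35, so two pieces of code that disagree about the gap symbols will also disagree about the end.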

And, if you want to keep gap only sequences, you can call the method as:
$aln2 = $aln->slice(20,30,1)
The last parameter is to keep gap only sequence.

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin



From cjfields at illinois.edu  Fri Aug 20 12:23:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 20 Aug 2010 11:23:07 -0500
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
	<4C6DADDF.1000103@cornell.edu>
	<20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
Message-ID: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>

On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote:
> Hi, 
> 
> On Thu, 19 Aug 2010, Robert Buels wrote:
> 
> > Chris Fields wrote:
> > > I think it's worth exploring having a DBIx::Class-based middle-ware 
> > > approach similar to what Rob Buels has done for Chado.  That would be 
> > > fairly easy to get started using DBIx::Class::Schema::Loader.
> > > After that it would require optimization and tweaking, which is 
> > > potentially more complex than Rob's setup as Chado is very Pg-specific, 
> > > but maybe Rob can elaborate...
> >
> > Elaborating on how Bio::Chado::Schema is developed:
> >
> > The vast majority of the code and POD in BCS is autogenerated by 
> > DBIx::Class::Schema::Loader.  DBICSL gives you a baseline set of 
> > DBIx::Class classes that covers all the tables, views, columns, unique 
> > constraints, and foreign key relationships.
> >
> > Beyond that, you have to add on yourself.  In BCS, we have mostly done 
> > things like:
> >
> >   * make better-named aliases for some of the autogenerated
> >     relationships (though DBICSL does a surprisingly good job of naming
> >     relationships automatically most of the time)
> >   * add a tiny bit of bioperl compatibility (this needs a lot more work
> >     by somebody, volunteers needed!)
> >   * add convenience methods for using some of the Chado property tables
> >   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
> >     traversing phylogenetic tree relationships
> >
> > Regarding DB backend specificity, BCS isn't Pg-specific at all, because 
> > DBIx::Class itself goes to great lengths to be compatible (and performant!) 
> > with just about every relational database out there.  
> I would vouch for that at least as far as chado in oracle is concerned.
> So,  far BCS works out flawlessly with our oracle chado instance at
> dictybase. Quite a chunk of BCS based code is also active in couple of
> our Mojo based webapps. The part which i still couldn't use directly is
> the 'synonym' table as it clashes with oracle specific reserved keywords. 
> However,  overall it seems to quite cross-RDMS compatible and highly
> recommended.
> 
> -siddhartha

Just to point out, I didn't say BCS is Pg-specific, but that Chado is
(that was the DBMS it was designed for).  Maybe that should be amended
to 'was' now :)

I recall seeing a page on this somewhere on the GMOD website along the
lines of "MySQL has problems so we chose Pg", and that Chado support
would focus on Pg.  I'm guessing that's no longer the case?  Or is only
the server-side stuff Pg-specific?

> >In fact, the BCS test 
> > suite deploys a Chado schema into a temporary SQLite database using 
> > DBIC::Schema's deploy() method, and runs all of its tests on that.  Very 
> > handy.
> >
> > Chado's Pg-specific server-side functions can of course be called through 
> > BCS if they are present, but it's perfectly possible to use Chado without 
> > any of the server-side functions, and mostly the way I use it.
> >
> > Rob

I think this opens up the possibility of starting a DBIx::Class-based
middleware solution.  Hilmar, did you want to take that on?

chris


From sidd.basu at gmail.com  Fri Aug 20 13:39:44 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Fri, 20 Aug 2010 12:39:44 -0500
Subject: [Bioperl-l]  Re: bioperl-db and postgres8.3 - status query
In-Reply-To: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
	<4C6DADDF.1000103@cornell.edu>
	<20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
	<1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
Message-ID: <20100820173942.GC400@vpn-165-124-164-118.vpn.northwestern.edu>

On Fri, 20 Aug 2010, Chris Fields wrote:

> On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote:
> > Hi, 
> > 
> > On Thu, 19 Aug 2010, Robert Buels wrote:
> > 
> > > Chris Fields wrote:
> > > > I think it's worth exploring having a DBIx::Class-based middle-ware 
> > > > approach similar to what Rob Buels has done for Chado.  That would be 
> > > > fairly easy to get started using DBIx::Class::Schema::Loader.
> > > > After that it would require optimization and tweaking, which is 
> > > > potentially more complex than Rob's setup as Chado is very Pg-specific, 
> > > > but maybe Rob can elaborate...
> > >
> > > Elaborating on how Bio::Chado::Schema is developed:
> > >
> > > The vast majority of the code and POD in BCS is autogenerated by 
> > > DBIx::Class::Schema::Loader.  DBICSL gives you a baseline set of 
> > > DBIx::Class classes that covers all the tables, views, columns, unique 
> > > constraints, and foreign key relationships.
> > >
> > > Beyond that, you have to add on yourself.  In BCS, we have mostly done 
> > > things like:
> > >
> > >   * make better-named aliases for some of the autogenerated
> > >     relationships (though DBICSL does a surprisingly good job of naming
> > >     relationships automatically most of the time)
> > >   * add a tiny bit of bioperl compatibility (this needs a lot more work
> > >     by somebody, volunteers needed!)
> > >   * add convenience methods for using some of the Chado property tables
> > >   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
> > >     traversing phylogenetic tree relationships
> > >
> > > Regarding DB backend specificity, BCS isn't Pg-specific at all, because 
> > > DBIx::Class itself goes to great lengths to be compatible (and performant!) 
> > > with just about every relational database out there.  
> > I would vouch for that, at least as far as Chado in Oracle is concerned.
> > So far, BCS works flawlessly with our Oracle Chado instance at
> > dictybase. Quite a chunk of BCS-based code is also active in a couple of
> > our Mojo-based webapps. The part which I still couldn't use directly is
> > the 'synonym' table, as it clashes with an Oracle-specific reserved keyword.
> > However, overall it seems to be quite cross-RDBMS compatible and highly
> > recommended.
> > 
> > -siddhartha
> 
> Just to point out, I didn't say BCS is Pg-specific, but that Chado is
> (that was the DBMS it was designed for).  Maybe that should be amended
> to 'was' now :)
> 
> I recall seeing a page on this somewhere on the GMOD website along the
> lines of "MySQL has problems so we chose Pg", and that Chado support
> would focus on Pg.  
As far as I understand, GMOD strongly recommends Pg, and it is the most popular
backend for Chado. However, my point was that if anybody wants to use or try out
the Chado schema on a different backend, or already has an existing setup,
tools like DBIx::Class, and BCS in particular, make it much easier to do
so. The code developed on top also becomes quite robust and portable.

-siddhartha 

>I'm guessing that's no longer the case?  Or is only
> the server-side stuff Pg-specific.
> 
> > >In fact, the BCS test 
> > > suite deploys a Chado schema into a temporary SQLite database using 
> > > DBIC::Schema's deploy() method, and runs all of its tests on that.  Very 
> > > handy.
> > >
> > > Chado's Pg-specific server-side functions can of course be called through 
> > > BCS if they are present, but it's perfectly possible to use Chado without 
> > > any of the server-side functions, and mostly the way I use it.
> > >
> > > Rob
> 
> I think this opens up the possibility of starting a DBIx::Class-based
> middleware solution.  Hilmar, did you want to take that on?
> 
> chris
> 
> 


From buiduyminh at gmail.com  Fri Aug 20 17:29:00 2010
From: buiduyminh at gmail.com (Minh Bui)
Date: Fri, 20 Aug 2010 17:29:00 -0400
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
Message-ID: <AANLkTinsyOMPJxpks_pqMwLpW8gx0VRihhJsLDnF53mu@mail.gmail.com>

Hi,
I am trying to load my GFF file into a MySQL database, but I got this error
when I ran bp_seqfeature_load.pl (BioPerl 1.6.1 on Mac):

[BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
contains: /sw/lib/perl5 /sw/lib/perl5/darwin
/System/Library/Perl/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/5.8.6
/Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
/Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
/Network/Library/Perl/5.8.6 /Network/Library/Perl
/System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
line 3.
Perhaps the DBD::mysql perl module hasn't been fully installed,
or perhaps the capitalisation of 'mysql' isn't right.
Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
 at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212

I am using Mac OS X version 10.4.10 and MAMP. Isn't
"/Library/Perl/5.8.6" already in @INC? What am I missing?
I have been googling this error for a few hours. I also installed
BioPerl and reinstalled DBD::mysql using CPAN. It still doesn't work.

Here is my $PERL5LIB:  /sw/lib/perl5:/sw/lib/perl5/darwin/

I really need help on this.
Thank you,


From awitney at sgul.ac.uk  Sat Aug 21 06:39:10 2010
From: awitney at sgul.ac.uk (Adam Witney)
Date: Sat, 21 Aug 2010 11:39:10 +0100
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
In-Reply-To: <AANLkTinsyOMPJxpks_pqMwLpW8gx0VRihhJsLDnF53mu@mail.gmail.com>
References: <AANLkTinsyOMPJxpks_pqMwLpW8gx0VRihhJsLDnF53mu@mail.gmail.com>
Message-ID: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>


On 20 Aug 2010, at 22:29, Minh Bui wrote:

> Hi,,
> I am trying to load my GFF file to mysql database but I got this error
> when I ran the bp_seqfeature_load.pl ( bioperl 1.6.1 on  MAC)
> 
> [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
> install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
> contains: /sw/lib/perl5 /sw/lib/perl5/darwin
> /System/Library/Perl/5.8.6/darwin-thread-multi-2level
> /System/Library/Perl/5.8.6
> /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
> /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
> /Network/Library/Perl/5.8.6 /Network/Library/Perl
> /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
> /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
> line 3.
> Perhaps the DBD::mysql perl module hasn't been fully installed,
> or perhaps the capitalisation of 'mysql' isn't right.
> Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
> at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212
> 
> I am using MAC OSX version 10.4.10 and MAMP? Isnt it the
> "/Library/Perl/5.8.6" already in @INC? What am I missing?
> I have been googling this error for a few hours. I also install
> Bioperl and reinstall DBD::mysql using CPAN. It still doesnt work..
> 
> Here is my $PERL5LIB:  /sw/lib/perl5:/sw/lib/perl5/darwin/


Where did DBD::mysql get installed? Can you verify that DBD/mysql.pm is actually in one of the directories listed above?
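
A quick way to check this is sketched below. It walks @INC for the perl that runs it and reports every directory that holds DBD/mysql.pm. The helper name find_module_file is made up for this example. If nothing is found, one possible cause (an assumption, it cannot be confirmed from the output above) is that CPAN installed the module for a different perl than the one running bp_seqfeature_load.pl, so the script also prints $^X, the path of the running perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Return every directory from @dirs that contains $relative_path.
# find_module_file is a hypothetical helper name used for illustration.
sub find_module_file {
    my ($relative_path, @dirs) = @_;
    my @found;
    for my $dir (@dirs) {
        next if ref $dir;    # @INC may also hold code references (hooks)
        my $path = File::Spec->catfile($dir, $relative_path);
        push @found, $dir if -e $path;
    }
    return @found;
}

print "Checked with perl: $^X\n";
my @hits = find_module_file('DBD/mysql.pm', @INC);
if (@hits) {
    print "DBD/mysql.pm found in: $_\n" for @hits;
}
else {
    print "DBD/mysql.pm is not in any \@INC directory of this perl\n";
}
```

Run it with the same perl that runs bp_seqfeature_load.pl, otherwise the @INC it inspects may differ.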


From i.hatethispart at ymail.com  Sat Aug 21 10:07:28 2010
From: i.hatethispart at ymail.com (keiko)
Date: Sat, 21 Aug 2010 07:07:28 -0700 (PDT)
Subject: [Bioperl-l] clustalw.exe
In-Reply-To: <3612399.post@talk.nabble.com>
References: <3612399.post@talk.nabble.com>
Message-ID: <29499435.post@talk.nabble.com>


Katrin wrote:
> 
> hello, I am a new Perl/Bioperl-User and first I must excuse me for my
> really bad english, but I hope everybody will understand me. I have the
> following problem: In my Perl-skript is the following system call:
> $y=exec("C:\\Programme\\xampp-win32-1.5.1\\xampp\\perl\\clustalw.exe
> C:\\Programme\\xampp-win32-1.5.1\\xampp\\htdocs\\gene\\clustal.fasta"); If
> I call this Script with the Shell (cmd.exe) everything works correctly.
> But if I call this script with PHP I get the following error message:
> Error: unknown option
> /C:\Programme\xampp-win32-1.5.1\xampp\htdocs\gene\clustal.fasta. I tried
> also system and qx. And I tested the environment variables: I wrote a
> bat-file with the definition of all environment-variables and the system
> call, but this did not work, too. The same problem is in php. The
> PHP-Scipt is called from html and I worked under WindowsXP with xampp. I
> hope, somebody can help me. greetings Katrin
> 

Hi. I also have a problem with this one. I want to call clustalw using PHP.
Can I ask what you included in your bat-file and where you downloaded your
clustal? Thanks a lot!
-- 
View this message in context: http://old.nabble.com/clustalw.exe-tp3612399p29499435.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From jason at bioperl.org  Sun Aug 22 14:29:30 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 22 Aug 2010 11:29:30 -0700
Subject: [Bioperl-l] Enquiry on Bio::DB::Taxonomy
In-Reply-To: <AANLkTik9qpKSQV9dRKzxSrt_q5qq=g6X6eop8LTqkRVm@mail.gmail.com>
References: <AANLkTik9qpKSQV9dRKzxSrt_q5qq=g6X6eop8LTqkRVm@mail.gmail.com>
Message-ID: <4C716C8A.3010000@bioperl.org>

Hi Amali -

This is how I'd print out the full classification by using the Tree 
methods (with probably a different way of initializing the $db object to 
your flatfiles location).

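#!/usr/bin/perl -w
use strict;
use Bio::DB::Taxonomy;
use Bio::Tree::Tree;    # loaded explicitly, since the tree is built below

# point these at your local copies of the NCBI taxonomy dump files
my $db= Bio::DB::Taxonomy->new(-source => 'flatfile',
                    -nodesfile => 'taxonomy/nodes.dmp',
                    -namesfile => 'taxonomy/names.dmp');

my $taxonid = $db->get_taxonid('Homo sapiens');
my $taxon = $db->get_taxon(-taxonid => $taxonid);
# build a tree from the taxon's lineage, then list every node in it
my $tree = Bio::Tree::Tree->new(-node => $taxon);
my @taxa = $tree->get_nodes;
# each node is a Bio::Taxon, so $_->rank gives its rank as well
print join(",", map { $_->scientific_name } @taxa), "\n";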

-jason

Amali Thrimawithana wrote, On 8/18/10 3:56 PM:
> Dear Dr Stajich,
>
> I am a Masters student at Auckland university and my research is on
> identifying yeast species present in wine by the use of 454 sequencing. In
> order to carry out this research, a pipeline is being built in which at the
> final step each representative OTU need to be classified at different
> taxonomic levels (ie: at Phylum, family, class, genus and species) by using
> the results from BLAST. To identify the sequences at each taxonomic level, I
> have been trying out the Bio::DB::Taxonomy module in bioperl. Using this
> module, I am able to get the genus and species level by splitting the
> scientific name returned by the Bio::taxon object. But unfortunately I am
> uncertain on how to get the information for the other levels of the rank. I
> have tried several commands including "my @class = $node->classification;",
> but it does not work. Hence, could you please let me know how I might be
> able to get the higher levels of taxonomy such as class and phylum using
> bioperl?
>
> Look forward to hearing from you soon
>
> Thanking You
>
> Amali
>    


From cjfields at illinois.edu  Sun Aug 22 15:56:58 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 22 Aug 2010 14:56:58 -0500
Subject: [Bioperl-l] clustalw.exe
In-Reply-To: <29499435.post@talk.nabble.com>
References: <3612399.post@talk.nabble.com> <29499435.post@talk.nabble.com>
Message-ID: <E6C6AE4B-A6AB-4B90-8D81-74DE14B165BD@illinois.edu>

On Aug 21, 2010, at 9:07 AM, keiko wrote:

> Katrin wrote:
>> 
>> hello, I am a new Perl/Bioperl-User and first I must excuse me for my
>> really bad english, but I hope everybody will understand me. I have the
>> following problem: In my Perl-skript is the following system call:
>> $y=exec("C:\\Programme\\xampp-win32-1.5.1\\xampp\\perl\\clustalw.exe
>> C:\\Programme\\xampp-win32-1.5.1\\xampp\\htdocs\\gene\\clustal.fasta"); If
>> I call this Script with the Shell (cmd.exe) everything works correctly.
>> But if I call this script with PHP I get the following error message:
>> Error: unknown option
>> /C:\Programme\xampp-win32-1.5.1\xampp\htdocs\gene\clustal.fasta. I tried
>> also system and qx. And I tested the environment variables: I wrote a
>> bat-file with the definition of all environment-variables and the system
>> call, but this did not work, too. The same problem is in php. The
>> PHP-Scipt is called from html and I worked under WindowsXP with xampp. I
>> hope, somebody can help me. greetings Katrin
>> 
> 
> Hi. I also have a problem with this one. I want to call clustalw using php.
> Can I ask what you included in your bat-file and where did you download your
> clustal? thanks a lot!

Not sure, but what does this have to do with BioPerl?

chris


From jason at bioperl.org  Mon Aug 23 11:56:47 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 23 Aug 2010 08:56:47 -0700
Subject: [Bioperl-l] a problem when using the Bioperl modules
In-Reply-To: <AANLkTinZYJC6JwP776K3phzbAmtjiKMi_K_VTH=B6oeC@mail.gmail.com>
References: <AANLkTinZYJC6JwP776K3phzbAmtjiKMi_K_VTH=B6oeC@mail.gmail.com>
Message-ID: <4C729A3F.7080304@bioperl.org>

Wei -

Please ask your questions on the bioperl mailing list; I cannot answer 
questions directly for all requests.
I have answered this problem on the list before, so I urge you to 
use the list archives as a starting point.

The sequence lines in your fasta file aren't all the same length.

You need to run this:
bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW
mv NEW ORIGINAL

or with sreformat
sreformat fasta ORIGINAL > NEW
mv NEW ORIGINAL
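
If neither tool is at hand, the same re-wrapping can be sketched in plain Perl. This is only an illustration of the idea, not what bp_sreformat does internally; the function name rewrap_fasta and the width of 60 are choices made for this example. It holds one whole sequence in memory at a time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Re-wrap every FASTA record read from $in to a fixed line width on $out.
# rewrap_fasta is a hypothetical helper, not part of BioPerl.
sub rewrap_fasta {
    my ($in, $out, $width) = @_;
    my $seq = '';
    my $flush = sub {
        # Emit the buffered sequence in chunks of at most $width characters.
        print {$out} "$1\n" while $seq =~ /\G(.{1,$width})/g;
        $seq = '';
    };
    while (my $line = <$in>) {
        $line =~ s/\s+\z//;          # strip newline, CR and trailing spaces
        next if $line eq '';
        if ($line =~ /^>/) {
            $flush->();              # finish the previous record first
            print {$out} "$line\n";
        }
        else {
            $seq .= $line;
        }
    }
    $flush->();                      # finish the last record
    return;
}

# Usage: perl rewrap.pl ORIGINAL NEW
if (@ARGV == 2) {
    my ($src, $dst) = @ARGV;
    open(my $in,  '<', $src) or die "Cannot read $src: $!\n";
    open(my $out, '>', $dst) or die "Cannot write $dst: $!\n";
    rewrap_fasta($in, $out, 60);
    close($out) or die "Cannot close $dst: $!\n";
}
```

As with the commands above, write to a new file and only move it over the original once it looks right.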


Guifeng Wei wrote, On 8/23/10 4:57 AM:
> Dear professor Stajich,
> So sorry to interrupt you. i came across a problem when i use the 
> Bio::DB::Fasta modules of BioPerl.  The aim i want to arrive at is to 
> extract the subsequences accoording to the *.bed files which are the 
> C.elegans genomic sequnece annotation.  The code i programed is in the 
> attached file.
> The genomic sequences file contains sequences from 6 chromosomes of 
> C.elegans.
> when i run this program in the command line, the following error 
> warnings was coming.
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
>     Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw 
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::Fasta::calculate_offsets 
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> STACK: Bio::DB::Fasta::index_file 
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:680
> STACK: Bio::DB::Fasta::new 
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:491
> STACK: bed_to_fasta.pl:14
> -----------------------------------------------------------
> indexing was interrupted, so unlinking 
> /home/wgf/WORM_DATA/elegans.WS190.dna.fa.index at 
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053.
>
> and therefore i write to you in hope that you can help me solve this 
> problem,as well as, give me some suggestion about how to learn Bioperl 
> well.
> thank you very very much.
> yours sincerely
> Wei Guifeng


From jason.stajich at ucr.edu  Mon Aug 23 11:58:07 2010
From: jason.stajich at ucr.edu (Jason Stajich)
Date: Mon, 23 Aug 2010 08:58:07 -0700
Subject: [Bioperl-l] a problem when using the Bioperl modules
In-Reply-To: <AANLkTinrqwQCho_obj-_9MvQAyLEBVvaFA+HzJpFKovS@mail.gmail.com>
References: <AANLkTinZYJC6JwP776K3phzbAmtjiKMi_K_VTH=B6oeC@mail.gmail.com>
	<AANLkTinrqwQCho_obj-_9MvQAyLEBVvaFA+HzJpFKovS@mail.gmail.com>
Message-ID: <4C729A8F.1070506@ucr.edu>

You haven't defined the variable $db. Don't skip the part that 
initializes the Bio::DB::Fasta object, which you asked about 
previously.
Please send all your future queries to the mailing list.


Guifeng Wei wrote, On 8/23/10 8:14 AM:
> Dear professor,
> after that, i revised my scripts, which is that i divide the genomic 
> sequences into 7 single file, every file contains the sequence from a 
> chromosome.
> however, when i try to run the scripts, the following error was coming.
> Can't call method "seq" on an undefined value at bed_to_fasta.pl
> line 29, <IN> line 1.
> while(<IN>){
>         chomp $_;
>         my @bed=split(/\s+/, $_ );
>     #print length($db->seq('chrI'));
>         my $chr_id=$bed[0];
>         my $start=$bed[1];
>         my $end=$bed[2];
>         my $seq_name=$bed[3];
>         my $strand=$bed[5];
> my $segment =  $db ->seq($chr_id,$start=>$end);
>         print ">",$seq_name,"_",$chr_id,":",$start=>$end;
>         print "$segment\n";
> }
> the blue line is .
> why?

-- 
Jason E. Stajich, PhD
Assistant Professor
Department of Plant Pathology & Microbiology
University of California
Riverside, CA 92521
jason.stajich at ucr.edu
office: 951.827.2363

http://lab.stajich.org/
http://twitter.com/stajichlab
http://fungalgenomes.org/blog/

http://plantpathology.ucr.edu/
http://genomics.ucr.edu/
http://cepceb.ucr.edu/


From guifengwei at gmail.com  Mon Aug 23 22:44:57 2010
From: guifengwei at gmail.com (Guifeng Wei)
Date: Tue, 24 Aug 2010 10:44:57 +0800
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
Message-ID: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>

Hi,

I came across a problem when using the Bio::DB::Fasta module of
BioPerl. My aim is to extract subsequences according to the *.bed
files that hold the C. elegans genomic sequence annotation.

When I tried to run the script I wrote, I got the following error
message:

Can't call method "seq" on an undefined value at bed_to_fasta.pl line 28,
<IN> line 1.

So I am asking for help to solve this problem.
Here is my Perl script.

#!/usr/bin/perl -w
# Purpose: extract sequences from genomic sequences
use strict;
use Bio::DB::Fasta;
open(IN,$ARGV[0]) || die "sorry, the program cannot open the .bed file, plea
check it. \n";
my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' );
# The dir ...../elegans190.dna/ includes 6 files:
# chrI, chrII, chrIII, chrIV, chrV, chrX;
# each holds the sequence of the corresponding chromosome.

while(<IN>){
        chomp $_;
        my @bed=split(/\s+/, $_ );

        my $chr_id=$bed[0];
        my $start=$bed[1];
        my $end=$bed[2];
        my $seq_name=$bed[3];
        my $strand=$bed[5];

        my $segment =  $db->seq( $chr_id, $start=>$end );

        print ">",$seq_name,"_",$chr_id,":",$start=>$end;
        print "$segment\n";

}

close(IN);
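
Separately from the undefined $db, the coordinate handling in the loop may need care. The sketch below assumes the input follows the usual BED convention (0-based start, exclusive end) and that the sequence lookup expects 1-based inclusive coordinates, so it adds 1 to the start and leaves the end alone. It also puts a "-" between start and end and ends the header with a newline, which the print statement above does not. The function name parse_bed_line and the fallback name and strand are choices made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn one BED line into 1-based inclusive coordinates plus a FASTA header.
# parse_bed_line is a hypothetical helper name used for illustration.
sub parse_bed_line {
    my ($line) = @_;
    chomp $line;
    my @bed = split /\s+/, $line;
    return if @bed < 3;                       # not a usable BED record
    my ($chr, $bed_start, $bed_end) = @bed[0 .. 2];
    my $start  = $bed_start + 1;              # BED start is 0-based
    my $end    = $bed_end;                    # BED end is exclusive, so unchanged
    my $name   = defined $bed[3] ? $bed[3] : 'unnamed';
    my $strand = defined $bed[5] ? $bed[5] : '+';
    return {
        chr    => $chr,
        start  => $start,
        end    => $end,
        strand => $strand,
        header => ">${name}_${chr}:${start}-${end}",
    };
}

my $rec = parse_bed_line("chrI\t99\t200\tgeneA\t0\t-\n");
print "$rec->{header}\n";                     # prints ">geneA_chrI:100-200"

# Inside the while loop this would be used roughly as:
#   my $rec = parse_bed_line($_) or next;
#   my $segment = $db->seq($rec->{chr}, $rec->{start} => $rec->{end});
#   print "$rec->{header}\n$segment\n";
```

The strand is parsed but not acted on here; how a minus-strand feature should be extracted is left open.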


From florent.angly at gmail.com  Tue Aug 24 01:06:21 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 24 Aug 2010 15:06:21 +1000
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
References: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
Message-ID: <4C73534D.6080607@gmail.com>

  Hi Guifeng,

 From the Bio::DB::Fasta documentation:
>        $db = Bio::DB::Fasta->new($fasta_path [,%options])
>          Create a new Bio::DB::Fasta object from the Fasta file or files
>          indicated by $fasta_path.  Indexing will be performed
>          automatically if needed.  If successful, new() will return the
>          database accessor object.  Otherwise it will return undef.

Hence, after you create the database object $db, you should check that 
it was successful, e.g.:
> my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' );
> if (not defined $db) {
>   die "There was a problem creating the database\n";
> }
A problem creating the database would explain the message you get.

If the extension of the FASTA files in the directory path that you gave 
as input is not fa, fasta, fast, FA, FASTA, FAST or dna, then you should 
use the -glob option when constructing your database object. From the 
documentation:
>           -glob   Glob expression to use for searching for Fasta
>                   files in directories.
>                   Default: *.{fa,fasta,fast,FA,FASTA,FAST,dna}
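
To see which files in a directory that default pattern would pick up, something like the following sketch can help. It uses only Perl's built-in glob, not Bio::DB::Fasta itself, so it shows what the pattern matches rather than what the module does; the function name files_matching is made up for this example, and it assumes the directory path contains no spaces (glob splits its argument on whitespace).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# List the plain files in $dir that match a shell-style glob pattern.
# files_matching is a hypothetical helper name used for illustration.
sub files_matching {
    my ($dir, $pattern) = @_;
    $dir =~ s{/+\z}{};                        # avoid a doubled slash
    my @files = grep { -f $_ } glob("$dir/$pattern");
    @files = sort @files;
    return @files;
}

# Default pattern as quoted from the documentation above.
my $default_glob = '*.{fa,fasta,fast,FA,FASTA,FAST,dna}';
my $dir = @ARGV ? $ARGV[0] : '.';
my @files = files_matching($dir, $default_glob);
if (@files) {
    print "Matches the default glob: $_\n" for @files;
}
else {
    print "No files in $dir match $default_glob\n";
}
```

If nothing matches, passing a -glob option that fits your file names to new() would be the thing to try.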


Florent


On 24/08/10 12:44, Guifeng Wei wrote:
> Hi,
>
> i came across a problem when i use the Bio::DB::Fasta modules of
> BioPerl. The aim i want to arrive at is to extract the subsequences
> accoording to the *.bed files which are the C.elegans genomic sequnece
> annotation.
>
> when i tried to run the scripts i wrote, the error message was coming, as
> follows:
>
> Can't call method "seq" on an undefined value at bed_to_fasta.pl line 28,
> <IN>  line 1.
>
> so, ask for favor to slove this problem.
> Here is my perl scripts.
>
> #!/usr/bin/perl -w
> # Purpose: extract sequences from genomic sequences
> use strict;
> use Bio::DB::Fasta;
> open(IN,$ARGV[0]) || die "sorry, the program cannot open the .bed file, plea
> check it. \n";
> my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' );
> # The dir ...../elegans190.dna/ includes 6
> files:chrI,chrII,chrIII,chrIV,chrV,chrX,
> #each stands for the sequences from the coressponding chromosome.
>
> while(<IN>){
>          chomp $_;
>          my @bed=split(/\s+/, $_ );
>
>          my $chr_id=$bed[0];
>          my $start=$bed[1];
>          my $end=$bed[2];
>          my $seq_name=$bed[3];
>          my $strand=$bed[5];
>
>          my $segment =  $db->seq( $chr_id, $start=>$end );
>
>          print ">",$seq_name,"_",$chr_id,":",$start=>$end;
>          print "$segment\n";
>
> }
>
> close(IN);
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From guifengwei at gmail.com  Tue Aug 24 07:28:16 2010
From: guifengwei at gmail.com (Guifeng Wei)
Date: Tue, 24 Aug 2010 19:28:16 +0800
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
References: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
Message-ID: <AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>

Hi,

I have revised my script according to the previous email from Florent.
However, there are still some errors, which is very frustrating.

The errors are as follows:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Each line of the fasta entry must be the same length except the last.
    Line above #301451 '
..' is 22 != 51 chars.
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::DB::Fasta::calculate_offsets
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
STACK: Bio::DB::Fasta::index_dir
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
STACK: Bio::DB::Fasta::new
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
STACK: bed2fasta.pl:13
-----------------------------------------------------------
indexing was interrupted, so unlinking
/home/wgf/elegans190.dna//directory.index at
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053
But the directory /home/wgf/elegans190.dna/ contains 6 files, each
holding the complete sequence of a single chromosome in FASTA format.
The extension of the FASTA files is .fa. Every file starts with a
">chromosoemeXXX" header followed by thousands of sequence lines.

So it warns me that "Each line of the fasta entry must be the
same length except the last" and that "indexing was interrupted, so unlinking
/home/wgf/elegans190.dna//directory".

I am quite confused about this, so I am asking for help.

Wei Guifeng


From biopython at maubp.freeserve.co.uk  Tue Aug 24 09:28:33 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 24 Aug 2010 14:28:33 +0100
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To: <AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>
References: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
	<AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>
Message-ID: <AANLkTi=Nn7m1_6mPoiUcmJNsBoFu4eh-pO9QJaVipOU0@mail.gmail.com>

On Tue, Aug 24, 2010 at 12:28 PM, Guifeng Wei <guifengwei at gmail.com> wrote:
> Hi,
>
> i have revised my scripts according to the previous email from Florent.
> However, there were still some errors which frustrated me so much.
>
> The errors are as follows:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
>     Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::Fasta::calculate_offsets
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> STACK: Bio::DB::Fasta::index_dir
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
> STACK: Bio::DB::Fasta::new
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
> STACK: bed2fasta.pl:13
> -----------------------------------------------------------
> indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory.index at
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053
> But in the directory /home/wgf/elegans190.dna/ , it concludes 6 files,
> each contains the complete sequences from one single chromosome, the format
> is fasta. The extension of the FASTA files is .fa. Every single file is
> started as ">chromosoemeXXX" followed by the thousands of sequences.
>
> and therefore, it warn me that "Each line of the fasta entry must be the
> same length except the last". and "indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory".
>
> i was much confused about this. so for help.
>
> Wei Guifeng

Hi Wei,

It sounds like there is inconsistent line wrapping in your FASTA file.
This is often not a problem at all, but the DB indexing system (and
indeed other indexing tools like the samtools fasta index) requires
that all the lines of an entry have the same wrapping.

e.g. This is a valid FASTA file but would not be suitable for indexing:

>Test
ACGTACGT
ACGTACGT
ACGTACGT
ACGT
ACGT
T

Ignoring the final line (a special case, here of length one), that entry
uses a mixture of line lengths, 8 and 4. Had you used this instead, it
should be fine:

>Test
ACGTACGT
ACGTACGT
ACGTACGT
ACGTACGT
T

All the lines are now wrapped at length 8 (and the final line is
less than or equal to length 8).

Of course, in a real file wrapping at 60 or 80 characters is more
common ;)
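
To find where the wrapping goes wrong, a small checker along these lines may help. It is a sketch of the rule described above, not the check Bio::DB::Fasta itself performs: within each record, every sequence line except the last must have the same length as the first. The function name find_wrapping_problems is made up, blank lines are skipped as an assumption, and the length of the final line is not checked.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Report sequence lines that are not the last line of their record but
# differ in length from the first sequence line of that record.
# find_wrapping_problems is a hypothetical helper name.
sub find_wrapping_problems {
    my ($fh) = @_;
    my ($id, $width, $prev_len, $prev_no, @problems);
    my $line_no = 0;
    while (my $line = <$fh>) {
        $line_no++;
        $line =~ s/[\r\n]+\z//;
        if ($line =~ /^>(\S*)/) {
            # New record: forget the state of the previous one.
            ($id, $width, $prev_len, $prev_no) = ($1, undef, undef, undef);
            next;
        }
        next if !defined $id || $line eq '';  # assumption: blank lines are skipped
        if (defined $prev_len) {
            # Another sequence line follows, so the previous line was not
            # the last one and must match the width of the first line.
            $width = $prev_len unless defined $width;
            push @problems,
                "$id: line $prev_no has $prev_len chars, expected $width"
                if $prev_len != $width;
        }
        ($prev_len, $prev_no) = (length($line), $line_no);
    }
    return @problems;
}

# Usage: perl check_wrap.pl FILE.fa
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Cannot read $ARGV[0]: $!\n";
    my @problems = find_wrapping_problems($fh);
    print @problems ? join("\n", @problems) . "\n" : "Wrapping looks consistent\n";
}
```

On the first example above it reports the two lines of length 4; on the second it reports nothing.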

Peter


From cjfields at illinois.edu  Tue Aug 24 09:38:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 08:38:45 -0500
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To: <AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>
References: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
	<AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>
Message-ID: <995BCF30-99B2-46C2-A4E8-681F9E2A0BB5@illinois.edu>

Guifeng,

Did you follow Jason's advice yesterday about reformatting the FASTA file to a consistent line length?  Or check the database object itself?  Both points were reiterated by Florent and Peter.

From Jason's last response:

-------------------------
Wei -

Please ask your questions on the bioperl mailing list, I cannot answer questions directly for all requests.
Your problem has been answered by me on the list before so I urge you to use the list archives as a starting point.

The line lengths of the fasta file sequence aren't the same length.

you need to run this
bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW
mv NEW ORIGINAL

or with sreformat
sreformat fasta ORIGINAL > NEW
mv NEW ORIGINAL
-------------------------

chris


On Aug 24, 2010, at 6:28 AM, Guifeng Wei wrote:

> Hi,
> 
> i have revised my scripts according to the previous email from Florent.
> However, there were still some errors which frustrated me so much.
> 
> The errors are as follows:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
>   Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::Fasta::calculate_offsets
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> STACK: Bio::DB::Fasta::index_dir
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
> STACK: Bio::DB::Fasta::new
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
> STACK: bed2fasta.pl:13
> -----------------------------------------------------------
> indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory.index at
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053
> But in the directory /home/wgf/elegans190.dna/ , it concludes 6 files,
> each contains the complete sequences from one single chromosome, the format
> is fasta. The extension of the FASTA files is .fa. Every single file is
> started as ">chromosoemeXXX" followed by the thousands of sequences.
> 
> and therefore, it warn me that "Each line of the fasta entry must be the
> same length except the last". and "indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory".
> 
> i was much confused about this. so for help.
> 
> Wei Guifeng


From scott at scottcain.net  Tue Aug 24 11:01:47 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 24 Aug 2010 11:01:47 -0400
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
	<4C6DADDF.1000103@cornell.edu>
	<20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
	<1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
Message-ID: <AANLkTin01uf32_1G2+d8PA2YEtw3UfB5FK+CVPnLCD81@mail.gmail.com>

Hi Chris,

GMOD still only supports Chado with Postgres (for example, the GFF
loader assumes a Postgres database), but when I reengineered the GFF
loader a few years ago, I designed it with subclassing in mind, so that
it could be extended to work with other RDBMSs.

Scott


On Fri, Aug 20, 2010 at 12:23 PM, Chris Fields <cjfields at illinois.edu> wrote:
> On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote:
>> Hi,
>>
>> On Thu, 19 Aug 2010, Robert Buels wrote:
>>
>> > Chris Fields wrote:
>> > > I think it's worth exploring having a DBIx::Class-based middle-ware
>> > > approach similar to what Rob Buels has done for Chado.  That would be
>> > > fairly easy to get started using DBIx::Class::Schema::Loader.
>> > > After that it would require optimization and tweaking, which is
>> > > potentially more complex than Rob's setup as Chado is very Pg-specific,
>> > > but maybe Rob can elaborate...
>> >
>> > Elaborating on how Bio::Chado::Schema is developed:
>> >
>> > The vast majority of the code and POD in BCS is autogenerated by
>> > DBIx::Class::Schema::Loader.  DBICSL gives you a baseline set of
>> > DBIx::Class classes that covers all the tables, views, columns, unique
>> > constraints, and foreign key relationships.
>> >
>> > Beyond that, you have to add on yourself.  In BCS, we have mostly done
>> > things like:
>> >
>> >   * make better-named aliases for some of the autogenerated
>> >     relationships (though DBICSL does a surprisingly good job of naming
>> >     relationships automatically most of the time)
>> >   * add a tiny bit of bioperl compatibility (this needs a lot more work
>> >     by somebody, volunteers needed!)
>> >   * add convenience methods for using some of the Chado property tables
>> >   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
>> >     traversing phylogenetic tree relationships
>> >
>> > Regarding DB backend specificity, BCS isn't Pg-specific at all, because
>> > DBIx::Class itself goes to great lengths to be compatible (and performant!)
>> > with just about every relational database out there.
>> I would vouch for that at least as far as chado in oracle is concerned.
>> So far, BCS works flawlessly with our oracle chado instance at
>> dictybase. Quite a chunk of BCS-based code is also active in a couple of
>> our Mojo-based webapps. The part which I still couldn't use directly is
>> the 'synonym' table, as it clashes with Oracle-specific reserved keywords.
>> However, overall it seems to be quite cross-RDBMS compatible and highly
>> recommended.
>>
>> -siddhartha
>
> Just to point out, I didn't say BCS is Pg-specific, but that Chado is
> (that was the DBMS it was designed for).  Maybe that should be amended
> to 'was' now :)
>
> I recall seeing a page on this somewhere on the GMOD website along the
> lines of "MySQL has problems so we chose Pg", and that Chado support
> would focus on Pg.  I'm guessing that's no longer the case?  Or is only
> the server-side stuff Pg-specific?
>
>> > In fact, the BCS test
>> > suite deploys a Chado schema into a temporary SQLite database using
>> > DBIC::Schema's deploy() method, and runs all of its tests on that.  Very
>> > handy.
>> >
>> > Chado's Pg-specific server-side functions can of course be called through
>> > BCS if they are present, but it's perfectly possible to use Chado without
>> > any of the server-side functions, and mostly the way I use it.
>> >
>> > Rob
>
> I think this opens up the possibility of starting a DBIx::Class-based
> middleware solution.  Hilmar, did you want to take that on?
>
> chris


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From bgs500 at york.ac.uk  Tue Aug 24 11:35:53 2010
From: bgs500 at york.ac.uk (Ben Saville)
Date: Tue, 24 Aug 2010 16:35:53 +0100
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
	<0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>
Message-ID: <34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk>

Sorry for the delay in replying; 454 data analysis is very
time-consuming.

Please see http://seqanswers.com/forums/showthread.php?t=6484
for a discussion about this problem and how we solved the issue.

Thanks for the reply though, much appreciated!

Regards
Ben Saville


On 20 Aug 2010, at 14:48, Dave Messina wrote:

> Hi Ben,
>
> I would not use the script you posted - I don't think it does what
> you want.
>
> If you haven't already, you should take a look at the beginners' HOWTO
>
> 	http://www.bioperl.org/wiki/HOWTO:Beginners
>
>
> the SearchIO HOWTO
>
> 	http://www.bioperl.org/wiki/HOWTO:SearchIO
>
>
> and the example scripts included with BioPerl:
>
> 	http://www.bioperl.org/wiki/Scripts
>
>
>
> Incidentally, it's a lot of fiddly data processing to parse blast
> reports for many contigs against multiple databases and then go back
> and collate the results by query. I'm not sure exactly what you want
> to do once you've separated by query - if you provide some more
> information, we could suggest ways to best get you where you want to
> go.
>
> I will mention, though, that BLAST has the ability to search  
> multiple separate databases in one go and collate the results for  
> you. So that's something to consider.
>
>
>
> Dave


From cjfields at illinois.edu  Tue Aug 24 11:54:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 10:54:20 -0500
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To: <AANLkTi=7_fFU4Q53S1onRZpFaVoS6ndNNq68ZSHMDoe3@mail.gmail.com>
References: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
	<AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>
	<995BCF30-99B2-46C2-A4E8-681F9E2A0BB5@illinois.edu>
	<AANLkTi=7_fFU4Q53S1onRZpFaVoS6ndNNq68ZSHMDoe3@mail.gmail.com>
Message-ID: <B269BA3E-C0E7-4FEA-BA78-E164F4D2B787@illinois.edu>

Please keep all responses on-list.  

Regarding sreformat:

http://tinyurl.com/28q75rr

Judging by the stack traces below, you are also running off a UNIX-like system.  To concatenate files, use 'cat'.  So, for all files ending with .fa:

cat *.fa >> all.fa
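
Before merging, it may be worth confirming which lines trip the indexer. What follows is only a minimal sketch, not part of BioPerl: the function names check_fasta_widths and drop_blank_lines are made up for illustration, and the check merely approximates the rule quoted in the error message (every sequence line of a record has the same length, except the last).

```shell
# Hypothetical helpers, not part of BioPerl.

# Print every sequence line whose length differs from the first line of its
# record and that is followed by another line of the same record.
check_fasta_widths() {
    awk '
        /^>/ { width = -1; bad = 0; next }   # header line: start a new record
        {
            if (bad) {                        # the odd-length line was not the last one
                printf "%s:%d: %d chars, expected %d\n", FILENAME, bad, badlen, width
                status = 1
            }
            bad = 0
            if (width < 0) {
                width = length($0)            # first sequence line sets the width
            } else if (length($0) != width) {
                bad = FNR; badlen = length($0)
            }
        }
        END { exit status + 0 }               # non-zero exit if anything was reported
    ' "$@"
}

# Print a copy of a FASTA file without empty or whitespace-only lines.
drop_blank_lines() {
    awk 'NF' "$1"
}
```

For example, check_fasta_widths /home/wgf/elegans190.dna/*.fa would list candidate lines. If the only offenders are short lines followed by a trailing blank line, drop_blank_lines on that file may be enough; otherwise reformatting with bp_sreformat, as Jason described, is the more general fix. Writing the merged file to a different directory than the inputs keeps a later run of the cat glob from matching its own output.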

chris

On Aug 24, 2010, at 8:54 AM, Guifeng Wei wrote:

> Hello Fields,
>  
> I have checked the fasta files. I found that the last line is a blank line, and the second-to-last line is shorter than the others.
>  
> I am not able to run the command line from Jason's advice because I have no knowledge of "sreformat".
>  
> I also want to ask one more question: can I merge the several single-chromosome sequence files into one?
>  
> thank you very much.
>  
> Wei Guifeng
> 2010/8/24 Chris Fields <cjfields at illinois.edu>
> Guifeng,
> 
> Did you follow Jason's advice yesterday about converting the FASTA over to a more consistent length?  Or checking the database itself?  These are both things reiterated by Florent and Peter.
> 
> From Jason's last response:
> 
> -------------------------
> Wei -
> 
> Please ask your questions on the bioperl mailing list, I cannot answer questions directly for all requests.
> Your problem has been answered by me on the list before so I urge you to use the list archives as a starting point.
> 
> The line lengths of the fasta file sequence aren't the same length.
> 
> you need to run this
> bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW
> mv NEW ORIGINAL
> 
> or with sreformat
> sreformat fasta ORIGINAL > NEW
> mv NEW ORIGINAL
> -------------------------
> 
> chris
> 
> 
> On Aug 24, 2010, at 6:28 AM, Guifeng Wei wrote:
> 
> > Hi,
> >
> > i have revised my scripts according to the previous email from Florent.
> > However, there were still some errors which frustrated me so much.
> >
> > The errors are as follows:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: Each line of the fasta entry must be the same length except the last.
> >   Line above #301451 '
> > ..' is 22 != 51 chars.
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw
> > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> > STACK: Bio::DB::Fasta::calculate_offsets
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> > STACK: Bio::DB::Fasta::index_dir
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
> > STACK: Bio::DB::Fasta::new
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
> > STACK: bed2fasta.pl:13
> > -----------------------------------------------------------
> > indexing was interrupted, so unlinking
> > /home/wgf/elegans190.dna//directory.index at
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053
> > But the directory /home/wgf/elegans190.dna/ contains 6 files;
> > each contains the complete sequence of one single chromosome, in fasta
> > format. The extension of the FASTA files is .fa. Every single file
> > starts with ">chromosoemeXXX" followed by thousands of sequences.
> >
> > It therefore warns me that "Each line of the fasta entry must be the
> > same length except the last" and "indexing was interrupted, so unlinking
> > /home/wgf/elegans190.dna//directory".
> >
> > I am quite confused by this, so I am asking for help.
> >
> > Wei Guifeng
> 
> 
> 
> 
> -- 
> Wei Guifeng
> 
> 
> 


From cjfields at illinois.edu  Tue Aug 24 12:14:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 11:14:51 -0500
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
	<0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>
	<34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk>
Message-ID: <69C47A74-09C7-4024-9303-A3893658A2A8@illinois.edu>

Just in case anyone needs it, there is also a way to index these (both BLAST and the two tabular BLAST versions) for fast lookups of specific reports.  See Bio::Index::Blast and Bio::Index::BlastTable in BioPerl.

Caveat: I believe there is a bug with BLAST+ text output indexing (it chops the header off subsequent reports).  I haven't investigated it enough yet, but I'll try looking into it today.
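
As a rough illustration of the indexing approach, here is a minimal sketch that wraps Bio::Index::Blast in two shell functions. The function names, the index file name and the query ID in the usage note are made up for illustration; the Perl calls follow the module's documented synopsis (new with -filename and -write_flag, make_index, fetch_report). It assumes BioPerl is installed, and given the caveat above it may misbehave on BLAST+ text output.

```shell
# Hypothetical wrappers around Bio::Index::Blast; they assume BioPerl is installed.

# Build an index over one or more BLAST text reports.
index_blast_reports() {
    # $1: index file to create; remaining arguments: BLAST report files
    perl -MBio::Index::Blast -e '
        my $indexfile = shift @ARGV;
        my $index = Bio::Index::Blast->new(-filename   => $indexfile,
                                           -write_flag => 1);
        $index->make_index(@ARGV);
    ' "$@"
}

# Fetch the report for a single query ID and list its hits.
fetch_blast_report() {
    # $1: index file; $2: query ID as it appears in the report
    perl -MBio::Index::Blast -e '
        my ($indexfile, $query) = @ARGV;
        my $index  = Bio::Index::Blast->new(-filename => $indexfile);
        my $result = $index->fetch_report($query)
            or die "no report found for $query\n";
        print "query: ", $result->query_name, "\n";
        while (my $hit = $result->next_hit) {
            print join("\t", $hit->name, $hit->significance), "\n";
        }
    ' "$1" "$2"
}
```

A typical session might be index_blast_reports reports.idx contigs_vs_nr.blast followed by fetch_blast_report reports.idx contig00042, where all three names are placeholders.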

chris

On Aug 24, 2010, at 10:35 AM, Ben Saville wrote:

> Sorry for the Delay in replying, 454 data analysis is very time consuming.
> 
> please see http://seqanswers.com/forums/showthread.php?t=6484
> For a discussion about this problem, and how we solved the issue.
> 
> Thanks for the reply though, much appreciated!
> 
> Regards
> Ben Saville


From cjfields at illinois.edu  Tue Aug 24 12:17:17 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 11:17:17 -0500
Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release
	announcement
References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov>
Message-ID: <A26B0224-CFDD-4D2B-A5B0-4275693416FD@illinois.edu>

FYI,

Very interesting additions to BLAST+ (archive format).  

chris

Begin forwarded message:

> From: mcginnis <mcginnis at ncbi.nlm.nih.gov>
> Date: August 24, 2010 10:46:50 AM CDT
> To: NLM/NCBI List blast-announce <blast-announce at ncbi.nlm.nih.gov>
> Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement
> 
> A new version of the stand-alone applications is available.
>  
> Users are encouraged to use the BLAST+ applications available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
> 
> This release includes a number of bug fixes as well as new features for the BLAST+ applications:
>  
> * Introduce BLAST Archive format to permit reformatting of stand-alone BLAST searches with the blast_formatter(see BLAST+ user manual) 
> * Added the blast_formatter application (see BLAST+ user manual)
> * Added support for translated subject soft masking in the BLAST databases
> * Added support for the BLAST Trace-back operations (btop) output format
> * Added command line options to blastdbcmd for listing available BLAST databases
> * Improved performance of formatting of remote BLAST searches
> * Use a consistent exit code for out of memory conditions
> * Fixed bug in indexed megablast with multiple space-separated BLAST databases
> * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb
> * Fixed Windows installer for 64-bit installations
>  
> BLAST+ applications, as well as the legacy C applications (e.g. blastall), may be downloaded from http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download


From David.Messina at sbc.su.se  Tue Aug 24 13:00:14 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 24 Aug 2010 19:00:14 +0200
Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release
	announcement
In-Reply-To: <A26B0224-CFDD-4D2B-A5B0-4275693416FD@illinois.edu>
References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov>
	<A26B0224-CFDD-4D2B-A5B0-4275693416FD@illinois.edu>
Message-ID: <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se>

Here's a link to the manual:
ftp://ftp.ncbi.nlm.nih.gov//blast/executables/blast%2B/2.2.24/user_manual.pdf

(Is it on the NCBI website somewhere? Strange to have only a downloadable PDF.) The section on the new archive format is on page 27.

It seems like a nice idea to have the flexibility, but I wonder about the time cost of using this format.

One of the big gains from using tab-delimited output is that BLAST doesn't have to do all the post-processing to generate the alignment views. By using the archive format, which if I understand it correctly is ASN.1, you're always paying the full price in time (and space, for that matter).


Dave


On Aug 24, 2010, at 18:17 , Chris Fields wrote:

> FYI,
> 
> Very interesting additions to BLAST+ (archive format).  
> 
> chris


From cjfields at illinois.edu  Tue Aug 24 13:26:49 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 12:26:49 -0500
Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release
	announcement
In-Reply-To: <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se>
References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov>
	<A26B0224-CFDD-4D2B-A5B0-4275693416FD@illinois.edu>
	<27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se>
Message-ID: <D84DD1C8-6CBE-40F1-8CF9-F9482F0E4B18@illinois.edu>

It's probably more applicable from the viewpoint of a cluster admin who would want to add the flexibility of having a single archive and allowing any format (as opposed to re-running the analysis). I'm just wondering if there is anything to glean there for possible alignment archiving purposes (ala SAM/BAM), but if it's ASN.1, likely not.
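
The search-once, format-many workflow described above could look roughly like the following. This is a minimal sketch that assumes the BLAST+ 2.2.24 command-line tools are on PATH; the two function names and any file names passed to them are made up for illustration, while -outfmt 11 (BLAST archive) and blast_formatter -archive are the options named in the BLAST+ user manual.

```shell
# Hypothetical wrappers; they assume blastn and blast_formatter are on PATH.

# Run the search once and keep the result as a BLAST archive (-outfmt 11).
run_blast_archive() {
    # $1: query FASTA, $2: BLAST database name, $3: archive file to write
    blastn -query "$1" -db "$2" -outfmt 11 -out "$3"
}

# Re-render a saved archive in another format without repeating the search.
reformat_archive() {
    # $1: archive file, $2: -outfmt code (for example 6 = tabular, 5 = XML),
    # $3: output file
    blast_formatter -archive "$1" -outfmt "$2" -out "$3"
}
```

After run_blast_archive contigs.fa nt hits.asn, the same archive could be rendered as reformat_archive hits.asn 6 hits.tsv and again as reformat_archive hits.asn 5 hits.xml. As far as I understand the manual, blast_formatter still needs access to the database that was searched, so the archive is not fully standalone.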

chris

On Aug 24, 2010, at 12:00 PM, Dave Messina wrote:

> Here's a link to the manual:
> ftp://ftp.ncbi.nlm.nih.gov//blast/executables/blast%2B/2.2.24/user_manual.pdf
> 
> (Is it on the NCBI website somewhere? Strange to have only a downloadable PDF.) The section on the new archive format is on page 27.
> 
> It seems like a nice idea to have the flexibility, but I wonder about the time cost of using this format.
> 
> One of the big gains from using tab-delimited output is that BLAST doesn't have to do all the post-processing to generate the alignment views. By doing the archive format, which if I understand it correctly is ASN.1, you're always paying the full price in time (and space, for that matter).
> 
> 
> 
> Dave


From David.Messina at sbc.su.se  Tue Aug 24 14:45:29 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 24 Aug 2010 20:45:29 +0200
Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release
	announcement
In-Reply-To: <D84DD1C8-6CBE-40F1-8CF9-F9482F0E4B18@illinois.edu>
References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov>
	<A26B0224-CFDD-4D2B-A5B0-4275693416FD@illinois.edu>
	<27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se>
	<D84DD1C8-6CBE-40F1-8CF9-F9482F0E4B18@illinois.edu>
Message-ID: <00C04DF9-F3C2-4574-B1E4-A3BF28EE953F@sbc.su.se>

> It's probably more applicable from the viewpoint of a cluster admin who would want to add the flexibility of having a single archive and allowing any format (as opposed to re-running the analysis).

Good point.


> I'm just wondering if there is anything to glean there for possible alignment archiving purposes (ala SAM/BAM), but if it's ASN.1, likely not.

To be honest, I didn't look that closely at it. It may be worth considering nevertheless.


Dave


From buiduyminh at gmail.com  Tue Aug 24 14:56:43 2010
From: buiduyminh at gmail.com (Minh Bui)
Date: Tue, 24 Aug 2010 14:56:43 -0400
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
In-Reply-To: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>
References: <AANLkTinsyOMPJxpks_pqMwLpW8gx0VRihhJsLDnF53mu@mail.gmail.com>
	<491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>
Message-ID: <AANLkTimOe=T9FrpMPqMy8yyrfz8Sf7QJ5Rr5YYFjicJb@mail.gmail.com>

How can I find out where DBD::mysql is installed on my Mac? I am very new to the Mac, sorry.

I just checked, and mysql.pm is in
/Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm


On 8/21/10, Adam Witney <awitney at sgul.ac.uk> wrote:
>
>  On 20 Aug 2010, at 22:29, Minh Bui wrote:
>
>  > Hi,
>  > I am trying to load my GFF file to mysql database but I got this error
>  > when I ran the bp_seqfeature_load.pl (bioperl 1.6.1 on Mac)
>  >
>  > [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
>  > install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
>  > contains: /sw/lib/perl5 /sw/lib/perl5/darwin
>  > /System/Library/Perl/5.8.6/darwin-thread-multi-2level
>  > /System/Library/Perl/5.8.6
>  > /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
>  > /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
>  > /Network/Library/Perl/5.8.6 /Network/Library/Perl
>  > /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
>  > /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
>  > line 3.
>  > Perhaps the DBD::mysql perl module hasn't been fully installed,
>  > or perhaps the capitalisation of 'mysql' isn't right.
>  > Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
>  > at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212
>  >
>  > I am using MAC OSX version 10.4.10 and MAMP? Isnt it the
>  > "/Library/Perl/5.8.6" already in @INC? What am I missing?
>  > I have been googling this error for a few hours. I also install
>  > Bioperl and reinstall DBD::mysql using CPAN. It still doesnt work..
>  >
>  > Here is my $PERL5LIB:  /sw/lib/perl5:/sw/lib/perl5/darwin/
>
>
>
> Where did DBD::mysql get installed? Can you verify that DBD/mysql.pm is actually in one of those directories listed above?
>
>


From scott at scottcain.net  Tue Aug 24 15:04:04 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 24 Aug 2010 15:04:04 -0400
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
In-Reply-To: <AANLkTimOe=T9FrpMPqMy8yyrfz8Sf7QJ5Rr5YYFjicJb@mail.gmail.com>
References: <AANLkTinsyOMPJxpks_pqMwLpW8gx0VRihhJsLDnF53mu@mail.gmail.com>
	<491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>
	<AANLkTimOe=T9FrpMPqMy8yyrfz8Sf7QJ5Rr5YYFjicJb@mail.gmail.com>
Message-ID: <AANLkTimPapxSzwVxCBMw1J0+x88K80SJ_6OH9LBkS3Jn@mail.gmail.com>

Hi Minh,

The file you found is not DBD::mysql though; it is
Bio::DB::SeqFeature::Store::DBI::mysql, which was installed along with
BioPerl.  How did you find that file?  The same method presumably
would turn up DBD::mysql if it existed.  I would use a command like
this:

  locate mysql.pm

which would locate all of the instances of files named mysql.pm on your
computer.  I would expect it to be located in
/Library/Perl/5.8.6/darwin-thread-multi-2level/DBD/ if it was
installed in a "normal" way (that is, not involving macports or fink
or MAMP).
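
If locate is unavailable or its database is out of date, a similar search can be limited to the directories Perl itself consults. This is a minimal sketch; find_perl_module is a made-up helper name, and it only inspects the @INC of whichever perl comes first in PATH, which may not be the perl that a fink or MAMP installation uses.

```shell
# Hypothetical helper: list every copy of a module file found under @INC.
find_perl_module() {
    # $1: module path relative to @INC, for example DBD/mysql.pm
    perl -e '
        for my $dir (@INC) {
            my $file = "$dir/$ARGV[0]";
            print "$file\n" if -f $file;
        }
    ' "$1"
}
```

Running find_perl_module DBD/mysql.pm prints nothing when the driver is absent from every @INC directory, which is the situation the "Can't locate DBD/mysql.pm in @INC" error describes.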

Scott


On Tue, Aug 24, 2010 at 2:56 PM, Minh Bui <buiduyminh at gmail.com> wrote:
> How can I know where DBD:mysql PATH on my MAC? I am very new to MAC sorry.
>
> I just check and mysql.pm is in
> /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From jason at bioperl.org  Wed Aug 25 00:33:45 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 24 Aug 2010 21:33:45 -0700
Subject: [Bioperl-l] Enquiry on gi_taxid_nucl.dmp.gz
In-Reply-To: <AANLkTi=BrV0ODdF+sEQdAmtAMxRk6y2xGgRMOgbnZz-6@mail.gmail.com>
References: <AANLkTi=BrV0ODdF+sEQdAmtAMxRk6y2xGgRMOgbnZz-6@mail.gmail.com>
Message-ID: <4C749D29.3040003@bioperl.org>

hi - please keep questions on list.


I think one of your problems is that your first use of $gi2taxidfile is
wrong. When you call tie, you want to specify a dbfile to store the
index in.
So call it "/tmp/gi2taxid.idx" or something like that.

In my code here
http://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/classify_hits_kingdom.PLS
you will see on line 97 we construct the name of the index file from
the folder, plus 'idx', plus the name gi2taxid.

Also, it would be safer for the split to match on whitespace, since
you want the first two columns from the file.  Doing this would
eliminate the need for the chomp on the line above.

  my ($gi, $taxid) = split(/\s+/, $_);

instead of

  chomp;
  my ($gi, $taxid) = split(" ", $_,2);

There may be other problems but these should be fixed first -- and 
please send queries to the mailing list rather than to me directly so 
that others can answer questions.
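
To make the separation between the dump file and the index file concrete, here is a minimal sketch. It is not the code from classify_hits_kingdom.PLS: the function names and file names are made up for illustration, and it assumes the DB_File module is available to perl. It also stores every pair unconditionally; the quoted loop assigns only "if exists $taxid4gi{$gi}", which would appear to leave a freshly created index empty.

```shell
# Hypothetical helpers; they assume perl with the DB_File module.

# Read the unzipped gi_taxid dump and write a separate on-disk index.
build_gi_index() {
    # $1: unzipped gi_taxid_nucl.dmp (input); $2: index file to create (output)
    perl -MDB_File -MFcntl -e '
        my ($dump, $index) = @ARGV;
        tie(my %taxid4gi, "DB_File", $index, O_RDWR|O_CREAT, 0644, $DB_HASH)
            or die "cannot tie $index: $!";
        open(my $in, "<", $dump) or die "cannot open $dump: $!";
        while (<$in>) {
            my ($gi, $taxid) = split;   # whitespace split; no chomp needed
            next unless defined $taxid;
            $taxid4gi{$gi} = $taxid;    # store every pair unconditionally
        }
        close $in;
        untie %taxid4gi;                # flush the index to disk
    ' "$1" "$2"
}

# Look up one GI in an existing index; prints nothing if the GI is absent.
lookup_gi() {
    # $1: index file; $2: GI number
    perl -MDB_File -MFcntl -e '
        my ($index, $gi) = @ARGV;
        tie(my %taxid4gi, "DB_File", $index, O_RDONLY, 0644, $DB_HASH)
            or die "cannot tie $index: $!";
        print "$taxid4gi{$gi}\n" if exists $taxid4gi{$gi};
    ' "$1" "$2"
}
```

Usage would be along the lines of build_gi_index gi_taxid_nucl.dmp /tmp/gi2taxid.idx once, then lookup_gi /tmp/gi2taxid.idx 183397240 for each query. The dump is only ever opened for reading, so it cannot be clobbered by the tie.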

-jason

Amali Thrimawithana wrote, on 8/24/10 8:13 PM:
> Dear Jason
>
> Thank you very much for the information. I manage to get the information on
> different taxonomic  levels with the help of one of your example code
> "local_taxonomydb_query". However I am having trouble with creating a local
> index file of the gi_taxid_nucl.dmp so that I am able to get the taxonomic
> id given the GI number of NCBI. At the moment I am using the tie() function
> with DB_file and then storing the detail into a hash. However when I try to
> retrieve a taxonomic ID given the GI number, it is not returning any thing
> but an error. Below is part of the code (borrowed from the example code
> classify kingdom), can you please let me know where I am going wrong?
> ...
> my $dbh2 = tie(%taxid4gi, 'DB_File', $gi2taxidfile);
>
> if( ! $done ) {
>      my $fh;
>     open(GI2TAXID, "$gi2taxidfile") or die $!; #here passing the unzipped
> gi_taxid_nucl.dmp
>     my$i=0;
>      while (<GI2TAXID>) {
>        chomp;
>         my ($gi, $taxid) = split(" ", $_, 2);
>         $taxid4gi{$gi} = $taxid
>         if exists $taxid4gi{$gi};
>         $i++;
>       unless( $DEBUG&&  $i % 100000  ) {
>          warn "$i\n";
>      }
>      }
>      $dbh2->sync;
> }
> my $gi2='183397240';
> my $taxd2=$taxid4gi{$gi2};
>   print $taxd2, " \n";
>
> Any help would be much appreciated
>
> Thanking you
> Amali
>
> On 23 August 2010 06:29, Jason Stajich<jason at bioperl.org>  wrote:
>
>    
>> Hi Amali -
>>
>> This is how I'd print out the full classification by using the Tree methods
>> (with probably a different way of initializing the $db object to your
>> flatfiles location).
>>
>> #!/usr/bin/perl -w
>> use strict;
>> use Bio::DB::Taxonomy;
>> use Bio::Tree::Tree;   # loaded explicitly for Bio::Tree::Tree->new below
>>
>> my $db= Bio::DB::Taxonomy->new(-source =>  'flatfile',
>>                    -nodesfile =>  'taxonomy/nodes.dmp',
>>                    -namesfile =>  'taxonomy/names.dmp');
>>
>> my $taxonid = $db->get_taxonid('Homo sapiens');
>> my $taxon = $db->get_taxon(-taxonid =>  $taxonid);
>> my $tree = Bio::Tree::Tree->new(-node =>  $taxon);
>> my @taxa = $tree->get_nodes;
>> print join(",", map { $_->scientific_name } @taxa), "\n";
>>
>> -jason
>>
>> Amali Thrimawithana wrote, On 8/18/10 3:56 PM:
>>
>>> Dear Dr Stajich,
>>>
>>> I am a Masters student at Auckland university and my research is on
>>> identifying yeast species present in wine by the use of 454 sequencing. In
>>> order to carry out this research, a pipeline is being built in which at
>>> the final step each representative OTU needs to be classified at different
>>> taxonomic levels (i.e. at phylum, family, class, genus and species) by
>>> using the results from BLAST. To identify the sequences at each taxonomic
>>> level, I have been trying out the Bio::DB::Taxonomy module in bioperl.
>>> Using this module, I am able to get the genus and species level by
>>> splitting the scientific name returned by the Bio::Taxon object. But
>>> unfortunately I am uncertain how to get the information for the other
>>> levels of the rank. I have tried several commands including
>>> "my @class = $node->classification;", but it does not work. Hence, could
>>> you please let me know how I might be able to get the higher levels of
>>> taxonomy such as class and phylum using bioperl?
>>>
>>> Look forward to hearing from you soon
>>>
>>> Thanking You
>>>
>>> Amali
>>>
>>>
>>>        


From roy.chaudhuri at gmail.com  Wed Aug 25 07:12:15 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 25 Aug 2010 12:12:15 +0100
Subject: [Bioperl-l] Enquiry on gi_taxid_nucl.dmp.gz
In-Reply-To: <4C749D29.3040003@bioperl.org>
References: <AANLkTi=BrV0ODdF+sEQdAmtAMxRk6y2xGgRMOgbnZz-6@mail.gmail.com>
	<4C749D29.3040003@bioperl.org>
Message-ID: <4C74FA8F.3080506@gmail.com>

> Also it would be safer for the split to be whitespace matching, given that
> you want the first two columns from the file.  Doing this would
> eliminate the need for the chomp on the line above.
>
>    my ($gi, $taxid) = split(/\s+/, $_);
>
> instead of
>
>    chomp;
>    my ($gi, $taxid) = split(" ", $_,2);

Sorry to be pedantic, but according to perldoc -f split: "As a special 
case, specifying a PATTERN of space (' ') will split on white space just 
as "split" with no arguments does"

The only difference between patterns of " " and /\s+/ is that the latter 
will return an initial null field if there is leading white space, which 
may or may not be what you want.

$ perl -e 'print join("-", split(" ", " 1\t2  3")), "\n"'
1-2-3
$ perl -e 'print join("-", split(/\s+/, " 1\t2  3")), "\n"'
-1-2-3
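
To tie this back to the gi/taxid parsing, here is a minimal sketch of the three variants side by side. The helper name parse_gi_taxid and the sample line are made up for illustration; only core Perl is used.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the first two whitespace-separated columns
# (GI number, taxon ID) out of one line of a gi_taxid-style dump.
sub parse_gi_taxid {
    my ($line) = @_;
    # The special pattern ' ' skips leading whitespace and splits on runs
    # of whitespace. Trailing empty fields (such as the one left by the
    # newline) are dropped, so no chomp is needed here.
    my ($gi, $taxid) = split ' ', $line;
    return ($gi, $taxid);
}

# The same line split three ways, to show where the behaviours diverge.
my $line = " 12345\t9606\n";

my @special = split ' ',   $line;      # two fields, no leading null field
my @regex   = split /\s+/, $line;      # leading null field, then two fields
my @limited = split ' ',   $line, 2;   # LIMIT keeps the newline on field 2

printf "special: %d fields\n", scalar @special;
printf "regex:   %d fields, first is '%s'\n", scalar @regex, $regex[0];
printf "limited: second field %s a newline\n",
    $limited[1] =~ /\n\z/ ? 'keeps' : 'drops';
```

So the chomp only matters once a LIMIT is passed to split; without a LIMIT both patterns discard the trailing newline.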

Cheers.
Roy.


From kanmaninradha at gmail.com  Thu Aug 26 04:29:08 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 01:29:08 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
Message-ID: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>

Hi All,
I would like to get the DNA sequence from a GFF file. I'm using the
Bio::Tools::GFF module. I could get everything else but not the DNA sequence.

Can anyone help me figure this out, please? I appreciate your help very
much.
thanks,
Kanmani

#!/usr/bin/perl

use strict;
use warnings;
use Bio::Tools::GFF;

my $file = shift;

my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
$gffio->features_attached_to_seqs(1);

while (my $feat = $gffio->next_feature()){
    my $start = $feat->start;
    my $end= $feat->end;
    my $size = $end-$start+1;
    my $strand = $feat->strand;
    my $seqid = $feat->seq_id;
    my $score = $feat->score;
    my $frame = $feat->frame;
    my $source = $feat->source_tag;
    my $type = $feat->primary_tag;
    my $gffstr = $gffio->gff_string($feat);
    my @alltags = $feat->all_tags();
    my @ID_tag_value = $feat->each_tag_value("ID");

    my  $seq = $feat->seq();
    print "$seq\n";

     if($type eq "gene"){     #
       print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
    }
}


From David.Messina at sbc.su.se  Thu Aug 26 04:53:48 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 26 Aug 2010 10:53:48 +0200
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
Message-ID: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>

Admittedly I'm not up on the latest uses of GFF, but as far as I know, GFF is an annotation format only; it does not contain the actual sequence.

Have you looked in your GFF file to see if there are nucleotides in there?

Dave


On Aug 26, 2010, at 10:29, kanmani radha wrote:

> Hi All,
> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
> module. I could get everything else but not the DNA seq.


From biopython at maubp.freeserve.co.uk  Thu Aug 26 05:02:53 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Aug 2010 10:02:53 +0100
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
Message-ID: <AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>

On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
>
> Admittedly I'm not up on the latest uses of GFF, but as far as I know, GFF
> is an annotation format only; it does not contain the actual sequence.
>
> Have you looked in your GFF file to see if there are nucleotides in there?
>
> Dave

Actually a GFF file can optionally include a FASTA format sequence
at the end of the file, although it seems to be more common to just
supply separate GFF and FASTA files and cross reference by ID.

Peter


From David.Messina at sbc.su.se  Thu Aug 26 05:08:20 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 26 Aug 2010 11:08:20 +0200
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
	<AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>
Message-ID: <C7C28E1D-7BAC-4D06-9EC6-71EA95F06776@sbc.su.se>

Aha, great, thanks for clarifying, Peter.

And if I had bothered to look at the Bio::Tools::GFF documentation before answering :), I would have seen this:

    http://doc.bioperl.org/bioperl-live/Bio/Tools/GFF.html#General

which describes how you can use

    $gffio->get_seqs()


and related methods to pull out the sequence data.


Dave


On Aug 26, 2010, at 11:02, Peter wrote:

> On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
>> 
>> Admittedly I'm not up on the latest uses of GFF, but as far as I know, GFF
>> is an annotation format only; it does not contain the actual sequence.
>> 
>> Have you looked in your GFF file to see if there are nucleotides in there?
>> 
>> Dave
> 
> Actually a GFF file can optionally include a FASTA format sequence
> at the end of the file, although it seems to be more common to just
> supply separate GFF and FASTA files and cross reference by ID.
> 
> Peter


From David.Messina at sbc.su.se  Thu Aug 26 05:18:25 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 26 Aug 2010 11:18:25 +0200
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <C7C28E1D-7BAC-4D06-9EC6-71EA95F06776@sbc.su.se>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
	<AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>
	<C7C28E1D-7BAC-4D06-9EC6-71EA95F06776@sbc.su.se>
Message-ID: <984552CF-01F3-4D29-932F-DD030CCC1448@sbc.su.se>

So, just to finish the thought:

Kanmani,

Apologies for my sloppy and uninformed answer. The following is only slightly less sloppy and uninformed, but may actually answer your question.

I think you need to call 

   $gffio->get_seqs()

probably as

  my @seq_objects = $gffio->get_seqs();


and then loop through those something like:

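	foreach my $seq_object (@seq_objects) {
		# raw sequence string for this entry
		my $seq = $seq_object->seq();

		# features hang off the sequence object, not the string
		foreach my $feat ($seq_object->get_SeqFeatures) {
			# do your feature processing here
		}
	}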


Note that I haven't tested the above code.


Dave


From fs5 at sanger.ac.uk  Thu Aug 26 05:19:44 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 26 Aug 2010 10:19:44 +0100
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
Message-ID: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi Kanmani,

While GFF files may contain DNA sequence data, most of them don't, so
you will have to use the location information you get from the GFF
annotation file in conjunction with, e.g., a local FASTA database of the
genomic sequence you are working with or an online resource.
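
As a rough illustration of combining the two sources, here is a plain-Perl sketch that does not use BioPerl at all. The helper names revcomp and feature_dna and the toy data are hypothetical; in practice the sequences would come from your FASTA file and the lines from your GFF file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse-complement a DNA string.
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;          # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;   # complement each base
    return $rc;
}

# Hypothetical helper: given a hash of sequence ID => sequence string and
# one GFF line, return the DNA for that feature. GFF coordinates are
# 1-based and inclusive; substr() wants a 0-based offset and a length.
sub feature_dna {
    my ($genome, $gff_line) = @_;
    chomp $gff_line;
    my ($seqid, $source, $type, $start, $end, $score, $strand) =
        split /\t/, $gff_line;
    my $chrom = $genome->{$seqid};
    die "no sequence for '$seqid'\n" unless defined $chrom;
    my $dna = substr $chrom, $start - 1, $end - $start + 1;
    return $strand eq '-' ? revcomp($dna) : $dna;
}

# Toy data standing in for a FASTA file and a GFF3 file.
my %genome = (chr1 => 'AAACCCGGGTTTACGT');
my @gff = (
    "chr1\tdemo\texon\t4\t9\t.\t+\t.\tID=exon1",
    "chr1\tdemo\texon\t10\t13\t.\t-\t.\tID=exon2",
);

for my $line (@gff) {
    my $type = (split /\t/, $line)[2];
    next unless $type eq 'exon';
    print feature_dna(\%genome, $line), "\n";
}
```

Holding whole chromosomes in a hash only scales to small genomes; for anything larger an indexed FASTA lookup is the more sensible route.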


Frank


On Thu, 2010-08-26 at 01:29 -0700, kanmani radha wrote:
> Hi All,
> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
> module. I could get everything else but not the DNA seq.
> 
> Can anyone help me to find this out, Please. I appreciate your help very
> much.
> thanks,
> Kanmani
> 
> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> use Bio::Tools::GFF;
> 
> my $file = shift;
> 
> my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
> $gffio->features_attached_to_seqs(1);
> 
> while (my $feat = $gffio->next_feature()){
>     my $start = $feat->start;
>     my $end= $feat->end;
>     my $size = $end-$start+1;
>     my $strand = $feat->strand;
>     my $seqid = $feat->seq_id;
>     my $score = $feat->score;
>     my $frame = $feat->frame;
>     my $source = $feat->source_tag;
>     my $type = $feat->primary_tag;
>     my $gffstr = $gffio->gff_string($feat);
>     my @alltags = $feat->all_tags();
>     my @ID_tag_value = $feat->each_tag_value("ID");
> 
>     my  $seq = $feat->seq();
>     print "$seq\n";
> 
>      if($type eq "gene"){     #
>        print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
>     }
> }
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From cjfields at illinois.edu  Thu Aug 26 10:20:48 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 26 Aug 2010 09:20:48 -0500
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>

Kanmani,

If you are using BioPerl, the best option currently available is to load a database with all relevant information (GFF and FASTA), then use that database for querying.  The most commonly-used ones now are Bio::DB::SeqFeature::Store and Bio::DB::GFF; the former is very GFF3-centric, but I believe it can handle GFF/GTF, and it has various database adaptors (MySQL, Pg, BDB, SQLite).

chris

On Aug 26, 2010, at 4:19 AM, Frank Schwach wrote:

> Hi Kammani,
> 
> While GFF files may contain DNA sequence data, most of them don't, so
> you will have to use the location information you get from the GFF
> annotation file in conjunction with, e.g., a local FASTA database of the
> genomic sequence you are working with or an online resource.
> 
> 
> Frank
> 
> 
> 
> On Thu, 2010-08-26 at 01:29 -0700, kanmani radha wrote:
>> Hi All,
>> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
>> module. I could get everything else but not the DNA seq.
>> 
>> Can anyone help me to find this out, Please. I appreciate your help very
>> much.
>> thanks,
>> Kanmani
>> 
>> #!/usr/bin/perl
>> 
>> use strict;
>> use warnings;
>> use Bio::Tools::GFF;
>> 
>> my $file = shift;
>> 
>> my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
>> $gffio->features_attached_to_seqs(1);
>> 
>> while (my $feat = $gffio->next_feature()){
>>    my $start = $feat->start;
>>    my $end= $feat->end;
>>    my $size = $end-$start+1;
>>    my $strand = $feat->strand;
>>    my $seqid = $feat->seq_id;
>>    my $score = $feat->score;
>>    my $frame = $feat->frame;
>>    my $source = $feat->source_tag;
>>    my $type = $feat->primary_tag;
>>    my $gffstr = $gffio->gff_string($feat);
>>    my @alltags = $feat->all_tags();
>>    my @ID_tag_value = $feat->each_tag_value("ID");
>> 
>>    my  $seq = $feat->seq();
>>    print "$seq\n";
>> 
>>     if($type eq "gene"){     #
>>       print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
>>    }
>> }


From cjfields at illinois.edu  Thu Aug 26 10:31:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 26 Aug 2010 09:31:59 -0500
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
	<AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>
Message-ID: <DD36A578-4156-4911-8432-84BD5ECB3AB8@illinois.edu>

On Aug 26, 2010, at 4:02 AM, Peter wrote:

> On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
>> 
>> Admittedly I'm not up on the latest uses of GFF, but as far as I know, GFF
>> is an annotation format only; it does not contain the actual sequence.
>> 
>> Have you looked in your GFF file to see if there are nucleotides in there?
>> 
>> Dave
> 
> Actually a GFF file can optionally include a FASTA format sequence
> at the end of the file, although it seems to be more common to just
> supply separate GFF and FASTA files and cross reference by ID.
> 
> Peter

IIRC, optionally including FASTA sequence is specified only in the GFF3 spec; use of FASTA isn't explicitly mentioned in earlier versions.  We only support it with earlier GFF due to convergence of the various GFF parsers.  

The original GFF spec proposed allowing sequence, but it's in the form of meta information and I have never seen it used in practice (as you mention, the FASTA is normally loaded separately).

chris


From kanmaninradha at gmail.com  Thu Aug 26 12:22:14 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 09:22:14 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
	<6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
Message-ID: <AANLkTinxcoKBHqU7bnfyNA6bi5qBjNAYR54c6K+Pg7rz@mail.gmail.com>

Hi Everyone,

Thanks very much for this clarification.  Thanks a ton to everyone who
spared the time to educate me.

I see your points.  Please correct me if I am wrong.

I understand that it's better to use Bio::DB::SeqFeature::Store or
Bio::DB::GFF to load the FASTA sequences (from a separate multi-FASTA
file) and then Bio::Tools::GFF to parse the feature info from a GFF
file, and then to query the created database for the relevant GFF
coordinates.

I will implement this.

Thanks once again.
Kanmani

On Thu, Aug 26, 2010 at 7:20 AM, Chris Fields <cjfields at illinois.edu> wrote:

> Kammani,
>
> If you are using BioPerl, the best option currently available is to load a
> database with all relevant information (GFF and FASTA), then use that
> database for querying.  The most commonly-used ones now are
> Bio::DB::SeqFeature::Store and Bio::DB::GFF; the former is very
> GFF3-centric, but I believe it can handle GFF/GTF, and it has various
> database adaptors (MySQL, Pg, BDB, SQLite).
>
> chris
>
> On Aug 26, 2010, at 4:19 AM, Frank Schwach wrote:
>
> > Hi Kammani,
> >
> > While GFF files may contain DNA sequence data, most of them don't, so
> > you will have to use the location information you get from the GFF
> > annotation file in conjunction with, e.g., a local FASTA database of the
> > genomic sequence you are working with or an online resource.
> >
> >
> > Frank
> >
> >
> >
> > On Thu, 2010-08-26 at 01:29 -0700, kanmani radha wrote:
> >> Hi All,
> >> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
> >> module. I could get everything else but not the DNA seq.
> >>
> >> Can anyone help me to find this out, Please. I appreciate your help very
> >> much.
> >> thanks,
> >> Kanmani
> >>
> >> #!/usr/bin/perl
> >>
> >> use strict;
> >> use warnings;
> >> use Bio::Tools::GFF;
> >>
> >> my $file = shift;
> >>
> >> my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
> >> $gffio->features_attached_to_seqs(1);
> >>
> >> while (my $feat = $gffio->next_feature()){
> >>    my $start = $feat->start;
> >>    my $end= $feat->end;
> >>    my $size = $end-$start+1;
> >>    my $strand = $feat->strand;
> >>    my $seqid = $feat->seq_id;
> >>    my $score = $feat->score;
> >>    my $frame = $feat->frame;
> >>    my $source = $feat->source_tag;
> >>    my $type = $feat->primary_tag;
> >>    my $gffstr = $gffio->gff_string($feat);
> >>    my @alltags = $feat->all_tags();
> >>    my @ID_tag_value = $feat->each_tag_value("ID");
> >>
> >>    my  $seq = $feat->seq();
> >>    print "$seq\n";
> >>
> >>     if($type eq "gene"){     #
> >>       print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
> >>    }
> >> }


From cjfields at illinois.edu  Thu Aug 26 13:08:56 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 26 Aug 2010 12:08:56 -0500
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTinxcoKBHqU7bnfyNA6bi5qBjNAYR54c6K+Pg7rz@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
	<6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
	<AANLkTinxcoKBHqU7bnfyNA6bi5qBjNAYR54c6K+Pg7rz@mail.gmail.com>
Message-ID: <EF1B137F-94A7-45E1-B8FB-0E20142F0A7F@illinois.edu>

On Aug 26, 2010, at 11:22 AM, kanmani radha wrote:

> Hi Everyone,
> 
> Thanks very much for this clarification.  Thanks a ton for every one who
> spared their time to educate me.
> 
> I see your points.  Please correct me if I am wrong.
> 
> I understand that, Its better to use use Bio::DB::SeqFeature or Bio::DB::GFF
> to load the fasta sequences (from a separate multifasta) file and
> then Bio::Tools::GFF to parse the feature info from a gff file . Then query
> the created database for the relevent GFF coordinates....
> 
> I will implement this.
> 
> Thanks once again.
> Kanmani

Yes, in general.  I forgot to mention that you can have an in-memory database as well, but it's only suggested if you have a few thousand or so features and small sequences (I think bacterial chromosomes will work).  
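
For the in-memory case, a minimal untested sketch might look like the following. The sub name exon_sequences is made up for illustration, and it assumes the path is either a GFF3 file with an embedded ##FASTA section or a directory holding the GFF3 and FASTA files together.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::SeqFeature::Store;

# Return [seq_id, start, end, dna] for every exon found under $path.
# Assumption: $path is a GFF3 file with an embedded ##FASTA section, or a
# directory holding the GFF3 and FASTA files side by side.
sub exon_sequences {
    my ($path) = @_;

    # The 'memory' adaptor parses everything into RAM on start-up, so
    # there is no MySQL/Pg server to configure.
    my $db = Bio::DB::SeqFeature::Store->new(
        -adaptor => 'memory',
        -dsn     => $path,
    );

    # Random access by feature type; each feature fetches its own DNA
    # from the sequence loaded alongside the annotation.
    return map { [ $_->seq_id, $_->start, $_->end, $_->seq->seq ] }
           $db->get_features_by_type('exon');
}

# Print FASTA-style records when a path is given on the command line.
if (@ARGV) {
    printf ">%s:%d-%d\n%s\n", @$_ for exon_sequences($ARGV[0]);
}
```

Switching to a persistent backend later should mostly be a matter of changing the -adaptor and -dsn arguments and loading the data once beforehand, rather than rewriting the query code.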

chris


From Havard.Aanes at nvh.no  Wed Aug 25 11:47:12 2010
From: Havard.Aanes at nvh.no (=?iso-8859-1?Q?Aanes_H=E5vard?=)
Date: Wed, 25 Aug 2010 17:47:12 +0200
Subject: [Bioperl-l] bpfetch.pl
Message-ID: <897520BC3AAE754FA4E34E2FD26490A8021C61597B8D@A-EXMB1.veths.no>


Hi,

I am trying to obtain a set of mRNA sequences from a database made by the bpindex script. I thought this would be a trivial task, but it appears not to be. I get the sequences if I request them one by one, like this:

perl scripts/index/bpfetch.pl -dir ./ zebrafish:NM_201192 zebrafish:NM_212708

But I need hundreds of sequences, so my plan was to put the RefSeq IDs in a file and use that as an argument (or whatever it is called in perl). That does not work:

haavaaan at login2 ~/download/src/bioperl-1.2.3 $ perl scripts/index/bpfetch.pl -dir ./ zebrafish:./some_seqs

You are running bpindex.pl without installing bioperl.
You have done it from bioperl/scripts, and so we can find the necessary information
but it is much better to install bioperl

Please read the README in the bioperl distribution

Sequence %id in Database zebrafish is not present


Any suggestions on how to do this? Alternative approaches are also appreciated.

I have no experience in perl, just started using linux, and for the moment there is no time to learn perl, so I would really be grateful for any help to solve this specific task.

Best regards

Håvard Aanes (M.Sc.)
Ph.D. student
Section for biochemistry and physiology
The Norwegian School of Veterinary Science
Telephone: +47 22597358




From kanmaninradha at gmail.com  Thu Aug 26 04:23:28 2010
From: kanmaninradha at gmail.com (kanmani)
Date: Thu, 26 Aug 2010 01:23:28 -0700 (PDT)
Subject: [Bioperl-l] Bio::Tools:GFF to get DNA sequences...
Message-ID: <9b7381d7-3596-4e60-a2ac-6c8c135d457d@s24g2000pri.googlegroups.com>

Hi I am trying to get the DNA sequences for each exon feature. I have
the following script. Everything works except getting sequences. Can
some one correct me.....Thanks.

#!/usr/bin/perl

use strict;
use warnings;
use Bio::Tools::GFF;


my $file = shift;
my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
$gffio->features_attached_to_seqs(1);

while (my $feat = $gffio->next_feature()){
    my $start = $feat->start;
    my $end= $feat->end;
    my $size = $end-$start+1;
    my $strand = $feat->strand;
    my $seqid = $feat->seq_id;
    my $score = $feat->score;
    my $frame = $feat->frame;
    my $source = $feat->source_tag;
    my $type = $feat->primary_tag;
    my $gffstr = $gffio->gff_string($feat);
    my @alltags = $feat->all_tags();
    my @ID_tag_value = $feat->each_tag_value("ID");

   my  $seq = $feat->seq();
   print "$seq\n";

  if($type eq "gene"){
       print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
    }
}


From kanmaninradha at gmail.com  Thu Aug 26 17:24:40 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 14:24:40 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <EF1B137F-94A7-45E1-B8FB-0E20142F0A7F@illinois.edu>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
	<6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
	<AANLkTinxcoKBHqU7bnfyNA6bi5qBjNAYR54c6K+Pg7rz@mail.gmail.com>
	<EF1B137F-94A7-45E1-B8FB-0E20142F0A7F@illinois.edu>
Message-ID: <AANLkTikUxFLLAduO7M1QzSToewA_AgPPELKPVYq0+JKk@mail.gmail.com>

Hi Chris and others,

For a brief time I could get away with using Bio::DB::Fasta to index FASTA
files and Bio::Tools::GFF to iterate through GFF features. But I hit the wall
again. It looks like sequential access to GFF features is not sufficient; I
want random access to them. I see the only way to do that is by
using Bio::DB::GFF as suggested by Chris.

Here is my question: is there any tutorial on configuring BioPerl, or this
module in particular, to work with MySQL/Postgres? I would really appreciate
it.

And thanks for all your help.
Kanmani

On Thu, Aug 26, 2010 at 10:08 AM, Chris Fields <cjfields at illinois.edu>wrote:

> On Aug 26, 2010, at 11:22 AM, kanmani radha wrote:
>
> > Hi Everyone,
> >
> > Thanks very much for this clarification.  Thanks a ton for every one who
> > spared their time to educate me.
> >
> > I see your points.  Please correct me if I am wrong.
> >
> > I understand that, Its better to use use Bio::DB::SeqFeature or
> Bio::DB::GFF
> > to load the fasta sequences (from a separate multifasta) file and
> > then Bio::Tools::GFF to parse the feature info from a gff file . Then
> query
> > the created database for the relevent GFF coordinates....
> >
> > I will implement this.
> >
> > Thanks once again.
> > Kanmani
>
> Yes, in general.  I forgot to mention that you can have an in-memory
> database as well, but it's only suggested if you have a few thousand or so
> features and small sequences (I think bacterial chromosomes will work).
>
> chris


From kanmaninradha at gmail.com  Thu Aug 26 18:04:20 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 15:04:20 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTikUxFLLAduO7M1QzSToewA_AgPPELKPVYq0+JKk@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
	<6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
	<AANLkTinxcoKBHqU7bnfyNA6bi5qBjNAYR54c6K+Pg7rz@mail.gmail.com>
	<EF1B137F-94A7-45E1-B8FB-0E20142F0A7F@illinois.edu>
	<AANLkTikUxFLLAduO7M1QzSToewA_AgPPELKPVYq0+JKk@mail.gmail.com>
Message-ID: <AANLkTimTU87G1dajASCzHm5=pjHCKx8W5X8AR9TKLmU4@mail.gmail.com>

Hi, I have made some progress since then:
- Installing Bio::DB::DBI::mysql needed BioSQL.

- Downloaded and installed BioSQL, following the instructions given in its
INSTALL file
- Created a biosql db in my MySQL server
- Loaded the schema using the script from BioSQL

- Installed DBI
- Now I have a problem with DBD::mysql. That reminds me that a couple of
years back I had to struggle to install this driver on another machine. I
thought I would ask around this time.

It fails with a bunch of error messages, the first of them being:
dbdimp.h:22:49 error: mysql.h no such file or directory

But my MySQL installation has the header file in
"/usr/include/mysql3/mysql/mysql.h". Can anyone suggest how to move forward
from that?

thanks,
Kanmani

On Thu, Aug 26, 2010 at 2:24 PM, kanmani radha <kanmaninradha at gmail.com>wrote:

> Hi Chris and others,
>
> For a brief amount time i could get away using Bio::DB::Fasta to index
> fasta files and Bio::Tools::GFF to iterate thru GFF features. But, i hit the
> wall again. Looks like sequential access of GFF featuers is not sufficient,
> I want to have a random access to it. I see the only way to do that is by
> using Bio::DB::GFF as suggested by Chris.
>
> Here is my question. Is there any tutorial to configure Bioperl  or this
> module in particular to work with MySQL/postgres. I will really appreciate
> it.
>
> And thanks for all your help.
> Kanmani
>
>
> On Thu, Aug 26, 2010 at 10:08 AM, Chris Fields <cjfields at illinois.edu>wrote:
>
>> On Aug 26, 2010, at 11:22 AM, kanmani radha wrote:
>>
>> > Hi Everyone,
>> >
>> > Thanks very much for this clarification.  Thanks a ton for every one who
>> > spared their time to educate me.
>> >
>> > I see your points.  Please correct me if I am wrong.
>> >
>> > I understand that, Its better to use use Bio::DB::SeqFeature or
>> Bio::DB::GFF
>> > to load the fasta sequences (from a separate multifasta) file and
>> > then Bio::Tools::GFF to parse the feature info from a gff file . Then
>> query
>> > the created database for the relevent GFF coordinates....
>> >
>> > I will implement this.
>> >
>> > Thanks once again.
>> > Kanmani
>>
>> Yes, in general.  I forgot to mention that you can have an in-memory
>> database as well, but it's only suggested if you have a few thousand or so
>> features and small sequences (I think bacterial chromosomes will work).
>>
>> chris
>
>
>


From rafalucas.unicamp at gmail.com  Thu Aug 26 18:11:07 2010
From: rafalucas.unicamp at gmail.com (Rafael Lucas)
Date: Thu, 26 Aug 2010 19:11:07 -0300
Subject: [Bioperl-l] Help in algorithm Bio::Structure::IO::pdb
Message-ID: <AANLkTi=zWPKeY1NpRA9TBSEnsbGH1W9F0y0QQ0+um7Yq@mail.gmail.com>

Hi folks,

How are you? I'm from Brazil, and I am writing an algorithm that
encrypts data and then prints the result to a PDB file. I have a
.fasta file and want to convert it to a .pdb file. If I use a program
like PyMOL it takes too much time, so I want to use
Bio::Structure::IO::pdb to speed up the process. Could you help me with
this problem?

Thank you,

Rafael Lucas
Faculdade de Tecnologia em Analise e Desenvolvimento de Sistemas
FT - UNICAMP
+55 (19)9614-0533


From J.Christopher.Ellis at duke.edu  Thu Aug 26 22:06:30 2010
From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis)
Date: Thu, 26 Aug 2010 22:06:30 -0400
Subject: [Bioperl-l] standaloneblastplus blastn crash
Message-ID: <55861.1282874790@duke.edu>

 When I run the standaloneblastplus I get the following error...

 ------------- EXCEPTION -------------
 MSG: C:Program FilesNCBIblast-2.2.24+binblastn.exe call crashed: There
was a problem running C:Program FilesNCBIblast-2.2.24+binblastn.exe :? at
C:/Perl64/lib/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1001.

 STACK Bio::Tools::Run::WrapperBase::_run
C:/Perl64/lib/Bio/Tools/Run/WrapperBase/CommandExts.pm:1006
 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus.pm:1303
 STACK Bio::Tools::Run::StandAloneBlastPlus::run
C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:270
 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus.pm:1301
 STACK toplevel localBlast.pl:9
 -------------------------------------

 I have a sneaky suspicion that it is an easy fix but for the life of me I
can not figure it out! :)

 Thanks in advance,
 Chris
 

From indraniel at gmail.com  Thu Aug 26 21:57:54 2010
From: indraniel at gmail.com (Indraniel)
Date: Fri, 27 Aug 2010 01:57:54 +0000 (UTC)
Subject: [Bioperl-l] How to convert SFF into Fastq
References: <COL102-W14F3F0CDA966B9ECE0BE1BFABB0@phx.gbl>
	<AANLkTilN3rsgWEjvmyMq9IjC8p5MzBdGGe-Xtfd6XoZF@mail.gmail.com>
	<AANLkTikC-I0JFvWqptlA69qrKnKrWSNyNPAwHQKSLluJ@mail.gmail.com>
Message-ID: <loom.20100827T035104-821@post.gmane.org>

A fourth option is the following tool, sff2fastq (written in C), described here:

http://indraniel.wordpress.com/2010/04/23/sff2fastq/

and 

http://github.com/indraniel/sff2fastq

Indraniel


From David.Messina at sbc.su.se  Fri Aug 27 03:41:21 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 27 Aug 2010 09:41:21 +0200
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <4C6D0B50.4050902@sms.ed.ac.uk>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
	<4C6D0B50.4050902@sms.ed.ac.uk>
Message-ID: <A5AACD38-0396-4221-B6F7-5740FBBD83E0@sbc.su.se>

Hi Giuseppe,


On Aug 19, 2010, at 12:45, Giuseppe Gallone wrote:
> Bio::Orthology::InterologMap
> Bio::Orthology::Interolog::Map,

> just in case somebody else finds other interesting applications for the Interolog concept and would like to "plug in" their own contribution. Would this make any sense?

Absolutely. I think either of the above is a good option, and I agree that the second is a little more flexible.

Your POD looks great! Way better than most. Having seen the whole thing now, I think your description is fine as is. And if you have another tutorial and example scripts on top of it, that would really be terrific, above and beyond what most people would expect.

So, time to unleash it on the world! :)


Dave


From David.Messina at sbc.su.se  Fri Aug 27 03:58:12 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 27 Aug 2010 09:58:12 +0200
Subject: [Bioperl-l] standaloneblastplus blastn crash
In-Reply-To: <55861.1282874790@duke.edu>
References: <55861.1282874790@duke.edu>
Message-ID: <9275A540-AE42-47B0-BA73-A906964C451B@sbc.su.se>

Hi Chris,

If you look at the error message, it says what the problem is: it's trying to call the blastn executable with no path separators (backslashes) in the path name.

> MSG: C:Program FilesNCBIblast-2.2.24+binblastn.exe call crashed: There
> was a problem running C:Program FilesNCBIblast-2.2.24+binblastn.exe


Now, that could be a problem in BioPerl or it could be a problem in your code. It's hard to tell which without seeing the code, so please post it.
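
In the meantime, here is a minimal sketch of how a Windows path can be written in Perl so that its separators are kept. I can't tell from the error message alone whether quoting is what went wrong in your case, so treat this as one thing to check rather than a diagnosis. The helper separator_count is hypothetical and only there to make a mangled path easy to spot.

```perl
use strict;
use warnings;

# Single quotes keep every backslash literal, so the path survives intact.
my $single = 'C:\Program Files\NCBI\blast-2.2.24+\bin\blastn.exe';

# Forward slashes need no escaping at all and are accepted by Perl on Windows.
my $forward = 'C:/Program Files/NCBI/blast-2.2.24+/bin/blastn.exe';

# Hypothetical helper: count the path separators (backslash or slash).
sub separator_count {
    my ($path) = @_;
    my $count = () = $path =~ m{[\\/]}g;
    return $count;
}

# The third path is the one from the error message, with separators lost.
for my $path ($single, $forward,
              'C:Program FilesNCBIblast-2.2.24+binblastn.exe') {
    printf "%d separators in %s\n", separator_count($path), $path;
}
```

If the path in your script prints with zero separators at the point where it is handed to the BLAST wrapper, the problem is on the script side.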


Dave


From G.Gallone at sms.ed.ac.uk  Fri Aug 27 07:07:57 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Fri, 27 Aug 2010 12:07:57 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <A5AACD38-0396-4221-B6F7-5740FBBD83E0@sbc.su.se>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
	<4C6D0B50.4050902@sms.ed.ac.uk>
	<A5AACD38-0396-4221-B6F7-5740FBBD83E0@sbc.su.se>
Message-ID: <4C779C8D.1090007@sms.ed.ac.uk>

Hi Dave,

thank you very much for your feedback :) . I will register the namespace 
right now. I think I will use 'homology' as the second level name 
though, because I plan to extend the module to work with paralogues as well.

As for the category, which one of the following do you reckon would fit a 
Bio:: package better?

http://www.cpan.org/modules/by-category/

Regards
Giuseppe

On 27/08/10 08:41, Dave Messina wrote:
> Hi Giuseppe,
>
>
> On Aug 19, 2010, at 12:45, Giuseppe Gallone wrote:
>> Bio::Orthology::InterologMap
>> Bio::Orthology::Interolog::Map,
>
>> just in case somebody else finds other interesting applications for the Interolog concept and would like to "plug in" their own contribution. Would this make any sense?
>
> Absolutely. I think either of the above is a good option, and I agree that the second is a little more flexible.
>
> Your POD looks great! Way better than most. Having seen the whole thing now, I think your description is fine as is. And if you have another tutorial and example scripts on top of it, that would really be terrific, above and beyond what most people would expect.
>
> So, time to unleash it on the world! :)
>
>
> Dave
>
>

-- 

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From David.Messina at sbc.su.se  Fri Aug 27 07:25:06 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 27 Aug 2010 13:25:06 +0200
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <4C779C8D.1090007@sms.ed.ac.uk>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
	<4C6D0B50.4050902@sms.ed.ac.uk>
	<A5AACD38-0396-4221-B6F7-5740FBBD83E0@sbc.su.se>
	<4C779C8D.1090007@sms.ed.ac.uk>
Message-ID: <80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se>

Hi Giuseppe,


> I think I will use 'homology' as the second level name though, because I plan to extend the module to work with paralogues as well.

Sounds good.


> As for the category, which one of the following you reckon it will fit a Bio:: package better
> 
> http://www.cpan.org/modules/by-category/


Bio:: is in 23 - miscellaneous modules, so probably keeping with that makes sense.

I don't know much about that stuff, though. Chris F. or other CPAN cognoscenti care to comment?


Dave


From cjfields at illinois.edu  Fri Aug 27 09:26:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 27 Aug 2010 08:26:51 -0500
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
	<4C6D0B50.4050902@sms.ed.ac.uk>
	<A5AACD38-0396-4221-B6F7-5740FBBD83E0@sbc.su.se>
	<4C779C8D.1090007@sms.ed.ac.uk>
	<80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se>
Message-ID: <88BB7813-E892-4BEC-9C49-5FD22325BBF7@illinois.edu>

On Aug 27, 2010, at 6:25 AM, Dave Messina wrote:

> Hi Giuseppe,
> 
> 
>> I think I will use 'homology' as the second level name though, because I plan to extend the module to work with paralogues as well.
> 
> Sounds good.
> 
> 
>> As for the category, which one of the following you reckon it will fit a Bio:: package better
>> 
>> http://www.cpan.org/modules/by-category/
> 
> 
> Bio:: is in 23 - miscellaneous modules, so probably keeping with that makes sense.
> 
> I don't know much about that stuff, though. Chris F. or other CPAN cognoscenti care to comment?
> 
> 
> Dave

That's probably the best spot, as we cover a fairly broad range (mainly due to the monolithic structure of core).  It's terribly nondescript, though, sort of the junk drawer of CPAN.

chris


From adamkennedybackup at gmail.com  Sun Aug 29 07:35:50 2010
From: adamkennedybackup at gmail.com (Adam Kennedy)
Date: Sun, 29 Aug 2010 21:35:50 +1000
Subject: [Bioperl-l] Could I install BioPerl on Windows with the
 ActivePerl 5.12.1?
In-Reply-To: <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com>
References: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>
	<78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com>
	<AANLkTimBPL6Sr2kmg+f0t1j8pk_9nBAoqubKzY4AJoxo@mail.gmail.com>
	<5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com>
Message-ID: <AANLkTinSp6GCOQvCFYOUk1Ad8EjKdU=dQbe5GpbLiLr1@mail.gmail.com>

http://strawberryperl.com/download/professional/strawberry-perl-professional-5.10.1.3-alpha-2.msi

You get BioPerl installed out of the box.

Adam K

On 20 August 2010 03:20, Christopher Fields <cjfields1 at gmail.com> wrote:
> cc'ing list.  Looks like the BioPerl PPM is possibly broken for perl 5.12.  Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling...
>
> chris
>
> On Aug 19, 2010, at 11:29 AM, han sun wrote:
>
>> v5.10 works,thanks.
>>
>> 2010/8/19 Christopher Fields <cjfields1 at gmail.com>
>> Try using ActivePerl 5.10 instead of v5.12.  It's very possible the PPM won't work for v5.12 yet.
>>
>> chris
>>
>> On Aug 19, 2010, at 9:25 AM, han sun wrote:
>>
>> > Hello everyone,
>> >
>> > I have used perl for several months,and I now want to feel the power of
>> > bioperl.
>> > But it seems that the installing is more difficult than I thought.
>> >
>> > I typed the commands.
>> >
>> >
>> >
>> > install-shell
>> >
>> >
>> > rep add bioperl http://bioperl.org/DIST
>> >
>> >
>> > rep add uwinnipeg
>> > http://cpan.uwinnipeg.ca/PPMPackages/12xx/<http://cpan.uwinnipeg.ca/PPMPackages/10xx/>
>> >
>> >
>> > rep add trouchelle http://trouchelle.com/ppm12/
>> >
>> > install BioPerl
>> >
>> > However,the installing failed,
>> >
>> > ppm install failed:
>> > Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
>> > Can't find any package that provides PostScript::TextBlock for
>> > Bundle-BioPerl-Core
>> > Can't find any package that provides Ace:: for Bundle-BioPerl-Core
>> > Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
>> > Can't find any package that provides Convert::Binary::C for
>> > Bundle-BioPerl-Core
>> > Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
>> > Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
>> > Can't find any package that provides IPC::Run for GraphViz
>> > Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
>> > Can't find any package that provides List-MoreUtils for Moose
>> > Can't find any package that provides List-MoreUtils for Class-MOP
>> >
>> >
>> > then I tried
>> >
>> > install http://www.bribes.org/perl/ppm/GD.ppd
>> >
>> > and tried the installation again,but it still didn't help.
>> >
>> > Do you know what's wrong?
>> >
>> > Please help me, thanks very much.
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields1 at gmail.com  Sun Aug 29 11:58:50 2010
From: cjfields1 at gmail.com (Christopher Fields)
Date: Sun, 29 Aug 2010 10:58:50 -0500
Subject: [Bioperl-l] Could I install BioPerl on Windows with the
	ActivePerl 5.12.1?
In-Reply-To: <AANLkTinSp6GCOQvCFYOUk1Ad8EjKdU=dQbe5GpbLiLr1@mail.gmail.com>
References: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>
	<78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com>
	<AANLkTimBPL6Sr2kmg+f0t1j8pk_9nBAoqubKzY4AJoxo@mail.gmail.com>
	<5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com>
	<AANLkTinSp6GCOQvCFYOUk1Ad8EjKdU=dQbe5GpbLiLr1@mail.gmail.com>
Message-ID: <A1B60C18-E144-466B-9630-21A88EF2CECB@gmail.com>

Yes, and I am thinking of pointing more and more users in that direction instead.  Maintaining PPM packages with ever-fluctuating specs isn't easy when I don't work with Windows anymore.

chris

On Aug 29, 2010, at 6:35 AM, Adam Kennedy wrote:

> http://strawberryperl.com/download/professional/strawberry-perl-professional-5.10.1.3-alpha-2.msi
> 
> You get BioPerl installed out of the box.
> 
> Adam K
> 
> On 20 August 2010 03:20, Christopher Fields <cjfields1 at gmail.com> wrote:
>> cc'ing list.  Looks like the BioPerl PPM is possibly broken for perl 5.12.  Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling...
>> 
>> chris
>> 
>> On Aug 19, 2010, at 11:29 AM, han sun wrote:
>> 
>>> v5.10 works,thanks.
>>> 
>>> 2010/8/19 Christopher Fields <cjfields1 at gmail.com>
>>> Try using ActivePerl 5.10 instead of v5.12.  It's very possible the PPM won't work for v5.12 yet.
>>> 
>>> chris
>>> 
>>> On Aug 19, 2010, at 9:25 AM, han sun wrote:
>>> 
>>>> Hello everyone,
>>>> 
>>>> I have used perl for several months,and I now want to feel the power of
>>>> bioperl.
>>>> But it seems that the installing is more difficult than I thought.
>>>> 
>>>> I typed the commands.
>>>> 
>>>> 
>>>> 
>>>> install-shell
>>>> 
>>>> 
>>>> rep add bioperl http://bioperl.org/DIST
>>>> 
>>>> 
>>>> rep add uwinnipeg
>>>> http://cpan.uwinnipeg.ca/PPMPackages/12xx/<http://cpan.uwinnipeg.ca/PPMPackages/10xx/>
>>>> 
>>>> 
>>>> rep add trouchelle http://trouchelle.com/ppm12/
>>>> 
>>>> install BioPerl
>>>> 
>>>> However,the installing failed,
>>>> 
>>>> ppm install failed:
>>>> Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
>>>> Can't find any package that provides PostScript::TextBlock for
>>>> Bundle-BioPerl-Core
>>>> Can't find any package that provides Ace:: for Bundle-BioPerl-Core
>>>> Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
>>>> Can't find any package that provides Convert::Binary::C for
>>>> Bundle-BioPerl-Core
>>>> Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
>>>> Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
>>>> Can't find any package that provides IPC::Run for GraphViz
>>>> Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
>>>> Can't find any package that provides List-MoreUtils for Moose
>>>> Can't find any package that provides List-MoreUtils for Class-MOP
>>>> 
>>>> 
>>>> then I tried
>>>> 
>>>> install http://www.bribes.org/perl/ppm/GD.ppd
>>>> 
>>>> and tried the installation again,but it still didn't help.
>>>> 
>>>> Do you know what's wrong?
>>>> 
>>>> Please help me, thanks very much.
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>>> 
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From odclerck at gmail.com  Fri Aug 27 03:44:14 2010
From: odclerck at gmail.com (odclerck)
Date: Fri, 27 Aug 2010 00:44:14 -0700 (PDT)
Subject: [Bioperl-l]  fasta header replace
Message-ID: <29550202.post@talk.nabble.com>


Hi,
Was wondering if someone had an easy script available that replaces the
headers of FASTA sequences with values stored in a separate text file.

Macrogen produces files with sequences that look more or less like this:
>100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1	1012, 1000 bases, 0 checksum.

I can filter out the position on the plate e.g. "A1" easily but would like
to replace this with the name of the strain stored in a different text file,
e.g. "A1_D1222".

Realize this sounds pretty basic to most of you, but I'm pretty new at
scripting.
Olivier

-- 
View this message in context: http://old.nabble.com/fasta-header-replace-tp29550202p29550202.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From J.Christopher.Ellis at duke.edu  Mon Aug 30 08:55:04 2010
From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis)
Date: Mon, 30 Aug 2010 08:55:04 -0400
Subject: [Bioperl-l] Taxonomy DB problem
Message-ID: <51468.1283172904@duke.edu>

 Hi All,

 I am trying to extract the entire taxonomy of an organism including the
classifications. Something like...

Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia

I am not worried about format, just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point, so I copied it and tried to run it but got an error.

My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?

Thanks for all your help in advance!

Chris


From rafalucas.unicamp at gmail.com  Mon Aug 30 09:24:11 2010
From: rafalucas.unicamp at gmail.com (Rafael Lucas)
Date: Mon, 30 Aug 2010 10:24:11 -0300
Subject: [Bioperl-l] help in algorithm Bio::Structure::IO::pdb
Message-ID: <AANLkTimNHcjCRqYrhH8=Q=Dqqjtj35NNqMqP+Q2P1oPU@mail.gmail.com>

Hi folks,

How are you? I'm from Brazil and I am writing an algorithm that encrypts
data and then prints the result to a PDB file. I have a .fasta file and
want to convert it to a .pdb file. Doing this with a program like PyMOL
takes too much time, so I want to use Bio::Structure::IO::pdb to speed up
the process. Could you help me with this problem?

Thank you,

Rafael Lucas
Faculdade de Tecnologia em Analise e Desenvolvimento de Sistemas
FT - UNICAMP
+55 (19)9614-0533


From cjfields at illinois.edu  Mon Aug 30 09:36:41 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 30 Aug 2010 08:36:41 -0500
Subject: [Bioperl-l] Taxonomy DB problem
In-Reply-To: <51468.1283172904@duke.edu>
References: <51468.1283172904@duke.edu>
Message-ID: <B93CF33A-0FA5-4A19-AF5A-BE203AA26E38@illinois.edu>

Chris,

Regarding a fix for that script, we would have to see your modified script and the error.  However, there are modules within BioPerl that essentially do what you want, in particular Bio::DB::Taxonomy.
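
As a rough sketch of that approach (untested against your setup, and assuming the 'entrez' source of Bio::DB::Taxonomy together with Bio::Tree::Tree's get_lineage_nodes), something along these lines should print the ranked lineage for a species name given on the command line. The format_lineage helper is a hypothetical name of mine, not part of BioPerl; it only needs objects that offer rank() and scientific_name(), as Bio::Taxon does.

```perl
use strict;
use warnings;

# Hypothetical helper: turn taxon-like objects (anything with rank() and
# scientific_name() methods) into "Rank:Name, Rank:Name, ...".
# Nodes that NCBI labels 'no rank' are skipped.
sub format_lineage {
    my @nodes = @_;
    return join ', ',
        map  { ucfirst($_->rank) . ':' . $_->scientific_name }
        grep { defined $_->rank && $_->rank ne 'no rank' } @nodes;
}

# Only contact NCBI when a species name is given, e.g.
#   perl lineage.pl "Escherichia coli"
if (@ARGV) {
    require Bio::DB::Taxonomy;
    require Bio::Tree::Tree;

    my $db    = Bio::DB::Taxonomy->new(-source => 'entrez');
    my $taxon = $db->get_taxon(-name => $ARGV[0])
        or die "No taxon found for '$ARGV[0]'\n";

    # get_lineage_nodes returns the ancestors from the root down to,
    # but not including, the taxon itself, so append it at the end.
    my $tree = Bio::Tree::Tree->new(-node => $taxon);
    print format_lineage($tree->get_lineage_nodes($taxon), $taxon), "\n";
}
```

Keeping the formatting in its own subroutine means you can change the output layout without touching the database lookup.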

chris

On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:

> Hi All,
> 
> I am trying to extract the entire taxonomy of an organism including the
> classifications. Some thing like...
> 
> Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
> 
> I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error.
> 
> My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?
> 
> Thanks for all your help in advance!
> 
> Chris 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From fs5 at sanger.ac.uk  Mon Aug 30 11:11:06 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Mon, 30 Aug 2010 16:11:06 +0100
Subject: [Bioperl-l] fasta header replace
In-Reply-To: <29550202.post@talk.nabble.com>
References: <29550202.post@talk.nabble.com>
Message-ID: <4C7BCA0A.70503@sanger.ac.uk>

Hi Olivier,

Do you know how to read a file and build a hash from the contents? This 
is what you will need to do,
e.g. if your file is
A1 Strain_A
A2 Strain_A
A3 Strain_B

then you can do something like:

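# Open the mapping file for reading ('<')
open(my $in, '<', $infile_path) or die "Cannot open $infile_path: $!";
my %well2strain;
while (my $line = <$in>) {
    # Capture the well (e.g. A1) and the strain name that follows it
    my ($well, $strain) = ($line =~ /^([A-Z]\d+)\s+(\w+)/) or next;
    $well2strain{$well} = $strain;
}
close $in;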

You can then use the values of the hash to set the sequence ID as you 
parse the FASTA file. The BioPerl SeqIO howto gives details about how to 
read and write the FASTA file 
(http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples).
You can change the id of a sequence object with
$some_seq_object->id( 'my new ID');

See http://doc.bioperl.org/releases/bioperl-1.0/Bio/Seq.html for details.
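
To make the renaming step concrete, here is a small sketch in plain Perl that works on the header line itself. It assumes the well is an underscore-delimited token such as _A1_ (rows A-H, one or two digits) and that the new header should be ">well_strain", as in your "A1_D1222" example. The name new_header is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build ">WELL_STRAIN" from a Macrogen-style header.
# Headers without a recognisable or known well are returned unchanged.
sub new_header {
    my ($header, $well2strain) = @_;
    # Assumption: the well is delimited by underscores, e.g. "_A1_"
    my ($well) = $header =~ /_([A-H]\d{1,2})_/;
    return $header
        unless defined $well && exists $well2strain->{$well};
    return ">${well}_$well2strain->{$well}";
}

# Lookup table as it would be built from the mapping file
my %well2strain = (A1 => 'D1222');

my $old = ">100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1\t"
        . "1012, 1000 bases, 0 checksum.";
print new_header($old, \%well2strain), "\n";
```

With Bio::SeqIO you would apply the same lookup to the sequence ID instead of the raw line and set the result with the id() call shown above.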

Hope that helps to get you started.

Frank

 
odclerck wrote:
> Hi,
> Was wondering if someone had an easy script available that converts the
> headers of a fasta sequences to a value stored in a separate text file.
>
> Macrogen produces files with sequences that look more or less like this:
>   
>> 100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1	1012, 1000 bases, 0 checksum.
>>     
>
> I can filter out the position on the plate e.g. "A1" easily but would like
> to replace this with the name of the strain stored in a different text file,
> e.g. "A1_D1222".
>
> Realize this sounds pretty basic to most of you, but I'm pretty new at
> scripting.
> Olivier
>
>   


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From jessica.sun at gmail.com  Mon Aug 30 11:51:39 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Mon, 30 Aug 2010 11:51:39 -0400
Subject: [Bioperl-l] Git for the lazy
In-Reply-To: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se>
References: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se>
Message-ID: <AANLkTikzkPL-WN7XUNPcfNhqqnOYUR15br-YzrwsE5tL@mail.gmail.com>

I want to add sequence features with tags and tag values, and I want the
tags to appear in the order I add them. However, they seem to come out in
alphabetical order by default. Does anyone know how to fix this? Thanks a
lot in advance.


From G.Gallone at sms.ed.ac.uk  Tue Aug 31 07:52:57 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Tue, 31 Aug 2010 12:52:57 +0100
Subject: [Bioperl-l] New CPAN Release - Bio::Homology::InterologWalk - A
 Perl Module to retrieve putative PPIs through Interologs
Message-ID: <4C7CED19.80802@sms.ed.ac.uk>

Dear Bioperl users,

I would like to announce the release of Bio::Homology::InterologWalk, a
module that retrieves, scores and visualizes putative Protein-Protein 
Interactions through the orthology-walk method.

The project is available from the following link

http://search.cpan.org/~ggallone/

and a description of the idea behind it is here

http://search.cpan.org/~ggallone/Bio-Homology-InterologWalk-0.02/lib/Bio/Homology/InterologWalk.pm#DESCRIPTION

The project is in a very early stage (currently ver. 0.02 alpha) and has 
currently been tested only on Linux environments. It has not been tested 
on Macs, but it should work fine, and I would appreciate any feedback 
from Mac users who try it.

*Any* form of feedback will be extremely appreciated (bugs, typos,
syntactical errors, verbal abuse etc :) ).


Best,
Giuseppe


-- 

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From cjfields at illinois.edu  Tue Aug 31 11:01:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 31 Aug 2010 10:01:59 -0500
Subject: [Bioperl-l] Taxonomy DB problem
In-Reply-To: <56973.1283255847@duke.edu>
References: <56973.1283255847@duke.edu>
Message-ID: <7167CA86-857E-4E16-A3D6-BA45045CF892@illinois.edu>

Yes, I see that one.  It may be that the ID hash being returned is empty.  I'll look into it.
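
One related thing you could check on your side (a guess, not a confirmed cause): the script's own comments say record 89318838 has been removed from the database, so its slot in @taxa would be undef when the esummary query is built. The sketch below uses made-up placeholder taxonomy IDs in place of real elink results and shows how to drop the undefined entries first.

```perl
use strict;
use warnings;

my @ids = qw(1621261 89318838 68536103 20807972 730439);

# Stand-in for the elink results; the taxonomy IDs are made-up placeholders.
# 89318838 is deliberately missing, as for a record removed from the database.
my %taxa = (
    1621261  => 'taxid_1',
    68536103 => 'taxid_2',
    20807972 => 'taxid_3',
    730439   => 'taxid_4',
);

# The hash slice yields undef for the id that has no taxonomy link ...
my @with_gaps = @taxa{@ids};

# ... so keep only the defined taxonomy IDs before building the next query.
my @taxa = grep { defined } @with_gaps;

printf "%d of %d ids mapped\n", scalar @taxa, scalar @ids;
```

In the wiki script this would mean replacing "@taxa = @taxa{@ids};" with the grep version.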

-c 

On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote:

> Hi Chris,
> 
> The error is...
> 
> "Use of uninitialized value $id in join or string at C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363."
> 
> The script from http://bioperl.org/wiki/Species_names_from_accession_numbers is as follows....
> 
> use Bio::DB::EUtilities;
> 
> my (%taxa, @taxa);
> my (%names, %idmap);
> 
> # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
> # (probably)
> 
> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> 
> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>                                        -db => 'taxonomy',
>                                        -dbfrom => 'protein',
>                                        -correspondence => 1,
>                                        -id => \@ids);
> 
> # iterate through the LinkSet objects
> while (my $ds = $factory->next_LinkSet) {
>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> }
> 
> @taxa = @taxa{@ids};
> 
> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>                                     -db => 'taxonomy',
>                                     -id => \@taxa );
> 
> while (local $_ = $factory->next_DocSum) {
>     $names{($_->get_contents_by_name('TaxId'))[0]} =
>         ($_->get_contents_by_name('ScientificName'))[0];
> }
> 
> foreach (@ids) {
>     $idmap{$_} = $names{$taxa{$_}};
> }
> 
> # %idmap is
> # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> # 68536103 => 'Corynebacterium jeikeium K411'
> # 730439 => 'Bacillus caldolyticus'
> # 89318838 => undef (this record has been removed from the db)
> 
> 1;
> 
> 
> Thanks,
> 
> 
> 
> Chris
> 
> 
> On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
> Chris,
> 
> Regarding a fix for that script, we would have to see your modified script and the error. However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy.
> 
> chris
> 
> On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:
> 
> > Hi All,
> > 
> > I am trying to extract the entire taxonomy of an organism including the
> > classifications. Some thing like...
> > 
> > Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
> > 
> > I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error.
> > 
> > My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?
> > 
> > Thanks for all your help in advance!
> > 
> > Chris 
> > 
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 


From J.Christopher.Ellis at duke.edu  Tue Aug 31 07:57:27 2010
From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis)
Date: Tue, 31 Aug 2010 07:57:27 -0400
Subject: [Bioperl-l] Taxonomy DB problem
Message-ID: <56973.1283255847@duke.edu>

 Hi Chris,

 The error is...

 "Use of uninitialized value $id in join or string at
C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363."

 The script from
http://bioperl.org/wiki/Species_names_from_accession_numbers is as
follows....

use Bio::DB::EUtilities;

my (%taxa, @taxa);
my (%names, %idmap);

# these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
# (probably)

my @ids = qw(1621261 89318838 68536103 20807972 730439);

my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
                                       -db => 'taxonomy',
                                       -dbfrom => 'protein',
                                       -correspondence => 1,
                                       -id => \@ids);

# iterate through the LinkSet objects
while (my $ds = $factory->next_LinkSet) {
    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
}

@taxa = @taxa{@ids};

$factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
                                    -db => 'taxonomy',
                                    -id => \@taxa );

while (local $_ = $factory->next_DocSum) {
    $names{($_->get_contents_by_name('TaxId'))[0]} =
        ($_->get_contents_by_name('ScientificName'))[0];
}

foreach (@ids) {
    $idmap{$_} = $names{$taxa{$_}};
}

# %idmap is
# 1621261 => 'Mycobacterium tuberculosis H37Rv'
# 20807972 => 'Thermoanaerobacter tengcongensis MB4'
# 68536103 => 'Corynebacterium jeikeium K411'
# 730439 => 'Bacillus caldolyticus'
# 89318838 => undef (this record has been removed from the db)

1;

Thanks,

Chris

 On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
 Chris,

 Regarding a fix for that script, we would have to see your modified
script and the error. However, there are modules within BioPerl to
essentially do what you want, in particular, Bio::DB::Taxonomy.

 chris

 On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:

 > Hi All,
 > 
 > I am trying to extract the entire taxonomy of an organism including the
 > classifications. Some thing like...
 > 
 > Phylum:Proteobacteria, Class:Gammaproteobacteria,
Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
 > 
 > I am not worried about format just that I get the information and the
associated level of hierarchy. The script found at
http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a
good starting point so I copied it and tried run it but got an error.
 > 
 > My first question is "Is there a known fix for this?" and my second
question is how do I get the full hierarchical information (as seen above)
with the taxonomy db?
 > 
 > Thanks for all your help in advance!
 > 
 > Chris 
 > 
 > 
 > _______________________________________________
 > Bioperl-l mailing list
 > Bioperl-l at lists.open-bio.org
 > http://lists.open-bio.org/mailman/listinfo/bioperl-l

 
From rmb32 at cornell.edu  Sun Aug  1 15:17:14 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Sun, 01 Aug 2010 12:17:14 -0700
Subject: [Bioperl-l] GMOD Evo Hackathon Open Call for Participation
Message-ID: <4C55C83A.3060700@cornell.edu>

We are seeking participants for the GMOD Tools for Evolutionary Biology 
Hackathon, held November 8-12, 2010 at the US National Evolutionary 
Synthesis Center (NESCent) in Durham, NC.

This hackathon targets three critical gaps in the capabilities of the 
GMOD toolbox that currently limit its utility for evolutionary research:

  1. Visualization of comparative genomics data
  2. Visualization of phylogenetic data and trees
  3. Support for population diversity and phenotype data

If you are interested in these areas and have relevant expertise, you 
are strongly encouraged to apply. Relevant areas of expertise include 
more than just software development: if you are a GMOD power user, 
visualization guru, domain expert (comparative, phylogenetics, 
population, ...), or documentation wizard, then your skills are needed!

How To Apply:

Fill out the online application form at http://bit.ly/gmodevohack. 
Applications are due August 25.

About GMOD:

GMOD is an intercompatible suite of open-source software components for 
storing, managing, analyzing, and visualizing genome-scale data. GMOD 
includes many widely-used software components: GBrowse and JBrowse, both 
genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a 
generic and modular database schema; CMap, a comparative map viewer; as 
well as many other components including Apollo, MAKER, BioMart, 
InterMine, and Galaxy. We hope to extend the functionality of existing 
GMOD components, and integrate new components as well.

About Hackathons:

A hackathon is an intense event at which a group of programmers with 
different backgrounds and skills collaborate hands-on and face-to-face 
to develop working code that is of utility to the community as a whole. 
The mix of people will include domain experts and computer-savvy end-users.

More details about the event, its motivation, organization, procedures, 
and attendees, as well as URLs to the hackathon and related websites are 
included below.

Sincerely,

The GMOD EvoHack Organizing Committee (and project affiliations as
relevant):

Nicole Washington, Chair (LBNL, modENCODE, Phenote)

Robert Buels (SGN, Chado NatDiv)

Scott Cain (OICR, GMOD)

Dave Clements (NESCent, GMOD)

Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv)

Sheldon McKay (University of Arizona, iPlant, GBrowse_syn)


-----------------------------

About the GMOD Evo Hackathon

Overview

We are organizing a hackathon to fill critical gaps in the capabilities 
of the Generic Model Organism Database (GMOD) toolbox that currently 
limit its utility for evolutionary research. Specifically, we will focus 
on tools for

   1) viewing comparative genomics data;
   2) visualizing phylogenomic data; and
   3) supporting population diversity data and phenotype annotation.

The event will be hosted at NESCent and bring together a group of about 
20+ software developers, end-user representatives, and documentation 
experts who would otherwise not meet. The participants will include key 
developers of GMOD components that currently lack features critical for 
emerging evolutionary biology research, developers of informatics tools 
in evolutionary research that lack GMOD integration, and 
informatics-savvy biologists who can represent end-user requirements.

The event will provide a unique opportunity to infuse the GMOD developer 
community with a heightened awareness of unmet needs in evolutionary 
biology that GMOD components have the potential to fill, and for tool 
developers in evolutionary biology to better understand how best to 
extend or integrate with already existing GMOD components.

Before the Event

Discussion of ideas and sometimes even design actually starts well 
before the hackathon, on mailing lists, wiki pages, and conference calls 
set up among accepted attendees.  This advance work lays the foundation 
for participants to be productive from the very first day.  This also 
means that participants should be willing to contribute some time in 
advance of the hackathon itself to participate in this preparatory 
discussion.

During the Event

Typically, hackathon participants use the morning of the first day of 
the event to organize themselves into working groups of between 3 and 6 
people, each with a focused implementation objective.  Ideas and 
objectives are discussed, and attendees coalesce around the projects in 
which they have the most experience or interest.


Deliverables / Event Results

The meeting's attendance, working groups, and outcomes will be fully 
logged and documented on the GMOD wiki (http://gmod.org). Each working 
group during the event will typically have its own wiki page, linked 
from the main EvoHack page, where it documents its minutes and design 
notes, and provides links to the code and documentation it produces. 
Also, since GMOD and NESCent are both committed to open source 
principles, all code and documentation produced by participants during 
the hackathon must be published under an OSI-approved open source 
license. As contributions to existing GMOD tools, all hackathon products 
will most likely satisfy this requirement automatically.

NESCent

This event is sponsored by the US National Evolutionary Synthesis Center 
(NESCent, http://www.nescent.org) through its Informatics Whitepapers 
program (http://www.nescent.org/informatics/whitepapers.php). NESCent 
promotes the synthesis of information, concepts and knowledge to address 
significant, emerging, or novel questions in evolutionary science and 
its applications. NESCent achieves this by supporting research and 
education across disciplinary, institutional, geographic, and 
demographic boundaries (see http://www.nescent.org/science/proposals.php).

Links

Main GMOD EvoHack page, and full proposal:
http://gmod.org/wiki/GMOD_Evo_Hackathon

NESCent: http://www.nescent.org/
GMOD: http://gmod.org/
Similar past NESCent events: http://hackathon.nescent.org/
GMOD hackathon application:  http://bit.ly/gmodevohack

-- 
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/GMOD_Europe_2010
http://gmod.org/wiki/Help_Desk_Feedback


From maj at fortinbras.us  Sun Aug  1 19:19:16 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 1 Aug 2010 19:19:16 -0400
Subject: [Bioperl-l] SOAP Eutilities
In-Reply-To: <AANLkTi=DSQ2vktjCghDscW6OyHv25HYNXqA96LXTz443@mail.gmail.com>
References: <AANLkTi=DSQ2vktjCghDscW6OyHv25HYNXqA96LXTz443@mail.gmail.com>
Message-ID: <627BEC8B2E624A69A0B11EEBC8C93B71@NewLife>

Turns out that module lives in bioperl-run; try 

git clone git://github.com/bioperl/bioperl-run.git

MAJ
----- Original Message ----- 
From: "Robson de Souza" <robfsouza at gmail.com>
To: <bioperl-l at bioperl.org>
Sent: Saturday, July 31, 2010 4:56 PM
Subject: [Bioperl-l] SOAP Eutilities


> Hi,
> 
> Bio::DB::SoapEUtilities, referred to in the HOWTO on EUtilities, seems to
> have disappeared from the Git repository.
> A simple
> 
> git clone git://github.com/bioperl/bioperl-live.git
> 
> does not download it. Any ideas why?
> Robson
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>


From David.Messina at sbc.su.se  Mon Aug  2 09:58:10 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 2 Aug 2010 15:58:10 +0200
Subject: [Bioperl-l] phyloxml and element order
In-Reply-To: <AANLkTimk5j3VfOvLNcN_c+FsgoVqpntB9xR5NfDopLPh@mail.gmail.com>
References: <AANLkTimk5j3VfOvLNcN_c+FsgoVqpntB9xR5NfDopLPh@mail.gmail.com>
Message-ID: <AB413C9E-ED42-48AF-A8AB-893771AD7067@sbc.su.se>

Hi Fred,

Thanks for letting us know about this - definitely sounds like a bug.

Would you please submit this to our bug tracker?

    http://bugzilla.open-bio.org


(You can just copy and paste your previous email.)

Dave


On Jul 30, 2010, at 06:59, Frédéric Romagné wrote:

> Hi,
> 
> I'm using BioPerl to create phyloXML trees. After a few attempts, I got my
> tree with all the elements/attributes I want, but when I write the tree,
> elements are not written in the order specified in the XSD schema.
> 
> For example, I got:
> 
> <clade>
>   <clade>
>      <name>Loxosceles intermedia</name>
>      <taxonomy>
>         <scientific_name>Araneomorphae Sicariidae</scientific_name>
>      </taxonomy>
>      <sequence>
>         <accession source="Arachnoserver">969</accession>
>         <mol_seq>HAAERADSRKPIWDIAHMVNDLELVD</mol_seq>
>      </sequence>
>   </clade>
>   <taxonomy>
>      <scientific_name>Araneomorphae Sicariidae</scientific_name>
>   </taxonomy>
> </clade>
> 
> The program forester complains that <taxonomy> should be written before the
> <clade> element.
> 
> According to
> http://phyloxml.wordpress.com/2009/11/25/order-of-elements-in-phyloxml this
> is what bioperl is supposed to do.
> 
> All my elements/attributes are set before writing the tree using the
> 'add_Annotation', 'add_tag_value' and 'sequence' methods of a
> Bio::Tree::AnnotatableNode object, so I think the error comes from the
> write_tree method.
> 
> Any help would be appreciated.
> 
> Thank you,
> Fred
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
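
The ordering rule discussed above (a node's own <taxonomy> must come
before its nested <clade> children) can be sketched as a small sort
helper. This is only an illustration, not the code in the phyloXML
writer: the function name schema_order is hypothetical, and the order
list covers just the four child elements that appear in Fred's example,
on the assumption that their relative order matches the XSD.

```perl
use strict;
use warnings;

# Partial child order for <clade>, limited to the elements seen in the
# example above. The full phyloXML schema lists more elements; this
# subset and its order are an assumption for illustration.
my @CLADE_ORDER = qw(name taxonomy sequence clade);
my %RANK = map { $CLADE_ORDER[$_] => $_ } 0 .. $#CLADE_ORDER;

# Return the given element names sorted into the order above.
# Names not in the list are placed just before nested <clade> elements,
# and ties keep their input order (the index is used as a tie-breaker).
sub schema_order {
    my @names = @_;
    my @keyed = map {
        my $rank = exists $RANK{ $names[$_] }
                 ? $RANK{ $names[$_] }
                 : $RANK{clade} - 0.5;
        [ $rank, $_, $names[$_] ]
    } 0 .. $#names;
    return map  { $_->[2] }
           sort { $a->[0] <=> $b->[0] or $a->[1] <=> $b->[1] } @keyed;
}
```

For the outer clade in the example, schema_order('clade', 'taxonomy')
would put 'taxonomy' first, which is the order forester expects.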


From shalabh.sharma7 at gmail.com  Mon Aug  2 15:44:35 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Aug 2010 15:44:35 -0400
Subject: [Bioperl-l] clustalw to maf format
Message-ID: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>

Hi,
    I am trying to convert clustalw to maf format.
I am trying to use AlignIO for that but it's not working.

It's giving me the following error:

EXCEPTION Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented by
package Bio::AlignIO::maf.
This is not your fault - author of Bio::AlignIO::maf should be blamed!

STACK Bio::Root::RootI::throw_not_implemented /Library/Perl/5.8.8/Bio/Root/RootI.pm:707
STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/maf.pm:176
STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492
STACK toplevel msf2mafy.pl:11


Is there any other way I can convert clustalw to maf?

I would really appreciate it if anyone could help me out.

Thanks
Shalabh


From Russell.Smithies at agresearch.co.nz  Mon Aug  2 16:25:26 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 3 Aug 2010 08:25:26 +1200
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>

This might work if you only have a few:
http://www.ibi.vu.nl/programs/convertalignwww/

--Russell


=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From shalabh.sharma7 at gmail.com  Mon Aug  2 16:53:31 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Aug 2010 16:53:31 -0400
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
Message-ID: <AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>

Hi Russell,
            Thanks for the reply, but I have around 400 alignments, some
of them huge :(

Thanks
Shalabh




From biopython at maubp.freeserve.co.uk  Mon Aug  2 17:24:09 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 2 Aug 2010 22:24:09 +0100
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
Message-ID: <AANLkTikFJP0aZHWgcRVxfJ9dhg-8Aj+aRWLF2GJDseW3@mail.gmail.com>

On Mon, Aug 2, 2010 at 8:44 PM, shalabh sharma
<shalabh.sharma7 at gmail.com> wrote:
> Hi,
>     I am trying to convert clustalw to maf format.
> I am trying to use AlignIO for that but it's not working.

Could you tell us why you have to use maf format?
I'm curious because all of the phylogenetics tools I've
had to work with personally will take some other format
which is more widely supported (e.g. FASTA, PFAM,
ClustalW, PHYLIP, ...).

Peter


From bernd.web at gmail.com  Mon Aug  2 17:25:52 2010
From: bernd.web at gmail.com (Bernd Web)
Date: Mon, 2 Aug 2010 23:25:52 +0200
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
	<AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>
Message-ID: <AANLkTimQe9fgO3jMeWR_y3E7gNskh26GUVVuEyfgtRJc@mail.gmail.com>

Hi Shalabh,

ConvertAlign does not write maf either; it only reads it (I made
it). I found some other converters on the web, but they do not export
to maf format either:

http://biotechvana.uv.es/servers/afc/main.php
http://www.hiv.lanl.gov/content/sequence/FORMAT_CONVERSION/form.html

Galaxy has a MAF to Fasta converter:
http://main.g2.bx.psu.edu/root?tool_id=MAF_To_Fasta1


Regards,
Bernd




From cjfields at illinois.edu  Mon Aug  2 17:31:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 2 Aug 2010 16:31:20 -0500
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
	<AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>
Message-ID: <6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu>

No other format will work?  The main reason you see unimplemented methods like this is that there is no active interest in working with the format beyond getting the information stored in such files into objects and other commonly-used formats.

chris
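
Since only the write side of Bio::AlignIO::maf is unimplemented, one
workaround is to read the ClustalW files with Bio::AlignIO and print the
MAF lines by hand. The following is a minimal sketch, not part of
BioPerl: the names maf_block and clustalw_to_maf are hypothetical, and
because a ClustalW file carries no source coordinates it assumes every
sequence starts at 0 on the '+' strand, uses the ungapped length as the
source size, and writes a placeholder score of 0. Whether a downstream
tool such as RNAz accepts these placeholder values is untested.

```perl
use strict;
use warnings;

# Build one MAF alignment block from [name, gapped_sequence] pairs.
# Each "s" line is: s src start size strand srcSize text
# Assumptions: start = 0, strand = '+', srcSize = ungapped length,
# and "score=0" is only a placeholder.
sub maf_block {
    my ($rows) = @_;
    my $out = "a score=0\n";
    for my $row (@$rows) {
        my ($name, $text) = @$row;
        $text =~ tr/./-/;                  # MAF uses '-' for gaps
        my $size = ($text =~ tr/-//c);     # count the non-gap characters
        $out .= sprintf "s %s %d %d %s %d %s\n",
                        $name, 0, $size, '+', $size, $text;
    }
    return $out . "\n";                    # a blank line ends the block
}

# Convert every alignment in a ClustalW file to MAF. Bio::AlignIO is
# used for reading only and is loaded when this sub is first called.
sub clustalw_to_maf {
    my ($infile, $outfile) = @_;
    require Bio::AlignIO;
    my $in = Bio::AlignIO->new(-file => $infile, -format => 'clustalw');
    open my $fh, '>', $outfile or die "Cannot write $outfile: $!";
    print {$fh} "##maf version=1\n";
    while (my $aln = $in->next_aln) {
        my @rows = map { [ $_->display_id, $_->seq ] } $aln->each_seq;
        print {$fh} maf_block(\@rows);
    }
    close $fh or die "Cannot close $outfile: $!";
}
```

For around 400 files, clustalw_to_maf could be called in a loop over the
input file names, writing one .maf file per alignment.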



From shalabh.sharma7 at gmail.com  Mon Aug  2 18:30:41 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Aug 2010 18:30:41 -0400
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
	<AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>
	<6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu>
Message-ID: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>

Hi All,
      Thanks for the replies.
I am working on a pipeline involving RNAz.
I had the impression that a converter must be available, because the RNAz
web server accepts xmfa or maf format, but the standalone version only
accepts maf.

I think I will use a program that can output xmfa, and write to the RNAz
developers to ask whether they can provide me with the converter.

Thanks
Shalabh




From chiragmatkarbioinfo at gmail.com  Tue Aug  3 03:47:37 2010
From: chiragmatkarbioinfo at gmail.com (chirag matkar)
Date: Tue, 3 Aug 2010 13:17:37 +0530
Subject: [Bioperl-l] Pubmed Parsing
Message-ID: <AANLkTim+qcBN_9kXVLAkessaHUY9e=gc4Ad5MVGWk-mF@mail.gmail.com>

Hello all,
I have a list of PubMed IDs.
I want to parse the articles to find specific SNP-related information.
Can I do this with a script?


-- 
Regards,
Chirag Matkar


From genehack at genehack.org  Tue Aug  3 05:03:35 2010
From: genehack at genehack.org (John Anderson)
Date: Tue, 3 Aug 2010 05:03:35 -0400
Subject: [Bioperl-l] Pubmed Parsing
In-Reply-To: <AANLkTim+qcBN_9kXVLAkessaHUY9e=gc4Ad5MVGWk-mF@mail.gmail.com>
References: <AANLkTim+qcBN_9kXVLAkessaHUY9e=gc4Ad5MVGWk-mF@mail.gmail.com>
Message-ID: <5E557C44-224B-4460-9C2C-E375555B8BE6@genehack.org>


On Aug 3, 2010, at 3:47 AM, chirag matkar wrote:

> I have a list of Pubmed Ids.
> I want to parse articles to find specific SNP related information.
> Can i work it out using a Script?

Can you provide a more specific example of what you'd like to do? For example, something along the lines of, "for PMID 1234, get ... about SNP 5678" (where '...' is replaced with whatever it is you're trying to get). Even describing how you would obtain this information using the website yourself will be helpful.

thanks,
john.
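
While the exact goal is still open, one common reading of the question
can be sketched: fetch the abstracts for the PubMed IDs and scan the
text for dbSNP-style identifiers. This is a minimal sketch under stated
assumptions: "SNP related information" is taken to mean rs numbers that
appear in abstracts, the names find_rs_ids and fetch_abstracts are
hypothetical, and the e-mail address is supplied by the caller.

```perl
use strict;
use warnings;

# Return the unique dbSNP-style identifiers ("rs" followed by digits)
# found in a block of text, lower-cased, in order of first appearance.
sub find_rs_ids {
    my ($text) = @_;
    my (%seen, @ids);
    while ($text =~ /\b(rs\d+)\b/gi) {
        my $id = lc $1;
        push @ids, $id unless $seen{$id}++;
    }
    return @ids;
}

# Fetch the abstracts for a list of PubMed IDs as plain text.
# Bio::DB::EUtilities is loaded when this sub is first called; it needs
# network access to NCBI.
sub fetch_abstracts {
    my ($email, @pmids) = @_;
    require Bio::DB::EUtilities;
    my $factory = Bio::DB::EUtilities->new(
        -eutil   => 'efetch',
        -db      => 'pubmed',
        -id      => \@pmids,
        -rettype => 'abstract',
        -retmode => 'text',
        -email   => $email,
    );
    return $factory->get_Response->content;
}
```

A script would then call find_rs_ids(fetch_abstracts($email, @pmids)).
Full-text articles are a different problem, since efetch on the pubmed
database returns citations and abstracts only.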


From gowthaman.ramasamy at seattlebiomed.org  Tue Aug  3 01:29:10 2010
From: gowthaman.ramasamy at seattlebiomed.org (Gowthaman Ramasamy)
Date: Mon, 2 Aug 2010 22:29:10 -0700
Subject: [Bioperl-l] Getting pileup consensus from BAM files using
	Bio::DB::Sam
In-Reply-To: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
Message-ID: <C87CF736.E5DB%gowthaman.ramasamy@sbri.org>

Hi List,
I am trying to get the consensus from a pileup via Bio::DB::Sam. With the following script I can extract the reference base and the read bases at each position, but I cannot find a method to derive the consensus, i.e. values similar to those produced by "samtools pileup -c -f xxxxxx.fasta yyyyyyy.bam".

The script I use now retrieves the reference base and the query bases for each position. How do I improve it to get the consensus?

Thanks very much in advance,
Gowthaman


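use strict;
use warnings;
use Bio::DB::Sam;

# Open the BAM file together with its reference FASTA.
my $bam = Bio::DB::Sam->new(-bam   => 'something.bam',
                            -fasta => 'something.fasta');

# Pileup callback: called once per reference position with the list of
# pileup objects (one per read) covering that position.
my $cb = sub {
    my ($seqid, $pos, $pileups) = @_;
    my $refBase = $bam->segment($seqid, $pos, $pos)->dna;
    print "\n$pos\t$refBase=>";
    for my $pileup (@$pileups) {
        my $al    = $pileup->alignment;
        # Base of this read at the current position.
        my $qBase = substr($al->qseq, $pileup->qpos, 1);
        print "$qBase,";
    }
};

$bam->pileup('Lin.chr10i', $cb);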


From scott at scottcain.net  Tue Aug  3 06:32:59 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 3 Aug 2010 06:32:59 -0400
Subject: [Bioperl-l] Getting pileup consensus from BAM files using
	Bio::DB::Sam
In-Reply-To: <C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
Message-ID: <AANLkTi=vkM5rhy2x_s3p1jZKPtnLjq4wWD=ebGxxmaha@mail.gmail.com>

Hi Gowthaman,

I don't see a method to extract the consensus.  You are welcome to
submit a patch :-)

Scott




-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From lincoln.stein at gmail.com  Tue Aug  3 12:57:52 2010
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Tue, 3 Aug 2010 12:57:52 -0400
Subject: [Bioperl-l] Getting pileup consensus from BAM files using
	Bio::DB::Sam
In-Reply-To: <C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
Message-ID: <AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>

Samtools is running MAQ on the pileup. You could either implement MAQ in
perl, or come up with your own consensus caller.

Lincoln



-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>
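
As an illustration of the "own consensus caller" option, a simple
majority-rule caller for one pileup column could look like the sketch
below. It is not the MAQ model and will not reproduce the samtools
values: base qualities and mapping qualities are ignored, and the
function name majority_base and the min_depth parameter are
hypothetical.

```perl
use strict;
use warnings;

# Majority-rule consensus for one pileup column.
# Takes an array ref of read bases and an optional minimum depth.
# Returns the most frequent base (upper-cased), or 'N' when fewer than
# $min_depth usable bases are present or the top two counts are tied.
sub majority_base {
    my ($bases, $min_depth) = @_;
    $min_depth = 1 unless defined $min_depth;

    my %count;
    for my $raw (@$bases) {
        my $base = uc $raw;
        next unless $base =~ /^[ACGT]$/;   # skip N and anything unexpected
        $count{$base}++;
    }

    my $depth = 0;
    $depth += $_ for values %count;
    my @ranked = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;

    return 'N' if !@ranked or $depth < $min_depth;
    return 'N' if @ranked > 1 and $count{ $ranked[0] } == $count{ $ranked[1] };
    return $ranked[0];
}

# Inside the pileup callback from the original script, the read bases
# could be collected and passed in, for example:
#     my @bases = map { substr($_->alignment->qseq, $_->qpos, 1) } @$pileups;
#     print "$pos\t$refBase\t", majority_base(\@bases, 3), "\n";
```

Returning 'N' on ties and low depth is a design choice that keeps the
caller conservative; a quality-aware model would weight each base
instead of counting it once.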


From biopython at maubp.freeserve.co.uk  Tue Aug  3 13:06:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 3 Aug 2010 18:06:46 +0100
Subject: [Bioperl-l] Getting pileup consensus from BAM files using
	Bio::DB::Sam
In-Reply-To: <AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
	<AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
Message-ID: <AANLkTinoszFJNtDeEbh_DyFLp97aayv7bYVu6c=znq1h@mail.gmail.com>

On Tue, Aug 3, 2010 at 5:57 PM, Lincoln Stein <lincoln.stein at gmail.com> wrote:
> Samtools is running MAQ on the pileup. You could either implement MAQ in
> perl, or come up with your own consensus caller.
>
> Lincoln

See also: http://seqanswers.com/forums/showthread.php?t=6241


From gowthaman.ramasamy at seattlebiomed.org  Tue Aug  3 13:28:36 2010
From: gowthaman.ramasamy at seattlebiomed.org (Gowthaman Ramasamy)
Date: Tue, 3 Aug 2010 10:28:36 -0700
Subject: [Bioperl-l] Getting pileup consensus from BAM files using
 Bio::DB::Sam
In-Reply-To: <AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>,
	<AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
Message-ID: <89080953C3D300419AACB6E63A7EEFBA5C47613B34@mail02.sbri.org>

Hi Lincoln,
That's a good lead. I will try to use MAQ in Perl rather than my simple majority rule.

-gowtham
________________________________________
From: Lincoln Stein [lincoln.stein at gmail.com]
Sent: Tuesday, August 03, 2010 9:57 AM
To: Gowthaman Ramasamy
Cc: bioperl-l
Subject: Re: [Bioperl-l] Getting pileup consensus from BAM files using  Bio::DB::Sam

Samtools is running MAQ on the pileup. You could either implement MAQ in perl, or come up with your own consensus caller.

Lincoln

On Tue, Aug 3, 2010 at 1:29 AM, Gowthaman Ramasamy <gowthaman.ramasamy at seattlebiomed.org<mailto:gowthaman.ramasamy at seattlebiomed.org>> wrote:
Hi List,
I am trying to find out the consensus using pileup via Bio::DB::Sam. Using the following script I could parse out the ref_base and the different bases from reads at that position. However, I am not able to find a method to derive the consensus, similar to the values produced by "samtools pileup -c -f xxxxxx.fasta yyyyyyy.bam".

The script I use now retrieves the ref base and the query bases for each position. How do I improve it to get the consensus?

Thanks very much in advance,
Gowthaman


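use strict;
use warnings;
use Bio::DB::Sam;

my $bam = Bio::DB::Sam->new(-bam   => 'something.bam',
                            -fasta => 'something.fasta');

# Called once per reference position with all reads covering it.
my $cb = sub {
    my ($seqid, $pos, $pileups) = @_;
    # Reference base at this position, taken from the FASTA file.
    my $refBase = $bam->segment($seqid, $pos, $pos)->dna;
    print "\n$pos\t$refBase=>";
    for my $pileup (@$pileups) {
        my $al = $pileup->alignment;
        # Base of this read at the current position.
        my $qBase = substr($al->qseq, $pileup->qpos, 1);
        print "$qBase,";
    }
};

$bam->pileup('Lin.chr10i', $cb);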
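
A minimal sketch of the "simple majority rule" idea from this thread, written as plain Perl so it runs without a BAM file. It is not the MAQ model that samtools applies. The name majority_base and its $min_fraction parameter are hypothetical, chosen only for illustration. It takes the read bases collected at one position and returns the most frequent one, or 'N' when no base holds a large enough share.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the most frequent base among the reads covering
# one position. Returns 'N' if there are no usable bases, or if the winner's
# share of the usable bases does not exceed $min_fraction (default 0.5).
sub majority_base {
    my ($bases, $min_fraction) = @_;
    $min_fraction = 0.5 unless defined $min_fraction;

    my %count;
    my $total = 0;
    for my $base (@$bases) {
        my $upper = uc $base;
        next unless $upper =~ /^[ACGT]$/;   # ignore N, gaps, anything else
        $count{$upper}++;
        $total++;
    }
    return 'N' unless $total;

    # Highest count first; alphabetical order breaks ties deterministically.
    my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    return $count{$best} / $total > $min_fraction ? $best : 'N';
}

# 3 of the 4 usable bases are A (the N is ignored).
print majority_base([qw(A A a C N)]), "\n";
```

Inside the pileup callback of the script above, one could push each $qBase onto an array instead of printing it, then call majority_base(\@bases) once per position. Base qualities and indels are ignored here, which is one reason the result will differ from "samtools pileup -c".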

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org<mailto:Bioperl-l at lists.open-bio.org>
http://lists.open-bio.org/mailman/listinfo/bioperl-l


--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca<mailto:Renata.Musa at oicr.on.ca>>


From stefan.kirov at bms.com  Tue Aug  3 16:22:35 2010
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 03 Aug 2010 16:22:35 -0400
Subject: [Bioperl-l] nmica parser
Message-ID: <4C587A8B.8090603@bms.com>

Has anyone written an nmica parser? If not, I will perhaps do that. It
should be straightforward: the output is XML.
Stefan

From fs5 at sanger.ac.uk  Wed Aug  4 04:45:39 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Wed, 04 Aug 2010 09:45:39 +0100
Subject: [Bioperl-l] Pubmed Parsing
In-Reply-To: <AANLkTim+qcBN_9kXVLAkessaHUY9e=gc4Ad5MVGWk-mF@mail.gmail.com>
References: <AANLkTim+qcBN_9kXVLAkessaHUY9e=gc4Ad5MVGWk-mF@mail.gmail.com>
Message-ID: <1280911539.3499.46.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi Chirag,

have a look at this earlier post:
http://bioperl.org/pipermail/bioperl-l/2009-April/029690.html

However, you won't be able to retrieve all full texts and it is quite a
task to parse natural language and get useful information about a gene,
protein, SNP etc out of a manuscript. 

Frank

On Tue, 2010-08-03 at 13:17 +0530, chirag matkar wrote:
> Hello all,
> I have a list of Pubmed Ids.
> I want to parse articles to find specific SNP related information.
> Can i work it out using a Script?
> 
> 
> 
> 
> 


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From David.Messina at sbc.su.se  Thu Aug  5 08:16:17 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 5 Aug 2010 14:16:17 +0200
Subject: [Bioperl-l] call for a TreeIO volunteer
Message-ID: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>

Hi everybody,

We've got a couple of small open bugs related to the Bio::TreeIO modules, and we could really use someone to take a look at them. Ideally, that someone would have familiarity with TreeIO already.*

It'd help us to get the next release (1.6.2) out the door.

The bugs in question are:
- TreeIO::newick writes root node branch length incorrectly
http://bugzilla.open-bio.org/show_bug.cgi?id=3039

- Bio::TreeIO::nhx cannot parse empty [&&NHX] + round-trip failure
http://bugzilla.open-bio.org/show_bug.cgi?id=3007


Thanks,
Dave
on behalf of the core developers


* Even if you don't, though, if you've been looking for an opportunity to contribute to BioPerl, and this sounds like something you'd like to work on, by all means raise your hand.


From clements at nescent.org  Thu Aug  5 13:15:41 2010
From: clements at nescent.org (Dave Clements)
Date: Thu, 5 Aug 2010 10:15:41 -0700
Subject: [Bioperl-l] GMOD Europe 2010, 13-16 Sept, Cambridge, UK
In-Reply-To: <AANLkTinpd0pP9cBGUfnEd8PuV-VOcfqz6VKdCRp0d=uA@mail.gmail.com>
References: <AANLkTinpd0pP9cBGUfnEd8PuV-VOcfqz6VKdCRp0d=uA@mail.gmail.com>
Message-ID: <AANLkTi=BCjD3w0w4S+44qRb4ShW-P6DVBH0SZ+41k1Ah@mail.gmail.com>

GMOD Europe 2010
================
13-16 September 2010
Cambridge, UK
http://gmod.org/wiki/GMOD_Europe_2010


We are pleased to announce GMOD Europe 2010, four days of GMOD events being
held 13-16 September 2010, at the University of Cambridge. GMOD Europe 2010
includes:

1) GMOD Community Meeting, Monday & Tuesday:  Project updates, developer and
user presentations and best practices, project direction.

2) GMOD Satellite Meetings, Wednesday:  Special interest groups where GMOD
community members meet to discuss specific topics of interest.

3) InterMine Workshop, Wednesday:  A one day workshop on installing,
configuring and using the InterMine biological data warehouse system.

4) BioMart Workshop, Thursday:  A one day workshop on using the BioMart
biological data warehouse system, including accessing data through APIs.

Registration is now open for these events. There is a £50 registration fee
for the GMOD Meeting to cover catered lunches and other expenses.
Registration for all other events is free, but required, as space is
limited.  These events are open to all: GMOD users, developers, prospective
users, biologists, and computer scientists.  See
http://gmod.org/wiki/January_2010_GMOD_Meeting for an idea of what goes on
at GMOD meetings.

GMOD is a collection of interoperable open source software components for
managing, visualizing and annotating biological data.  GMOD incorporates
many widely used tools, including GBrowse and JBrowse for genome browsing,
InterMine and BioMart for data mining, Galaxy and Ergatis for workflow,
Chado for data management, GBrowse_syn and CMap for comparative genomics,
plus many other tools (Apollo, MAKER, Pathway Tools, Textpresso, ...).  GMOD
is also an active community of researchers and developers addressing common
challenges in exploiting their data.  If you are struggling to fully exploit
your data then please consider attending GMOD Europe 2010.

Please let us know if you have any questions, and we hope to see you in
Cambridge.

Thanks,

Scott Cain and Dave Clements
-- 
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/GMOD_Evo_Hackathon
http://gmod.org/wiki/GMOD_Europe_2010
http://gmod.org/wiki/Help_Desk_Feedback


From abhishek.vit at gmail.com  Thu Aug  5 18:15:56 2010
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Thu, 5 Aug 2010 18:15:56 -0400
Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl
Message-ID: <AANLkTi=rrPKSuuddK-+gTqPyo-wKQA0ZamDP59_+dUfi@mail.gmail.com>

Hi All

Just wondering if there is any Picard wrapper/s available in Bioperl.


Thanks!
-Abhi

-----------------------------
Abhishek Pratap
Bioinformatics Software Engineer II
Genomics Resource Center
Institute for Genome Sciences
School of Medicine, Univ of Maryland
801, W. Baltimore Street, Baltimore, MD 21209
Ph: (+1)-410-706-2296
www.igs.umaryland.edu/


From Russell.Smithies at agresearch.co.nz  Thu Aug  5 18:37:46 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 6 Aug 2010 10:37:46 +1200
Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl
In-Reply-To: <AANLkTi=rrPKSuuddK-+gTqPyo-wKQA0ZamDP59_+dUfi@mail.gmail.com>
References: <AANLkTi=rrPKSuuddK-+gTqPyo-wKQA0ZamDP59_+dUfi@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F02262E96@exchsth.agresearch.co.nz>

Might be part of the "Enterprise" package.
If not, some developer should "make it so".

:-)

--Russell
(I hate Fridays)

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Abhishek Pratap
> Sent: Friday, 6 August 2010 10:16 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl
> 
> Hi All
> 
> Just wondering if there is any Picard wrapper/s available in Bioperl.
> 
> 
> Thanks!
> -Abhi
> 
> -----------------------------
> Abhishek Pratap
> Bioinformatics Software Engineer II
> Genomics Resource Center
> Institute for Genome Sciences
> School of Medicine, Univ of Maryland
> 801, W. Baltimore Street, Baltimore, MD 21209
> Ph: (+1)-410-706-2296
> www.igs.umaryland.edu/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at illinois.edu  Thu Aug  5 19:10:16 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 5 Aug 2010 18:10:16 -0500
Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl
In-Reply-To: <AANLkTi=rrPKSuuddK-+gTqPyo-wKQA0ZamDP59_+dUfi@mail.gmail.com>
References: <AANLkTi=rrPKSuuddK-+gTqPyo-wKQA0ZamDP59_+dUfi@mail.gmail.com>
Message-ID: <26E3E5B6-47CF-4744-9687-199C218B5571@illinois.edu>

Picard uses samtools, which has a perl API:

http://search.cpan.org/dist/Bio-SamTools/

which uses BioPerl.  Ah, the circle of life...

chris

On Aug 5, 2010, at 5:15 PM, Abhishek Pratap wrote:

> Hi All
> 
> Just wondering if there is any Picard wrapper/s available in Bioperl.
> 
> 
> Thanks!
> -Abhi
> 
> -----------------------------
> Abhishek Pratap
> Bioinformatics Software Engineer II
> Genomics Resource Center
> Institute for Genome Sciences
> School of Medicine, Univ of Maryland
> 801, W. Baltimore Street, Baltimore, MD 21209
> Ph: (+1)-410-706-2296
> www.igs.umaryland.edu/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From dan.kortschak at adelaide.edu.au  Thu Aug  5 21:06:45 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Fri, 06 Aug 2010 10:36:45 +0930
Subject: [Bioperl-l] MUMmer parser work
Message-ID: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>

Hello Everyone,

I've just noticed the absence of a MUMmer parser and thought that it
might be a worthwhile contribution to bioperl-run (I won't be able to
start on this for a while, but given Mark's excellent work on
CommandExts, it shouldn't take too long to get up when I do have time). Has
anyone made any effort in this direction that I would be stepping on, or
if they have left it, that I could pick up to shorten the work time?

cheers
Dan


From cjfields at illinois.edu  Thu Aug  5 23:13:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 5 Aug 2010 22:13:51 -0500
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>

Dan,

Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in:

http://bugzilla.open-bio.org/show_bug.cgi?id=2701

It currently lacks significant tests, so feel free to chip in there as needed.

chris

On Aug 5, 2010, at 8:06 PM, Dan Kortschak wrote:

> Hello Everyone,
> 
> I've just noticed the absence of a MUMmer parser and thought that it
> might be a worthwhile contribution to bioperl-run (I won't be able to
> start on this for a while, but given Mark's excellent work on
> CommandExts, it shouldn't take too long to get up when I do have time). Has
> anyone made any effort in this direction that I would be stepping on, or
> if they have left it, that I could pick up to shorten the work time?
> 
> cheers
> Dan
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From greg at ebi.ac.uk  Fri Aug  6 05:47:21 2010
From: greg at ebi.ac.uk (Gregory Jordan)
Date: Fri, 6 Aug 2010 10:47:21 +0100
Subject: [Bioperl-l] call for a TreeIO volunteer
In-Reply-To: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>
References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>
Message-ID: <AANLkTiknuVWFiz6kmOYAsHaLnPxMZEBWsHeBtv0yfuCQ@mail.gmail.com>

I can help out with these. I'm pretty sure I've previously fought with (and
perhaps even come up with a fix for) bug 3039, and I can take a look at 3007
too.

Now lemme just see if I can get up and running with the Bioperl test suite.
I'll give a shout if I run into any problems.

Cheers,
 Greg

On 5 August 2010 13:16, Dave Messina <David.Messina at sbc.su.se> wrote:

> Hi everybody,
>
> We've got a couple of small open bugs related to the Bio::TreeIO modules,
> and we could really use someone to take a look at them. Ideally, that
> someone would have familiarity with TreeIO already.*
>
> It'd help us to get the next release (1.6.2) out the door.
>
> The bugs in question are:
> - TreeIO::newick writes root node branch length incorrectly
> http://bugzilla.open-bio.org/show_bug.cgi?id=3039
>
> - Bio::TreeIO::nhx cannot parse empty [&&NHX] + round-trip failure
> http://bugzilla.open-bio.org/show_bug.cgi?id=3007
>
>
> Thanks,
> Dave
> on behalf of the core developers
>
>
> * Even if you don't, though, if you've been looking for an opportunity to
> contribute to BioPerl, and this sounds like something you'd like to work on,
> by all means raise your hand.
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From jun.yin at ucd.ie  Fri Aug  6 06:52:14 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 06 Aug 2010 11:52:14 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
Message-ID: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>

Hi, all,

 
I am the Google Summer of Code student working on refactoring the Bio::Align
subsystem. I recently implemented several packages for retrieving online
alignment sequences. The aim of the packages is to provide convenient
methods for BioPerl users to retrieve online alignment sequences. The
alignment sequences are converted into a Bio::SimpleAlign object after
retrieval, which is easy to manipulate and write to local disk. The packages
currently support the Pfam, Rfam, Prosite and Entrez Protein Clusters databases.

 
Here is the structure of the packages:

Packages

Bio::DB::Align (interface, and calling other packages)

Bio::DB::Align::Pfam (retrieving alignment from Pfam)

Bio::DB::Align::Rfam (retrieving alignment from Rfam)

Bio::DB::Align::Prosite (retrieving alignment from Prosite)

Bio::DB::Align::ProtClustDB (retrieving alignment from Entrez Protein
Clusters Database)

 
Usually four methods are provided for each package:

Methods

get_Aln_by_id (retrieving alignment by id and returns Bio::SimpleAlign
object)

get_Aln_by_acc (retrieving alignment by accession and returns
Bio::SimpleAlign object) (Rfam and Prosite support only this method)

id2acc (id to accession conversion)

acc2id (accession to id conversion)

 
These packages depend on LWP::UserAgent, HTTP::Request and
Bio::DB::GenericWebAgent. Bio::DB::Align::ProtClustDB depends on
Bio::DB::EUtilities.

 
Calling the packages can be:

 
my $dbobj=Bio::DB::Align->new(-db=>"rfam");

Or, my $dbobj= Bio::DB::Align::Pfam->new();


my $aln=$dbobj->get_Aln_by_acc("RF0001");
my $aln2=$dbobj->get_Aln_by_acc(-accession=>"RF0001",-alignment=>"full");

print $aln->length();

foreach my $seq ($aln->each_Seq) {
#do something
}

 
I have done some tests on these packages, and I will turn them into
standard tests later. Any suggestions on these packages are welcome.

 
Cheers,

Jun Yin

Ph.D. student in U.C.D.

 
Bioinformatics Laboratory

Conway Institute

University College Dublin

 
From David.Messina at sbc.su.se  Fri Aug  6 08:59:19 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 6 Aug 2010 14:59:19 +0200
Subject: [Bioperl-l] call for a TreeIO volunteer
In-Reply-To: <AANLkTiknuVWFiz6kmOYAsHaLnPxMZEBWsHeBtv0yfuCQ@mail.gmail.com>
References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>
	<AANLkTiknuVWFiz6kmOYAsHaLnPxMZEBWsHeBtv0yfuCQ@mail.gmail.com>
Message-ID: <6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>


> I can help out with these. I'm pretty sure I've previously fought with (and perhaps even come up with a fix for) bug 3039, and I can take a look at 3007 too.

Awesome, thanks Greg!


> Now lemme just see if I can get up and running with the Bioperl test suite. I'll give a shout if I run into any problems.

Please do.


Dave


From David.Messina at sbc.su.se  Fri Aug  6 09:06:47 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 6 Aug 2010 15:06:47 +0200
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
Message-ID: <F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se>

Sounds great, Jun!

Did you happen to test your code on very large alignments? I know there's one in Pfam that's something like 100,000 sequences. An rRNA, I believe.


Dave


From jun.yin at ucd.ie  Fri Aug  6 09:11:41 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 06 Aug 2010 14:11:41 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
	<F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se>
Message-ID: <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>

Hi, Dave,

Thanks for reminding me of this. I will definitely try it.

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin


-----Original Message-----
From: Dave Messina [mailto:David.Messina at sbc.su.se] 
Sent: Friday, August 06, 2010 2:07 PM
To: Jun Yin
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences

Sounds great, Jun!

Did you happen to test your code on very large alignments? I know there's
one in Pfam that's something like 100,000 sequences. An rRNA, I believe.


Dave


__________ Information from ESET Smart Security, version of virus signature
database 5346 (20100806) __________

The message was checked by ESET Smart Security.

http://www.eset.com



From cjfields at illinois.edu  Fri Aug  6 09:19:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 6 Aug 2010 08:19:54 -0500
Subject: [Bioperl-l] call for a TreeIO volunteer
In-Reply-To: <6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>
References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>
	<AANLkTiknuVWFiz6kmOYAsHaLnPxMZEBWsHeBtv0yfuCQ@mail.gmail.com>
	<6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>
Message-ID: <8CB3DE9A-4C5C-42A3-94B4-8818D7143951@illinois.edu>

On Aug 6, 2010, at 7:59 AM, Dave Messina wrote:

> 
>> I can help out with these. I'm pretty sure I've previously fought with (and perhaps even come up with a fix for) bug 3039, and I can take a look at 3007 too.
> 
> Awesome, thanks Greg!
> 
> 
>> Now lemme just see if I can get up and running with the Bioperl test suite. I'll give a shout if I run into any problems.
> 
> Please do.
> 
> 
> 
> Dave

Agreed, and thanks for helping out!

chris


From dianabowley at gmail.com  Fri Aug  6 18:33:57 2010
From: dianabowley at gmail.com (DRBowley)
Date: Fri, 6 Aug 2010 15:33:57 -0700 (PDT)
Subject: [Bioperl-l] BioPerl install issues
Message-ID: <b70994fe-d6c3-4c58-8b45-dfe50b9a8fe5@t5g2000prd.googlegroups.com>

I'm new to both perl and bioperl and I'm having issues installing
bioperl.  I'm trying to install on a Mac OS 10.6.4, and I've already
installed perl (5.10.0).  I tried installing using the recommended
approach for Mac - via Fink...
"fink install bioperl-pm5100"

Looking back over the terminal window text it looks like the problem
is:
"This package requires Module::Build v0.2805 or greater to install
itself."

I tried doing "fink selfupdate" and that did not fix the problem.

Any suggestions?

Thanks!
Diana


From Kevin.M.Brown at asu.edu  Fri Aug  6 18:50:45 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 6 Aug 2010 15:50:45 -0700
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To: <b70994fe-d6c3-4c58-8b45-dfe50b9a8fe5@t5g2000prd.googlegroups.com>
References: <b70994fe-d6c3-4c58-8b45-dfe50b9a8fe5@t5g2000prd.googlegroups.com>
Message-ID: <1A4207F8295607498283FE9E93B775B406E44A05@EX02.asurite.ad.asu.edu>

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_THE_EASY_WAY_USING_Build.PL

Not sure why you had to install perl since it should have been part of
the stock OSX install (or at least it was last time I logged onto a
mac). Not sure why the Fink method has so many issues, but you might try the
above, which works for Linux or BSD.
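
Before retrying the install, it may help to confirm which Module::Build version the perl on your PATH actually sees, since the error message asks for v0.2805 or greater. The following is a small sketch using only core Perl. The helper name module_at_least is made up for this example.

```perl
use strict;
use warnings;

# Return 1 if $module can be loaded and is at least version $min, else 0.
# The VERSION method dies when the installed version is too old, so the
# eval turns both "not installed" and "too old" into a false result.
sub module_at_least {
    my ($module, $min) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Module::Build -> Module/Build.pm
    my $ok = eval { require $file; $module->VERSION($min); 1 };
    return $ok ? 1 : 0;
}

print "Module::Build >= 0.2805: ",
      (module_at_least('Module::Build', '0.2805') ? "yes" : "no"), "\n";
```

If this prints "no", upgrading Module::Build through CPAN for that same perl is a reasonable first step. If it prints "yes", the installer is probably running under a different perl than the one tested here.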

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of DRBowley
Sent: Friday, August 06, 2010 3:34 PM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] BioPerl install issues

I'm new to both perl and bioperl and I'm having issues installing
bioperl.  I'm trying to install on a Mac OS 10.6.4, and I've already
installed perl (5.10.0).  I tried installing using the recommended
approach for Mac - via Fink...
"fink install bioperl-pm5100"

Looking back over the terminal window text it looks like the problem
is:
"This package requires Module::Build v0.2805 or greater to install
itself."

I tried doing "fink selfupdate" and that did not fix the problem.

Any suggestions?

Thanks!
Diana
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From skastu01 at students.poly.edu  Fri Aug  6 20:03:50 2010
From: skastu01 at students.poly.edu (Lakshmi Kastury)
Date: Sat, 7 Aug 2010 00:03:50 +0000
Subject: [Bioperl-l] BioPerl install issues
Message-ID: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>


Hi -
I went through several failed attempts on MACOS Snow Leopard, and Fink was a dead end. Eventually I succeeded in installing on Windows Vista using CPAN. I am not sure if this method will work with MACOS:

1. Opened command prompt.
2. Typed command: >perl -MCPAN -e "install Bundle::BioPerl"
3. Answered yes to the series of questions, which prompts install of several bundles and a compiler.

The instructions were in a link from:
http://bioperl.org/Core/Latest/INSTALL

All the best,
Lakshmi

> Date: Fri, 6 Aug 2010 15:33:57 -0700
> From: dianabowley at gmail.com
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] BioPerl install issues
> 
> I'm new to both perl and bioperl and I'm having issues installing
> bioperl.  I'm trying to install on a Mac OS 10.6.4, and I've already
> installed perl (5.10.0).  I tried installing using the recommended
> approach for Mac - via Fink...
> "fink install bioperl-pm5100"
> 
> Looking back over the terminal window text it looks like the problem
> is:
> "This package requires Module::Build v0.2805 or greater to install
> itself."
> 
> I tried doing "fink selfupdate" and that did not fix the problem.
> 
> Any suggestions?
> 
> Thanks!
> Diana
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
 		 	   		  

From David.Messina at sbc.su.se  Sat Aug  7 02:47:40 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 7 Aug 2010 08:47:40 +0200
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>
References: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>
Message-ID: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se>


On Aug 7, 2010, at 02:03 , Lakshmi Kastury wrote:

>  I am not sure if this method will work with MACOS:

It will. CPAN is cross-platform and is the best way to install BioPerl.


Dave


From cjfields at illinois.edu  Sat Aug  7 09:58:56 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 7 Aug 2010 08:58:56 -0500
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se>
References: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>
	<5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se>
Message-ID: <A21BBC5D-1D71-4534-B278-9FCFA0BB6DA8@illinois.edu>

It should work fine.  Even installing from trunk right now works w/o failing tests. 

chris

On Aug 7, 2010, at 1:47 AM, Dave Messina wrote:

> 
> On Aug 7, 2010, at 02:03 , Lakshmi Kastury wrote:
> 
>> I am not sure if this method will work with MACOS:
> 
> It will. CPAN is cross-platform and is the best way to install BioPerl.
> 
> 
> Dave
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From greg at ebi.ac.uk  Sat Aug  7 17:14:58 2010
From: greg at ebi.ac.uk (Gregory Jordan)
Date: Sat, 7 Aug 2010 22:14:58 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
	<F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se> 
	<00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>
Message-ID: <AANLkTimL938B1ovmOKC_FBNw1OwjipVpjOXZfN+P5Kf5@mail.gmail.com>

Maybe I'm just a bit naive here, but what is the expected difference between
accession and ID and why do we need a separate method for each? Seems to me
that one could just have a single method, get_Aln, which determines under
the hood whether the query string is an accession or ID.

It would be nice if the SimpleAlign object had its Annotation filled with
some extra metadata (such as accession, ID, database version number, URI,
etc.).

One other thing: have you thought about adding an Ensembl adaptor? Or maybe
something similar already exists in BioPerl...?

Sure Ensembl provides their own Perl API, but for someone who doesn't want
to go through the hassle of installing it from CVS (pardon my french, but
wtf!?! Who still uses CVS) and learning a whole new API, it might be
convenient to have a simple BioPerl module for quickly grabbing gene family
alignments from the public Ensembl MySQL databases. I'd be willing to help
write the necessary SQL queries for this.

greg

On 6 August 2010 14:11, Jun Yin <jun.yin at ucd.ie> wrote:

> Hi, Dave,
>
> Thx for reminding me this. I will definitely try it.
>
> Cheers,
> Jun Yin
> Ph.D. student in U.C.D.
>
> Bioinformatics Laboratory
> Conway Institute
> University College Dublin
>
>
> -----Original Message-----
> From: Dave Messina [mailto:David.Messina at sbc.su.se]
> Sent: Friday, August 06, 2010 2:07 PM
> To: Jun Yin
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences
>
> Sounds great, Jun!
>
> Did you happen to test your code on very large alignments? I know there's
> one in Pfam that's something like 100,000 sequences. An rRNA, I believe.
>
>
> Dave
>
>
> __________ Information from ESET Smart Security, version of virus signature
> database 5346 (20100806) __________
>
> The message was checked by ESET Smart Security.
>
> http://www.eset.com
>
>
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at illinois.edu  Sat Aug  7 18:07:39 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 7 Aug 2010 17:07:39 -0500
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <AANLkTimL938B1ovmOKC_FBNw1OwjipVpjOXZfN+P5Kf5@mail.gmail.com>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
	<F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se>
	<00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>
	<AANLkTimL938B1ovmOKC_FBNw1OwjipVpjOXZfN+P5Kf5@mail.gmail.com>
Message-ID: <21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu>

On Aug 7, 2010, at 4:14 PM, Gregory Jordan wrote:

> Maybe I'm just a bit naive here, but what is the expected difference between
> accession and ID and why do we need a separate method for each?

Depends on the remote service, but in many cases there is a difference.  With NCBI eutils you can have either an accession or the unique identifier (UID, or GI for nuc/protein seqs).  efetch can use both, but only the UID is guaranteed to retrieve a single sequence all the time; the accession can (very rarely) map to more than one sequence.  

The other eutils services require either a string (esearch) or a UID, but do not allow an accession.

> Seems to me
> that one could just have a single method, get_Aln, which determines under
> the hood whether the query string is an accession or ID.

A simpler method could be introduced, but I can see that being potentially brittle in the long run.  A naked alphanumeric string doesn't reveal much about what it is at face value w/o knowing database/service-specific behavior.  And then we're reliant on that behavior not changing, which we can't guarantee (this has bitten us in the past).  What would one do if NCBI (for instance) allowed accessions derived completely of digits, or conversely a unique ID with mixed alphanumerics?

Using methods specific for ID/acc at least guarantees a behavior on the backend w/o guessing, and if there is no danger of overlap (a service accepts either/or) one could simply be an alias of the other.
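
To make the trade-off concrete, here is a rough sketch of the guessing step that a single get_Aln front end would need. Everything in it is an assumption for illustration: the guess_query_type name is made up, and the patterns only encode how Pfam, Rfam and Prosite accessions look today (for example PF00001, RF00001, PS00001). None of this is part of the proposed packages.

```perl
use strict;
use warnings;

# Hypothetical table of accession formats. If a service changes its format,
# this table silently goes stale, which is the brittleness discussed above.
my %ACC_PATTERN = (
    pfam    => qr/^PF\d{5}(?:\.\d+)?$/,
    rfam    => qr/^RF\d{5}$/,
    prosite => qr/^PS\d{5}$/,
);

# Guess whether $query is an accession or an ID for database $db.
# Returns 'acc', 'id', or nothing for a database without a known pattern.
sub guess_query_type {
    my ($db, $query) = @_;
    my $pattern = $ACC_PATTERN{lc $db} or return;
    return $query =~ $pattern ? 'acc' : 'id';
}

for my $case (['pfam', 'PF00001'], ['pfam', '7tm_1'], ['rfam', 'RF0001']) {
    my ($db, $query) = @$case;
    printf "%-5s %-8s => %s\n", $db, $query, guess_query_type($db, $query);
}
```

Note the last case: a four-digit string such as RF0001 does not match the five-digit pattern, so it is quietly treated as an ID rather than reported as a malformed accession. With separate get_Aln_by_acc and get_Aln_by_id methods the caller states the intent, and the backend does not have to guess.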

> It would be nice if the SimpleAlign object had its Annotation filled with
> some extra metadata (such as accession, ID, database version number, URI,
> etc.).

According to the deobfuscator SimpleAlign does have accession() and id().  The others could be simple attributes, and can be added as simple getter/setters, or as annotation via Bio::Annotation (this is the way Stockholm annotation is currently handled).  Something to think about.

> One other thing: have you thought about adding an Ensembl adaptor? Or maybe
> something similar already exists in BioPerl...?

That's a good idea, though it might make more sense if this was done when mem-efficient (possibly DB-dependent) AlignI modules are present within bioperl, which is part of the GSoC (see below).  For instance, have a Bio::Align::AlignI with a backend ensembl DB adaptor that works lazily.

If using the Ensembl Perl API, a few possible roadblocks/problems might pop up. Ensembl currently requires bioperl (v1.2.3, but it works with the latest as well, at least when I've used it).  If using the ensembl perl API we would just need to ensure we aren't conflicting with ensembl code that pulls in bioperl classes expecting a v1.2.3 API when we only support the latest.  I don't foresee this being an issue, though (there is precedent for this, see Sendu's Ensembl module Bio::Tools::Run::Ensembl in bioperl-run).

> Sure Ensembl provides their own Perl API, but for someone who doesn't want
> to go through the hassle of installing it from CVS (pardon my french, but
> wtf!?! Who still uses CVS) and learning a whole new API, it might be
> convenient to have a simple BioPerl module for quickly grabbing gene family
> alignments from the public Ensembl MySQL databases. I'd be willing to help
> write the necessary SQL queries for this.
> 
> greg

The GSoC project on alignment subsystem refactoring will be finishing up this month, so I'm sure Jun can discuss ideas for initial DB-dependent implementations.  The more input and coders implementing the better, IMO.

As for writing up an adaptor to Ensembl outside of its API, overall I don't think it's a bad idea, but if it's possible maybe start without reinventing things, then move to direct SQL.  Unless it's easier to use SQL.

chris

> On 6 August 2010 14:11, Jun Yin <jun.yin at ucd.ie> wrote:
> 
>> Hi, Dave,
>> 
>> Thx for reminding me this. I will definitely try it.
>> 
>> Cheers,
>> Jun Yin
>> Ph.D. student in U.C.D.
>> 
>> Bioinformatics Laboratory
>> Conway Institute
>> University College Dublin
>> 
>> 
>> -----Original Message-----
>> From: Dave Messina [mailto:David.Messina at sbc.su.se]
>> Sent: Friday, August 06, 2010 2:07 PM
>> To: Jun Yin
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences
>> 
>> Sounds great, Jun!
>> 
>> Did you happen to test your code on very large alignments? I know there's
>> one in Pfam that's something like 100,000 sequences. An rRNA, I believe.
>> 
>> 
>> Dave
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 


From hartzell at alerce.com  Sat Aug  7 17:45:04 2010
From: hartzell at alerce.com (George Hartzell)
Date: Sat, 7 Aug 2010 14:45:04 -0700
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To: <A21BBC5D-1D71-4534-B278-9FCFA0BB6DA8@illinois.edu>
References: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>
	<5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se>
	<A21BBC5D-1D71-4534-B278-9FCFA0BB6DA8@illinois.edu>
Message-ID: <19549.54240.499140.501136@gargle.gargle.HOWL>

Chris Fields writes:
 > It should work fine.  Even installing from trunk right now works
 > w/o failing tests.  

As a slight aside, if you're looking to build a current perl binary
for your mac (e.g. 5.12.1) you should take a look at perlbrew
(http://search.cpan.org/dist/App-perlbrew/).  The three steps at the
top of the installation section of the README are all you need to get
going.  Even a manager can do it.

If you're using bash on the mac via terminal you'll probably want to
put the one-liner they prescribe into your .bash_profile instead of
your .bashrc, but everything else just flows right along.

Once you have that in place you have a nicely isolated system into
which you can install things to your heart's content without worrying
about PERL5LIB and local::lib and the rest.

g.


From cjfields at illinois.edu  Sat Aug  7 21:19:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 7 Aug 2010 20:19:54 -0500
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To: <19549.54240.499140.501136@gargle.gargle.HOWL>
References: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>
	<5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se>
	<A21BBC5D-1D71-4534-B278-9FCFA0BB6DA8@illinois.edu>
	<19549.54240.499140.501136@gargle.gargle.HOWL>
Message-ID: <EA5D5C26-7F3E-46B5-9CD0-F3D51B5F9511@illinois.edu>

On Aug 7, 2010, at 4:45 PM, George Hartzell wrote:

> Chris Fields writes:
>> It should work fine.  Even installing from trunk right now works
>> w/o failing tests.  
> 
> As a slight aside, if you're looking to build a current perl binary
> for your mac (e.g. 5.12.1) you should take a look at perlbrew
> (http://search.cpan.org/dist/App-perlbrew/).  The three steps at the
> top of the installation section of the README are all you need to get
> going.  Even a manager can do it.
> 
> If you're using bash on the mac via terminal you'll probably want to
> put the one-liner they prescribe into your .bash_profile instead of
> your .bashrc, but everything else just flows right along.
> 
> Once you have that in place you have a nicely isolated system into
> which you can install things to your hearts content without worrying
> about PERL5LIB and local::lib and the rest.
> 
> g.

I have to second using perlbrew; I started using it for my local Ubuntu installation (I don't have it running on my MacBook yet, but it's in the plans).

chris


From greg at ebi.ac.uk  Sun Aug  8 02:12:41 2010
From: greg at ebi.ac.uk (Gregory Jordan)
Date: Sun, 8 Aug 2010 07:12:41 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
	<F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se> 
	<00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>
	<AANLkTimL938B1ovmOKC_FBNw1OwjipVpjOXZfN+P5Kf5@mail.gmail.com> 
	<21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu>
Message-ID: <AANLkTim9jkmKSGHm5bHPLOF3_xf+p9xMTN5Ha7bOMR7P@mail.gmail.com>

On 7 August 2010 23:07, Chris Fields <cjfields at illinois.edu> wrote:

>
> A simpler method could be introduced, but I can see that being potentially
> brittle in the long run.  A naked alphanumeric string doesn't reveal much
> about what it is at face value w/o knowing database/service-specific
> behavior.  And then we're reliant on that behavior not changing, which we
> can't guarantee (this has bitten us in the past).  What would one do if NCBI
> (for instance) allowed accessions derived completely of digits, or
> conversely a unique ID with mixed alphanumerics?
>
> Using methods specific for ID/acc at least guarantees a behavior on the
> backend w/o guessing, and if there is no danger of overlap (a service
> accepts either/or) one could simply be an alias of the other.
>

Thanks for the clarification on IDs vs accessions. As long as the behavior
and distinction are well-documented, I'm sure it won't make too much of a
difference.

My main concern was just that having two similar methods -- with no clearly
laid out distinction between the two and one of them only supported by half
of the implementing subclasses -- might confuse potential users.

As a point of reference: both Rfam and Pfam allow either an ID or an
accession in their front-page search interface (http://www.pfam.org /
http://www.rfam.org/). In fact, they seem to entirely hide the distinction
between ID and Accession from the end user; nowhere on the Rfam page for an
individual result is it clear which string is the accession and which is the
ID (http://rfam.sanger.ac.uk/family/snoZ107_R87).

Thus, a potential user of the Rfam module wouldn't know whether to call the
get_by_ID or get_by_Accession method, even after looking at the Rfam page
for his / her desired alignment!

As you can probably tell, I'm all in favor of a unified search whenever
feasible / possible. :-)
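
For what it's worth, a unified lookup mostly comes down to guessing the identifier type from its shape.  The sketch below assumes that Pfam and Rfam accessions look like "PF" or "RF" followed by five digits (optionally with a ".version" suffix) and treats anything else as an ID; classify_identifier is a hypothetical helper, not an existing method.

```perl
use strict;
use warnings;

# Assumed accession shape: "PF" or "RF" plus five digits, optionally
# followed by ".version". Anything else is treated as an ID.
sub classify_identifier {
    my ($string) = @_;
    return 'accession' if $string =~ /^(?:PF|RF)\d{5}(?:\.\d+)?$/;
    return 'id';
}

for my $query (qw(PF00001 RF00001.2 snoZ107_R87 7tm_1)) {
    printf "%-12s => %s\n", $query, classify_identifier($query);
}
```

This is exactly the kind of heuristic that breaks if a database changes its naming scheme, so a unified method would probably want the pattern to be overridable per service.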


> As for writing up an adaptor to ensembl outside of it's API, overall I
> don't think it's a bad idea, but if it's possible maybe start without
> reinventing things, then move to direct SQL.  Unless it's easier to use SQL.
>
>
For fetching Ensembl's gene family alignments, using the SQL will be
easiest. They don't tend to get unreasonably large in terms of memory  -- I
think the biggest tend to be ~700 sequences with a few thousand alignment
columns or so -- and it's a simple table join or two to get both the tree
and alignment from the database.

For genomic alignments, I agree that a more memory-efficient and/or lazy
backend would be necessary. And it's pretty much impossible to get those
things out of the Ensembl tables without using their API.

--greg


From dan.kortschak at adelaide.edu.au  Sun Aug  8 20:53:43 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Mon, 09 Aug 2010 10:23:43 +0930
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
	<80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>
Message-ID: <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au>

Hi Chris,

Is that set of files planned to be included in the git repository on
bioperl-live? I don't want to push something that is being organised by
someone else.

cheers
Dan

On Thu, 2010-08-05 at 22:13 -0500, Chris Fields wrote:
> Dan,
> 
> Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in:
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2701
> 
> It currently lacks significant tests, so feel free to chip in there as needed.
> 
> chris


From genehack at genehack.org  Sun Aug  8 21:42:27 2010
From: genehack at genehack.org (John SJ Anderson)
Date: Sun, 8 Aug 2010 21:42:27 -0400
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
	<80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>
	<1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org>

I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. 

j.

On Aug 8, 2010, at 20:53 , Dan Kortschak wrote:

> Hi Chris,
> 
> Is that set of files planned to be included in the git repository on
> bioperl-live? I don't want to push something that is being organised by
> someone else.
> 
> cheers
> Dan
> 
> On Thu, 2010-08-05 at 22:13 -0500, Chris Fields wrote:
>> Dan,
>> 
>> Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in:
>> 
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2701
>> 
>> It currently lacks significant tests, so feel free to chip in there as needed.
>> 
>> chris


From dan.kortschak at adelaide.edu.au  Sun Aug  8 22:03:52 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Mon, 09 Aug 2010 11:33:52 +0930
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
	<80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>
	<1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au>
	<5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org>
Message-ID: <1281319432.2414.49.camel@zoidberg.mbs.adelaide.edu.au>

Excellent. Thanks for that.

Dan

On Sun, 2010-08-08 at 21:42 -0400, John SJ Anderson wrote:
> I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. 
> 
> j.


From cjfields at illinois.edu  Mon Aug  9 22:40:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 9 Aug 2010 21:40:07 -0500
Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio
Message-ID: <DE527A62-E6E7-45B0-96A5-F94E7A7A137F@illinois.edu>

Any objections to moving the Bio directory to lib/Bio in bioperl-live?  It's a more standard location for code in most distributions; I have a branch (topic/cjfields_standard_lib) that has this working, though it's possible that it needs more work.

chris


From genehack at genehack.org  Tue Aug 10 04:30:44 2010
From: genehack at genehack.org (John SJ Anderson)
Date: Tue, 10 Aug 2010 04:30:44 -0400
Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio
In-Reply-To: <DE527A62-E6E7-45B0-96A5-F94E7A7A137F@illinois.edu>
References: <DE527A62-E6E7-45B0-96A5-F94E7A7A137F@illinois.edu>
Message-ID: <B2C73D74-1F72-402B-A3F7-C4E3ECF7D3B6@genehack.org>


On Aug 9, 2010, at 22:40 , Chris Fields wrote:

> Any objections to moving the Bio directory to lib/Bio in bioperl-live?  

+1 on this idea. 

j.


From genehack at genehack.org  Tue Aug 10 07:21:51 2010
From: genehack at genehack.org (John Anderson)
Date: Tue, 10 Aug 2010 07:21:51 -0400
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
	<80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>
	<1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au>
	<5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org>
Message-ID: <7A4F93AB-1BF7-4775-BC0E-38E7B431ECC6@genehack.org>


On Aug 8, 2010, at 9:42 PM, John SJ Anderson wrote:

> I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. 

Okay, the files have been added to topic/bug-2701 -- see <http://github.com/bioperl/bioperl-live/commits/topic/bug-2701>.

Please note, these are just the files from the bug report, slotted into the appropriate spots. I haven't reviewed the code or done anything about the non-BioPerl-y tests or the general lack of test coverage. I hope to do something about that in the coming week, but if somebody beats me to it, that would be okay too.

j.


From maj at fortinbras.us  Tue Aug 10 19:52:05 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 10 Aug 2010 19:52:05 -0400
Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio
In-Reply-To: <DE527A62-E6E7-45B0-96A5-F94E7A7A137F@illinois.edu>
References: <DE527A62-E6E7-45B0-96A5-F94E7A7A137F@illinois.edu>
Message-ID: <1C55239986494A8D82BDC21A85B324E9@NewLife>

+1
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Monday, August 09, 2010 10:40 PM
Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio


> Any objections to moving the Bio directory to lib/Bio in bioperl-live?  It's a 
> more standard location for code in most distributions; I have a branch 
> (topic/cjfields_standard_lib) that has this working, though it's possible that 
> it needs more work.
>
> chris


From fayroz_farouk at yahoo.com  Sun Aug  8 04:24:31 2010
From: fayroz_farouk at yahoo.com (fayroz)
Date: Sun, 8 Aug 2010 01:24:31 -0700 (PDT)
Subject: [Bioperl-l] using HMMER
Message-ID: <603590.1072.qm@web112620.mail.gq1.yahoo.com>

I need your help. I am a new Perl user and want to use BioPerl modules
to run the HMMER program (hmmsearch). I have "model.hmm" and a FASTA
file, and I want to see which of the sequences are similar to the model.
I wrote this code, but there is a problem:

#!/usr/local/bin/perl -w
use Bio::AlignIO;
use Bio::SearchIO;
use Bio::SeqIO;
use Bio::Tools::Run::Hmmer;

# run hmmsearch (similar for hmmpfam)
my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'h6_avian.hmm', -informat => 'fasta');
my $seq = Bio::SeqIO->new('-file' => "one_seq.fa", '-format' => 'Fasta');

# Pass the factory a Bio::Seq object or a file name, returns a Bio::SearchIO
my $searchio = $factory->hmmsearch($seq);

while (my $result = $searchio->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print join("\t", ( $result->query_name,
                               $hsp->query->start,
                               $hsp->query->end,
                               $hit->name,
                               $hsp->hit->start,
                               $hsp->hit->end,
                               $hsp->score,
                               $hsp->evalue,
                               $hsp->seq_str,
                             )), "\n";
        }
    }
}


exceptions:
MSG: Unknown kind of input 'Bio::SeqIO::fasta=HASH(0x329a504)'
STACK Bio::Tools::Run::Hmmer::_setinput D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:381
STACK Bio::Tools::Run::Hmmer::hmmsearch D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:352
STACK toplevel test_bioperl.pl:12

Thank you,

fayroz


From douglas.hoen at gmail.com  Tue Aug 10 21:54:53 2010
From: douglas.hoen at gmail.com (Douglas Hoen)
Date: Tue, 10 Aug 2010 21:54:53 -0400
Subject: [Bioperl-l] Bio::SeqFeature::SimilarityPair->from_searchResult()?
Message-ID: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com>

Hi,

I was wondering why the Synopsis in the docs for Bio::SeqFeature::SimilarityPair has the following:
$sim_pair = Bio::SeqFeature::SimilarityPair->from_searchResult($blastHit);

There doesn't actually seem to be a from_searchResult method. Am I missing something?
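
One quick way to check this kind of thing against an installed module is to load the class and ask it with can().  The has_method helper below is a hypothetical convenience, not part of BioPerl; it returns undef when the class is not installed, so the sketch also runs on a machine without BioPerl.  Note that can() will not see methods that are only provided through AUTOLOAD.

```perl
use strict;
use warnings;

# Returns 1 if $class loads and provides $method (directly or inherited),
# 0 if it loads but lacks the method, undef if the class cannot be loaded.
sub has_method {
    my ($class, $method) = @_;
    (my $file = "$class.pm") =~ s{::}{/}g;
    return undef unless eval { require $file; 1 };
    return $class->can($method) ? 1 : 0;
}

for my $check (
    [ 'Bio::SeqFeature::SimilarityPair', 'from_searchResult' ],
    [ 'Bio::SeqFeature::SimilarityPair', 'new' ],
) {
    my ($class, $method) = @$check;
    my $found = has_method($class, $method);
    printf "%s->%s: %s\n", $class, $method,
        !defined $found ? 'class not installed'
      : $found          ? 'present'
      :                   'missing';
}
```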

Thanks,
-- Doug


From zhaoy at mail.cbi.pku.edu.cn  Wed Aug 11 04:17:42 2010
From: zhaoy at mail.cbi.pku.edu.cn (zhaoy at mail.cbi.pku.edu.cn)
Date: Wed, 11 Aug 2010 16:17:42 +0800 (CST)
Subject: [Bioperl-l] About extracting sequence from genewise format result
Message-ID: <53663.162.105.250.100.1281514662.squirrel@mail.cbi.pku.edu.cn>

Dear authors:

Hello!

Recently I have been trying to parse genewise output in order to extract
the nucleotide sequence using the "hit_string" method in Bio::SearchIO;
however, the result is empty. Worse, the loop does not seem to be
working, because I always get the last result. I'm confused.

My Perl code is shown below:

#!/usr/bin/perl -w
use strict;
use warnings;

use Bio::SearchIO;

my $in = Bio::SearchIO->new(-format   => 'wise',
                            -wisetype => 'genewise',
                            -file     => 'test');
while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print "Query=",      $result->query_name,   "\n",
                  "Length=",     $hsp->length('total'), "\n",
                  "hit_string:", $hsp->hit_string,      "\n";
        }
    }
}

And one of the genewise format results is shown below:

genewise $Name: wise2-4-0alpha $ (unreleased release)
This program is freely distributed under a GPL. See source directory
Copyright (c) GRL limited: portions of the code are from separate copyright

Query protein:       Cpa_s110_24
Comp Matrix:         BLOSUM62.bla
Gap open:            12
Gap extension:       2
Start/End            global
Target Sequence      Bdi_chr3:38292015..38292302
Strand:              forward
Start/End (protein)  global
Gene Parameter file: gene.stat
Splice site model:   GT/AG only
Codon Table:         codon.table
Subs error:          1e-06
Indel error:         1e-06
Null model           syn
Algorithm            623

genewise output
Score 37.97 bits over entire alignment
Scores as bits over a synchronous coding model

Warning: The bits scores is not probablistically correct for single seqs
See WWW help for more info

Cpa_s110_24        1 MGNCQAVDAATLAIQHPS-GKVDRLYWPVSASEVMRTNPGHYVALLI--
                     MGNCQA DAA + IQHP+ GKV+RLYWP +A++VMR NPGHYVAL++
                     MGNCQAADAAAVVIQHPAEGKVERLYWPATAADVMRKNPGHYVALVVVH
Bdi_chr3:382920    1 agatcggggggggacccgggaggccttcgaggggacaacgctggcgggc
                     tgagaccaccctttaaccagatagtagcccccattgaacgaatctttta
                     gctcgggtggcggcgcgcgggcgcccggccgcccgcgcccccccccccc


Cpa_s110_24       47 ----STTLCPSNSNASNAESVRVTRIKLLRPTDTLVLGQVYRLITTQEV
                              P+ +    A + R+T++KLL+P DTL++GQVYRLIT+Q
                     VSGGAGETDPAVAGGGAAAAARITKVKLLKPRDTLLIGQVYRLITSQ--
Bdi_chr3:382920  148 gtgggggagcgggggggggggaaaagaccaccgaccagcgtccaatc
                     tcggcgacacctcgggcccccgtcatattacgactttgatagttcca
                     cctcctgtcccacaaaattccgccgcgccgcgctgcccgccccccca


Cpa_s110_24       92 MKGLWAKKCAKMKKYQEADHKDGLKPETIPGRRSGPERDTQVAKHERHR

                     -------------------------------------------------
Bdi_chr3:382920  289


Cpa_s110_24      141 SRVAASTNQAGLKSRTWQPSLKSISEAAS

                     -----------------------------
Bdi_chr3:382920  289


//
Gene 1
Gene 1 288
  Exon 1 288 phase 0
     Supporting 1 54 1 18
     Supporting 58 141 19 46
     Supporting 160 288 47 89
//

......


Part of the output of this code is shown below:
Query=Aly_481360
Length=0
hit_string:

Query=Aly_481360
Length=0
hit_string:

......

What's wrong with my code and how can I get the correct result? I'm
looking forward to your reply.
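
As a possible workaround that does not depend on hit_string, the exon coordinates in the Gene/Exon summary block of the genewise output above can be parsed with a few regular expressions and used to cut the nucleotide sequence out of the target DNA.  This is only a sketch in plain Perl: parse_genewise_exons and exon_sequence are hypothetical helpers, the toy summary and DNA string are invented, and treating start > end as a reverse-strand exon is an assumption about genewise's conventions.

```perl
use strict;
use warnings;

# Parse "Gene"/"Exon" summary lines (the section between the "//"
# separators) into a list of exon records.
sub parse_genewise_exons {
    my ($text) = @_;
    my (@exons, $gene);
    for my $line (split /\n/, $text) {
        if ($line =~ /^Gene\s+(\d+)\s*$/) {
            $gene = $1;                       # "Gene 1" starts a new gene
        }
        elsif ($line =~ /^\s+Exon\s+(\d+)\s+(\d+)\s+phase\s+(\d+)/) {
            push @exons,
                { gene => $gene, start => $1, end => $2, phase => $3 };
        }
    }
    return @exons;
}

# Cut an exon out of the target DNA (1-based, inclusive coordinates).
sub exon_sequence {
    my ($dna, $exon) = @_;
    my ($start, $end) = @{$exon}{qw(start end)};
    if ($start <= $end) {
        return substr($dna, $start - 1, $end - $start + 1);
    }
    # Assumed convention: start > end means a reverse-strand exon.
    my $sub = reverse substr($dna, $end - 1, $start - $end + 1);
    $sub =~ tr/ACGTacgt/TGCAtgca/;
    return $sub;
}

# Invented toy input in the same layout as the summary block.
my $summary = <<'END';
Gene 1
Gene 2 7
  Exon 2 7 phase 0
     Supporting 2 7 1 2
//
END
my $toy_dna = 'AATGGCCTT';

for my $exon (parse_genewise_exons($summary)) {
    printf "gene %d exon %d..%d phase %d: %s\n",
        @{$exon}{qw(gene start end phase)}, exon_sequence($toy_dna, $exon);
}
```

The coordinates are relative to the target sequence given to genewise, which in the example above appears to be a 288-base sub-region (Bdi_chr3:38292015..38292302), so they would need an offset to map back onto the chromosome.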

Thanks very much!

Best regards,
Zackaly


From roy.chaudhuri at gmail.com  Wed Aug 11 10:32:39 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 11 Aug 2010 15:32:39 +0100
Subject: [Bioperl-l] using HMMER
In-Reply-To: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
Message-ID: <4C62B487.9090103@gmail.com>

Hi Fayroz,

Your $seq variable contains a Bio::SeqIO object (a biological 
filehandle), not a Bio::Seq (sequence object).

You need to change that line to:
my $seqio = Bio::SeqIO->new(-file=>'one_seq.fa', -format=>'fasta');
my $seq=$seqio->next_seq;

If you have multiple sequences in the file, then you will need to loop 
over them:
while (my $seq=$seqio->next_seq) {
# Code to run Hmmer goes here
}

Also, I don't think you need to specify -informat for your 
Bio::Tools::Run::Hmmer object, since you're passing it a sequence 
object, not a filename.

Hope this helps.
Roy.

On 08/08/2010 09:24, fayroz wrote:
> i need your help, i am a new perl user and want to use bioperl modules to run
> HMMER program ( HMMsearch) i have" model.hmm" and a "fasta file" to see which of
> them are similar with the model
> i write this code but there is a problems
>
> #!/usr/local/bin/perl W
> use Bio::AlignIO;
> use Bio::SearchIO;
> use Bio::SeqIO ;
> use Bio::Tools::Run::Hmmer;
>
> # run hmmsearch (similar for hmmpfam)
> my $factory = Bio::Tools::Run::Hmmer->new(-hmm =>  'h6_avian.hmm',-informat =>
> 'fasta');
> my $seq = Bio::SeqIO->new('-file'=>  "one_seq.fa", '-format'=>'Fasta');
>
> # Pass the factory a Bio::Seq object or a file name, returns a Bio::SearchIO
> my $searchio = $factory->hmmsearch($seq);
>
> while (my $result = $searchio->next_result){
> while(my $hit = $result->next_hit){
> while (my $hsp = $hit->next_hsp){
> print join("\t", ( $result->query_name,
> $hsp->query->start,
> $hsp->query->end,
> $hit->name,
> $hsp->hit->start,
> $hsp->hit->end,
> $hsp->score,
> $hsp->evalue,
> $hsp->seq_str,
> )), "\n";
> }
> }
> }
>
>
> exceptions:
> MSG: Unknown kind of input 'Bio::SeqIO::fasta=HASH(0x329a504)'
> STACK Bio::Tools::Run::Hmmer::_setinput
> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:381
> STACK Bio::Tools::Run::Hmmer::hmmsearch
> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:352
>   STACK toplevel test_bioperl.pl:12
> thank you
>
> fayroz


From cjfields at illinois.edu  Wed Aug 11 11:07:36 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 10:07:36 -0500
Subject: [Bioperl-l] using HMMER
In-Reply-To: <4C62B487.9090103@gmail.com>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
	<4C62B487.9090103@gmail.com>
Message-ID: <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>

You might also want to check whether you are using HMMER2 or HMMER3.  I'm not sure whether the wrapper works with HMMER3.

chris

On Aug 11, 2010, at 9:32 AM, Roy Chaudhuri wrote:

> Hi Fayroz,
> 
> Your $seq variable contains a Bio::SeqIO object (a biological filehandle), not a Bio::Seq (sequence object).
> 
> You need to change that line to:
> my $seqio = Bio::SeqIO->new(-file=>'one_seq.fa', -format=>'fasta');
> my $seq=$seqio->next_seq;
> 
> If you have multiple sequences in the file, then you will need to loop over them:
> while (my $seq=$seqio->next_seq) {
> # Code to run Hmmer goes here
> }
> 
> Also, I don't think you need to specify -informat for your Bio::Tools::Run::Hmmer object, since you're passing it a sequence object, not a filename.
> 
> Hope this helps.
> Roy.
> 
> On 08/08/2010 09:24, fayroz wrote:
>> i need your help, i am a new perl user and want to use bioperl modules to run
>> HMMER program ( HMMsearch) i have" model.hmm" and a "fasta file" to see which of
>> them are similar with the model
>> i write this code but there is a problems
>> 
>> #!/usr/local/bin/perl W
>> use Bio::AlignIO;
>> use Bio::SearchIO;
>> use Bio::SeqIO ;
>> use Bio::Tools::Run::Hmmer;
>> 
>> # run hmmsearch (similar for hmmpfam)
>> my $factory = Bio::Tools::Run::Hmmer->new(-hmm =>  'h6_avian.hmm',-informat =>
>> 'fasta');
>> my $seq = Bio::SeqIO->new('-file'=>  "one_seq.fa", '-format'=>'Fasta');
>> 
>> # Pass the factory a Bio::Seq object or a file name, returns a Bio::SearchIO
>> my $searchio = $factory->hmmsearch($seq);
>> 
>> while (my $result = $searchio->next_result){
>> while(my $hit = $result->next_hit){
>> while (my $hsp = $hit->next_hsp){
>> print join("\t", ( $result->query_name,
>> $hsp->query->start,
>> $hsp->query->end,
>> $hit->name,
>> $hsp->hit->start,
>> $hsp->hit->end,
>> $hsp->score,
>> $hsp->evalue,
>> $hsp->seq_str,
>> )), "\n";
>> }
>> }
>> }
>> 
>> 
>> exceptions:
>> MSG: Unknown kind of input 'Bio::SeqIO::fasta=HASH(0x329a504)'
>> STACK Bio::Tools::Run::Hmmer::_setinput
>> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:381
>> STACK Bio::Tools::Run::Hmmer::hmmsearch
>> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:352
>>  STACK toplevel test_bioperl.pl:12
>> thank you
>> 
>> fayroz


From douglas.hoen at gmail.com  Wed Aug 11 15:13:49 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 12:13:49 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA in
	SeqFeature::Store database of the original DNA?
Message-ID: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>

Hi,

I am trying to store the results of searches of translated DNA in a
SeqFeature::Store database. The DB contains the original DNA
sequences. For instance, I have done HMMER searches of 6-frame
translations of the sequences stored in the DB. I want to store these
results "at" their (equivalent) DNA positions, which I can calculate.
Preferably, I would like to directly store the SeqFeature::Similarity
objects that I get from parsing these searches. But they are of course
located on a different coordinate system than the DNA, so I guess I
can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
DNA position and then store the Similarity objects as sub-SeqFeatures.

I could just set each Similarity's position to the (calculated) DNA
coordinates, or alternatively make a new SeqFeature and copy in the
attributes I want. But is there a more elegant solution?
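
For reference, the coordinate calculation I have in mind is roughly the following.  It is a plain-Perl sketch: aa_to_dna_coords is a hypothetical helper, and it assumes frames are numbered 0, 1, 2 by the number of bases skipped before the first codon, with reverse frames read from the reverse complement.  That convention has to match however the translations were actually made.

```perl
use strict;
use warnings;

# Map a 1-based amino-acid interval on one of the six translation frames
# back to 1-based, inclusive coordinates on the original DNA.
#   $frame:      0, 1 or 2 = bases skipped before the first codon
#   $strand:     +1 for forward frames, -1 for reverse-complement frames
#   $dna_length: length of the DNA (only needed for reverse frames)
sub aa_to_dna_coords {
    my ($aa_start, $aa_end, $frame, $strand, $dna_length) = @_;

    # Position of the interval on the strand that was translated.
    my $start = 3 * ($aa_start - 1) + $frame + 1;
    my $end   = 3 * $aa_end + $frame;

    # Reverse frames: flip the interval back onto the forward strand.
    if ($strand < 0) {
        ($start, $end) = ($dna_length - $end + 1, $dna_length - $start + 1);
    }
    return ($start, $end, $strand);
}

# A 12-base toy sequence: residues 2..3 of forward frame 1, and
# residues 1..2 of reverse frame 2.
for my $case ([ 2, 3, 1, +1 ], [ 1, 2, 2, -1 ]) {
    my ($start, $end, $strand) = aa_to_dna_coords(@$case, 12);
    print "aa $case->[0]..$case->[1] frame $case->[2] strand $case->[3]",
          " => DNA $start..$end\n";
}
```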

Thanks,
-- Doug


From douglas.hoen at gmail.com  Wed Aug 11 16:11:26 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 13:11:26 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
Message-ID: <f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>

One possible answer to my own question: use
Bio::SeqFeature::PositionProxy objects? Would this work?

On Aug 11, 3:13 pm, Doug <douglas.h... at gmail.com> wrote:
> Hi,
>
> I am trying to store in a SeqFeature::Store database the results of
> searches of translated DNA. The DB contains the original DNA
> sequences. For instance, I have done HMMER searches of 6-frame
> translations of the sequences stored in the DB. I want to store these
> results "at" their (equivalent) DNA positions, which I can calculate.
> Preferably, I would like to directly store the SeqFeature::Similarity
> objects that I get from parsing these searches. But they are of course
> located on different coordinate systems than the DNA, so I guess I
> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
> DNA position and then store the Similarity's as sub-SeqFeatures.
>
> I could just set the Similarity's position to the (calculated) DNA
> coordinates, or alternately make a new SeqFeature and copy in the
> attributes I want. But is there a more elegant solution?
>
> Thanks,
> -- Doug


From scott at scottcain.net  Wed Aug 11 16:16:22 2010
From: scott at scottcain.net (Scott Cain)
Date: Wed, 11 Aug 2010 16:16:22 -0400
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
Message-ID: <AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>

Hi Doug,

I don't know if any of the things you've thought of would work; I've
never tried it.  My inclination would be to express your data in GFF3
and use the standard loader.
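
Roughly, writing a hit out as GFF3 could look like the sketch below.  It is plain Perl with hypothetical helper names (gff3_escape, gff3_line); the "HMMER" source tag, the protein_match type and the attribute values are assumptions chosen for illustration, and the coordinates are taken to be already converted to DNA positions.

```perl
use strict;
use warnings;

# Percent-encode characters that have special meaning in GFF3 column 9.
sub gff3_escape {
    my ($value) = @_;
    $value =~ s/([\t\n\r%;=&,])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Build one nine-column, tab-separated GFF3 feature line.
sub gff3_line {
    my (%f) = @_;
    my $attrs = join ';',
        map { $_ . '=' . gff3_escape($f{attributes}{$_}) }
        sort keys %{ $f{attributes} };
    return join "\t",
        $f{seqid}, $f{source}, $f{type}, $f{start}, $f{end},
        defined $f{score} ? $f{score} : '.',
        $f{strand} > 0 ? '+' : $f{strand} < 0 ? '-' : '.',
        '.',                                  # phase: not a CDS feature
        $attrs;
}

print "##gff-version 3\n";
print gff3_line(
    seqid      => 'chr1',
    source     => 'HMMER',
    type       => 'protein_match',
    start      => 5,
    end        => 10,
    score      => 37.9,
    strand     => -1,
    attributes => { Name => 'model1', Note => 'frame=2;rev' },
), "\n";
```

The resulting file could then be handed to the loader (presumably the bp_seqfeature_load.pl script that ships with BioPerl).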

Scott


On Wed, Aug 11, 2010 at 4:11 PM, Doug <douglas.hoen at gmail.com> wrote:
> One possible answer to my own question: Use
> Bio::SeqFeature::PositionProxy's? Would this work?
>
> On Aug 11, 3:13 pm, Doug <douglas.h... at gmail.com> wrote:
>> Hi,
>>
>> I am trying to store in a SeqFeature::Store database the results of
>> searches of translated DNA. The DB contains the original DNA
>> sequences. For instance, I have done HMMER searches of 6-frame
>> translations of the sequences stored in the DB. I want to store these
>> results "at" their (equivalent) DNA positions, which I can calculate.
>> Preferably, I would like to directly store the SeqFeature::Similarity
>> objects that I get from parsing these searches. But they are of course
>> located on different coordinate systems than the DNA, so I guess I
>> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
>> DNA position and then store the Similarity's as sub-SeqFeatures.
>>
>> I could just set the Similarity's position to the (calculated) DNA
>> coordinates, or alternately make a new SeqFeature and copy in the
>> attributes I want. But is there a more elegant solution?
>>
>> Thanks,
>> -- Doug


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From douglas.hoen at gmail.com  Wed Aug 11 16:38:54 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 13:38:54 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com> 
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
Message-ID: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>

Hi Scott,

Good idea. Would you happen to know of an existing HMMER3 to GFF3
converter?

Thanks for your advice,
-- Doug

On Aug 11, 4:16 pm, Scott Cain <sc... at scottcain.net> wrote:
> Hi Doug,
>
> I don't know if any of the things you've thought of would work; I've
> never tried it.  My inclination would be to express your data in GFF3
> and use the standard loader.
>
> Scott


From douglas.hoen at gmail.com  Wed Aug 11 16:53:35 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 13:53:35 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com> 
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com> 
	<6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
Message-ID: <a9d5aca2-3c28-49e8-bd76-119309c38c05@x21g2000yqa.googlegroups.com>

One more note: I did try using PositionProxy but it failed. It doesn't
implement seq_id() and so can't be stored in the DB:

------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::SeqFeatureI::seq_id" is not implemented by
package Bio::SeqFeature::PositionProxy.
This is not your fault - author of Bio::SeqFeature::PositionProxy
should be blamed!

...


On Aug 11, 4:38 pm, Doug <douglas.h... at gmail.com> wrote:
> Hi Scott,
>
> Good idea. Would you happen to know of an existing HMMER3 to GFF3
> converter?
>
> Thanks for your advice,
> -- Doug


From cjfields at illinois.edu  Wed Aug 11 16:45:00 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 15:45:00 -0500
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
	<6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
Message-ID: <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>

HMMER3 is parsed by Bio::SearchIO now in bioperl-live, and I think there is a generic SearchIO->GFF3 script floating around the intertubes somewheres...

chris

On Aug 11, 2010, at 3:38 PM, Doug wrote:

> Hi Scott,
> 
> Good idea. Would you happen to know of an existing HMMER3 to GFF3
> converter?
> 
> Thanks for your advice,
> -- Doug


From scott at scottcain.net  Wed Aug 11 17:05:25 2010
From: scott at scottcain.net (Scott Cain)
Date: Wed, 11 Aug 2010 17:05:25 -0400
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
	<6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
	<190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
Message-ID: <AANLkTimY09-wo9R_ZbPmSG_9x7TZjVobTM95VO5fgCa4@mail.gmail.com>

Um, yeah, it's in bioperl: bp_search2gff.pl.

Scott


On Wed, Aug 11, 2010 at 4:45 PM, Chris Fields <cjfields at illinois.edu> wrote:
> HMMER3 is parsed by Bio::SearchIO now in bioperl-live, and I think there is a generic SearchIO->GFF3 script floating around the intertubes somewheres...
>
> chris


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Wed Aug 11 17:07:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 16:07:20 -0500
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <AANLkTimY09-wo9R_ZbPmSG_9x7TZjVobTM95VO5fgCa4@mail.gmail.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
	<6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
	<190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
	<AANLkTimY09-wo9R_ZbPmSG_9x7TZjVobTM95VO5fgCa4@mail.gmail.com>
Message-ID: <CCD1DE1D-867E-468D-941A-7C418C126FBE@illinois.edu>

For some reason I thought there was a more up-to-date one somewhere.  Ah well, can't keep track of all the code in bioperl :>

chris

On Aug 11, 2010, at 4:05 PM, Scott Cain wrote:

> Um, yeah, it's in bioperl: bp_search2gff.pl.
> 
> Scott


From douglas.hoen at gmail.com  Wed Aug 11 17:11:20 2010
From: douglas.hoen at gmail.com (Douglas Hoen)
Date: Wed, 11 Aug 2010 17:11:20 -0400
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <AANLkTimY09-wo9R_ZbPmSG_9x7TZjVobTM95VO5fgCa4@mail.gmail.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
	<6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
	<190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
	<AANLkTimY09-wo9R_ZbPmSG_9x7TZjVobTM95VO5fgCa4@mail.gmail.com>
Message-ID: <A8FFFBCC-4E4F-478B-B824-BB4249B11BA1@gmail.com>

Great, thanks so much for the info.

On 2010-08-11, at 5:05 PM, Scott Cain wrote:

> Um, yeah, it's in bioperl: bp_search2gff.pl.
> 
> Scott


From Russell.Smithies at agresearch.co.nz  Wed Aug 11 17:31:32 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 12 Aug 2010 09:31:32 +1200
Subject: [Bioperl-l] AlignIO  and Gbrowse_syn
In-Reply-To: <AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
	<AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>

I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague. 
If GBrowse_syn is using .maf format, does AlignIO need more work?
Any comments?

--Russell


I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output). GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce clustalw, I've tried converting LASTZ maf output to clustalw (and other alignment formats) using AlignIO; however, I run into the following issues:
* Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification)
* The coordinate system for reverse-strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse-complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-reverse-complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result, all reverse-match coordinates in the resulting clustalw output file are incorrect.
* AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them

I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned.
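
The reverse-strand conversion described in the second point above can be sketched as follows. This is a minimal illustration in plain Python, not part of BioPerl, AlignIO or LASTZ. It assumes the usual MAF convention for "s" lines: a zero-based start, a length, and, for strand "-", a start counted on the reverse-complemented source sequence. The function name maf_to_forward and the example values are hypothetical.

```python
def maf_to_forward(start, size, strand, src_size):
    """Return (start, end) as 1-based, inclusive forward-strand coordinates.

    start    -- zero-based start from the MAF 's' line
    size     -- number of aligned bases (gaps excluded)
    strand   -- '+' or '-'
    src_size -- full length of the source sequence
    """
    if strand == "+":
        return start + 1, start + size
    if strand == "-":
        # The interval was measured from the far end of the sequence,
        # so mirror it back onto the original orientation.
        return src_size - (start + size) + 1, src_size - start
    raise ValueError(f"unexpected strand: {strand!r}")


# Hypothetical example: a 100 bp source sequence with 10 aligned bases.
print(maf_to_forward(0, 10, "+", 100))  # (1, 10)
print(maf_to_forward(0, 10, "-", 100))  # (91, 100)
```

With these assumptions, a reverse-strand block that starts at the first position of the reverse-complemented sequence lands on the last bases of the original sequence, which is the behaviour described above. The strand itself would still need to be carried along separately (for GBrowse_syn, in the sequence ID).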
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at illinois.edu  Wed Aug 11 18:02:38 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 17:02:38 -0500
Subject: [Bioperl-l] AlignIO  and Gbrowse_syn
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
	<AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
Message-ID: <E53C66C1-E4F1-4E83-B5ED-631CE62D7DCE@illinois.edu>

Russell,

We have had very few requests to support .maf until recently, which is why there has been little done with it.  We welcome any help to improve it.  

chris

On Aug 11, 2010, at 4:31 PM, Smithies, Russell wrote:

> I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague. 
> If GBrowse_syn is using .maf format, does AlignIO need more work?
> Any comments?
> 
> --Russell


From douglas.hoen at gmail.com  Thu Aug 12 01:59:37 2010
From: douglas.hoen at gmail.com (Doug Hoen)
Date: Wed, 11 Aug 2010 22:59:37 -0700 (PDT)
Subject: [Bioperl-l] HMMER3 to GFF3
Message-ID: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com>

Hi,

I am trying to convert HMMER3 (hmmscan) output files into GFF3 files.
Based on previous advice (see the thread, "How to store results of
searches of translated DNA in SeqFeature::Store database of the
original DNA?"), I have installed bioperl-live for its new HMMER3
parsing capabilities (in SearchIO) and am trying to use
bp_search2gff.pl to do the file conversion.

The hmmscan was done on translated chromosome sequences with conserved
domain models. I want to get the GFF 'start' and 'end' columns to be
based on these coordinates, not those of the models. To do this (with
my files), it seems I need to use the option "--type hit". However,
this changes the "Target" sequence name from the model name to
chromosome name, and the model name does not appear anywhere in the
output (see below).

Could someone please confirm whether the results are incorrect and, if
so, perhaps suggest a fix? It may well be that this problem is due to
the unusual way I am using hmmscan, rather than a problem with HMMER3
parsing...?

Many thanks,
-- Doug


========================================================


Here's what it looks like if I do *not* use the "--type hit" option.
(RVT_2 is a conserved domain name. I need this in the output.)


COMMAND:
------------------
bp_search2gff.pl -i ../chr1-tesigsv2.hmmscan \
  -o chr1-tesigsv2-hmmscan-original-locations-v2.gff3 \
  --format hmmer3 --source HMMER3 --version 3 --component


OUTPUT:
------------------
==> chr1-tesigsv2-hmmscan-original-locations-v2.gff3 <==
##gff-version 3
Chr1_1	chromosome	Component	1	10142557	.	.	1	sequence=Chr1_1
Chr1_1	HMMER3	similarity	1	245	307.3	.	0	Target=Sequence:RVT_2 1898330 1898579
Chr1_1	HMMER3	similarity	1	244	329.5	.	0	Target=Sequence:RVT_2 2573551 2573796
Chr1_1	HMMER3	similarity	1	245	308.8	.	0	Target=Sequence:RVT_2 3159685 3159930
Chr1_1	HMMER3	similarity	1	102	108.2	.	0	Target=Sequence:RVT_2 3438684 3438791
Chr1_1	HMMER3	similarity	2	245	277.2	.	0	Target=Sequence:RVT_2 3566642 3566891
Chr1_1	HMMER3	similarity	13	213	251.4	.	0	Target=Sequence:RVT_2 4251160 4251373
Chr1_1	HMMER3	similarity	1	244	310.6	.	0	Target=Sequence:RVT_2 4252791 4253036
Chr1_1	HMMER3	similarity	6	99	94.2	.	0	Target=Sequence:RVT_2 4271555 4271653


========================================================


And here's what it looks like if I *do* use the "--type hit" option.
The coordinates look good but the model name has disappeared (and the
Target=Sequence seems wrong).


COMMAND:
------------------
bp_search2gff.pl -i ../chr1-tesigsv2.hmmscan \
    -o chr1-tesigsv2-hmmscan-original-locations-v3.gff3 \
    --format hmmer3 --type hit --source HMMER3 --version 3 --component


OUTPUT:
------------------
==> chr1-tesigsv2-hmmscan-original-locations-v3.gff3 <==
##gff-version 3
RVT_2	HMMER3	similarity	1898330	1898579	307.3	.	0	Target=Sequence:Chr1_1 1 245
RVT_2	HMMER3	similarity	2573551	2573796	329.5	.	0	Target=Sequence:Chr1_1 1 244
RVT_2	HMMER3	similarity	3159685	3159930	308.8	.	0	Target=Sequence:Chr1_1 1 245
RVT_2	HMMER3	similarity	3438684	3438791	108.2	.	0	Target=Sequence:Chr1_1 1 102
RVT_2	HMMER3	similarity	3566642	3566891	277.2	.	0	Target=Sequence:Chr1_1 2 245
RVT_2	HMMER3	similarity	4251160	4251373	251.4	.	0	Target=Sequence:Chr1_1 13 213
RVT_2	HMMER3	similarity	4252791	4253036	310.6	.	0	Target=Sequence:Chr1_1 1 244
RVT_2	HMMER3	similarity	4271555	4271653	94.2	.	0	Target=Sequence:Chr1_1 6 99
RVT_2	HMMER3	similarity	4481232	4481477	281.5	.	0	Target=Sequence:Chr1_1 2 245


========================================================


And here's what the input HMMER3 result file looks like:


==> ../chr1-tesigsv2.hmmscan <==
# hmmscan :: search sequence(s) against a profile database
# HMMER 3.0rc1 (February 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- -
# query sequence file:             [...]/whole_chromosomes/translated/
chr1.pep
# target HMM database:             [...]/signatures/Pfam-A.hmm
# output directed to file:         chr1-tesigsv2.hmmscan
# model-specific thresholding:     TC cutoffs
# Max sensitivity mode:            on [all heuristic filters off]
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- -

Query:       Chr1_1  [L=10142557]
Description: CHROMOSOME dumped from ADB: Jun/20/09 14:53; last
updated: 2009-02-02
Scores for complete sequence (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N
Model           Description
    ------- ------ -----    ------- ------ -----   ---- --
--------        -----------
          0 3971.3  17.7   2.6e-101  329.5   0.6   19.4 17
RVT_2           Reverse transcriptase (RNA-dependent DNA pol
          0 3040.7  23.0     1e-206  678.6   0.1   12.2 10
ATHILA          ATHILA ORF-1 family
          0 1681.9  79.1    1.9e-46  149.9   0.4   28.0 21
RVT_1           Reverse transcriptase (RNA-dependent DNA pol
          0 1446.9  27.4    3.6e-95  309.1   0.2    7.6  5
Transposase_21  Transposase family tnp2
          0 1168.4  50.3    1.4e-29   94.4   0.3   21.5 18
rve             Integrase core domain
   9.1e-300  960.0  69.0    3.1e-20   64.0   0.0   28.8 20
Retrotrans_gag  Retrotransposon gag protein
   1.5e-180  577.0  31.6    1.6e-29   93.1   1.5    9.5  8
Transposase_23  TNP1/EN/SPM transposase
   4.4e-143  456.9  82.8    4.8e-18   56.4   0.1   12.9 11
MuDR            MuDR family transposase
   3.8e-116  371.4  19.6    1.2e-18   58.9   0.0   13.7  7
MULE            MULE transposase domain
   7.1e-106  344.1   5.6    2.7e-97  316.0   0.0    3.6  1
Plant_tran      Plant transposon protein
    9.2e-85  275.4  22.9    5.4e-60  194.4   0.3    6.4  3
Peptidase_C48   Ulp1 protease family, C-terminal catalytic d
    1.8e-77  249.8  24.8    4.4e-28   89.8   0.1   10.8  3
Transposase_24  Plant transposase (Ptta/En/Spm family)
    2.8e-47  150.1   1.2    5.5e-23   72.3   0.2    3.7  2
hATC            hAT family dimerisation domain
    5.7e-28   89.4   3.6    4.7e-13   41.1   0.0    6.5  1
RVP_2           Retroviral aspartyl protease
      1e-16   53.3   0.0    4.4e-07   22.1   0.0    6.8  1
RnaseH          RNase H
    1.5e-08   25.3   2.4    0.00016   12.1   0.0    4.9  0
Transposase_mut Transposase, Mutator family


Domain annotation for each model (and alignments):
>> RVT_2  Reverse transcriptase (RNA-dependent DNA polymerase)
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom
ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    -------
-------    ------- -------    ----
   1 !  307.3   0.0   5.3e-95   1.5e-94       1     245 [. 1898330
1898578 .. 1898330 1898579 .. 0.99
   2 !  329.5   0.6  8.9e-102  2.6e-101       1     244 [. 2573551
2573794 .. 2573551 2573796 .. 0.99
   3 !  308.8   0.0   1.8e-95   5.2e-95       1     245 [. 3159685
3159929 .. 3159685 3159930 .. 0.99
   4 !  108.2   0.1   3.4e-34   9.7e-34       1     102 [. 3438684
3438785 .. 3438684 3438791 .. 0.96
   5 !  277.2   0.0   8.1e-86   2.3e-85       2     245 .. 3566643
3566890 .. 3566642 3566891 .. 0.99
   6 !  251.4   0.0   6.2e-78   1.8e-77      13     213 .. 4251164
4251364 .. 4251160 4251373 .. 0.97
   7 !  310.6   0.0   5.1e-96   1.5e-95       1     244 [. 4252791
4253034 .. 4252791 4253036 .. 0.99
   8 !   94.2   0.1   6.1e-30   1.8e-29       6      99 .. 4271560
4271653 .. 4271555 4271653 .. 0.97
   9 !  281.5   0.9   3.9e-87   1.1e-86       2     245 .. 4481233
4481476 .. 4481232 4481477 .. 0.98
  10 !  248.2   0.0   5.9e-77   1.7e-76       1     190 [. 4521040
4521233 .. 4521040 4521237 .. 0.97
  11 !  314.6   0.1   3.2e-97   9.2e-97       1     244 [. 4652456
4652702 .. 4652456 4652704 .. 0.98
  12 !   40.7   0.0   1.3e-13   3.7e-13       2      92 .. 5219607
5219697 .. 5219606 5219701 .. 0.90
  13 !  221.0   0.0   1.2e-68   3.4e-68       2     245 .. 5241015
5241258 .. 5241014 5241259 .. 0.95
  14 !   81.2   0.0   5.6e-26   1.6e-25       2     115 .. 5501957
5502070 .. 5501956 5502080 .. 0.92
  15 !  272.4   0.0   2.3e-84   6.7e-84      30     245 .. 6483057
6483271 .. 6483050 6483272 .. 0.98
  16 !  178.5   0.0   1.2e-55   3.3e-55      81     244 .. 7250563
7250726 .. 7250552 7250728 .. 0.96
  17 !  313.7   0.0   5.9e-97   1.7e-96       2     245 .. 7707124
7707367 .. 7707123 7707368 .. 0.99

  Alignments for each domain:
  == domain 1    score: 307.3 bits;  conditional E-value: 5.3e-95
   RVT_2       1
nktwelvelpkgkkviglkWvfklKlnedgeierykARlVakGftqkegidyeetfspvvklesirlllalaaekkleleqlDvktaFLngelee
95
                 n tw +++lp gkk++g+kWv+k+Kln+dg++erykARlVakG+tq+eg+dy
+tfspv+kl++++ll+a+aa+k+++l+qlD+++aFLng+l+e
  Chr1_1 1898330
NGTWVVCSLPVGKKAVGCKWVYKIKLNADGSLERYKARLVAKGYTQTEGLDYVDTFSPVAKLTTVKLLIAVAAAKGWSLSQLDISNAFLNGSLDE
1898424
 
68*********************************************************************************************
PP

   RVT_2      96
evYvkqpeGfedkkk....enkvckLkkslYgLkqapraWyeklsevllklgfkkseadkclfvkkkeeeliivllYVDDlliagsskelieelk
186
                 e+Y++ p+G++ ++     +n vc+LkkslYgLkqa+r+Wy k+se l++lgf+
+s+ d++lf++k++++ ++vl+YVDD++ia+s +++ e l
  Chr1_1 1898425
EIYMTLPPGYSPRQGdsfpPNAVCRLKKSLYGLKQASRQWYLKFSESLKALGFTQSSGDHTLFTRKSKNSYMAVLVYVDDIIIASSCDRETELLR
1898519
 
***********998889999***************************************************************************
PP

   RVT_2     187
eeLkkefemkdlgelkyfLgleierkeegillsqekyvkkllkkfkmedakpvstplea 245
                 ++L+++ +++dlg+l+yfLglei+r+++gi+++q+ky+ +ll+++++  +k++s
+p+e+
  Chr1_1 1898520
DALQRSSKLRDLGTLRYFLGLEIARNTDGISICQRKYTLELLAETGLLGCKSSSVPMEP 1898578
 
*********************************************************97 PP

  == domain 2    score: 329.5 bits;  conditional E-value: 8.9e-102
   RVT_2       1
nktwelvelpkgkkviglkWvfklKlnedgeierykARlVakGftqkegidyeetfspvvklesirlllalaaekkleleqlDvktaFLngelee
95
                 n+twel++lp+g+k+ig+kWv+k K+n++ge+erykARlVakG++q++gidy+e
+f+pv++le++rl+++laa++k++++q+D k aFLng++ee
  Chr1_1 2573551
NDTWELTSLPNGHKAIGVKWVYKAKKNSKGEVERYKARLVAKGYSQRAGIDYDEVFAPVARLETVRLIISLAAQNKWKIHQMDFKLAFLNGDFEE
2573645
 
79*********************************************************************************************
PP

   RVT_2      96
evYvkqpeGfedkkkenkvckLkkslYgLkqapraWyeklsevllklgfkkseadkclfvkkkeeeliivllYVDDlliagsskelieelkeeLk
190
                 evY++qp+G+ +k++e+kv++Lkk+lYgLkqapraW++++++++++++f k+ +
+++l++k ++e+++i +lYVDDl+++g++ ++ ee+k+e++
  Chr1_1 2573646
EVYIEQPQGYIVKGEEDKVLRLKKALYGLKQAPRAWNTRIDKYFKEKDFIKCPYEHALYIKIQKEDILIACLYVDDLIFTGNNPSMFEEFKKEMT
2573740
 
***********************************************************************************************
PP

   RVT_2     191
kefemkdlgelkyfLgleierkeegillsqekyvkkllkkfkmedakpvstple 244
                 kefem+d+g ++y+Lg+e+++++++i+++qe y+k++lkkfkm+d++pv tp
+e
  Chr1_1 2573741
KEFEMTDIGLMSYYLGIEVKQEDNRIFITQEGYAKEVLKKFKMDDSNPVCTPME 2573794
 
****************************************************97 PP


From kai.blin at biotech.uni-tuebingen.de  Thu Aug 12 08:16:45 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 12 Aug 2010 14:16:45 +0200
Subject: [Bioperl-l] HMMER3 to GFF3
In-Reply-To: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com>
References: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com>
Message-ID: <20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de>

On Wed, 11 Aug 2010 22:59:37 -0700 (PDT)
Doug Hoen <douglas.hoen at gmail.com> wrote:

Hi Doug,

> Could someone please confirm whether the results are incorrect and, if
> so, perhaps suggest a fix? It may well be that this problem is due to
> the unusual way I am using hmmscan, rather than a problem with HMMER3
> parsing...?

Can you please attach your hmmer input file? Along the way something
inserted line breaks, making it unreadable.

It may well be that the HMMER3 parser still behaves a little
differently from the HMMER2 parser; I haven't tried that script.

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From kai.blin at biotech.uni-tuebingen.de  Thu Aug 12 08:09:00 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 12 Aug 2010 14:09:00 +0200
Subject: [Bioperl-l] using HMMER
In-Reply-To: <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
	<4C62B487.9090103@gmail.com>
	<62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
Message-ID: <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>

On Wed, 11 Aug 2010 10:07:36 -0500
Chris Fields <cjfields at illinois.edu> wrote:

> might also want to check whether you are using hmmer2 vs hmmer3.  not sure if the wrapper works for hmmer3.

It might if you initialize it using
my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 'hmmer3');

at least for the programs that still exist with the same name in
hmmer3. It won't support hmmer3 using the default options, though.

If I have some spare time, I'll look into this, no promises on the
timeframe, though.

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From cjfields at illinois.edu  Thu Aug 12 11:28:50 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 12 Aug 2010 10:28:50 -0500
Subject: [Bioperl-l] using HMMER
In-Reply-To: <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
	<4C62B487.9090103@gmail.com>
	<62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
	<20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>
Message-ID: <8129B813-5B15-4DDC-AB0D-5D95EFFCE78D@illinois.edu>

On Aug 12, 2010, at 7:09 AM, Kai Blin wrote:

> On Wed, 11 Aug 2010 10:07:36 -0500
> Chris Fields <cjfields at illinois.edu> wrote:
> 
>> might also want to check whether you are using hmmer2 vs hmmer3.  not sure if the wrapper works for hmmer3.
> 
> It might if you initialize it using
> my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 'hmmer3');
> 
> at least for the programs that still exist with the same name in
> hmmer3. It won't support hmmer3 using the default options, though.
> 
> If I have some spare time, I'll look into this, no promises on the
> timeframe, though.
> 
> Cheers,
> Kai
> 
> -- 
> Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
> D-72076 Tübingen                        Fax :   ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

Would be nice to convert this over (at some point) to use Mark's CommandExts.  I'm thinking of doing this with Infernal, so if I get that running it wouldn't be terribly difficult to get hmmer3 working as well.

chris


From cjfields at illinois.edu  Thu Aug 12 12:14:44 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 12 Aug 2010 11:14:44 -0500
Subject: [Bioperl-l] using HMMER
In-Reply-To: <857996.8184.qm@web112610.mail.gq1.yahoo.com>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
	<4C62B487.9090103@gmail.com>
	<62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
	<20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>
	<8129B813-5B15-4DDC-AB0D-5D95EFFCE78D@illinois.edu>
	<857996.8184.qm@web112610.mail.gq1.yahoo.com>
Message-ID: <43FD0A31-DB95-4AE9-B678-937EE6346BC2@illinois.edu>

Fayroz,

Please keep responses on-list.

It seems you need to update your local bioperl, as 'hmmer3' is a recent addition, after 1.6.1.  It will be in 1.6.2 if I can get the time to make a release :>
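
A quick way to check both points from the report quoted below (whether the hmmer3 parser is installed, and where the HMMER programs live) is sketched here. It is a minimal sketch, not part of the wrapper: the helper module_available is hypothetical, and the directory 'C:/hmmer/bin' is only a placeholder for wherever the HMMER binaries actually are.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Set HMMERDIR before any wrapper module is loaded. The path is a
# placeholder (an assumption); an existing value is left untouched.
BEGIN { $ENV{HMMERDIR} ||= 'C:/hmmer/bin'; }

# Hypothetical helper: true if the named module can be loaded from @INC.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

if ( module_available('Bio::SearchIO::hmmer3') ) {
    print "Bio::SearchIO::hmmer3 is installed\n";
}
else {
    print "Bio::SearchIO::hmmer3 not found in \@INC; a newer BioPerl is needed\n";
}
print "HMMERDIR is $ENV{HMMERDIR}\n";
```

If the module is reported as missing, no setting of HMMERDIR will help; the "Can't locate Bio\SearchIO\hmmer3.pm" message comes from Perl's module search, not from locating the HMMER executables.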

chris

On Aug 12, 2010, at 10:58 AM, fayroz wrote:

> Dear Chris,
> In the HMMER documentation I found this statement:
> "The HMMER programs must either be in your path, or you must set the environment
> variable HMMERDIR to point to their location."
> Will that solve the problem?
> How can I do it, please? I work on the Windows 7 platform.
> 
> 
> When I applied this line with hmmer3:
> my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 'hmmer3');
> 
> this output appears:
> 
> Bio::SearchIO: hmmer3 cannot be found
> 
> and when I try with hmmer2 the same output appears:
> 
> Exception
> ------------- EXCEPTION -------------
> MSG: Failed to load module Bio::SearchIO::hmmer3. Can't locate 
> Bio\SearchIO\hmmer3.pm in @INC (@INC contains: D:\Perl\bin\ D:/Perl/site/lib 
> D:/Perl/lib .) at D:/Perl/site/lib/Bio/Root/Root.pm line 439, <GEN0> line 1.
> STACK Bio::Root::Root::_load_module D:/Perl/site/lib/Bio/Root/Root.pm:441
> STACK (eval) D:/Perl/site/lib/Bio/SearchIO.pm:446
> STACK Bio::SearchIO::_load_format_module D:/Perl/site/lib/Bio/SearchIO.pm:445
> STACK Bio::SearchIO::new D:/Perl/site/lib/Bio/SearchIO.pm:189
> STACK Bio::Tools::Run::Hmmer::_run D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:431
> STACK Bio::Tools::Run::Hmmer::hmmsearch 
> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:353
> STACK toplevel C:\Users\Khaled\AppData\Local\Temp\dzprltmp.pl:13
> -------------------------------------
> For more information about the SearchIO system please see the SearchIO docs.
> This includes ways of checking for formats at compile time, not run time
> '--informat' is not recognized as an internal or external command,
> operable program or batch file.
> Can't call method "next_result" on an undefined value at 
> C:\Users\Khaled\AppData\Local\Temp\dzprltmp.pl line 15, <GEN0> line 1.
> 
> 
> 
> ----- Original Message ----
> From: Chris Fields <cjfields at illinois.edu>
> To: Kai Blin <kai.blin at biotech.uni-tuebingen.de>
> Cc: fayroz <fayroz_farouk at yahoo.com>; bioperl-l at bioperl.org
> Sent: Thu, August 12, 2010 6:28:50 PM
> Subject: Re: [Bioperl-l] using HMMER
> 
> On Aug 12, 2010, at 7:09 AM, Kai Blin wrote:
> 
>> On Wed, 11 Aug 2010 10:07:36 -0500
>> Chris Fields <cjfields at illinois.edu> wrote:
>> 
>>> might also want to check whether you are using hmmer2 vs hmmer3.  not sure if 
>>> the wrapper works for hmmer3.
>> 
>> It might if you initialize it using
>> my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 
>> 'hmmer3');
>> 
>> at least for the programs that still exist with the same name in
>> hmmer3. It won't support hmmer3 using the default options, though.
>> 
>> If I have some spare time, I'll look into this, no promises on the
>> timeframe, though.
>> 
>> Cheers,
>> Kai
>> 
>> -- 
>> Dipl.-Inform. Kai Blin        kai.blin at biotech.uni-tuebingen.de
>> Institute for Microbiology and Infection Medicine
>> Division of Microbiology/Biotechnology
>> Eberhard-Karls-University of Tübingen
>> Auf der Morgenstelle 28                Phone : ++49 7071 29-78841
>> D-72076 Tübingen                        Fax :  ++49 7071 29-5979
>> Deutschland
>> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
> 
> Would be nice to convert this over (at some point) to use Mark's CommandExts.  
> I'm thinking of doing this with Infernal, so if I get that running it wouldn't 
> be terribly difficult to get hmmer3 working as well.
> 
> chris
> 
> 
> 


From jason at bioperl.org  Thu Aug 12 14:37:11 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 12 Aug 2010 11:37:11 -0700
Subject: [Bioperl-l] Other: Script for editing alignments?
In-Reply-To: <20100812061811.4D92468539@evol.biology.mcmaster.ca>
References: <20100812061811.4D92468539@evol.biology.mcmaster.ca>
Message-ID: <4C643F57.3040408@bioperl.org>

Hi Si -

This is pretty straightforward with Bioperl. Here's one solution:

#!/usr/bin/perl -w
use strict;
use Bio::AlignIO;

# Read FASTA alignments from the file named on the command line;
# with no -file given, the output stream writes to STDOUT.
my $in  = Bio::AlignIO->new(-format => 'fasta', -file => shift @ARGV);
my $out = Bio::AlignIO->new(-format => 'fasta');

while ( my $aln = $in->next_aln ) {
    for my $seq ( $aln->each_seq ) {
        my $str = $seq->seq;
        if ( $str =~ /^(-+)/ ) {
            my $rep = length($1);
            # replace the leading gaps (5' end) with Ns
            substr( $str, 0, $rep, 'N' x $rep );
        }
        if ( $str =~ /(-+)$/ ) {
            my $rep = length($1);
            # replace the trailing gaps (3' end) with Ns
            substr( $str, -$rep, $rep, 'N' x $rep );
        }
        $seq->seq($str);
    }
    # don't print the /start-end info in the FASTA ID
    $aln->set_displayname_flat(1);
    $out->write_aln($aln);
}

-jason
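
The same end-gap replacement can also be written as two substitutions on a plain string, which may be handy for testing the idea without BioPerl. This is a minimal sketch; the helper name mask_end_gaps is hypothetical, and the two test strings are the ones from the question quoted below.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace leading and trailing runs of '-' with the same number of 'N's,
# leaving gaps inside the sequence untouched.
sub mask_end_gaps {
    my ($str) = @_;
    $str =~ s/^(-+)/'N' x length($1)/e;
    $str =~ s/(-+)$/'N' x length($1)/e;
    return $str;
}

print mask_end_gaps('-----ATGCTG--TGACTG----TGACT---'), "\n";
# prints NNNNNATGCTG--TGACTG----TGACTNNN
print mask_end_gaps('---GTATGTTG--TGACTGCT--TGACCGTC'), "\n";
# prints NNNGTATGTTG--TGACTGCT--TGACCGTC
```

Note that this works on one complete sequence string at a time, so sequences spanning several lines need to be joined first (which is what reading them through Bio::AlignIO takes care of).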

evoldir at evol.biology.mcmaster.ca wrote, On 8/11/10 11:18 PM:
> Dear All
>
> Alignment programs like MUSCLE and Clustal often output alignments with
> "-" symbols indicating indels (real events) within sequence alignments,
> but also "-" symbols at the 5' and 3' ends of sequences. The latter
> however, are not real evolutionary events and really should be Ns
> (missing data), depending on the sort of analytical framework you use.
>
> If there is sufficient heterogeneity and signal within the 5' and 3'
> ends of sequences, the "-"s can be manually edited in a text editor to
> Ns with no problem, if the alignment is small. If it is large (e.g. 2000
> seqs), or there are lots of alignments, it becomes a lengthy task.
>
> I'm investigating such alignments presently and so was wondering if
> anyone had a clever way of implementing sed, or had a Perl script that
> would perform such a task. Simply put, it would require replacing the 5'
> and 3' "-" below only with Ns and leaving the within sequence "-"s
> alone. The sequences naturally may span more than one line.
>
>   >Taxon 1
> -----ATGCTG--TGACTG----TGACT---
>   >Taxon 2
> ---GTATGTTG--TGACTGCT--TGACCGTC
>
> to
>
>   >Taxon 1
> NNNNNATGCTG--TGACTG----TGACTNNN
>   >Taxon 2
> NNNGTATGTTG--TGACTGCT--TGACCGTC
>
> It's a simple task, but I haven't seen any scripts out there to do the job.
>
> If there are any scripters out there who can help, or if someone knows
> of an application that would help, it would be great to hear from you.
>
> With best wishes and thanks
>
> Si Creer
>
>    


From genehack at genehack.org  Thu Aug 12 20:32:07 2010
From: genehack at genehack.org (John SJ Anderson)
Date: Thu, 12 Aug 2010 20:32:07 -0400
Subject: [Bioperl-l]
	Bio::SeqFeature::SimilarityPair->from_searchResult()?
In-Reply-To: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com>
References: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com>
Message-ID: <ABCC813F-9FF8-465E-B5AF-E95BD8291D95@genehack.org>


On Aug 10, 2010, at 21:54 , Douglas Hoen wrote:

> I was wondering why the Synopsis in the docs for Bio::SeqFeature::SimilarityPair has the following:
> $sim_pair = Bio::SeqFeature::SimilarityPair->from_searchResult($blastHit);
> 
> There doesn't actually seem to be a from_searchResult method. Am I missing something?

No, it looks like that method got removed back in 2002 as a part of moving to Bio::SearchIO (which was removed still later...):

  <http://github.com/bioperl/bioperl-live/commit/5e3bdc11eb0ceffcd8e8966299a6367e792f2fd1>

Unfortunately, the commit didn't update the documentation. From the tiny little bit I've looked at the code, it looks like you should just be calling the 'new()' method instead (note that it takes a set of arguments, not just a BLAST hit object).

Hope this helps -- if you should happen to have the tuits, a patch to update the documentation to reflect the current interface would be awesome...

chrs,
john.


From david.breimann at gmail.com  Fri Aug 13 09:01:10 2010
From: david.breimann at gmail.com (David Breimann)
Date: Fri, 13 Aug 2010 16:01:10 +0300
Subject: [Bioperl-l] Problem executing bp_genbank2gff3.pl from another perl
	script
Message-ID: <AANLkTikqTXynSe4dTqw1Tz5GOOyoDOZTC5C-HJWLKfaL@mail.gmail.com>

Hi,
I am trying to run bp_genbank2gff3.pl from another Perl script that
gets a GenBank file as its argument.

This does not work  (no output files are generated):
    my $command = "bp_genbank2gff3.pl -y -o /tmp $ARGV[0]";

    open( my $command_out, "-|", $command );
    close $command_out;

but this does

    open( my $command_out, "-|", $command );
    sleep 3; # why do I need to sleep?
    close $command_out;

Why?

I thought that close is supposed to block until the command is done:

Closing any piped filehandle causes the parent process to wait for the
child to finish... (see http://perldoc.perl.org/functions/open.html).

Thanks
Dave
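
One possible explanation, not confirmed in this thread: close() does wait for the child, but it closes the read end of the pipe first, so a child that is still writing to its standard output may be stopped by SIGPIPE before it has finished its work. A minimal sketch of the usual pattern, which drains the pipe before closing it, follows. The child here is a stand-in one-liner run with the current perl ($^X), not bp_genbank2gff3.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the real command: a child that prints three lines and exits.
my @command = ( $^X, '-e', 'print "line $_\n" for 1..3' );

open( my $command_out, '-|', @command )
    or die "Cannot start command: $!";

# Read everything the child writes, so it is never left writing
# into a pipe whose reading end has gone away.
my @lines = <$command_out>;

# Now close() only has to wait for the child and collect its status.
close($command_out)
    or warn "Command failed, exit status ", $? >> 8, "\n";

print scalar(@lines), " lines read\n";
```

If the output of the command is not needed at all, system() with a list of arguments is the simpler choice, since it waits for the child without involving a pipe.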


From jun.yin at ucd.ie  Fri Aug 13 09:36:34 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 13 Aug 2010 14:36:34 +0100
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
Message-ID: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>

Hi, all,

 
I am the Google Summer of Code student working on refactoring the
Bio::Align subsystem. The code I re-implemented (Bio::SimpleAlign) now
passes nearly all the tests, except a few on seq/start-end checking. But
here comes a problem, which may be an old issue: the Bio::LocatableSeq end
assignment and checking are inconsistent.

 
The current end checking method is based on:

$end=$seq->_ungapped_len+$seq->start-1

However, this checking may not fit the real world case.

 
The inconsistency usually happens when a few columns of the sequence are
removed.

 
For example:

my $a = Bio::LocatableSeq->new(
    -id     => 'a',
    -strand => 1,
    -seq    => '-tcgatc-atcgatcg',
    -start  => 30,
    -end    => 43
);

 
If we remove the 1st, 8th and the last columns

 
$a->seq() will be 'tcgatcatcgatc'

$a->_ungapped_len==12

 
Actually, in the real world, the first residue will still be 30 (the old
$seq->start), and the last residue is the residue before the 43 (the old
$seq->end), thus 42.

 
But if you call a validation, the calculation is
$a->_ungapped_len+$a->start-1=12+30-1=41

So the reassignment of the $seq->end will not pass the validation.
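
The end check can be tried out without BioPerl. The following is a minimal sketch, assuming the ungapped length is simply the number of characters that are not '-' or '.'; the helper names are hypothetical and are not Bio::LocatableSeq methods. Counted this way, the trimmed string above has 13 residues rather than 12, which gives an end of 42. The mismatch described does appear as soon as a residue column inside the sequence is removed as well, as the last case shows.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count residues, treating '-' and '.' as gaps.
sub ungapped_len {
    my ($str) = @_;
    ( my $residues = $str ) =~ s/[-.]//g;
    return length $residues;
}

# Hypothetical helper: the end implied by start + ungapped length - 1.
sub expected_end {
    my ( $str, $start ) = @_;
    return $start + ungapped_len($str) - 1;
}

# Hypothetical helper: drop the given 1-based alignment columns.
sub remove_columns {
    my ( $str, @cols ) = @_;
    my %drop = map { $_ => 1 } @cols;
    return join '',
        map  { substr( $str, $_ - 1, 1 ) }
        grep { !$drop{$_} } 1 .. length($str);
}

my $orig = '-tcgatc-atcgatcg';
print expected_end( $orig, 30 ), "\n";       # prints 43

# Columns 1, 8 and 16: two gaps and the last residue.
my $trimmed = remove_columns( $orig, 1, 8, 16 );
print "$trimmed\n";                          # prints tcgatcatcgatc
print expected_end( $trimmed, 30 ), "\n";    # prints 42

# Also dropping column 5 (an internal residue) shifts the computed end
# to 41, although the last remaining residue was number 42 originally.
my $internal = remove_columns( $orig, 1, 5, 8, 16 );
print expected_end( $internal, 30 ), "\n";   # prints 41
```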

 
So unless you save the information to a new sequence object, the original
position information will be lost anyway. But in some cases, we have to
change the sequence in its original sequence object ..

 
What is your suggestion on this issue? 

A. Pass the test and lose the information.   # convenient in coding, but the
start-end annotation is no longer right

B. Keep the information and skip the test.   # the object will still
remember where the last residue was in the original sequence. But is that
really meaningful at all? All the other residues may come from
nowhere

C. Neither of the above.   # any other suggestions?

 
Cheers,

Jun Yin

Ph.D. student in U.C.D.

 
Bioinformatics Laboratory

Conway Institute

University College Dublin

 
From jessica.sun at gmail.com  Fri Aug 13 11:06:46 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 13 Aug 2010 11:06:46 -0400
Subject: [Bioperl-l] Add sequence feature
Message-ID: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>

Does anyone know how to open a GenBank file, add a new feature, and then
save a new GenBank file with the new feature added, using BioPerl?

thx

-- 
Jessica Jingping Sun


From jessica.sun at gmail.com  Fri Aug 13 11:27:10 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 13 Aug 2010 11:27:10 -0400
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <4C6562E0.7090008@gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
Message-ID: <AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>

Unfortunately, I want to add the feature to the sequence object I got from
the GenBank file. I do not mind saving a new GenBank file, but the new file
should contain the original GenBank format and information plus the new
feature tags I need to add. Is there a quick solution to this?

thx

Jessica


On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri <roy.chaudhuri at gmail.com> wrote:

> Hi Jessica.
>
> You need to use Bio::SeqIO to read in the GenBank file to a BioPerl
> sequence object, and to write your new GenBank file:
> http://www.bioperl.org/wiki/HOWTO:SeqIO
>
> To add a new feature follow the instructions here:
>
> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences
>
> (except that you are adding the feature to the sequence object you got from
> the Genbank file, not a new Bio::Seq object).
>
> Cheers.
> Roy.
>
>
> On 13/08/2010 16:06, Jessica Sun wrote:
>
>> Does anyone knows how to open a genbank file, add new feature and then
>> save
>> a new genbank
>> file with new feature added in bioperl ?
>>
>> thx
>>
>>
>


-- 
Jessica Jingping Sun


From roy.chaudhuri at gmail.com  Fri Aug 13 11:21:04 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 13 Aug 2010 16:21:04 +0100
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
Message-ID: <4C6562E0.7090008@gmail.com>

Hi Jessica.

You need to use Bio::SeqIO to read in the GenBank file to a BioPerl 
sequence object, and to write your new GenBank file:
http://www.bioperl.org/wiki/HOWTO:SeqIO

To add a new feature follow the instructions here:
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences

(except that you are adding the feature to the sequence object you got 
from the Genbank file, not a new Bio::Seq object).
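
Put together, the two steps might look like the sketch below. It is only an illustration: the helper add_feature is hypothetical, the coordinates, primary tag and note are made-up values, and the input and output file names come from the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;

# Hypothetical helper: attach one new feature to a sequence object.
# Features already present on the sequence are left as they are.
sub add_feature {
    my ( $seq, $start, $end, $note ) = @_;
    my $feat = Bio::SeqFeature::Generic->new(
        -start       => $start,
        -end         => $end,
        -strand      => 1,
        -primary_tag => 'misc_feature',
        -tag         => { note => $note },
    );
    $seq->add_SeqFeature($feat);
    return $feat;
}

# Usage: script.pl in.gbk out.gbk   (both names are placeholders)
if ( @ARGV == 2 ) {
    my ( $infile, $outfile ) = @ARGV;
    my $in  = Bio::SeqIO->new( -file => $infile,     -format => 'genbank' );
    my $out = Bio::SeqIO->new( -file => ">$outfile", -format => 'genbank' );
    while ( my $seq = $in->next_seq ) {
        add_feature( $seq, 10, 40, 'added by script' );   # example values
        $out->write_seq($seq);
    }
}
```

Because the sequence object read by Bio::SeqIO already carries the features and annotation from the input file, writing it back out keeps them alongside the new feature.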

Cheers.
Roy.

On 13/08/2010 16:06, Jessica Sun wrote:
> Does anyone knows how to open a genbank file, add new feature and then save
> a new genbank
> file with new feature added in bioperl ?
>
> thx
>


From roy.chaudhuri at gmail.com  Fri Aug 13 11:37:20 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 13 Aug 2010 16:37:20 +0100
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
Message-ID: <4C6566B0.60706@gmail.com>

I'm not sure I understand, do you mean that you want to load just the 
sequence from the GenBank file (ignoring the existing annotation), then 
add your own features? There are instructions on how to do that here:
http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder

On 13/08/2010 16:27, Jessica Sun wrote:
> unfortunately. I want to add the feature to the sequence object I got
> from the Genbank file, I do not mind to save a new genbank file but
> these new genbank file contains the original genbank format and info I
> got plus the new feature tags I need to added to. Any quick solution to
> this?
>
> thx
>
> Jessica
>
>
>
> On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri <roy.chaudhuri at gmail.com> wrote:
>
>     Hi Jessica.
>
>     You need to use Bio::SeqIO to read in the GenBank file to a BioPerl
>     sequence object, and to write your new GenBank file:
>     http://www.bioperl.org/wiki/HOWTO:SeqIO
>
>     To add a new feature follow the instructions here:
>     http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences
>
>     (except that you are adding the feature to the sequence object you
>     got from the Genbank file, not a new Bio::Seq object).
>
>     Cheers.
>     Roy.
>
>
>     On 13/08/2010 16:06, Jessica Sun wrote:
>
>         Does anyone knows how to open a genbank file, add new feature
>         and then save
>         a new genbank
>         file with new feature added in bioperl ?
>
>         thx
>
>
>
>
>
> --
> Jessica Jingping Sun


From roy.chaudhuri at gmail.com  Fri Aug 13 11:57:27 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 13 Aug 2010 16:57:27 +0100
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>	<4C6562E0.7090008@gmail.com>	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
Message-ID: <4C656B67.5020402@gmail.com>

Please remember to copy replies to the mailing list.

You can loop over the features in your Bio::Seq object:
for my $feat ($seq->get_SeqFeatures) {
    # do something
}

And once you have found the feature you want to modify, you can add a 
tag using something like:
$feat->add_tag_value('note',"this is a note");

When you're finished you can write out the modified sequence object to a 
new GenBank file.
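
As one script, that could look like the following sketch. The helper add_note_to_features is hypothetical, as are the choice of 'CDS' as the feature type to modify and the note text; the file names are taken from the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: add a note tag to every feature of one type and
# return how many features were changed.
sub add_note_to_features {
    my ( $seq, $primary_tag, $note ) = @_;
    my $count = 0;
    for my $feat ( $seq->get_SeqFeatures ) {
        next unless $feat->primary_tag eq $primary_tag;
        $feat->add_tag_value( 'note', $note );
        $count++;
    }
    return $count;
}

# Usage: script.pl in.gbk out.gbk   (both names are placeholders)
if ( @ARGV == 2 ) {
    my ( $infile, $outfile ) = @ARGV;
    my $in  = Bio::SeqIO->new( -file => $infile,     -format => 'genbank' );
    my $out = Bio::SeqIO->new( -file => ">$outfile", -format => 'genbank' );
    while ( my $seq = $in->next_seq ) {
        add_note_to_features( $seq, 'CDS', 'this is a note' );
        $out->write_seq($seq);
    }
}
```

The test inside the loop is where any other selection rule would go, for example matching on an existing tag value instead of the primary tag.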

On 13/08/2010 16:40, Jessica Sun wrote:
> no i want to load the genbank file with existing features and I need to
> add some new feature tags to the existing ones and then save to a new
> update genbank file for local usage. I just not quite good on how to
> easily merge the two steps you recommended into one in a neat way.
>
> thx
>
>
> On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri <roy.chaudhuri at gmail.com> wrote:
>
>     I'm not sure I understand, do you mean that you want to load just
>     the sequence from the GenBank file (ignoring the existing
>     annotation), then add your own features? There are instructions on
>     how to do that here:
>     http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder
>
>
>     On 13/08/2010 16:27, Jessica Sun wrote:
>
>         unfortunately. I want to add the feature to the sequence object
>         I got
>         from the Genbank file, I do not mind to save a new genbank file but
>         these new genbank file contains the original genbank format and
>         info I
>         got plus the new feature tags I need to added to. Any quick
>         solution to
>         this?
>
>         thx
>
>         Jessica
>
>
>
>         On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri
>         <roy.chaudhuri at gmail.com <mailto:roy.chaudhuri at gmail.com>
>         <mailto:roy.chaudhuri at gmail.com
>         <mailto:roy.chaudhuri at gmail.com>>> wrote:
>
>             Hi Jessica.
>
>             You need to use Bio::SeqIO to read in the GenBank file to a
>         BioPerl
>             sequence object, and to write your new GenBank file:
>         http://www.bioperl.org/wiki/HOWTO:SeqIO
>
>             To add a new feature follow the instructions here:
>         http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences
>
>             (except that you are adding the feature to the sequence
>         object you
>             got from the Genbank file, not a new Bio::Seq object).
>
>             Cheers.
>             Roy.
>
>
>             On 13/08/2010 16:06, Jessica Sun wrote:
>
>                 Does anyone knows how to open a genbank file, add new
>         feature
>                 and then save
>                 a new genbank
>                 file with new feature added in bioperl ?
>
>                 thx
>
>
>
>
>
>         --
>         Jessica Jingping Sun
>
>
>
>
>
> --
> Jessica Jingping Sun


From jessica.sun at gmail.com  Fri Aug 13 13:06:32 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 13 Aug 2010 13:06:32 -0400
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <4C656B67.5020402@gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
	<4C656B67.5020402@gmail.com>
Message-ID: <AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>

Thanks. I somehow get these error messages.

--------------------- WARNING ---------------------
MSG:  Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module.
Attempting to dump, but may fail!
---------------------------------------------------
Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
/Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, <GEN0> line 447.

by doing this,

my $feat = new Bio::SeqFeature::Generic(-start       => 20,
                                        -end         => $40,
                                        -primary_tag => 'newfeature' );
$feat->add_tag_value("note","this is notes");
$f->add_SeqFeature($feat); ## f is original feature pointer
$io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" );

    $io->write_seq($seqio_object);


-- 
Jessica Jingping Sun


From drummike at gmail.com  Fri Aug 13 13:41:55 2010
From: drummike at gmail.com (Mike Williams)
Date: Fri, 13 Aug 2010 13:41:55 -0400
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
	<4C656B67.5020402@gmail.com>
	<AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
Message-ID: <AANLkTi=SuCgDmDZ1qQW0-mUQJxigteO4GPnSQD09oB90@mail.gmail.com>

On Fri, Aug 13, 2010 at 1:06 PM, Jessica Sun <jessica.sun at gmail.com> wrote:

> Thanks. I somehow get these error messages.
> by doing this,
>
> my $feat = new Bio::SeqFeature::Generic(-start                 =>20,
>                                        -end         => $40,
>                                        -primary_tag => 'newfeature' );
>                                     $feat->add_tag_value("note","this is notes");
>

That $40 looks fishy.  Try deleting the dollar sign.  You did mean just 40,
right?

Mike


From MEC at stowers.org  Fri Aug 13 13:37:50 2010
From: MEC at stowers.org (Cook, Malcolm)
Date: Fri, 13 Aug 2010 12:37:50 -0500
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
	<4C656B67.5020402@gmail.com>
	<AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
Message-ID: <BD62CBAC4395B94096109020651BE2EC1312232E24@EXCHMB-02.stowers-institute.org>

Jessica,

Show more code!

In particular, where did $f get set?

--Malcolm

 
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun
Sent: Friday, August 13, 2010 12:07 PM
To: Roy Chaudhuri
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Add sequence feature

Thanks. I somehow get these error messages.

--------------------- WARNING ---------------------
MSG:  Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module.
Attempting to dump, but may fail!
---------------------------------------------------
Can't locate object method "seq" via package "Bio::SeqIO::genbank" at /Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, <GEN0> line 447.

by doing this,

my $feat = new Bio::SeqFeature::Generic(-start                 =>20,
                                        -end         => $40,
                                        -primary_tag => 'newfeature' );
                                    $feat->add_tag_value("note","this is notes");
  $f->add_SeqFeature($feat); ## f is original feature pointer
$io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" );

    $io->write_seq($seqio_object);

--
Jessica Jingping Sun
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From Kevin.M.Brown at asu.edu  Fri Aug 13 13:53:50 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 13 Aug 2010 10:53:50 -0700
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com><4C6562E0.7090008@gmail.com><AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com><4C6566B0.60706@gmail.com><AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com><4C656B67.5020402@gmail.com>
	<AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu>

If I'm reading your sample code correctly, then you are mistakenly
trying to output the input SeqIO object and not the actual Bio::Seq
object that was read in by SeqIO.

my $seqio = Bio::SeqIO->new;
my $seq = $seqio->next_seq;

# manipulate $seq

my $out = Bio::SeqIO->new;
$out->write_seq($seq);
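
To make the distinction concrete, here is a small self-contained sketch. The sequence, its ID, and the `misc_feature` primary tag are made up for illustration; in your script `$seq` would come from `$seqio->next_seq` instead. The new feature is attached to the Bio::Seq object, and that same object is what gets passed to `write_seq`:

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;

# Stand-in for a record returned by $seqio->next_seq (made-up data).
my $seq = Bio::Seq->new(
    -id       => 'example1',
    -seq      => 'ATGC' x 15,
    -alphabet => 'dna',
);

# Note the plain number for -end (no dollar sign).
my $feat = Bio::SeqFeature::Generic->new(
    -start       => 20,
    -end         => 40,
    -primary_tag => 'misc_feature',
);
$feat->add_tag_value('note', 'this is a note');

# Attach the feature to the sequence object itself.
$seq->add_SeqFeature($feat);

# Write the Bio::Seq object; here to STDOUT rather than a file.
my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'genbank');
$out->write_seq($seq);
```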

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun
Sent: Friday, August 13, 2010 10:07 AM
To: Roy Chaudhuri
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Add sequence feature

Thanks. I somehow get these error messages.

--------------------- WARNING ---------------------
MSG:  Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module.
Attempting to dump, but may fail!
---------------------------------------------------
Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
/Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, <GEN0> line 447.

by doing this,

my $feat = new Bio::SeqFeature::Generic(-start                 =>20,
                                        -end         => $40,
                                        -primary_tag => 'newfeature' );
                                    $feat->add_tag_value("note","this is notes");
  $f->add_SeqFeature($feat); ## f is original feature pointer
$io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" );

    $io->write_seq($seqio_object);

-- 
Jessica Jingping Sun


From jessica.sun at gmail.com  Fri Aug 13 15:16:51 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 13 Aug 2010 15:16:51 -0400
Subject: [Bioperl-l] Fwd:  Add sequence feature
In-Reply-To: <AANLkTim6MBPBbRr2bEkCgCL+6NMXGqJ0wWoz3-JPRKyG@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
	<4C656B67.5020402@gmail.com>
	<AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
	<1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu>
	<AANLkTim6MBPBbRr2bEkCgCL+6NMXGqJ0wWoz3-JPRKyG@mail.gmail.com>
Message-ID: <AANLkTimFO1Yn-n7vqmmvAF5smQeGadEW_fs_a0U-7ej4@mail.gmail.com>

---------- Forwarded message ----------
From: Jessica Sun <jessica.sun at gmail.com>
Date: Fri, Aug 13, 2010 at 3:16 PM
Subject: Re: [Bioperl-l] Add sequence feature
To: Kevin Brown <Kevin.M.Brown at asu.edu>


Yes, I changed that, but somehow the output still does not include the added features.


On Fri, Aug 13, 2010 at 1:53 PM, Kevin Brown <Kevin.M.Brown at asu.edu> wrote:

> If I'm reading your sample code correctly, then you are mistakenly
> trying to output the input SeqIO object and not the actual Bio::Seq
> object that was read in by SeqIO.
>
> my $seqio = Bio::SeqIO->new;
> my $seq = $seqio->next_seq;
>
> #manipulate $seq
>
> my $out = Bio::SeqIO->new;
> $out->write_seq($seq);
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun
> Sent: Friday, August 13, 2010 10:07 AM
> To: Roy Chaudhuri
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Add sequence feature
>
> Thanks. I somehow get these error messages.
>
> --------------------- WARNING ---------------------
> MSG:  Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module.
> Attempting to dump, but may fail!
> ---------------------------------------------------
> Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
> /Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, <GEN0> line 447.
>
> by doing this,
>
> my $feat = new Bio::SeqFeature::Generic(-start                 =>20,
>                                        -end         => $40,
>                                        -primary_tag => 'newfeature' );
>                                    $feat->add_tag_value("note","this is notes");
>  $f->add_SeqFeature($feat); ## f is original feature pointer
> $io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" );
>
>    $io->write_seq($seqio_object);
>


-- 
Jessica Jingping Sun




From MEC at stowers.org  Fri Aug 13 15:56:09 2010
From: MEC at stowers.org (Cook, Malcolm)
Date: Fri, 13 Aug 2010 14:56:09 -0500
Subject: [Bioperl-l] Fwd:  Add sequence feature
In-Reply-To: <AANLkTimFO1Yn-n7vqmmvAF5smQeGadEW_fs_a0U-7ej4@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
	<4C656B67.5020402@gmail.com>
	<AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
	<1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu>
	<AANLkTim6MBPBbRr2bEkCgCL+6NMXGqJ0wWoz3-JPRKyG@mail.gmail.com>
	<AANLkTimFO1Yn-n7vqmmvAF5smQeGadEW_fs_a0U-7ej4@mail.gmail.com>
Message-ID: <BD62CBAC4395B94096109020651BE2EC1312232E46@EXCHMB-02.stowers-institute.org>

If you show all your code, we might not have to guess at what the problem is.
 

Malcolm Cook
Stowers Institute for Medical Research -  Bioinformatics
Kansas City, Missouri  USA
 

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun
Sent: Friday, August 13, 2010 2:17 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Fwd: Add sequence feature

---------- Forwarded message ----------
From: Jessica Sun <jessica.sun at gmail.com>
Date: Fri, Aug 13, 2010 at 3:16 PM
Subject: Re: [Bioperl-l] Add sequence feature
To: Kevin Brown <Kevin.M.Brown at asu.edu>


Yes, I changed that, but somehow the output still does not include the added features.


On Fri, Aug 13, 2010 at 1:53 PM, Kevin Brown <Kevin.M.Brown at asu.edu> wrote:

> If I'm reading your sample code correctly, then you are mistakenly 
> trying to output the input SeqIO object and not the actual Bio::Seq 
> object that was read in by SeqIO.
>
> my $seqio = Bio::SeqIO->new;
> my $seq = $seqio->next_seq;
>
> #manipulate $seq
>
> my $out = Bio::SeqIO->new;
> $out->write_seq($seq);
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jessica Sun
> Sent: Friday, August 13, 2010 10:07 AM
> To: Roy Chaudhuri
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Add sequence feature
>
> Thanks. I somehow get these error messages.
>
> --------------------- WARNING ---------------------
> MSG:  Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module.
> Attempting to dump, but may fail!
> ---------------------------------------------------
> Can't locate object method "seq" via package "Bio::SeqIO::genbank" at 
> /Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, <GEN0> line 447.
>
> by doing this,
>
> my $feat = new Bio::SeqFeature::Generic(-start                 =>20,
>                                        -end         => $40,
>                                        -primary_tag => 'newfeature' );
>                                    $feat->add_tag_value("note","this is notes");
>  $f->add_SeqFeature($feat); ## f is original feature pointer
> $io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" );
>
>    $io->write_seq($seqio_object);
>


--
Jessica Jingping Sun




From cjfields at illinois.edu  Mon Aug 16 14:02:15 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 16 Aug 2010 13:02:15 -0500
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
Message-ID: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>

All,

This is in reference to a bug report I filed a while back.  In the test script below, two features with the same start/end are compared.  If the features have the same seq_id(), overlap succeeds.  If the seq_id is changed (e.g. the feature is on another chromosome), the overlap still succeeds.

The question is: is this a bug?  My vote would be 'yes', but there have been various arguments to say it's not.  

chris

(maybe I'll make this a regular thing on the list, just to hash out some of the edge cases I run into periodically)

=========================================

#!/usr/bin/perl -w

use strict;
use warnings;
use Test::More;
use Bio::SeqFeature::Generic;

my ( $feat1, $feat2 );

$feat1 = Bio::SeqFeature::Generic->new(
    -start  => 40,
    -end    => 80,
    -strand => 1,
    -seq_id => 'ABC123',
);

is $feat1->start,  40,       'start of feature location';
is $feat1->end,    80,       'end of feature location';
is $feat1->seq_id, 'ABC123', 'seq_id';

$feat2 = Bio::SeqFeature::Generic->new(
    -start  => 40,
    -end    => 80,
    -strand => 1,
    -seq_id => 'ABC123',
);

is $feat2->start,  40,       'start of feature location';
is $feat2->end,    80,       'end of feature location';
is $feat2->seq_id, 'ABC123', 'seq_id';

# Generic features with same Seq ID should overlap
ok( $feat2->overlaps($feat1), 'feat2 overlaps feat1' );

# Generic features with different Seq IDs shouldn't overlap
is( $feat2->seq_id('XYZ678'), 'XYZ678', 'change seq_id' );

# this currently fails (overlaps() still returns true after the seq_id change)
ok( !$feat2->overlaps($feat1), "feat2 doesn't overlap feat1" );

done_testing();


From David.Messina at sbc.su.se  Mon Aug 16 14:51:54 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 16 Aug 2010 20:51:54 +0200
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
Message-ID: <A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>

> The question is: is this a bug?

Hmm, tricky.

Genomic start and end positions with differing IDs shouldn't overlap, but can't SeqFeatures apply to proteins and other molecules where one would want to compare positions without regard to ID?


Dave


From cjfields at illinois.edu  Mon Aug 16 21:39:00 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 16 Aug 2010 20:39:00 -0500
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
Message-ID: <E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>

On Aug 16, 2010, at 1:51 PM, Dave Messina wrote:

>> The question is: is this a bug?
> 
> Hmm, tricky.
> 
> Genomic start and end positions with differing IDs shouldn't overlap, but can't SeqFeatures apply to proteins and other molecules where one would want to compare positions without regard to ID?
> 
> Dave

Good point; it's probably the context in which the methods are used that matters.  So, maybe just a documentation clarification?

chris


From David.Messina at sbc.su.se  Tue Aug 17 05:06:05 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 17 Aug 2010 11:06:05 +0200
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
	<E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
Message-ID: <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>

> Good point; it's probably the context in which the methods are used that matters.  So, maybe just a documentation clarification?

That's always good, but it really doesn't solve the issue you're describing.

I mean, who would expect to get overlaps for features on different chromosomes?

To me, that's a clear violation of reasonable user expectations. You shouldn't have to read the docs about something like that.

So what's the solution for these duelling use cases? I haven't thought about it much, but a first approximation might be to add a -genomic boolean flag that, when true, would do the right thing and check the ID when doing overlaps or other positional comparisons.

(Maybe -genomic is too obscure. Maybe it should be -same_id_for_overlaps or something like that.)

And maybe having to know to set a flag is effectively the same thing as having to read the docs to understand SeqFeature's overlap behavior.

What do the rest of you out there think?


Dave


From scott at scottcain.net  Tue Aug 17 08:45:27 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 17 Aug 2010 08:45:27 -0400
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
	<E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
	<83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
Message-ID: <B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>

Hi Dave and Chris,

It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be the genomic comparison. If somebody is doing the protein-space comparison and not getting the expected results, they'll probably read the docs to find out why.

Scott

--
Scott Cain, Ph. D.
scott at scottcain dot net
Ontario Institute for Cancer Research
http://gmod.org/
216 392 3087 

Sent from my iPhone.



From david.breimann at gmail.com  Tue Aug 17 09:44:08 2010
From: david.breimann at gmail.com (David Breimann)
Date: Tue, 17 Aug 2010 16:44:08 +0300
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
Message-ID: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>

Hello,

The following GenBank file has a gene that runs over the "end" of the
chromosome and into its "beginning", and the script generates an
error.

ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk

NC_005707 Unflattening error:
Details:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: PROBLEM, SEVERITY==2
Ranges not in correct order. Strange ensembl genbank entry? Range:
[207497,208369] [1,687]
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473
STACK: Bio::SeqFeature::Tools::Unflattener::problem
/usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952
STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent
/usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842
STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS
/usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713
STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq
/usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532
STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023
STACK: /usr/local/bin/bp_genbank2gff3.pl:506
-----------------------------------------------------------

Best,
Dave


From cjfields at illinois.edu  Tue Aug 17 09:51:02 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 08:51:02 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
Message-ID: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>

I think Chris Mungall has a branch set up for this in bioperl:

http://github.com/bioperl/bioperl-live/tree/circular

Is that correct?  Should we merge that code into the master branch?

chris



From David.Messina at sbc.su.se  Tue Aug 17 09:52:11 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 17 Aug 2010 15:52:11 +0200
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
	<E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
	<83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
	<B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>
Message-ID: <EA0C23FB-8C2F-4C04-B0E8-4207409916DC@sbc.su.se>

> It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison

Yep, agreed.

And such a flag should be named for the non-default behavior, then, like: -ignore_IDs_for_overlaps
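
To make the proposal concrete, here is a minimal stand-alone sketch of an ID-aware overlap test with an opt-out flag. It is not BioPerl code: features are plain hashes, and both the features_overlap() helper and the ignore_ids option are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper, not the BioPerl overlaps() implementation.
# Features are plain hash references with seq_id, start and end keys.
sub features_overlap {
    my ($f1, $f2, %opt) = @_;

    # Default ("genomic") behaviour: features on different sequences
    # never overlap, whatever their coordinates are.
    unless ($opt{ignore_ids}) {
        return 0 if $f1->{seq_id} ne $f2->{seq_id};
    }

    # Closed intervals overlap when each one starts before the other ends.
    return ($f1->{start} <= $f2->{end} && $f2->{start} <= $f1->{end}) ? 1 : 0;
}

my %feat1 = (seq_id => 'ABC123', start => 40, end => 80);
my %feat2 = (seq_id => 'XYZ678', start => 40, end => 80);

print features_overlap(\%feat1, \%feat2), "\n";                   # prints 0
print features_overlap(\%feat1, \%feat2, ignore_ids => 1), "\n";  # prints 1
```

With this shape, the common genomic case needs no extra arguments, and only callers who want coordinate-only comparison have to know about the flag.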


Dave


From douglas.hoen at gmail.com  Thu Aug 12 10:24:27 2010
From: douglas.hoen at gmail.com (Douglas Hoen)
Date: Thu, 12 Aug 2010 10:24:27 -0400
Subject: [Bioperl-l] HMMER3 to GFF3
In-Reply-To: <20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de>
References: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com>
	<20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de>
Message-ID: <A1AA9B70-69B9-4AA6-BB5F-FB0D0FDD0491@gmail.com>

Hi Kai,

Here it is.

Thanks,
-- Doug


-------------- next part --------------
A non-text attachment was scrubbed...
Name: chr1-tesigsv2.hmmscan
Type: application/octet-stream
Size: 676132 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100812/7818b4a4/attachment-0003.obj>
-------------- next part --------------


On 2010-08-12, at 8:16 AM, Kai Blin wrote:

> On Wed, 11 Aug 2010 22:59:37 -0700 (PDT)
> Doug Hoen <douglas.hoen at gmail.com> wrote:
> 
> Hi Doug,
> 
>> Could someone please confirm whether the results are incorrect and, if
>> so, perhaps suggest a fix? It may well be that this problem is due to
>> the unusual way I am using hmmscan, rather than a problem with HMMER3
>> parsing...?
> 
> Can you please attach your hmmer input file? Along the way something
> inserted line breaks, making it unreadable.
> 
> It might well be possible that the HMMer3 parser still handles a little
> different from the HMMer2 parser, I haven't tried that script.
> 
> Cheers,
> Kai
> 
> -- 
> Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
> D-72076 Tübingen                        Fax :   ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From CJMungall at lbl.gov  Tue Aug 17 11:53:15 2010
From: CJMungall at lbl.gov (Chris Mungall)
Date: Tue, 17 Aug 2010 08:53:15 -0700
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
	<8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
Message-ID: <D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>


You can merge this in. It should allow David to proceed.

I haven't kept up on synchrony between bioperl and GFF on circular
genomes. The fix in that branch is conservative in that it essentially
preserves the genbank coordinates even when the origin is crossed:

	http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf

However, if this is to conform to GFF3 then the resulting coordinates  
that cross the origin should have start/end incremented by the genome  
length
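
As far as the GFF3 specification goes, a feature that crosses the origin of a circular landmark keeps its start and has the landmark length added to its end, so that start <= end still holds. A rough sketch of that arithmetic for a forward-strand feature follows. gff3_circular_range() is a hypothetical helper, not part of bp_genbank2gff3.pl, and the sequence length of 208369 is an assumption inferred from the first range in David's error message.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): collapse the segments of a
# forward-strand feature into a single GFF3-style start/end pair. When the
# feature crosses the origin of a circular sequence, the end is pushed past
# the sequence length so that start <= end.
sub gff3_circular_range {
    my ($seq_length, @segments) = @_;    # each segment is [start, end]
    my $start = $segments[0][0];
    my $end   = $segments[-1][1];

    # An origin-crossing feature appears to end before it starts.
    $end += $seq_length if $end < $start;

    return ($start, $end);
}

# Ranges taken from the error message; the length is assumed (see above).
my ($start, $end) = gff3_circular_range(208369, [207497, 208369], [1, 687]);
print "$start\t$end\n";    # prints 207497 and 209056, tab-separated
```

Reverse-strand features and features with more than one origin crossing are deliberately left out of this sketch.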



From cjfields at illinois.edu  Tue Aug 17 15:24:23 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 14:24:23 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
	<8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
	<D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>
Message-ID: <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>

On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote:

> You can merge this in. It should allow David to proceed.

Will do.  I'll go ahead and delete the remote branch as well.

> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that essentially preserves the genbank coordinates even when the origin is crossed:
> 
> 	http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf
> 
> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length

Yes, that is a problem that needs to be addressed.  Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174.

chris



From sheldon.mckay at gmail.com  Tue Aug 17 16:42:50 2010
From: sheldon.mckay at gmail.com (Sheldon McKay)
Date: Tue, 17 Aug 2010 16:42:50 -0400
Subject: [Bioperl-l] AlignIO and Gbrowse_syn
In-Reply-To: <E53C66C1-E4F1-4E83-B5ED-631CE62D7DCE@illinois.edu>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
	<AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
	<E53C66C1-E4F1-4E83-B5ED-631CE62D7DCE@illinois.edu>
Message-ID: <AANLkTikYi9TGag3poS=xB73iGxqX_-ThZS9wU1TC2JDH@mail.gmail.com>

The GBrowse_syn dev team is pretty small (n=1) right now, so any
patches would be welcome.

Sheldon


On Wed, Aug 11, 2010 at 6:02 PM, Chris Fields <cjfields at illinois.edu> wrote:
> Russell,
>
> We have had very few requests to support .maf until recently, which is why there has been little done with it.  We welcome any help to improve it.
>
> chris
>
> On Aug 11, 2010, at 4:31 PM, Smithies, Russell wrote:
>
>> I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague.
>> If GBrowse_syn is using .maf format, does AlignIO need more work?
>> Any comments?
>>
>> --Russell
>>
>>
>> I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output). GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce clustalw, I've tried converting LASTZ maf output to clustalw (and other alignment formats) using AlignIO; however, I run into the following issues:
>> *Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification)
>> *The coordinate system for reverse strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-rev complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result all reverse match coordinates in the resulting clustalw output file are incorrect.
>> *AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them
>>
>> I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned.
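
The reverse-strand issue described above can be illustrated with a small sketch. It assumes the usual MAF convention for an "s" line (zero-based start, size, strand, srcSize, with minus-strand starts counted on the reverse-complemented source) and converts to 1-based forward-strand coordinates. maf_to_forward() is a hypothetical helper, not part of AlignIO.

```perl
use strict;
use warnings;

# Hypothetical helper: convert the start/size/strand/srcSize fields of a MAF
# "s" line into 1-based, inclusive coordinates on the forward strand.
sub maf_to_forward {
    my ($start, $size, $strand, $src_size) = @_;

    if ($strand eq '-') {
        # Minus-strand starts are counted from the end of the source
        # sequence, so count back from srcSize.
        return ($src_size - $start - $size + 1, $src_size - $start);
    }

    # Plus strand: only shift from zero-based to one-based.
    return ($start + 1, $start + $size);
}

# A 10-base block at MAF start 0 on the minus strand of a 100-base source
# covers the last ten bases of the forward sequence.
print join('-', maf_to_forward(0, 10, '-', 100)), "\n";    # prints 91-100
print join('-', maf_to_forward(0, 10, '+', 100)), "\n";    # prints 1-10
```

Carrying the strand alongside the converted coordinates (for example in the sequence ID) is what keeps a downstream clustalw consumer from misreading the block.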


From hxu.hong at gmail.com  Tue Aug 17 16:50:43 2010
From: hxu.hong at gmail.com (Hong Xu)
Date: Tue, 17 Aug 2010 16:50:43 -0400
Subject: [Bioperl-l] Bio::Tools::Primer3 question
Message-ID: <AANLkTi=NcuvzepGaqw_TUTr5MM6F2K_b8PT8Fa3qrZg2@mail.gmail.com>

Hello all,

I'm working to parse the Primer3 release 2.2.2-beta result. I made the
necessary changes to make Bio::Tools::Primer3 work with the new output
tags of Primer3 release 2.2.2. But when I tried to get the primer Tm,
I found that Bio::Tools::Primer3 gave different Tm from Primer3 result
file. Then I learned that the Tm was calculated by
Bio::SeqFeature::Primer module, not from parsing Primer3 result. If I
want to get data from parsing Primer3 result, should I write my own
Primer3 parser instead of Bio::Tools::Primer3?

thanks a lot,
Hong


From cjfields at illinois.edu  Tue Aug 17 17:14:02 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 16:14:02 -0500
Subject: [Bioperl-l] Bio::Tools::Primer3 question
In-Reply-To: <AANLkTi=NcuvzepGaqw_TUTr5MM6F2K_b8PT8Fa3qrZg2@mail.gmail.com>
References: <AANLkTi=NcuvzepGaqw_TUTr5MM6F2K_b8PT8Fa3qrZg2@mail.gmail.com>
Message-ID: <E039C425-80C3-4F18-B589-AE98896A1175@illinois.edu>

Already ahead of you there, unfortunately.  I wrote a complete reimplementation of both the Primer3 parser and the Primer3 wrapper that handles both v1 and v2 of primer3_core.  Lack of tuits lately has prevented me from getting tests written up, so for the time being it's sitting in bioperl-dev:

http://github.com/bioperl/bioperl-dev

They are Bio::Tools::Primer3Redux (parser) and Bio::Tools::Run::Primer3Redux (wrapper).

I rewrote those b/c I found the original modules not adequate enough in many ways for my purposes then (the newer version uses simple features or feature pairs instead of the primer features, for the same reasons you mention re: Tm).  You're more than welcome to hack on the code a bit.  I'm planning on pulling it out into my own github repo for separate submission to CPAN.  
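
For anyone who only needs the Tm values that Primer3 itself reports, a minimal sketch of reading them straight from the Boulder-IO output (TAG=value lines, record terminated by a lone "=") might look like the following. parse_boulder_record() is a hypothetical helper, not the Primer3Redux API, the tag names assume the primer3 v2 naming (PRIMER_LEFT_0_TM and so on), and the sample record is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one Boulder-IO record into a hash reference.
sub parse_boulder_record {
    my ($text) = @_;
    my %record;
    for my $line (split /\n/, $text) {
        last if $line eq '=';                 # a lone "=" ends the record
        my ($tag, $value) = split /=/, $line, 2;
        next unless defined $value;           # skip lines without TAG=value
        $record{$tag} = $value;
    }
    return \%record;
}

# Illustrative record only; the values are invented, not real Primer3 output.
my $output = join "\n",
    'SEQUENCE_ID=example',
    'PRIMER_LEFT_0_TM=59.957',
    'PRIMER_RIGHT_0_TM=60.112',
    '=';

my $rec = parse_boulder_record($output);
printf "left Tm: %s, right Tm: %s\n",
    $rec->{PRIMER_LEFT_0_TM}, $rec->{PRIMER_RIGHT_0_TM};
```

Because the values are taken verbatim from the output, they match whatever Tm model primer3_core was configured to use, rather than a Tm recomputed on the Perl side.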

chris



From cjfields at illinois.edu  Tue Aug 17 23:42:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 22:42:59 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
	<8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
	<D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>
	<8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>
Message-ID: <D1CC1B9C-36A7-4427-9100-AE5C85C5E965@illinois.edu>

Chris, David, 

The branch is now merged back to trunk.  David, let us know if this helps.

chris (f)



From cjfields at illinois.edu  Wed Aug 18 00:48:55 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 23:48:55 -0500
Subject: [Bioperl-l] Bio::Tools::Primer3 question
In-Reply-To: <E039C425-80C3-4F18-B589-AE98896A1175@illinois.edu>
References: <AANLkTi=NcuvzepGaqw_TUTr5MM6F2K_b8PT8Fa3qrZg2@mail.gmail.com>
	<E039C425-80C3-4F18-B589-AE98896A1175@illinois.edu>
Message-ID: <C4B91FBD-1705-4045-9D98-F5ABEA80C038@illinois.edu>

Hong,

The latest code, along with working tests, is present here:

http://github.com/cjfields/Bio-Tools-Primer3Redux

It needs a few more tests, but the initial wrapper tests work fine for primer3 v2.2.1 on both Mac and Linux.  Will try submitting this to CPAN after a bit more cleanup.

chris

On Aug 17, 2010, at 4:14 PM, Chris Fields wrote:

> Already ahead of you there, unfortunately.  I wrote a complete reimplementation of both the Primer3 parser and the Primer3 wrapper that handles both v1 and v2 of primer3_core.  Lack of tuits lately have prevented me from getting tests written up, so for the time being it's sitting in bioperl-dev:
> 
> http://github.com/bioperl/bioperl-dev
> 
> They are Bio::Tools::Primer3Redux (parser) and Bio::Tools::Run::Primer3Redux (wrapper).
> 
> I rewrote those b/c I found the original modules not adequate enough in many ways for my purposes then (the newer version uses simple features or feature pairs instead of the primer features, for the same reasons you mention re: Tm).  You're more than welcome to hack on the code a bit.  I'm planning on pulling it out into my own github repo for separate submission to CPAN.  
> 
> chris
> 
> On Aug 17, 2010, at 3:50 PM, Hong Xu wrote:
> 
>> Hello all,
>> 
>> I'm working to parse the Primer3 release 2.2.2-beta result. I made the
>> necessary changes to make Bio::Tools::Primer3 work with the new output
>> tags of Primer3 release 2.2.2. But when I tried to get the primer Tm,
>> I found that Bio::Tools::Primer3 gave different Tm from Primer3 result
>> file. Then I learned that the Tm was calculated by
>> Bio::SeqFeature::Primer module, not from parsing Primer3 result. If I
>> want to get data from parsing Primer3 result, should I write my own
>> Primer3 parser instead of Bio::Tools::Primer3?
>> 
>> thanks a lot,
>> Hong
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From david.breimann at gmail.com  Wed Aug 18 02:46:58 2010
From: david.breimann at gmail.com (David Breimann)
Date: Wed, 18 Aug 2010 09:46:58 +0300
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <D1CC1B9C-36A7-4427-9100-AE5C85C5E965@illinois.edu>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
	<8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
	<D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>
	<8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>
	<D1CC1B9C-36A7-4427-9100-AE5C85C5E965@illinois.edu>
Message-ID: <AANLkTinsqQCpybg6MUzTwqNuKMn=kJMV4pL64GXwAOkG@mail.gmail.com>

Dear Chris's,

I tested the updated version on multiple genomes that previously
returned errors (for future reference: NC_005707, NC_006578,
NC_007103, NC_007104, NC_007106, NC_007107, NC_008573, NC_008762,
NC_008763, NC_008785, NC_009457, NC_012040). The script now ends
normally on all of them. However, as you mentioned, the resulting GFF3
file does not comply with the GFF3 specification for circular genomes.
This in turn causes some unexpected results in other applications.
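
For reference, the GFF3 specification handles a feature that crosses the
origin of a circular landmark by adding the landmark length to the end
coordinate, so that start <= end still holds. A minimal sketch of that
adjustment, using the two ranges from the original error report and
assuming the sequence length is 208369 (the end of the first range);
gff3_circular_coords is a hypothetical helper, not part of
bp_genbank2gff3.pl, and the ID attribute is a placeholder:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): given the landmark length and
# the sub-ranges of a feature in GenBank order, return one (start, end) span.
sub gff3_circular_coords {
    my ($seq_len, @ranges) = @_;
    my ($start, $end) = ($ranges[0][0], $ranges[-1][1]);
    # A feature that crosses the origin ends at a lower coordinate than it
    # starts, so shift the end by the landmark length.
    $end += $seq_len if $end < $start;
    return ($start, $end);
}

# Ranges [207497,208369] [1,687] from the error message; length is assumed.
my ($start, $end) = gff3_circular_coords(208369, [207497, 208369], [1, 687]);

# Strand is left as '.' because the report above does not state it.
print join("\t", 'NC_005707', '.', 'gene', $start, $end, '.', '.', '.',
           'ID=example_gene'), "\n";
```

Under that assumption the gene line spans 207497..209056. The landmark
itself would also need Is_circular=true in its attributes.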

Best,
Dave

On Wed, Aug 18, 2010 at 6:42 AM, Chris Fields <cjfields at illinois.edu> wrote:
> Chris, David,
>
> The branch is now merged back to trunk.  David, let us know if this helps.
>
> chris (f)
>
> On Aug 17, 2010, at 2:24 PM, Chris Fields wrote:
>
>> On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote:
>>
>>> You can merge this in. It should allow David to proceed.
>>
>> Will do.  I'll go ahead and delete the remote branch as well.
>>
>>> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that it essentially preserves the genbank coordinates even when the origin is crossed:
>>>
>>>      http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf
>>>
>>> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length
>>
>> Yes, that is a problem that needs to be addressed.  Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174.
>>
>> chris
>>
>>> On Aug 17, 2010, at 6:51 AM, Chris Fields wrote:
>>>
>>>> I think Chris Mungall has a branch set up for this in bioperl:
>>>>
>>>> http://github.com/bioperl/bioperl-live/tree/circular
>>>>
>>>> Is that correct?  Should we merge that code into the master branch?
>>>>
>>>> chris
>>>>
>>>> On Aug 17, 2010, at 8:44 AM, David Breimann wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> The following GenBank file has a gene that runs over the "end" of the
>>>>> chromosome and into its "beginning", and the script generates an
>>>>> error.
>>>>>
>>>>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk
>>>>>
>>>>> NC_005707 Unflattening error:
>>>>> Details:
>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>> MSG: PROBLEM, SEVERITY==2
>>>>> Ranges not in correct order. Strange ensembl genbank entry? Range:
>>>>> [207497,208369] [1,687]
>>>>> STACK: Error::throw
>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473
>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::problem
>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952
>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent
>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842
>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS
>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713
>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq
>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532
>>>>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023
>>>>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506
>>>>> -----------------------------------------------------------
>>>>>
>>>>> Best,
>>>>> Dave


From G.Gallone at sms.ed.ac.uk  Wed Aug 18 10:57:01 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Wed, 18 Aug 2010 15:57:01 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
Message-ID: <4C6BF4BD.5010200@sms.ed.ac.uk>

Hello BioPerl community - I've written a new module called 
Interolog::Walk that I'm planning to put on CPAN. I would be grateful if 
you could take a look at the brief description below and tell me 
what you think. I'll be more than happy to post further details should 
the module be of interest to anyone.

Also, I am not sure I have chosen the right name for it. This 
is my first module and it would be great if you could advise on naming 
it appropriately. Hopefully the following description will give an idea 
of what it does.

===================


NAME
     Interolog::Walk - Retrieve, score and visualize putative 
Protein-Protein Interactions through the orthology-walk method

DESCRIPTION
     A common activity in computational biology is to mine 
protein-protein interactions from publicly available databases in order 
to build Protein-Protein Interaction (PPI) datasets.
In many instances, however, experimentally obtained, annotated PPIs are 
scarce, and it would be helpful to enrich the 
experimental dataset with high-quality, computationally inferred PPIs. 
Such computationally obtained datasets can extend, support or enrich 
experimental PPI datasets, and are of crucial importance in 
high-throughput gene prioritization studies, e.g. to drive hypotheses 
and reduce the dimensionality of many gene function discovery problems.
This Perl Module, Interolog::Walk, is aimed at building putative PPI 
datasets on the basis of a number of comparative biology paradigms: the 
module implements a collection of computational biology algorithms based 
on the concept of "orthology projection". If interacting proteins A and 
B in organism X have orthologs A' and B' in organism Y, under certain 
conditions one can assume that the interaction will be conserved in 
organism Y, i.e. the A-B interaction can be "projected through the 
orthologies" to obtain a putative A'-B' interaction. The pair of 
interactions (A-B) and (A'-B') are named "Interologs" (see for instance 
[1] and [2]).

Interolog::Walk collects, analyses and collates gene orthology data 
provided by the Ensembl Consortium (www.ensembl.org) as well as PPI data 
provided by EBI Intact (http://www.ebi.ac.uk/intact/). It provides the 
user with the possibility of rating the quality and reliability of the 
putative interactions collected, by means of confidence scores, and 
optionally outputs network representations of the datasets, compatible 
with the biological network representation standard, Cytoscape.

USAGE
In order to carry out an interolog walk we start with a set of gene 
identifiers in one organism of interest. We query those ids against a 
number of comparative biology databases to retrieve a list of 
orthologues for each gene id of interest, in one or more species.
In the following step we rely on PPI databases to retrieve the list of 
available interactors for the protein ids obtained. The output at this 
stage consists of a list of interactors of the orthologues of the 
initial gene set, plus several fields of ancillary data.
In the last step of the process we project the interactions - again 
using orthology data - back to the original species of interest. The 
output of the process is a list of PUTATIVE INTERACTORS of the initial 
gene set, plus several fields of ancillary data.
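
As a rough illustration of the three steps above, the walk can be 
sketched with plain hashes standing in for the database queries. This is 
not the module's interface: the function name interolog_walk, the table 
names and all gene identifiers below are invented for the example.

```perl
use strict;
use warnings;

# Toy lookup tables standing in for the Ensembl and IntAct queries.
# All identifiers are invented for illustration.
my %orthologs_in_y = ( geneA   => ['geneA_y'] );             # step 1: X -> Y
my %interactors_y  = ( geneA_y => ['geneB_y', 'geneC_y'] );  # step 2: PPIs in Y
my %orthologs_in_x = ( geneB_y => ['geneB'] );               # step 3: Y -> X

# Walk X -> orthologs in Y -> interactors in Y -> orthologs back in X and
# return [query id, putative interactor] pairs.
sub interolog_walk {
    my ($fwd, $ppi, $back, @ids) = @_;
    my @pairs;
    for my $id (@ids) {
        my %seen;    # report each putative interactor once per query id
        for my $orth (@{ $fwd->{$id} || [] }) {
            for my $partner (@{ $ppi->{$orth} || [] }) {
                for my $proj (@{ $back->{$partner} || [] }) {
                    push @pairs, [$id, $proj] unless $seen{$proj}++;
                }
            }
        }
    }
    return @pairs;
}

for my $pair (interolog_walk(\%orthologs_in_y, \%interactors_y,
                             \%orthologs_in_x, 'geneA')) {
    print join("\t", @$pair), "\n";
}
```

In this toy data geneC_y has no orthologue in the original species, so 
only the geneA - geneB pair is reported as a putative interaction.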

====================

Given the scope and the focus of the project, I would imagine that 
viable alternatives for the namespace might be

Bio::Orthology::InterologWalk
Bio::InterologMap

or maybe
Interolog::Map
Orthology::Map
Orthology::InterologMap

There are no similar projects as far as I could see so I shouldn't run 
the risk of overlapping namespaces. Still I would love to know your 
informed opinion about it.

best,
Giuseppe


REFERENCES
[1] Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, 
Vidal M, Gerstein M. Annotation transfer between genomes: 
protein-protein interologs and protein-DNA regulogs. Genome Research 
2004 Jun;14(6):1107-18.

[2] Wiles AM, Doderer M, Ruan J, Gu T-T, Ravi D, Blackman BA, Bishop AJR. 
Building and analyzing protein interactome networks by cross-species 
comparisons. BMC Systems Biology 2010, 4:36. PMID: 20353594

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From David.Messina at sbc.su.se  Wed Aug 18 12:52:58 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 18 Aug 2010 18:52:58 +0200
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <4C6BF4BD.5010200@sms.ed.ac.uk>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
Message-ID: <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>

Hi Giuseppe,

Sounds really interesting - thanks for posting this.

> Bio::Orthology::InterologWalk

I vote for this name, or in any case something with Bio:: as the top-level namespace since it's a biology-related package.

I like that you're providing a lot of background and information about the project in the documentation. However, the USAGE section should give information about how to use the module, with example code. You can look at other modules on CPAN (or in BioPerl) to see the conventions for writing documentation.

Also, from what you wrote, it sounds like this might be a pipeline or a script rather than a module per se, or perhaps a script and a set of modules. It would be helpful to clarify in your documentation (if you haven't already) how exactly things are organized (and of course example code will help with that, too).


Hope that's helpful, and let us know when you've got it up on CPAN so we can try it out!


Dave


From cjfields at illinois.edu  Wed Aug 18 14:24:16 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 18 Aug 2010 13:24:16 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <AANLkTinsqQCpybg6MUzTwqNuKMn=kJMV4pL64GXwAOkG@mail.gmail.com>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
	<8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
	<D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>
	<8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>
	<D1CC1B9C-36A7-4427-9100-AE5C85C5E965@illinois.edu>
	<AANLkTinsqQCpybg6MUzTwqNuKMn=kJMV4pL64GXwAOkG@mail.gmail.com>
Message-ID: <C385563A-9724-4045-B5A2-7F28A5CB897A@illinois.edu>

Okay, will file this as a bug.  Thanks!

chris



From cdavis at bcm.tmc.edu  Wed Aug 18 15:19:53 2010
From: cdavis at bcm.tmc.edu (Caleb Davis)
Date: Wed, 18 Aug 2010 14:19:53 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq question
Message-ID: <4C6C3259.4060304@bcm.tmc.edu>

Hello, thank you for bioperl!

I am getting discrepancies between the online bl2seq 
(www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi) and bioperl's 
implementation, and I'm not sure why. I'm seeing a desired behavior 
through the web interface but can't replicate it locally. Specifically, 
online bl2seq aligns across a 1 bp insertion in the subject whereas the 
local bl2seq just reports a shorter alignment.

Any ideas? Thanks again,
--Caleb

The desired parameter differences from default are -F F -W 7 (turn 
complexity filter off, word size = 7). Below I present the online and 
local results given the following input sequences:

 >consensus
GAGGATCCAGAATTCTC
 >FVFTF6N01A86BR
AACCCAATGTAAGGAAGCTAAGAACCTTGAAAAGAGGATACCAGAATTCTC

Here are the parameters and result I'm getting online:
Blast4-request ::= {
  body queue-search {
    program "blastn",
    service "plain",
    queries bioseq-set {
      seq-set {
        seq {
          id {
            local id 26297
          },
          descr {
            title "consensus",
            user {
              type str "CFastaReader",
              data {
                {
                  label str "DefLine",
                  data str ">consensus"
                }
              }
            }
          },
          inst {
            repr raw,
            mol na,
            length 17,
            seq-data ncbi2na '8A3520F740'H
          }
        }
      }
    },
    subject sequences {
      {
        id {
          local id 26299
        },
        descr {
          title "FVFTF6N01A86BR",
          user {
            type str "CFastaReader",
            data {
              {
                label str "DefLine",
                data str ">FVFTF6N01A86BR"
              }
            }
          }
        },
        inst {
          repr raw,
          mol na,
          length 51,
          seq-data ncbi2na '0543B0A09C205F80228C520F74'H
        }
      }
    },
    algorithm-options {
      {
        name "EvalueThreshold",
        value cutoff e-value { 1, 10, 1 }
      },
      {
        name "UngappedMode",
        value boolean FALSE
      },
      {
        name "PercentIdentity",
        value real { 0, 10, 0 }
      },
      {
        name "HitlistSize",
        value integer 100
      },
      {
        name "EffectiveSearchSpace",
        value big-integer 0
      },
      {
        name "DbLength",
        value big-integer 0
      },
      {
        name "WindowSize",
        value integer 0
      },
      {
        name "DustFiltering",
        value boolean FALSE
      },
      {
        name "RepeatFiltering",
        value boolean FALSE
      },
      {
        name "MaskAtHash",
        value boolean TRUE
      },
      {
        name "MismatchPenalty",
        value integer -3
      },
      {
        name "MatchReward",
        value integer 2
      },
      {
        name "GapOpeningCost",
        value integer 5
      },
      {
        name "GapExtensionCost",
        value integer 2
      },
      {
        name "StrandOption",
        value strand-type both-strands
      },
      {
        name "WordSize",
        value integer 7
      }
    },
    format-options {
      {
        name "Web_JobTitle",
        value string "consensus"
      },
      {
        name "Web_BlastSpecialPage",
        value string "blast2seq"
      }
    }
  }
}

 >lcl|30439 FVFTF6N01A86BR
Length=51


 Score = 24.7 bits (26),  Expect = 2e-05
 Identities = 17/18 (94%), Gaps = 1/18 (5%)
 Strand=Plus/Plus

Query  1   GAGGAT-CCAGAATTCTC  17
           |||||| |||||||||||
Sbjct  34  GAGGATACCAGAATTCTC  51

Here's the output from a local search (I changed the expect to 5.0 just 
to prove to myself that some parameters are getting through OK):
my @params = (-program => 'blastn', -outfile => 'bl2seq.out',
              -FILTER => 'F', -WORDSIZE => 7, -expect => 5.0);
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
# consensus vs. FVFTF6N01A86BR
my $bl2seq_report = $factory->bl2seq($cons_seqobj, $single_seqobj);
print Dumper $bl2seq_report->next_result;

$VAR1 = bless( {
  '_inclusion_threshold' => undef,
  '_queryacc' => 'adapter_consensus',
  '_iteration_index' => 0,
  '_iteration_count' => 1,
  '_hits' => [],
  '_hitindex' => 0,
  '_querylength' => '17',
  '_querydesc' => '',
  '_iterations' => [
    bless( {
      '_oldhits_not_below_threshold' => [],
      '_newhits_unclassified' => [],
      '_number' => 1,
      '_oldhits_newly_below_threshold' => [],
      '_hit_factory' => bless( {
        'interface' => 'Bio::Search::Hit::HitI',
        'type' => 'Bio::Search::Hit::BlastHit',
        '_loaded_types' => {
          'Bio::Search::Hit::BlastHit' => 1
        },
        '_root_verbose' => 0
      }, 'Bio::Factory::ObjectFactory' ),
      '_newhits_below_threshold' => [
        {
          '-algorithm' => 'BLASTN',
          '-description' => '',
          '-length' => '51',
          '-query_len' => '17',
          '-hsp_factory' => bless( {
            'interface' => 'Bio::Search::HSP::HSPI',
            'type' => 'Bio::Search::HSP::GenericHSP',
            '_loaded_types' => {
              'Bio::Search::HSP::GenericHSP' => 1
            },
            '_root_verbose' => 0
          }, 'Bio::Factory::ObjectFactory' ),
          '-name' => 'FVFTF6N01A86BR',
          '-rank' => 1,
          '-hsps' => [
            {
              '-query_start' => '7',
              '-algorithm' => 'BLASTN',
              '-hit_seq' => 'ccagaattctc',
              '-hit_length' => '51',
              '-query_length' => '17',
              '-query_desc' => '',
              '-query_frame' => 0,
              '-rank' => 1,
              '-hit_desc' => '',
              '-query_end' => '17',
              '-hit_name' => 'FVFTF6N01A86BR',
              '-identical' => '11',
              '-query_name' => 'adapter_consensus',
              '-evalue' => '1e-04',
              '-score' => '11',
              '-conserved' => '11',
              '-hit_frame' => 0,
              '-hsp_length' => '11',
              '-query_seq' => 'ccagaattctc',
              '-hit_start' => '41',
              '-homology_seq' => '|||||||||||',
              '-hit_end' => '51',
              '-bits' => '22.3'
            },
            {
              '-query_start' => '9',
              '-algorithm' => 'BLASTN',
              '-hit_seq' => 'agaattct',
              '-hit_length' => '51',
              '-query_length' => '17',
              '-query_desc' => '',
              '-query_frame' => 0,
              '-rank' => 2,
              '-hit_desc' => '',
              '-query_end' => '16',
              '-hit_name' => 'FVFTF6N01A86BR',
              '-identical' => '8',
              '-query_name' => 'adapter_consensus',
              '-evalue' => '0.007',
              '-score' => '8',
              '-conserved' => '8',
              '-hit_frame' => 0,
              '-hsp_length' => '8',
              '-query_seq' => 'agaattct',
              '-hit_start' => '50',
              '-homology_seq' => '||||||||',
              '-hit_end' => '43',
              '-bits' => '16.4'
            }
          ],
          '-accession' => 'FVFTF6N01A86BR',
          '-significance' => '1e-04'
        }
      ],
      '_root_verbose' => 0,
      '_newhits_not_below_threshold' => [],
      '_oldhits_below_threshold' => []
    }, 'Bio::Search::Iteration::GenericIteration' )
  ],
  '_hit_factory' => $VAR1->{'_iterations'}[0]{'_hit_factory'},
  '_statistics' => bless( {
    'stats' => {
      'S1' => '4',
      'S1_bits' => '8.4',
      'kappa_gapped' => '0.711',
      'X3_bits' => '99.1',
      'X1' => '4',
      'lambda_gapped' => '1.37',
      'X2' => '15',
      'S2' => '4',
      'seqs_better_than_cutoff' => '1',
      'Hits_to_DB' => '5',
      'num_extensions' => '2',
      'num_successful_extensions' => '2',
      'X1_bits' => '7.9',
      'X3' => '50',
      'dbentries' => '1',
      'entropy_gapped' => '1.31',
      'X2_bits' => '29.7',
      'S2_bits' => '8.4'
    }
  }, 'Bio::Search::GenericStatistics' ),
  '_algorithm' => 'BLASTN',
  '_parameters' => bless( {
    'params' => {
      'gapext' => '2',
      'matrix' => 'blastn matrix:1 -3',
      'expect' => '5.0',
      'allowgaps' => 'yes',
      'gapopen' => '5'
    }
  }, 'Bio::Tools::Run::GenericParameters' ),
  '_root_verbose' => 0,
  '_queryname' => 'adapter_consensus'
}, 'Bio::Search::Result::BlastResult' );


From David.Messina at sbc.su.se  Wed Aug 18 18:32:37 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 19 Aug 2010 00:32:37 +0200
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq
	question
In-Reply-To: <4C6C3259.4060304@bcm.tmc.edu>
References: <4C6C3259.4060304@bcm.tmc.edu>
Message-ID: <E8F0F7A7-BC33-4E37-8AAB-75A9470E82A5@sbc.su.se>

Hi Caleb,

The first thing I would do is take BioPerl out of the equation and test your local bl2seq on the command line. If you get the same output locally as on the web version, then there is a problem with BioPerl. If you're still seeing a discrepancy between the web and your local run, then this isn't a problem with BioPerl.

Just to be clear, BioPerl doesn't "implement" any of the BLAST programs; it is simply a wrapper around the programs that you download from NCBI. That doesn't mean BioPerl isn't at fault, of course, just that it's important to isolate the problem carefully.

The most common reasons for these discrepancies are:

- different version numbers of BLAST

2.2.21? 2.2.22? Is it the same on the web as locally?

- similarly, different implementations of BLAST

NCBI's old BLAST suite is now deprecated and replaced with BLAST+. All of the online BLAST web queries are BLAST+ now. Are you running BLAST+ locally? (There's also a separate BioPerl wrapper for BLAST+ called Bio::Tools::Run::BlastPlus.)

- hidden "default" parameters

Even though you're only changing a handful of parameters, the defaults (particularly on the web version) may be different than what you expect.

In your case, it looks like on the web version, match score is 2 and mismatch is -3. However, in the local version I believe match score is 1 and a mismatch is -3.

See this line in the params block near the end of your post:

	'matrix' => 'blastn matrix:1 -3',
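
As a rough sketch of how one might check this (not tested against your data), the reward/penalty pair can be pulled out of that 'matrix' string and compared with what the web form used. The -r (match reward) and -q (mismatch penalty) options are those of the legacy NCBI bl2seq program; the helper name and the file names seq1.fa and seq2.fa are made up for illustration.

```perl
use strict;
use warnings;

# Parse the scoring string reported in the BioPerl result parameters,
# e.g. 'blastn matrix:1 -3' gives reward 1 and penalty -3.
# (Hypothetical helper, not part of BioPerl.)
sub parse_blastn_matrix {
    my ($matrix) = @_;
    my ($reward, $penalty) = $matrix =~ /matrix:\s*(-?\d+)\s+(-?\d+)/
        or die "Unrecognised matrix string: $matrix\n";
    return ($reward, $penalty);
}

my ($reward, $penalty) = parse_blastn_matrix('blastn matrix:1 -3');

# Scores that appear to be in effect on the web version.
my %web = (reward => 2, penalty => -3);

if ($reward != $web{reward} || $penalty != $web{penalty}) {
    # Legacy bl2seq: -r = reward for a match, -q = penalty for a mismatch.
    # seq1.fa and seq2.fa are placeholder file names.
    my @cmd = ('bl2seq', '-p', 'blastn', '-i', 'seq1.fa', '-j', 'seq2.fa',
               '-r', $web{reward}, '-q', $web{penalty});
    print "Local scoring differs from the web; try: @cmd\n";
}
```

Running the printed command by hand keeps BioPerl out of the comparison, which is the point of the first step above.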


Dave


From sidd.basu at gmail.com  Wed Aug 18 20:28:32 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Wed, 18 Aug 2010 19:28:32 -0500
Subject: [Bioperl-l]  Re: [RFC] Interolog::Walk
In-Reply-To: <4C6BF4BD.5010200@sms.ed.ac.uk>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
Message-ID: <20100819002830.GA366@Macintosh-235.local>

Hi, 

On Wed, 18 Aug 2010, Giuseppe Gallone wrote:

> Hello BioPerl community - I've written a new module called Interolog::Walk 
> that I'm planning to put on CPAN. I would be grateful if you might take a 
> look at the brief description I attached and tell me what you think. I'll 
> be more than happy to post further details should the module be of some 
> interest for someone.
>
> Also, I am not totally sure about having the correct name for it. This is 
> my first module and It would be great if you could advise on naming it 
> appropriately. Hopefully the following description will give an idea on 
> what it does.
>
> ===================
>
>
> NAME
>     Interolog::Walk - Retrieve, score and visualize putative 
> Protein-Protein Interactions through the orthology-walk method
>
> DESCRIPTION
>     A common activity in computational biology is to mine protein-protein 
> interactions from publicly available databases in order to build 
> Protein-Protein Interaction (PPI) datasets.
> In many instances, however, the number of experimentally obtained annotated 
> PPIs is very scarce and it would be helpful to enrich the experimental 
> dataset with high-quality, computationally-inferred PPIs. Such 
> computationally-obtained dataset can extend, support or enrich experimental 
> PPI datasets, and are of crucial importance in high-throughput gene 
> prioritization studies, i.e. to drive hypotheses and restrict the 
> dimensionality of many gene functional discovery problems.
> This Perl Module, Interolog::Walk, is aimed at building putative PPI 
> datasets on the basis of a number of comparative biology paradigms: the 
> module implements a collection of computational biology algorithms based on 
> the concept of "orthology projection". If interacting proteins A and B in 
> organism X have orthologs A' and B' in organism Y, under certain conditions 
> one can assume that the interaction will be conserved in organism Y, i.e. 
> the A-B interaction can be "projected through the orthologies" to obtain a 
> putative A'-B' interaction. The pair of interactions (A-B) and (A'-B') are 
> named "Interologs" (see for instance [1] and [2]).
>
> Interolog::Walk collects, analyses and collates gene orthology data 
> provided by the Ensembl Consortium (www.ensembl.org) as well as PPI data 
> provided by EBI Intact (http://www.ebi.ac.uk/intact/). It provides the user 
> with the possibility of rating the quality and reliability of the putative 
> interactions collected, by means of confidence scores, and optionally 
> outputs network representations of the datasets, compatible with the 
> biological network representation standard, Cytoscape.

Sounds interesting. I am currently playing around with a Perl-based webapp for displaying interactomes
using cytoscapeweb. Depending on how your design pans out, I would be happy to
use your module as a backend analysis layer. On a related note, you
might want to have a look at bioperl-network; if there is any overlap,
it might be worth contributing there.

-siddhartha

>
> USAGE
> In order to carry out an interolog walk we start with a set of gene 
> identifiers in one organism of interest. We query those ids against a 
> number of comparative biology databases to retrieve a list of orthologues 
> for each gene id of interest, in one or more species.
> In the following step we rely  on PPI databases to retrieve the list of 
> available interactors for the protein ids obtained. The output at this 
> stage consists of a list of interactors of the orthologues of the initial 
> gene set, plus several fields of ancillary data.
> In the last step of the process we  project the interactions - again using 
> orthology data - back to the original species of interest. The output of 
> the process is a list of PUTATIVE INTERACTORS of the initial gene set, plus 
> several fields of ancillary data.
>
> ====================
>
> Given the scope and the focus of the project, I would imagine that viable 
> alternatives for the namespace might be
>
> Bio::Orthology::InterologWalk
> Bio::InterologMap
>
> or maybe
> Interolog::Map
> Orthology::Map
> Orthology::InterologMap
>
> There are no similar projects as far as I could see so I shouldn't run the 
> risk of overlapping namespaces. Still I would love to know your informed 
> opinion about it.
>
> best,
> Giuseppe
>
>
>
> REFERENCES
> [1] Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, 
> Vidal M, Gerstein M. Annotation transfer between genomes: protein-protein 
> interologs and protein-DNA regulogs. Genome Research 2004 
> Jun;14(6):1107-18.
>
> [2]Wiles AM, Doderer M, Ruan J, Gu T-T, Ravi D, Blackman BA, Bishop AJR. 
> "Building and Analyzing Protein Interactome Networks by Cross-species 
> Comparisons." BMC Systems Biology 2010, 4:36 - PMID: 20353594
>
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From dan.kortschak at adelaide.edu.au  Wed Aug 18 22:15:03 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 19 Aug 2010 11:45:03 +0930
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
Message-ID: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>

Hi Everyone,

I'm wanting to set up a persistent data store for some of my work and am
in the process of choosing parts for my system. From my brief look
around I think I'd like to use BioSQL (next best choice being Chado -
but BioPerl bindings in bioperl-db for BioSQL being the decider here),
but have noticed comments some time back that bioperl-db and PostgreSQL
8.3 (my preferred engine - though MySQL is possible, but makes the whole
system messier) don't play well together.

What is the status of the casting expectation conflict between
bioperl-db and Pg8.3? The scripts are run with safe data, so
placeholders aren't strictly crucial (though speed may be an issue?) and
`$dbh->{pg_server_prepare} = 0;' seems like it could be an option.
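
For concreteness, a minimal sketch of what that option would look like in a script. The database name, host and helper name are hypothetical, and this only shows where the DBD::Pg handle attribute is set; it is not a statement about what bioperl-db itself needs.

```perl
use strict;
use warnings;

# Hypothetical database name and host; adjust to the local BioSQL instance.
my $dsn = 'dbi:Pg:dbname=biosql;host=localhost';

# Connect and switch off server-side prepares for this handle only.
# (Hypothetical helper name.)
sub connect_without_server_prepare {
    my ($dsn, $user, $password) = @_;
    require DBI;    # loaded lazily so the sketch compiles without a database
    my $dbh = DBI->connect($dsn, $user, $password,
                           { RaiseError => 1, AutoCommit => 1 });
    # With this set to 0, DBD::Pg does not create server-side prepared
    # statements for this handle (see the DBD::Pg documentation for how
    # placeholders are sent in that case).
    $dbh->{pg_server_prepare} = 0;
    return $dbh;
}

# Only connect when asked to, e.g. BIOSQL_CONNECT=1 perl sketch.pl
if ($ENV{BIOSQL_CONNECT}) {
    my $dbh = connect_without_server_prepare($dsn, $ENV{PGUSER}, $ENV{PGPASSWORD});
    my ($version) = $dbh->selectrow_array('SELECT version()');
    print "Connected to: $version\n";
    $dbh->disconnect;
}
```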

Can anybody provide any advice on this issue?

thanks
Dan Kortschak


From cjfields at illinois.edu  Wed Aug 18 23:29:36 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 18 Aug 2010 22:29:36 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq
	question
In-Reply-To: <E8F0F7A7-BC33-4E37-8AAB-75A9470E82A5@sbc.su.se>
References: <4C6C3259.4060304@bcm.tmc.edu>
	<E8F0F7A7-BC33-4E37-8AAB-75A9470E82A5@sbc.su.se>
Message-ID: <194D43EC-A44C-450A-B57B-EC379DBCB935@illinois.edu>

Wouldn't surprise me too much if the parameters are not set the same; IIRC the main BLAST URL API and the online NCBI Web-BLAST have different default settings.

chris

On Aug 18, 2010, at 5:32 PM, Dave Messina wrote:

> Hi Caleb,
> 
> The first thing I would do is take BioPerl out of the equation and test your local bl2seq on the command line. If you get the same output locally as on the web version, then there is a problem with BioPerl. If you're still seeing a discrepancy between the web and your local run, then this isn't a problem with BioPerl.
> 
> Just to be clear, BioPerl doesn't "implement" any of the BLAST programs; it is simply a wrapper around the programs that you download from NCBI. That doesn't mean BioPerl isn't at fault, of course, just that it's important to isolate the problem carefully.
> 
> The most common reasons for these discrepancies are:
> 
> - different version numbers of BLAST
> 
> 2.2.21? 2.2.22? Is it the same on the web as locally?
> 
> - similarly, different implementations of BLAST
> 
> NCBI's old BLAST suite is now deprecated and replaced with BLAST+. All of the online BLAST web queries are BLAST+ now. Are you running BLAST+ locally? (There's also a separate BioPerl wrapper for BLAST+ called Bio::Tools::Run::BlastPlus.)
> 
> - hidden "default" parameters
> 
> Even though you're only changing a handful of parameters, the defaults (particularly on the web version) may be different than what you expect.
> 
> In your case, it looks like on the web version, match score is 2 and mismatch is -3. However, in the local version I believe match score is 1 and a mismatch is -3.
> 
> See this line in the params block near the end of your post:
> 
> 	'matrix' => 'blastn matrix:1 -3',
> 
> 
> 
> Dave
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hlapp at drycafe.net  Thu Aug 19 01:48:19 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 01:48:19 -0400
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>

Hi Dan,

the casting isn't an issue anymore, I think. (And even if it were,  
there is actually a small script that brings back the casts that were  
built into 8.2.) Have you found an example where it still is?

	-hilmar

On Aug 18, 2010, at 10:15 PM, Dan Kortschak wrote:

> Hi Everyone,
>
> I'm wanting to set up a persistent data store for some of my work  
> and am
> in the process of choosing parts for my system. From my brief look
> around I think I'd like to use BioSQL (next best choice being Chado -
> but BioPerl bindings in bioperl-db for BioSQL being the decider here),
> but have noticed comments some time back that bioperl-db and  
> PostgreSQL
> 8.3 (my preferred engine - though MySQL is possible, but makes the  
> whole
> system messier) don't play well together.
>
> What is the status of the casting expectation conflict between
> bioperl-db and Pg8.3? The scripts are run with safe data, so
> placeholders aren't strictly crucial (though speed may be an issue?)  
> and
> `$dbh->{pg_server_prepare} = 0;' seems like it could be an option.
>
> Can anybody provide any advice on this issue?
>
> thanks
> Dan Kortschak
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From dan.kortschak at adelaide.edu.au  Thu Aug 19 01:54:03 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 19 Aug 2010 15:24:03 +0930
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
Message-ID: <1282197243.14127.27.camel@zoidberg.mbs.adelaide.edu.au>

Hi Hilmar,

No, I haven't found any problems, just hoping to avoid them by prior
research.

thanks
Dan

On Thu, 2010-08-19 at 01:48 -0400, Hilmar Lapp wrote:
> Hi Dan,
> 
> the casting isn't an issue anymore, I think. (And even if it were,  
> there is actually a small script that brings back the casts that
> were  
> built into 8.2.) Have you found an example where it still is?
> 
>         -hilmar


From biopython at maubp.freeserve.co.uk  Thu Aug 19 06:01:03 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 19 Aug 2010 11:01:03 +0100
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
Message-ID: <AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>

On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp <hlapp at drycafe.net> wrote:
> Hi Dan,
>
> the casting isn't an issue anymore, I think. (And even if it were, there is
> actually a small script that brings back the casts that were built into
> 8.2.) Have you found an example where it still is?
>
>         -hilmar

Hi Hilmar,

Do the bioperl-db bindings for BioSQL on PostgreSQL still require those
extra rules in the schema?
http://bugzilla.open-bio.org/show_bug.cgi?id=2839

Peter


From G.Gallone at sms.ed.ac.uk  Thu Aug 19 06:45:36 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Thu, 19 Aug 2010 11:45:36 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
Message-ID: <4C6D0B50.4050902@sms.ed.ac.uk>

Hi Dave,

thank you very much for your helpful comments.

Regarding the module name: I will follow your advice and avoid 
proposing a new root during the module registration. As for the second 
level, I haven't been able to find anything related to 
homology/orthology, therefore I'm not sure whether I should go for

Bio::Orthology::InterologMap
or
Bio::Homology::InterologMap

The first one being maybe a bit more specific. I might also expand 
further as in

Bio::Orthology::Interolog::Map,

just in case somebody else finds other interesting applications for the 
Interolog concept and would like to "plug in" their own contribution. 
Would this make any sense?

I also appreciate your comments on the documentation. The one I provided 
is actually not the full pod I was planning to include, but rather an 
extract. What I have at the moment is a description, for each method, in 
the following form:

=====================================
    remove_duplicate_rows
      Usage     : $RC = InterologMap::remove_duplicate_rows(input_handle  => $dbh,
                                                            output_handle => $out_data,
                                                            header        => 'standard',
                                                            );
      Purpose   : This is used to clean up a TSV data file of duplicate
                  entries. Occasionally, Intact can return duplicate
                  entries. This routine will make sure no such
                  duplicates are kept. A new datafile is built.
                  The number of unique data rows is updated.
      Returns   : success/error
      Argument  : database handle to input file, filehandle to
                  output file, header type. Header type is one of the following:
                  - "standard": when the routine is used to clean up an
                    interolog walk file (the header will be longer)
                  - "direct":   when the routine is used to clean up a
                    file of real db interaction (the header is shorter)
                  - no field provided: default is standard
      Throws    : -
      Comment   : Sample


     See Also :
=======================================
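
To give an idea of what such a routine does, here is a stripped-down sketch in plain Perl. It is not the module's code: the real routine works through DBI/DBD::CSV handles and has a header option, whereas this version just reads and writes filehandles, and the name remove_duplicate_rows_sketch and the toy rows are invented for the example.

```perl
use strict;
use warnings;

# Copy TSV rows from $in to $out, keeping the header line and the first
# copy of each data row. Returns the number of unique data rows written.
sub remove_duplicate_rows_sketch {
    my ($in, $out) = @_;
    my $header = <$in>;
    return 0 unless defined $header;
    print {$out} $header;

    my %seen;
    my $unique = 0;
    while (my $row = <$in>) {
        next if $seen{$row}++;    # skip rows already written
        print {$out} $row;
        $unique++;
    }
    return $unique;
}

# Toy data held in memory instead of real files.
my $input = "id_a\tid_b\tmethod\n"
          . "P1\tP2\ty2h\n"
          . "P1\tP2\ty2h\n"
          . "P3\tP4\tcoip\n";
my $output = '';
open my $in,  '<', \$input  or die "Cannot read input: $!";
open my $out, '>', \$output or die "Cannot write output: $!";
my $count = remove_duplicate_rows_sketch($in, $out);
close $out;
close $in;
print "$count unique rows kept\n";
```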

On top of that, there is a DESCRIPTION, USAGE, and SYNOPSIS. The 
synopsis has some code with an example of typical usage of the module. 
Please take a look at this (attached below) and tell me what you think.

You mention that the description contains a lot of background 
information. Would you recommend reducing it, or placing it elsewhere?
I was considering writing a little tutorial in LaTeX as soon as 
possible anyway, to provide a "centralised" source of information for 
getting familiar with the module. Does this respect the CPAN regulations?

As for your question on the structure of the module: you are indeed 
right, the idea when running the "orthology walk" is to create a 
pipeline of subroutines: there's a core set of subroutines meant to work 
in strict sequentiality.
Each of these subroutines expects, as input, the output of the previous 
one. The input/output dataset is currently in the form of a TSV text 
file, which I process with the help of the DBI module (to be more 
specific, I use DBD::CSV).

While there's a certain flexibility regarding how to use the module, one 
core idea remains: in order to get the set of putative interactors, the 
user would have to call at least three basic routines:

(A)
=================
1) get_forward_orthologies(): this queries the initial gene list against 
one or more Ensembl dbs (using the Ensembl Perl API) and retrieves their 
orthologues, plus a number of ancillary data fields (mainly conservation 
data, e.g. dn/ds ratio, distance from ancestor, orthology type, etc.)

2) get_interactors(): this queries the orthology list built in the 
previous stage against a PSICQUIC-enabled PPI db using REST (at the 
moment I only query the EBI Intact DB, but it should be easy to expand 
this and query all PSICQUIC compatible PPI dbs transparently). This step 
will "fatten" the dataset built in (1) with the interactors of those 
orthologues, plus ancillary data (including lots of parameters 
describing the quality, nature, origin of the annotated interaction)

3)get_backward_orthologies(): this queries the interactor list built in 
the previous stage against one or more Ensembl dbs to find orthologues 
*back* in the original species. It also adds a number of supplementary 
data fields, just like in (1).
==================

At the end of this procedure the user will have a TSV file where each 
row contains a binary putative interaction plus (currently) 37 
supplementary data fields.

One can then scan these results to check for duplicates, to compute 
counts, to see if we have discovered new gene ids that were not present 
in the original dataset (hopefully we have :) ).

Most importantly, one can then further process these results to do one 
or more of the following:
(B) compute a global confidence score to assess the reliability of 
each binary putative interaction
(C) extract the binary putative PPIs from the dataset and save them in a 
format compatible with Cytoscape: this helps providing a visual quality 
to the result: one can then apply network analysis tools to discover 
motifs, clusters, etc. The format I use is currently .SIF + attributes, 
as detailed in
http://cytoscape.wodaklab.org/wiki/Cytoscape_User_Manual/Network_Formats
(D) given the same initial gene list, one can also build a dataset of 
REAL, experimentally-obtained PPIs (without mapping through orthologies 
in other species). One can then compare this dataset with the Putative 
dataset to see if/where the two overlap, what's the intersection or the 
differences, etc.
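
As an illustration of step (C), this is roughly what writing the SIF part amounts to. It is only a sketch: the function name and gene ids are invented, 'pp' is simply the label commonly used for protein-protein edges in SIF files, and the attribute files are not shown.

```perl
use strict;
use warnings;

# Turn binary interactions into SIF lines: source <tab> type <tab> target.
# (Hypothetical helper, not the module's own routine.)
sub to_sif {
    my ($pairs, $type) = @_;
    $type = 'pp' unless defined $type;
    my %seen;
    my @lines;
    for my $pair (@$pairs) {
        # Sort the two ids so that A-B and B-A count as the same edge.
        my ($src, $dst) = sort @$pair;
        next if $seen{"$src\t$dst"}++;
        push @lines, join("\t", $src, $type, $dst);
    }
    return @lines;
}

# Toy putative interactions with made-up gene ids.
my @sif = to_sif([ ['geneA', 'geneB'], ['geneB', 'geneA'], ['geneA', 'geneC'] ]);
print "$_\n" for @sif;
```

The second pair is dropped because it is the first one reversed, so two edges are written.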


In order to suggest ways of using the module I have written 4 sample 
scripts and I will include them in the module. Each script utilises the 
module and uses/reuses subroutines in a pipeline fashion, and does the 
following:

1)doInterologWalk.pl: runs the basic pipeline in (A)
2)doScores.pl: computes and adds confidence scores as explained in (B)
3)doNetworks.pl: computes SIF network + attributes as in (C)
4)getRealInteractions.pl: runs a pipeline to obtain real PPIs from the 
initial gene set.

Hope I didn't make this too confusing. I would love to hear back from 
you and from anybody else that would like to provide feedback.

Cheers
Giuseppe

On 18/08/10 17:52, Dave Messina wrote:
> Hi Giuseppe,
>
> Sounds really interesting - thanks for posting this.
>
>> Bio::Orthology::InterologWalk
>
> I vote for this name, or in any case something with Bio:: as the top-level namespace since it's a biology-related package.
>
> I like that you're providing a lot of background and information about the project in the documentation. However, the USAGE section should give information about how to use the module, with example code. You can look at other modules on CPAN (or in BioPerl) to see the conventions for writing documentation.
>
> Also, from what you wrote, it sounds like this might be a pipeline or a script rather than a module per se, or perhaps a script and a set of modules. It would be helpful to clarify in your documentation (if you haven't already) how exactly things are organized (and of course example code will help with that, too).
>
>
> Hope that's helpful, and let us know when you've got it up on CPAN so we can try it out!
>
>
> Dave
>
>

NAME
     Interolog::Walk - Retrieve, score and visualize putative Protein-Protein
     Interactions through the orthology-walk method

SYNOPSIS
       use Interolog::Walk;

     First, obtain Intact Interactions for the dataset (see example in
     "getDirectInteractions.pl"):

       #get a registry from Ensembl
       my $registry = InterologMap::setup_ensembl_adaptor(connect_to_db  => $ensembl_db,
                                                          source_species => $sourceorg,
                                                          verbose        => 1
                                                          );


       #query actual interactions
       $RC = InterologMap::Direct::get_direct_interactions(registry       => $registry,
                                                           source_species => $sourceorg,
                                                           input_path     => $in_path,
                                                           output_path    => $out_path,
                                                           url            => $url,
                                                           );

     do some postprocessing (see "do_counts()" and "extract_unseen_ids()" )
     and then do the actual interolog walk on the dataset with the following
     sequence of three methods.

     get orthologues of starting set:

       $RC = InterologMap::get_forward_orthologies(registry    => $registry,
                                                   ensembl_db  => $ensembl_db,
                                                   input_path  => $in_path,
                                                   output_path => $out_path,
                                                   source_org  => $sourceorg,
                                                   dest_org    => $destorg,
                                                   );

     add interactors of orthologues found by "get_forward_orthologies()":

       $RC = InterologMap::get_interactions(input_path    => $in_path,
                                            output_path   => $out_path,
                                            url           => $url,
                                            url_global    => $url_global,
                                            );

     add orthologues of interactors found by "get_interactions()":

       $RC = InterologMap::get_backward_orthologies(registry    => $registry,
                                                    ensembl_db  => $ensembl_db,
                                                    input_path  => $in_path,
                                                    output_path => $out_path,
                                                    error_path  => $err_path,
                                                    source_org  => $sourceorg,
                                                    );

     do some postprocessing (see "remove_duplicate_rows()", "do_counts()",
     "extract_unseen_ids()") and then optionally compute a composite score
     for the putative interactions obtained:

        $RC = InterologMap::Scores::compute_scores(input_path      => $in_path,
                                                   score_path      => $score_path,
                                                   output_path     => $out_path,
                                                   term_graph      => $onto_graph,
                                                   M_IT_SCORE      => $M_IT,
                                                   M_DM_SCORE      => $M_DM,
                                                   M_ME_DM_SCORE   => $M_MDM,
                                                   M_ME_TAXA_SCORE => $M_MTAXA
                                                   );

     get some networks and network attributes which you can then visualise
     with cytoscape

        $RC = InterologMap::Networks::do_network(registry       => $registry,
                                                 db             => $ensembl_db,
                                                 input_path     => $in_path,
                                                 output_path    => $out_path,
                                                 source_org     => $sourceorg,
                                                 orthology_type => $orthtype,
                                                 );

        $RC = InterologMap::Networks::do_attributes(registry    => $registry,
                                                    input_path  => $in_path,
                                                    output_path => $out_path,
                                                    source_org  => $sourceorg,
                                                    label_type  => 'external name'
                                                    );

     *The synopsis above only lists the major methods and parameters.*

DESCRIPTION
     A common activity in computational biology is to mine protein-protein
     interactions from publicly available databases to build
     *Protein-Protein Interaction* (PPI) datasets. In many instances,
     however, the number of experimentally obtained annotated PPIs is very
     small and it would be helpful to enrich the experimental dataset with
     high-quality, computationally-inferred PPIs. Such
     computationally-obtained datasets can extend, support or enrich
     experimental PPI datasets, and are of crucial importance in
     high-throughput gene prioritization studies, i.e. to drive hypotheses
     and restrict the dimensionality of functional discovery problems. This
     Perl module, Interolog::Walk, is aimed at building putative PPI
     datasets on the basis of a number of comparative biology paradigms: the
     module implements a collection of computational biology algorithms
     based on the concept of "orthology projection". If interacting proteins
     A and B in organism X have orthologs A' and B' in organism Y, under
     certain conditions one can assume that the interaction will be
     conserved in organism Y, i.e. the A-B interaction can be "projected
     through the orthologies" to obtain a putative A'-B' interaction. The
     pair of interactions (A-B) and (A'-B') are named "Interologs".

     Interolog::Walk collects, analyses and collates gene orthology data
     provided by the Ensembl Consortium as well as PPI data provided by EBI
     Intact. It provides the user with the possibility of rating the quality
     and reliability of the putative interactions collected, by means of
     confidence scores, and optionally outputs network representations of
     the datasets, compatible with the biological network representation
     standard, Cytoscape.

BASIC USAGE
   Rationale behind "Interolog::Walk".
                                   \EBI Intact API/
              .--------------.            |             .-------------.
          (2) | A(e.g. mouse)|<------------------------>|   B(mouse)  |  (3)
              `--------------'          <PPI>           `-------------'
                     ^                                         |
        /Ensembl\    | <Orthology>                 <Orthology> | \ Ensembl /
       / Compara \   |                                         |  \Compara/
      /    Api    \  |                                         |   \ Api /
                     |                                         |
              .--------------.                           .-------------.
          (1) | A'(e.g. fly) |. . . . . . . . . . . . .  |   B'(fly)   | (4)
              `--------------'     [SCORED]PUTATIVE PPI  `-------------'
                              (Output of Interolog::Walk)

     In order to carry out an interolog walk we start with a set of gene
     identifiers in one organism of interest (1). We query those ids against
     a number of comparative biology databases to retrieve a list of
     orthologues for the gene id of interest, in one or more species (2). In
     the next step we rely instead on PPI databases to retrieve the list of
     available interactors for the protein ids obtained in (2). The output
     at this stage consists of a list of interactors of the orthologues of
     the initial gene set, plus several fields of ancillary data (whose
     importance will be explained later) (3). In the last step of this
     process we will need to project the interactions in (3) - again using
     orthology data - back to the original species of interest. The output
     of the process is a list of PUTATIVE INTERACTORS of the initial gene
     set, plus several fields of ancillary data.

     "Interolog::Walk" provides three main functions to carry out the basic
     walk: "get_forward_orthologies()", "get_interactions()" and
     "get_backward_orthologies()". These functions must be called strictly
     sequentially in your script, as they process, analyse and attach data
     to the output in a pipeline-like fashion, each one processing the
     output of the preceding function.

     get_forward_orthologies
     get_interactions
     get_backward_orthologies
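
     The three-step walk described above can be sketched with toy data.
     This is only an illustration of the pipeline idea, not the
     Interolog::Walk API: the sub names, the in-memory lookup tables and
     all gene ids below are hypothetical stand-ins for the real Ensembl
     Compara and IntAct queries, and orthology is assumed to be
     one-to-one for simplicity.

```perl
use strict;
use warnings;

# Toy lookup tables standing in for the orthology and PPI databases.
# All identifiers are made up for illustration.
my %orthologue_of = (            # fly gene -> mouse orthologue
    flyA => 'mouseA',
    flyB => 'mouseB',
);
my %interactors_of = (           # mouse protein -> known mouse interactors
    mouseA => ['mouseB'],
);

# Step (1) -> (2): map each starting id to its orthologue, if it has one.
sub forward_orthologies {
    my ($ids) = @_;
    return [ map  { [ $_, $orthologue_of{$_} ] }
             grep { exists $orthologue_of{$_} } @$ids ];
}

# Step (2) -> (3): attach the known interactors of each orthologue.
sub interactions {
    my ($rows) = @_;
    my @out;
    for my $row (@$rows) {
        my ($id, $orth) = @$row;
        push @out, [ $id, $orth, $_ ] for @{ $interactors_of{$orth} || [] };
    }
    return \@out;
}

# Step (3) -> (4): project each interactor back to the starting species.
sub backward_orthologies {
    my ($rows) = @_;
    my %back = reverse %orthologue_of;   # mouse -> fly (one-to-one assumed)
    return [ map  { [ $_->[0], $back{ $_->[2] } ] }
             grep { exists $back{ $_->[2] } } @$rows ];
}

# The steps must run in this order: each one consumes the previous output.
my $putative = backward_orthologies(
    interactions( forward_orthologies( [ 'flyA', 'flyB' ] ) ) );

# One putative interaction per line: starting gene, putative interactor.
print join("\t", @$_), "\n" for @$putative;
```

     Calling the steps out of order would fail in this sketch for the
     same reason given above: each function expects the rows produced by
     the one before it.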

SCORING THE PUTATIVE INTERACTIONS
BUILDING PUTATIVE INTERACTION NETWORKS
BUGS
     Please report any you find

SUPPORT
     TODO

AUTHOR
     Giuseppe Gallone <ggallone at cpan.org>

     CPAN ID: GGALLONE

     University of Edinburgh

COPYRIGHT
     The Interolog::Walk module is Copyright (c) 2010 Giuseppe Gallone. All
     rights reserved.

     You may distribute under the terms of either the GNU General Public
     License or the Artistic License, as specified in the Perl 5.10.0 README
     file.

SEE ALSO


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From G.Gallone at sms.ed.ac.uk  Thu Aug 19 08:42:28 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Thu, 19 Aug 2010 13:42:28 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <20100819002830.GA366@Macintosh-235.local>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<20100819002830.GA366@Macintosh-235.local>
Message-ID: <4C6D26B4.5090702@sms.ed.ac.uk>

Dear Siddhartha,

glad to hear this might be helpful. As for the bioperl-network package 
you mention, thank you for pointing it out. I had a quick look at 
its documentation and it looks like a much deeper and more complex effort 
than what I have in my package. I've actually been using the Graph 
package a lot, on which it seems to be based, and found it very helpful.

I'm not sure if the network routines in my module overlap with it 
though: all I do in my package is parse the dataset, extracting only 
what is needed to build a Cytoscape SIF file and optionally some 
Cytoscape NOA attribute files, as described by the Cytoscape 
specification in

http://cytoscape.wodaklab.org/wiki/Cytoscape_User_Manual/Network_Formats

Instead, it looks like bioperl-network actually builds some kind of 
internal representation of the network for further manipulation in Perl, 
if I understand it correctly?
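
For illustration, writing a SIF file comes down to something like the 
sketch below. It is not the code from my package: the sub name and the 
ids are made up, and I am assuming the simplest SIF layout of one 
tab-separated "source, interaction type, target" line per edge, with 
"pp" as the commonly used label for a protein-protein interaction.

```perl
use strict;
use warnings;

# Hypothetical putative interactions: [interactor A, interactor B].
my @pairs = (
    [ 'flyA', 'flyB' ],
    [ 'flyA', 'flyC' ],
);

# Build one SIF line per edge: source, interaction type, target,
# separated by tabs and terminated by a newline.
sub to_sif {
    my ($edges, $type) = @_;
    return join '', map { join("\t", $_->[0], $type, $_->[1]) . "\n" } @$edges;
}

my $sif = to_sif( \@pairs, 'pp' );
print $sif;
```

No graph object is built at any point; the edge list goes straight to 
text.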

Kind regards
Giuseppe

On 19/08/10 01:28, Siddhartha Basu wrote:

> Sounds interesting. I am currently playing around with a perl based webapp for displaying interactome
> using cytoscapeweb. Depending how your design pans out,  would be happy to
> use your module as a backend analysis layer. And on a related note,  you
> might want to have a look at bioperl-network and if there is any overlap
> might be worth contributing.
>
> -siddhartha
>




From xupeng86 at gmail.com  Thu Aug 19 04:02:48 2010
From: xupeng86 at gmail.com (xupeng)
Date: Thu, 19 Aug 2010 16:02:48 +0800
Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl"
	when use biosql database?
Message-ID: <201008191602.49068.xupeng86@gmail.com>

	I've downloaded biosql-1.0.1.tar.gz. It works well. But I 
can't find 'load_seqdatabase.pl' when I try to import 
GenBank files into the BioSQL database. 
	Can anyone give me a copy of that file? 
Many thanks! 


From sunhanifk at gmail.com  Thu Aug 19 10:25:38 2010
From: sunhanifk at gmail.com (han sun)
Date: Thu, 19 Aug 2010 22:25:38 +0800
Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl
	5.12.1?
Message-ID: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>

Hello everyone,

I have used Perl for several months, and I now want to feel the power of
BioPerl.
But it seems that installing it is more difficult than I thought.

I typed the following commands:


install-shell


rep add bioperl http://bioperl.org/DIST


rep add uwinnipeg http://cpan.uwinnipeg.ca/PPMPackages/12xx/


rep add trouchelle http://trouchelle.com/ppm12/

install BioPerl

However, the installation failed:

ppm install failed:
Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
Can't find any package that provides PostScript::TextBlock for
Bundle-BioPerl-Core
Can't find any package that provides Ace:: for Bundle-BioPerl-Core
Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
Can't find any package that provides Convert::Binary::C for
Bundle-BioPerl-Core
Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
Can't find any package that provides IPC::Run for GraphViz
Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
Can't find any package that provides List-MoreUtils for Moose
Can't find any package that provides List-MoreUtils for Class-MOP


then I tried

install http://www.bribes.org/perl/ppm/GD.ppd

and tried the installation again, but it still didn't help.

Do you know what's wrong?

Please help me, thanks very much.


From cjfields1 at gmail.com  Thu Aug 19 10:33:26 2010
From: cjfields1 at gmail.com (Christopher Fields)
Date: Thu, 19 Aug 2010 09:33:26 -0500
Subject: [Bioperl-l] Could I install BioPerl on Windows with the
	ActivePerl 5.12.1?
In-Reply-To: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>
References: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>
Message-ID: <78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com>

Try using ActivePerl 5.10 instead of v5.12.  It's very possible the PPM won't work for v5.12 yet.

chris

On Aug 19, 2010, at 9:25 AM, han sun wrote:

> Hello everyone,
> 
> I have used perl for several months,and I now want to feel the power of
> bioperl.
> But it seems that the installing is more difficult than I thought.
> 
> I typed the commands.
> 
> 
> 
> install-shell
> 
> 
> rep add bioperl http://bioperl.org/DIST
> 
> 
> rep add uwinnipeg
> http://cpan.uwinnipeg.ca/PPMPackages/12xx/<http://cpan.uwinnipeg.ca/PPMPackages/10xx/>
> 
> 
> rep add trouchelle http://trouchelle.com/ppm12/
> 
> install BioPerl
> 
> However,the installing failed,
> 
> ppm install failed:
> Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
> Can't find any package that provides PostScript::TextBlock for
> Bundle-BioPerl-Core
> Can't find any package that provides Ace:: for Bundle-BioPerl-Core
> Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
> Can't find any package that provides Convert::Binary::C for
> Bundle-BioPerl-Core
> Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
> Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
> Can't find any package that provides IPC::Run for GraphViz
> Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
> Can't find any package that provides List-MoreUtils for Moose
> Can't find any package that provides List-MoreUtils for Class-MOP
> 
> 
> then I tried
> 
> install http://www.bribes.org/perl/ppm/GD.ppd
> 
> and tried the installation again,but it still didn't help.
> 
> *
> *
> *
> *
> *
> *
> 
> 
> *Do you konw what's wrong with the problem?*
> *
> *
> *
> *
> *Please help me,thanks very much.*
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hlapp at drycafe.net  Thu Aug 19 10:53:22 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 10:53:22 -0400
Subject: [Bioperl-l] Why I can't find the perl script
	"load_seqdatabase.pl" when use biosql database?
In-Reply-To: <201008191602.49068.xupeng86@gmail.com>
References: <201008191602.49068.xupeng86@gmail.com>
Message-ID: <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>

The file comes with Bioperl-db, not BioSQL. That is so because it  
depends on BioPerl and on Bioperl-db, and so you will need to have  
both installed.

	-hilmar

On Aug 19, 2010, at 4:02 AM, xupeng wrote:

> 	I've downloaded the biosql-1.0.1.tar.gz. It works well. But I
> can't find the 'load_seqdatabase.pl' when I try to import the
> Genbank files into biosql databsase.
> 	Can anyone give me a copy of that file?
> many thanks !
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From hlapp at drycafe.net  Thu Aug 19 10:58:46 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 10:58:46 -0400
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
Message-ID: <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>

Yes, unfortunately they do. The feature for obviating them (namely  
nested transactions) is there in Pg 8.2+, but Bioperl-db doesn't use  
them yet ... I have to learn more about Class::DBIx first to decide  
whether it's better to first implement nested transactions in the  
home-grown ORM that Bioperl-db in essence is, or whether it's better to  
reimplement everything in Class::DBIx instead.

There are new datatypes in Bioperl, and relations in BioSQL that could  
hold them, and so I need to decide what's the way forward.

	-hilmar

On Aug 19, 2010, at 6:01 AM, Peter wrote:

> On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp <hlapp at drycafe.net>  
> wrote:
>> Hi Dan,
>>
>> the casting isn't an issue anymore, I think. (And even if it were,  
>> there is
>> actually a small script that brings back the casts that were built  
>> into
>> 8.2.) Have you found an example where it still is?
>>
>>        -hilmar
>
> Hi Hilmar,
>
> Do the bioperl-db bindings for BioSQL on PostgreSQL still require  
> those
> extra rules in the schema?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2839
>
> Peter

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From mmuratet at hudsonalpha.org  Thu Aug 19 11:00:52 2010
From: mmuratet at hudsonalpha.org (Michael Muratet)
Date: Thu, 19 Aug 2010 10:00:52 -0500
Subject: [Bioperl-l] Why I can't find the perl script
	"load_seqdatabase.pl" when use biosql database?
In-Reply-To: <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
References: <201008191602.49068.xupeng86@gmail.com>
	<14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
Message-ID: <C6FECD93-E599-465B-A93A-BD1F2CDFBE9C@hudsonalpha.org>


On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote:

> The file comes with Bioperl-db, not BioSQL. That is so because it  
> depends on BioPerl and on Bioperl-db, and so you will need to have  
> both installed.

Is load_seqdatabase.pl still the best method? I vaguely remember a  
post that said that load_seqdatabase was deprecated, but I can't find  
it in the archives.

Mike

>
> 	-hilmar
>
> On Aug 19, 2010, at 4:02 AM, xupeng wrote:
>
>> 	I've downloaded the biosql-1.0.1.tar.gz. It works well. But I
>> can't find the 'load_seqdatabase.pl' when I try to import the
>> Genbank files into biosql databsase.
>> 	Can anyone give me a copy of that file?
>> many thanks !
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> -- 
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806


From hlapp at drycafe.net  Thu Aug 19 11:29:31 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 11:29:31 -0400
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
Message-ID: <5F77404A-086D-4D0C-B3A5-F5119FCF878A@drycafe.net>


On Aug 19, 2010, at 11:09 AM, Chris Fields wrote:

> DBIx::Class


Did I have this in the wrong order :-) More coffee, please.
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From hlapp at drycafe.net  Thu Aug 19 11:30:26 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 11:30:26 -0400
Subject: [Bioperl-l] Why I can't find the perl script
	"load_seqdatabase.pl" when use biosql database?
In-Reply-To: <C6FECD93-E599-465B-A93A-BD1F2CDFBE9C@hudsonalpha.org>
References: <201008191602.49068.xupeng86@gmail.com>
	<14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
	<C6FECD93-E599-465B-A93A-BD1F2CDFBE9C@hudsonalpha.org>
Message-ID: <C5FD4B85-25B3-4D76-AA99-B3DBE42400C7@drycafe.net>

It's not deprecated. Unless I'm again mixing up something?

	-hilmar

On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote:

>
> On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote:
>
>> The file comes with Bioperl-db, not BioSQL. That is so because it  
>> depends on BioPerl and on Bioperl-db, and so you will need to have  
>> both installed.
>
> Is load_seqdatabase.pl still the best method? I vaguely remember a  
> post that said that load_seqdatabase was deprecated, but I can't  
> find it in the archives.
>
> Mike
>
>>
>> 	-hilmar
>>
>> On Aug 19, 2010, at 4:02 AM, xupeng wrote:
>>
>>> 	I've downloaded the biosql-1.0.1.tar.gz. It works well. But I
>>> can't find the 'load_seqdatabase.pl' when I try to import the
>>> Genbank files into biosql databsase.
>>> 	Can anyone give me a copy of that file?
>>> many thanks !
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> -- 
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
>> ===========================================================
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Michael Muratet, Ph.D.
> Senior Scientist
> HudsonAlpha Institute for Biotechnology
> mmuratet at hudsonalpha.org
> (256) 327-0473 (p)
> (256) 327-0966 (f)
>
> Room 4005
> 601 Genome Way
> Huntsville, Alabama 35806
>
>
>
>
>

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From cjfields at illinois.edu  Thu Aug 19 11:09:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 19 Aug 2010 10:09:13 -0500
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
Message-ID: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>

I think it's worth exploring having a DBIx::Class-based middle-ware approach similar to what Rob Buels has done for Chado.  That would be fairly easy to get started using DBIx::Class::Schema::Loader.  

After that it would require optimization and tweaking, which is potentially more complex than Rob's setup as Chado is very Pg-specific, but maybe Rob can elaborate...

chris

On Aug 19, 2010, at 9:58 AM, Hilmar Lapp wrote:

> Yes, unfortunately they do. The feature for obviating them (namely nested transactions) is there in Pg 8.2+, but Bioperl-db doesn't use them yet ... I have to learn more about Class::DBIx first to decide whether it's better to first implement nested transactions in the home-grown ORM that Bioperl-db in essence is, or whether it's better to reimplement everything in Class::DBIx instead.
> 
> There are new datatypes in Bioperl, and relations in BioSQL that could hold them, and so I need to decide what's the way forward.
> 
> 	-hilmar
> 
> On Aug 19, 2010, at 6:01 AM, Peter wrote:
> 
>> On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp <hlapp at drycafe.net> wrote:
>>> Hi Dan,
>>> 
>>> the casting isn't an issue anymore, I think. (And even if it were, there is
>>> actually a small script that brings back the casts that were built into
>>> 8.2.) Have you found an example where it still is?
>>> 
>>>       -hilmar
>> 
>> Hi Hilmar,
>> 
>> Do the bioperl-db bindings for BioSQL on PostgreSQL still require those
>> extra rules in the schema?
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2839
>> 
>> Peter
> 
> -- 
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Thu Aug 19 11:37:39 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 19 Aug 2010 10:37:39 -0500
Subject: [Bioperl-l] Why I can't find the perl script
	"load_seqdatabase.pl" when use biosql database?
In-Reply-To: <C5FD4B85-25B3-4D76-AA99-B3DBE42400C7@drycafe.net>
References: <201008191602.49068.xupeng86@gmail.com>
	<14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
	<C6FECD93-E599-465B-A93A-BD1F2CDFBE9C@hudsonalpha.org>
	<C5FD4B85-25B3-4D76-AA99-B3DBE42400C7@drycafe.net>
Message-ID: <68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu>

I don't recall this either.  So, can't blame it on lack of coffee :)

chris

On Aug 19, 2010, at 10:30 AM, Hilmar Lapp wrote:

> It's not deprecated. Unless I'm again mixing up something?
> 
> 	-hilmar
> 
> On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote:
> 
>> 
>> On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote:
>> 
>>> The file comes with Bioperl-db, not BioSQL. That is so because it depends on BioPerl and on Bioperl-db, and so you will need to have both installed.
>> 
>> Is load_seqdatabase.pl still the best method? I vaguely remember a post that said that load_seqdatabase was deprecated, but I can't find it in the archives.
>> 
>> Mike
>> 
>>> 
>>> 	-hilmar
>>> 
>>> On Aug 19, 2010, at 4:02 AM, xupeng wrote:
>>> 
>>>> 	I've downloaded the biosql-1.0.1.tar.gz. It works well. But I
>>>> can't find the 'load_seqdatabase.pl' when I try to import the
>>>> Genbank files into biosql databsase.
>>>> 	Can anyone give me a copy of that file?
>>>> many thanks !
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>>> -- 
>>> ===========================================================
>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
>>> ===========================================================
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> Michael Muratet, Ph.D.
>> Senior Scientist
>> HudsonAlpha Institute for Biotechnology
>> mmuratet at hudsonalpha.org
>> (256) 327-0473 (p)
>> (256) 327-0966 (f)
>> 
>> Room 4005
>> 601 Genome Way
>> Huntsville, Alabama 35806
>> 
>> 
>> 
>> 
>> 
> 
> -- 
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From mmuratet at hudsonalpha.org  Thu Aug 19 11:40:02 2010
From: mmuratet at hudsonalpha.org (Michael Muratet)
Date: Thu, 19 Aug 2010 10:40:02 -0500
Subject: [Bioperl-l] Why I can't find the perl script
	"load_seqdatabase.pl" when use biosql database?
In-Reply-To: <68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu>
References: <201008191602.49068.xupeng86@gmail.com>
	<14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
	<C6FECD93-E599-465B-A93A-BD1F2CDFBE9C@hudsonalpha.org>
	<C5FD4B85-25B3-4D76-AA99-B3DBE42400C7@drycafe.net>
	<68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu>
Message-ID: <A0AD0D4E-89EC-4FA0-8625-FF0A2EFB5669@hudsonalpha.org>


On Aug 19, 2010, at 10:37 AM, Chris Fields wrote:

> I don't recall this either.  So, can't blame it on lack of coffee :)

Thanks. I'll keep using it!

Mike
>
> chris
>
> On Aug 19, 2010, at 10:30 AM, Hilmar Lapp wrote:
>
>> It's not deprecated. Unless I'm again mixing up something?
>>
>> 	-hilmar
>>
>> On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote:
>>
>>>
>>> On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote:
>>>
>>>> The file comes with Bioperl-db, not BioSQL. That is so because it  
>>>> depends on BioPerl and on Bioperl-db, and so you will need to  
>>>> have both installed.
>>>
>>> Is load_seqdatabase.pl still the best method? I vaguely remember a  
>>> post that said that load_seqdatabase was deprecated, but I can't  
>>> find it in the archives.
>>>
>>> Mike
>>>
>>>>
>>>> 	-hilmar
>>>>
>>>> On Aug 19, 2010, at 4:02 AM, xupeng wrote:
>>>>
>>>>> 	I've downloaded the biosql-1.0.1.tar.gz. It works well. But I
>>>>> can't find the 'load_seqdatabase.pl' when I try to import the
>>>>> Genbank files into biosql databsase.
>>>>> 	Can anyone give me a copy of that file?
>>>>> many thanks !
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> -- 
>>>> ===========================================================
>>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
>>>> ===========================================================
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> Michael Muratet, Ph.D.
>>> Senior Scientist
>>> HudsonAlpha Institute for Biotechnology
>>> mmuratet at hudsonalpha.org
>>> (256) 327-0473 (p)
>>> (256) 327-0966 (f)
>>>
>>> Room 4005
>>> 601 Genome Way
>>> Huntsville, Alabama 35806
>>>
>>>
>>>
>>>
>>>
>>
>> -- 
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
>> ===========================================================
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806


From cjfields at illinois.edu  Thu Aug 19 11:55:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 19 Aug 2010 10:55:54 -0500
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <EA0C23FB-8C2F-4C04-B0E8-4207409916DC@sbc.su.se>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
	<E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
	<83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
	<B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>
	<EA0C23FB-8C2F-4C04-B0E8-4207409916DC@sbc.su.se>
Message-ID: <5611499B-FA63-4A52-8279-99B554418374@illinois.edu>

On Aug 17, 2010, at 8:52 AM, Dave Messina wrote:

>> It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison
> 
> Yep, agreed.
> 
> And such a flag should be named for the non-default behavior, then, like: -ignore_IDs_for_overlaps
> 
> Dave

Probably would just be -ignore_ids as this behavior would have to be consistent across the various Bio::RangeI methods (overlaps, contains, etc).  The params are case-insensitive IIRC, so the _IDs would just be lc().

RangeI doesn't define a seq_id(), though, so we either use can() in RangeI (which is dirtier IMO) or define this in the appropriate class, probably LocationI or SeqFeatureI.
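
Roughly the semantics I have in mind, as a toy sketch. This is not the Bio::RangeI code: features here are plain hashes with made-up seq_id/start/end keys, and the -ignore_ids flag is only the proposal from this thread, not an existing parameter.

```perl
use strict;
use warnings;

# Toy overlap test. By default two ranges on different sequences never
# overlap; passing -ignore_ids compares coordinates only.
sub overlaps {
    my ($f1, $f2, %opt) = @_;

    # Treat parameter names case-insensitively, so -ignore_IDs also works.
    my %arg;
    $arg{ lc($_) } = $opt{$_} for keys %opt;

    return 0 if !$arg{'-ignore_ids'} && $f1->{seq_id} ne $f2->{seq_id};
    return ( $f1->{start} <= $f2->{end} && $f2->{start} <= $f1->{end} ) ? 1 : 0;
}

my $chr1 = { seq_id => 'chr1', start => 100, end => 200 };
my $chr2 = { seq_id => 'chr2', start => 150, end => 250 };

print overlaps($chr1, $chr2), "\n";                     # ids are checked
print overlaps($chr1, $chr2, -ignore_ids => 1), "\n";   # coordinates only
```

The id check has to live somewhere that actually knows about seq_id, which is the reason for putting it in LocationI or SeqFeatureI rather than RangeI.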

chris


From cjfields at illinois.edu  Thu Aug 19 11:56:11 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 19 Aug 2010 10:56:11 -0500
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
	<E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
	<83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
	<B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>
Message-ID: <7CF700A0-C7A0-4BD2-9757-50B693B3B614@illinois.edu>

Makes sense.  

chris

On Aug 17, 2010, at 7:45 AM, Scott Cain wrote:

> Hi Dave and Chris,
> 
> It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison and if somebody is doing the protein space comparison and not getting the the expected results, they'll probably read the docs to find out why. 
> 
> Scott
> 
> --
> Scott Cain, Ph. D.
> scott at scottcain dot net
> Ontario Institute for Cancer Research
> http://gmod.org/
> 216 392 3087 
> 
> Snet from my iPhone.
> 
> On Aug 17, 2010, at 5:06 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
> 
>>> Good point; it's probably the context the methods are used that matters.  So, maybe just a document clarification?
>> 
>> That's always good, but it really doesn't solve the issue you're describing.
>> 
>> I mean, who would expect to get overlaps for features on different chromosomes?
>> 
>> To me, that's a clear violation of reasonable user expectations. You shouldn't have to read the docs about something like that.
>> 
>> So what's the solution for these duelling use cases? I haven't thought about it much, but a first approximation might be to add a -genomic boolean flag that, when true, would do the right thing and check the ID when doing overlaps or other positional comparisons.
>> 
>> (Maybe -genomic is too obscure. Maybe it should be -same_id_for_overlaps or something like that.)
>> 
>> And maybe having to know to set a flag is effectively the same thing as having to read the docs to understand SeqFeature's overlap behavior.
>> 
>> What do the rest of you out there think?
>> 
>> 
>> Dave
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From David.Messina at sbc.su.se  Thu Aug 19 12:54:23 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 19 Aug 2010 18:54:23 +0200
Subject: [Bioperl-l]  Bug? Features with similar ranges,
	different IDs are considered overlapping
References: <83299B71-0F73-440D-A9C5-DC1DA2AFF605@davemessina.com>
Message-ID: <1EFB951F-AEE1-4B2A-9E29-114E40B25D21@sbc.su.se>

[Ccing list for real this time]

On Aug 19, 2010, at 17:55, Chris Fields <cjfields at illinois.edu> wrote:

> Probably would just be -ignore_ids

You're right, that's the way to go. 


> define this in the appropriate class, probably LocationI or 

Yep, that's cleaner.

Thanks!


Dave


From cjfields1 at gmail.com  Thu Aug 19 13:20:32 2010
From: cjfields1 at gmail.com (Christopher Fields)
Date: Thu, 19 Aug 2010 12:20:32 -0500
Subject: [Bioperl-l] Could I install BioPerl on Windows with the
	ActivePerl 5.12.1?
In-Reply-To: <AANLkTimBPL6Sr2kmg+f0t1j8pk_9nBAoqubKzY4AJoxo@mail.gmail.com>
References: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>
	<78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com>
	<AANLkTimBPL6Sr2kmg+f0t1j8pk_9nBAoqubKzY4AJoxo@mail.gmail.com>
Message-ID: <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com>

cc'ing list.  Looks like the BioPerl PPM is possibly broken for perl 5.12.  Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling...

chris

On Aug 19, 2010, at 11:29 AM, han sun wrote:

> v5.10 works,thanks.
> 
> 2010/8/19 Christopher Fields <cjfields1 at gmail.com>
> Try using ActivePerl 5.10 instead of v5.12.  It's very possible the PPM won't work for v5.12 yet.
> 
> chris
> 
> On Aug 19, 2010, at 9:25 AM, han sun wrote:
> 
> > Hello everyone,
> >
> > I have used perl for several months,and I now want to feel the power of
> > bioperl.
> > But it seems that the installing is more difficult than I thought.
> >
> > I typed the commands.
> >
> >
> >
> > install-shell
> >
> >
> > rep add bioperl http://bioperl.org/DIST
> >
> >
> > rep add uwinnipeg
> > http://cpan.uwinnipeg.ca/PPMPackages/12xx/<http://cpan.uwinnipeg.ca/PPMPackages/10xx/>
> >
> >
> > rep add trouchelle http://trouchelle.com/ppm12/
> >
> > install BioPerl
> >
> > However,the installing failed,
> >
> > ppm install failed:
> > Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
> > Can't find any package that provides PostScript::TextBlock for
> > Bundle-BioPerl-Core
> > Can't find any package that provides Ace:: for Bundle-BioPerl-Core
> > Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
> > Can't find any package that provides Convert::Binary::C for
> > Bundle-BioPerl-Core
> > Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
> > Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
> > Can't find any package that provides IPC::Run for GraphViz
> > Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
> > Can't find any package that provides List-MoreUtils for Moose
> > Can't find any package that provides List-MoreUtils for Class-MOP
> >
> >
> > then I tried
> >
> > install http://www.bribes.org/perl/ppm/GD.ppd
> >
> > and tried the installation again,but it still didn't help.
> >
> > *
> > *
> > *
> > *
> > *
> > *
> >
> >
> > *Do you konw what's wrong with the problem?*
> > *
> > *
> > *
> > *
> > *Please help me,thanks very much.*
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 


From rmb32 at cornell.edu  Thu Aug 19 13:09:45 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 19 Aug 2010 10:09:45 -0700
Subject: [Bioperl-l] reminder: Aug 25 deadline for GMOD Hackathon application
Message-ID: <4C6D6559.3080809@cornell.edu>

Hi all,

This is your one-week reminder: the deadline for open applications to 
the GMOD Evo hackathon is Wednesday, August 25th.

Rob

========================================

We are seeking participants for the GMOD Tools for Evolutionary Biology
Hackathon, held November 8-12, 2010 at the US National Evolutionary
Synthesis Center (NESCent) in Durham, NC.

This hackathon targets three critical gaps in the capabilities of the
GMOD toolbox that currently limit its utility for evolutionary research:

  1. Visualization of comparative genomics data
  2. Visualization of phylogenetic data and trees
  3. Support for population diversity and phenotype data

If you are interested in these areas and have relevant expertise, you
are strongly encouraged to apply. Relevant areas of expertise include
more than just software development: if you are a GMOD power user,
visualization guru, domain expert (comparative, phylogenetics,
population, ...), or documentation wizard, then your skills are needed!

How To Apply:

Fill out the online application form at http://bit.ly/gmodevohack.
Applications are due August 25.

About GMOD:

GMOD is an intercompatible suite of open-source software components for
storing, managing, analyzing, and visualizing genome-scale data. GMOD
includes many widely-used software components: GBrowse and JBrowse, both
genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a
generic and modular database schema; CMap, a comparative map viewer; as
well as many other components including Apollo, MAKER, BioMart,
InterMine, and Galaxy. We hope to extend the functionality of existing
GMOD components, and integrate new components as well.

About Hackathons:

A hackathon is an intense event at which a group of programmers with
different backgrounds and skills collaborate hands-on and face-to-face
to develop working code that is of utility to the community as a whole.
The mix of people will include domain experts and computer-savvy end-users.

More details about the event, its motivation, organization, procedures,
and attendees, as well as URLs to the hackathon and related websites are
included below.

Sincerely,

The GMOD EvoHack Organizing Committee (and project affiliations as
relevant):

Nicole Washington, Chair (LBNL, modENCODE, Phenote)

Robert Buels (SGN, Chado NatDiv)

Scott Cain (OICR, GMOD)

Dave Clements (NESCent, GMOD)

Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv)

Sheldon McKay (University of Arizona, iPlant, GBrowse_syn)


-----------------------------

About the GMOD Evo Hackathon

Overview

We are organizing a hackathon to fill critical gaps in the capabilities
of the Generic Model Organism Database (GMOD) toolbox that currently
limit its utility for evolutionary research. Specifically, we will focus
on tools for

   1) viewing comparative genomics data;
   2) visualizing phylogenomic data; and
   3) supporting population diversity data and phenotype annotation.

The event will be hosted at NESCent and bring together a group of about
20+ software developers, end-user representatives, and documentation
experts who would otherwise not meet. The participants will include key
developers of GMOD components that currently lack features critical for
emerging evolutionary biology research, developers of informatics tools
in evolutionary research that lack GMOD integration, and
informatics-savvy biologists who can represent end-user requirements.

The event will provide a unique opportunity to infuse the GMOD developer
community with a heightened awareness of unmet needs in evolutionary
biology that GMOD components have the potential to fill, and for tool
developers in evolutionary biology to better understand how best to
extend or integrate with already existing GMOD components.

Before the Event

Discussion of ideas and sometimes even design actually starts well
before the hackathon, on mailing lists, wiki pages, and conference calls
set up among accepted attendees.  This advance work lays the foundation
for participants to be productive from the very first day.  This also
means that participants should be willing to contribute some time in
advance of the hackathon itself to participate in this preparatory
discussion.

During the Event

Typically, hackathon participants use the morning of the first day of
the event to organize themselves into working groups of between 3 and 6
people, each with a focused implementation objective.  Ideas and
objectives are discussed, and attendees coalesce around the projects in
which they have the most experience or interest.


Deliverables / Event Results

The meeting's attendance, working groups, and outcomes will be fully
logged and documented on the GMOD wiki (http://gmod.org). Each working
group during the event will typically have its own wiki page, linked
from the main EvoHack page, where it documents its minutes and design
notes, and provides links to the code and documentation it produces.
Also, since GMOD and NESCent are both committed to open source
principles, all code and documentation produced by participants during
the hackathon must be published under an OSI-approved open source
license. As contributions to existing GMOD tools, all hackathon products
will most likely satisfy this requirement automatically.

NESCent

This event is sponsored by the US National Evolutionary Synthesis Center
(NESCent, http://www.nescent.org) through its Informatics Whitepapers
program (http://www.nescent.org/informatics/whitepapers.php). NESCent
promotes the synthesis of information, concepts and knowledge to address
significant, emerging, or novel questions in evolutionary science and
its applications. NESCent achieves this by supporting research and
education across disciplinary, institutional, geographic, and
demographic boundaries (see http://www.nescent.org/science/proposals.php).

Links

Main GMOD EvoHack page, and full proposal:
http://gmod.org/wiki/GMOD_Evo_Hackathon

NESCent: http://www.nescent.org/
GMOD: http://gmod.org/
Similar past NESCent events, see: http://hackathon.nescent.org/
GMOD hackathon application:  http://bit.ly/gmodevohack

-- 
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/GMOD_Europe_2010
http://gmod.org/wiki/Help_Desk_Feedback


From David.Messina at sbc.su.se  Thu Aug 19 14:55:50 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 19 Aug 2010 20:55:50 +0200
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq
	question
In-Reply-To: <4C6D7123.9080908@bcm.tmc.edu>
References: <4C6C3259.4060304@bcm.tmc.edu>
	<E8F0F7A7-BC33-4E37-8AAB-75A9470E82A5@sbc.su.se>
	<4C6D7123.9080908@bcm.tmc.edu>
Message-ID: <4E977318-05AC-4D8E-9A39-8C07A2419198@sbc.su.se>


Glad I could help, Caleb.

Dave


On Aug 19, 2010, at 20:00, Caleb Davis <cdavis at bcm.tmc.edu> wrote:

> Hi Dave,
> 
> Thank you so much for your detailed response! Fixing the reward parameter replicated the online result for me.  All of the other factors you brought up will help me track down any future problems. Thanks again.
> 
> --Caleb
> 


From rmb32 at cornell.edu  Thu Aug 19 18:19:11 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 19 Aug 2010 15:19:11 -0700
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
Message-ID: <4C6DADDF.1000103@cornell.edu>

Chris Fields wrote:
> I think it's worth exploring having a DBIx::Class-based middle-ware approach similar to what Rob Buels has done for Chado.  That would be fairly easy to get started using DBIx::Class::Schema::Loader.
> 
> After that it would require optimization and tweaking, which is potentially more complex than Rob's setup as Chado is very Pg-specific, but maybe Rob can elaborate...

Elaborating on how Bio::Chado::Schema is developed:

The vast majority of the code and POD in BCS is autogenerated by 
DBIx::Class::Schema::Loader.  DBICSL gives you a baseline set of 
DBIx::Class classes that covers all the tables, views, columns, unique 
constraints, and foreign key relationships.

Beyond that, you have to add on yourself.  In BCS, we have mostly done 
things like:

   * make better-named aliases for some of the autogenerated
     relationships (though DBICSL does a surprisingly good job of naming
     relationships automatically most of the time)
   * add a tiny bit of bioperl compatibility (this needs a lot more work
     by somebody, volunteers needed!)
   * add convenience methods for using some of the Chado property tables
   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
     traversing phylogenetic tree relationships

Regarding DB backend specificity, BCS isn't Pg-specific at all, because 
DBIx::Class itself goes to great lengths to be compatible (and 
performant!) with just about every relational database out there.  In 
fact, the BCS test suite deploys a Chado schema into a temporary SQLite 
database using DBIC::Schema's deploy() method, and runs all of its tests 
on that.  Very handy.

Chado's Pg-specific server-side functions can of course be called 
through BCS if they are present, but it's perfectly possible to use 
Chado without any of the server-side functions, which is mostly how I use it.
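
For anyone who wants to try this, a minimal sketch of the SQLite 
deployment described above might look like the following. It assumes 
Bio::Chado::Schema, DBD::SQLite and SQL::Translator are installed. The 
organism row is made-up example data, and the 'Organism::Organism' 
source name is an assumption that should be checked against the 
installed BCS version.

```perl
use strict;
use warnings;

use Bio::Chado::Schema;

# Connect to a throwaway in-memory SQLite database.
my $schema = Bio::Chado::Schema->connect('dbi:SQLite:dbname=:memory:');

# Create the Chado tables from the DBIx::Class definitions.
# deploy() relies on SQL::Translator being installed.
$schema->deploy;

# Insert one row through the generated classes and count it back.
# The organism below is hypothetical example data.
$schema->resultset('Organism::Organism')->create({
    genus   => 'Solanum',
    species => 'lycopersicum',
});
print $schema->resultset('Organism::Organism')->count, "\n";
```

Nothing here touches Pg-specific server-side functions, so the same 
code should in principle run against other backends by changing only 
the DSN.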

Rob


From David.Messina at sbc.su.se  Fri Aug 20 05:19:14 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 20 Aug 2010 11:19:14 +0200
Subject: [Bioperl-l] Git for the lazy
Message-ID: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se>

Hi everyone,

If you're like me and still getting up to speed with Git, you might find this helpful:

	http://www.spheredev.org/wiki/Git_for_the_lazy


Dave


From bgs500 at york.ac.uk  Fri Aug 20 09:07:50 2010
From: bgs500 at york.ac.uk (Ben Saville)
Date: Fri, 20 Aug 2010 14:07:50 +0100
Subject: [Bioperl-l] Problem Parsing BLAST output
Message-ID: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>

Hi Everyone,

I'm very much new to the world of sequence data analysis (and this  
mailing list!), and have reached a roadblock.

I have BLASTed some contigs against a series of databases that I  
created. From this I would like to parse through the data and separate  
it before extracting the information of interest at a later point. I  
would like to separate the data by query ID. I found the following  
Bioperl script;

#!/usr/bin/perl

use Bio::Search::Result::BlastResult;
use Bio::SearchIO;

my $report = Bio::SearchIO->new( -file=>'All_BCM_results.bls', -format  
=> blast);
my $result = $report->next_result;
my %hits_by_query;
while (my $hit = $result->next_hit) {
   push @{$hits_by_query{$hit->name}}, $hit;
}

foreach my $qid ( keys %hits_by_query ) {
   my $result = Bio::Search::Result::BlastResult->new();
   $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
   my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", - 
format=>'blast' );
   $blio->write_result($result);
}

running this script resulted in the following error;

BlastResult::new(): Not adding iterations.

------------- EXCEPTION: Bio::Root::NoSuchThing -------------
MSG: No such iteration number: 0. Valid range=1-0
VALUE: The number zero (0)
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Search::Result::BlastResult::iteration /sw/lib/perl5/5.8.8/ 
Bio/Search/Result/BlastResult.pm:328
STACK: Bio::Search::Result::BlastResult::add_hit /sw/lib/perl5/5.8.8/ 
Bio/Search/Result/BlastResult.pm:258
STACK: /Users/bsaville/Desktop/Parsing_BLAST_by_query.pl:15
-------------------------------------------------------------

So I added a 1 to the constructor call on the line shown above, as the  
error told me this was within the valid range:
my $result = Bio::Search::Result::BlastResult->new(1);
This produced the following error;

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Must define arrayref of Iterations when initializing a  
Bio::Search::Result::BlastResult

STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Search::Result::BlastResult::new /sw/lib/perl5/5.8.8/Bio/ 
Search/Result/BlastResult.pm:128
STACK: /Users/bsaville/Desktop/Parsing_BLAST_by_query.pl:14
-----------------------------------------------------------

I know that it is my inexperience that is causing this problem, but I  
really can't figure this out.

Regards
Ben Saville


From David.Messina at sbc.su.se  Fri Aug 20 09:48:28 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 20 Aug 2010 15:48:28 +0200
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
Message-ID: <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>

Hi Ben,

I would not use the script you posted; I don't think it does what you want.

If you haven't already, you should take a look at the beginners' HOWTO

	http://www.bioperl.org/wiki/HOWTO:Beginners


the SearchIO HOWTO

	http://www.bioperl.org/wiki/HOWTO:SearchIO


and the example scripts included with BioPerl:

	http://www.bioperl.org/wiki/Scripts


Incidentally, it's a lot of fiddly data processing to parse blast reports for many contigs against multiple databases and then go back and collate the results by query. I'm not sure exactly what you want to do once you've separated by query. If you provide some more information, we could suggest ways to best get you where you want to go.

I will mention, though, that BLAST has the ability to search multiple separate databases in one go and collate the results for you. So that's something to consider.
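
For the separate-by-query step itself, a minimal sketch along the lines of the SearchIO HOWTO could look like this. It assumes the report is plain-text BLAST output and reuses the file name from the original post. The file-name clean-up and the ".bls" output naming are illustrative choices, not anything BioPerl requires.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Bio::SearchIO;
use Bio::SearchIO::Writer::TextResultWriter;

# Input file name taken from the original post; adjust as needed.
my $in = Bio::SearchIO->new(
    -file   => 'All_BCM_results.bls',
    -format => 'blast',
);
my $writer = Bio::SearchIO::Writer::TextResultWriter->new();

# Each call to next_result returns the report for one query sequence,
# so looping over results (not hits) is what separates the data by query.
while ( my $result = $in->next_result ) {
    # Make the query name safe to use as a file name (illustrative choice).
    ( my $safe = $result->query_name ) =~ s/[^\w.-]/_/g;

    my $out = Bio::SearchIO->new(
        -writer => $writer,
        -file   => ">$safe.bls",
    );
    $out->write_result($result);
}
```

Two caveats: the writer produces a BLAST-like text report rather than a byte-for-byte copy of the input, and if the same query name turns up more than once (for example, one report per database) a later result overwrites the earlier file unless the output is opened for appending instead.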


Dave


From bernd.web at gmail.com  Fri Aug 20 11:17:05 2010
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 20 Aug 2010 17:17:05 +0200
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
In-Reply-To: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
References: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
Message-ID: <AANLkTim2MyJ1XKmvYHr+8gX-j9h9z81==e5suTW09PWs@mail.gmail.com>

Hi Yin,

I am not quite sure if the following is also related to your gapped
length issue but I found I had to adapt the calculation of
ungapped_len in   Bio::LocatableSeq. If my slices did not contain any
letters or a new gap char I used, SimpleAlign could not find the
sequences when outputting the alignment. This was due to a difference
in length calculation:

SimpleAlign: uses \W:  $slice_seq =~ s/\W//g;
Bio::LocatableSeq::ungapped_len uses  "$string =~ s/[\.\-]+//g;"

I had to include '~' (for my local sequences) in the ungapped_len;
otherwise I would run into the end issues with SimpleAlign.


Kind regards,
Bernd


On Fri, Aug 13, 2010 at 3:36 PM, Jun Yin <jun.yin at ucd.ie> wrote:
> Hi, all,
>
>
>
> I am the google summer of code student working on Bio::Align subsystem
> refactoring. The code (Bio::SimpleAlign) I re-implemented now has passed
> nearly all the test, except a few tests on seq/start-end testing. But here
> comes a problem. This may be an old issue, that the Bio::LocatableSeq end
> assignment and checking are inconsistent.
>
>
>
> The current end checking method is based on:
>
> $end=$seq->_ungapped_len+$seq->start-1
>
> However, this checking may not fit the real world case.
>
>
>
> The inconsistency usually happens when a few columns of the sequence are
> removed.
>
>
>
> For example:
>
> my $a = Bio::LocatableSeq->new(
>     -id     => 'a',
>     -strand => 1,
>     -seq    => '-tcgatc-atcgatcg',
>     -start  => 30,
>     -end    => 43
> );
>
>
>
> If we remove the 1st, 8th and the last columns
>
>
>
> $a->seq() will be 'tcgatcatcgatc'
>
> $a->_ungapped_len==12
>
>
>
> Actually, in the real world, the first residue will still be 30 (the old
> $seq->start), and the last residue is the residue before the 43 (the old
> $seq->end), thus 42.
>
>
>
> But if you call a validation, the calculation is
> $a->_ungapped_len+$a->start-1=12+30-1=41
>
> So the reassignment of the $seq->end will not pass the validation.
>
>
>
> So unless you save the information to a new sequence object, the original
> position information will be lost anyway. But in some cases, we have to
> change the sequence in its original sequence object ..
>
>
>
> What is your suggestion on this issue?
>
> A. pass the test and lose the information  # convenient in coding but the
> start-end annotation is not right any more
>
> B. keep the information and forget the test  # the object will still
> remember where the last residue was in the original sequence. But is it
> really meaningful at all? Because all the other residues may come from
> nowhere
>
> C. Neither of above #any other suggestions?
>
>
>
> Cheers,
>
> Jun Yin
>
> Ph.D. student in U.C.D.
>
>
>
> Bioinformatics Laboratory
>
> Conway Institute
>
> University College Dublin
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From sidd.basu at gmail.com  Fri Aug 20 11:59:59 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Fri, 20 Aug 2010 10:59:59 -0500
Subject: [Bioperl-l]  Re: bioperl-db and postgres8.3 - status query
In-Reply-To: <4C6DADDF.1000103@cornell.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
	<4C6DADDF.1000103@cornell.edu>
Message-ID: <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>

Hi, 

On Thu, 19 Aug 2010, Robert Buels wrote:

> Chris Fields wrote:
> > I think it's worth exploring having a DBIx::Class-based middle-ware 
> > approach similar to what Rob Buels has done for Chado.  That would be 
> > fairly easy to get started using DBIx::Class::Schema::Loader.
> > After that it would require optimization and tweaking, which is 
> > potentially more complex than Rob's setup as Chado is very Pg-specific, 
> > but maybe Rob can elaborate...
>
> Elaborating on how Bio::Chado::Schema is developed:
>
> The vast majority of the code and POD in BCS is autogenerated by 
> DBIx::Class::Schema::Loader.  DBICSL gives you a baseline set of 
> DBIx::Class classes that covers all the tables, views, columns, unique 
> constraints, and foreign key relationships.
>
> Beyond that, you have to add on yourself.  In BCS, we have mostly done 
> things like:
>
>   * make better-named aliases for some of the autogenerated
>     relationships (though DBICSL does a surprisingly good job of naming
>     relationships automatically most of the time)
>   * add a tiny bit of bioperl compatibility (this needs a lot more work
>     by somebody, volunteers needed!)
>   * add convenience methods for using some of the Chado property tables
>   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
>     traversing phylogenetic tree relationships
>
> Regarding DB backend specificity, BCS isn't Pg-specific at all, because 
> DBIx::Class itself goes to great lengths to be compatible (and performant!) 
> with just about every relational database out there.  
I would vouch for that, at least as far as Chado on Oracle is concerned.
So far, BCS works flawlessly with our Oracle Chado instance at
dictyBase. Quite a chunk of BCS-based code is also active in a couple of
our Mojo-based webapps. The one part I still couldn't use directly is
the 'synonym' table, as its name clashes with an Oracle reserved keyword.
However, overall it seems to be quite cross-RDBMS compatible, and I highly
recommend it.

-siddhartha


>In fact, the BCS test 
> suite deploys a Chado schema into a temporary SQLite database using 
> DBIC::Schema's deploy() method, and runs all of its tests on that.  Very 
> handy.
>
> Chado's Pg-specific server-side functions can of course be called through 
> BCS if they are present, but it's perfectly possible to use Chado without 
> any of the server-side functions, and mostly the way I use it.
>
> Rob
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jun.yin at ucd.ie  Fri Aug 20 12:17:33 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 20 Aug 2010 17:17:33 +0100
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
In-Reply-To: <AANLkTim2MyJ1XKmvYHr+8gX-j9h9z81==e5suTW09PWs@mail.gmail.com>
References: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
	<AANLkTim2MyJ1XKmvYHr+8gX-j9h9z81==e5suTW09PWs@mail.gmail.com>
Message-ID: <000b01cb4083$31f98280$95ec8780$%yin@ucd.ie>

Hi, Bernd,

Thx for your input. 

Yes, this is one of the old bugs in Bio::SimpleAlign.  $aln->slice simply
uses $slice_seq =~ s/\W//g to calculate the ungapped length, but
$seq->_ungapped_len uses
$string =~ s{[$GAP_SYMBOLS$FRAMESHIFT_SYMBOLS]+}{}g;
that is, the character set '\-\.=~\\\/', to calculate the ungapped length.

To solve this problem, first, I now use 
$nonres = join("",$self->gap_char, $self->match_char,$self->missing_char);
which is '-\.&', to remove the non-residue chars in the alignment sequence
(though using '=', '~', '\' or '/' will still cause problems).

Secondly, I have merged slice, remove_columns and remove_gaps, using the
same internal function. Thus it is easier to debug.

These changes will be merged into the main BioPerl branch after the next release.

But anyway, the conflict is still there, because the non-residue chars are
defined as:
In Bio::SimpleAlign, $aln->gap_char, $aln->missing_char, $aln->match_char
In Bio::LocatableSeq   
$GAP_SYMBOLS = '\-\.=~';
$FRAMESHIFT_SYMBOLS = '\\\/';

So for the moment, try to use '-' or '.' as your gap char; otherwise you may
encounter end warnings in the calculation.
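
To make the mismatch concrete, here is a minimal core-Perl sketch (no
BioPerl needed) comparing the two length calculations. The two symbol
strings are copied from above; the helper names len_nonword and
len_symbols and the test sequences are made up for illustration.

```perl
use strict;
use warnings;

# Symbol sets as quoted above for Bio::LocatableSeq.
my $GAP_SYMBOLS        = '\-\.=~';
my $FRAMESHIFT_SYMBOLS = '\\\/';

# Length after stripping every non-word character (the \W approach).
sub len_nonword {
    my ($seq) = @_;
    ( my $copy = $seq ) =~ s/\W//g;
    return length $copy;
}

# Length after stripping only the listed gap and frameshift symbols.
sub len_symbols {
    my ($seq) = @_;
    ( my $copy = $seq ) =~ s{[$GAP_SYMBOLS$FRAMESHIFT_SYMBOLS]+}{}g;
    return length $copy;
}

# Hypothetical test sequences: common gap chars, listed symbols, other chars.
for my $seq ( 'ac-gt.a', 'ac~gt=a', 'ac&gt?a' ) {
    printf "%-8s nonword=%d symbols=%d\n",
        $seq, len_nonword($seq), len_symbols($seq);
}
```

The first two sequences give the same length either way. For the third,
the \W version strips '&' and '?' while the symbol-list version keeps
them, so the two lengths disagree (5 vs 7), and any end position derived
from one will not validate against the other.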

And if you want to keep gap-only sequences, you can call the method as:
$aln2 = $aln->slice(20,30,1)
The last parameter tells slice to keep gap-only sequences.

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin



From cjfields at illinois.edu  Fri Aug 20 12:23:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 20 Aug 2010 11:23:07 -0500
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
	<4C6DADDF.1000103@cornell.edu>
	<20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
Message-ID: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>

On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote:
> Hi, 
> 
> On Thu, 19 Aug 2010, Robert Buels wrote:
> 
> > Chris Fields wrote:
> > > I think it's worth exploring having a DBIx::Class-based middle-ware 
> > > approach similar to what Rob Buels has done for Chado.  That would be 
> > > fairly easy to get started using DBIx::Class::Schema::Loader.
> > > After that it would require optimization and tweaking, which is 
> > > potentially more complex than Rob's setup as Chado is very Pg-specific, 
> > > but maybe Rob can elaborate...
> >
> > Elaborating on how Bio::Chado::Schema is developed:
> >
> > The vast majority of the code and POD in BCS is autogenerated by 
> > DBIx::Class::Schema::Loader.  DBICSL gives you a baseline set of 
> > DBIx::Class classes that covers all the tables, views, columns, unique 
> > constraints, and foreign key relationships.
> >
> > Beyond that, you have to add on yourself.  In BCS, we have mostly done 
> > things like:
> >
> >   * make better-named aliases for some of the autogenerated
> >     relationships (though DBICSL does a surprisingly good job of naming
> >     relationships automatically most of the time)
> >   * add a tiny bit of bioperl compatibility (this needs a lot more work
> >     by somebody, volunteers needed!)
> >   * add convenience methods for using some of the Chado property tables
> >   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
> >     traversing phylogenetic tree relationships
> >
> > Regarding DB backend specificity, BCS isn't Pg-specific at all, because 
> > DBIx::Class itself goes to great lengths to be compatible (and performant!) 
> > with just about every relational database out there.  
> I would vouch for that at least as far as chado in oracle is concerned.
> So,  far BCS works out flawlessly with our oracle chado instance at
> dictybase. Quite a chunk of BCS based code is also active in couple of
> our Mojo based webapps. The part which i still couldn't use directly is
> the 'synonym' table as it clashes with oracle specific reserved keywords. 
> However,  overall it seems to quite cross-RDMS compatible and highly
> recommended.
> 
> -siddhartha

Just to point out, I didn't say BCS is Pg-specific, but that Chado is
(that was the DBMS it was designed for).  Maybe that should be amended
to 'was' now :)

I recall seeing a page on this somewhere on the GMOD website along the
lines of "MySQL has problems so we chose Pg", and that Chado support
would focus on Pg.  I'm guessing that's no longer the case?  Or is only
the server-side stuff Pg-specific?

> >In fact, the BCS test 
> > suite deploys a Chado schema into a temporary SQLite database using 
> > DBIC::Schema's deploy() method, and runs all of its tests on that.  Very 
> > handy.
> >
> > Chado's Pg-specific server-side functions can of course be called through 
> > BCS if they are present, but it's perfectly possible to use Chado without 
> > any of the server-side functions, and mostly the way I use it.
> >
> > Rob

I think this opens up the possibility of starting a DBIx::Class-based
middleware solution.  Hilmar, did you want to take that on?

chris


From sidd.basu at gmail.com  Fri Aug 20 13:39:44 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Fri, 20 Aug 2010 12:39:44 -0500
Subject: [Bioperl-l]  Re: bioperl-db and postgres8.3 - status query
In-Reply-To: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
	<4C6DADDF.1000103@cornell.edu>
	<20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
	<1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
Message-ID: <20100820173942.GC400@vpn-165-124-164-118.vpn.northwestern.edu>

On Fri, 20 Aug 2010, Chris Fields wrote:

> On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote:
> > Hi, 
> > 
> > On Thu, 19 Aug 2010, Robert Buels wrote:
> > 
> > > Chris Fields wrote:
> > > > I think it's worth exploring having a DBIx::Class-based middle-ware 
> > > > approach similar to what Rob Buels has done for Chado.  That would be 
> > > > fairly easy to get started using DBIx::Class::Schema::Loader.
> > > > After that it would require optimization and tweaking, which is 
> > > > potentially more complex than Rob's setup as Chado is very Pg-specific, 
> > > > but maybe Rob can elaborate...
> > >
> > > Elaborating on how Bio::Chado::Schema is developed:
> > >
> > > The vast majority of the code and POD in BCS is autogenerated by 
> > > DBIx::Class::Schema::Loader.  DBICSL gives you a baseline set of 
> > > DBIx::Class classes that covers all the tables, views, columns, unique 
> > > constraints, and foreign key relationships.
> > >
> > > Beyond that, you have to add on yourself.  In BCS, we have mostly done 
> > > things like:
> > >
> > >   * make better-named aliases for some of the autogenerated
> > >     relationships (though DBICSL does a surprisingly good job of naming
> > >     relationships automatically most of the time)
> > >   * add a tiny bit of bioperl compatibility (this needs a lot more work
> > >     by somebody, volunteers needed!)
> > >   * add convenience methods for using some of the Chado property tables
> > >   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
> > >     traversing phylogenetic tree relationships
> > >
> > > Regarding DB backend specificity, BCS isn't Pg-specific at all, because 
> > > DBIx::Class itself goes to great lengths to be compatible (and performant!) 
> > > with just about every relational database out there.  
> > I would vouch for that, at least as far as Chado in Oracle is concerned.
> > So far, BCS works flawlessly with our Oracle Chado instance at
> > dictyBase. Quite a chunk of BCS-based code is also active in a couple of
> > our Mojo-based webapps. The part which I still couldn't use directly is
> > the 'synonym' table, as it clashes with Oracle-specific reserved keywords.
> > However, overall it seems to be quite cross-RDBMS compatible and is highly
> > recommended.
> > 
> > -siddhartha
> 
> Just to point out, I didn't say BCS is Pg-specific, but that Chado is
> (that was the DBMS it was designed for).  Maybe that should be amended
> to 'was' now :)
> 
> I recall seeing a page on this somewhere on the GMOD website along the
> lines of "MySQL has problems so we chose Pg", and that Chado support
> would focus on Pg.  
As far as I understand, GMOD strongly recommends Pg, and it is the most
popular backend for Chado. However, my point was that if anybody wants to use
or try out the Chado schema on a different backend, or has an existing setup,
tools like DBIx::Class, and BCS in particular, make it much easier to do
so. The code developed on top also becomes quite robust and portable.

-siddhartha 

>I'm guessing that's no longer the case?  Or is only
> the server-side stuff Pg-specific.
> 
> > >In fact, the BCS test 
> > > suite deploys a Chado schema into a temporary SQLite database using 
> > > DBIC::Schema's deploy() method, and runs all of its tests on that.  Very 
> > > handy.
> > >
> > > Chado's Pg-specific server-side functions can of course be called through 
> > > BCS if they are present, but it's perfectly possible to use Chado without 
> > > any of the server-side functions, and mostly the way I use it.
> > >
> > > Rob
> 
> I think this opens up the possibility of starting a DBIx::Class-based
> middleware solution.  Hilmar, did you want to take that on?
> 
> chris
> 
> 


From buiduyminh at gmail.com  Fri Aug 20 17:29:00 2010
From: buiduyminh at gmail.com (Minh Bui)
Date: Fri, 20 Aug 2010 17:29:00 -0400
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
Message-ID: <AANLkTinsyOMPJxpks_pqMwLpW8gx0VRihhJsLDnF53mu@mail.gmail.com>

Hi,
I am trying to load my GFF file into a MySQL database, but I got this error
when I ran bp_seqfeature_load.pl (BioPerl 1.6.1 on a Mac):

[BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
contains: /sw/lib/perl5 /sw/lib/perl5/darwin
/System/Library/Perl/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/5.8.6
/Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
/Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
/Network/Library/Perl/5.8.6 /Network/Library/Perl
/System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
line 3.
Perhaps the DBD::mysql perl module hasn't been fully installed,
or perhaps the capitalisation of 'mysql' isn't right.
Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
 at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212

I am using Mac OS X version 10.4.10 and MAMP. Isn't
"/Library/Perl/5.8.6" already in @INC? What am I missing?
I have been googling this error for a few hours. I also installed
BioPerl and reinstalled DBD::mysql using CPAN. It still doesn't work.

Here is my $PERL5LIB:  /sw/lib/perl5:/sw/lib/perl5/darwin/

I really need help on this.
Thank you,


From awitney at sgul.ac.uk  Sat Aug 21 06:39:10 2010
From: awitney at sgul.ac.uk (Adam Witney)
Date: Sat, 21 Aug 2010 11:39:10 +0100
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
In-Reply-To: <AANLkTinsyOMPJxpks_pqMwLpW8gx0VRihhJsLDnF53mu@mail.gmail.com>
References: <AANLkTinsyOMPJxpks_pqMwLpW8gx0VRihhJsLDnF53mu@mail.gmail.com>
Message-ID: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>


On 20 Aug 2010, at 22:29, Minh Bui wrote:

> Hi,,
> I am trying to load my GFF file to mysql database but I got this error
> when I ran the bp_seqfeature_load.pl ( bioperl 1.6.1 on  MAC)
> 
> [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
> install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
> contains: /sw/lib/perl5 /sw/lib/perl5/darwin
> /System/Library/Perl/5.8.6/darwin-thread-multi-2level
> /System/Library/Perl/5.8.6
> /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
> /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
> /Network/Library/Perl/5.8.6 /Network/Library/Perl
> /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
> /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
> line 3.
> Perhaps the DBD::mysql perl module hasn't been fully installed,
> or perhaps the capitalisation of 'mysql' isn't right.
> Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
> at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212
> 
> I am using MAC OSX version 10.4.10 and MAMP? Isnt it the
> "/Library/Perl/5.8.6" already in @INC? What am I missing?
> I have been googling this error for a few hours. I also install
> Bioperl and reinstall DBD::mysql using CPAN. It still doesnt work..
> 
> Here is my $PERL5LIB:  /sw/lib/perl5:/sw/lib/perl5/darwin/


Where did DBD::mysql get installed? Can you verify that DBD/mysql.pm is actually in one of those directories listed above?
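
One way to check this is to ask the perl that runs bp_seqfeature_load.pl which of its @INC directories, if any, contains DBD/mysql.pm. The sketch below is only an illustration and not part of BioPerl or DBI; find_module_in is a helper name made up here. It assumes you run it with the same interpreter as the failing script, because a CPAN shell started from a different perl (for example one under /sw) may have installed the driver somewhere this perl does not look.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return every directory in @dirs that contains
# the module file given as a relative path such as "DBD/mysql.pm".
sub find_module_in {
    my ($relpath, @dirs) = @_;
    my @parts = split m{/}, $relpath;
    return grep {
        my $path = File::Spec->catfile($_, @parts);
        -e $path;
    } grep { !ref($_) } @dirs;    # @INC may also hold code references
}

my @hits = find_module_in('DBD/mysql.pm', @INC);
if (@hits) {
    print "DBD/mysql.pm found in: $_\n" for @hits;
}
else {
    # $^X is the path of the perl interpreter running this script
    print "DBD/mysql.pm is not in any \@INC directory of ", $^X, "\n";
}
```

If nothing is found, comparing the interpreter path printed here with the perl used by the CPAN shell is a reasonable next step.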


From i.hatethispart at ymail.com  Sat Aug 21 10:07:28 2010
From: i.hatethispart at ymail.com (keiko)
Date: Sat, 21 Aug 2010 07:07:28 -0700 (PDT)
Subject: [Bioperl-l] clustalw.exe
In-Reply-To: <3612399.post@talk.nabble.com>
References: <3612399.post@talk.nabble.com>
Message-ID: <29499435.post@talk.nabble.com>


Katrin wrote:
> 
> hello, I am a new Perl/Bioperl-User and first I must excuse me for my
> really bad english, but I hope everybody will understand me. I have the
> following problem: In my Perl-skript is the following system call:
> $y=exec("C:\\Programme\\xampp-win32-1.5.1\\xampp\\perl\\clustalw.exe
> C:\\Programme\\xampp-win32-1.5.1\\xampp\\htdocs\\gene\\clustal.fasta"); If
> I call this Script with the Shell (cmd.exe) everything works correctly.
> But if I call this script with PHP I get the following error message:
> Error: unknown option
> /C:\Programme\xampp-win32-1.5.1\xampp\htdocs\gene\clustal.fasta. I tried
> also system and qx. And I tested the environment variables: I wrote a
> bat-file with the definition of all environment-variables and the system
> call, but this did not work, too. The same problem is in php. The
> PHP-Scipt is called from html and I worked under WindowsXP with xampp. I
> hope, somebody can help me. greetings Katrin
> 

Hi. I also have a problem with this one. I want to call clustalw using php.
Can I ask what you included in your bat-file and where did you download your
clustal? thanks a lot!
-- 
View this message in context: http://old.nabble.com/clustalw.exe-tp3612399p29499435.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From jason at bioperl.org  Sun Aug 22 14:29:30 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 22 Aug 2010 11:29:30 -0700
Subject: [Bioperl-l] Enquiry on Bio::DB::Taxonomy
In-Reply-To: <AANLkTik9qpKSQV9dRKzxSrt_q5qq=g6X6eop8LTqkRVm@mail.gmail.com>
References: <AANLkTik9qpKSQV9dRKzxSrt_q5qq=g6X6eop8LTqkRVm@mail.gmail.com>
Message-ID: <4C716C8A.3010000@bioperl.org>

Hi Amali -

This is how I'd print out the full classification by using the Tree 
methods (with probably a different way of initializing the $db object to 
your flatfiles location).

#!/usr/bin/perl -w
use strict;
use Bio::DB::Taxonomy;
use Bio::Tree::Tree;   # loaded explicitly because the tree is built directly below

my $db= Bio::DB::Taxonomy->new(-source => 'flatfile',
                    -nodesfile => 'taxonomy/nodes.dmp',
                    -namesfile => 'taxonomy/names.dmp');

my $taxonid = $db->get_taxonid('Homo sapiens');
my $taxon = $db->get_taxon(-taxonid => $taxonid);
my $tree = Bio::Tree::Tree->new(-node => $taxon);
my @taxa = $tree->get_nodes;
print join(",", map { $_->scientific_name } @taxa), "\n";
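
To report only particular levels such as phylum or class, one option is to filter the nodes by their rank. This is a minimal sketch, assuming each node offers rank() and scientific_name() as Bio::Taxon does; names_by_rank is a made-up helper, and DemoTaxon with its demo lineage is a stand-in used only so the example runs without the taxonomy dump files.

```perl
use strict;
use warnings;

# Hypothetical helper: map rank name => scientific name for the wanted
# ranks. Works on any objects providing rank() and scientific_name().
sub names_by_rank {
    my ($taxa, @wanted) = @_;
    my %want = map { $_ => 1 } @wanted;
    my %found;
    for my $taxon (@$taxa) {
        my $rank = $taxon->rank;
        next unless defined $rank && $want{$rank};
        $found{$rank} = $taxon->scientific_name;
    }
    return \%found;
}

# Stand-in for Bio::Taxon (illustration only, not a BioPerl class).
package DemoTaxon;
sub new             { my ($class, %args) = @_; return bless {%args}, $class }
sub rank            { return $_[0]{rank} }
sub scientific_name { return $_[0]{name} }

package main;

# Illustrative lineage; with BioPerl this would be @taxa from get_nodes.
my @taxa;
for my $pair (
    [ 'phylum',  'Ascomycota' ],
    [ 'class',   'Saccharomycetes' ],
    [ 'no rank', 'saccharomyceta' ],
    [ 'genus',   'Saccharomyces' ],
    [ 'species', 'Saccharomyces cerevisiae' ],
) {
    push @taxa, DemoTaxon->new(rank => $pair->[0], name => $pair->[1]);
}

my @levels = qw(phylum class family genus species);
my $by_rank = names_by_rank(\@taxa, @levels);

# Ranks missing from the lineage are reported as NA.
print join("\t", map { defined $by_rank->{$_} ? $by_rank->{$_} : 'NA' } @levels), "\n";
```

With real data, passing the @taxa array from the script above in place of the demo list should give one name per requested rank, with NA where the lineage has no node of that rank.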

-jason

Amali Thrimawithana wrote, On 8/18/10 3:56 PM:
> Dear Dr Stajich,
>
> I am a Masters student at Auckland university and my research is on
> identifying yeast species present in wine by the use of 454 sequencing. In
> order to carry out this research, a pipeline is being built in which at the
> final step each representative OTU need to be classified at different
> taxonomic levels (ie: at Phylum, family, class, genus and species) by using
> the results from BLAST. To identify the sequences at each taxonomic level, I
> have been trying out the Bio::DB::Taxonomy module in bioperl. Using this
> module, I am able to get the genus and species level by splitting the
> scientific name returned by the Bio::taxon object. But unfortunately I am
> uncertain on how to get the information for the other levels of the rank. I
> have tried several commands including "my @class = $node->classification;",
> but it does not work. Hence, could you please let me know how I might be
> able to get the higher levels of taxonomy such as class and phylum using
> bioperl?
>
> Look forward to hearing from you soon
>
> Thanking You
>
> Amali
>    


From cjfields at illinois.edu  Sun Aug 22 15:56:58 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 22 Aug 2010 14:56:58 -0500
Subject: [Bioperl-l] clustalw.exe
In-Reply-To: <29499435.post@talk.nabble.com>
References: <3612399.post@talk.nabble.com> <29499435.post@talk.nabble.com>
Message-ID: <E6C6AE4B-A6AB-4B90-8D81-74DE14B165BD@illinois.edu>

On Aug 21, 2010, at 9:07 AM, keiko wrote:

> Katrin wrote:
>> 
>> hello, I am a new Perl/Bioperl-User and first I must excuse me for my
>> really bad english, but I hope everybody will understand me. I have the
>> following problem: In my Perl-skript is the following system call:
>> $y=exec("C:\\Programme\\xampp-win32-1.5.1\\xampp\\perl\\clustalw.exe
>> C:\\Programme\\xampp-win32-1.5.1\\xampp\\htdocs\\gene\\clustal.fasta"); If
>> I call this Script with the Shell (cmd.exe) everything works correctly.
>> But if I call this script with PHP I get the following error message:
>> Error: unknown option
>> /C:\Programme\xampp-win32-1.5.1\xampp\htdocs\gene\clustal.fasta. I tried
>> also system and qx. And I tested the environment variables: I wrote a
>> bat-file with the definition of all environment-variables and the system
>> call, but this did not work, too. The same problem is in php. The
>> PHP-Scipt is called from html and I worked under WindowsXP with xampp. I
>> hope, somebody can help me. greetings Katrin
>> 
> 
> Hi. I also have a problem with this one. I want to call clustalw using php.
> Can I ask what you included in your bat-file and where did you download your
> clustal? thanks a lot!

Not sure, but what does this have to do with BioPerl?

chris


From jason at bioperl.org  Mon Aug 23 11:56:47 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 23 Aug 2010 08:56:47 -0700
Subject: [Bioperl-l] a problem when using the Bioperl modules
In-Reply-To: <AANLkTinZYJC6JwP776K3phzbAmtjiKMi_K_VTH=B6oeC@mail.gmail.com>
References: <AANLkTinZYJC6JwP776K3phzbAmtjiKMi_K_VTH=B6oeC@mail.gmail.com>
Message-ID: <4C729A3F.7080304@bioperl.org>

Wei -

Please ask your questions on the bioperl mailing list, I cannot answer 
questions directly for all requests.
Your problem has been answered by me on the list before so I urge you to 
use the list archives as a starting point.

The sequence lines in your FASTA file aren't all the same length.

You need to run this:
bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW
mv NEW ORIGINAL

or with sreformat
sreformat fasta ORIGINAL > NEW
mv NEW ORIGINAL


Guifeng Wei wrote, On 8/23/10 4:57 AM:
> Dear professor Stajich,
> So sorry to interrupt you. I came across a problem when using the 
> Bio::DB::Fasta module of BioPerl.  My aim is to extract subsequences 
> according to *.bed files that hold the C. elegans genomic sequence 
> annotation.  The code I wrote is in the attached file.
> The genomic sequence file contains sequences from 6 chromosomes of 
> C. elegans.
> When I run this program on the command line, the following error 
> appears:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
>     Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw 
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::Fasta::calculate_offsets 
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> STACK: Bio::DB::Fasta::index_file 
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:680
> STACK: Bio::DB::Fasta::new 
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:491
> STACK: bed_to_fasta.pl:14
> -----------------------------------------------------------
> indexing was interrupted, so unlinking 
> /home/wgf/WORM_DATA/elegans.WS190.dna.fa.index at 
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053.
>
> and therefore i write to you in hope that you can help me solve this 
> problem,as well as, give me some suggestion about how to learn Bioperl 
> well.
> thank you very very much.
> yours sincerely
> Wei Guifeng


From jason.stajich at ucr.edu  Mon Aug 23 11:58:07 2010
From: jason.stajich at ucr.edu (Jason Stajich)
Date: Mon, 23 Aug 2010 08:58:07 -0700
Subject: [Bioperl-l] a problem when using the Bioperl modules
In-Reply-To: <AANLkTinrqwQCho_obj-_9MvQAyLEBVvaFA+HzJpFKovS@mail.gmail.com>
References: <AANLkTinZYJC6JwP776K3phzbAmtjiKMi_K_VTH=B6oeC@mail.gmail.com>
	<AANLkTinrqwQCho_obj-_9MvQAyLEBVvaFA+HzJpFKovS@mail.gmail.com>
Message-ID: <4C729A8F.1070506@ucr.edu>

You haven't defined the variable $db. You must not skip the part 
that initializes the Bio::DB::Fasta object that you had previously asked 
about.
Please send all your future queries to the mailing list.


Guifeng Wei wrote, On 8/23/10 8:14 AM:
> Dear professor,
> After that, I revised my script: I divided the genomic 
> sequences into 7 single files, each containing the sequence from one 
> chromosome.
> However, when I try to run the script, the following error appears:
> Can't call method "seq" on an undefined value at bed_to_fasta.pl line 29,
> <IN> line 1.
> while(<IN>){
>         chomp $_;
>         my @bed=split(/\s+/, $_ );
>     #print length($db->seq('chrI'));
>         my $chr_id=$bed[0];
>         my $start=$bed[1];
>         my $end=$bed[2];
>         my $seq_name=$bed[3];
>         my $strand=$bed[5];
> my $segment =  $db ->seq($chr_id,$start=>$end);
>         print ">",$seq_name,"_",$chr_id,":",$start=>$end;
>         print "$segment\n";
> }
> the blue line is .
> why?

-- 
Jason E. Stajich, PhD
Assistant Professor
Department of Plant Pathology & Microbiology
University of California
Riverside, CA 92521
jason.stajich at ucr.edu
office: 951.827.2363

http://lab.stajich.org/
http://twitter.com/stajichlab
http://fungalgenomes.org/blog/

http://plantpathology.ucr.edu/
http://genomics.ucr.edu/
http://cepceb.ucr.edu/


From guifengwei at gmail.com  Mon Aug 23 22:44:57 2010
From: guifengwei at gmail.com (Guifeng Wei)
Date: Tue, 24 Aug 2010 10:44:57 +0800
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
Message-ID: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>

Hi,

I came across a problem when using the Bio::DB::Fasta module of
BioPerl. My aim is to extract subsequences according to *.bed files
that hold the C. elegans genomic sequence annotation.

When I tried to run the script I wrote, I got the following error
message:

Can't call method "seq" on an undefined value at bed_to_fasta.pl line 28,
<IN> line 1.

So I am asking for help to solve this problem.
Here is my Perl script.

#!/usr/bin/perl -w
# Purpose: extract sequences from genomic sequences
use strict;
use Bio::DB::Fasta;
open(IN,$ARGV[0]) || die "sorry, the program cannot open the .bed file, please check it.\n";
my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' );
# The dir ...../elegans190.dna/ includes 6 files:
# chrI,chrII,chrIII,chrIV,chrV,chrX,
# each stands for the sequences from the corresponding chromosome.

while(<IN>){
        chomp $_;
        my @bed=split(/\s+/, $_ );

        my $chr_id=$bed[0];
        my $start=$bed[1];
        my $end=$bed[2];
        my $seq_name=$bed[3];
        my $strand=$bed[5];

        my $segment =  $db->seq( $chr_id, $start=>$end );

        # print the FASTA header on its own line, then the sequence
        print ">",$seq_name,"_",$chr_id,":",$start,"-",$end,"\n";
        print "$segment\n";

}

close(IN);
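
A side note on coordinates: the script above passes the BED columns straight to seq(). BED start positions are 0-based with an exclusive end, while Bio::DB::Fasta, as far as I know, expects 1-based inclusive positions, so the extracted segment would begin one base too early. The sketch below shows the conversion in plain Perl; bed_to_one_based is a made-up helper name, and the BED line is invented for the demo.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one BED line into a 1-based, inclusive
# interval. Only the start changes: BED start is 0-based, BED end is
# exclusive, so the end value already equals the last included base.
sub bed_to_one_based {
    my ($line) = @_;
    chomp $line;
    my ($chr, $start, $end, $name, $score, $strand) = split /\s+/, $line;
    return {
        chr    => $chr,
        start  => $start + 1,
        end    => $end,
        name   => defined $name   ? $name   : "${chr}:${start}-${end}",
        strand => defined $strand ? $strand : '+',
    };
}

# Invented example line: chromosome, start, end, name, score, strand
my $iv = bed_to_one_based("chrI\t99\t200\tgeneA\t0\t-");
printf ">%s_%s:%d-%d(%s)\n", @{$iv}{qw(name chr start end strand)};
# prints ">geneA_chrI:100-200(-)"
```

In the loop, the converted values would then be used as $db->seq($iv->{chr}, $iv->{start}, $iv->{end}). Handling of the minus strand is left out here on purpose.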


From florent.angly at gmail.com  Tue Aug 24 01:06:21 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 24 Aug 2010 15:06:21 +1000
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
References: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
Message-ID: <4C73534D.6080607@gmail.com>

Hi Guifeng,

From the Bio::DB::Fasta documentation:
>        $db = Bio::DB::Fasta->new($fasta_path [,%options])
>          Create a new Bio::DB::Fasta object from the Fasta file or files
>          indicated by $fasta_path.  Indexing will be performed automatically
>          if needed.  If successful, new() will return the database accessor
>          object.  Otherwise it will return undef.

Hence, after you create the database object $db, you should check that 
it was successful, e.g.:
> my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' );
> if (not defined $db) {
>   die "There was a problem creating the database\n";
> }
A problem creating the database would explain the message you get.

If the extension of the FASTA files in the directory path that you gave 
as input is not fa, fasta, fast, FA, FASTA, FAST or dna, then you should 
use the -glob option when constructing your database object. From the 
documentation:
>           -glob         Glob expression to use for searching for Fasta
>                         files in directories.
>                         Default: *.{fa,fasta,fast,FA,FASTA,FAST,dna}


Florent


On 24/08/10 12:44, Guifeng Wei wrote:
> Hi,
>
> i came across a problem when i use the Bio::DB::Fasta modules of
> BioPerl. The aim i want to arrive at is to extract the subsequences
> accoording to the *.bed files which are the C.elegans genomic sequnece
> annotation.
>
> when i tried to run the scripts i wrote, the error message was coming, as
> follows:
>
> Can't call method "seq" on an undefined value at bed_to_fasta.pl line 28,
> <IN>  line 1.
>
> so, ask for favor to slove this problem.
> Here is my perl scripts.
>
> #!/usr/bin/perl -w
> # Purpose: extract sequences from genomic sequences
> use strict;
> use Bio::DB::Fasta;
> open(IN,$ARGV[0]) || die "sorry, the program cannot open the .bed file, plea
> check it. \n";
> my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' );
> # The dir ...../elegans190.dna/ includes 6
> files:chrI,chrII,chrIII,chrIV,chrV,chrX,
> #each stands for the sequences from the coressponding chromosome.
>
> while(<IN>){
>          chomp $_;
>          my @bed=split(/\s+/, $_ );
>
>          my $chr_id=$bed[0];
>          my $start=$bed[1];
>          my $end=$bed[2];
>          my $seq_name=$bed[3];
>          my $strand=$bed[5];
>
>          my $segment =  $db->seq( $chr_id, $start=>$end );
>
>          print ">",$seq_name,"_",$chr_id,":",$start=>$end;
>          print "$segment\n";
>
> }
>
> close(IN);
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From guifengwei at gmail.com  Tue Aug 24 07:28:16 2010
From: guifengwei at gmail.com (Guifeng Wei)
Date: Tue, 24 Aug 2010 19:28:16 +0800
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
References: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
Message-ID: <AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>

Hi,

I have revised my script according to the previous email from Florent.
However, there are still some errors, which is very frustrating.

The errors are as follows:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Each line of the fasta entry must be the same length except the last.
    Line above #301451 '
..' is 22 != 51 chars.
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::DB::Fasta::calculate_offsets
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
STACK: Bio::DB::Fasta::index_dir
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
STACK: Bio::DB::Fasta::new
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
STACK: bed2fasta.pl:13
-----------------------------------------------------------
indexing was interrupted, so unlinking
/home/wgf/elegans190.dna//directory.index at
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053

The directory /home/wgf/elegans190.dna/ contains 6 files; each holds the
complete sequence of one chromosome in FASTA format. The extension of the
FASTA files is .fa. Every file starts with a ">chromosoemeXXX" line
followed by thousands of sequence lines.

So it warns me that "Each line of the fasta entry must be the
same length except the last" and "indexing was interrupted, so unlinking
/home/wgf/elegans190.dna//directory".

I am very confused by this, so I am asking for help.

Wei Guifeng


From biopython at maubp.freeserve.co.uk  Tue Aug 24 09:28:33 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 24 Aug 2010 14:28:33 +0100
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To: <AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>
References: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
	<AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>
Message-ID: <AANLkTi=Nn7m1_6mPoiUcmJNsBoFu4eh-pO9QJaVipOU0@mail.gmail.com>

On Tue, Aug 24, 2010 at 12:28 PM, Guifeng Wei <guifengwei at gmail.com> wrote:
> Hi,
>
> i have revised my scripts according to the previous email from Florent.
> However, there were still some errors which frustrated me so much.
>
> The errors are as follows:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
>     Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::Fasta::calculate_offsets
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> STACK: Bio::DB::Fasta::index_dir
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
> STACK: Bio::DB::Fasta::new
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
> STACK: bed2fasta.pl:13
> -----------------------------------------------------------
> indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory.index at
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053
> But in the directory /home/wgf/elegans190.dna/ , it concludes 6 files,
> each contains the complete sequences from one single chromosome, the format
> is fasta. The extension of the FASTA files is .fa. Every single file is
> started as ">chromosoemeXXX" followed by the thousands of sequences.
>
> and therefore, it warn me that "Each line of the fasta entry must be the
> same length except the last". and "indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory".
>
> i was much confused about this. so for help.
>
> Wei Guifeng

Hi Wei,

It sounds like there is inconsistent line wrapping in your FASTA file.
This is often not a problem at all, but the DB indexing system (and
indeed other indexing tools like the samtools fasta index) requires
all the entries have the same wrapping.

e.g. This is a valid FASTA file but would not be suitable for indexing:

>Test
ACGTACGT
ACGTACGT
ACGTACGT
ACGT
ACGT
T

Ignoring the final line (a special case, here of length one), this uses a
mixture of line lengths, 8 and 4. If you had used this instead, it should be
fine:

>Test
ACGTACGT
ACGTACGT
ACGTACGT
ACGTACGT
T

All the lines are now wrapped at length 8 (and the final line is
less than or equal to length 8).

Of course, in a real file wrapping at 60 or 80 characters is more
common ;)
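
For illustration, here is a minimal sketch in plain Perl of what consistent rewrapping amounts to. It is not how bp_sreformat or sreformat are implemented; rewrap_fasta is a made-up helper, it works on a string held in memory, and the default width of 60 is only an assumption. It is shown on the small example from above.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrap FASTA text so that every sequence line of
# a record has the same width, except possibly the last line.
sub rewrap_fasta {
    my ($text, $width) = @_;
    $width ||= 60;
    my ($out, $seq) = ('', '');

    # Write out the collected sequence in chunks of $width characters.
    my $flush = sub {
        $out .= substr($seq, 0, $width, '') . "\n" while length $seq;
    };

    for my $line (split /\r?\n/, $text) {
        if ($line =~ /^>/) {        # header line starts a new record
            $flush->();
            $out .= "$line\n";
        }
        else {                      # sequence line: collect without spaces
            $line =~ s/\s+//g;
            $seq .= $line;
        }
    }
    $flush->();
    return $out;
}

my $mixed = ">Test\nACGTACGT\nACGTACGT\nACGTACGT\nACGT\nACGT\nT\n";
print rewrap_fasta($mixed, 8);
# prints the header followed by four lines of ACGTACGT and a final T
```

For whole chromosomes the tools Jason mentioned are the safer route; the sketch is only meant to make clear what "same wrapping" means.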

Peter


From cjfields at illinois.edu  Tue Aug 24 09:38:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 08:38:45 -0500
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To: <AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>
References: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
	<AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>
Message-ID: <995BCF30-99B2-46C2-A4E8-681F9E2A0BB5@illinois.edu>

Guifeng,

Did you follow Jason's advice yesterday about converting the FASTA over to a more consistent length?  Or checking the database itself?  These are both things reiterated by Florent and Peter.

From Jason's last response:

-------------------------
Wei -

Please ask your questions on the bioperl mailing list, I cannot answer questions directly for all requests.
Your problem has been answered by me on the list before so I urge you to use the list archives as a starting point.

The line lengths of the fasta file sequence aren't the same length.

you need to run this
bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW
mv NEW ORIGINAL

or with sreformat
sreformat fasta ORIGINAL > NEW
mv NEW ORIGINAL
-------------------------

chris


On Aug 24, 2010, at 6:28 AM, Guifeng Wei wrote:

> Hi,
> 
> i have revised my scripts according to the previous email from Florent.
> However, there were still some errors which frustrated me so much.
> 
> The errors are as follows:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
>   Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::Fasta::calculate_offsets
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> STACK: Bio::DB::Fasta::index_dir
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
> STACK: Bio::DB::Fasta::new
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
> STACK: bed2fasta.pl:13
> -----------------------------------------------------------
> indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory.index at
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053
> But in the directory /home/wgf/elegans190.dna/ , it concludes 6 files,
> each contains the complete sequences from one single chromosome, the format
> is fasta. The extension of the FASTA files is .fa. Every single file is
> started as ">chromosoemeXXX" followed by the thousands of sequences.
> 
> and therefore, it warn me that "Each line of the fasta entry must be the
> same length except the last". and "indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory".
> 
> i was much confused about this. so for help.
> 
> Wei Guifeng


From scott at scottcain.net  Tue Aug 24 11:01:47 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 24 Aug 2010 11:01:47 -0400
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
	<4C6DADDF.1000103@cornell.edu>
	<20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
	<1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
Message-ID: <AANLkTin01uf32_1G2+d8PA2YEtw3UfB5FK+CVPnLCD81@mail.gmail.com>

Hi Chris,

GMOD still only supports Chado with Postgres (for example, the GFF
loader assumes a Postgres database), but when I reengineered the GFF
loader a few years ago, I designed it with subclassing in mind so that
it could be extended to work with other RDBMSs.

Scott


On Fri, Aug 20, 2010 at 12:23 PM, Chris Fields <cjfields at illinois.edu> wrote:
> On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote:
>> Hi,
>>
>> On Thu, 19 Aug 2010, Robert Buels wrote:
>>
>> > Chris Fields wrote:
>> > > I think it's worth exploring having a DBIx::Class-based middle-ware
>> > > approach similar to what Rob Buels has done for Chado.  That would be
>> > > fairly easy to get started using DBIx::Class::Schema::Loader.
>> > > After that it would require optimization and tweaking, which is
>> > > potentially more complex than Rob's setup as Chado is very Pg-specific,
>> > > but maybe Rob can elaborate...
>> >
>> > Elaborating on how Bio::Chado::Schema is developed:
>> >
>> > The vast majority of the code and POD in BCS is autogenerated by
>> > DBIx::Class::Schema::Loader.  DBICSL gives you a baseline set of
>> > DBIx::Class classes that covers all the tables, views, columns, unique
>> > constraints, and foreign key relationships.
>> >
>> > Beyond that, you have to add on yourself.  In BCS, we have mostly done
>> > things like:
>> >
>> >   * make better-named aliases for some of the autogenerated
>> >     relationships (though DBICSL does a surprisingly good job of naming
>> >     relationships automatically most of the time)
>> >   * add a tiny bit of bioperl compatibility (this needs a lot more work
>> >     by somebody, volunteers needed!)
>> >   * add convenience methods for using some of the Chado property tables
>> >   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
>> >     traversing phylogenetic tree relationships
>> >
>> > Regarding DB backend specificity, BCS isn't Pg-specific at all, because
>> > DBIx::Class itself goes to great lengths to be compatible (and performant!)
>> > with just about every relational database out there.
>> I would vouch for that, at least as far as Chado in Oracle is concerned.
>> So far, BCS works flawlessly with our Oracle Chado instance at
>> dictyBase. Quite a chunk of BCS-based code is also active in a couple of
>> our Mojo-based webapps. The part which I still couldn't use directly is
>> the 'synonym' table, as it clashes with Oracle-specific reserved keywords.
>> However, overall it seems to be quite cross-RDBMS compatible and is highly
>> recommended.
>>
>> -siddhartha
>
> Just to point out, I didn't say BCS is Pg-specific, but that Chado is
> (that was the DBMS it was designed for).  Maybe that should be amended
> to 'was' now :)
>
> I recall seeing a page on this somewhere on the GMOD website along the
> lines of "MySQL has problems so we chose Pg", and that Chado support
> would focus on Pg.  I'm guessing that's no longer the case?  Or is only
> the server-side stuff Pg-specific.
>
>> >In fact, the BCS test
>> > suite deploys a Chado schema into a temporary SQLite database using
>> > DBIC::Schema's deploy() method, and runs all of its tests on that.  Very
>> > handy.
>> >
>> > Chado's Pg-specific server-side functions can of course be called through
>> > BCS if they are present, but it's perfectly possible to use Chado without
>> > any of the server-side functions, and mostly the way I use it.
>> >
>> > Rob
>
> I think this opens up the possibility of starting a DBIx::Class-based
> middleware solution.  Hilmar, did you want to take that on?
>
> chris
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From bgs500 at york.ac.uk  Tue Aug 24 11:35:53 2010
From: bgs500 at york.ac.uk (Ben Saville)
Date: Tue, 24 Aug 2010 16:35:53 +0100
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
	<0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>
Message-ID: <34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk>

Sorry for the delay in replying; 454 data analysis is very time  
consuming.

Please see http://seqanswers.com/forums/showthread.php?t=6484
for a discussion of this problem and how we solved it.

Thanks for the reply though, much appreciated!

Regards
Ben Saville


On 20 Aug 2010, at 14:48, Dave Messina wrote:

> Hi Ben,
>
> I would not use the script you posted; I don't think it does what  
> you want.
>
> If you haven't already, you should take a look at the beginners' HOWTO
>
> 	http://www.bioperl.org/wiki/HOWTO:Beginners
>
>
> the SearchIO HOWTO
>
> 	http://www.bioperl.org/wiki/HOWTO:SearchIO
>
>
> and the example scripts included with BioPerl:
>
> 	http://www.bioperl.org/wiki/Scripts
>
>
>
> Incidentally, it's a lot of fiddly data processing to parse blast  
> reports for many contigs against multiple databases and then go back  
> and collate the results by query. I'm not sure exactly what you want  
> to do once you've separated by query; if you provide some more  
> information, we could suggest ways to best get you where you want to  
> go.
>
> I will mention, though, that BLAST has the ability to search  
> multiple separate databases in one go and collate the results for  
> you. So that's something to consider.
>
>
>
> Dave
>
>
>


From cjfields at illinois.edu  Tue Aug 24 11:54:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 10:54:20 -0500
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To: <AANLkTi=7_fFU4Q53S1onRZpFaVoS6ndNNq68ZSHMDoe3@mail.gmail.com>
References: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
	<AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>
	<995BCF30-99B2-46C2-A4E8-681F9E2A0BB5@illinois.edu>
	<AANLkTi=7_fFU4Q53S1onRZpFaVoS6ndNNq68ZSHMDoe3@mail.gmail.com>
Message-ID: <B269BA3E-C0E7-4FEA-BA78-E164F4D2B787@illinois.edu>

Please keep all responses on-list.  

Regarding sreformat:

http://tinyurl.com/28q75rr

Judging by the stack traces below, you are also running on a UNIX-like system.  To concatenate files, use 'cat'.  So, for all files ending with .fa:

cat *.fa >> all.fa
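
As for the uneven line lengths: if neither sreformat nor bp_sreformat is available, the same kind of cleanup can be sketched in plain Perl. This is only a rough illustration of the idea (rewrap every record to one fixed width and drop blank lines), not what either tool does internally; the reflow_fasta name and the 60-column default are assumptions made for this example.

```perl
use strict;
use warnings;

# Rewrap every FASTA record read from $in to a fixed line width and
# write it to $out. Blank lines are dropped, so only the last line of
# each record can be shorter than $width.
sub reflow_fasta {
    my ($in, $out, $width) = @_;
    $width ||= 60;    # assumed default, pick whatever width you prefer
    my $seq = '';

    # Print the residues collected so far in chunks of at most $width.
    my $flush = sub {
        print {$out} "$1\n" while $seq =~ /(.{1,$width})/g;
        $seq = '';
    };

    while (my $line = <$in>) {
        $line =~ s/\s+\z//;          # strip newline and trailing blanks
        next if $line eq '';         # skip blank lines entirely
        if ($line =~ /^>/) {         # header: finish the previous record
            $flush->();
            print {$out} "$line\n";
        }
        else {
            $seq .= $line;           # sequence line: keep collecting
        }
    }
    $flush->();                      # write the final record
    return;
}
```

To use it, open the original file for reading and a new file for writing, call reflow_fasta($in, $out), and then replace the original with the new file once it looks right.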

chris

On Aug 24, 2010, at 8:54 AM, Guifeng Wei wrote:

> Hello Fields,
>  
> I have checked the FASTA files and found that the last line is a blank line, and the second-to-last line is shorter than the others.
>  
> I am not able to run the command line Jason advised because I have no knowledge of "sreformat".
>  
> I also have one more question: I want to merge the several single-chromosome sequence files into one. Is that OK?
>  
> thank you very much.
>  
> Wei Guifeng
> 2010/8/24 Chris Fields <cjfields at illinois.edu>
> Guifeng,
> 
> Did you follow Jason's advice yesterday about converting the FASTA over to a more consistent length?  Or checking the database itself?  These are both things reiterated by Florent and Peter.
> 
> From Jason's last response:
> 
> -------------------------
> Wei -
> 
> Please ask your questions on the bioperl mailing list, I cannot answer questions directly for all requests.
> Your problem has been answered by me on the list before so I urge you to use the list archives as a starting point.
> 
> The line lengths of the fasta file sequence aren't the same length.
> 
> you need to run this
> bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW
> mv NEW ORIGINAL
> 
> or with sreformat
> sreformat fasta ORIGINAL > NEW
> mv NEW ORIGINAL
> -------------------------
> 
> chris
> 
> 
> On Aug 24, 2010, at 6:28 AM, Guifeng Wei wrote:
> 
> > Hi,
> >
> > i have revised my scripts according to the previous email from Florent.
> > However, there were still some errors which frustrated me so much.
> >
> > The errors are as follows:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: Each line of the fasta entry must be the same length except the last.
> >   Line above #301451 '
> > ..' is 22 != 51 chars.
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw
> > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> > STACK: Bio::DB::Fasta::calculate_offsets
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> > STACK: Bio::DB::Fasta::index_dir
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
> > STACK: Bio::DB::Fasta::new
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
> > STACK: bed2fasta.pl:13
> > -----------------------------------------------------------
> > indexing was interrupted, so unlinking
> > /home/wgf/elegans190.dna//directory.index at
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053
> > But the directory /home/wgf/elegans190.dna/ contains 6 files,
> > each holding the complete sequence of one single chromosome in FASTA
> > format. The extension of the FASTA files is .fa. Every file
> > starts with ">chromosomeXXX" followed by thousands of sequence lines.
> >
> > It therefore warns me that "Each line of the fasta entry must be the
> > same length except the last" and "indexing was interrupted, so unlinking
> > /home/wgf/elegans190.dna//directory".
> >
> > I am very confused by this, so I am asking for help.
> >
> > Wei Guifeng
> 
> 
> 
> 
> -- 
> Wei Guifeng
> 
> 
> 


From cjfields at illinois.edu  Tue Aug 24 12:14:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 11:14:51 -0500
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
	<0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>
	<34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk>
Message-ID: <69C47A74-09C7-4024-9303-A3893658A2A8@illinois.edu>

Just in case anyone needs it, there is a way to index these as well (both BLAST and the two tabular BLAST versions) for fast lookups of specific reports, if needed.  See Bio::Index::Blast and Bio::Index::BlastTable in BioPerl.

Caveat: I believe there is a bug with BLAST+ text output indexing (it chops the header off subsequent reports).  I haven't investigated it enough yet, but I'll try looking into it today.

chris

On Aug 24, 2010, at 10:35 AM, Ben Saville wrote:

> Sorry for the Delay in replying, 454 data analysis is very time consuming.
> 
> please see http://seqanswers.com/forums/showthread.php?t=6484
> For a discussion about this problem, and how we solved the issue.
> 
> Thanks for the reply though, much appreciated!
> 
> Regards
> Ben Saville
> 
> 
> 
> 
> 
> On 20 Aug 2010, at 14:48, Dave Messina wrote:
> 
>> Hi Ben,
>> 
>> I would not use the script you posted; I don't think it does what you want.
>> 
>> If you haven't already, you should take a look at the beginners' HOWTO
>> 
>> 	http://www.bioperl.org/wiki/HOWTO:Beginners
>> 
>> 
>> the SearchIO HOWTO
>> 
>> 	http://www.bioperl.org/wiki/HOWTO:SearchIO
>> 
>> 
>> and the example scripts included with BioPerl:
>> 
>> 	http://www.bioperl.org/wiki/Scripts
>> 
>> 
>> 
>> Incidentally, it's a lot of fiddly data processing to parse blast reports for many contigs against multiple databases and then go back and collate the results by query. I'm not sure exactly what you want to do once you've separated by query; if you provide some more information, we could suggest ways to best get you where you want to go.
>> 
>> I will mention, though, that BLAST has the ability to search multiple separate databases in one go and collate the results for you. So that's something to consider.
>> 
>> 
>> 
>> Dave
>> 
>> 
>> 


From cjfields at illinois.edu  Tue Aug 24 12:17:17 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 11:17:17 -0500
Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release
	announcement
References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov>
Message-ID: <A26B0224-CFDD-4D2B-A5B0-4275693416FD@illinois.edu>

FYI,

Very interesting additions to BLAST+ (archive format).  

chris

Begin forwarded message:

> From: mcginnis <mcginnis at ncbi.nlm.nih.gov>
> Date: August 24, 2010 10:46:50 AM CDT
> To: NLM/NCBI List blast-announce <blast-announce at ncbi.nlm.nih.gov>
> Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement
> 
> A new version of the stand-alone applications is available.
>  
> Users are encouraged to use the BLAST+ applications available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
> 
> This release includes a number of bug fixes as well as new features for the BLAST+ applications:
>  
> * Introduce BLAST Archive format to permit reformatting of stand-alone BLAST searches with the blast_formatter (see BLAST+ user manual)
> * Added the blast_formatter application (see BLAST+ user manual)
> * Added support for translated subject soft masking in the BLAST databases
> * Added support for the BLAST Trace-back operations (btop) output format
> * Added command line options to blastdbcmd for listing available BLAST databases
> * Improved performance of formatting of remote BLAST searches
> * Use a consistent exit code for out of memory conditions
> * Fixed bug in indexed megablast with multiple space-separated BLAST databases
> * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb
> * Fixed Windows installer for 64-bit installations
>  
> BLAST+ applications, as well as the legacy C applications (e.g. blastall), may be downloaded from http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download


From David.Messina at sbc.su.se  Tue Aug 24 13:00:14 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 24 Aug 2010 19:00:14 +0200
Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release
	announcement
In-Reply-To: <A26B0224-CFDD-4D2B-A5B0-4275693416FD@illinois.edu>
References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov>
	<A26B0224-CFDD-4D2B-A5B0-4275693416FD@illinois.edu>
Message-ID: <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se>

Here's a link to the manual:
ftp://ftp.ncbi.nlm.nih.gov//blast/executables/blast%2B/2.2.24/user_manual.pdf

(Is it on the NCBI website somewhere? Strange to have only a downloadable PDF.) The section on the new archive format is on page 27.

It seems like a nice idea to have the flexibility, but I wonder about the time cost of using this format.

One of the big gains from using tab-delimited output is that BLAST doesn't have to do all the post-processing to generate the alignment views. By doing the archive format, which if I understand it correctly is ASN.1, you're always paying the full price in time (and space, for that matter).


Dave


On Aug 24, 2010, at 18:17 , Chris Fields wrote:

> FYI,
> 
> Very interesting additions to BLAST+ (archive format).  
> 
> chris
> 
> Begin forwarded message:
> 
>> From: mcginnis <mcginnis at ncbi.nlm.nih.gov>
>> Date: August 24, 2010 10:46:50 AM CDT
>> To: NLM/NCBI List blast-announce <blast-announce at ncbi.nlm.nih.gov>
>> Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement
>> 
>> A new version of the stand-alone applications is available.
>> 
>> Users are encouraged to use the BLAST+ applications available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
>> 
>> This release includes a number of bug fixes as well as new features for the BLAST+ applications:
>> 
>> * Introduce BLAST Archive format to permit reformatting of stand-alone BLAST searches with the blast_formatter (see BLAST+ user manual)
>> * Added the blast_formatter application (see BLAST+ user manual)
>> * Added support for translated subject soft masking in the BLAST databases
>> * Added support for the BLAST Trace-back operations (btop) output format
>> * Added command line options to blastdbcmd for listing available BLAST databases
>> * Improved performance of formatting of remote BLAST searches
>> * Use a consistent exit code for out of memory conditions
>> * Fixed bug in indexed megablast with multiple space-separated BLAST databases
>> * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb
>> * Fixed Windows installer for 64-bit installations
>> 
>> BLAST+ applications, as well as the legacy C applications (e.g. blastall), may be downloaded from http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
> 
> 


From cjfields at illinois.edu  Tue Aug 24 13:26:49 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 12:26:49 -0500
Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release
	announcement
In-Reply-To: <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se>
References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov>
	<A26B0224-CFDD-4D2B-A5B0-4275693416FD@illinois.edu>
	<27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se>
Message-ID: <D84DD1C8-6CBE-40F1-8CF9-F9482F0E4B18@illinois.edu>

It's probably more applicable from the viewpoint of a cluster admin who would want to add the flexibility of having a single archive and allowing any format (as opposed to re-running the analysis). I'm just wondering if there is anything to glean there for possible alignment archiving purposes (ala SAM/BAM), but if it's ASN.1, likely not.

chris

On Aug 24, 2010, at 12:00 PM, Dave Messina wrote:

> Here's a link to the manual:
> ftp://ftp.ncbi.nlm.nih.gov//blast/executables/blast%2B/2.2.24/user_manual.pdf
> 
> (Is it on the NCBI website somewhere? Strange to have only a downloadable PDF.) The section on the new archive format is on page 27.
> 
> It seems like a nice idea to have the flexibility, but I wonder about the time cost of using this format.
> 
> One of the big gains from using tab-delimited output is that BLAST doesn't have to do all the post-processing to generate the alignment views. By doing the archive format, which if I understand it correctly is ASN.1, you're always paying the full price in time (and space, for that matter).
> 
> 
> 
> Dave
> 
> 
> 
> 
> On Aug 24, 2010, at 18:17 , Chris Fields wrote:
> 
>> FYI,
>> 
>> Very interesting additions to BLAST+ (archive format).  
>> 
>> chris
>> 
>> Begin forwarded message:
>> 
>>> From: mcginnis <mcginnis at ncbi.nlm.nih.gov>
>>> Date: August 24, 2010 10:46:50 AM CDT
>>> To: NLM/NCBI List blast-announce <blast-announce at ncbi.nlm.nih.gov>
>>> Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement
>>> 
>>> A new version of the stand-alone applications is available.
>>> 
>>> Users are encouraged to use the BLAST+ applications available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
>>> 
>>> This release includes a number of bug fixes as well as new features for the BLAST+ applications:
>>> 
>>> * Introduce BLAST Archive format to permit reformatting of stand-alone BLAST searches with the blast_formatter (see BLAST+ user manual)
>>> * Added the blast_formatter application (see BLAST+ user manual)
>>> * Added support for translated subject soft masking in the BLAST databases
>>> * Added support for the BLAST Trace-back operations (btop) output format
>>> * Added command line options to blastdbcmd for listing available BLAST databases
>>> * Improved performance of formatting of remote BLAST searches
>>> * Use a consistent exit code for out of memory conditions
>>> * Fixed bug in indexed megablast with multiple space-separated BLAST databases
>>> * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb
>>> * Fixed Windows installer for 64-bit installations
>>> 
>>> BLAST+ applications, as well as the legacy C applications (e.g. blastall), may be downloaded from http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
>> 
>> 


From David.Messina at sbc.su.se  Tue Aug 24 14:45:29 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 24 Aug 2010 20:45:29 +0200
Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release
	announcement
In-Reply-To: <D84DD1C8-6CBE-40F1-8CF9-F9482F0E4B18@illinois.edu>
References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov>
	<A26B0224-CFDD-4D2B-A5B0-4275693416FD@illinois.edu>
	<27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se>
	<D84DD1C8-6CBE-40F1-8CF9-F9482F0E4B18@illinois.edu>
Message-ID: <00C04DF9-F3C2-4574-B1E4-A3BF28EE953F@sbc.su.se>

> It's probably more applicable from the viewpoint of a cluster admin who would want to add the flexibility of having a single archive and allowing any format (as opposed to re-running the analysis).

Good point.


> I'm just wondering if there is anything to glean there for possible alignment archiving purposes (ala SAM/BAM), but if it's ASN.1, likely not.

To be honest, I didn't look that closely at it. It may be worth considering nevertheless.


Dave


From buiduyminh at gmail.com  Tue Aug 24 14:56:43 2010
From: buiduyminh at gmail.com (Minh Bui)
Date: Tue, 24 Aug 2010 14:56:43 -0400
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
In-Reply-To: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>
References: <AANLkTinsyOMPJxpks_pqMwLpW8gx0VRihhJsLDnF53mu@mail.gmail.com>
	<491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>
Message-ID: <AANLkTimOe=T9FrpMPqMy8yyrfz8Sf7QJ5Rr5YYFjicJb@mail.gmail.com>

How can I find out where DBD::mysql is installed on my Mac? I am very new to the Mac, sorry.

I just checked, and mysql.pm is in
/Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm


On 8/21/10, Adam Witney <awitney at sgul.ac.uk> wrote:
>
>  On 20 Aug 2010, at 22:29, Minh Bui wrote:
>
>  > Hi,,
>  > I am trying to load my GFF file to mysql database but I got this error
>  > when I ran the bp_seqfeature_load.pl (bioperl 1.6.1 on MAC)
>  >
>  > [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
>  > install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
>  > contains: /sw/lib/perl5 /sw/lib/perl5/darwin
>  > /System/Library/Perl/5.8.6/darwin-thread-multi-2level
>  > /System/Library/Perl/5.8.6
>  > /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
>  > /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
>  > /Network/Library/Perl/5.8.6 /Network/Library/Perl
>  > /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
>  > /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
>  > line 3.
>  > Perhaps the DBD::mysql perl module hasn't been fully installed,
>  > or perhaps the capitalisation of 'mysql' isn't right.
>  > Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
>  > at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212
>  >
>  > I am using MAC OSX version 10.4.10 and MAMP? Isnt it the
>  > "/Library/Perl/5.8.6" already in @INC? What am I missing?
>  > I have been googling this error for a few hours. I also install
>  > Bioperl and reinstall DBD::mysql using CPAN. It still doesnt work..
>  >
>  > Here is my $PERL5LIB:  /sw/lib/perl5:/sw/lib/perl5/darwin/
>
>
>
> Where did DBD::mysql get installed? Can you verify that DBD/mysql.pm is actually in one of the directories listed above?
>
>


From scott at scottcain.net  Tue Aug 24 15:04:04 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 24 Aug 2010 15:04:04 -0400
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
In-Reply-To: <AANLkTimOe=T9FrpMPqMy8yyrfz8Sf7QJ5Rr5YYFjicJb@mail.gmail.com>
References: <AANLkTinsyOMPJxpks_pqMwLpW8gx0VRihhJsLDnF53mu@mail.gmail.com>
	<491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>
	<AANLkTimOe=T9FrpMPqMy8yyrfz8Sf7QJ5Rr5YYFjicJb@mail.gmail.com>
Message-ID: <AANLkTimPapxSzwVxCBMw1J0+x88K80SJ_6OH9LBkS3Jn@mail.gmail.com>

Hi Minh,

The file you found is not DBD::mysql though; it is
Bio::DB::SeqFeature::Store::DBI::mysql, which was installed along with
BioPerl.  How did you find that file?  The same method presumably
would turn up DBD::mysql if it existed.  I would use a command like
this:

  locate mysql.pm

which would locate all of the instances of files named mysql.pm on your
computer.  I would expect it to be located in
/Library/Perl/5.8.6/darwin-thread-multi-2level/DBD/ if it was
installed in a "normal" way (that is, not involving macports or fink
or MAMP).
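
If locate is not available (or its database is out of date), roughly the same check can be made from Perl itself by walking @INC, which has the advantage of looking only where this particular perl will look. A minimal sketch; find_module is a made-up helper name for this example, not part of any library:

```perl
use strict;
use warnings;
use File::Spec;

# Return every path under @INC where the given module's .pm file exists.
# An empty list means this perl cannot see the module at all.
sub find_module {
    my ($module) = @_;
    (my $relpath = "$module.pm") =~ s{::}{/}g;    # DBD::mysql -> DBD/mysql.pm
    return grep { -f }
           map  { File::Spec->catfile($_, $relpath) }
           grep { !ref } @INC;                    # skip code-ref hooks in @INC
}

# Prints one line per copy found, or nothing if the module is missing.
print "$_\n" for find_module('DBD::mysql');
```

Run it with the same perl that runs bp_seqfeature_load.pl; if nothing is printed, DBD::mysql is not installed anywhere that perl searches, whatever other copies may exist elsewhere on the disk.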

Scott


On Tue, Aug 24, 2010 at 2:56 PM, Minh Bui <buiduyminh at gmail.com> wrote:
> How can I know where DBD:mysql PATH on my MAC? I am very new to MAC sorry.
>
> I just check and mysql.pm is in
> /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm
>
>
>
> On 8/21/10, Adam Witney <awitney at sgul.ac.uk> wrote:
>>
>>  On 20 Aug 2010, at 22:29, Minh Bui wrote:
>>
>>  > Hi,,
>>  > I am trying to load my GFF file to mysql database but I got this error
>>  > when I ran the bp_seqfeature_load.pl (bioperl 1.6.1 on MAC)
>>  >
>>  > [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
>>  > install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
>>  > contains: /sw/lib/perl5 /sw/lib/perl5/darwin
>>  > /System/Library/Perl/5.8.6/darwin-thread-multi-2level
>>  > /System/Library/Perl/5.8.6
>>  > /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
>>  > /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
>>  > /Network/Library/Perl/5.8.6 /Network/Library/Perl
>>  > /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
>>  > /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
>>  > line 3.
>>  > Perhaps the DBD::mysql perl module hasn't been fully installed,
>>  > or perhaps the capitalisation of 'mysql' isn't right.
>>  > Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
>>  > at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212
>>  >
>>  > I am using MAC OSX version 10.4.10 and MAMP? Isnt it the
>>  > "/Library/Perl/5.8.6" already in @INC? What am I missing?
>>  > I have been googling this error for a few hours. I also install
>>  > Bioperl and reinstall DBD::mysql using CPAN. It still doesnt work..
>>  >
>>  > Here is my $PERL5LIB:  /sw/lib/perl5:/sw/lib/perl5/darwin/
>>
>>
>>
>> Where did DBD:mysql get installed? can you verify that DBD/mysql.pm is actually in one of those directories listed above?
>>
>>
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From jason at bioperl.org  Wed Aug 25 00:33:45 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 24 Aug 2010 21:33:45 -0700
Subject: [Bioperl-l] Enquiry on gi_taxid_nucl.dmp.gz
In-Reply-To: <AANLkTi=BrV0ODdF+sEQdAmtAMxRk6y2xGgRMOgbnZz-6@mail.gmail.com>
References: <AANLkTi=BrV0ODdF+sEQdAmtAMxRk6y2xGgRMOgbnZz-6@mail.gmail.com>
Message-ID: <4C749D29.3040003@bioperl.org>

hi - please keep questions on list.


I think one of your problems is that your first use of $gi2taxidfile is wrong. 
When you call tie, you want to specify a db file in which to store the 
index.
So call it "/tmp/gi2taxid.idx" or something like that.

In my code here 
http://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/classify_hits_kingdom.PLS
you will see that on line 97 we construct the name of the index file from 
the folder, plus 'idx', plus the name gi2taxid, which will be the name of 
the index file.

Also, it would be safer for the split to match on whitespace, since 
you want the first two columns from the file.  Doing this would 
eliminate the need for the chomp on the line above.

  my ($gi, $taxid) = split(/\s+/, $_);

instead of

  chomp;
  my ($gi, $taxid) = split(" ", $_,2);

There may be other problems but these should be fixed first -- and 
please send queries to the mailing list rather than to me directly so 
that others can answer questions.
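
One more thing that looks suspicious in the posted loop is the "if exists $taxid4gi{$gi}" guard: on a fresh index no key exists yet, so nothing would ever be stored. Putting the pieces together, a minimal sketch of the idea might look like the following. It assumes DB_File is installed; the sub names and the separate index path are made up for illustration, and this is not the code from classify_hits_kingdom.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT O_RDONLY);
use DB_File;

# Build (or update) an on-disk GI -> taxid index.
# $dmpfile is the unzipped two-column dump; $idxfile is a SEPARATE
# file that holds the index (never the dump file itself).
sub build_gi2taxid_index {
    my ($dmpfile, $idxfile) = @_;
    my %taxid4gi;
    my $dbh = tie(%taxid4gi, 'DB_File', $idxfile,
                  O_RDWR | O_CREAT, 0644, $DB_HASH)
        or die "cannot tie $idxfile: $!";
    open(my $fh, '<', $dmpfile) or die "cannot open $dmpfile: $!";
    while (my $line = <$fh>) {
        my ($gi, $taxid) = split(/\s+/, $line);   # first two columns
        next unless defined $gi && defined $taxid;
        $taxid4gi{$gi} = $taxid;                  # store unconditionally
    }
    close($fh);
    $dbh->sync;          # flush the index to disk
    undef $dbh;          # drop the object before untie
    untie %taxid4gi;
    return;
}

# Look up one GI in an existing index; returns undef if it is absent.
sub taxid_for_gi {
    my ($idxfile, $gi) = @_;
    my %taxid4gi;
    tie(%taxid4gi, 'DB_File', $idxfile, O_RDONLY, 0644, $DB_HASH)
        or die "cannot tie $idxfile: $!";
    my $taxid = $taxid4gi{$gi};
    untie %taxid4gi;
    return $taxid;
}
```

Build the index once with build_gi2taxid_index('gi_taxid_nucl.dmp', '/tmp/gi2taxid.idx'), then call taxid_for_gi('/tmp/gi2taxid.idx', $gi) for each GI from the BLAST results.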

-jason
Amali Thrimawithana wrote, On 8/24/10 8:13 PM:
> Dear Jason
>
> Thank you very much for the information. I manage to get the information on
> different taxonomic  levels with the help of one of your example code
> "local_taxonomydb_query". However I am having trouble with creating a local
> index file of the gi_taxid_nucl.dmp so that I am able to get the taxonomic
> id given the GI number of NCBI. At the moment I am using the tie() function
> with DB_file and then storing the detail into a hash. However when I try to
> retrieve a taxonomic ID given the GI number, it is not returning any thing
> but an error. Below is part of the code (borrowed from the example code
> classify kingdom), can you please let me know where I am going wrong?
> ...
> my $dbh2 = tie(%taxid4gi, 'DB_File', $gi2taxidfile);
>
> if( ! $done ) {
>      my $fh;
>     open(GI2TAXID, "$gi2taxidfile") or die $!; #here passing the unzipped
> gi_taxid_nucl.dmp
>     my$i=0;
>      while (<GI2TAXID>) {
>        chomp;
>         my ($gi, $taxid) = split(" ", $_, 2);
>         $taxid4gi{$gi} = $taxid
>         if exists $taxid4gi{$gi};
>         $i++;
>       unless( $DEBUG&&  $i % 100000  ) {
>          warn "$i\n";
>      }
>      }
>      $dbh2->sync;
> }
> my $gi2='183397240';
> my $taxd2=$taxid4gi{$gi2};
>   print $taxd2, " \n";
>
> Any help would be much appreciated
>
> Thanking you
> Amali
>
> On 23 August 2010 06:29, Jason Stajich<jason at bioperl.org>  wrote:
>
>    
>> Hi Amali -
>>
>> This is how I'd print out the full classification by using the Tree methods
>> (with probably a different way of initializing the $db object to your
>> flatfiles location).
>>
>> #!/usr/bin/perl -w
>> use strict;
>> use Bio::DB::Taxonomy;
>>
>> my $db= Bio::DB::Taxonomy->new(-source =>  'flatfile',
>>                    -nodesfile =>  'taxonomy/nodes.dmp',
>>                    -namesfile =>  'taxonomy/names.dmp');
>>
>> my $taxonid = $db->get_taxonid('Homo sapiens');
>> my $taxon = $db->get_taxon(-taxonid =>  $taxonid);
>> my $tree = Bio::Tree::Tree->new(-node =>  $taxon);
>> my @taxa = $tree->get_nodes;
>> print join(",", map { $_->scientific_name } @taxa), "\n";
>>
>> -jason
>>
>> Amali Thrimawithana wrote, On 8/18/10 3:56 PM:
>>
>>   Dear Dr Stajich,
>>      
>>> I am a Masters student at Auckland university and my research is on
>>> identifying yeast species present in wine by the use of 454 sequencing. In
>>> order to carry out this research, a pipeline is being built in which at
>>> the
>>> final step each representative OTU need to be classified at different
>>> taxonomic levels (ie: at Phylum, family, class, genus and species) by
>>> using
>>> the results from BLAST. To identify the sequences at each taxonomic level,
>>> I
>>> have been trying out the Bio::DB::Taxonomy module in bioperl. Using this
>>> module, I am able to get the genus and species level by splitting the
>>> scientific name returned by the Bio::taxon object. But unfortunately I am
>>> uncertain on how to get the information for the other levels of the rank.
>>> I
>>> have tried several commands including "my @class =
>>> $node->classification;",
>>> but it does not work. Hence, could you please let me know how I might be
>>> able to get the higher levels of taxonomy such as class and phylum using
>>> bioperl?
>>>
>>> Look forward to hearing from you soon
>>>
>>> Thanking You
>>>
>>> Amali
>>>
>>>
>>>        


From roy.chaudhuri at gmail.com  Wed Aug 25 07:12:15 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 25 Aug 2010 12:12:15 +0100
Subject: [Bioperl-l] Enquiry on gi_taxid_nucl.dmp.gz
In-Reply-To: <4C749D29.3040003@bioperl.org>
References: <AANLkTi=BrV0ODdF+sEQdAmtAMxRk6y2xGgRMOgbnZz-6@mail.gmail.com>
	<4C749D29.3040003@bioperl.org>
Message-ID: <4C74FA8F.3080506@gmail.com>

> Also it would be safer for the split to be whitespace matching and that
> you want the two first columns from the file.  Doing this would
> eliminate the need for the chomp on the line above.
>
>    my ($gi, $taxid) = split(/\s+/, $_);
>
> instead of
>
>    chomp;
>    my ($gi, $taxid) = split(" ", $_,2);

Sorry to be pedantic, but according to perldoc -f split: "As a special 
case, specifying a PATTERN of space (' ') will split on white space just 
as "split" with no arguments does"

The only difference between patterns of " " and /\s+/ is that the latter 
will return an initial null field if there is leading white space, which 
may or may not be what you want.

$ perl -e 'print join("-", split(" ", " 1\t2  3")), "\n"'
1-2-3
$ perl -e 'print join("-", split(/\s+/, " 1\t2  3")), "\n"'
-1-2-3

Cheers.
Roy.


From kanmaninradha at gmail.com  Thu Aug 26 04:29:08 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 01:29:08 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
Message-ID: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>

Hi All,
I would like to get the DNA sequence from a GFF file. I'm using the Bio::Tools::GFF
module. I could get everything else, but not the DNA sequence.

Can anyone please help me figure this out? I appreciate your help very
much.
Thanks,
Kanmani

#!/usr/bin/perl

use strict;
use warnings;
use Bio::Tools::GFF;

my $file = shift;

my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
$gffio->features_attached_to_seqs(1);

while (my $feat = $gffio->next_feature()){
    my $start = $feat->start;
    my $end= $feat->end;
    my $size = $end-$start+1;
    my $strand = $feat->strand;
    my $seqid = $feat->seq_id;
    my $score = $feat->score;
    my $frame = $feat->frame;
    my $source = $feat->source_tag;
    my $type = $feat->primary_tag;
    my $gffstr = $gffio->gff_string($feat);
    my @alltags = $feat->all_tags();
    my @ID_tag_value = $feat->each_tag_value("ID");

    my  $seq = $feat->seq();
    print "$seq\n";

     if($type eq "gene"){     #
       print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
    }
}


From David.Messina at sbc.su.se  Thu Aug 26 04:53:48 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 26 Aug 2010 10:53:48 +0200
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
Message-ID: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>

Admittedly I'm not up on the latest uses of GFF, but as far as I know, GFF is an annotation format only; it does not contain the actual sequence.

Have you looked in your GFF file to see if there are nucleotides in there?

Dave


On Aug 26, 2010, at 10:29, kanmani radha wrote:

> Hi All,
> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
> module. I could get everything else but not the DNA seq.


From biopython at maubp.freeserve.co.uk  Thu Aug 26 05:02:53 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Aug 2010 10:02:53 +0100
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
Message-ID: <AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>

On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
>
> Admittedly I'm not up on the latest uses of GFF, but as far as I know, GFF
> is an annotation format only; it does not contain the actual sequence.
>
> Have you looked in your GFF file to see if there are nucleotides in there?
>
> Dave

Actually a GFF file can optionally include a FASTA format sequence
at the end of the file, although it seems to be more common to just
supply separate GFF and FASTA files and cross reference by ID.
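
To make the layout concrete, here is a minimal sketch in plain Perl (no
BioPerl) of a GFF3 text whose feature lines are followed by a ##FASTA
section. The record contents and the variable names (@features, %seq_for)
are made up for illustration; this is not how Bio::Tools::GFF parses the
file internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical GFF3 content: two feature lines, then an embedded FASTA section.
my $gff3 = join "\n",
    '##gff-version 3',
    join("\t", qw(chr1 example gene 3 10 . + . ID=gene1)),
    join("\t", qw(chr1 example exon 3 6 . + . ID=exon1;Parent=gene1)),
    '##FASTA',
    '>chr1',
    'ACGTACGTACGT',
    '';

my (@features, %seq_for, $in_fasta, $current_id);
for my $line (split /\n/, $gff3) {
    # Everything after the ##FASTA directive is sequence data.
    if ($line =~ /^##FASTA/) { $in_fasta = 1; next; }
    if ($in_fasta) {
        if ($line =~ /^>(\S+)/) {
            $current_id = $1;
            $seq_for{$current_id} = '';
        }
        elsif (defined $current_id) {
            $seq_for{$current_id} .= $line;
        }
        next;
    }
    # Skip directives, comments and blank lines in the annotation part.
    next if $line =~ /^#/ or $line !~ /\S/;
    my ($seqid, $source, $type, $start, $end) = split /\t/, $line;
    push @features,
        { seqid => $seqid, type => $type, start => $start, end => $end };
}

printf "%d features, %d sequence(s)\n", scalar @features, scalar keys %seq_for;
print "chr1 length: ", length($seq_for{chr1}), "\n";
```

The sequence ID in column 1 of each feature line is what ties a feature
to its FASTA record, whether the FASTA is embedded or kept in a separate
file.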

Peter


From David.Messina at sbc.su.se  Thu Aug 26 05:08:20 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 26 Aug 2010 11:08:20 +0200
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
	<AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>
Message-ID: <C7C28E1D-7BAC-4D06-9EC6-71EA95F06776@sbc.su.se>

Aha, great, thanks for clarifying, Peter.

And if I had bothered to look at the Bio::Tools::GFF documentation before answering :), I would have seen this:

    http://doc.bioperl.org/bioperl-live/Bio/Tools/GFF.html#General

which describes how you can use

    $gffio->get_seqs()


and related methods to pull out the sequence data.


Dave


On Aug 26, 2010, at 11:02, Peter wrote:

> On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
>> 
>> Admittedly I'm not up on the latest uses of GFF, but as far as I know, GFF
>> is an annotation format only; it does not contain the actual sequence.
>> 
>> Have you looked in your GFF file to see if there are nucleotides in there?
>> 
>> Dave
> 
> Actually a GFF file can optionally include a FASTA format sequence
> at the end of the file, although it seems to be more common to just
> supply separate GFF and FASTA files and cross reference by ID.
> 
> Peter


From David.Messina at sbc.su.se  Thu Aug 26 05:18:25 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 26 Aug 2010 11:18:25 +0200
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <C7C28E1D-7BAC-4D06-9EC6-71EA95F06776@sbc.su.se>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
	<AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>
	<C7C28E1D-7BAC-4D06-9EC6-71EA95F06776@sbc.su.se>
Message-ID: <984552CF-01F3-4D29-932F-DD030CCC1448@sbc.su.se>

So, just to finish the thought:

Kanmani,

Apologies for my sloppy and uninformed answer. The following is only slightly less sloppy and uninformed, but may actually answer your question.

I think you need to call 

   $gffio->get_seqs()

probably as

  my @seq_objects = $gffio->get_seqs();


and then loop through those something like:

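	foreach my $seq_object (@seq_objects) {
		# raw sequence string for this entry
		my $seq = $seq_object->seq();

		# features are attached to the sequence object, not to the string
		foreach my $feat ($seq_object->get_SeqFeatures) {
			# do your feature processing here
		}
	}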


Note that I haven't tested the above code.


Dave


From fs5 at sanger.ac.uk  Thu Aug 26 05:19:44 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 26 Aug 2010 10:19:44 +0100
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
Message-ID: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi Kanmani,

While GFF files may contain DNA sequence data, most of them don't, so
you will have to use the location information you get from the GFF
annotation file in conjunction with, e.g., a local FASTA database of the
genomic sequence you are working with or an online resource.
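
As a rough illustration of combining GFF coordinates with a genomic
sequence, the sketch below uses plain Perl only. GFF start/end values are
1-based and inclusive, and minus-strand features need the reverse
complement. The helper name feature_seq and the short reference string
are hypothetical; with a real genome the string would come from a FASTA
reader or index rather than a literal.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the sequence for a feature given 1-based, inclusive GFF coordinates.
# $strand follows the BioPerl convention: -1 means the minus strand.
sub feature_seq {
    my ($genome, $start, $end, $strand) = @_;
    my $sub = substr($genome, $start - 1, $end - $start + 1);
    if (defined $strand && $strand < 0) {
        $sub = reverse $sub;              # reverse the string ...
        $sub =~ tr/ACGTacgt/TGCAtgca/;    # ... then complement each base
    }
    return $sub;
}

my $chr1 = 'AACCGGTTAACC';    # made-up reference sequence

print feature_seq($chr1, 3, 6,  1), "\n";    # plus strand:  CCGG
print feature_seq($chr1, 1, 4, -1), "\n";    # minus strand: GGTT
```

The length passed to substr is end - start + 1, the same size
calculation used in the original script.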


Frank


On Thu, 2010-08-26 at 01:29 -0700, kanmani radha wrote:
> Hi All,
> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
> module. I could get everything else but not the DNA seq.
> 
> Can anyone help me to find this out, Please. I appreciate your help very
> much.
> thanks,
> Kanmani
> 
> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> use Bio::Tools::GFF;
> 
> my $file = shift;
> 
> my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
> $gffio->features_attached_to_seqs(1);
> 
> while (my $feat = $gffio->next_feature()){
>     my $start = $feat->start;
>     my $end= $feat->end;
>     my $size = $end-$start+1;
>     my $strand = $feat->strand;
>     my $seqid = $feat->seq_id;
>     my $score = $feat->score;
>     my $frame = $feat->frame;
>     my $source = $feat->source_tag;
>     my $type = $feat->primary_tag;
>     my $gffstr = $gffio->gff_string($feat);
>     my @alltags = $feat->all_tags();
>     my @ID_tag_value = $feat->each_tag_value("ID");
> 
>     my  $seq = $feat->seq();
>     print "$seq\n";
> 
>      if($type eq "gene"){     #
>        print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
>     }
> }


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From cjfields at illinois.edu  Thu Aug 26 10:20:48 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 26 Aug 2010 09:20:48 -0500
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>

Kanmani,

If you are using BioPerl, the best option currently available is to load a database with all relevant information (GFF and FASTA), then use that database for querying.  The most commonly-used ones now are Bio::DB::SeqFeature::Store and Bio::DB::GFF; the former is very GFF3-centric, but I believe it can handle GFF/GTF, and it has various database adaptors (MySQL, Pg, BDB, SQLite).

chris

On Aug 26, 2010, at 4:19 AM, Frank Schwach wrote:

> Hi Kammani,
> 
> While GFF files may contain DNA sequence data, most of them don't, so
> you will have to use the location information you get from the GFF
> annotation file in conjunction with, e.g., a local FASTA database of the
> genomic sequence you are working with or an online resource.
> 
> 
> Frank
> 
> 
> 


From cjfields at illinois.edu  Thu Aug 26 10:31:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 26 Aug 2010 09:31:59 -0500
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
	<AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>
Message-ID: <DD36A578-4156-4911-8432-84BD5ECB3AB8@illinois.edu>

On Aug 26, 2010, at 4:02 AM, Peter wrote:

> On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
>> 
>> Admittedly I'm not up on the latest uses of GFF, but as far as I know, GFF
>> is an annotation format only; it does not contain the actual sequence.
>> 
>> Have you looked in your GFF file to see if there are nucleotides in there?
>> 
>> Dave
> 
> Actually a GFF file can optionally include a FASTA format sequence
> at the end of the file, although it seems to be more common to just
> supply separate GFF and FASTA files and cross reference by ID.
> 
> Peter

IIRC, optionally including FASTA sequence is specified only in the GFF3 spec; use of FASTA isn't explicitly mentioned in earlier versions.  We only support it with earlier GFF due to convergence of the various GFF parsers.  

The original GFF spec proposed allowing sequence, but it's in the form of meta information and I have never seen it used in practice (as you mention, the FASTA is normally loaded separately).

chris


From kanmaninradha at gmail.com  Thu Aug 26 12:22:14 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 09:22:14 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
	<6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
Message-ID: <AANLkTinxcoKBHqU7bnfyNA6bi5qBjNAYR54c6K+Pg7rz@mail.gmail.com>

Hi Everyone,

Thanks very much for this clarification.  Thanks a ton to everyone who
spared their time to educate me.

I see your points.  Please correct me if I am wrong.

I understand that it's better to use Bio::DB::SeqFeature or Bio::DB::GFF
to load the FASTA sequences (from a separate multi-FASTA file) and
then Bio::Tools::GFF to parse the feature info from a GFF file, then query
the created database for the relevant GFF coordinates.

I will implement this.

Thanks once again.
Kanmani

On Thu, Aug 26, 2010 at 7:20 AM, Chris Fields <cjfields at illinois.edu> wrote:

> Kammani,
>
> If you are using BioPerl, the best option currently available is to load a
> database with all relevant information (GFF and FASTA), then use that
> database for querying.  The most commonly-used ones now are
> Bio::DB::SeqFeature::Store and Bio::DB::GFF; the former is very
> GFF3-centric, but I believe it can handle GFF/GTF, and it has various
> database adaptors (MySQL, Pg, BDB, SQLite).
>
> chris
>
> On Aug 26, 2010, at 4:19 AM, Frank Schwach wrote:
>
> > Hi Kammani,
> >
> > While GFF files may contain DNA sequence data, most of them don't, so
> > you will have to use the location information you get from the GFF
> > annotation file in conjunction with, e.g., a local FASTA database of the
> > genomic sequence you are working with or an online resource.
> >
> >
> > Frank
> >
> >
> >


From cjfields at illinois.edu  Thu Aug 26 13:08:56 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 26 Aug 2010 12:08:56 -0500
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTinxcoKBHqU7bnfyNA6bi5qBjNAYR54c6K+Pg7rz@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
	<6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
	<AANLkTinxcoKBHqU7bnfyNA6bi5qBjNAYR54c6K+Pg7rz@mail.gmail.com>
Message-ID: <EF1B137F-94A7-45E1-B8FB-0E20142F0A7F@illinois.edu>

On Aug 26, 2010, at 11:22 AM, kanmani radha wrote:

> Hi Everyone,
> 
> Thanks very much for this clarification.  Thanks a ton for every one who
> spared their time to educate me.
> 
> I see your points.  Please correct me if I am wrong.
> 
> I understand that, Its better to use use Bio::DB::SeqFeature or Bio::DB::GFF
> to load the fasta sequences (from a separate multifasta) file and
> then Bio::Tools::GFF to parse the feature info from a gff file . Then query
> the created database for the relevent GFF coordinates....
> 
> I will implement this.
> 
> Thanks once again.
> Kanmani

Yes, in general.  I forgot to mention that you can have an in-memory database as well, but it's only suggested if you have a few thousand or so features and small sequences (I think bacterial chromosomes will work).  
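
To show what in-memory lookup buys you, here is a minimal sketch that
uses a plain Perl hash as a stand-in. It is not the BioPerl in-memory
database itself, only an illustration of keeping every feature in RAM and
fetching it by ID. The feature lines and the names %by_id and
%children_of are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up GFF3 feature lines; real input would be read from a file.
my @lines = (
    join("\t", qw(chr1 example gene 3 10 . + . ID=gene1)),
    join("\t", qw(chr1 example exon 3 6 . + . ID=exon1;Parent=gene1)),
    join("\t", qw(chr1 example exon 8 10 . + . ID=exon2;Parent=gene1)),
);

my (%by_id, %children_of);
for my $line (@lines) {
    my ($seqid, $source, $type, $start, $end, $score, $strand, $phase, $attr)
        = split /\t/, $line;

    # Column 9 holds key=value pairs separated by semicolons.
    my %tag = map { my ($k, $v) = split /=/, $_, 2; ($k => $v) }
              split /;/, $attr;

    my $feat = { seqid => $seqid, type => $type, start => $start,
                 end => $end, strand => $strand, %tag };

    $by_id{ $tag{ID} } = $feat if defined $tag{ID};
    push @{ $children_of{ $tag{Parent} } }, $feat if defined $tag{Parent};
}

# Random access: fetch any feature by ID, or all children of a gene.
print "exon2 spans $by_id{exon2}{start}-$by_id{exon2}{end}\n";
print "gene1 has ", scalar @{ $children_of{gene1} }, " exons\n";
```

Every feature stays resident in memory, which is why this style of
lookup only suits modest feature counts.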

chris


From Havard.Aanes at nvh.no  Wed Aug 25 11:47:12 2010
From: Havard.Aanes at nvh.no (Aanes Håvard)
Date: Wed, 25 Aug 2010 17:47:12 +0200
Subject: [Bioperl-l] bpfetch.pl
Message-ID: <897520BC3AAE754FA4E34E2FD26490A8021C61597B8D@A-EXMB1.veths.no>


Hi,

I am trying to obtain a set of mRNA sequences from a database made by the bpindex script. I thought this should be a trivial task, but it appears not to be. I get the sequences if I request them one by one, like this:

perl scripts/index/bpfetch.pl -dir ./ zebrafish:NM_201192 zebrafish:NM_212708

But I need hundreds of sequences, so my plan was to put the RefSeq IDs in a file and use that as an argument (or whatever it is called in perl). That does not work:

haavaaan at login2 ~/download/src/bioperl-1.2.3 $ perl scripts/index/bpfetch.pl -dir ./ zebrafish:./some_seqs

You are running bpindex.pl without installing bioperl.
You have done it from bioperl/scripts, and so we can find the necessary information
but it is much better to install bioperl

Please read the README in the bioperl distribution

Sequence %id in Database zebrafish is not present


Any suggestions on how to do this? Alternative approaches are also appreciated.

I have no experience in perl, just started using linux, and for the moment there is no time to learn perl, so I would really be grateful for any help to solve this specific task.

Best regards

Håvard Aanes (M.Sc.)
Ph.D. student
Section for biochemistry and physiology
The Norwegian School of Veterinary Science
Telephone: +47 22597358




From kanmaninradha at gmail.com  Thu Aug 26 04:23:28 2010
From: kanmaninradha at gmail.com (kanmani)
Date: Thu, 26 Aug 2010 01:23:28 -0700 (PDT)
Subject: [Bioperl-l] Bio::Tools:GFF to get DNA sequences...
Message-ID: <9b7381d7-3596-4e60-a2ac-6c8c135d457d@s24g2000pri.googlegroups.com>

Hi I am trying to get the DNA sequences for each exon feature. I have
the following script. Everything works except getting sequences. Can
some one correct me.....Thanks.

#!/usr/bin/perl

use strict;
use warnings;
use Bio::Tools::GFF;


my $file = shift;
my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
$gffio->features_attached_to_seqs(1);

while (my $feat = $gffio->next_feature()){
    my $start = $feat->start;
    my $end= $feat->end;
    my $size = $end-$start+1;
    my $strand = $feat->strand;
    my $seqid = $feat->seq_id;
    my $score = $feat->score;
    my $frame = $feat->frame;
    my $source = $feat->source_tag;
    my $type = $feat->primary_tag;
    my $gffstr = $gffio->gff_string($feat);
    my @alltags = $feat->all_tags();
    my @ID_tag_value = $feat->each_tag_value("ID");

   my  $seq = $feat->seq();
   print "$seq\n";

  if($type eq "gene"){
       print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
    }
}


From kanmaninradha at gmail.com  Thu Aug 26 17:24:40 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 14:24:40 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <EF1B137F-94A7-45E1-B8FB-0E20142F0A7F@illinois.edu>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
	<6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
	<AANLkTinxcoKBHqU7bnfyNA6bi5qBjNAYR54c6K+Pg7rz@mail.gmail.com>
	<EF1B137F-94A7-45E1-B8FB-0E20142F0A7F@illinois.edu>
Message-ID: <AANLkTikUxFLLAduO7M1QzSToewA_AgPPELKPVYq0+JKk@mail.gmail.com>

Hi Chris and others,

For a brief time I could get away with using Bio::DB::Fasta to index FASTA
files and Bio::Tools::GFF to iterate through GFF features. But I hit the wall
again. It looks like sequential access to GFF features is not sufficient; I
want random access to them. I see the only way to do that is by
using Bio::DB::GFF as suggested by Chris.

Here is my question: is there any tutorial on configuring BioPerl, or this
module in particular, to work with MySQL/Postgres? I would really appreciate
it.

And thanks for all your help.
Kanmani

On Thu, Aug 26, 2010 at 10:08 AM, Chris Fields <cjfields at illinois.edu>wrote:

> On Aug 26, 2010, at 11:22 AM, kanmani radha wrote:
>
> > Hi Everyone,
> >
> > Thanks very much for this clarification.  Thanks a ton for every one who
> > spared their time to educate me.
> >
> > I see your points.  Please correct me if I am wrong.
> >
> > I understand that, Its better to use use Bio::DB::SeqFeature or
> Bio::DB::GFF
> > to load the fasta sequences (from a separate multifasta) file and
> > then Bio::Tools::GFF to parse the feature info from a gff file . Then
> query
> > the created database for the relevent GFF coordinates....
> >
> > I will implement this.
> >
> > Thanks once again.
> > Kanmani
>
> Yes, in general.  I forgot to mention that you can have an in-memory
> database as well, but it's only suggested if you have a few thousand or so
> features and small sequences (I think bacterial chromosomes will work).
>
> chris


From kanmaninradha at gmail.com  Thu Aug 26 18:04:20 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 15:04:20 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTikUxFLLAduO7M1QzSToewA_AgPPELKPVYq0+JKk@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
	<6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
	<AANLkTinxcoKBHqU7bnfyNA6bi5qBjNAYR54c6K+Pg7rz@mail.gmail.com>
	<EF1B137F-94A7-45E1-B8FB-0E20142F0A7F@illinois.edu>
	<AANLkTikUxFLLAduO7M1QzSToewA_AgPPELKPVYq0+JKk@mail.gmail.com>
Message-ID: <AANLkTimTU87G1dajASCzHm5=pjHCKx8W5X8AR9TKLmU4@mail.gmail.com>

Hi, I have made some progress since then:
- Installing Bio::DB::DBI::mysql needed BioSQL.

- Downloaded and installed BioSQL following the instructions given in its
INSTALL file
- Created a biosql database on my MySQL server
- Loaded the schema using the script from BioSQL

- Installed DBI
- Now I have a problem with DBD::mysql. That reminds me that a couple of years
back I had to struggle to install this driver on another machine, so I thought
I would ask around this time.

It fails with a bunch of error messages, the first of them being:
dbdimp.h:22:49 error: mysql.h no such file or directory

But my MySQL installation has the header file in
"/usr/include/mysql3/mysql/mysql.h". Can anyone suggest how to move forward
from that?

thanks,
Kanmani

On Thu, Aug 26, 2010 at 2:24 PM, kanmani radha <kanmaninradha at gmail.com>wrote:

> Hi Chris and others,
>
> For a brief amount time i could get away using Bio::DB::Fasta to index
> fasta files and Bio::Tools::GFF to iterate thru GFF features. But, i hit the
> wall again. Looks like sequential access of GFF featuers is not sufficient,
> I want to have a random access to it. I see the only way to do that is by
> using Bio::DB::GFF as suggested by Chris.
>
> Here is my question. Is there any tutorial to configure Bioperl  or this
> module in particular to work with MySQL/postgres. I will really appreciate
> it.
>
> And thanks for all your help.
> Kanmani
>
>
> On Thu, Aug 26, 2010 at 10:08 AM, Chris Fields <cjfields at illinois.edu>wrote:
>
>> On Aug 26, 2010, at 11:22 AM, kanmani radha wrote:
>>
>> > Hi Everyone,
>> >
>> > Thanks very much for this clarification.  Thanks a ton for every one who
>> > spared their time to educate me.
>> >
>> > I see your points.  Please correct me if I am wrong.
>> >
>> > I understand that, Its better to use use Bio::DB::SeqFeature or
>> Bio::DB::GFF
>> > to load the fasta sequences (from a separate multifasta) file and
>> > then Bio::Tools::GFF to parse the feature info from a gff file . Then
>> query
>> > the created database for the relevent GFF coordinates....
>> >
>> > I will implement this.
>> >
>> > Thanks once again.
>> > Kanmani
>>
>> Yes, in general.  I forgot to mention that you can have an in-memory
>> database as well, but it's only suggested if you have a few thousand or so
>> features and small sequences (I think bacterial chromosomes will work).
>>
>> chris
>
>
>


From rafalucas.unicamp at gmail.com  Thu Aug 26 18:11:07 2010
From: rafalucas.unicamp at gmail.com (Rafael Lucas)
Date: Thu, 26 Aug 2010 19:11:07 -0300
Subject: [Bioperl-l] Help in algorithm Bio::Structure::IO::pdb
Message-ID: <AANLkTi=zWPKeY1NpRA9TBSEnsbGH1W9F0y0QQ0+um7Yq@mail.gmail.com>

Hi folks,

How are you? I'm from Brazil, and I am writing an algorithm that encrypts
data and then prints the result to a PDB file. I have a .fasta file and
want to convert it to a .pdb file. Doing this with a program like PyMOL
takes too much time, so I want to use Bio::Structure::IO::pdb to speed up
the process. Could you help me with this problem?

Thank you,

Rafael Lucas
Faculdade de Tecnologia em Analise e Desenvolvimento de Sistemas
FT - UNICAMP
+55 (19)9614-0533


From J.Christopher.Ellis at duke.edu  Thu Aug 26 22:06:30 2010
From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis)
Date: Thu, 26 Aug 2010 22:06:30 -0400
Subject: [Bioperl-l] standaloneblastplus blastn crash
Message-ID: <55861.1282874790@duke.edu>

 When I run the standaloneblastplus I get the following error...

 ------------- EXCEPTION -------------
 MSG: C:Program FilesNCBIblast-2.2.24+binblastn.exe call crashed: There
was a problem running C:Program FilesNCBIblast-2.2.24+binblastn.exe :? at
C:/Perl64/lib/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1001.

 STACK Bio::Tools::Run::WrapperBase::_run
C:/Perl64/lib/Bio/Tools/Run/WrapperBase/CommandExts.pm:1006
 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus.pm:1303
 STACK Bio::Tools::Run::StandAloneBlastPlus::run
C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:270
 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus.pm:1301
 STACK toplevel localBlast.pl:9
 -------------------------------------

 I have a sneaking suspicion that it is an easy fix, but for the life of me I
cannot figure it out! :)

 Thanks in advance,
 Chris
 

From indraniel at gmail.com  Thu Aug 26 21:57:54 2010
From: indraniel at gmail.com (Indraniel)
Date: Fri, 27 Aug 2010 01:57:54 +0000 (UTC)
Subject: [Bioperl-l] How to convert SFF into Fastq
References: <COL102-W14F3F0CDA966B9ECE0BE1BFABB0@phx.gbl>
	<AANLkTilN3rsgWEjvmyMq9IjC8p5MzBdGGe-Xtfd6XoZF@mail.gmail.com>
	<AANLkTikC-I0JFvWqptlA69qrKnKrWSNyNPAwHQKSLluJ@mail.gmail.com>
Message-ID: <loom.20100827T035104-821@post.gmane.org>

A fourth option is the following tool, sff2fastq (written in C), described here:

http://indraniel.wordpress.com/2010/04/23/sff2fastq/

and 

http://github.com/indraniel/sff2fastq

Indraniel


From David.Messina at sbc.su.se  Fri Aug 27 03:41:21 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 27 Aug 2010 09:41:21 +0200
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <4C6D0B50.4050902@sms.ed.ac.uk>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
	<4C6D0B50.4050902@sms.ed.ac.uk>
Message-ID: <A5AACD38-0396-4221-B6F7-5740FBBD83E0@sbc.su.se>

Hi Giuseppe,


On Aug 19, 2010, at 12:45, Giuseppe Gallone wrote:
> Bio::Orthology::InterologMap
> Bio::Orthology::Interolog::Map,

> just in case somebody else finds other interesting applications for the Interolog concept and would like to "plug in" their own contribution. Would this make any sense?

Absolutely. I think either of the above is a good option, and I agree that the second is a little more flexible.

Your POD looks great! Way better than most. Having seen the whole thing now, I think your description is fine as is. And if you have another tutorial and example scripts on top of it, that would really be terrific, above and beyond what most people would expect.

So, time to unleash it on the world! :)


Dave


From David.Messina at sbc.su.se  Fri Aug 27 03:58:12 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 27 Aug 2010 09:58:12 +0200
Subject: [Bioperl-l] standaloneblastplus blastn crash
In-Reply-To: <55861.1282874790@duke.edu>
References: <55861.1282874790@duke.edu>
Message-ID: <9275A540-AE42-47B0-BA73-A906964C451B@sbc.su.se>

Hi Chris,

If you look at the error message, it says what the problem is: it's trying to call the blastn executable with no backslashes (path separators) in the path name.

> MSG: C:Program FilesNCBIblast-2.2.24+binblastn.exe call crashed: There
> was a problem running C:Program FilesNCBIblast-2.2.24+binblastn.exe


Now, that could be a problem in BioPerl or it could be a problem in your code. It's hard to diagnose where the problem lies without your code, so please post your code.
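
For what it's worth, the mangled path in the error message is exactly what
you get when every backslash is dropped from a Windows path. The sketch
below only reproduces that symptom; where the backslashes were actually
lost in the reported run is unknown, and the variable names are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Single quotes keep every backslash in a Windows path literally.
my $path = 'C:\Program Files\NCBI\blast-2.2.24+\bin\blastn.exe';

# Simulate some layer swallowing the backslashes (hypothetical; the real
# cause of the reported crash has not been established).
(my $stripped = $path) =~ s/\\//g;
print "$stripped\n";    # same shape as the path in the error message

# Forward slashes need no escaping and are often accepted on Windows too.
(my $portable = $path) =~ s{\\}{/}g;
print "$portable\n";
```

If the path is written in the calling script, single quotes or forward
slashes are the usual ways to keep the separators intact.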


Dave


From G.Gallone at sms.ed.ac.uk  Fri Aug 27 07:07:57 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Fri, 27 Aug 2010 12:07:57 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <A5AACD38-0396-4221-B6F7-5740FBBD83E0@sbc.su.se>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
	<4C6D0B50.4050902@sms.ed.ac.uk>
	<A5AACD38-0396-4221-B6F7-5740FBBD83E0@sbc.su.se>
Message-ID: <4C779C8D.1090007@sms.ed.ac.uk>

Hi Dave,

thank you very much for your feedback :) . I will register the namespace 
right now. I think I will use 'homology' as the second level name 
though, because I plan to extend the module to work with paralogues as well.

As for the category, which one of the following do you reckon would fit a 
Bio:: package best?

http://www.cpan.org/modules/by-category/

Regards
Giuseppe

On 27/08/10 08:41, Dave Messina wrote:
> Hi Giuseppe,
>
>
> On Aug 19, 2010, at 12:45, Giuseppe Gallone wrote:
>> Bio::Orthology::InterologMap
>> Bio::Orthology::Interolog::Map,
>
>> just in case somebody else finds other interesting applications for the Interolog concept and would like to "plug in" their own contribution. Would this make any sense?
>
> Absolutely. I think either of the above is a good option, and I agree that the second is a little more flexible.
>
> Your POD looks great! Way better than most. Having seen the whole thing now, I think your description is fine as is. And if you have another tutorial and example scripts on top of it, that would really be terrific, above and beyond what most people would expect.
>
> So, time to unleash it on the world! :)
>
>
> Dave
>
>

-- 

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From David.Messina at sbc.su.se  Fri Aug 27 07:25:06 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 27 Aug 2010 13:25:06 +0200
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <4C779C8D.1090007@sms.ed.ac.uk>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
	<4C6D0B50.4050902@sms.ed.ac.uk>
	<A5AACD38-0396-4221-B6F7-5740FBBD83E0@sbc.su.se>
	<4C779C8D.1090007@sms.ed.ac.uk>
Message-ID: <80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se>

Hi Giuseppe,


> I think I will use 'homology' as the second level name though, because I plan to extend the module to work with paralogues as well.

Sounds good.


> As for the category, which one of the following you reckon it will fit a Bio:: package better
> 
> http://www.cpan.org/modules/by-category/


Bio:: is in 23 - miscellaneous modules, so probably keeping with that makes sense.

I don't know much about that stuff, though. Chris F. or other CPAN cognoscenti care to comment?


Dave


From cjfields at illinois.edu  Fri Aug 27 09:26:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 27 Aug 2010 08:26:51 -0500
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
	<4C6D0B50.4050902@sms.ed.ac.uk>
	<A5AACD38-0396-4221-B6F7-5740FBBD83E0@sbc.su.se>
	<4C779C8D.1090007@sms.ed.ac.uk>
	<80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se>
Message-ID: <88BB7813-E892-4BEC-9C49-5FD22325BBF7@illinois.edu>

On Aug 27, 2010, at 6:25 AM, Dave Messina wrote:

> Hi Giuseppe,
> 
> 
>> I think I will use 'homology' as the second level name though, because I plan to extend the module to work with paralogues as well.
> 
> Sounds good.
> 
> 
>> As for the category, which one of the following you reckon it will fit a Bio:: package better
>> 
>> http://www.cpan.org/modules/by-category/
> 
> 
> Bio:: is in 23 - miscellaneous modules, so probably keeping with that makes sense.
> 
> I don't know much about that stuff, though. Chris F. or other CPAN cognoscenti care to comment?
> 
> 
> Dave

That's probably the best spot, as we cover a fairly broad range (mainly due to core monolithic structure).  Though it's terribly non-descript, sort of the junk drawer of CPAN.

chris


From adamkennedybackup at gmail.com  Sun Aug 29 07:35:50 2010
From: adamkennedybackup at gmail.com (Adam Kennedy)
Date: Sun, 29 Aug 2010 21:35:50 +1000
Subject: [Bioperl-l] Could I install BioPerl on Windows with the
 ActivePerl 5.12.1?
In-Reply-To: <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com>
References: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>
	<78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com>
	<AANLkTimBPL6Sr2kmg+f0t1j8pk_9nBAoqubKzY4AJoxo@mail.gmail.com>
	<5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com>
Message-ID: <AANLkTinSp6GCOQvCFYOUk1Ad8EjKdU=dQbe5GpbLiLr1@mail.gmail.com>

http://strawberryperl.com/download/professional/strawberry-perl-professional-5.10.1.3-alpha-2.msi

You get BioPerl installed out the box.

Adam K

On 20 August 2010 03:20, Christopher Fields <cjfields1 at gmail.com> wrote:
> cc'ing list.  Looks like the BioPerl PPM is possibly broken for perl 5.12.  Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling...
>
> chris
>
> On Aug 19, 2010, at 11:29 AM, han sun wrote:
>
>> v5.10 works,thanks.
>>
>> 2010/8/19 Christopher Fields <cjfields1 at gmail.com>
>> Try using ActivePerl 5.10 instead of v5.12.  It's very possible the PPM won't work for v5.12 yet.
>>
>> chris
>>
>> On Aug 19, 2010, at 9:25 AM, han sun wrote:
>>
>> > Hello everyone,
>> >
>> > I have used perl for several months,and I now want to feel the power of
>> > bioperl.
>> > But it seems that the installing is more difficult than I thought.
>> >
>> > I typed the commands.
>> >
>> >
>> >
>> > install-shell
>> >
>> >
>> > rep add bioperl http://bioperl.org/DIST
>> >
>> >
>> > rep add uwinnipeg
>> > http://cpan.uwinnipeg.ca/PPMPackages/12xx/<http://cpan.uwinnipeg.ca/PPMPackages/10xx/>
>> >
>> >
>> > rep add trouchelle http://trouchelle.com/ppm12/
>> >
>> > install BioPerl
>> >
>> > However,the installing failed,
>> >
>> > ppm install failed:
>> > Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
>> > Can't find any package that provides PostScript::TextBlock for
>> > Bundle-BioPerl-Core
>> > Can't find any package that provides Ace:: for Bundle-BioPerl-Core
>> > Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
>> > Can't find any package that provides Convert::Binary::C for
>> > Bundle-BioPerl-Core
>> > Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
>> > Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
>> > Can't find any package that provides IPC::Run for GraphViz
>> > Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
>> > Can't find any package that provides List-MoreUtils for Moose
>> > Can't find any package that provides List-MoreUtils for Class-MOP
>> >
>> >
>> > then I tried
>> >
>> > install http://www.bribes.org/perl/ppm/GD.ppd
>> >
>> > and tried the installation again,but it still didn't help.
>> >
>> > Do you know what's wrong?
>> >
>> > Please help me, thanks very much.
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields1 at gmail.com  Sun Aug 29 11:58:50 2010
From: cjfields1 at gmail.com (Christopher Fields)
Date: Sun, 29 Aug 2010 10:58:50 -0500
Subject: [Bioperl-l] Could I install BioPerl on Windows with the
	ActivePerl 5.12.1?
In-Reply-To: <AANLkTinSp6GCOQvCFYOUk1Ad8EjKdU=dQbe5GpbLiLr1@mail.gmail.com>
References: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>
	<78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com>
	<AANLkTimBPL6Sr2kmg+f0t1j8pk_9nBAoqubKzY4AJoxo@mail.gmail.com>
	<5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com>
	<AANLkTinSp6GCOQvCFYOUk1Ad8EjKdU=dQbe5GpbLiLr1@mail.gmail.com>
Message-ID: <A1B60C18-E144-466B-9630-21A88EF2CECB@gmail.com>

Yes, and I am thinking of pointing more and more users that direction instead.  Can't say maintaining PPM packages with ever-fluctuating specs is easy when I don't work with Windows anymore.

chris

On Aug 29, 2010, at 6:35 AM, Adam Kennedy wrote:

> http://strawberryperl.com/download/professional/strawberry-perl-professional-5.10.1.3-alpha-2.msi
> 
> You get BioPerl installed out the box.
> 
> Adam K
> 
> On 20 August 2010 03:20, Christopher Fields <cjfields1 at gmail.com> wrote:
>> cc'ing list.  Looks like the BioPerl PPM is possibly broken for perl 5.12.  Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling...
>> 
>> chris
>> 
>> On Aug 19, 2010, at 11:29 AM, han sun wrote:
>> 
>>> v5.10 works,thanks.
>>> 
>>> 2010/8/19 Christopher Fields <cjfields1 at gmail.com>
>>> Try using ActivePerl 5.10 instead of v5.12.  It's very possible the PPM won't work for v5.12 yet.
>>> 
>>> chris
>>> 
>>> On Aug 19, 2010, at 9:25 AM, han sun wrote:
>>> 
>>>> Hello everyone,
>>>> 
>>>> I have used perl for several months,and I now want to feel the power of
>>>> bioperl.
>>>> But it seems that the installing is more difficult than I thought.
>>>> 
>>>> I typed the commands.
>>>> 
>>>> 
>>>> 
>>>> install-shell
>>>> 
>>>> 
>>>> rep add bioperl http://bioperl.org/DIST
>>>> 
>>>> 
>>>> rep add uwinnipeg
>>>> http://cpan.uwinnipeg.ca/PPMPackages/12xx/<http://cpan.uwinnipeg.ca/PPMPackages/10xx/>
>>>> 
>>>> 
>>>> rep add trouchelle http://trouchelle.com/ppm12/
>>>> 
>>>> install BioPerl
>>>> 
>>>> However,the installing failed,
>>>> 
>>>> ppm install failed:
>>>> Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
>>>> Can't find any package that provides PostScript::TextBlock for
>>>> Bundle-BioPerl-Core
>>>> Can't find any package that provides Ace:: for Bundle-BioPerl-Core
>>>> Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
>>>> Can't find any package that provides Convert::Binary::C for
>>>> Bundle-BioPerl-Core
>>>> Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
>>>> Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
>>>> Can't find any package that provides IPC::Run for GraphViz
>>>> Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
>>>> Can't find any package that provides List-MoreUtils for Moose
>>>> Can't find any package that provides List-MoreUtils for Class-MOP
>>>> 
>>>> 
>>>> then I tried
>>>> 
>>>> install http://www.bribes.org/perl/ppm/GD.ppd
>>>> 
>>>> and tried the installation again,but it still didn't help.
>>>> 
>>>> Do you know what's wrong?
>>>> 
>>>> Please help me, thanks very much.
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>>> 
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From odclerck at gmail.com  Fri Aug 27 03:44:14 2010
From: odclerck at gmail.com (odclerck)
Date: Fri, 27 Aug 2010 00:44:14 -0700 (PDT)
Subject: [Bioperl-l]  fasta header replace
Message-ID: <29550202.post@talk.nabble.com>


Hi,
I was wondering if someone has a simple script that replaces the
headers of FASTA sequences with values stored in a separate text file.

Macrogen produces files with sequences that look more or less like this:
>100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1	1012, 1000 bases, 0 checksum.

I can filter out the position on the plate e.g. "A1" easily but would like
to replace this with the name of the strain stored in a different text file,
e.g. "A1_D1222".

Realize this sounds pretty basic to most of you, but I'm pretty new at
scripting.
Olivier

-- 
View this message in context: http://old.nabble.com/fasta-header-replace-tp29550202p29550202.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From J.Christopher.Ellis at duke.edu  Mon Aug 30 08:55:04 2010
From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis)
Date: Mon, 30 Aug 2010 08:55:04 -0400
Subject: [Bioperl-l] Taxonomy DB problem
Message-ID: <51468.1283172904@duke.edu>

 Hi All,

 I am trying to extract the entire taxonomy of an organism, including the
classifications. Something like...

Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia

I am not worried about format, just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point, so I copied it and tried to run it but got an error.

My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?

Thanks for all your help in advance!

Chris


From rafalucas.unicamp at gmail.com  Mon Aug 30 09:24:11 2010
From: rafalucas.unicamp at gmail.com (Rafael Lucas)
Date: Mon, 30 Aug 2010 10:24:11 -0300
Subject: [Bioperl-l] help in algorithm Bio::Structure::IO::pdb
Message-ID: <AANLkTimNHcjCRqYrhH8=Q=Dqqjtj35NNqMqP+Q2P1oPU@mail.gmail.com>

Hi folks,

How are you? I'm from Brazil, and I am writing an algorithm that
encrypts some data and then prints the result to a PDB file. I have a
.fasta file and want to convert it to a .pdb file. Using a program
like PyMOL takes too much time, so I would like to use
Bio::Structure::IO::pdb to speed up the process. Could you help me with
this problem?

Thank you,

Rafael Lucas
Faculdade de Tecnologia em Analise e Desenvolvimento de Sistemas
FT - UNICAMP
+55 (19)9614-0533


From cjfields at illinois.edu  Mon Aug 30 09:36:41 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 30 Aug 2010 08:36:41 -0500
Subject: [Bioperl-l] Taxonomy DB problem
In-Reply-To: <51468.1283172904@duke.edu>
References: <51468.1283172904@duke.edu>
Message-ID: <B93CF33A-0FA5-4A19-AF5A-BE203AA26E38@illinois.edu>

Chris,

Regarding a fix for that script, we would have to see your modified script and the error.  However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy.
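
As a rough sketch of that route (not a tested recipe): the helper below walks from a taxon up through its ancestors and keeps only the ranks asked for. It assumes the objects provide rank(), scientific_name() and ancestor(), as Bio::Taxon does. The helper name lineage_string and the command-line wrapper are made up for illustration, and the lookup against NCBI needs BioPerl plus network access.

```perl
use strict;
use warnings;

# Walk from a taxon up to the root via ->ancestor and collect
# "Rank:Name" pairs for the ranks of interest, highest rank first.
# Works on any object offering rank(), scientific_name() and ancestor().
sub lineage_string {
    my ($taxon, @wanted) = @_;
    my %keep = map { $_ => 1 } @wanted;
    my @parts;
    for ( my $node = $taxon; defined $node; $node = $node->ancestor ) {
        my $rank = $node->rank;
        next unless defined $rank && $keep{$rank};
        unshift @parts, ucfirst($rank) . ':' . $node->scientific_name;
    }
    return join ', ', @parts;
}

# Only runs when the script is called directly with a species name,
# because it needs BioPerl and a connection to NCBI Entrez.
if ( !caller && @ARGV ) {
    require Bio::DB::Taxonomy;
    my $db    = Bio::DB::Taxonomy->new( -source => 'entrez' );
    my $taxon = $db->get_taxon( -name => $ARGV[0] )
        or die "No taxon found for '$ARGV[0]'\n";
    print lineage_string( $taxon, qw(phylum class order family genus) ), "\n";
}
```

Called with a species name as its only argument, this should print one line in the "Phylum:..., Class:..." style asked for, provided NCBI assigns those ranks to the lineage.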

chris

On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:

> Hi All,
> 
> I am trying to extract the entire taxonomy of an organism including the
> classifications. Some thing like...
> 
> Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
> 
> I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error.
> 
> My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?
> 
> Thanks for all your help in advance!
> 
> Chris 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From fs5 at sanger.ac.uk  Mon Aug 30 11:11:06 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Mon, 30 Aug 2010 16:11:06 +0100
Subject: [Bioperl-l] fasta header replace
In-Reply-To: <29550202.post@talk.nabble.com>
References: <29550202.post@talk.nabble.com>
Message-ID: <4C7BCA0A.70503@sanger.ac.uk>

Hi Olivier,

Do you know how to read a file and build a hash from the contents? This 
is what you will need to do,
e.g. if your file is
A1 Strain_A
A2 Strain_A
A3 Strain_B

then you can do something like:

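open (INFILE, '<', $infile_path) or die "Cannot open $infile_path: $!";
my %well2strain;
while (<INFILE>){
    # capture the well (e.g. A1) and the strain name from each line
    my ($well, $strain) = ($_=~/^([A-Z]\d+)\s+(\w+)/);
    next unless defined $well;   # skip lines that do not match
    $well2strain{$well}=$strain;
}
close INFILE;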

You can then use the values of the hash to set the sequence ID as you 
parse the FASTA file. The BioPerl SeqIO howto gives details about how to 
read and write the FASTA file 
(http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples).
You can change the id of a sequence object with
$some_seq_object->id( 'my new ID');

See http://doc.bioperl.org/releases/bioperl-1.0/Bio/Seq.html for details.
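
To tie the pieces together, here is a minimal sketch in plain Perl (no BioPerl needed for the renaming itself). It assumes the mapping file has one "well strain" pair per line and that the well is the second-to-last underscore-separated field of the Macrogen ID, as in the example header. The function names build_well_map and relabel_id are invented for this illustration.

```perl
use strict;
use warnings;

# Build a well -> strain lookup from lines such as "A1 A1_D1222".
# Takes a list of lines, so it can be fed from a file handle or a test.
sub build_well_map {
    my @lines = @_;
    my %well2strain;
    for my $line (@lines) {
        if ( my ($well, $strain) = $line =~ /^([A-Z]\d+)\s+(\S+)/ ) {
            $well2strain{$well} = $strain;
        }
    }
    return \%well2strain;
}

# Replace the well field of an ID with the mapped strain name.
# Assumption: the well is the second-to-last "_"-separated field.
# IDs whose well is not in the map are returned unchanged.
sub relabel_id {
    my ($id, $map) = @_;
    my @fields = split /_/, $id;
    return $id if @fields < 2;
    my $well = $fields[-2];
    $fields[-2] = $map->{$well} if exists $map->{$well};
    return join '_', @fields;
}

my $map = build_well_map( "A1 A1_D1222\n", "A2 A2_D1300\n" );
print relabel_id( '100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1', $map ), "\n";
```

When reading the sequences with Bio::SeqIO, the same function could be applied to each sequence's display_id before writing the record back out.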

Hope that helps to get you started.

Frank

 
odclerck wrote:
> Hi,
> Was wondering if someone had an easy script available that converts the
> headers of a fasta sequences to a value stored in a separate text file.
>
> Macrogen produces files with sequences that look more or less like this:
>   
>> 100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1	1012, 1000 bases, 0 checksum.
>>     
>
> I can filter out the position on the plate e.g. "A1" easily but would like
> to replace this with the name of the strain stored in a different text file,
> e.g. "A1_D1222".
>
> Realize this sounds pretty basic to most of you, but I'm pretty new at
> scripting.
> Olivier
>
>   


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From jessica.sun at gmail.com  Mon Aug 30 11:51:39 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Mon, 30 Aug 2010 11:51:39 -0400
Subject: [Bioperl-l] Git for the lazy
In-Reply-To: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se>
References: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se>
Message-ID: <AANLkTikzkPL-WN7XUNPcfNhqqnOYUR15br-YzrwsE5tL@mail.gmail.com>

I want to add sequence features with tags and tag values, and I want the
tags to appear in the order I add them. However, they seem to be written in
alphabetical order by default. Does anyone know how to fix this? Thanks a lot in advance.


From G.Gallone at sms.ed.ac.uk  Tue Aug 31 07:52:57 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Tue, 31 Aug 2010 12:52:57 +0100
Subject: [Bioperl-l] New CPAN Release - Bio::Homology::InterologWalk - A
 Perl Module to retrieve putative PPIs through Interologs
Message-ID: <4C7CED19.80802@sms.ed.ac.uk>

Dear Bioperl users,

I would like to announce the release of Bio::Homology::InterologWalk, a
module that retrieves, scores and visualizes putative Protein-Protein 
Interactions through the orthology-walk method.

The project is available from the following link

http://search.cpan.org/~ggallone/

and a description of the idea behind it is here

http://search.cpan.org/~ggallone/Bio-Homology-InterologWalk-0.02/lib/Bio/Homology/InterologWalk.pm#DESCRIPTION

The project is in a very early stage (currently ver. 0.02 alpha) and has 
currently been tested only on Linux environments. It has not been tested 
on Macs, but it should work fine, and I would appreciate any feedback 
from Mac users who try it.

*Any* form of feedback will be extremely appreciated (bugs, typos,
syntactical errors, verbal abuse etc :) ).


Best,
Giuseppe


-- 

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From cjfields at illinois.edu  Tue Aug 31 11:01:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 31 Aug 2010 10:01:59 -0500
Subject: [Bioperl-l] Taxonomy DB problem
In-Reply-To: <56973.1283255847@duke.edu>
References: <56973.1283255847@duke.edu>
Message-ID: <7167CA86-857E-4E16-A3D6-BA45045CF892@illinois.edu>

Yes, I see that one.  It may be that the ID hash being returned is empty.  I'll look into it.
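
In the meantime, one possible workaround, offered as a guess and not a confirmed fix: the script's own comments say one of the protein IDs has been removed from the database, so the hash slice that builds @taxa will contain an undef for it. Dropping undefined entries before the esummary call might avoid the warning. The helper name defined_slice and the ID-to-taxid pairs below are illustrative only.

```perl
use strict;
use warnings;

# Return the values for @keys from %$hash, skipping keys that are
# missing or map to undef (e.g. IDs whose record was withdrawn).
sub defined_slice {
    my ($hash, @keys) = @_;
    return grep { defined } map { $hash->{$_} } @keys;
}

# Illustrative data only: protein ID => taxonomy ID, with one ID
# (89318838) deliberately absent to mimic a removed record.
my %taxa = ( 1621261 => 83332, 730439 => 1394 );
my @ids  = qw(1621261 89318838 730439);

my @taxids = defined_slice( \%taxa, @ids );
print join( ',', @taxids ), "\n";
```

In the wiki script this would stand in for the line that assigns @taxa from the hash slice.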

-c 

On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote:

> Hi Chris,
> 
> The error is...
> 
> "Use of uninitialized value $id in join or string at C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363."
> 
> The script from http://bioperl.org/wiki/Species_names_from_accession_numbers is as follows....
> 
> use Bio::DB::EUtilities;
> 
> my (%taxa, @taxa);
> my (%names, %idmap);
> 
> # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
> # (probably)
> 
> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> 
> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>                                        -db => 'taxonomy',
>                                        -dbfrom => 'protein',
>                                        -correspondence => 1,
>                                        -id => \@ids);
> 
> # iterate through the LinkSet objects
> while (my $ds = $factory->next_LinkSet) {
>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> }
> 
> @taxa = @taxa{@ids};
> 
> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>                                     -db => 'taxonomy',
>                                     -id => \@taxa );
> 
> while (local $_ = $factory->next_DocSum) {
>     $names{($_->get_contents_by_name('TaxId'))[0]} = ($_->get_contents_by_name('ScientificName'))[0];
> }
> 
> foreach (@ids) {
>     $idmap{$_} = $names{$taxa{$_}};
> }
> 
> # %idmap is
> # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> # 68536103 => 'Corynebacterium jeikeium K411'
> # 730439 => 'Bacillus caldolyticus'
> # 89318838 => undef (this record has been removed from the db)
> 
> 1;
> 
> 
> Thanks,
> 
> 
> 
> Chris
> 
> 
> On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
> Chris,
> 
> Regarding a fix for that script, we would have to see your modified script and the error. However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy.
> 
> chris
> 
> On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:
> 
> > Hi All,
> > 
> > I am trying to extract the entire taxonomy of an organism including the
> > classifications. Some thing like...
> > 
> > Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
> > 
> > I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error.
> > 
> > My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?
> > 
> > Thanks for all your help in advance!
> > 
> > Chris 
> > 
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 


From J.Christopher.Ellis at duke.edu  Tue Aug 31 07:57:27 2010
From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis)
Date: Tue, 31 Aug 2010 07:57:27 -0400
Subject: [Bioperl-l] Taxonomy DB problem
Message-ID: <56973.1283255847@duke.edu>

 Hi Chris,

 The error is...

 "Use of uninitialized value $id in join or string at
C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363."

 The script from
http://bioperl.org/wiki/Species_names_from_accession_numbers is as
follows....

use Bio::DB::EUtilities;

my (%taxa, @taxa);
my (%names, %idmap);

# these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
# (probably)

my @ids = qw(1621261 89318838 68536103 20807972 730439);

my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
                                       -db => 'taxonomy',
                                       -dbfrom => 'protein',
                                       -correspondence => 1,
                                       -id => \@ids);

# iterate through the LinkSet objects
while (my $ds = $factory->next_LinkSet) {
    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
}

@taxa = @taxa{@ids};

$factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
                                    -db => 'taxonomy',
                                    -id => \@taxa );

while (local $_ = $factory->next_DocSum) {
    $names{($_->get_contents_by_name('TaxId'))[0]} = ($_->get_contents_by_name('ScientificName'))[0];
}

foreach (@ids) {
    $idmap{$_} = $names{$taxa{$_}};
}

# %idmap is
# 1621261 => 'Mycobacterium tuberculosis H37Rv'
# 20807972 => 'Thermoanaerobacter tengcongensis MB4'
# 68536103 => 'Corynebacterium jeikeium K411'
# 730439 => 'Bacillus caldolyticus'
# 89318838 => undef (this record has been removed from the db)

1;

Thanks,

Chris

 On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
 Chris,

 Regarding a fix for that script, we would have to see your modified
script and the error. However, there are modules within BioPerl to
essentially do what you want, in particular, Bio::DB::Taxonomy.

 chris

 On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:

 > Hi All,
 > 
 > I am trying to extract the entire taxonomy of an organism including the
 > classifications. Some thing like...
 > 
 > Phylum:Proteobacteria, Class:Gammaproteobacteria,
Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
 > 
 > I am not worried about format just that I get the information and the
associated level of hierarchy. The script found at
http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a
good starting point so I copied it and tried run it but got an error.
 > 
 > My first question is "Is there a known fix for this?" and my second
question is how do I get the full hierarchical information (as seen above)
with the taxonomy db?
 > 
 > Thanks for all your help in advance!
 > 
 > Chris 
 > 
 > 
 > _______________________________________________
 > Bioperl-l mailing list
 > Bioperl-l at lists.open-bio.org
 > http://lists.open-bio.org/mailman/listinfo/bioperl-l

 

From maj at fortinbras.us  Sun Aug  1 23:19:16 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 1 Aug 2010 19:19:16 -0400
Subject: [Bioperl-l] SOAP Eutilities
In-Reply-To: <AANLkTi=DSQ2vktjCghDscW6OyHv25HYNXqA96LXTz443@mail.gmail.com>
References: <AANLkTi=DSQ2vktjCghDscW6OyHv25HYNXqA96LXTz443@mail.gmail.com>
Message-ID: <627BEC8B2E624A69A0B11EEBC8C93B71@NewLife>

Turns out that module lives in bioperl-run; try 

git clone git://github.com/bioperl/bioperl-run.git

MAJ
----- Original Message ----- 
From: "Robson de Souza" <robfsouza at gmail.com>
To: <bioperl-l at bioperl.org>
Sent: Saturday, July 31, 2010 4:56 PM
Subject: [Bioperl-l] SOAP Eutilities


> Hi,
> 
> Bio::DB::SoapEUtilities, referred in the HOWTO on EUtilities, seems to
> have disappeared from the Git repository.
> A simple
> 
> git clone git://github.com/bioperl/bioperl-live.git
> 
> does not download it. Any ideas why?
> Robson
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>


From David.Messina at sbc.su.se  Mon Aug  2 13:58:10 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 2 Aug 2010 15:58:10 +0200
Subject: [Bioperl-l] phyloxml and element order
In-Reply-To: <AANLkTimk5j3VfOvLNcN_c+FsgoVqpntB9xR5NfDopLPh@mail.gmail.com>
References: <AANLkTimk5j3VfOvLNcN_c+FsgoVqpntB9xR5NfDopLPh@mail.gmail.com>
Message-ID: <AB413C9E-ED42-48AF-A8AB-893771AD7067@sbc.su.se>

Hi Fred,

Thanks for letting us know about this - definitely sounds like a bug.

Would you please submit this to our bug tracker?

    http://bugzilla.open-bio.org


(You can just copy and paste your previous email.)

Dave


On Jul 30, 2010, at 06:59, Frédéric Romagné wrote:

> Hi,
> 
> I'm using BioPerl to create phyloXML trees. After a few attempts, I got my
> tree with all the elements/attributes I want, but when I write the tree, the
> elements are not written in the order specified in the XSD schema.
> 
> For example, I got:
> 
> <clade>
>   <clade>
>      <name>Loxosceles intermedia</name>
>      <taxonomy>
>         <scientific_name>Araneomorphae Sicariidae</scientific_name>
>      </taxonomy>
>      <sequence>
>         <accession source="Arachnoserver">969</accession>
>         <mol_seq>HAAERADSRKPIWDIAHMVNDLELVD</mol_seq>
>      </sequence>
>   </clade>
>   <taxonomy>
>      <scientific_name>Araneomorphae Sicariidae</scientific_name>
>   </taxonomy>
> </clade>
> 
> The program forester complains that <taxonomy> should be written before the
> <clade> element.
> 
> According to
> http://phyloxml.wordpress.com/2009/11/25/order-of-elements-in-phyloxml this
> is what bioperl is supposed to do.
> 
> All my elements/attributes are set before writing the tree, using the
> 'add_Annotation', 'add_tag_value' and 'sequence' methods of a
> Bio::Tree::AnnotatableNode object, so I think the error comes from the
> write_tree method.
> 
> Any help would be appreciated.
> 
> Thank you,
> Fred


From shalabh.sharma7 at gmail.com  Mon Aug  2 19:44:35 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Aug 2010 15:44:35 -0400
Subject: [Bioperl-l] clustalw to maf format
Message-ID: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>

Hi,
    I am trying to convert ClustalW alignments to MAF format.
I am trying to use AlignIO for that, but it's not working.

It's giving me the following error:

EXCEPTION Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::AlignIO::maf::write_aln" is not implemented by package Bio::AlignIO::maf.
This is not your fault - author of Bio::AlignIO::maf should be blamed!

STACK Bio::Root::RootI::throw_not_implemented /Library/Perl/5.8.8/Bio/Root/RootI.pm:707
STACK Bio::AlignIO::maf::write_aln /Library/Perl/5.8.8/Bio/AlignIO/maf.pm:176
STACK Bio::AlignIO::PRINT /Library/Perl/5.8.8/Bio/AlignIO.pm:492
STACK toplevel msf2mafy.pl:11


Is there any other way I can convert ClustalW to MAF?

I would really appreciate it if anyone could help me out.

Thanks
Shalabh
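
Since Bio::AlignIO::maf does not implement write_aln, one possible workaround
is to print MAF blocks by hand. The following is only a minimal sketch, not
part of BioPerl: the maf_block helper and the example rows are hypothetical,
and it assumes that every aligned sequence is a complete source sequence on
the forward strand (so start is 0 and srcSize equals the ungapped length),
with a placeholder score of 0.

```perl
use strict;
use warnings;

# Format one alignment as a single MAF block.
# $rows is an array ref of [ $name, $gapped_sequence ] pairs.
# Assumptions (not from the thread): start = 0, strand = '+',
# srcSize = ungapped length, score defaults to 0.
sub maf_block {
    my ($rows, $score) = @_;
    $score = 0 unless defined $score;

    my $out = "a score=$score\n";
    my $len;
    for my $row (@$rows) {
        my ($name, $text) = @$row;
        $text =~ tr/./-/;                     # normalise gap characters to '-'
        $len = length $text unless defined $len;
        die "Rows of unequal length in alignment\n" if length($text) != $len;
        my $size = ($text =~ tr/-//c);        # count the non-gap characters
        $out .= join(' ', 's', $name, 0, $size, '+', $size, $text) . "\n";
    }
    return $out . "\n";                       # a blank line ends the block
}

# Hypothetical example rows standing in for one parsed ClustalW alignment.
my @rows = (
    [ 'seq1', 'ACGT-ACGT' ],
    [ 'seq2', 'ACGTTAC-T' ],
);

print "##maf version=1\n";
print maf_block(\@rows);
```

The rows could be filled from a Bio::AlignIO ClustalW reader, for example by
collecting [ $seq->display_id, $seq->seq ] for each sequence returned by
$aln->each_seq. Whether RNAz accepts these simplified coordinates would need
to be checked.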


From Russell.Smithies at agresearch.co.nz  Mon Aug  2 20:25:26 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 3 Aug 2010 08:25:26 +1200
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>

This might work if you only have a few:
http://www.ibi.vu.nl/programs/convertalignwww/

--Russell


=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From shalabh.sharma7 at gmail.com  Mon Aug  2 20:53:31 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Aug 2010 16:53:31 -0400
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
Message-ID: <AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>

Hi Russell,
            Thanks for the reply, but I have around 400 alignments, and some
of them are huge :(

Thanks
Shalabh




From biopython at maubp.freeserve.co.uk  Mon Aug  2 21:24:09 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 2 Aug 2010 22:24:09 +0100
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
Message-ID: <AANLkTikFJP0aZHWgcRVxfJ9dhg-8Aj+aRWLF2GJDseW3@mail.gmail.com>

On Mon, Aug 2, 2010 at 8:44 PM, shalabh sharma
<shalabh.sharma7 at gmail.com> wrote:
> Hi,
>    I am trying to convert clustalw to maf format.
> I am trying to use AlignIO for that but it's not working.

Could you tell us why you have to use maf format?
I'm curious because all of the phylogenetics tools I've
had to work with personally will take some other format
which is more widely supported (e.g. FASTA, PFAM,
ClustalW, PHYLIP, ...).

Peter


From bernd.web at gmail.com  Mon Aug  2 21:25:52 2010
From: bernd.web at gmail.com (Bernd Web)
Date: Mon, 2 Aug 2010 23:25:52 +0200
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
	<AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>
Message-ID: <AANLkTimQe9fgO3jMeWR_y3E7gNskh26GUVVuEyfgtRJc@mail.gmail.com>

Hi Shalabh,

This ConvertAlign does not write MAF either; it only reads it (I made
it). I found some other converters on the web, but they do not export
to MAF format either...

http://biotechvana.uv.es/servers/afc/main.php
http://www.hiv.lanl.gov/content/sequence/FORMAT_CONVERSION/form.html

Galaxy has a MAF to Fasta converter:
http://main.g2.bx.psu.edu/root?tool_id=MAF_To_Fasta1


Regards,
Bernd




From cjfields at illinois.edu  Mon Aug  2 21:31:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 2 Aug 2010 16:31:20 -0500
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
	<AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>
Message-ID: <6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu>

No other format will work?  The main reason you see unimplemented methods like this is that there is no active interest in working with this format beyond getting the information stored in such files into objects and other commonly used formats.

chris



From shalabh.sharma7 at gmail.com  Mon Aug  2 22:30:41 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Aug 2010 18:30:41 -0400
Subject: [Bioperl-l] clustalw to maf format
In-Reply-To: <6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu>
References: <AANLkTik4yDV6p0AyMWXxsJP1ifYAehu7jw=2oUWeo0B3@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F02147B68@exchsth.agresearch.co.nz>
	<AANLkTingREcmgoeS7RVzi4j84Kk9bFmg_F6p-tScpKWA@mail.gmail.com>
	<6E9C9D64-D23A-4FC8-B213-FC8A7FFA4F27@illinois.edu>
Message-ID: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>

Hi All,
      Thanks for the replies.
Actually, I am working on a pipeline involving RNAz.
I had the impression that a converter must be available, since their web server
accepts XMFA or MAF format, but the standalone version only accepts MAF.

I think I will use a program that can output XMFA and write to the RNAz
people to ask whether they can provide me with the converter.

Thanks
Shalabh




From chiragmatkarbioinfo at gmail.com  Tue Aug  3 07:47:37 2010
From: chiragmatkarbioinfo at gmail.com (chirag matkar)
Date: Tue, 3 Aug 2010 13:17:37 +0530
Subject: [Bioperl-l] Pubmed Parsing
Message-ID: <AANLkTim+qcBN_9kXVLAkessaHUY9e=gc4Ad5MVGWk-mF@mail.gmail.com>

Hello all,
I have a list of PubMed IDs.
I want to parse the articles to find specific SNP-related information.
Can I do this with a script?


-- 
Regards,
Chirag Matkar


From genehack at genehack.org  Tue Aug  3 09:03:35 2010
From: genehack at genehack.org (John Anderson)
Date: Tue, 3 Aug 2010 05:03:35 -0400
Subject: [Bioperl-l] Pubmed Parsing
In-Reply-To: <AANLkTim+qcBN_9kXVLAkessaHUY9e=gc4Ad5MVGWk-mF@mail.gmail.com>
References: <AANLkTim+qcBN_9kXVLAkessaHUY9e=gc4Ad5MVGWk-mF@mail.gmail.com>
Message-ID: <5E557C44-224B-4460-9C2C-E375555B8BE6@genehack.org>


On Aug 3, 2010, at 3:47 AM, chirag matkar wrote:

> I have a list of Pubmed Ids.
> I want to parse articles to find specific SNP related information.
> Can i work it out using a Script?

Can you provide a more specific example of what you'd like to do? For example, something along the lines of, "for PMID 1234, get ... about SNP 5678" (where '...' is replaced with whatever it is you're trying to get). Even describing how you would obtain this information using the website yourself will be helpful.

thanks,
john.


From gowthaman.ramasamy at seattlebiomed.org  Tue Aug  3 05:29:10 2010
From: gowthaman.ramasamy at seattlebiomed.org (Gowthaman Ramasamy)
Date: Mon, 2 Aug 2010 22:29:10 -0700
Subject: [Bioperl-l] Getting pileup consensus from BAM files using
	Bio::DB::Sam
In-Reply-To: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
Message-ID: <C87CF736.E5DB%gowthaman.ramasamy@sbri.org>

Hi List,
I am trying to derive a consensus from a pileup via Bio::DB::Sam. Using the following script, I can extract the reference base and the read bases at each position. However, I cannot find a method to derive the consensus, similar to the values produced by "samtools pileup -c -f xxxxxx.fasta yyyyyyy.bam".

The script I use now retrieves the reference base and the query bases for each position. How do I improve it to get the consensus?

Thanks very much in advance,
Gowthaman


use strict;
use warnings;
use Bio::DB::Sam;

my $bam = Bio::DB::Sam->new(-bam   => 'something.bam',
                            -fasta => 'something.fasta');

# Callback invoked once per reference position covered by reads.
my $cb = sub {
    my ($seqid, $pos, $pileups) = @_;
    # Reference base at this position.
    my $refBase = $bam->segment($seqid, $pos, $pos)->dna;
    print "\n$pos\t$refBase=>";
    for my $pileup (@$pileups) {
        my $al = $pileup->alignment;
        # Read base aligned to this position.
        my $qBase = substr($al->qseq, $pileup->qpos, 1);
        print "$qBase,";
    }
};

$bam->pileup('Lin.chr10i', $cb);


From scott at scottcain.net  Tue Aug  3 10:32:59 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 3 Aug 2010 06:32:59 -0400
Subject: [Bioperl-l] Getting pileup consensus from BAM files using
	Bio::DB::Sam
In-Reply-To: <C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
Message-ID: <AANLkTi=vkM5rhy2x_s3p1jZKPtnLjq4wWD=ebGxxmaha@mail.gmail.com>

Hi Gowthaman,

I don't see a method to extract the consensus.  You are welcome to
submit a patch :-)

Scott




-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From lincoln.stein at gmail.com  Tue Aug  3 16:57:52 2010
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Tue, 3 Aug 2010 12:57:52 -0400
Subject: [Bioperl-l] Getting pileup consensus from BAM files using
	Bio::DB::Sam
In-Reply-To: <C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
Message-ID: <AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>

Samtools is running MAQ on the pileup. You could either implement MAQ in
perl, or come up with your own consensus caller.

Lincoln
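
As an illustration of the "own consensus caller" option, here is a minimal
sketch of a plain majority vote over the read bases at one position. It is
not the MAQ model that samtools uses: base and mapping qualities are ignored,
and the majority_base name and the tie-breaking rule (ties give 'N') are
assumptions made for this example only.

```perl
use strict;
use warnings;

# Return the most frequent base among the reads covering one position.
# Ns are ignored; ties and empty columns yield 'N'.
# Plain majority vote only: no base or mapping qualities are used,
# so this does not reproduce the MAQ consensus.
sub majority_base {
    my @bases = @_;
    my %count;
    $count{ uc $_ }++ for grep { !/^n$/i } @bases;

    # Sort bases by descending count, then alphabetically for stability.
    my ($best, $second) =
        sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;

    return 'N' unless defined $best;
    return 'N' if defined $second && $count{$second} == $count{$best};
    return $best;
}

# Hypothetical column of read bases, as could be collected from $qBase
# inside the pileup callback.
print majority_base(qw(A A G a N)), "\n";    # prints "A"
```

Inside the pileup callback, the $qBase values for a position could be pushed
onto an array and passed to this function once the inner loop has finished.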



-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From biopython at maubp.freeserve.co.uk  Tue Aug  3 17:06:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 3 Aug 2010 18:06:46 +0100
Subject: [Bioperl-l] Getting pileup consensus from BAM files using
	Bio::DB::Sam
In-Reply-To: <AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
	<AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
Message-ID: <AANLkTinoszFJNtDeEbh_DyFLp97aayv7bYVu6c=znq1h@mail.gmail.com>

On Tue, Aug 3, 2010 at 5:57 PM, Lincoln Stein <lincoln.stein at gmail.com> wrote:
> Samtools is running MAQ on the pileup. You could either implement MAQ in
> perl, or come up with your own consensus caller.
>
> Lincoln

See also: http://seqanswers.com/forums/showthread.php?t=6241


From gowthaman.ramasamy at seattlebiomed.org  Tue Aug  3 17:28:36 2010
From: gowthaman.ramasamy at seattlebiomed.org (Gowthaman Ramasamy)
Date: Tue, 3 Aug 2010 10:28:36 -0700
Subject: [Bioperl-l] Getting pileup consensus from BAM files using
 Bio::DB::Sam
In-Reply-To: <AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>,
	<AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
Message-ID: <89080953C3D300419AACB6E63A7EEFBA5C47613B34@mail02.sbri.org>

Hi Lincoln,
That's a good lead. I will try to use MAQ in Perl rather than using my simple majority rule.

-gowtham

From stefan.kirov at bms.com  Tue Aug  3 20:22:35 2010
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 03 Aug 2010 16:22:35 -0400
Subject: [Bioperl-l] nmica parser
Message-ID: <4C587A8B.8090603@bms.com>

Has anyone written an nmica parser? If not, I will perhaps do that. It
should be straightforward: the output is XML.
Stefan

From fs5 at sanger.ac.uk  Wed Aug  4 08:45:39 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Wed, 04 Aug 2010 09:45:39 +0100
Subject: [Bioperl-l] Pubmed Parsing
In-Reply-To: <AANLkTim+qcBN_9kXVLAkessaHUY9e=gc4Ad5MVGWk-mF@mail.gmail.com>
References: <AANLkTim+qcBN_9kXVLAkessaHUY9e=gc4Ad5MVGWk-mF@mail.gmail.com>
Message-ID: <1280911539.3499.46.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi Chirag,

have a look at this earlier post:
http://bioperl.org/pipermail/bioperl-l/2009-April/029690.html

However, you won't be able to retrieve all full texts and it is quite a
task to parse natural language and get useful information about a gene,
protein, SNP etc out of a manuscript. 

Frank

On Tue, 2010-08-03 at 13:17 +0530, chirag matkar wrote:
> Hello all,
> I have a list of Pubmed Ids.
> I want to parse articles to find specific SNP related information.
> Can i work it out using a Script?
> 
> 
> 
> 
> 


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From David.Messina at sbc.su.se  Thu Aug  5 12:16:17 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 5 Aug 2010 14:16:17 +0200
Subject: [Bioperl-l] call for a TreeIO volunteer
Message-ID: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>

Hi everybody,

We've got a couple of small open bugs related to the Bio::TreeIO modules, and we could really use someone to take a look at them. Ideally, that someone would have familiarity with TreeIO already.*

It'd help us to get the next release (1.6.2) out the door.

The bugs in question are:
- TreeIO::newick writes root node branch length incorrectly
http://bugzilla.open-bio.org/show_bug.cgi?id=3039

- Bio::TreeIO::nhx cannot parse empty [&&NHX] + round-trip failure
http://bugzilla.open-bio.org/show_bug.cgi?id=3007


Thanks,
Dave
on behalf of the core developers


* Even if you don't, though, if you've been looking for an opportunity to contribute to BioPerl, and this sounds like something you'd like to work on, by all means raise your hand.


From clements at nescent.org  Thu Aug  5 17:15:41 2010
From: clements at nescent.org (Dave Clements)
Date: Thu, 5 Aug 2010 10:15:41 -0700
Subject: [Bioperl-l] GMOD Europe 2010, 13-16 Sept, Cambridge, UK
In-Reply-To: <AANLkTinpd0pP9cBGUfnEd8PuV-VOcfqz6VKdCRp0d=uA@mail.gmail.com>
References: <AANLkTinpd0pP9cBGUfnEd8PuV-VOcfqz6VKdCRp0d=uA@mail.gmail.com>
Message-ID: <AANLkTi=BCjD3w0w4S+44qRb4ShW-P6DVBH0SZ+41k1Ah@mail.gmail.com>

GMOD Europe 2010
================
13-16 September 2010
Cambridge, UK
http://gmod.org/wiki/GMOD_Europe_2010


We are pleased to announce GMOD Europe 2010, four days of GMOD events being
held 13-16 September 2010, at the University of Cambridge. GMOD Europe 2010
includes:

1) GMOD Community Meeting, Monday & Tuesday:  Project updates, developer and
user presentations and best practices, project direction.

2) GMOD Satellite Meetings, Wednesday:  Special interest groups where GMOD
community members meet to discuss specific topics of interest.

3) InterMine Workshop, Wednesday:  A one day workshop on installing,
configuring and using the InterMine biological data warehouse system.

4) BioMart Workshop, Thursday:  A one day workshop on using the BioMart
biological data warehouse system, including accessing data through APIs.

Registration is now open for these events. There is a £50 registration fee
for the GMOD Meeting to cover catered lunches and other expenses.
Registration for all other events is free, but required, as space is
limited.  These events are open to all: GMOD users, developers, prospective
users, biologists, and computer scientists.  See
http://gmod.org/wiki/January_2010_GMOD_Meeting for an idea of what goes on
at GMOD meetings.

GMOD is a collection of interoperable open source software components for
managing, visualizing and annotating biological data.  GMOD incorporates
many widely used tools, including GBrowse and JBrowse for genome browsing,
InterMine and BioMart for data mining, Galaxy and Ergatis for workflow,
Chado for data management, GBrowse_syn and CMap for comparative genomics,
plus many other tools (Apollo, MAKER, Pathway Tools, Textpresso, ...).  GMOD
is also an active community of researchers and developers addressing common
challenges in exploiting their data.  If you are struggling to fully exploit
your data then please consider attending GMOD Europe 2010.

Please let us know if you have any questions, and we hope to see you in
Cambridge.

Thanks,

Scott Cain and Dave Clements
-- 
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/GMOD_Evo_Hackathon
http://gmod.org/wiki/GMOD_Europe_2010
http://gmod.org/wiki/Help_Desk_Feedback


From abhishek.vit at gmail.com  Thu Aug  5 22:15:56 2010
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Thu, 5 Aug 2010 18:15:56 -0400
Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl
Message-ID: <AANLkTi=rrPKSuuddK-+gTqPyo-wKQA0ZamDP59_+dUfi@mail.gmail.com>

Hi All

Just wondering if there is any Picard wrapper/s available in Bioperl.


Thanks!
-Abhi

-----------------------------
Abhishek Pratap
Bioinformatics Software Engineer II
Genomics Resource Center
Institute for Genome Sciences
School of Medicine, Univ of Maryland
801, W. Baltimore Street, Baltimore, MD 21209
Ph: (+1)-410-706-2296
www.igs.umaryland.edu/


From Russell.Smithies at agresearch.co.nz  Thu Aug  5 22:37:46 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 6 Aug 2010 10:37:46 +1200
Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl
In-Reply-To: <AANLkTi=rrPKSuuddK-+gTqPyo-wKQA0ZamDP59_+dUfi@mail.gmail.com>
References: <AANLkTi=rrPKSuuddK-+gTqPyo-wKQA0ZamDP59_+dUfi@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F02262E96@exchsth.agresearch.co.nz>

Might be part of the "Enterprise" package.
If not, some developer should "make it so".

:-)

--Russell
(I hate Fridays)

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Abhishek Pratap
> Sent: Friday, 6 August 2010 10:16 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl
> 
> Hi All
> 
> Just wondering if there is any Picard wrapper/s available in Bioperl.
> 
> 
> Thanks!
> -Abhi
> 
> -----------------------------
> Abhishek Pratap
> Bioinformatics Software Engineer II
> Genomics Resource Center
> Institute for Genome Sciences
> School of Medicine, Univ of Maryland
> 801, W. Baltimore Street, Baltimore, MD 21209
> Ph: (+1)-410-706-2296
> www.igs.umaryland.edu/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at illinois.edu  Thu Aug  5 23:10:16 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 5 Aug 2010 18:10:16 -0500
Subject: [Bioperl-l] Wrapper for Picard tools in Bioperl
In-Reply-To: <AANLkTi=rrPKSuuddK-+gTqPyo-wKQA0ZamDP59_+dUfi@mail.gmail.com>
References: <AANLkTi=rrPKSuuddK-+gTqPyo-wKQA0ZamDP59_+dUfi@mail.gmail.com>
Message-ID: <26E3E5B6-47CF-4744-9687-199C218B5571@illinois.edu>

Picard uses samtools, which has a perl API:

http://search.cpan.org/dist/Bio-SamTools/

which uses BioPerl.  Ah, the circle of life...

chris

On Aug 5, 2010, at 5:15 PM, Abhishek Pratap wrote:

> Hi All
> 
> Just wondering if there is any Picard wrapper/s available in Bioperl.
> 
> 
> Thanks!
> -Abhi
> 
> -----------------------------
> Abhishek Pratap
> Bioinformatics Software Engineer II
> Genomics Resource Center
> Institute for Genome Sciences
> School of Medicine, Univ of Maryland
> 801, W. Baltimore Street, Baltimore, MD 21209
> Ph: (+1)-410-706-2296
> www.igs.umaryland.edu/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From dan.kortschak at adelaide.edu.au  Fri Aug  6 01:06:45 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Fri, 06 Aug 2010 10:36:45 +0930
Subject: [Bioperl-l] MUMmer parser work
Message-ID: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>

Hello Everyone,

I've just noticed the absence of a MUMmer parser and thought that it
might be a worthwhile contribution to bioperl-run (I won't be able to
start on this for a while, but given Mark's excellent work on
CommandExts, it shouldn't take too long to get up when I do have time). Has
anyone made any effort in this direction that I would be stepping on, or
if they have left it, that I could pick up to shorten the work time?

cheers
Dan


From cjfields at illinois.edu  Fri Aug  6 03:13:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 5 Aug 2010 22:13:51 -0500
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>

Dan,

Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in:

http://bugzilla.open-bio.org/show_bug.cgi?id=2701

It currently lacks significant tests, so feel free to chip in there as needed.

chris

On Aug 5, 2010, at 8:06 PM, Dan Kortschak wrote:

> Hello Everyone,
> 
> I've just noticed the absence of a MUMmer parser and thought that it
> might be a worthwhile contribution to bioperl-run (I won't be able to
> start on this for a while, but given Mark's excellent work on
> CommandExts, it shouldn't take too long to get up when I do have time). Has
> anyone made any effort in this direction that I would be stepping on, or
> if they have left it, that I could pick up to shorten the work time?
> 
> cheers
> Dan
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From greg at ebi.ac.uk  Fri Aug  6 09:47:21 2010
From: greg at ebi.ac.uk (Gregory Jordan)
Date: Fri, 6 Aug 2010 10:47:21 +0100
Subject: [Bioperl-l] call for a TreeIO volunteer
In-Reply-To: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>
References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>
Message-ID: <AANLkTiknuVWFiz6kmOYAsHaLnPxMZEBWsHeBtv0yfuCQ@mail.gmail.com>

I can help out with these. I'm pretty sure I've previously fought with (and
perhaps even come up with a fix for) bug 3039, and I can take a look at 3007
too.

Now lemme just see if I can get up and running with the Bioperl test suite.
I'll give a shout if I run into any problems.

Cheers,
 Greg

On 5 August 2010 13:16, Dave Messina <David.Messina at sbc.su.se> wrote:

> Hi everybody,
>
> We've got a couple of small open bugs related to the Bio::TreeIO modules,
> and we could really use someone to take a look at them. Ideally, that
> someone would have familiarity with TreeIO already.*
>
> It'd help us to get the next release (1.6.2) out the door.
>
> The bugs in question are:
> - TreeIO::newick writes root node branch length incorrectly
> http://bugzilla.open-bio.org/show_bug.cgi?id=3039
>
> - Bio::TreeIO::nhx cannot parse empty [&&NHX] + round-trip failure
> http://bugzilla.open-bio.org/show_bug.cgi?id=3007
>
>
> Thanks,
> Dave
> on behalf of the core developers
>
>
> * Even if you don't, though, if you've been looking for an opportunity to
> contribute to BioPerl, and this sounds like something you'd like to work on,
> by all means raise your hand.
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From jun.yin at ucd.ie  Fri Aug  6 10:52:14 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 06 Aug 2010 11:52:14 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
Message-ID: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>

Hi, all,

 
I am the Google Summer of Code student working on refactoring the Bio::Align
subsystem. I recently implemented several packages for retrieving online
alignment sequences. The aim of the packages is to give BioPerl users
convenient methods for retrieving alignments from online databases. After
retrieval, the alignment is converted into a Bio::SimpleAlign object, which
is easy to manipulate and write to local disk. The packages currently
support the Pfam, Rfam, Prosite and Entrez Protein Clusters databases.

 
Here is the structure of the packages:

Packages

Bio::DB::Align (interface, and calling other packages)

Bio::DB::Align::Pfam (retrieving alignment from Pfam)

Bio::DB::Align::Rfam (retrieving alignment from Rfam)

Bio::DB::Align::Prosite (retrieving alignment from Prosite)

Bio::DB::Align::ProtClustDB (retrieving alignment from Entrez Protein
Clusters Database)

 
Usually four methods are provided for each package:

Methods

get_Aln_by_id (retrieving alignment by id and returns Bio::SimpleAlign
object)

get_Aln_by_acc (retrieving alignment by accession and returns
Bio::SimpleAlign object) (Rfam and Prosite only support this method)

id2acc (id to accession conversion)

acc2id (accession to id conversion)

 
These packages are built dependent on LWP::UserAgent, HTTP::Request and
Bio::DB::GenericWebAgent. Bio::DB::Align::ProtClustDB is dependent on
Bio::DB::EUtilities.

 
Calling the packages can be:

 
my $dbobj=Bio::DB::Align->new(-db=>"rfam");

# Or, calling a database-specific package directly:
# my $dbobj= Bio::DB::Align::Pfam->new();


my $aln=$dbobj->get_Aln_by_acc("RF0001");
my $aln2=$dbobj->get_Aln_by_acc(-accession=>"RF0001",-alignment=>"full");

print $aln->length();

foreach my $seq ($aln->each_Seq) {
#do something
}

 
I have done some tests on these packages. And, I will write them into
standard tests later. Any suggestions on these packages are welcome.

 
Cheers,

Jun Yin

Ph.D. student in U.C.D.

 
Bioinformatics Laboratory

Conway Institute

University College Dublin

 
From David.Messina at sbc.su.se  Fri Aug  6 12:59:19 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 6 Aug 2010 14:59:19 +0200
Subject: [Bioperl-l] call for a TreeIO volunteer
In-Reply-To: <AANLkTiknuVWFiz6kmOYAsHaLnPxMZEBWsHeBtv0yfuCQ@mail.gmail.com>
References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>
	<AANLkTiknuVWFiz6kmOYAsHaLnPxMZEBWsHeBtv0yfuCQ@mail.gmail.com>
Message-ID: <6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>


> I can help out with these. I'm pretty sure I've previously fought with (and perhaps even come up with a fix for) bug 3039, and I can take a look at 3007 too.

Awesome, thanks Greg!


> Now lemme just see if I can get up and running with the Bioperl test suite. I'll give a shout if I run into any problems.

Please do.


Dave


From David.Messina at sbc.su.se  Fri Aug  6 13:06:47 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 6 Aug 2010 15:06:47 +0200
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
Message-ID: <F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se>

Sounds great, Jun!

Did you happen to test your code on very large alignments? I know there's one in Pfam that's something like 100,000 sequences. An rRNA, I believe.


Dave


From jun.yin at ucd.ie  Fri Aug  6 13:11:41 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 06 Aug 2010 14:11:41 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
	<F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se>
Message-ID: <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>

Hi, Dave,

Thx for reminding me this. I will definitely try it.

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin


-----Original Message-----
From: Dave Messina [mailto:David.Messina at sbc.su.se] 
Sent: Friday, August 06, 2010 2:07 PM
To: Jun Yin
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences

Sounds great, Jun!

Did you happen to test your code on very large alignments? I know there's
one in Pfam that's something like 100,000 sequences. An rRNA, I believe.


Dave


__________ Information from ESET Smart Security, version of virus signature
database 5346 (20100806) __________

The message was checked by ESET Smart Security.

http://www.eset.com


 

From cjfields at illinois.edu  Fri Aug  6 13:19:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 6 Aug 2010 08:19:54 -0500
Subject: [Bioperl-l] call for a TreeIO volunteer
In-Reply-To: <6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>
References: <91AC5B00-5969-4C56-B08A-6EEA76916A10@sbc.su.se>
	<AANLkTiknuVWFiz6kmOYAsHaLnPxMZEBWsHeBtv0yfuCQ@mail.gmail.com>
	<6D6DAA77-2A2F-4AAA-B36D-FACED1FDE383@sbc.su.se>
Message-ID: <8CB3DE9A-4C5C-42A3-94B4-8818D7143951@illinois.edu>

On Aug 6, 2010, at 7:59 AM, Dave Messina wrote:

> 
>> I can help out with these. I'm pretty sure I've previously fought with (and perhaps even come up with a fix for) bug 3039, and I can take a look at 3007 too.
> 
> Awesome, thanks Greg!
> 
> 
>> Now lemme just see if I can get up and running with the Bioperl test suite. I'll give a shout if I run into any problems.
> 
> Please do.
> 
> 
> 
> Dave

Agreed, and thanks for helping out!

chris


From dianabowley at gmail.com  Fri Aug  6 22:33:57 2010
From: dianabowley at gmail.com (DRBowley)
Date: Fri, 6 Aug 2010 15:33:57 -0700 (PDT)
Subject: [Bioperl-l] BioPerl install issues
Message-ID: <b70994fe-d6c3-4c58-8b45-dfe50b9a8fe5@t5g2000prd.googlegroups.com>

I'm new to both perl and bioperl and I'm having issues installing
bioperl.  I'm trying to install on a Mac OS 10.6.4, and I've already
installed perl (5.10.0).  I tried installing using the recommended
approach for Mac - via Fink...
"fink install bioperl-pm5100"

Looking back over the terminal window text it looks like the problem
is:
"This package requires Module::Build v0.2805 or greater to install
itself."

I tried doing "fink selfupdate" and that did not fix the problem.

Any suggestions?

Thanks!
Diana


From Kevin.M.Brown at asu.edu  Fri Aug  6 22:50:45 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 6 Aug 2010 15:50:45 -0700
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To: <b70994fe-d6c3-4c58-8b45-dfe50b9a8fe5@t5g2000prd.googlegroups.com>
References: <b70994fe-d6c3-4c58-8b45-dfe50b9a8fe5@t5g2000prd.googlegroups.com>
Message-ID: <1A4207F8295607498283FE9E93B775B406E44A05@EX02.asurite.ad.asu.edu>

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_THE_EASY_WAY_USING_Build.PL

Not sure why you had to install perl since it should have been part of
the stock OSX install (or at least it was last time I logged onto a
mac). Not sure why the Fink method has so many issues, but might try the
above which works for linux or bsd.
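
Before retrying either route, it may help to confirm whether the perl you
are using can see a new enough Module::Build, since that is what the error
message complains about. The following is only a minimal sketch: the helper
name module_at_least is made up for illustration, and 0.2805 is simply the
minimum quoted in the error message.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return 1 if $module is
# installed at $min_version or newer, 0 otherwise.
sub module_at_least {
    my ($module, $min_version) = @_;

    # Turn Some::Module into Some/Module.pm so require can load it by path.
    (my $file = "$module.pm") =~ s{::}{/}g;

    my $ok = eval {
        require $file;                    # dies if the module is not installed
        $module->VERSION($min_version);   # dies if the installed version is too old
        1;
    };
    return $ok ? 1 : 0;
}

# 0.2805 is the minimum version named in the error message.
my $have_mb = module_at_least('Module::Build', '0.2805');

print $have_mb
    ? "Module::Build is recent enough\n"
    : "Module::Build is missing or older than 0.2805\n";
```

Run it with the same perl binary that the installer uses; if several perls
are installed, each has its own set of modules and may give a different
answer.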

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of DRBowley
Sent: Friday, August 06, 2010 3:34 PM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] BioPerl install issues

I'm new to both perl and bioperl and I'm having issues installing
bioperl.  I'm trying to install on a Mac OS 10.6.4, and I've already
installed perl (5.10.0).  I tried installing using the recommended
approach for Mac - via Fink...
"fink install bioperl-pm5100"

Looking back over the terminal window text it looks like the problem
is:
"This package requires Module::Build v0.2805 or greater to install
itself."

I tried doing "fink selfupdate" and that did not fix the problem.

Any suggestions?

Thanks!
Diana
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From skastu01 at students.poly.edu  Sat Aug  7 00:03:50 2010
From: skastu01 at students.poly.edu (Lakshmi Kastury)
Date: Sat, 7 Aug 2010 00:03:50 +0000
Subject: [Bioperl-l] BioPerl install issues
Message-ID: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>


Hi -
I went through several failed attempts on MACOS Snow Leopard, and fink was a dead end. Eventually I succeeded in installing on Windows Vista using CPAN. I am not sure if this method will work with MACOS:

1. Opened command prompt.
2. Typed command: >perl -MCPAN -e "install Bundle::BioPerl"
3. Answered yes to the series of questions, which prompts install of several bundles and a compiler.

The instructions were in a link from:
http://bioperl.org/Core/Latest/INSTALL

All the best,
Lakshmi

> Date: Fri, 6 Aug 2010 15:33:57 -0700
> From: dianabowley at gmail.com
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] BioPerl install issues
> 
> I'm new to both perl and bioperl and I'm having issues installing
> bioperl.  I'm trying to install on a Mac OS 10.6.4, and I've already
> installed perl (5.10.0).  I tried installing using the recommended
> approach for Mac - via Fink...
> "fink install bioperl-pm5100"
> 
> Looking back over the terminal window text it looks like the problem
> is:
> "This package requires Module::Build v0.2805 or greater to install
> itself."
> 
> I tried doing "fink selfupdate" and that did not fix the problem.
> 
> Any suggestions?
> 
> Thanks!
> Diana
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
 		 	   		  

From David.Messina at sbc.su.se  Sat Aug  7 06:47:40 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 7 Aug 2010 08:47:40 +0200
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>
References: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>
Message-ID: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se>


On Aug 7, 2010, at 02:03 , Lakshmi Kastury wrote:

>  I am not sure if this method will work with MACOS:

It will. CPAN is cross-platform and is the best way to install BioPerl.


Dave


From cjfields at illinois.edu  Sat Aug  7 13:58:56 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 7 Aug 2010 08:58:56 -0500
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To: <5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se>
References: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>
	<5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se>
Message-ID: <A21BBC5D-1D71-4534-B278-9FCFA0BB6DA8@illinois.edu>

It should work fine.  Even installing from trunk right now works w/o failing tests. 

chris

On Aug 7, 2010, at 1:47 AM, Dave Messina wrote:

> 
> On Aug 7, 2010, at 02:03 , Lakshmi Kastury wrote:
> 
>> I am not sure if this method will work with MACOS:
> 
> It will. CPAN is cross-platform and is the best way to install BioPerl.
> 
> 
> Dave
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From greg at ebi.ac.uk  Sat Aug  7 21:14:58 2010
From: greg at ebi.ac.uk (Gregory Jordan)
Date: Sat, 7 Aug 2010 22:14:58 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
	<F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se> 
	<00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>
Message-ID: <AANLkTimL938B1ovmOKC_FBNw1OwjipVpjOXZfN+P5Kf5@mail.gmail.com>

Maybe I'm just a bit naive here, but what is the expected difference between
accession and ID and why do we need a separate method for each? Seems to me
that one could just have a single method, get_Aln, which determines under
the hood whether the query string is an accession or ID.

It would be nice if the SimpleAlign object had its Annotation filled with
some extra metadata (such as accession, ID, database version number, URI,
etc.).

One other thing: have you thought about adding an Ensembl adaptor? Or maybe
something similar already exists in BioPerl...?

Sure Ensembl provides their own Perl API, but for someone who doesn't want
to go through the hassle of installing it from CVS (pardon my french, but
wtf!?! Who still uses CVS) and learning a whole new API, it might be
convenient to have a simple BioPerl module for quickly grabbing gene family
alignments from the public Ensembl MySQL databases. I'd be willing to help
write the necessary SQL queries for this.

greg

On 6 August 2010 14:11, Jun Yin <jun.yin at ucd.ie> wrote:

> Hi, Dave,
>
> Thx for reminding me this. I will definitely try it.
>
> Cheers,
> Jun Yin
> Ph.D. student in U.C.D.
>
> Bioinformatics Laboratory
> Conway Institute
> University College Dublin
>
>
> -----Original Message-----
> From: Dave Messina [mailto:David.Messina at sbc.su.se]
> Sent: Friday, August 06, 2010 2:07 PM
> To: Jun Yin
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences
>
> Sounds great, Jun!
>
> Did you happen to test your code on very large alignments? I know there's
> one in Pfam that's something like 100,000 sequences. An rRNA, I believe.
>
>
> Dave
>
>
> __________ Information from ESET Smart Security, version of virus signature
> database 5346 (20100806) __________
>
> The message was checked by ESET Smart Security.
>
> http://www.eset.com
>
>
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at illinois.edu  Sat Aug  7 22:07:39 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 7 Aug 2010 17:07:39 -0500
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <AANLkTimL938B1ovmOKC_FBNw1OwjipVpjOXZfN+P5Kf5@mail.gmail.com>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
	<F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se>
	<00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>
	<AANLkTimL938B1ovmOKC_FBNw1OwjipVpjOXZfN+P5Kf5@mail.gmail.com>
Message-ID: <21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu>

On Aug 7, 2010, at 4:14 PM, Gregory Jordan wrote:

> Maybe I'm just a bit naive here, but what is the expected difference between
> accession and ID and why do we need a separate method for each?

Depends on the remote service, but in many cases there is a difference.  With NCBI eutils you can have either an accession or the unique identifier (UID, or GI for nuc/protein seqs).  efetch can use both, but only the UID is guaranteed to retrieve a single sequence all the time; the accession can (very rarely) map to more than one sequence.  

The other eutils services require either a string (esearch) or a UID, but do not allow an accession.

> Seems to me
> that one could just have a single method, get_Aln, which determines under
> the hood whether the query string is an accession or ID.

A simpler method could be introduced, but I can see that being potentially brittle in the long run.  A naked alphanumeric string doesn't reveal much about what it is at face value w/o knowing database/service-specific behavior.  And then we're reliant on that behavior not changing, which we can't guarantee (this has bitten us in the past).  What would one do if NCBI (for instance) allowed accessions composed entirely of digits, or conversely a unique ID with mixed alphanumerics?

Using methods specific for ID/acc at least guarantees a behavior on the backend w/o guessing, and if there is no danger of overlap (a service accepts either/or) one could simply be an alias of the other.
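
To make that concrete, here is a rough sketch in plain Perl. Everything in
it is invented for illustration: My::ToyAlignDB and My::ToySingleKeyDB are
toy stand-ins for database adaptors (not BioPerl classes), the identifiers
are made up, and hash lookups take the place of the remote service. It only
shows the shape of the idea: explicit methods never have to guess, and
where a service has a single kind of key one method can simply forward to
the other.

```perl
use strict;
use warnings;

# My::ToyAlignDB is a hypothetical adaptor for a service that has both
# IDs and accessions.
package My::ToyAlignDB;

sub new {
    my ($class) = @_;
    my $self = {
        by_id  => { 'toy_family' => 'ALIGNMENT-1' },
        by_acc => { 'TF00001'    => 'ALIGNMENT-1' },
    };
    return bless $self, $class;
}

# Each method looks in exactly one table, so nothing is inferred from
# the shape of the query string.
sub get_Aln_by_id {
    my ($self, $id) = @_;
    return $self->{by_id}{$id};
}

sub get_Aln_by_acc {
    my ($self, $acc) = @_;
    return $self->{by_acc}{$acc};
}

# My::ToySingleKeyDB is a hypothetical adaptor for a service with only
# one kind of identifier.
package My::ToySingleKeyDB;

sub new {
    my ($class) = @_;
    return bless { by_key => { 'K1' => 'ALIGNMENT-2' } }, $class;
}

sub get_Aln_by_id {
    my ($self, $key) = @_;
    return $self->{by_key}{$key};
}

# No danger of overlap here, so the accession method just forwards.
sub get_Aln_by_acc {
    my $self = shift;
    return $self->get_Aln_by_id(@_);
}

package main;

my $db     = My::ToyAlignDB->new;
my $single = My::ToySingleKeyDB->new;

my $by_id  = $db->get_Aln_by_id('toy_family');
my $by_acc = $db->get_Aln_by_acc('TF00001');
my $wrong  = $db->get_Aln_by_acc('toy_family');   # undef: an ID is not an accession

print "by id:  $by_id\n";
print "by acc: $by_acc\n";
print "alias:  ", $single->get_Aln_by_acc('K1'), "\n";
```

A single guessing get_Aln could still be layered on top of these methods
later as a convenience, without the backend depending on the guess.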

> It would be nice if the SimpleAlign object had its Annotation filled with
> some extra metadata (such as accession, ID, database version number, URI,
> etc.).

According to the deobfuscator SimpleAlign does have accession() and id().  The others could be simple attributes, and can be added as simple getter/setters, or as annotation via Bio::Annotation (this is the way Stockholm annotation is currently handled).  Something to think about.

> One other thing: have you thought about adding an Ensembl adaptor? Or maybe
> something similar already exists in BioPerl...?

That's a good idea, though it might make more sense if this was done when mem-efficient (possibly DB-dependent) AlignI modules are present within bioperl, which is part of the GSoC (see below).  For instance, have a Bio::Align::AlignI with a backend ensembl DB adaptor that works lazily.

If using the Ensembl Perl API, a few possible roadblocks/problems might pop up. Ensembl currently requires bioperl (v1.2.3, but it works with the latest as well, at least when I've used it).  If using the ensembl perl API we would just need to ensure we aren't conflicting with ensembl code that pulls in bioperl classes expecting a v1.2.3 API when we only support the latest.  I don't foresee this being an issue, though (there is precedent for this, see Sendu's Ensembl module Bio::Tools::Run::Ensembl in bioperl-run).

> Sure Ensembl provides their own Perl API, but for someone who doesn't want
> to go through the hassle of installing it from CVS (pardon my french, but
> wtf!?! Who still uses CVS) and learning a whole new API, it might be
> convenient to have a simple BioPerl module for quickly grabbing gene family
> alignments from the public Ensembl MySQL databases. I'd be willing to help
> write the necessary SQL queries for this.
> 
> greg

The GSoC project on alignment subsystem refactoring will be finishing up this month, so I'm sure Jun can discuss ideas for initial DB-dependent implementations.  The more input and coders implementing the better, IMO.

As for writing up an adaptor to ensembl outside of its API, overall I don't think it's a bad idea, but if it's possible maybe start without reinventing things, then move to direct SQL.  Unless it's easier to use SQL.

chris

> On 6 August 2010 14:11, Jun Yin <jun.yin at ucd.ie> wrote:
> 
>> Hi, Dave,
>> 
>> Thx for reminding me this. I will definitely try it.
>> 
>> Cheers,
>> Jun Yin
>> Ph.D. student in U.C.D.
>> 
>> Bioinformatics Laboratory
>> Conway Institute
>> University College Dublin
>> 
>> 
>> -----Original Message-----
>> From: Dave Messina [mailto:David.Messina at sbc.su.se]
>> Sent: Friday, August 06, 2010 2:07 PM
>> To: Jun Yin
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Packages retrieving online alignment sequences
>> 
>> Sounds great, Jun!
>> 
>> Did you happen to test your code on very large alignments? I know there's
>> one in Pfam that's something like 100,000 sequences. An rRNA, I believe.
>> 
>> 
>> Dave
>> 
>> 
>> __________ Information from ESET Smart Security, version of virus signature
>> database 5346 (20100806) __________
>> 
>> The message was checked by ESET Smart Security.
>> 
>> http://www.eset.com
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hartzell at alerce.com  Sat Aug  7 21:45:04 2010
From: hartzell at alerce.com (George Hartzell)
Date: Sat, 7 Aug 2010 14:45:04 -0700
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To: <A21BBC5D-1D71-4534-B278-9FCFA0BB6DA8@illinois.edu>
References: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>
	<5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se>
	<A21BBC5D-1D71-4534-B278-9FCFA0BB6DA8@illinois.edu>
Message-ID: <19549.54240.499140.501136@gargle.gargle.HOWL>

Chris Fields writes:
 > It should work fine.  Even installing from trunk right now works
 > w/o failing tests.  

As a slight aside, if you're looking to build a current perl binary
for your mac (e.g. 5.12.1) you should take a look at perlbrew
(http://search.cpan.org/dist/App-perlbrew/).  The three steps at the
top of the installation section of the README are all you need to get
going.  Even a manager can do it.

If you're using bash on the mac via terminal you'll probably want to
put the one-liner they prescribe into your .bash_profile instead of
your .bashrc, but everything else just flows right along.

Once you have that in place you have a nicely isolated system into
which you can install things to your heart's content without worrying
about PERL5LIB and local::lib and the rest.

g.


From cjfields at illinois.edu  Sun Aug  8 01:19:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 7 Aug 2010 20:19:54 -0500
Subject: [Bioperl-l] BioPerl install issues
In-Reply-To: <19549.54240.499140.501136@gargle.gargle.HOWL>
References: <BLU106-W267722078497EAEDEC08C594920@phx.gbl>
	<5BE9DB7C-9A51-4C09-8F83-8CA8ED4AADFE@sbc.su.se>
	<A21BBC5D-1D71-4534-B278-9FCFA0BB6DA8@illinois.edu>
	<19549.54240.499140.501136@gargle.gargle.HOWL>
Message-ID: <EA5D5C26-7F3E-46B5-9CD0-F3D51B5F9511@illinois.edu>

On Aug 7, 2010, at 4:45 PM, George Hartzell wrote:

> Chris Fields writes:
>> It should work fine.  Even installing from trunk right now works
>> w/o failing tests.  
> 
> As a slight aside, if you're looking to build a current perl binary
> for your mac (e.g. 5.12.1) you should take a look at perlbrew
> (http://search.cpan.org/dist/App-perlbrew/).  The three steps at the
> top of the installation section of the README are all you need to get
> going.  Even a manager can do it.
> 
> If you're using bash on the mac via terminal you'll probably want to
> put the one-liner they prescribe into your .bash_profile instead of
> your .bashrc, but everything else just flows right along.
> 
> Once you have that in place you have a nicely isolated system into
> which you can install things to your heart's content without worrying
> about PERL5LIB and local::lib and the rest.
> 
> g.

Have to second using perlbrew, started using it for my local Ubuntu installation (don't have it running on my macbook yet, but it's in the plans).

chris


From greg at ebi.ac.uk  Sun Aug  8 06:12:41 2010
From: greg at ebi.ac.uk (Gregory Jordan)
Date: Sun, 8 Aug 2010 07:12:41 +0100
Subject: [Bioperl-l] Packages retrieving online alignment sequences
In-Reply-To: <21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu>
References: <00d901cb3555$6e3a5500$4aaeff00$%yin@ucd.ie>
	<F90660F7-74F9-41F2-A3E4-B3B42B817A0D@sbc.su.se> 
	<00fc01cb3568$e97968b0$bc6c3a10$%yin@ucd.ie>
	<AANLkTimL938B1ovmOKC_FBNw1OwjipVpjOXZfN+P5Kf5@mail.gmail.com> 
	<21E3B6D7-01BC-4DDA-B5B3-06F1F5AD7105@illinois.edu>
Message-ID: <AANLkTim9jkmKSGHm5bHPLOF3_xf+p9xMTN5Ha7bOMR7P@mail.gmail.com>

On 7 August 2010 23:07, Chris Fields <cjfields at illinois.edu> wrote:

>
> A simpler method could be introduced, but I can see that being potentially
> brittle in the long run.  A naked alphanumeric string doesn't reveal much
> about what it is at face value w/o knowing database/service-specific
> behavior.  And then we're reliant on that behavior not changing, which we
> can't guarantee (this has bitten us in the past).  What would one do if NCBI
> (for instance) allowed accessions derived completely of digits, or
> conversely a unique ID with mixed alphanumerics?
>
> Using methods specific for ID/acc at least guarantees a behavior on the
> backend w/o guessing, and if there is no danger of overlap (a service
> accepts either/or) one could simply be an alias of the other.
>

Thanks for the clarification on IDs vs accessions. As long as the behavior
and distinction are well-documented, I'm sure it won't make too much of a
difference.

My main concern was just that having two similar methods -- with no clearly
laid out distinction between the two and one of them only supported by half
of the implementing subclasses -- might confuse potential users.

As a point of reference: both Rfam and Pfam allow either an ID or an
accession in their front-page search interface (http://www.pfam.org /
http://www.rfam.org/). In fact, they seem to entirely hide the distinction
between ID and Accession from the end user; nowhere on the Rfam page for an
individual result is it clear which string is the accession and which is the
ID (http://rfam.sanger.ac.uk/family/snoZ107_R87).

Thus, a potential user of the Rfam module wouldn't know whether to call the
get_by_ID or get_by_Accession method, even after looking at the Rfam page
for his / her desired alignment!

As you can probably tell, I'm all in favor of a unified search whenever
feasible / possible. :-)
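
A unified lookup of the kind described above could be sketched roughly as
follows. This is only an illustration: the names guess_kind and
get_alignment and the two stand-in fetchers are hypothetical and are not
part of any existing BioPerl module. It assumes that Pfam and Rfam
accessions keep their current shape (PF or RF followed by five digits),
which is exactly the kind of service-specific behavior that could change.

```perl
use strict;
use warnings;

# Hypothetical back ends standing in for the real per-database fetch methods.
my %fetch = (
    accession => sub { "fetched by accession: $_[0]" },
    id        => sub { "fetched by ID: $_[0]" },
);

# Guess whether a string is a Pfam/Rfam accession (PF or RF plus five digits,
# optionally followed by a version such as ".20") or a family ID.
sub guess_kind {
    my ($string) = @_;
    return $string =~ /\A(?:PF|RF)\d{5}(?:\.\d+)?\z/ ? 'accession' : 'id';
}

# Unified entry point: callers pass one string and never choose a method.
sub get_alignment {
    my ($string) = @_;
    return $fetch{ guess_kind($string) }->($string);
}

print get_alignment($_), "\n" for qw(RF00001 snoZ107_R87 PF00069.20);
```

The explicit per-kind fetchers stay available underneath, so a caller who
already knows what kind of string it holds can bypass the guess.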


> As for writing up an adaptor to ensembl outside of its API, overall I
> don't think it's a bad idea, but if it's possible maybe start without
> reinventing things, then move to direct SQL.  Unless it's easier to use SQL.
>
>
For fetching Ensembl's gene family alignments, using the SQL will be
easiest. They don't tend to get unreasonably large in terms of memory  -- I
think the biggest tend to be ~700 sequences with a few thousand alignment
columns or so -- and it's a simple table join or two to get both the tree
and alignment from the database.

For genomic alignments, I agree that a more memory-efficient and/or lazy
backend would be necessary. And it's pretty much impossible to get those
things out of the Ensembl tables without using their API.

--greg


From dan.kortschak at adelaide.edu.au  Mon Aug  9 00:53:43 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Mon, 09 Aug 2010 10:23:43 +0930
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
	<80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>
Message-ID: <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au>

Hi Chris,

Is that set of files planned to be included in the git repository on
bioperl-live? I don't want to push something that is being organised by
someone else.

cheers
Dan

On Thu, 2010-08-05 at 22:13 -0500, Chris Fields wrote:
> Dan,
> 
> Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in:
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2701
> 
> It currently lacks significant tests, so feel free to chip in there as needed.
> 
> chris


From genehack at genehack.org  Mon Aug  9 01:42:27 2010
From: genehack at genehack.org (John SJ Anderson)
Date: Sun, 8 Aug 2010 21:42:27 -0400
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
	<80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>
	<1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org>

I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. 

j.

On Aug 8, 2010, at 20:53 , Dan Kortschak wrote:

> Hi Chris,
> 
> Is that set of files planned to be included in the git repository on
> bioperl-live? I don't want to push something that is being organised by
> someone else.
> 
> cheers
> Dan
> 
> On Thu, 2010-08-05 at 22:13 -0500, Chris Fields wrote:
>> Dan,
>> 
>> Just so you know, there is a proposed MUMmer AlignIO parser that John (genehack) is planning on trying to incorporate in:
>> 
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2701
>> 
>> It currently lacks significant tests, so feel free to chip in there as needed.
>> 
>> chris
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From dan.kortschak at adelaide.edu.au  Mon Aug  9 02:03:52 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Mon, 09 Aug 2010 11:33:52 +0930
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
	<80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>
	<1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au>
	<5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org>
Message-ID: <1281319432.2414.49.camel@zoidberg.mbs.adelaide.edu.au>

Excellent. Thanks for that.

Dan

On Sun, 2010-08-08 at 21:42 -0400, John SJ Anderson wrote:
> I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. 
> 
> j.


From cjfields at illinois.edu  Tue Aug 10 02:40:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 9 Aug 2010 21:40:07 -0500
Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio
Message-ID: <DE527A62-E6E7-45B0-96A5-F94E7A7A137F@illinois.edu>

Any objections to moving the Bio directory to lib/Bio in bioperl-live?  It's a more standard location for code in most distributions; I have a branch (topic/cjfields_standard_lib) that has this working, though it's possible that it needs more work.

chris


From genehack at genehack.org  Tue Aug 10 08:30:44 2010
From: genehack at genehack.org (John SJ Anderson)
Date: Tue, 10 Aug 2010 04:30:44 -0400
Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio
In-Reply-To: <DE527A62-E6E7-45B0-96A5-F94E7A7A137F@illinois.edu>
References: <DE527A62-E6E7-45B0-96A5-F94E7A7A137F@illinois.edu>
Message-ID: <B2C73D74-1F72-402B-A3F7-C4E3ECF7D3B6@genehack.org>


On Aug 9, 2010, at 22:40 , Chris Fields wrote:

> Any objections to moving the Bio directory to lib/Bio in bioperl-live?  

+1 on this idea. 

j.


From genehack at genehack.org  Tue Aug 10 11:21:51 2010
From: genehack at genehack.org (John Anderson)
Date: Tue, 10 Aug 2010 07:21:51 -0400
Subject: [Bioperl-l] MUMmer parser work
In-Reply-To: <5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org>
References: <1281056805.2414.26.camel@zoidberg.mbs.adelaide.edu.au>
	<80AF6158-9ADF-47A6-97EC-C322F75C8959@illinois.edu>
	<1281315223.2414.48.camel@zoidberg.mbs.adelaide.edu.au>
	<5BEA6ECA-B7A7-4417-BC91-763AB956347A@genehack.org>
Message-ID: <7A4F93AB-1BF7-4775-BC0E-38E7B431ECC6@genehack.org>


On Aug 8, 2010, at 9:42 PM, John SJ Anderson wrote:

> I'm working on getting those files into a topic branch in bioperl-live so they can be reviewed -- that'll probably be pushed back to the main master within the next couple days at the latest. 

Okay, the files have been added to topic/bug-2701 -- see <http://github.com/bioperl/bioperl-live/commits/topic/bug-2701>.

Please note, these are just the files from the bug report, slotted into the appropriate spots. I haven't reviewed the code or done anything about the non-BioPerl-y tests or the general lack of test coverage. I hope to do something about that in the coming week, but if somebody beats me to it, that would be okay too.

j.


From maj at fortinbras.us  Tue Aug 10 23:52:05 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 10 Aug 2010 19:52:05 -0400
Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio
In-Reply-To: <DE527A62-E6E7-45B0-96A5-F94E7A7A137F@illinois.edu>
References: <DE527A62-E6E7-45B0-96A5-F94E7A7A137F@illinois.edu>
Message-ID: <1C55239986494A8D82BDC21A85B324E9@NewLife>

+1
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Monday, August 09, 2010 10:40 PM
Subject: [Bioperl-l] bioperl-live, moving Bio->lib/Bio


> Any objections to moving the Bio directory to lib/Bio in bioperl-live?  It's a 
> more standard location for code in most distributions; I have a branch 
> (topic/cjfields_standard_lib) that has this working, though it's possible that 
> it needs more work.
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From fayroz_farouk at yahoo.com  Sun Aug  8 08:24:31 2010
From: fayroz_farouk at yahoo.com (fayroz)
Date: Sun, 8 Aug 2010 01:24:31 -0700 (PDT)
Subject: [Bioperl-l] using HMMER
Message-ID: <603590.1072.qm@web112620.mail.gq1.yahoo.com>

I need your help. I am a new Perl user and want to use BioPerl modules to run
the HMMER program (hmmsearch). I have "model.hmm" and a FASTA file, and want to
see which of the sequences are similar to the model.
I wrote this code, but there is a problem:

#!/usr/local/bin/perl -w
use Bio::AlignIO;
use Bio::SearchIO;
use Bio::SeqIO;
use Bio::Tools::Run::Hmmer;

# run hmmsearch (similar for hmmpfam)
my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'h6_avian.hmm', -informat => 'fasta');
my $seq = Bio::SeqIO->new('-file' => "one_seq.fa", '-format' => 'Fasta');

# Pass the factory a Bio::Seq object or a file name, returns a Bio::SearchIO
my $searchio = $factory->hmmsearch($seq);

while (my $result = $searchio->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print join("\t", ( $result->query_name,
                               $hsp->query->start,
                               $hsp->query->end,
                               $hit->name,
                               $hsp->hit->start,
                               $hsp->hit->end,
                               $hsp->score,
                               $hsp->evalue,
                               $hsp->seq_str,
                             )), "\n";
        }
    }
}


exceptions:
MSG: Unknown kind of input 'Bio::SeqIO::fasta=HASH(0x329a504)'
STACK Bio::Tools::Run::Hmmer::_setinput D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:381
STACK Bio::Tools::Run::Hmmer::hmmsearch D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:352
STACK toplevel test_bioperl.pl:12

thank you

fayroz


From douglas.hoen at gmail.com  Wed Aug 11 01:54:53 2010
From: douglas.hoen at gmail.com (Douglas Hoen)
Date: Tue, 10 Aug 2010 21:54:53 -0400
Subject: [Bioperl-l] Bio::SeqFeature::SimilarityPair->from_searchResult()?
Message-ID: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com>

Hi,

I was wondering why the Synopsis in the docs for Bio::SeqFeature::SimilarityPair has the following:
$sim_pair = Bio::SeqFeature::SimilarityPair->from_searchResult($blastHit);

There doesn't actually seem to be a from_searchResult method. Am I missing something?

Thanks,
-- Doug


From zhaoy at mail.cbi.pku.edu.cn  Wed Aug 11 08:17:42 2010
From: zhaoy at mail.cbi.pku.edu.cn (zhaoy at mail.cbi.pku.edu.cn)
Date: Wed, 11 Aug 2010 16:17:42 +0800 (CST)
Subject: [Bioperl-l] About extracting sequence from genewise format result
Message-ID: <53663.162.105.250.100.1281514662.squirrel@mail.cbi.pku.edu.cn>

Dear authors:

Hello!

I have recently been trying to parse genewise-format results to extract
the nucleotide sequence using the "hit_string" method in the "SearchIO"
module; however, the result is empty. Worse, the loop does not seem to
work, because I always get the last result. I'm confused.

My perl code is shown below:

#!/usr/bin/perl -w
use strict;
use warnings;

use Bio::SearchIO;
my $in = Bio::SearchIO->new(-format   => 'wise',
                            -wisetype => 'genewise',
                            -file     => 'test');
while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print "Query=",      $result->query_name,   "\n",
                  "Length=",     $hsp->length('total'), "\n",
                  "hit_string:", $hsp->hit_string,      "\n";
        }
    }
}

And one of the genewise format results is shown below:

genewise $Name: wise2-4-0alpha $ (unreleased release)
This program is freely distributed under a GPL. See source directory
Copyright (c) GRL limited: portions of the code are from separate copyright

Query protein:       Cpa_s110_24
Comp Matrix:         BLOSUM62.bla
Gap open:            12
Gap extension:       2
Start/End            global
Target Sequence      Bdi_chr3:38292015..38292302
Strand:              forward
Start/End (protein)  global
Gene Parameter file: gene.stat
Splice site model:   GT/AG only
Codon Table:         codon.table
Subs error:          1e-06
Indel error:         1e-06
Null model           syn
Algorithm            623

genewise output
Score 37.97 bits over entire alignment
Scores as bits over a synchronous coding model

Warning: The bits scores is not probablistically correct for single seqs
See WWW help for more info

Cpa_s110_24        1 MGNCQAVDAATLAIQHPS-GKVDRLYWPVSASEVMRTNPGHYVALLI--
                     MGNCQA DAA + IQHP+ GKV+RLYWP +A++VMR NPGHYVAL++
                     MGNCQAADAAAVVIQHPAEGKVERLYWPATAADVMRKNPGHYVALVVVH
Bdi_chr3:382920    1 agatcggggggggacccgggaggccttcgaggggacaacgctggcgggc
                     tgagaccaccctttaaccagatagtagcccccattgaacgaatctttta
                     gctcgggtggcggcgcgcgggcgcccggccgcccgcgcccccccccccc


Cpa_s110_24       47 ----STTLCPSNSNASNAESVRVTRIKLLRPTDTLVLGQVYRLITTQEV
                              P+ +    A + R+T++KLL+P DTL++GQVYRLIT+Q
                     VSGGAGETDPAVAGGGAAAAARITKVKLLKPRDTLLIGQVYRLITSQ--
Bdi_chr3:382920  148 gtgggggagcgggggggggggaaaagaccaccgaccagcgtccaatc
                     tcggcgacacctcgggcccccgtcatattacgactttgatagttcca
                     cctcctgtcccacaaaattccgccgcgccgcgctgcccgccccccca


Cpa_s110_24       92 MKGLWAKKCAKMKKYQEADHKDGLKPETIPGRRSGPERDTQVAKHERHR

                     -------------------------------------------------
Bdi_chr3:382920  289


Cpa_s110_24      141 SRVAASTNQAGLKSRTWQPSLKSISEAAS

                     -----------------------------
Bdi_chr3:382920  289


//
Gene 1
Gene 1 288
  Exon 1 288 phase 0
     Supporting 1 54 1 18
     Supporting 58 141 19 46
     Supporting 160 288 47 89
//

......


Part of the output of this code is shown below:
Query=Aly_481360
Length=0
hit_string:

Query=Aly_481360
Length=0
hit_string:

......

What's wrong with my code and how can I get the correct result? I'm
looking forward to your reply.
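
Independently of SearchIO, the gene-structure block at the end of each
genewise record can be read with a few lines of plain Perl. The sketch
below is not the BioPerl wise parser and does not explain the empty
hit_string; it only pulls out the "Supporting" lines. It assumes, judging
from the sample above (DNA 1..54 against protein 1..18), that the four
numbers are DNA start, DNA end, protein start and protein end. The name
parse_supporting is made up for this example.

```perl
use strict;
use warnings;

# Collect the "Supporting" lines of a genewise gene-structure block.
# Assumed column order: DNA start, DNA end, protein start, protein end.
sub parse_supporting {
    my ($text) = @_;
    my @segments;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^\s*Supporting\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/;
        push @segments, { dna_start => $1, dna_end => $2,
                          aa_start  => $3, aa_end  => $4 };
    }
    return @segments;
}

# The gene block copied from the sample record above.
my $block = <<'END';
Gene 1
Gene 1 288
  Exon 1 288 phase 0
     Supporting 1 54 1 18
     Supporting 58 141 19 46
     Supporting 160 288 47 89
END

for my $seg (parse_supporting($block)) {
    printf "DNA %d..%d supports protein %d..%d\n",
        @{$seg}{qw(dna_start dna_end aa_start aa_end)};
}
```

The DNA ranges recovered this way could then be used to cut the matching
nucleotide substrings out of the target sequence.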

Thanks very much!

Best regards,
Zackaly


From roy.chaudhuri at gmail.com  Wed Aug 11 14:32:39 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 11 Aug 2010 15:32:39 +0100
Subject: [Bioperl-l] using HMMER
In-Reply-To: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
Message-ID: <4C62B487.9090103@gmail.com>

Hi Fayroz,

Your $seq variable contains a Bio::SeqIO object (a biological 
filehandle), not a Bio::Seq (sequence object).

You need to change that line to:
my $seqio = Bio::SeqIO->new(-file=>'one_seq.fa', -format=>'fasta');
my $seq=$seqio->next_seq;

If you have multiple sequences in the file, then you will need to loop 
over them:
while (my $seq=$seqio->next_seq) {
# Code to run Hmmer goes here
}

Also, I don't think you need to specify -informat for your 
Bio::Tools::Run::Hmmer object, since you're passing it a sequence 
object, not a filename.

Hope this helps.
Roy.

On 08/08/2010 09:24, fayroz wrote:
> i need your help, i am a new perl user and want to use bioperl modules to run
> HMMER program ( HMMsearch) i have" model.hmm" and a "fasta file" to see which of
> them are similar with the model
> i write this code but there is a problems
>
> #!/usr/local/bin/perl W
> use Bio::AlignIO;
> use Bio::SearchIO;
> use Bio::SeqIO ;
> use Bio::Tools::Run::Hmmer;
>
> # run hmmsearch (similar for hmmpfam)
> my $factory = Bio::Tools::Run::Hmmer->new(-hmm =>  'h6_avian.hmm',-informat =>
> 'fasta');
> my $seq = Bio::SeqIO->new('-file'=>  "one_seq.fa", '-format'=>'Fasta');
>
> # Pass the factory a Bio::Seq object or a file name, returns a Bio::SearchIO
> my $searchio = $factory->hmmsearch($seq);
>
> while (my $result = $searchio->next_result){
> while(my $hit = $result->next_hit){
> while (my $hsp = $hit->next_hsp){
> print join("\t", ( $result->query_name,
> $hsp->query->start,
> $hsp->query->end,
> $hit->name,
> $hsp->hit->start,
> $hsp->hit->end,
> $hsp->score,
> $hsp->evalue,
> $hsp->seq_str,
> )), "\n";
> }
> }
> }
>
>
> exceptions:
> MSG: Unknown kind of input 'Bio::SeqIO::fasta=HASH(0x329a504)'
> STACK Bio::Tools::Run::Hmmer::_setinput
> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:381
> STACK Bio::Tools::Run::Hmmer::hmmsearch
> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:352
>   STACK toplevel test_bioperl.pl:12
> thank you
>
> fayroz
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Wed Aug 11 15:07:36 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 10:07:36 -0500
Subject: [Bioperl-l] using HMMER
In-Reply-To: <4C62B487.9090103@gmail.com>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
	<4C62B487.9090103@gmail.com>
Message-ID: <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>

You might also want to check whether you are using HMMER2 or HMMER3.  I'm not sure the wrapper works with HMMER3.

chris

On Aug 11, 2010, at 9:32 AM, Roy Chaudhuri wrote:

> Hi Fayroz,
> 
> Your $seq variable contains a Bio::SeqIO object (a biological filehandle), not a Bio::Seq (sequence object).
> 
> You need to change that line to:
> my $seqio = Bio::SeqIO->new(-file=>'one_seq.fa', -format=>'fasta');
> my $seq=$seqio->next_seq;
> 
> If you have multiple sequences in the file, then you will need to loop over them:
> while (my $seq=$seqio->next_seq) {
> # Code to run Hmmer goes here
> }
> 
> Also, I don't think you need to specify -informat for your Bio::Tools::Run::Hmmer object, since you're passing it a sequence object, not a filename.
> 
> Hope this helps.
> Roy.
> 
> On 08/08/2010 09:24, fayroz wrote:
>> i need your help, i am a new perl user and want to use bioperl modules to run
>> HMMER program ( HMMsearch) i have" model.hmm" and a "fasta file" to see which of
>> them are similar with the model
>> i write this code but there is a problems
>> 
>> #!/usr/local/bin/perl W
>> use Bio::AlignIO;
>> use Bio::SearchIO;
>> use Bio::SeqIO ;
>> use Bio::Tools::Run::Hmmer;
>> 
>> # run hmmsearch (similar for hmmpfam)
>> my $factory = Bio::Tools::Run::Hmmer->new(-hmm =>  'h6_avian.hmm',-informat =>
>> 'fasta');
>> my $seq = Bio::SeqIO->new('-file'=>  "one_seq.fa", '-format'=>'Fasta');
>> 
>> # Pass the factory a Bio::Seq object or a file name, returns a Bio::SearchIO
>> my $searchio = $factory->hmmsearch($seq);
>> 
>> while (my $result = $searchio->next_result){
>> while(my $hit = $result->next_hit){
>> while (my $hsp = $hit->next_hsp){
>> print join("\t", ( $result->query_name,
>> $hsp->query->start,
>> $hsp->query->end,
>> $hit->name,
>> $hsp->hit->start,
>> $hsp->hit->end,
>> $hsp->score,
>> $hsp->evalue,
>> $hsp->seq_str,
>> )), "\n";
>> }
>> }
>> }
>> 
>> 
>> exceptions:
>> MSG: Unknown kind of input 'Bio::SeqIO::fasta=HASH(0x329a504)'
>> STACK Bio::Tools::Run::Hmmer::_setinput
>> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:381
>> STACK Bio::Tools::Run::Hmmer::hmmsearch
>> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:352
>>  STACK toplevel test_bioperl.pl:12
>> thank you
>> 
>> fayroz
>> 
>> 
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From douglas.hoen at gmail.com  Wed Aug 11 19:13:49 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 12:13:49 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA in
	SeqFeature::Store database of the original DNA?
Message-ID: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>

Hi,

I am trying to store in a SeqFeature::Store database the results of
searches of translated DNA. The DB contains the original DNA
sequences. For instance, I have done HMMER searches of 6-frame
translations of the sequences stored in the DB. I want to store these
results "at" their (equivalent) DNA positions, which I can calculate.
Preferably, I would like to directly store the SeqFeature::Similarity
objects that I get from parsing these searches. But they are of course
located on different coordinate systems than the DNA, so I guess I
can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
DNA position and then store the Similarity's as sub-SeqFeatures.

I could just set the Similarity's position to the (calculated) DNA
coordinates, or alternately make a new SeqFeature and copy in the
attributes I want. But is there a more elegant solution?
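
The coordinate calculation mentioned above can be sketched as follows.
This is a minimal example under stated assumptions: frames are numbered
0, 1, 2 by their offset from the 5' end of the strand that was translated,
reverse frames are translations of the reverse complement, and all
coordinates are 1-based and inclusive. The function name protein_to_dna is
made up for illustration.

```perl
use strict;
use warnings;

# Map a protein-coordinate hit on one of the six translation frames back to
# 1-based DNA coordinates on the forward strand.
#   $frame   : 0, 1 or 2 (offset of the first codon on the translated strand)
#   $strand  : +1 for forward frames, -1 for reverse-complement frames
#   $dna_len : length of the DNA sequence (used only for the reverse strand)
sub protein_to_dna {
    my ($aa_start, $aa_end, $frame, $strand, $dna_len) = @_;

    # Coordinates on the strand that was actually translated.
    my $start = $frame + 3 * ($aa_start - 1) + 1;
    my $end   = $frame + 3 * $aa_end;

    return ($start, $end, 1) if $strand > 0;

    # Flip reverse-complement coordinates onto the forward strand.
    return ($dna_len - $end + 1, $dna_len - $start + 1, -1);
}

printf "%d..%d (strand %d)\n", protein_to_dna(1, 18, 0,  1, 288);
printf "%d..%d (strand %d)\n", protein_to_dna(1, 10, 1, -1, 100);
```

The returned start, end and strand are what a feature located on the DNA
would need, whichever feature class ends up holding them.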

Thanks,
-- Doug


From douglas.hoen at gmail.com  Wed Aug 11 20:11:26 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 13:11:26 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
Message-ID: <f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>

One possible answer to my own question: Use
Bio::SeqFeature::PositionProxy's? Would this work?

On Aug 11, 3:13 pm, Doug <douglas.h... at gmail.com> wrote:
> Hi,
>
> I am trying to store in a SeqFeature::Store database the results of
> searches of translated DNA. The DB contains the original DNA
> sequences. For instance, I have done HMMER searches of 6-frame
> translations of the sequences stored in the DB. I want to store these
> results "at" their (equivalent) DNA positions, which I can calculate.
> Preferably, I would like to directly store the SeqFeature::Similarity
> objects that I get from parsing these searches. But they are of course
> located on different coordinate systems than the DNA, so I guess I
> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
> DNA position and then store the Similarity's as sub-SeqFeatures.
>
> I could just set the Similarity's position to the (calculated) DNA
> coordinates, or alternately make a new SeqFeature and copy in the
> attributes I want. But is there a more elegant solution?
>
> Thanks,
> -- Doug
> _______________________________________________
> Bioperl-l mailing list
> Bioper... at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From scott at scottcain.net  Wed Aug 11 20:16:22 2010
From: scott at scottcain.net (Scott Cain)
Date: Wed, 11 Aug 2010 16:16:22 -0400
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
Message-ID: <AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>

Hi Doug,

I don't know if any of the things you've thought of would work; I've
never tried it.  My inclination would be to express your data in GFF3
and use the standard loader.
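
Writing a hit out as GFF3 might look roughly like the sketch below. It
assumes the hit has already been converted to DNA coordinates. The hash
keys, the "HMMER" source label, the "protein_match" type and the example
values are illustrative choices, not the output of any existing converter.

```perl
use strict;
use warnings;

# Percent-encode the characters that GFF3 reserves in column 9.
sub gff3_escape {
    my ($value) = @_;
    $value =~ s/([%;=&,\t\n\r])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Build one nine-column GFF3 line from a hash describing a hit.
sub hit_to_gff3 {
    my (%hit) = @_;
    my $attributes = join ';',
        'ID='   . gff3_escape($hit{id}),
        'Name=' . gff3_escape($hit{model});
    return join "\t",
        $hit{seq_id}, 'HMMER', 'protein_match',
        $hit{start}, $hit{end}, $hit{score},
        ($hit{strand} < 0 ? '-' : '+'), '.', $attributes;
}

print "##gff-version 3\n";
print hit_to_gff3(
    seq_id => 'chr3', id => 'hmm_hit_1', model => 'example_model',
    start  => 70, end => 99, score => 42.5, strand => -1,
), "\n";
```

A file of such lines could then be handed to whichever GFF3 loader the
database is normally populated with.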

Scott


On Wed, Aug 11, 2010 at 4:11 PM, Doug <douglas.hoen at gmail.com> wrote:
> One possible answer to my own question: Use
> Bio::SeqFeature::PositionProxy's? Would this work?
>
> On Aug 11, 3:13 pm, Doug <douglas.h... at gmail.com> wrote:
>> Hi,
>>
>> I am trying to store in a SeqFeature::Store database the results of
>> searches of translated DNA. The DB contains the original DNA
>> sequences. For instance, I have done HMMER searches of 6-frame
>> translations of the sequences stored in the DB. I want to store these
>> results "at" their (equivalent) DNA positions, which I can calculate.
>> Preferably, I would like to directly store the SeqFeature::Similarity
>> objects that I get from parsing these searches. But they are of course
>> located on different coordinate systems than the DNA, so I guess I
>> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
>> DNA position and then store the Similarity's as sub-SeqFeatures.
>>
>> I could just set the Similarity's position to the (calculated) DNA
>> coordinates, or alternately make a new SeqFeature and copy in the
>> attributes I want. But is there a more elegant solution?
>>
>> Thanks,
>> -- Doug
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioper... at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From douglas.hoen at gmail.com  Wed Aug 11 20:38:54 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 13:38:54 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com> 
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
Message-ID: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>

Hi Scott,

Good idea. Would you happen to know of an existing HMMER3 to GFF3
converter?

Thanks for your advice,
-- Doug

On Aug 11, 4:16 pm, Scott Cain <sc... at scottcain.net> wrote:
> Hi Doug,
>
> I don't know if any of the things you've thought of would work; I've
> never tried it.  My inclination would be to express your data in GFF3
> and use the standard loader.
>
> Scott
>
>
>
>
>
> On Wed, Aug 11, 2010 at 4:11 PM, Doug <douglas.h... at gmail.com> wrote:
> > One possible answer to my own question: Use
> > Bio::SeqFeature::PositionProxy's? Would this work?
>
> > On Aug 11, 3:13 pm, Doug <douglas.h... at gmail.com> wrote:
> >> Hi,
>
> >> I am trying to store in a SeqFeature::Store database the results of
> >> searches of translated DNA. The DB contains the original DNA
> >> sequences. For instance, I have done HMMER searches of 6-frame
> >> translations of the sequences stored in the DB. I want to store these
> >> results "at" their (equivalent) DNA positions, which I can calculate.
> >> Preferably, I would like to directly store the SeqFeature::Similarity
> >> objects that I get from parsing these searches. But they are of course
> >> located on different coordinate systems than the DNA, so I guess I
> >> can't (or shouldn't) create a SeqFeature (e.g. Generic) at the correct
> >> DNA position and then store the Similarity's as sub-SeqFeatures.
>
> >> I could just set the Similarity's position to the (calculated) DNA
> >> coordinates, or alternately make a new SeqFeature and copy in the
> >> attributes I want. But is there a more elegant solution?
>
> >> Thanks,
> >> -- Doug
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioper... at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioper... at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)?? ? ? ? ? ? ? ? ?? 216-392-3087
> Ontario Institute for Cancer Research
>
> _______________________________________________
> Bioperl-l mailing list
> Bioper... at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From douglas.hoen at gmail.com  Wed Aug 11 20:53:35 2010
From: douglas.hoen at gmail.com (Doug)
Date: Wed, 11 Aug 2010 13:53:35 -0700 (PDT)
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com> 
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com> 
	<6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
Message-ID: <a9d5aca2-3c28-49e8-bd76-119309c38c05@x21g2000yqa.googlegroups.com>

One more note: I did try using PositionProxy but it failed. It doesn't
implement seq_id() and so can't be stored in the DB:

------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::SeqFeatureI::seq_id" is not implemented by
package Bio::SeqFeature::PositionProxy.
This is not your fault - author of Bio::SeqFeature::PositionProxy
should be blamed!

...


On Aug 11, 4:38 pm, Doug <douglas.h... at gmail.com> wrote:
> Hi Scott,
>
> Good idea. Would you happen to know of an existing HMMER3 to GFF3
> converter?
>
> Thanks for your advice,
> -- Doug


From cjfields at illinois.edu  Wed Aug 11 20:45:00 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 15:45:00 -0500
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
	<6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
Message-ID: <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>

HMMER3 is parsed by Bio::SearchIO now in bioperl-live, and I think there is a generic SearchIO->GFF3 script floating around the intertubes somewheres...

chris

On Aug 11, 2010, at 3:38 PM, Doug wrote:

> Hi Scott,
> 
> Good idea. Would you happen to know of an existing HMMER3 to GFF3
> converter?
> 
> Thanks for your advice,
> -- Doug


From scott at scottcain.net  Wed Aug 11 21:05:25 2010
From: scott at scottcain.net (Scott Cain)
Date: Wed, 11 Aug 2010 17:05:25 -0400
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
	<6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
	<190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
Message-ID: <AANLkTimY09-wo9R_ZbPmSG_9x7TZjVobTM95VO5fgCa4@mail.gmail.com>

Um, yeah, it's in bioperl: bp_search2gff.pl.

Scott


On Wed, Aug 11, 2010 at 4:45 PM, Chris Fields <cjfields at illinois.edu> wrote:
> HMMER3 is parsed by Bio::SearchIO now in bioperl-live, and I think there is a generic SearchIO->GFF3 script floating around the intertubes somewheres...
>
> chris


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Wed Aug 11 21:07:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 16:07:20 -0500
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <AANLkTimY09-wo9R_ZbPmSG_9x7TZjVobTM95VO5fgCa4@mail.gmail.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
	<6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
	<190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
	<AANLkTimY09-wo9R_ZbPmSG_9x7TZjVobTM95VO5fgCa4@mail.gmail.com>
Message-ID: <CCD1DE1D-867E-468D-941A-7C418C126FBE@illinois.edu>

For some reason I thought there was a more up-to-date one somewhere.  Ah well, can't keep track of all the code in bioperl :>

chris

On Aug 11, 2010, at 4:05 PM, Scott Cain wrote:

> Um, yeah, it's in bioperl: bp_search2gff.pl.
> 
> Scott


From douglas.hoen at gmail.com  Wed Aug 11 21:11:20 2010
From: douglas.hoen at gmail.com (Douglas Hoen)
Date: Wed, 11 Aug 2010 17:11:20 -0400
Subject: [Bioperl-l] How to store results of searches of translated DNA
	in SeqFeature::Store database of the original DNA?
In-Reply-To: <AANLkTimY09-wo9R_ZbPmSG_9x7TZjVobTM95VO5fgCa4@mail.gmail.com>
References: <1d774f4c-0aa0-45e3-964d-82dbdab4f261@j8g2000yqd.googlegroups.com>
	<f2a15012-4af5-4e91-babe-3fb8ff55dd86@k10g2000yqa.googlegroups.com>
	<AANLkTim2X9uaVq6ChayrRJr10L3MeA4fVfuHvA0HyvqM@mail.gmail.com>
	<6e28dc26-ada0-4be2-9f62-a4d632aaf0bb@j8g2000yqd.googlegroups.com>
	<190AF658-E8FE-43D7-A71F-196AE54DA1DB@illinois.edu>
	<AANLkTimY09-wo9R_ZbPmSG_9x7TZjVobTM95VO5fgCa4@mail.gmail.com>
Message-ID: <A8FFFBCC-4E4F-478B-B824-BB4249B11BA1@gmail.com>

Great, thanks so much for the info.

On 2010-08-11, at 5:05 PM, Scott Cain wrote:

> Um, yeah, it's in bioperl: bp_search2gff.pl.
> 
> Scott


From Russell.Smithies at agresearch.co.nz  Wed Aug 11 21:31:32 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 12 Aug 2010 09:31:32 +1200
Subject: [Bioperl-l] AlignIO  and Gbrowse_syn
In-Reply-To: <AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
	<AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>

I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague. 
If GBrowse_syn is using .maf format, does AlignIO need more work?
Any comments?

--Russell


I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output). GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce clustalw, I've tried converting LASTZ maf output to clustalw (and other alignment formats) using AlignIO; however, I run into the following issues:
* Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification).
* The coordinate system for reverse-strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse-complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-reverse-complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result, all reverse-match coordinates in the resulting clustalw output file are incorrect.
* AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them.
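
The reverse-strand coordinate issue in the second point can be sketched as follows. This is only an illustration in Python, not BioPerl or AlignIO code, and the function name maf_to_forward is hypothetical. It assumes the UCSC MAF convention for "s" lines: the start field is zero-based and, for "-" strand rows, is counted on the reverse-complemented source sequence.

```python
def maf_to_forward(start, size, strand, src_size):
    """Convert a MAF 's' line interval to 1-based, inclusive coordinates
    on the original (forward) sequence.

    start    -- zero-based start from the MAF line
    size     -- number of aligned bases (gaps excluded)
    strand   -- '+' or '-'
    src_size -- full length of the source sequence
    """
    if strand == "+":
        return start + 1, start + size
    if strand == "-":
        # The interval is counted from the end of the original sequence,
        # so mirror it around src_size.
        return src_size - (start + size) + 1, src_size - start
    raise ValueError("strand must be '+' or '-'")


# A minus-strand block starting at MAF position 0 covers the LAST bases
# of the original sequence, not the first ones.
print(maf_to_forward(0, 10, "-", 100))
```

Applying a conversion like this before writing clustalw output, and carrying the strand in the sequence ID as GBrowse_syn expects, would be one possible workaround until maf support in AlignIO improves.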

I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned.
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at illinois.edu  Wed Aug 11 22:02:38 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 11 Aug 2010 17:02:38 -0500
Subject: [Bioperl-l] AlignIO  and Gbrowse_syn
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
	<AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
Message-ID: <E53C66C1-E4F1-4E83-B5ED-631CE62D7DCE@illinois.edu>

Russell,

We have had very few requests to support .maf until recently, which is why there has been little done with it.  We welcome any help to improve it.  

chris

On Aug 11, 2010, at 4:31 PM, Smithies, Russell wrote:

> I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague. 
> If GBrowse_syn is using .maf format, does AlignIO need more work?
> Any comments?
> 
> --Russell


From douglas.hoen at gmail.com  Thu Aug 12 05:59:37 2010
From: douglas.hoen at gmail.com (Doug Hoen)
Date: Wed, 11 Aug 2010 22:59:37 -0700 (PDT)
Subject: [Bioperl-l] HMMER3 to GFF3
Message-ID: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com>

Hi,

I am trying to convert HMMER3 (hmmscan) output files into GFF3 files.
Based on previous advice (see the thread, "How to store results of
searches of translated DNA in SeqFeature::Store database of the
original DNA?"), I have installed bioperl-live for its new HMMER3
parsing capabilities (in SearchIO) and am trying to use
bp_search2gff.pl to do the file conversion.

The hmmscan was done on translated chromosome sequences with conserved
domain models. I want to get the GFF 'start' and 'end' columns to be
based on these coordinates, not those of the models. To do this (with
my files), it seems I need to use the option "--type hit". However,
this changes the "Target" sequence name from the model name to
chromosome name, and the model name does not appear anywhere in the
output (see below).
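
Because the search was run on translated sequence, the hit coordinates are in amino acids and still have to be mapped back to the DNA. A minimal sketch of that arithmetic, in Python and independent of BioPerl, is given here. The function name aa_to_dna and the frame numbering (+1, +2, +3 forward; -1, -2, -3 read from the last base of the reverse complement) are assumptions; some translation tools number reverse frames differently, so this should be checked against whatever produced the translation.

```python
def aa_to_dna(aa_start, aa_end, frame, dna_length):
    """Map a 1-based, inclusive amino-acid interval on a translated frame
    to a 1-based, inclusive interval on the forward DNA strand.

    frame is assumed to be +1, +2, +3 (forward) or -1, -2, -3 (reverse),
    where reverse frames start from the last base of the DNA.
    Returns (start, end, strand).
    """
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of +1, +2, +3, -1, -2, -3")
    offset = abs(frame) - 1
    # Coordinates on the strand that was translated.
    start = offset + 3 * (aa_start - 1) + 1
    end = offset + 3 * aa_end
    if frame > 0:
        return start, end, "+"
    # Mirror reverse-strand coordinates back onto the forward strand.
    return dna_length - end + 1, dna_length - start + 1, "-"


# The first residue of frame +2 covers bases 2..4 of the DNA.
print(aa_to_dna(1, 1, 2, 30))
```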

Could someone please confirm whether the results are incorrect and, if
so, perhaps suggest a fix? It may well be that this problem is due to
the unusual way I am using hmmscan, rather than a problem with HMMER3
parsing...?

Many thanks,
-- Doug


========================================================


Here's what it looks like if I do *not* use the "--type hit" option.
(RVT_2 is a conserved domain name. I need this in the output.)


COMMAND:
------------------
bp_search2gff.pl -i ../chr1-tesigsv2.hmmscan \
  -o chr1-tesigsv2-hmmscan-original-locations-v2.gff3 \
  --format hmmer3 --source HMMER3 --version 3 --component


OUTPUT:
------------------
==> chr1-tesigsv2-hmmscan-original-locations-v2.gff3 <==
##gff-version 3
Chr1_1	chromosome	Component	1	10142557	.	.	1	sequence=Chr1_1
Chr1_1	HMMER3	similarity	1	245	307.3	.	0	Target=Sequence:RVT_2 1898330 1898579
Chr1_1	HMMER3	similarity	1	244	329.5	.	0	Target=Sequence:RVT_2 2573551 2573796
Chr1_1	HMMER3	similarity	1	245	308.8	.	0	Target=Sequence:RVT_2 3159685 3159930
Chr1_1	HMMER3	similarity	1	102	108.2	.	0	Target=Sequence:RVT_2 3438684 3438791
Chr1_1	HMMER3	similarity	2	245	277.2	.	0	Target=Sequence:RVT_2 3566642 3566891
Chr1_1	HMMER3	similarity	13	213	251.4	.	0	Target=Sequence:RVT_2 4251160 4251373
Chr1_1	HMMER3	similarity	1	244	310.6	.	0	Target=Sequence:RVT_2 4252791 4253036
Chr1_1	HMMER3	similarity	6	99	94.2	.	0	Target=Sequence:RVT_2 4271555 4271653


========================================================


And here's what it looks like if I *do* use the "--type hit" option.
The coordinates look good but the model name has disappeared (and the
Target=Sequence seems wrong).


COMMAND:
------------------
bp_search2gff.pl -i ../chr1-tesigsv2.hmmscan \
  -o chr1-tesigsv2-hmmscan-original-locations-v3.gff3 \
  --format hmmer3 --type hit --source HMMER3 --version 3 --component


OUTPUT:
------------------
==> chr1-tesigsv2-hmmscan-original-locations-v3.gff3 <==
##gff-version 3
RVT_2	HMMER3	similarity	1898330	1898579	307.3	.	0	Target=Sequence:Chr1_1 1 245
RVT_2	HMMER3	similarity	2573551	2573796	329.5	.	0	Target=Sequence:Chr1_1 1 244
RVT_2	HMMER3	similarity	3159685	3159930	308.8	.	0	Target=Sequence:Chr1_1 1 245
RVT_2	HMMER3	similarity	3438684	3438791	108.2	.	0	Target=Sequence:Chr1_1 1 102
RVT_2	HMMER3	similarity	3566642	3566891	277.2	.	0	Target=Sequence:Chr1_1 2 245
RVT_2	HMMER3	similarity	4251160	4251373	251.4	.	0	Target=Sequence:Chr1_1 13 213
RVT_2	HMMER3	similarity	4252791	4253036	310.6	.	0	Target=Sequence:Chr1_1 1 244
RVT_2	HMMER3	similarity	4271555	4271653	94.2	.	0	Target=Sequence:Chr1_1 6 99
RVT_2	HMMER3	similarity	4481232	4481477	281.5	.	0	Target=Sequence:Chr1_1 2 245
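
For comparison with the two outputs above, the layout being asked for (chromosome in the seqid column, chromosome coordinates in start/end, model name and model coordinates in the attributes) might be sketched like this. It is a Python illustration only, not what bp_search2gff.pl does; the function name gff3_line is hypothetical, and keeping "similarity" as the feature type and adding a Name attribute are assumptions. The Target attribute follows the GFF3 form "target_id start end".

```python
def gff3_line(seqid, source, ftype, start, end, score, strand,
              model, model_start, model_end):
    """Format one GFF3 feature line located on the searched sequence,
    with the matched model recorded in the Name and Target attributes.

    No escaping of reserved characters is done, so model names are
    assumed to be plain identifiers such as 'RVT_2'.
    """
    attributes = "Name={0};Target={0} {1} {2}".format(
        model, model_start, model_end)
    columns = [seqid, source, ftype, str(start), str(end),
               str(score), strand, ".", attributes]
    return "\t".join(columns)


# First RVT_2 hit from the listings above, placed on Chr1_1.
print(gff3_line("Chr1_1", "HMMER3", "similarity",
                1898330, 1898579, 307.3, ".", "RVT_2", 1, 245))
```

Note that the start and end values here are still positions on the translated sequence; they would need converting to DNA coordinates before loading alongside the original DNA.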


========================================================


And here's what the input HMMER3 result file looks like:


==> ../chr1-tesigsv2.hmmscan <==
# hmmscan :: search sequence(s) against a profile database
# HMMER 3.0rc1 (February 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query sequence file:             [...]/whole_chromosomes/translated/chr1.pep
# target HMM database:             [...]/signatures/Pfam-A.hmm
# output directed to file:         chr1-tesigsv2.hmmscan
# model-specific thresholding:     TC cutoffs
# Max sensitivity mode:            on [all heuristic filters off]
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       Chr1_1  [L=10142557]
Description: CHROMOSOME dumped from ADB: Jun/20/09 14:53; last updated: 2009-02-02
Scores for complete sequence (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Model           Description
    ------- ------ -----    ------- ------ -----   ---- --  --------        -----------
          0 3971.3  17.7   2.6e-101  329.5   0.6   19.4 17  RVT_2           Reverse transcriptase (RNA-dependent DNA pol
          0 3040.7  23.0     1e-206  678.6   0.1   12.2 10  ATHILA          ATHILA ORF-1 family
          0 1681.9  79.1    1.9e-46  149.9   0.4   28.0 21  RVT_1           Reverse transcriptase (RNA-dependent DNA pol
          0 1446.9  27.4    3.6e-95  309.1   0.2    7.6  5  Transposase_21  Transposase family tnp2
          0 1168.4  50.3    1.4e-29   94.4   0.3   21.5 18  rve             Integrase core domain
   9.1e-300  960.0  69.0    3.1e-20   64.0   0.0   28.8 20  Retrotrans_gag  Retrotransposon gag protein
   1.5e-180  577.0  31.6    1.6e-29   93.1   1.5    9.5  8  Transposase_23  TNP1/EN/SPM transposase
   4.4e-143  456.9  82.8    4.8e-18   56.4   0.1   12.9 11  MuDR            MuDR family transposase
   3.8e-116  371.4  19.6    1.2e-18   58.9   0.0   13.7  7  MULE            MULE transposase domain
   7.1e-106  344.1   5.6    2.7e-97  316.0   0.0    3.6  1  Plant_tran      Plant transposon protein
    9.2e-85  275.4  22.9    5.4e-60  194.4   0.3    6.4  3  Peptidase_C48   Ulp1 protease family, C-terminal catalytic d
    1.8e-77  249.8  24.8    4.4e-28   89.8   0.1   10.8  3  Transposase_24  Plant transposase (Ptta/En/Spm family)
    2.8e-47  150.1   1.2    5.5e-23   72.3   0.2    3.7  2  hATC            hAT family dimerisation domain
    5.7e-28   89.4   3.6    4.7e-13   41.1   0.0    6.5  1  RVP_2           Retroviral aspartyl protease
      1e-16   53.3   0.0    4.4e-07   22.1   0.0    6.8  1  RnaseH          RNase H
    1.5e-08   25.3   2.4    0.00016   12.1   0.0    4.9  0  Transposase_mut Transposase, Mutator family


Domain annotation for each model (and alignments):
>> RVT_2  Reverse transcriptase (RNA-dependent DNA polymerase)
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  307.3   0.0   5.3e-95   1.5e-94       1     245 [. 1898330 1898578 .. 1898330 1898579 .. 0.99
   2 !  329.5   0.6  8.9e-102  2.6e-101       1     244 [. 2573551 2573794 .. 2573551 2573796 .. 0.99
   3 !  308.8   0.0   1.8e-95   5.2e-95       1     245 [. 3159685 3159929 .. 3159685 3159930 .. 0.99
   4 !  108.2   0.1   3.4e-34   9.7e-34       1     102 [. 3438684 3438785 .. 3438684 3438791 .. 0.96
   5 !  277.2   0.0   8.1e-86   2.3e-85       2     245 .. 3566643 3566890 .. 3566642 3566891 .. 0.99
   6 !  251.4   0.0   6.2e-78   1.8e-77      13     213 .. 4251164 4251364 .. 4251160 4251373 .. 0.97
   7 !  310.6   0.0   5.1e-96   1.5e-95       1     244 [. 4252791 4253034 .. 4252791 4253036 .. 0.99
   8 !   94.2   0.1   6.1e-30   1.8e-29       6      99 .. 4271560 4271653 .. 4271555 4271653 .. 0.97
   9 !  281.5   0.9   3.9e-87   1.1e-86       2     245 .. 4481233 4481476 .. 4481232 4481477 .. 0.98
  10 !  248.2   0.0   5.9e-77   1.7e-76       1     190 [. 4521040 4521233 .. 4521040 4521237 .. 0.97
  11 !  314.6   0.1   3.2e-97   9.2e-97       1     244 [. 4652456 4652702 .. 4652456 4652704 .. 0.98
  12 !   40.7   0.0   1.3e-13   3.7e-13       2      92 .. 5219607 5219697 .. 5219606 5219701 .. 0.90
  13 !  221.0   0.0   1.2e-68   3.4e-68       2     245 .. 5241015 5241258 .. 5241014 5241259 .. 0.95
  14 !   81.2   0.0   5.6e-26   1.6e-25       2     115 .. 5501957 5502070 .. 5501956 5502080 .. 0.92
  15 !  272.4   0.0   2.3e-84   6.7e-84      30     245 .. 6483057 6483271 .. 6483050 6483272 .. 0.98
  16 !  178.5   0.0   1.2e-55   3.3e-55      81     244 .. 7250563 7250726 .. 7250552 7250728 .. 0.96
  17 !  313.7   0.0   5.9e-97   1.7e-96       2     245 .. 7707124 7707367 .. 7707123 7707368 .. 0.99

  Alignments for each domain:
  == domain 1    score: 307.3 bits;  conditional E-value: 5.3e-95
   RVT_2       1
nktwelvelpkgkkviglkWvfklKlnedgeierykARlVakGftqkegidyeetfspvvklesirlllalaaekkleleqlDvktaFLngelee
95
                 n tw +++lp gkk++g+kWv+k+Kln+dg++erykARlVakG+tq+eg+dy
+tfspv+kl++++ll+a+aa+k+++l+qlD+++aFLng+l+e
  Chr1_1 1898330
NGTWVVCSLPVGKKAVGCKWVYKIKLNADGSLERYKARLVAKGYTQTEGLDYVDTFSPVAKLTTVKLLIAVAAAKGWSLSQLDISNAFLNGSLDE
1898424
 
68*********************************************************************************************
PP

   RVT_2      96
evYvkqpeGfedkkk....enkvckLkkslYgLkqapraWyeklsevllklgfkkseadkclfvkkkeeeliivllYVDDlliagsskelieelk
186
                 e+Y++ p+G++ ++     +n vc+LkkslYgLkqa+r+Wy k+se l++lgf+
+s+ d++lf++k++++ ++vl+YVDD++ia+s +++ e l
  Chr1_1 1898425
EIYMTLPPGYSPRQGdsfpPNAVCRLKKSLYGLKQASRQWYLKFSESLKALGFTQSSGDHTLFTRKSKNSYMAVLVYVDDIIIASSCDRETELLR
1898519
 
***********998889999***************************************************************************
PP

   RVT_2     187
eeLkkefemkdlgelkyfLgleierkeegillsqekyvkkllkkfkmedakpvstplea 245
                 ++L+++ +++dlg+l+yfLglei+r+++gi+++q+ky+ +ll+++++  +k++s
+p+e+
  Chr1_1 1898520
DALQRSSKLRDLGTLRYFLGLEIARNTDGISICQRKYTLELLAETGLLGCKSSSVPMEP 1898578
 
*********************************************************97 PP

  == domain 2    score: 329.5 bits;  conditional E-value: 8.9e-102
   RVT_2       1
nktwelvelpkgkkviglkWvfklKlnedgeierykARlVakGftqkegidyeetfspvvklesirlllalaaekkleleqlDvktaFLngelee
95
                 n+twel++lp+g+k+ig+kWv+k K+n++ge+erykARlVakG++q++gidy+e
+f+pv++le++rl+++laa++k++++q+D k aFLng++ee
  Chr1_1 2573551
NDTWELTSLPNGHKAIGVKWVYKAKKNSKGEVERYKARLVAKGYSQRAGIDYDEVFAPVARLETVRLIISLAAQNKWKIHQMDFKLAFLNGDFEE
2573645
 
79*********************************************************************************************
PP

   RVT_2      96
evYvkqpeGfedkkkenkvckLkkslYgLkqapraWyeklsevllklgfkkseadkclfvkkkeeeliivllYVDDlliagsskelieelkeeLk
190
                 evY++qp+G+ +k++e+kv++Lkk+lYgLkqapraW++++++++++++f k+ +
+++l++k ++e+++i +lYVDDl+++g++ ++ ee+k+e++
  Chr1_1 2573646
EVYIEQPQGYIVKGEEDKVLRLKKALYGLKQAPRAWNTRIDKYFKEKDFIKCPYEHALYIKIQKEDILIACLYVDDLIFTGNNPSMFEEFKKEMT
2573740
 
***********************************************************************************************
PP

   RVT_2     191
kefemkdlgelkyfLgleierkeegillsqekyvkkllkkfkmedakpvstple 244
                 kefem+d+g ++y+Lg+e+++++++i+++qe y+k++lkkfkm+d++pv tp
+e
  Chr1_1 2573741
KEFEMTDIGLMSYYLGIEVKQEDNRIFITQEGYAKEVLKKFKMDDSNPVCTPME 2573794
 
****************************************************97 PP


From kai.blin at biotech.uni-tuebingen.de  Thu Aug 12 12:16:45 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 12 Aug 2010 14:16:45 +0200
Subject: [Bioperl-l] HMMER3 to GFF3
In-Reply-To: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com>
References: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com>
Message-ID: <20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de>

On Wed, 11 Aug 2010 22:59:37 -0700 (PDT)
Doug Hoen <douglas.hoen at gmail.com> wrote:

Hi Doug,

> Could someone please confirm whether the results are incorrect and, if
> so, perhaps suggest a fix? It may well be that this problem is due to
> the unusual way I am using hmmscan, rather than a problem with HMMER3
> parsing...?

Can you please attach your hmmer input file? Along the way something
inserted line breaks, making it unreadable.

It may well be that the HMMER3 parser still behaves a little
differently from the HMMER2 parser; I haven't tried that script.

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From kai.blin at biotech.uni-tuebingen.de  Thu Aug 12 12:09:00 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 12 Aug 2010 14:09:00 +0200
Subject: [Bioperl-l] using HMMER
In-Reply-To: <62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
	<4C62B487.9090103@gmail.com>
	<62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
Message-ID: <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>

On Wed, 11 Aug 2010 10:07:36 -0500
Chris Fields <cjfields at illinois.edu> wrote:

> might also want to check whether you are using hmmer2 vs hmmer3.  not sure if the wrapper works for hmmer3.

It might if you initialize it using
my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 'hmmer3');

at least for the programs that still exist with the same name in
hmmer3. It won't support hmmer3 using the default options, though.

If I have some spare time, I'll look into this, no promises on the
timeframe, though.

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From cjfields at illinois.edu  Thu Aug 12 15:28:50 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 12 Aug 2010 10:28:50 -0500
Subject: [Bioperl-l] using HMMER
In-Reply-To: <20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
	<4C62B487.9090103@gmail.com>
	<62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
	<20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>
Message-ID: <8129B813-5B15-4DDC-AB0D-5D95EFFCE78D@illinois.edu>

On Aug 12, 2010, at 7:09 AM, Kai Blin wrote:

> On Wed, 11 Aug 2010 10:07:36 -0500
> Chris Fields <cjfields at illinois.edu> wrote:
> 
>> might also want to check whether you are using hmmer2 vs hmmer3.  not sure if the wrapper works for hmmer3.
> 
> It might if you initialize it using
> my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 'hmmer3');
> 
> at least for the programs that still exist with the same name in
> hmmer3. It won't support hmmer3 using the default options, though.
> 
> If I have some spare time, I'll look into this, no promises on the
> timeframe, though.
> 
> Cheers,
> Kai
> 

Would be nice to convert this over (at some point) to use Mark's CommandExts.  I'm thinking of doing this with Infernal, so if I get that running it wouldn't be terribly difficult to get hmmer3 working as well.

chris


From cjfields at illinois.edu  Thu Aug 12 16:14:44 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 12 Aug 2010 11:14:44 -0500
Subject: [Bioperl-l] using HMMER
In-Reply-To: <857996.8184.qm@web112610.mail.gq1.yahoo.com>
References: <603590.1072.qm@web112620.mail.gq1.yahoo.com>
	<4C62B487.9090103@gmail.com>
	<62C86AFB-FF3A-44C6-A413-50C3F839DF34@illinois.edu>
	<20100812140900.291bbb01.kai.blin@biotech.uni-tuebingen.de>
	<8129B813-5B15-4DDC-AB0D-5D95EFFCE78D@illinois.edu>
	<857996.8184.qm@web112610.mail.gq1.yahoo.com>
Message-ID: <43FD0A31-DB95-4AE9-B678-937EE6346BC2@illinois.edu>

Fayroz,

Please keep responses on-list.

It seems you need to update your local bioperl, as 'hmmer3' is a recent addition, after 1.6.1.  It will be in 1.6.2 if I can get the time to make a release :>
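
A quick way to see whether the installed BioPerl already ships the hmmer3 parser is to try loading the module. This is only a minimal sketch: the helper name module_available is made up for illustration, and it uses core Perl only.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): true if the named module
# can be loaded from one of the directories in @INC.
sub module_available {
    my ($module) = @_;
    # Bio::SearchIO::hmmer3 -> Bio/SearchIO/hmmer3.pm
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

for my $m (qw(Bio::SearchIO Bio::SearchIO::hmmer3)) {
    printf "%-24s %s\n", $m,
        module_available($m) ? 'found' : 'not found in @INC';
}
```

If Bio::SearchIO is found but Bio::SearchIO::hmmer3 is not, the installed BioPerl most likely predates the hmmer3 parser, which would match the "Can't locate Bio\SearchIO\hmmer3.pm in @INC" message quoted below.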

chris

On Aug 12, 2010, at 10:58 AM, fayroz wrote:

> Dear Chris,
> In the HMMER documentation I found this statement:
> "The HMMER programs must either be in your path, or you must set the environment
> variable HMMERDIR to point to their location." 
> Will that solve the problem?
> How can I do it, please? I work on Windows 7.
> 
> 
> When I applied this line with hmmer3
> my $factory = Bio::Tools::Run::Hmmer->new(-hmm => 'model.hmm', -_READMETHOD => 'hmmer3');
> 
> this output appeared: 
> 
> Bio::SearchIO: hmmer3 cannot be found
> 
> and when I try with hmmer2 the same output appears: 
> 
> Exception
> ------------- EXCEPTION -------------
> MSG: Failed to load module Bio::SearchIO::hmmer3. Can't locate 
> Bio\SearchIO\hmmer3.pm in @INC (@INC contains: D:\Perl\bin\ D:/Perl/site/lib 
> D:/Perl/lib .) at D:/Perl/site/lib/Bio/Root/Root.pm line 439, <GEN0> line 1.
> STACK Bio::Root::Root::_load_module D:/Perl/site/lib/Bio/Root/Root.pm:441
> STACK (eval) D:/Perl/site/lib/Bio/SearchIO.pm:446
> STACK Bio::SearchIO::_load_format_module D:/Perl/site/lib/Bio/SearchIO.pm:445
> STACK Bio::SearchIO::new D:/Perl/site/lib/Bio/SearchIO.pm:189
> STACK Bio::Tools::Run::Hmmer::_run D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:431
> STACK Bio::Tools::Run::Hmmer::hmmsearch 
> D:/Perl/site/lib/Bio/Tools/Run/Hmmer.pm:353
> STACK toplevel C:\Users\Khaled\AppData\Local\Temp\dzprltmp.pl:13
> -------------------------------------
> For more information about the SearchIO system please see the SearchIO docs.
> This includes ways of checking for formats at compile time, not run time
> '--informat' is not recognized as an internal or external command,
> operable program or batch file.
> Can't call method "next_result" on an undefined value at 
> C:\Users\Khaled\AppData\Local\Temp\dzprltmp.pl line 15, <GEN0> line 1.
> 
> 
> 


From jason at bioperl.org  Thu Aug 12 18:37:11 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 12 Aug 2010 11:37:11 -0700
Subject: [Bioperl-l] Other: Script for editing alignments?
In-Reply-To: <20100812061811.4D92468539@evol.biology.mcmaster.ca>
References: <20100812061811.4D92468539@evol.biology.mcmaster.ca>
Message-ID: <4C643F57.3040408@bioperl.org>

Hi Si -

This is pretty straightforward with Bioperl. Here's one solution:

#!/usr/bin/perl -w
use strict;
use Bio::AlignIO;

# read the FASTA alignment named on the command line, write to STDOUT
my $in  = Bio::AlignIO->new(-format => 'fasta', -file => shift @ARGV);
my $out = Bio::AlignIO->new(-format => 'fasta');

while ( my $aln = $in->next_aln ) {
    for my $seq ( $aln->each_seq ) {
        my $str = $seq->seq;
        if ( $str =~ /^(-+)/ ) {
            # replace from the 5' end
            my $rep = length($1);
            substr($str, 0, $rep, 'N' x $rep);
        }
        if ( $str =~ /(-+)$/ ) {
            # replace from the 3' end
            my $rep = length($1);
            substr($str, -$rep, $rep, 'N' x $rep);
        }
        $seq->seq($str);
    }
    # don't print the /start-end info in the FASTA ID
    $aln->set_displayname_flat(1);
    $out->write_aln($aln);
}
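
For anyone without BioPerl at hand, the same end-gap replacement can be sketched with two substitutions in core Perl. This is a rough sketch, not part of the script above: the helper name mask_terminal_gaps is made up for illustration, and it assumes each sequence has already been joined into a single string (no line wrapping).

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): turn leading and trailing
# runs of '-' into the same number of 'N's, leaving internal gaps alone.
sub mask_terminal_gaps {
    my ($str) = @_;
    $str =~ s/^(-+)/'N' x length($1)/e;    # 5' end
    $str =~ s/(-+)$/'N' x length($1)/e;    # 3' end
    return $str;
}

# The two example sequences from the question quoted below.
for my $s ( '-----ATGCTG--TGACTG----TGACT---',
            '---GTATGTTG--TGACTGCT--TGACCGTC' ) {
    print mask_terminal_gaps($s), "\n";
}
```

The AlignIO version is still the safer choice for real files, since it copes with sequences that span several lines.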

-jason

evoldir at evol.biology.mcmaster.ca wrote, On 8/11/10 11:18 PM:
> Dear All
>
> Alignment programs like MUSCLE and Clustal often output alignments with
> "-" symbols indicating indels (real events) within sequence alignments,
> but also "-" symbols at the 5' and 3' ends of sequences. The latter
> however, are not real evolutionary events and really should be Ns
> (missing data), depending on the sort of analytical framework you use.
>
> If there is sufficient heterogeneity and signal within the 5' and 3'
> ends of sequences, the "-"s can be manually edited in a text editor to
> Ns with no problem, if the alignment is small. If it is large (e.g. 2000
> seqs), or there are lots of alignments, it becomes a lengthy task.
>
> I'm investigating such alignments presently and so was wondering if
> anyone had a clever way of implementing sed, or had a Perl script that
> would perform such a task. Simply put, it would require replacing the 5'
> and 3' "-" below only with Ns and leaving the within sequence "-"s
> alone. The sequences naturally may span more than one line.
>
>   >Taxon 1
> -----ATGCTG--TGACTG----TGACT---
>   >Taxon 2
> ---GTATGTTG--TGACTGCT--TGACCGTC
>
> to
>
>   >Taxon 1
> NNNNNATGCTG--TGACTG----TGACTNNN
>   >Taxon 2
> NNNGTATGTTG--TGACTGCT--TGACCGTC
>
> It's a simple task, but I haven't seen any scripts out there to do the job.
>
> If there are any scripters out there who can help, or if someone knows
> of an application that would help, it would be great to hear from you.
>
> With best wishes and thanks
>
> Si Creer
>
>    


From genehack at genehack.org  Fri Aug 13 00:32:07 2010
From: genehack at genehack.org (John SJ Anderson)
Date: Thu, 12 Aug 2010 20:32:07 -0400
Subject: [Bioperl-l]
	Bio::SeqFeature::SimilarityPair->from_searchResult()?
In-Reply-To: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com>
References: <4513D6B2-F7B3-4A6E-91CA-879C9E372E84@gmail.com>
Message-ID: <ABCC813F-9FF8-465E-B5AF-E95BD8291D95@genehack.org>


On Aug 10, 2010, at 21:54 , Douglas Hoen wrote:

> I was wondering why the Synopsis in the docs for Bio::SeqFeature::SimilarityPair has the following:
> $sim_pair = Bio::SeqFeature::SimilarityPair->from_searchResult($blastHit);
> 
> There doesn't actually seem to be a from_searchResult method. Am I missing something?

No, it looks like that method got removed back in 2002 as a part of moving to Bio::SearchIO (which was removed still later...):

  <http://github.com/bioperl/bioperl-live/commit/5e3bdc11eb0ceffcd8e8966299a6367e792f2fd1>

Unfortunately, the commit didn't update the documentation. From the tiny little bit I've looked at the code, it looks like you should just be calling the 'new()' method instead (note that it takes a set of arguments, not just a BLAST hit object).

Hope this helps -- if you should happen to have the tuits, a patch to update the documentation to reflect the current interface would be awesome...

chrs,
john.


From david.breimann at gmail.com  Fri Aug 13 13:01:10 2010
From: david.breimann at gmail.com (David Breimann)
Date: Fri, 13 Aug 2010 16:01:10 +0300
Subject: [Bioperl-l] Problem executing bp_genbank2gff3.pl from another perl
	script
Message-ID: <AANLkTikqTXynSe4dTqw1Tz5GOOyoDOZTC5C-HJWLKfaL@mail.gmail.com>

Hi,
I am trying to run bp_genbank2gff3.pl from another Perl script that
gets a GenBank file as its argument.

This does not work  (no output files are generated):
    my $command = "bp_genbank2gff3.pl -y -o /tmp $ARGV[0]";

    open( my $command_out, "-|", $command );
    close $command_out;

but this does

    open( my $command_out, "-|", $command );
    sleep 3; # why do I need to sleep?
    close $command_out;

Why?

I thought that close is supposed to block until the command is done:

Closing any piped filehandle causes the parent process to wait for the
child to finish... (see http://perldoc.perl.org/functions/open.html).

Thanks
Dave
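
One possible explanation, offered as an assumption rather than a diagnosis: with a "-|" open the child's STDOUT is connected to the pipe, and close shuts the reading end before it waits for the child. A child that is still printing at that moment can be killed by SIGPIPE before it has written its files, and the sleep would simply give it time to finish. A minimal sketch of the usual pattern, which reads the pipe to end-of-file before closing it. The helper name run_and_wait is made up, the child is a stand-in perl one-liner rather than bp_genbank2gff3.pl, and the list form of open is used here instead of the single command string above:

```perl
use strict;
use warnings;

# Hypothetical helper: run a command through a read pipe, drain its
# output, then close. Returns (exit status, captured output).
sub run_and_wait {
    my (@cmd) = @_;
    open( my $fh, '-|', @cmd ) or die "cannot start @cmd: $!";
    my @lines = <$fh>;    # read until the child closes its STDOUT
    close $fh;            # waits for the child and sets $?
    return ( $? >> 8, join( '', @lines ) );
}

# Stand-in child: the current perl prints two lines and exits.
my ( $status, $out ) =
    run_and_wait( $^X, '-e', 'print "line $_\n" for 1, 2' );
print "exit status: $status\n", $out;
```

If the command's output is not needed at all, system() is another option, since it does not involve a pipe.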


From jun.yin at ucd.ie  Fri Aug 13 13:36:34 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 13 Aug 2010 14:36:34 +0100
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
Message-ID: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>

Hi, all,

 
I am the Google Summer of Code student working on refactoring the
Bio::Align subsystem. The code I re-implemented (Bio::SimpleAlign) now
passes nearly all the tests, except for a few that check seq/start-end.
But here comes a problem. This may be an old issue: the end assignment
and the end checking in Bio::LocatableSeq are inconsistent.

 
The current end checking method is based on:

$end=$seq->_ungapped_len+$seq->start-1

However, this check may not fit real-world cases.

 
The inconsistency usually happens when a few columns of the sequence are
removed.

 
For example:

my $a = Bio::LocatableSeq->new(
    -id     => 'a',
    -strand => 1,
    -seq    => '-tcgatc-atcgatcg',
    -start  => 30,
    -end    => 43
);
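
As a sanity check of the end formula quoted above, it can be applied by hand to this object. A minimal sketch in core Perl: the helper name expected_end is made up, this is not the BioPerl implementation, and it assumes that every letter is a residue and everything else is a gap.

```perl
use strict;
use warnings;

# Hypothetical helper, not the Bio::LocatableSeq code:
# expected end = start + number of residues - 1.
sub expected_end {
    my ( $gapped_seq, $start ) = @_;
    my $ungapped_len = ( $gapped_seq =~ tr/A-Za-z// );    # count letters only
    return $start + $ungapped_len - 1;
}

# 14 residues starting at 30 gives 43, matching -end above.
print expected_end( '-tcgatc-atcgatcg', 30 ), "\n";
```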

 
If we remove the 1st, 8th and the last columns

 
$a->seq() will be 'tcgatcatcgatc'

$a->_ungapped_len==12

 
Actually, in the real world, the first residue will still be 30 (the old
$seq->start), and the last residue is the residue before the 43 (the old
$seq->end), thus 42.

 
But if you call a validation, the calculation is
$a->_ungapped_len+$a->start-1=12+30-1=41

So the reassignment of the $seq->end will not pass the validation.

 
So unless you save the information to a new sequence object, the original
position information will be lost anyway. But in some cases, we have to
change the sequence in its original sequence object ..

 
What is your suggestion on this issue? 

A. Pass the test and lose the information      # convenient to code, but the
start-end annotation is no longer right

B. Keep the information and forget the test    # the object will still
remember where the last residue was in the original sequence. But is that
really meaningful at all, given that all the other residues may come from
nowhere?

C. None of the above    # any other suggestions?

 
Cheers,

Jun Yin

Ph.D. student in U.C.D.

 
Bioinformatics Laboratory

Conway Institute

University College Dublin

 
From jessica.sun at gmail.com  Fri Aug 13 15:06:46 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 13 Aug 2010 11:06:46 -0400
Subject: [Bioperl-l] Add sequence feature
Message-ID: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>

Does anyone know how to open a GenBank file, add a new feature, and then
save a new GenBank file with the new feature added, in BioPerl?

thx

-- 
Jessica Jingping Sun


From jessica.sun at gmail.com  Fri Aug 13 15:27:10 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 13 Aug 2010 11:27:10 -0400
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <4C6562E0.7090008@gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
Message-ID: <AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>

Unfortunately, I want to add the feature to the sequence object I got from
the GenBank file. I do not mind saving a new GenBank file, but this new
file should contain the original GenBank format and info plus the new
feature tags I need to add. Any quick solution to this?

thx

Jessica


On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri <roy.chaudhuri at gmail.com> wrote:

> Hi Jessica.
>
> You need to use Bio::SeqIO to read in the GenBank file to a BioPerl
> sequence object, and to write your new GenBank file:
> http://www.bioperl.org/wiki/HOWTO:SeqIO
>
> To add a new feature follow the instructions here:
>
> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences
>
> (except that you are adding the feature to the sequence object you got from
> the Genbank file, not a new Bio::Seq object).
>
> Cheers.
> Roy.
>
>
> On 13/08/2010 16:06, Jessica Sun wrote:
>
>> Does anyone knows how to open a genbank file, add new feature and then
>> save
>> a new genbank
>> file with new feature added in bioperl ?
>>
>> thx
>>
>>
>


-- 
Jessica Jingping Sun


From roy.chaudhuri at gmail.com  Fri Aug 13 15:21:04 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 13 Aug 2010 16:21:04 +0100
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
Message-ID: <4C6562E0.7090008@gmail.com>

Hi Jessica.

You need to use Bio::SeqIO to read in the GenBank file to a BioPerl 
sequence object, and to write your new GenBank file:
http://www.bioperl.org/wiki/HOWTO:SeqIO

To add a new feature follow the instructions here:
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences

(except that you are adding the feature to the sequence object you got 
from the Genbank file, not a new Bio::Seq object).

Cheers.
Roy.

On 13/08/2010 16:06, Jessica Sun wrote:
> Does anyone knows how to open a genbank file, add new feature and then save
> a new genbank
> file with new feature added in bioperl ?
>
> thx
>


From roy.chaudhuri at gmail.com  Fri Aug 13 15:37:20 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 13 Aug 2010 16:37:20 +0100
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
Message-ID: <4C6566B0.60706@gmail.com>

I'm not sure I understand, do you mean that you want to load just the 
sequence from the GenBank file (ignoring the existing annotation), then 
add your own features? There are instructions on how to do that here:
http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder

On 13/08/2010 16:27, Jessica Sun wrote:
> unfortunately. I want to add the feature to the sequence object I got
> from the Genbank file, I do not mind to save a new genbank file but
> these new genbank file contains the original genbank format and info I
> got plus the new feature tags I need to added to. Any quick solution to
> this?
>
> thx
>
> Jessica
>
>
>


From roy.chaudhuri at gmail.com  Fri Aug 13 15:57:27 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 13 Aug 2010 16:57:27 +0100
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>	<4C6562E0.7090008@gmail.com>	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
Message-ID: <4C656B67.5020402@gmail.com>

Please remember to copy replies to the mailing list.

You can loop over the features in your Bio::Seq object:
for my $feat ($seq->get_SeqFeatures) {
    # do something
}

And once you have found the feature you want to modify, you can add a 
tag using something like:
$feat->add_tag_value('note',"this is a note");

When you're finished you can write out the modified sequence object to a 
new GenBank file.
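
Put together, the read / modify / write steps might look roughly like the sketch below. The helper name annotate_seq, the choice of CDS features, the coordinates and the note text are all made up for illustration. The point to notice is that features are added to, and write_seq is called on, the Bio::Seq object returned by next_seq, not the Bio::SeqIO stream.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;

# Hypothetical helper: tag every existing CDS feature with a note and
# attach one new feature to the sequence object itself.
sub annotate_seq {
    my ($seq) = @_;
    for my $feat ( $seq->get_SeqFeatures ) {
        next unless $feat->primary_tag eq 'CDS';
        $feat->add_tag_value( 'note', 'this is a note' );
    }
    my $new_feat = Bio::SeqFeature::Generic->new(
        -start       => 20,
        -end         => 40,
        -primary_tag => 'misc_feature',
        -tag         => { note => 'added by script' },
    );
    $seq->add_SeqFeature($new_feat);    # the Bio::Seq, not the SeqIO stream
    return $seq;
}

# Usage: perl add_feature.pl in.gbk out.gbk   (file names are placeholders)
if ( @ARGV == 2 ) {
    my ( $infile, $outfile ) = @ARGV;
    my $in  = Bio::SeqIO->new( -file => $infile,     -format => 'genbank' );
    my $out = Bio::SeqIO->new( -file => ">$outfile", -format => 'genbank' );
    while ( my $seq = $in->next_seq ) {
        $out->write_seq( annotate_seq($seq) );
    }
}
```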

On 13/08/2010 16:40, Jessica Sun wrote:
> no i want to load the genbank file with existing features and I need to
> add some new feature tags to the existing ones and then save to a new
> update genbank file for local usage. I just not quite good on how to
> easily merge the two steps you recommended into one in a neat way.
>
> thx
>
>


From jessica.sun at gmail.com  Fri Aug 13 17:06:32 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 13 Aug 2010 13:06:32 -0400
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <4C656B67.5020402@gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
	<4C656B67.5020402@gmail.com>
Message-ID: <AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>

Thanks. I somehow get these error messages.

--------------------- WARNING ---------------------
MSG:  Bio::SeqIO::genbank=HASH(0xa7ba1c) is not a SeqI compliant module.
Attempting to dump, but may fail!
---------------------------------------------------
Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
/Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760, <GEN0> line 447.

by doing this,

my $feat = new Bio::SeqFeature::Generic(-start                 =>20,
                                        -end         => $40,
                                        -primary_tag => 'newfeature' );
                                    $feat->add_tag_value("note","this is
notes");
  $f->add_SeqFeature($feat); ## f is original feature pointer
$io = Bio::SeqIO->new(-format => "genbank", -file => ">$newoutfile" );

    $io->write_seq($seqio_object);

On Fri, Aug 13, 2010 at 11:57 AM, Roy Chaudhuri <roy.chaudhuri at gmail.com>wrote:

> Please remember to copy replies to the mailing list.
>
> You can loop over the features in your Bio::Seq object:
> for my $feat ($seq->get_SeqFeatures) { # do something }
>
> And once you have found the feature you want to modify, you can add a tag
> using something like:
> $feat->add_tag_value('note',"this is a note");
>
> When you're finished you can write out the modified sequence object to a
> new GenBank file.
>
>
> On 13/08/2010 16:40, Jessica Sun wrote:
>
>> no i want to load the genbank file with existing features and I need to
>> add some new feature tags to the existing ones and then save to a new
>> update genbank file for local usage. I just not quite good on how to
>> easily merge the two steps you recommended into one in a neat way.
>>
>> thx
>>
>>
>> On Fri, Aug 13, 2010 at 11:37 AM, Roy Chaudhuri <roy.chaudhuri at gmail.com
>> <mailto:roy.chaudhuri at gmail.com>> wrote:
>>
>>    I'm not sure I understand, do you mean that you want to load just
>>    the sequence from the GenBank file (ignoring the existing
>>    annotation), then add your own features? There are instructions on
>>    how to do that here:
>>    http://www.bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder
>>
>>
>>    On 13/08/2010 16:27, Jessica Sun wrote:
>>
>>        unfortunately. I want to add the feature to the sequence object
>>        I got
>>        from the Genbank file, I do not mind to save a new genbank file but
>>        these new genbank file contains the original genbank format and
>>        info I
>>        got plus the new feature tags I need to added to. Any quick
>>        solution to
>>        this?
>>
>>        thx
>>
>>        Jessica
>>
>>
>>
>>        On Fri, Aug 13, 2010 at 11:21 AM, Roy Chaudhuri
>>        <roy.chaudhuri at gmail.com <mailto:roy.chaudhuri at gmail.com>
>>        <mailto:roy.chaudhuri at gmail.com
>>        <mailto:roy.chaudhuri at gmail.com>>> wrote:
>>
>>            Hi Jessica.
>>
>>            You need to use Bio::SeqIO to read in the GenBank file to a
>>        BioPerl
>>            sequence object, and to write your new GenBank file:
>>        http://www.bioperl.org/wiki/HOWTO:SeqIO
>>
>>            To add a new feature follow the instructions here:
>>
>> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Building_Your_Own_Sequences
>>
>>            (except that you are adding the feature to the sequence
>>        object you
>>            got from the Genbank file, not a new Bio::Seq object).
>>
>>            Cheers.
>>            Roy.
>>
>>
>>            On 13/08/2010 16:06, Jessica Sun wrote:
>>
>>                Does anyone knows how to open a genbank file, add new
>>        feature
>>                and then save
>>                a new genbank
>>                file with new feature added in bioperl ?
>>
>>                thx
>>
>>
>>
>>
>>
>>        --
>>        Jessica Jingping Sun
>>
>>
>>
>>
>>
>> --
>> Jessica Jingping Sun
>>
>
>


-- 
Jessica Jingping Sun


From drummike at gmail.com  Fri Aug 13 17:41:55 2010
From: drummike at gmail.com (Mike Williams)
Date: Fri, 13 Aug 2010 13:41:55 -0400
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
	<4C656B67.5020402@gmail.com>
	<AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
Message-ID: <AANLkTi=SuCgDmDZ1qQW0-mUQJxigteO4GPnSQD09oB90@mail.gmail.com>

On Fri, Aug 13, 2010 at 1:06 PM, Jessica Sun <jessica.sun at gmail.com> wrote:

> Thanks. I somehow get these error messages.
> by doing this,
>
> my $feat = new Bio::SeqFeature::Generic(-start                 =>20,
>                                        -end         => $40,
>                                        -primary_tag => 'newfeature' );
>                                     $feat->add_tag_value("note","this is
> notes");
>

That $40 looks fishy.  Try deleting the dollar sign.  You did mean just 40,
right?
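
For anyone wondering why Perl did not complain about it: as far as I
know, $40 is the capture variable for the 40th regex group, so it is
exempt from 'use strict' and is simply undef when no such match has
happened. A minimal sketch in core Perl only (the hash names are made up
for illustration):

```perl
use strict;
use warnings;

# $40 is a regex capture variable, so 'use strict' accepts it.
# No regex with 40 groups has matched here, so its value is undef.
my %with_sigil    = ( -start => 20, -end => $40 );
my %without_sigil = ( -start => 20, -end => 40 );

printf "with \$40: end is %s\n",
    defined $with_sigil{'-end'} ? $with_sigil{'-end'} : 'undef';
printf "with 40:  end is %s\n", $without_sigil{'-end'};
```

So the feature in the original post would be created with an undefined
end rather than 40.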

Mike


From MEC at stowers.org  Fri Aug 13 17:37:50 2010
From: MEC at stowers.org (Cook, Malcolm)
Date: Fri, 13 Aug 2010 12:37:50 -0500
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
	<4C656B67.5020402@gmail.com>
	<AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
Message-ID: <BD62CBAC4395B94096109020651BE2EC1312232E24@EXCHMB-02.stowers-institute.org>

Jessica,

Show more code!

In particular, where did $f get set?

--Malcolm

 
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From Kevin.M.Brown at asu.edu  Fri Aug 13 17:53:50 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 13 Aug 2010 10:53:50 -0700
Subject: [Bioperl-l] Add sequence feature
In-Reply-To: <AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com><4C6562E0.7090008@gmail.com><AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com><4C6566B0.60706@gmail.com><AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com><4C656B67.5020402@gmail.com>
	<AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu>

If I'm reading your sample code correctly, then you are mistakenly
trying to output the input SeqIO object and not the actual Bio::Seq
object that was read in by SeqIO.

my $seqio = Bio::SeqIO->new;
my $seq = $seqio->next_seq;

#manipulate $seq

my $out = Bio::SeqIO->new;
$out->write_seq($seq);
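
Spelled out with file names and the feature from the original post, the
pattern above might look like the sketch below. The helper name
add_feature_to_genbank and the file names in the usage line are
hypothetical; the BioPerl calls (Bio::SeqIO->new, next_seq,
add_SeqFeature, write_seq) are the ones already mentioned in this thread.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;

# Hypothetical helper: copy every record from $infile to $outfile,
# adding one new feature to each record on the way.
sub add_feature_to_genbank {
    my ($infile, $outfile) = @_;

    my $in  = Bio::SeqIO->new(-file => $infile,     -format => 'genbank');
    my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'genbank');

    # next_seq returns a Bio::Seq object; that is what gets modified
    # and written, not the SeqIO stream itself.
    while (my $seq = $in->next_seq) {
        my $feat = Bio::SeqFeature::Generic->new(
            -start       => 20,
            -end         => 40,
            -primary_tag => 'newfeature',
        );
        $feat->add_tag_value('note', 'this is notes');

        $seq->add_SeqFeature($feat);    # attach to the sequence object
        $out->write_seq($seq);          # write the Bio::Seq, never $in
    }

    $out->close;                        # flush the output file
    return;
}
```

Usage would be something like add_feature_to_genbank('in.gb', 'out.gb').
The point is that write_seq receives the Bio::Seq returned by next_seq,
and the new feature is added to that sequence object.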



From jessica.sun at gmail.com  Fri Aug 13 19:16:51 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 13 Aug 2010 15:16:51 -0400
Subject: [Bioperl-l] Fwd:  Add sequence feature
In-Reply-To: <AANLkTim6MBPBbRr2bEkCgCL+6NMXGqJ0wWoz3-JPRKyG@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
	<4C656B67.5020402@gmail.com>
	<AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
	<1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu>
	<AANLkTim6MBPBbRr2bEkCgCL+6NMXGqJ0wWoz3-JPRKyG@mail.gmail.com>
Message-ID: <AANLkTimFO1Yn-n7vqmmvAF5smQeGadEW_fs_a0U-7ej4@mail.gmail.com>

---------- Forwarded message ----------
From: Jessica Sun <jessica.sun at gmail.com>
Date: Fri, Aug 13, 2010 at 3:16 PM
Subject: Re: [Bioperl-l] Add sequence feature
To: Kevin Brown <Kevin.M.Brown at asu.edu>


Yes, I changed that, but somehow it still did not take the added features in.


On Fri, Aug 13, 2010 at 1:53 PM, Kevin Brown <Kevin.M.Brown at asu.edu> wrote:

> If I'm reading your sample code correctly, then you are mistakenly
> trying to output the input SeqIO object and not the actual Bio::Seq
> object that was read in by SeqIO.
>
> my $seqio = Bio::SeqIO->new;
> my $seq = $seqio->next_seq;
>
> #manipulate $seq
>
> my $out = Bio::SeqIO->new;
> $out->write_seq($seq);


-- 
Jessica Jingping Sun


-- 
Jessica Jingping Sun


From MEC at stowers.org  Fri Aug 13 19:56:09 2010
From: MEC at stowers.org (Cook, Malcolm)
Date: Fri, 13 Aug 2010 14:56:09 -0500
Subject: [Bioperl-l] Fwd:  Add sequence feature
In-Reply-To: <AANLkTimFO1Yn-n7vqmmvAF5smQeGadEW_fs_a0U-7ej4@mail.gmail.com>
References: <AANLkTimkwyqE1tbZ+p-T41yvBWVq=GrCJ+L0rT+84mPy@mail.gmail.com>
	<4C6562E0.7090008@gmail.com>
	<AANLkTikGRnpybPPsoSf_FAXWUUpFCK9YEMMiyStBbfbN@mail.gmail.com>
	<4C6566B0.60706@gmail.com>
	<AANLkTikRuHaDmA3JvsdBHBXpgeti-pJvHuJk15s_9K+P@mail.gmail.com>
	<4C656B67.5020402@gmail.com>
	<AANLkTikJ7md5gYebt3GOXBMbCBvDZQ1TmoEkUMp1MPf5@mail.gmail.com>
	<1A4207F8295607498283FE9E93B775B406E4529F@EX02.asurite.ad.asu.edu>
	<AANLkTim6MBPBbRr2bEkCgCL+6NMXGqJ0wWoz3-JPRKyG@mail.gmail.com>
	<AANLkTimFO1Yn-n7vqmmvAF5smQeGadEW_fs_a0U-7ej4@mail.gmail.com>
Message-ID: <BD62CBAC4395B94096109020651BE2EC1312232E46@EXCHMB-02.stowers-institute.org>

If you show all your code, we might not have to guess at what the problem is.
 

Malcolm Cook
Stowers Institute for Medical Research -  Bioinformatics
Kansas City, Missouri  USA
 



From cjfields at illinois.edu  Mon Aug 16 18:02:15 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 16 Aug 2010 13:02:15 -0500
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
Message-ID: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>

All,

This is in reference to a bug report I filed a while back.  In the test script below, two features with the same start/end are compared.  If the features have the same seq_id(), overlap succeeds.  If the seq_id of one feature is changed (e.g. the feature is on another chromosome), the overlap still succeeds.

The question is: is this a bug?  My vote would be 'yes', but there have been various arguments to say it's not.  

chris

(maybe I'll make this a regular thing on the list, just to hash out some of the edge cases I run into periodically)

=========================================

#!/usr/bin/perl -w

use strict;
use warnings;
use Test::More;
use Bio::SeqFeature::Generic;

my ( $feat1, $feat2 );

$feat1 = Bio::SeqFeature::Generic->new(
    -start  => 40,
    -end    => 80,
    -strand => 1,
    -seq_id => 'ABC123',
);

is $feat1->start,  40,       'start of feature location';
is $feat1->end,    80,       'end of feature location';
is $feat1->seq_id, 'ABC123', 'seq_id';

$feat2 = Bio::SeqFeature::Generic->new(
    -start  => 40,
    -end    => 80,
    -strand => 1,
    -seq_id => 'ABC123',
);

is $feat2->start,  40,       'start of feature location';
is $feat2->end,    80,       'end of feature location';
is $feat2->seq_id, 'ABC123', 'seq_id';

# Generic features with same Seq ID should overlap
ok( $feat2->overlaps($feat1), 'feat2 overlaps feat1' );

# Generic features with different Seq IDs shouldn't overlap
is( $feat2->seq_id('XYZ678'), 'XYZ678', 'change seq_id' );

# this currently fails
ok( !$feat2->overlaps($feat1), 'feat2 doesn\'t overlap feat1' );

done_testing();


From David.Messina at sbc.su.se  Mon Aug 16 18:51:54 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 16 Aug 2010 20:51:54 +0200
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
Message-ID: <A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>

> The question is: is this a bug?

Hmm, tricky.

Genomic start and end positions with differing IDs shouldn't overlap, but can't SeqFeatures apply to proteins and other molecules where one would want to compare positions without regard to ID?


Dave


From cjfields at illinois.edu  Tue Aug 17 01:39:00 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 16 Aug 2010 20:39:00 -0500
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
Message-ID: <E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>

On Aug 16, 2010, at 1:51 PM, Dave Messina wrote:

>> The question is: is this a bug?
> 
> Hmm, tricky.
> 
> Genomic start and end positions with differing IDs shouldn't overlap, but can't SeqFeatures apply to proteins and other molecules where one would want to compare positions without regard to ID?
> 
> Dave

Good point; it's probably the context in which the methods are used that matters.  So, maybe just a documentation clarification?

chris


From David.Messina at sbc.su.se  Tue Aug 17 09:06:05 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 17 Aug 2010 11:06:05 +0200
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
	<E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
Message-ID: <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>

> Good point; it's probably the context in which the methods are used that matters.  So, maybe just a documentation clarification?

That's always good, but it really doesn't solve the issue you're describing.

I mean, who would expect to get overlaps for features on different chromosomes?

To me, that's a clear violation of reasonable user expectations. You shouldn't have to read the docs about something like that.

So what's the solution for these duelling use cases? I haven't thought about it much, but a first approximation might be to add a -genomic boolean flag that, when true, would do the right thing and check the ID when doing overlaps or other positional comparisons.

(Maybe -genomic is too obscure. Maybe it should be -same_id_for_overlaps or something like that.)

And maybe having to know to set a flag is effectively the same thing as having to read the docs to understand SeqFeature's overlap behavior.

What do the rest of you out there think?


Dave


From scott at scottcain.net  Tue Aug 17 12:45:27 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 17 Aug 2010 08:45:27 -0400
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
	<E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
	<83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
Message-ID: <B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>

Hi Dave and Chris,

It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be the genomic comparison. If somebody is doing the protein-space comparison and not getting the expected results, they'll probably read the docs to find out why.

Scott

--
Scott Cain, Ph. D.
scott at scottcain dot net
Ontario Institute for Cancer Research
http://gmod.org/
216 392 3087 

Snet from my iPhone.

On Aug 17, 2010, at 5:06 AM, Dave Messina <David.Messina at sbc.su.se> wrote:

>> Good point; it's probably the context in which the methods are used that matters.  So, maybe just a documentation clarification?
> 
> That's always good, but it really doesn't solve the issue you're describing.
> 
> I mean, who would expect to get overlaps for features on different chromosomes?
> 
> To me, that's a clear violation of reasonable user expectations. You shouldn't have to read the docs about something like that.
> 
> So what's the solution for these duelling use cases? I haven't thought about it much, but a first approximation might be to add a -genomic boolean flag that, when true, would do the right thing and check the ID when doing overlaps or other positional comparisons.
> 
> (Maybe -genomic is too obscure. Maybe it should be -same_id_for_overlaps or something like that.)
> 
> And maybe having to know to set a flag is effectively the same thing as having to read the docs to understand SeqFeature's overlap behavior.
> 
> What do the rest of you out there think?
> 
> 
> Dave
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From david.breimann at gmail.com  Tue Aug 17 13:44:08 2010
From: david.breimann at gmail.com (David Breimann)
Date: Tue, 17 Aug 2010 16:44:08 +0300
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
Message-ID: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>

Hello,

The following GenBank record has a gene that runs over the "end" of the
chromosome and into its "beginning", and the script generates an
error.

ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk

NC_005707 Unflattening error:
Details:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: PROBLEM, SEVERITY==2
Ranges not in correct order. Strange ensembl genbank entry? Range: [207497,208369] [1,687]
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473
STACK: Bio::SeqFeature::Tools::Unflattener::problem /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952
STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842
STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713
STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532
STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023
STACK: /usr/local/bin/bp_genbank2gff3.pl:506
-----------------------------------------------------------

Best,
Dave


From cjfields at illinois.edu  Tue Aug 17 13:51:02 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 08:51:02 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
Message-ID: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>

I think Chris Mungall has a branch set up for this in bioperl:

http://github.com/bioperl/bioperl-live/tree/circular

Is that correct?  Should we merge that code into the master branch?

chris



From David.Messina at sbc.su.se  Tue Aug 17 13:52:11 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 17 Aug 2010 15:52:11 +0200
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
	<E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
	<83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
	<B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>
Message-ID: <EA0C23FB-8C2F-4C04-B0E8-4207409916DC@sbc.su.se>

> It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison

Yep, agreed.

And such a flag should be named for the non-default behavior, then, like: -ignore_IDs_for_overlaps
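A minimal sketch of the behaviour discussed in this thread, assuming a default that compares sequence IDs and an opt-out flag. The function name features_overlap and the ignore_ids option are hypothetical and are not part of BioPerl; plain hashes stand in for Bio::SeqFeature::Generic objects, and the coordinates of the first feature are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not a BioPerl method: each feature is a plain hash
# with start, end and seq_id keys.
sub features_overlap {
    my ( $f1, $f2, %opt ) = @_;

    # Default (genomic) behaviour: features on different sequences never
    # overlap. Passing ignore_ids => 1 restores position-only comparison.
    unless ( $opt{ignore_ids} ) {
        return 0
            if defined $f1->{seq_id}
            && defined $f2->{seq_id}
            && $f1->{seq_id} ne $f2->{seq_id};
    }

    # Two closed intervals overlap when each starts before the other ends.
    return ( $f1->{start} <= $f2->{end} && $f2->{start} <= $f1->{end} ) ? 1 : 0;
}

my $feat1 = { start => 30, end => 70, seq_id => 'ABC123' };   # illustrative values
my $feat2 = { start => 40, end => 80, seq_id => 'XYZ678' };

print features_overlap( $feat1, $feat2 ), "\n";                    # prints 0
print features_overlap( $feat1, $feat2, ignore_ids => 1 ), "\n";   # prints 1
```

Naming the option after the non-default behaviour keeps the common genomic case free of extra arguments, which is the point made above.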


Dave


From douglas.hoen at gmail.com  Thu Aug 12 14:24:27 2010
From: douglas.hoen at gmail.com (Douglas Hoen)
Date: Thu, 12 Aug 2010 10:24:27 -0400
Subject: [Bioperl-l] HMMER3 to GFF3
In-Reply-To: <20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de>
References: <4bb89ced-69d9-43ff-ae20-4ce134efc40a@f6g2000yqa.googlegroups.com>
	<20100812141645.1dc6507a.kai.blin@biotech.uni-tuebingen.de>
Message-ID: <A1AA9B70-69B9-4AA6-BB5F-FB0D0FDD0491@gmail.com>

Hi Kai,

Here it is.

Thanks,
-- Doug


-------------- next part --------------
A non-text attachment was scrubbed...
Name: chr1-tesigsv2.hmmscan
Type: application/octet-stream
Size: 676132 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100812/7818b4a4/attachment-0004.obj>
-------------- next part --------------


On 2010-08-12, at 8:16 AM, Kai Blin wrote:

> On Wed, 11 Aug 2010 22:59:37 -0700 (PDT)
> Doug Hoen <douglas.hoen at gmail.com> wrote:
> 
> Hi Doug,
> 
>> Could someone please confirm whether the results are incorrect and, if
>> so, perhaps suggest a fix? It may well be that this problem is due to
>> the unusual way I am using hmmscan, rather than a problem with HMMER3
>> parsing...?
> 
> Can you please attach your hmmer input file? Along the way something
> inserted line breaks, making it unreadable.
> 
> It might well be possible that the HMMer3 parser still behaves a little
> differently from the HMMer2 parser; I haven't tried that script.
> 
> Cheers,
> Kai
> 
> -- 
> Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
> D-72076 Tübingen                        Fax :   ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From CJMungall at lbl.gov  Tue Aug 17 15:53:15 2010
From: CJMungall at lbl.gov (Chris Mungall)
Date: Tue, 17 Aug 2010 08:53:15 -0700
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
	<8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
Message-ID: <D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>


You can merge this in. It should allow David to proceed.

I haven't kept up on synchrony between bioperl and GFF on circular
genomes. The above fix is conservative in that it essentially preserves
the genbank coordinates even when the origin is crossed:

	http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf

However, if this is to conform to GFF3, then the resulting coordinates
that cross the origin should have start/end incremented by the genome
length.
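A rough sketch of that GFF3 adjustment, assuming the convention that an origin-crossing feature keeps its start and has its end pushed past the sequence length. The helper name gff3_circular_coords is hypothetical, it only handles the simple forward-ordered case, and the genome length of 208369 is an assumption read off the first sub-range in David's error message.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse the sub-ranges of a GenBank join() location
# on a circular sequence into one GFF3 start/end pair.
# $ranges is an array ref of [start, end] pairs (1-based, GenBank order);
# $genome_length is the length of the circular landmark sequence.
sub gff3_circular_coords {
    my ( $ranges, $genome_length ) = @_;

    my $start = $ranges->[0][0];
    my $end   = $ranges->[-1][1];

    # If the last sub-range ends before the first one begins, the feature
    # wraps around the origin, so the end is shifted by the genome length
    # to keep start <= end.
    $end += $genome_length if $end < $start;

    return ( $start, $end );
}

my ( $start, $end ) =
    gff3_circular_coords( [ [ 207497, 208369 ], [ 1, 687 ] ], 208369 );
print "$start\t$end\n";    # prints 207497	209056
```

The landmark itself would also need to be flagged as circular in the GFF3 output for downstream tools to interpret an end beyond the sequence length.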



From cjfields at illinois.edu  Tue Aug 17 19:24:23 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 14:24:23 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
	<8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
	<D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>
Message-ID: <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>

On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote:

> You can merge this in. It should allow David to proceed.

Will do.  I'll go ahead and delete the remote branch as well.

> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that it essentially preserves the genbank coordinates even when the origin is crossed:
> 
> 	http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf
> 
> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length

Yes, that is a problem that needs to be addressed.  Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174.

chris



From sheldon.mckay at gmail.com  Tue Aug 17 20:42:50 2010
From: sheldon.mckay at gmail.com (Sheldon McKay)
Date: Tue, 17 Aug 2010 16:42:50 -0400
Subject: [Bioperl-l] AlignIO and Gbrowse_syn
In-Reply-To: <E53C66C1-E4F1-4E83-B5ED-631CE62D7DCE@illinois.edu>
References: <AANLkTi=DS92zQ+_YANSbgmMiPUDo66YXyLhQby+_z_VD@mail.gmail.com>
	<C87CF736.E5DB%gowthaman.ramasamy@sbri.org>
	<AANLkTimhT7KFHgg8HssiMrXkGNSjRMqjkuE++iAaqj4C@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32F0237EAB7@exchsth.agresearch.co.nz>
	<E53C66C1-E4F1-4E83-B5ED-631CE62D7DCE@illinois.edu>
Message-ID: <AANLkTikYi9TGag3poS=xB73iGxqX_-ThZS9wU1TC2JDH@mail.gmail.com>

The GBrowse_syn dev team is pretty small (n=1) right now, so any
patches would be welcome.

Sheldon


On Wed, Aug 11, 2010 at 6:02 PM, Chris Fields <cjfields at illinois.edu> wrote:
> Russell,
>
> We have had very few requests to support .maf until recently, which is why there has been little done with it. We welcome any help to improve it.
>
> chris
>
> On Aug 11, 2010, at 4:31 PM, Smithies, Russell wrote:
>
>> I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague.
>> If GBrowse_syn is using .maf format, does AlignIO need more work?
>> Any comments?
>>
>> --Russell
>>
>>
>> I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output). GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce clustalw, I've tried converting LASTZ maf output to clustalw (and other alignment formats) using AlignIO; however, I run into the following issues:
>> * Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification).
>> * The coordinate system for reverse strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-rev complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result all reverse match coordinates in the resulting clustalw output file are incorrect.
>> * AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them.
>>
>> I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned.
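The reverse-strand issue in the second point can be sketched as a small coordinate conversion. This assumes the usual MAF convention for "s" lines: the start is zero-based, the size is the ungapped length, and on the "-" strand the start is counted on the reverse-complemented source sequence. The helper name maf_to_forward_coords is hypothetical and is not part of AlignIO.

```perl
use strict;
use warnings;

# Hypothetical helper: convert the start/size/strand/srcSize fields of a
# MAF "s" line into 1-based, inclusive coordinates on the forward strand
# of the original sequence.
sub maf_to_forward_coords {
    my ( $start, $size, $strand, $src_size ) = @_;

    if ( $strand eq '-' ) {
        # The block covers [start, start + size) on the reverse complement,
        # so it is mirrored around the sequence length.
        return ( $src_size - $start - $size + 1, $src_size - $start );
    }

    # Forward strand: only shift from zero-based to one-based.
    return ( $start + 1, $start + $size );
}

# A 100 bp block at the very start of the reverse complement of a
# 1000 bp sequence corresponds to the last 100 bases of the original.
my ( $from, $to ) = maf_to_forward_coords( 0, 100, '-', 1000 );
print "$from-$to\n";    # prints 901-1000
```

A converter would also need to carry the strand through to the output IDs, since the forward coordinates alone do not say that the match is on the reverse strand.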


From hxu.hong at gmail.com  Tue Aug 17 20:50:43 2010
From: hxu.hong at gmail.com (Hong Xu)
Date: Tue, 17 Aug 2010 16:50:43 -0400
Subject: [Bioperl-l] Bio::Tools::Primer3 question
Message-ID: <AANLkTi=NcuvzepGaqw_TUTr5MM6F2K_b8PT8Fa3qrZg2@mail.gmail.com>

Hello all,

I'm working on parsing the output of Primer3 release 2.2.2-beta. I made the
necessary changes to make Bio::Tools::Primer3 work with the new output
tags of Primer3 release 2.2.2. But when I tried to get the primer Tm,
I found that Bio::Tools::Primer3 gave a different Tm from the one in the
Primer3 result file. Then I learned that the Tm is calculated by the
Bio::SeqFeature::Primer module, not parsed from the Primer3 result. If I
want to get the data by parsing the Primer3 result, should I write my own
Primer3 parser instead of using Bio::Tools::Primer3?
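For illustration, reading the Tm values exactly as Primer3 reported them could look roughly like the sketch below. It assumes the Boulder-IO style output (TAG=VALUE lines, with a lone "=" closing each record) and the 2.x tag naming such as PRIMER_LEFT_0_TM; parse_boulder_record is a hypothetical helper, not part of Bio::Tools::Primer3, and the sample values are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one Boulder-IO record into a hash, keeping
# every value as the string that primer3_core printed.
sub parse_boulder_record {
    my ($text) = @_;
    my %record;
    for my $line ( split /\n/, $text ) {
        last if $line eq '=';                        # end-of-record marker
        my ( $tag, $value ) = split /=/, $line, 2;   # split on the first "=" only
        next unless defined $value;                  # skip lines without "="
        $record{$tag} = $value;
    }
    return \%record;
}

# Made-up output fragment for the example.
my $output = <<'END';
PRIMER_LEFT_0_SEQUENCE=ACGTACGTACGTACGTACGT
PRIMER_LEFT_0_TM=59.873
PRIMER_RIGHT_0_SEQUENCE=TGCATGCATGCATGCATGCA
PRIMER_RIGHT_0_TM=60.112
=
END

my $rec = parse_boulder_record($output);
printf "left Tm: %s, right Tm: %s\n",
    $rec->{PRIMER_LEFT_0_TM}, $rec->{PRIMER_RIGHT_0_TM};
# prints: left Tm: 59.873, right Tm: 60.112
```

Keeping the values as strings avoids any rounding and makes it easy to compare them against a recalculated Tm.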

thanks a lot,
Hong


From cjfields at illinois.edu  Tue Aug 17 21:14:02 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 16:14:02 -0500
Subject: [Bioperl-l] Bio::Tools::Primer3 question
In-Reply-To: <AANLkTi=NcuvzepGaqw_TUTr5MM6F2K_b8PT8Fa3qrZg2@mail.gmail.com>
References: <AANLkTi=NcuvzepGaqw_TUTr5MM6F2K_b8PT8Fa3qrZg2@mail.gmail.com>
Message-ID: <E039C425-80C3-4F18-B589-AE98896A1175@illinois.edu>

Already ahead of you there, unfortunately.  I wrote a complete reimplementation of both the Primer3 parser and the Primer3 wrapper that handles both v1 and v2 of primer3_core.  Lack of tuits lately has prevented me from getting tests written up, so for the time being it's sitting in bioperl-dev:

http://github.com/bioperl/bioperl-dev

They are Bio::Tools::Primer3Redux (parser) and Bio::Tools::Run::Primer3Redux (wrapper).

I rewrote those b/c I found the original modules not adequate enough in many ways for my purposes then (the newer version uses simple features or feature pairs instead of the primer features, for the same reasons you mention re: Tm).  You're more than welcome to hack on the code a bit.  I'm planning on pulling it out into my own github repo for separate submission to CPAN.  

chris



From cjfields at illinois.edu  Wed Aug 18 03:42:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 22:42:59 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
	<8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
	<D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>
	<8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>
Message-ID: <D1CC1B9C-36A7-4427-9100-AE5C85C5E965@illinois.edu>

Chris, David, 

The branch is now merged back to trunk.  David, let us know if this helps.

chris (f)



From cjfields at illinois.edu  Wed Aug 18 04:48:55 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 17 Aug 2010 23:48:55 -0500
Subject: [Bioperl-l] Bio::Tools::Primer3 question
In-Reply-To: <E039C425-80C3-4F18-B589-AE98896A1175@illinois.edu>
References: <AANLkTi=NcuvzepGaqw_TUTr5MM6F2K_b8PT8Fa3qrZg2@mail.gmail.com>
	<E039C425-80C3-4F18-B589-AE98896A1175@illinois.edu>
Message-ID: <C4B91FBD-1705-4045-9D98-F5ABEA80C038@illinois.edu>

Hong,

The latest code, along with working tests, is present here:

http://github.com/cjfields/Bio-Tools-Primer3Redux

It needs a few more tests, but the initial wrapper tests work fine for primer3 v2.2.1 on both Mac and Linux.  Will try pushing this to CPAN after a bit more cleanup.

chris



From david.breimann at gmail.com  Wed Aug 18 06:46:58 2010
From: david.breimann at gmail.com (David Breimann)
Date: Wed, 18 Aug 2010 09:46:58 +0300
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <D1CC1B9C-36A7-4427-9100-AE5C85C5E965@illinois.edu>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
	<8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
	<D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>
	<8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>
	<D1CC1B9C-36A7-4427-9100-AE5C85C5E965@illinois.edu>
Message-ID: <AANLkTinsqQCpybg6MUzTwqNuKMn=kJMV4pL64GXwAOkG@mail.gmail.com>

Dear Chris's,

I tested the updated version on multiple genomes that previously
returned errors (for future reference: NC_005707, NC_006578,
NC_007103, NC_007104, NC_007106, NC_007107, NC_008573, NC_008762,
NC_008763, NC_008785, NC_009457, NC_012040). The script now finishes
normally on all of them. However, as you mentioned, the resulting GFF3
file does not comply with the GFF3 specification for circular genomes.
This in turn causes some unexpected results in other applications.

Best,
Dave

On Wed, Aug 18, 2010 at 6:42 AM, Chris Fields <cjfields at illinois.edu> wrote:
> Chris, David,
>
> The branch is now merged back to trunk.  David, let us know if this helps.
>
> chris (f)
>
> On Aug 17, 2010, at 2:24 PM, Chris Fields wrote:
>
>> On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote:
>>
>>> You can merge this in. It should allow David to proceed.
>>
>> Will do.  I'll go ahead and delete the remote branch as well.
>>
>>> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that it essentially preserves the genbank coordinates even when the origin is crossed:
>>>
>>> 	http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf
>>>
>>> However, if this is to conform to GFF3 then the resulting coordinates that cross the origin should have start/end incremented by the genome length
>>
>> Yes, that is a problem that needs to be addressed.  Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174.
>>
>> chris
>>
>>> On Aug 17, 2010, at 6:51 AM, Chris Fields wrote:
>>>
>>>> I think Chris Mungall has a branch set up for this in bioperl:
>>>>
>>>> http://github.com/bioperl/bioperl-live/tree/circular
>>>>
>>>> Is that correct?  Should we merge that code into the master branch?
>>>>
>>>> chris
>>>>
>>>> On Aug 17, 2010, at 8:44 AM, David Breimann wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> The following GenBank file has a gene that runs over the "end" of the
>>>>> chromosome and into its "beginning", and the script generates an
>>>>> error.
>>>>>
>>>>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk
>>>>>
>>>>> NC_005707 Unflattening error:
>>>>> Details:
>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>> MSG: PROBLEM, SEVERITY==2
>>>>> Ranges not in correct order. Strange ensembl genbank entry? Range:
>>>>> [207497,208369] [1,687]
>>>>> STACK: Error::throw
>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473
>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::problem
>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952
>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent
>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842
>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS
>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713
>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq
>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532
>>>>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023
>>>>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506
>>>>> -----------------------------------------------------------
>>>>>
>>>>> Best,
>>>>> Dave
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From G.Gallone at sms.ed.ac.uk  Wed Aug 18 14:57:01 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Wed, 18 Aug 2010 15:57:01 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
Message-ID: <4C6BF4BD.5010200@sms.ed.ac.uk>

Hello BioPerl community - I've written a new module called
Interolog::Walk that I'm planning to put on CPAN. I would be grateful if
you could take a look at the brief description I attached and tell me
what you think. I'll be more than happy to post further details should
the module be of interest to anyone.

Also, I am not sure I have picked the right name for it. This is my
first module, and it would be great if you could advise on naming it
appropriately. Hopefully the following description will give an idea
of what it does.

===================


NAME
     Interolog::Walk - Retrieve, score and visualize putative 
Protein-Protein Interactions through the orthology-walk method

DESCRIPTION
     A common activity in computational biology is to mine
protein-protein interactions from publicly available databases in order
to build Protein-Protein Interaction (PPI) datasets.
In many instances, however, the number of experimentally obtained,
annotated PPIs is very small, and it would be helpful to enrich the
experimental dataset with high-quality, computationally inferred PPIs.
Such computationally obtained datasets can extend, support or enrich
experimental PPI datasets, and are of crucial importance in
high-throughput gene prioritization studies, i.e. to drive hypotheses
and restrict the dimensionality of many gene functional discovery problems.
This Perl module, Interolog::Walk, is aimed at building putative PPI
datasets on the basis of a number of comparative biology paradigms: the
module implements a collection of computational biology algorithms based
on the concept of "orthology projection". If interacting proteins A and
B in organism X have orthologs A' and B' in organism Y, under certain
conditions one can assume that the interaction will be conserved in
organism Y, i.e. the A-B interaction can be "projected through the
orthologies" to obtain a putative A'-B' interaction. The interactions
(A-B) and (A'-B') are then called "interologs" (see for instance
[1] and [2]).

Interolog::Walk collects, analyses and collates gene orthology data
provided by the Ensembl Consortium (www.ensembl.org) as well as PPI data
provided by EBI IntAct (http://www.ebi.ac.uk/intact/). It lets the user
rate the quality and reliability of the putative interactions collected
by means of confidence scores, and optionally outputs network
representations of the datasets that are compatible with Cytoscape, the
widely used biological network visualization tool.

USAGE
In order to carry out an interolog walk we start with a set of gene
identifiers in one organism of interest. We query those ids against a
number of comparative biology databases to retrieve a list of
orthologues for each gene id of interest, in one or more species.
In the following step we rely on PPI databases to retrieve the list of
available interactors for the protein ids obtained. The output at this
stage consists of a list of interactors of the orthologues of the
initial gene set, plus several fields of ancillary data.
In the last step of the process we project the interactions - again
using orthology data - back to the original species of interest. The
output of the process is a list of PUTATIVE INTERACTORS of the initial
gene set, plus several fields of ancillary data.
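
The three steps above can be sketched with plain Perl hashes. This is a
toy illustration under stated assumptions, not the module's actual
interface: the gene identifiers, the three lookup tables and the
interolog_walk function are all hypothetical stand-ins for the real
orthology and PPI database queries.

```perl
use strict;
use warnings;

# Hypothetical toy data standing in for database lookups.
my %x_to_y = (gX1 => ['gY1'], gX2 => ['gY2']);  # step 1: orthologues in species Y
my %ppi_y  = (gY1 => ['gY3', 'gY4']);           # step 2: known interactors in Y
my %y_to_x = (gY3 => ['gX3']);                  # step 3: orthologues back in X

# Walk: gene -> orthologue -> interactor -> orthologue of the interactor.
# Returns a hash ref mapping each start gene to its putative interactors.
sub interolog_walk {
    my ($genes, $forward, $ppi, $back) = @_;
    my %putative;
    for my $gene (@$genes) {
        for my $orth (@{ $forward->{$gene} || [] }) {
            for my $partner (@{ $ppi->{$orth} || [] }) {
                for my $projected (@{ $back->{$partner} || [] }) {
                    $putative{$gene}{$projected} = 1;
                }
            }
        }
    }
    return { map { $_ => [ sort keys %{ $putative{$_} } ] } keys %putative };
}

my $result = interolog_walk(['gX1', 'gX2'], \%x_to_y, \%ppi_y, \%y_to_x);
for my $gene (sort keys %$result) {
    print "$gene\t", join(',', @{ $result->{$gene} }), "\n";
}
```

In this toy data only gX1 yields a putative interactor (gX3): gY4 has no
orthologue back in the original species, and gX2's orthologue has no
recorded interactions. Confidence scoring and the ancillary data fields
are left out of the sketch.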

====================

Given the scope and the focus of the project, I would imagine that 
viable alternatives for the namespace might be

Bio::Orthology::InterologWalk
Bio::InterologMap

or maybe
Interolog::Map
Orthology::Map
Orthology::InterologMap

There are no similar projects as far as I could see so I shouldn't run 
the risk of overlapping namespaces. Still I would love to know your 
informed opinion about it.

best,
Giuseppe


REFERENCES
[1] Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, 
Vidal M, Gerstein M. Annotation transfer between genomes: 
protein-protein interologs and protein-DNA regulogs. Genome Research 
2004 Jun;14(6):1107-18.

[2] Wiles AM, Doderer M, Ruan J, Gu T-T, Ravi D, Blackman BA, Bishop AJR.
Building and Analyzing Protein Interactome Networks by Cross-species
Comparisons. BMC Systems Biology 2010, 4:36. PMID: 20353594



From David.Messina at sbc.su.se  Wed Aug 18 16:52:58 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 18 Aug 2010 18:52:58 +0200
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <4C6BF4BD.5010200@sms.ed.ac.uk>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
Message-ID: <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>

Hi Giuseppe,

Sounds really interesting - thanks for posting this.

> Bio::Orthology::InterologWalk

I vote for this name, or in any case something with Bio:: as the top-level namespace since it's a biology-related package.

I like that you're providing a lot of background and information about the project in the documentation. However, the USAGE section should give information about how to use the module, with example code. You can look at other modules on CPAN (or in BioPerl) to see the conventions for writing documentation.

Also, from what you wrote, it sounds like this might be a pipeline or a script rather than a module per se, or perhaps a script and a set of modules. It would be helpful to clarify in your documentation (if you haven't already) how exactly things are organized (and of course example code will help with that, too).


Hope that's helpful, and let us know when you've got it up on CPAN so we can try it out!


Dave


From cjfields at illinois.edu  Wed Aug 18 18:24:16 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 18 Aug 2010 13:24:16 -0500
Subject: [Bioperl-l] bp_genbank2gff3.pl error with circular genomes
In-Reply-To: <AANLkTinsqQCpybg6MUzTwqNuKMn=kJMV4pL64GXwAOkG@mail.gmail.com>
References: <AANLkTinzCSDcbXVDabwW+qmwSOKcVkYC-a-pkmYy1dfM@mail.gmail.com>
	<8E620E8B-4A4A-42B2-A2FA-60C8ECA4C8A5@illinois.edu>
	<D64E3F00-57BE-484B-A4DE-EEAC673C82E4@lbl.gov>
	<8C50FCC4-4BB5-4ACB-A138-BB20F50D2C45@illinois.edu>
	<D1CC1B9C-36A7-4427-9100-AE5C85C5E965@illinois.edu>
	<AANLkTinsqQCpybg6MUzTwqNuKMn=kJMV4pL64GXwAOkG@mail.gmail.com>
Message-ID: <C385563A-9724-4045-B5A2-7F28A5CB897A@illinois.edu>

Okay, will file this as a bug.  Thanks!

chris

On Aug 18, 2010, at 1:46 AM, David Breimann wrote:

> Dear Chris's,
> 
> I tested the updated version on multiple genomes that previously
> returned errors (for future reference: NC_005707, NC_006578,
> NC_007103, NC_007104, NC_007106, NC_007107, NC_008573, NC_008762,
> NC_008763, NC_008785, NC_009457, NC_012040). The script now ends
> normally on all of them. However, as you mentioned, the result GFF3
> file does not comply with GFF3 specifications for circular genomes.
> This in turn causes some unexpected results in other applications.
> 
> Best,
> Dave


From cdavis at bcm.tmc.edu  Wed Aug 18 19:19:53 2010
From: cdavis at bcm.tmc.edu (Caleb Davis)
Date: Wed, 18 Aug 2010 14:19:53 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq question
Message-ID: <4C6C3259.4060304@bcm.tmc.edu>

Hello, thank you for bioperl!

I am getting discrepancies between the online bl2seq 
(www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi) and bioperl's 
implementation, and I'm not sure why. I'm seeing a desired behavior 
through the web interface but can't replicate it locally. Specifically, 
online bl2seq aligns across a 1 bp insertion in the subject whereas the 
local bl2seq just reports a shorter alignment.

Any ideas? Thanks again,
--Caleb

The desired parameter differences from default are -F F -W 7 (turn 
complexity filter off, word size = 7). Below I present the online and 
local results given the following input sequences:

 >consensus
GAGGATCCAGAATTCTC
 >FVFTF6N01A86BR
AACCCAATGTAAGGAAGCTAAGAACCTTGAAAAGAGGATACCAGAATTCTC

Here are the parameters and result I'm getting online:
Blast4-request ::= {
  body queue-search {
    program "blastn",
    service "plain",
    queries bioseq-set {
      seq-set {
        seq {
          id {
            local id 26297
          },
          descr {
            title "consensus",
            user {
              type str "CFastaReader",
              data {
                {
                  label str "DefLine",
                  data str ">consensus"
                }
              }
            }
          },
          inst {
            repr raw,
            mol na,
            length 17,
            seq-data ncbi2na '8A3520F740'H
          }
        }
      }
    },
    subject sequences {
      {
        id {
          local id 26299
        },
        descr {
          title "FVFTF6N01A86BR",
          user {
            type str "CFastaReader",
            data {
              {
                label str "DefLine",
                data str ">FVFTF6N01A86BR"
              }
            }
          }
        },
        inst {
          repr raw,
          mol na,
          length 51,
          seq-data ncbi2na '0543B0A09C205F80228C520F74'H
        }
      }
    },
    algorithm-options {
      {
        name "EvalueThreshold",
        value cutoff e-value { 1, 10, 1 }
      },
      {
        name "UngappedMode",
        value boolean FALSE
      },
      {
        name "PercentIdentity",
        value real { 0, 10, 0 }
      },
      {
        name "HitlistSize",
        value integer 100
      },
      {
        name "EffectiveSearchSpace",
        value big-integer 0
      },
      {
        name "DbLength",
        value big-integer 0
      },
      {
        name "WindowSize",
        value integer 0
      },
      {
        name "DustFiltering",
        value boolean FALSE
      },
      {
        name "RepeatFiltering",
        value boolean FALSE
      },
      {
        name "MaskAtHash",
        value boolean TRUE
      },
      {
        name "MismatchPenalty",
        value integer -3
      },
      {
        name "MatchReward",
        value integer 2
      },
      {
        name "GapOpeningCost",
        value integer 5
      },
      {
        name "GapExtensionCost",
        value integer 2
      },
      {
        name "StrandOption",
        value strand-type both-strands
      },
      {
        name "WordSize",
        value integer 7
      }
    },
    format-options {
      {
        name "Web_JobTitle",
        value string "consensus"
      },
      {
        name "Web_BlastSpecialPage",
        value string "blast2seq"
      }
    }
  }
}

 >lcl|30439 FVFTF6N01A86BR
Length=51


 Score = 24.7 bits (26),  Expect = 2e-05
 Identities = 17/18 (94%), Gaps = 1/18 (5%)
 Strand=Plus/Plus

Query  1   GAGGAT-CCAGAATTCTC  17
           |||||| |||||||||||
Sbjct  34  GAGGATACCAGAATTCTC  51

Here's the output from a local search (I changed the expect to 5.0 just 
to prove to myself that some parameters are getting through OK):
use Bio::Tools::Run::StandAloneBlast;
use Data::Dumper;

my @params = (-program => 'blastn', -outfile => 'bl2seq.out',
              -FILTER => 'F', -WORDSIZE => 7, -expect => 5.0);
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
# consensus vs. FVFTF6N01A86BR
my $bl2seq_report = $factory->bl2seq($cons_seqobj, $single_seqobj);
print Dumper $bl2seq_report->next_result;

$VAR1 = bless( {
  '_inclusion_threshold' => undef,
  '_queryacc' => 'adapter_consensus',
  '_iteration_index' => 0,
  '_iteration_count' => 1,
  '_hits' => [],
  '_hitindex' => 0,
  '_querylength' => '17',
  '_querydesc' => '',
  '_iterations' => [
    bless( {
      '_oldhits_not_below_threshold' => [],
      '_newhits_unclassified' => [],
      '_number' => 1,
      '_oldhits_newly_below_threshold' => [],
      '_hit_factory' => bless( {
        'interface' => 'Bio::Search::Hit::HitI',
        'type' => 'Bio::Search::Hit::BlastHit',
        '_loaded_types' => {
          'Bio::Search::Hit::BlastHit' => 1
        },
        '_root_verbose' => 0
      }, 'Bio::Factory::ObjectFactory' ),
      '_newhits_below_threshold' => [
        {
          '-algorithm' => 'BLASTN',
          '-description' => '',
          '-length' => '51',
          '-query_len' => '17',
          '-hsp_factory' => bless( {
            'interface' => 'Bio::Search::HSP::HSPI',
            'type' => 'Bio::Search::HSP::GenericHSP',
            '_loaded_types' => {
              'Bio::Search::HSP::GenericHSP' => 1
            },
            '_root_verbose' => 0
          }, 'Bio::Factory::ObjectFactory' ),
          '-name' => 'FVFTF6N01A86BR',
          '-rank' => 1,
          '-hsps' => [
            {
              '-query_start' => '7',
              '-algorithm' => 'BLASTN',
              '-hit_seq' => 'ccagaattctc',
              '-hit_length' => '51',
              '-query_length' => '17',
              '-query_desc' => '',
              '-query_frame' => 0,
              '-rank' => 1,
              '-hit_desc' => '',
              '-query_end' => '17',
              '-hit_name' => 'FVFTF6N01A86BR',
              '-identical' => '11',
              '-query_name' => 'adapter_consensus',
              '-evalue' => '1e-04',
              '-score' => '11',
              '-conserved' => '11',
              '-hit_frame' => 0,
              '-hsp_length' => '11',
              '-query_seq' => 'ccagaattctc',
              '-hit_start' => '41',
              '-homology_seq' => '|||||||||||',
              '-hit_end' => '51',
              '-bits' => '22.3'
            },
            {
              '-query_start' => '9',
              '-algorithm' => 'BLASTN',
              '-hit_seq' => 'agaattct',
              '-hit_length' => '51',
              '-query_length' => '17',
              '-query_desc' => '',
              '-query_frame' => 0,
              '-rank' => 2,
              '-hit_desc' => '',
              '-query_end' => '16',
              '-hit_name' => 'FVFTF6N01A86BR',
              '-identical' => '8',
              '-query_name' => 'adapter_consensus',
              '-evalue' => '0.007',
              '-score' => '8',
              '-conserved' => '8',
              '-hit_frame' => 0,
              '-hsp_length' => '8',
              '-query_seq' => 'agaattct',
              '-hit_start' => '50',
              '-homology_seq' => '||||||||',
              '-hit_end' => '43',
              '-bits' => '16.4'
            }
          ],
          '-accession' => 'FVFTF6N01A86BR',
          '-significance' => '1e-04'
        }
      ],
      '_root_verbose' => 0,
      '_newhits_not_below_threshold' => [],
      '_oldhits_below_threshold' => []
    }, 'Bio::Search::Iteration::GenericIteration' )
  ],
  '_hit_factory' => $VAR1->{'_iterations'}[0]{'_hit_factory'},
  '_statistics' => bless( {
    'stats' => {
      'S1' => '4',
      'S1_bits' => '8.4',
      'kappa_gapped' => '0.711',
      'X3_bits' => '99.1',
      'X1' => '4',
      'lambda_gapped' => '1.37',
      'X2' => '15',
      'S2' => '4',
      'seqs_better_than_cutoff' => '1',
      'Hits_to_DB' => '5',
      'num_extensions' => '2',
      'num_successful_extensions' => '2',
      'X1_bits' => '7.9',
      'X3' => '50',
      'dbentries' => '1',
      'entropy_gapped' => '1.31',
      'X2_bits' => '29.7',
      'S2_bits' => '8.4'
    }
  }, 'Bio::Search::GenericStatistics' ),
  '_algorithm' => 'BLASTN',
  '_parameters' => bless( {
    'params' => {
      'gapext' => '2',
      'matrix' => 'blastn matrix:1 -3',
      'expect' => '5.0',
      'allowgaps' => 'yes',
      'gapopen' => '5'
    }
  }, 'Bio::Tools::Run::GenericParameters' ),
  '_root_verbose' => 0,
  '_queryname' => 'adapter_consensus'
}, 'Bio::Search::Result::BlastResult' );


From David.Messina at sbc.su.se  Wed Aug 18 22:32:37 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 19 Aug 2010 00:32:37 +0200
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq
	question
In-Reply-To: <4C6C3259.4060304@bcm.tmc.edu>
References: <4C6C3259.4060304@bcm.tmc.edu>
Message-ID: <E8F0F7A7-BC33-4E37-8AAB-75A9470E82A5@sbc.su.se>

Hi Caleb,

The first thing I would do is take BioPerl out of the equation and test your local bl2seq on the command line. If you get the same output locally as on the web version, then there is a problem with BioPerl. If you're still seeing a discrepancy between the web and your local run, then this isn't a problem with BioPerl.

Just to be clear, BioPerl doesn't "implement" any of the BLAST programs; it is simply a wrapper around the programs that you download from NCBI. That doesn't mean BioPerl isn't at fault, of course, just that it's important to isolate the problem carefully.

The most common reasons for these discrepancies are:

- different version numbers of BLAST

2.2.21? 2.2.22? Is it the same on the web as locally?

- similarly, different implementations of BLAST

NCBI's old BLAST suite is now deprecated and replaced with BLAST+. All of the online BLAST web queries are BLAST+ now - are you running BLAST+ locally? (there's also a separate BioPerl wrapper for BLAST+ called Bio::Tools::Run::BlastPlus)

- hidden "default" parameters

Even though you're only changing a handful of parameters, the defaults (particularly on the web version) may be different than what you expect.

In your case, it looks like on the web version, match score is 2 and mismatch is -3. However, in the local version I believe match score is 1 and a mismatch is -3.

See this line in the params block near the end of your post:

	'matrix' => 'blastn matrix:1 -3',
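
To see why a different match reward alone changes the reported scores, here is a minimal sketch in plain Perl (no BioPerl or BLAST needed). The two sequences and the ungapped_score helper are invented for illustration; real BLAST scoring also involves gap costs and statistics that this ignores.

```perl
use strict;
use warnings;

# Raw score of an ungapped nucleotide alignment under a match/mismatch scheme.
# Hypothetical helper for illustration; not part of BioPerl or BLAST.
sub ungapped_score {
    my ($top, $bottom, $match, $mismatch) = @_;
    die "aligned strings must have equal length\n"
        unless length($top) == length($bottom);
    my $score = 0;
    for my $i (0 .. length($top) - 1) {
        $score += substr($top, $i, 1) eq substr($bottom, $i, 1) ? $match : $mismatch;
    }
    return $score;
}

# Invented sequences: 20 columns, 18 identical and 2 mismatched.
my $top    = 'ACGTACGTACGTACGTACGT';
my $bottom = 'ACGTACGAACGTACGTTCGT';

my $score_1_3 = ungapped_score($top, $bottom, 1, -3);   # match 1, mismatch -3
my $score_2_3 = ungapped_score($top, $bottom, 2, -3);   # match 2, mismatch -3

print "match 1 / mismatch -3: $score_1_3\n";
print "match 2 / mismatch -3: $score_2_3\n";
```

The same alignment scores 12 under 1/-3 and 30 under 2/-3, so two runs that differ only in this default can report quite different scores (and possibly different hits) for identical input.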


Dave


From sidd.basu at gmail.com  Thu Aug 19 00:28:32 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Wed, 18 Aug 2010 19:28:32 -0500
Subject: [Bioperl-l]  Re: [RFC] Interolog::Walk
In-Reply-To: <4C6BF4BD.5010200@sms.ed.ac.uk>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
Message-ID: <20100819002830.GA366@Macintosh-235.local>

Hi, 

On Wed, 18 Aug 2010, Giuseppe Gallone wrote:

> Hello BioPerl community - I've written a new module called Interolog::Walk 
> that I'm planning to put on CPAN. I would be grateful if you might take a 
> look at the brief description I attached and tell me what you think. I'll 
> be more than happy to post further details should the module be of some 
> interest for someone.
>
> Also, I am not totally sure about having the correct name for it. This is 
> my first module and It would be great if you could advise on naming it 
> appropriately. Hopefully the following description will give an idea on 
> what it does.
>
> ===================
>
>
> NAME
>     Interolog::Walk - Retrieve, score and visualize putative 
> Protein-Protein Interactions through the orthology-walk method
>
> DESCRIPTION
>     A common activity in computational biology is to mine protein-protein 
> interactions from publicly available databases in order to build 
> Protein-Protein Interaction (PPI) datasets.
> In many instances, however, the number of experimentally obtained annotated 
> PPIs is very scarce and it would be helpful to enrich the experimental 
> dataset with high-quality, computationally-inferred PPIs. Such 
> computationally-obtained dataset can extend, support or enrich experimental 
> PPI datasets, and are of crucial importance in high-throughput gene 
> prioritization studies, i.e. to drive hypotheses and restrict the 
> dimensionality of many gene functional discovery problems.
> This Perl Module, Interolog::Walk, is aimed at building putative PPI 
> datasets on the basis of a number of comparative biology paradigms: the 
> module implements a collection of computational biology algorithms based on 
> the concept of "orthology projection". If interacting proteins A and B in 
> organism X have orthologs A' and B' in organism Y, under certain conditions 
> one can assume that the interaction will be conserved in organism Y, i.e. 
> the A-B interaction can be "projected through the orthologies" to obtain a 
> putative A'-B' interaction. The pair of interactions (A-B) and (A'-B') are 
> named "Interologs" (see for instance [1] and [2]).
>
> Interolog::Walk collects, analyses and collates gene orthology data 
> provided by the Ensembl Consortium (www.ensembl.org) as well as PPI data 
> provided by EBI Intact (http://www.ebi.ac.uk/intact/). It provides the user 
> with the possibility of rating the quality and reliability of the putative 
> interactions collected, by means of confidence scores, and optionally 
> outputs network representations of the datasets, compatible with the 
> biological network representation standard, Cytoscape.

Sounds interesting. I am currently playing around with a Perl-based webapp for displaying interactomes
using cytoscapeweb. Depending on how your design pans out, I would be happy to
use your module as a backend analysis layer. On a related note, you
might want to have a look at bioperl-network; if there is any overlap, it
might be worth contributing there.

-siddhartha

>
> USAGE
> In order to carry out an interolog walk we start with a set of gene 
> identifiers in one organism of interest. We query those ids against a 
> number of comparative biology databases to retrieve a list of orthologues 
> for each gene id of interest, in one or more species.
> In the following step we rely  on PPI databases to retrieve the list of 
> available interactors for the protein ids obtained. The output at this 
> stage consists of a list of interactors of the orthologues of the initial 
> gene set, plus several fields of ancillary data.
> In the last step of the process we  project the interactions - again using 
> orthology data - back to the original species of interest. The output of 
> the process is a list of PUTATIVE INTERACTORS of the initial gene set, plus 
> several fields of ancillary data.
>
> ====================
>
> Given the scope and the focus of the project, I would imagine that viable 
> alternatives for the namespace might be
>
> Bio::Orthology::InterologWalk
> Bio::InterologMap
>
> or maybe
> Interolog::Map
> Orthology::Map
> Orthology::InterologMap
>
> There are no similar projects as far as I could see so I shouldn't run the 
> risk of overlapping namespaces. Still I would love to know your informed 
> opinion about it.
>
> best,
> Giuseppe
>
>
>
> REFERENCES
> [1] Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, 
> Vidal M, Gerstein M. Annotation transfer between genomes: protein-protein 
> interologs and protein-DNA regulogs. Genome Research 2004 
> Jun;14(6):1107-18.
>
> [2]Wiles AM, Doderer M, Ruan J, Gu T-T, Ravi D, Blackman BA, Bishop AJR. 
> "Building and Analyzing Protein Interactome Networks by Cross-species 
> Comparisons." BMC Systems Biology 2010, 4:36 - PMID: 20353594
>
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From dan.kortschak at adelaide.edu.au  Thu Aug 19 02:15:03 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 19 Aug 2010 11:45:03 +0930
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
Message-ID: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>

Hi Everyone,

I'm wanting to set up a persistent data store for some of my work and am
in the process of choosing parts for my system. From my brief look
around I think I'd like to use BioSQL (next best choice being Chado -
but BioPerl bindings in bioperl-db for BioSQL being the decider here),
but have noticed comments some time back that bioperl-db and PostgreSQL
8.3 (my preferred engine - though MySQL is possible, but makes the whole
system messier) don't play well together.

What is the status of the casting expectation conflict between
bioperl-db and Pg8.3? The scripts are run with safe data, so
placeholders aren't strictly crucial (though speed may be an issue?) and
`$dbh->{pg_server_prepare} = 0;' seems like it could be an option.
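
A minimal sketch of what that option looks like with plain DBI and DBD::Pg, where pg_server_prepare can be passed as a connect attribute. The connect_biosql helper and the DSN in the comment are placeholders, and how to hand such a handle to bioperl-db is not shown here.

```perl
use strict;
use warnings;

# Connection attributes; pg_server_prepare => 0 asks DBD::Pg not to use
# server-side prepared statements for this handle.
my %attr = (
    RaiseError        => 1,
    AutoCommit        => 1,
    pg_server_prepare => 0,
);

# Hypothetical helper: DBI is loaded only when a connection is requested.
sub connect_biosql {
    my ($dsn, $user, $pass) = @_;
    require DBI;
    return DBI->connect($dsn, $user, $pass, {%attr});
}

# Example call (placeholder DSN and credentials):
#   my $dbh = connect_biosql('dbi:Pg:dbname=biosql;host=localhost', 'me', 'secret');
print "pg_server_prepare = $attr{pg_server_prepare}\n";
```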

Can anybody provide any advice on this issue?

thanks
Dan Kortschak


From cjfields at illinois.edu  Thu Aug 19 03:29:36 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 18 Aug 2010 22:29:36 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq
	question
In-Reply-To: <E8F0F7A7-BC33-4E37-8AAB-75A9470E82A5@sbc.su.se>
References: <4C6C3259.4060304@bcm.tmc.edu>
	<E8F0F7A7-BC33-4E37-8AAB-75A9470E82A5@sbc.su.se>
Message-ID: <194D43EC-A44C-450A-B57B-EC379DBCB935@illinois.edu>

Wouldn't surprise me too much if the parameters are not set the same; IIRC the main BLAST URL API and the online NCBI Web-BLAST have different default settings.

chris

On Aug 18, 2010, at 5:32 PM, Dave Messina wrote:

> Hi Caleb,
> 
> The first thing I would do is take BioPerl out of the equation and test your local bl2seq on the command line. If you get the same output locally as on the web version, then there is a problem with BioPerl. If you're still seeing a discrepancy between the web and your local run, then this isn't a problem with BioPerl.
> 
> Just to be clear, BioPerl doesn't "implement" any of the BLAST programs; it is simply a wrapper around the programs that you download from NCBI. That doesn't mean BioPerl isn't at fault, of course, just that it's important to isolate the problem carefully.
> 
> The most common reasons for these discrepancies are:
> 
> - different version numbers of BLAST
> 
> 2.2.21? 2.2.22? Is it the same on the web as locally?
> 
> - similarly, different implementations of BLAST
> 
> NCBI's old BLAST suite is now deprecated and replaced with BLAST+. All of the online BLAST web queries are BLAST+ now; are you running BLAST+ locally? (There's also a separate BioPerl wrapper for BLAST+ called Bio::Tools::Run::BlastPlus.)
> 
> - hidden "default" parameters
> 
> Even though you're only changing a handful of parameters, the defaults (particularly on the web version) may be different than what you expect.
> 
> In your case, it looks like on the web version, match score is 2 and mismatch is -3. However, in the local version I believe match score is 1 and a mismatch is -3.
> 
> See this line in the params block near the end of your post:
> 
> 	'matrix' => 'blastn matrix:1 -3',
> 
> 
> 
> Dave
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hlapp at drycafe.net  Thu Aug 19 05:48:19 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 01:48:19 -0400
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>

Hi Dan,

the casting isn't an issue anymore, I think. (And even if it were,  
there is actually a small script that brings back the casts that were  
built into 8.2.) Have you found an example where it still is?

	-hilmar

On Aug 18, 2010, at 10:15 PM, Dan Kortschak wrote:

> Hi Everyone,
>
> I'm wanting to set up a persistent data store for some of my work and am
> in the process of choosing parts for my system. From my brief look
> around I think I'd like to use BioSQL (next best choice being Chado -
> but BioPerl bindings in bioperl-db for BioSQL being the decider here),
> but have noticed comments some time back that bioperl-db and PostgreSQL
> 8.3 (my preferred engine - though MySQL is possible, but makes the whole
> system messier) don't play well together.
>
> What is the status of the casting expectation conflict between
> bioperl-db and Pg8.3? The scripts are run with safe data, so
> placeholders aren't strictly crucial (though speed may be an issue?) and
> `$dbh->{pg_server_prepare} = 0;' seems like it could be an option.
>
> Can anybody provide any advice on this issue?
>
> thanks
> Dan Kortschak
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From dan.kortschak at adelaide.edu.au  Thu Aug 19 05:54:03 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 19 Aug 2010 15:24:03 +0930
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
Message-ID: <1282197243.14127.27.camel@zoidberg.mbs.adelaide.edu.au>

Hi Hilmar,

No, I haven't found any problems, just hoping to avoid them by prior
research.

thanks
Dan

On Thu, 2010-08-19 at 01:48 -0400, Hilmar Lapp wrote:
> Hi Dan,
> 
> the casting isn't an issue anymore, I think. (And even if it were,
> there is actually a small script that brings back the casts that were
> built into 8.2.) Have you found an example where it still is?
> 
>         -hilmar


From biopython at maubp.freeserve.co.uk  Thu Aug 19 10:01:03 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 19 Aug 2010 11:01:03 +0100
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
Message-ID: <AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>

On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp <hlapp at drycafe.net> wrote:
> Hi Dan,
>
> the casting isn't an issue anymore, I think. (And even if it were, there is
> actually a small script that brings back the casts that were built into
> 8.2.) Have you found an example where it still is?
>
>         -hilmar

Hi Hilmar,

Do the bioperl-db bindings for BioSQL on PostgreSQL still require those
extra rules in the schema?
http://bugzilla.open-bio.org/show_bug.cgi?id=2839

Peter


From G.Gallone at sms.ed.ac.uk  Thu Aug 19 10:45:36 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Thu, 19 Aug 2010 11:45:36 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
Message-ID: <4C6D0B50.4050902@sms.ed.ac.uk>

Hi Dave,

thank you very much for your helpful comments.

Regarding the module name: I will follow your advice and avoid 
proposing a new root during the module registration. As for the second 
level, I haven't been able to find anything related to 
homology/orthology, therefore I'm not sure whether I should go for

Bio::Orthology::InterologMap
or
Bio::Homology::InterologMap

The first one being maybe a bit more specific. I might also expand 
further as in

Bio::Orthology::Interolog::Map,

just in case somebody else finds other interesting applications for the 
Interolog concept and would like to "plug in" their own contribution. 
Would this make any sense?

I also appreciate your comments on the documentation. The one I provided 
is actually not the full pod I was planning to include, but rather an 
extract. What I have at the moment is a description, for each method, in 
the following form:

=====================================
    remove_duplicate_rows
      Usage     : $RC = InterologMap::remove_duplicate_rows(input_handle  => $dbh,
                                                            output_handle => $out_data,
                                                            header        => 'standard',
                                                            );
      Purpose   : This is used to clean up a TSV data file of duplicate entries.
                  Occasionally, Intact can return duplicate entries. This routine
                  will make sure no such duplicates are kept. A new datafile is
                  built. The number of unique data rows is updated.
      Returns   : success/error
      Argument  : database handle to input file, filehandle to output file,
                  header type. Header type is one of the following:
                  - "standard": when the routine is used to clean up an
                    interolog walk file (the header will be longer)
                  - "direct":   when the routine is used to clean up a
                    file of real db interaction (the header is shorter)
                  - no field provided: default is standard
      Throws    : -
      Comment   : Sample

      See Also  :
=======================================
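
The core idea of such a clean-up step can be sketched in plain Perl as follows. This is not the module's implementation (which works on DBI handles); unique_rows and the sample rows are invented for illustration, and rows are treated as duplicates only when they are identical strings.

```perl
use strict;
use warnings;

# Keep the header line, then keep only the first occurrence of each data row.
# Hypothetical helper; the real routine described above takes DBI handles.
sub unique_rows {
    my ($header, @rows) = @_;
    my %seen;
    my @unique = grep { !$seen{$_}++ } @rows;
    return ($header, @unique);
}

# Invented sample data: one header plus three rows, two of them identical.
my @tsv = (
    "id_a\tid_b\tsource",
    "geneA\tgeneB\tintact",
    "geneA\tgeneB\tintact",
    "geneA\tgeneC\tintact",
);

my @clean = unique_rows(@tsv);
printf "%d unique data rows\n", scalar(@clean) - 1;
```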

On top of that, there is a DESCRIPTION, USAGE, and SYNOPSIS. The 
synopsis has some code with an example of typical usage of the module. 
Please take a look at this (attached below) and tell me what you think.

You mention that the description contains a lot of background 
information. Would you recommend reducing it, or placing it elsewhere?
I was considering writing a little tutorial in LaTeX as soon as 
possible anyway, to provide a "centralised" source of information for 
getting familiar with the module. Does this respect the CPAN regulations?

As for your question on the structure of the module: you are indeed 
right, the idea when running the "orthology walk" is to create a 
pipeline of subroutines: there's a core set of subroutines meant to work 
in strict sequentiality.
Each of these subroutines expects, as input, the output of the previous 
one. The input/output dataset is currently in the form of a TSV text 
file, which I process with the help of the DBI module (to be more 
specific, I use DBD::CSV).
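
The "each routine consumes the previous routine's TSV output" pattern can be sketched generically like this. It uses only core modules and plain file I/O rather than DBD::CSV; run_pipeline, add_column and the file names are assumptions made up for the example, not part of the module.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Run the stages in order; each stage reads $in and writes $out (TSV paths).
sub run_pipeline {
    my ($first_input, $workdir, @stages) = @_;
    my $in = $first_input;
    my $n  = 0;
    for my $stage (@stages) {
        my $out = File::Spec->catfile($workdir, sprintf('stage%02d.tsv', ++$n));
        $stage->($in, $out) or die "stage $n failed\n";
        $in = $out;    # the next stage consumes what this one produced
    }
    return $in;        # path of the final output file
}

# Hypothetical stage factory: copy every row and append one extra column.
sub add_column {
    my ($value) = @_;
    return sub {
        my ($in, $out) = @_;
        open my $ifh, '<', $in  or die "$in: $!";
        open my $ofh, '>', $out or die "$out: $!";
        while (my $row = <$ifh>) {
            chomp $row;
            print {$ofh} "$row\t$value\n";
        }
        close $ofh or die "$out: $!";
        return 1;
    };
}

# Tiny invented input: a single gene id.
my $dir   = tempdir(CLEANUP => 1);
my $start = File::Spec->catfile($dir, 'genes.tsv');
open my $fh, '>', $start or die "$start: $!";
print {$fh} "geneA\n";
close $fh or die "$start: $!";

my $final = run_pipeline($start, $dir, add_column('orthologue'), add_column('interactor'));
open my $rfh, '<', $final or die "$final: $!";
my $result = <$rfh>;
close $rfh;
print $result;
```

Each stage only needs to know its own input and output path, which is what makes the order of the calls matter.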

While there's a certain flexibility regarding how to use the module, one 
core idea remains: in order to get the set of putative interactors, the 
user would have to call at least three basic routines:

(A)
=================
1)get_forward_orthologies(): this queries the initial gene list against 
one or more Ensembl dbs (using the Ensembl Perl Api) and retrieves their 
orthologues, plus a number of ancillary data fields (mainly conservation 
data, eg dn/ds ratio,distance from ancestor,orthology type, etc)

2)get_interactions(): this queries the orthology list built in the 
previous stage against a PSICQUIC-enabled PPI db using REST (at the 
moment I only query the EBI Intact DB, but it should be easy to expand 
this and query all PSICQUIC compatible PPI dbs transparently). This step 
will "fatten" the dataset built in (1) with the interactors of those 
orthologues, plus ancillary data (including lots of parameters 
describing the quality, nature, origin of the annotated interaction)

3)get_backward_orthologies(): this queries the interactor list built in 
the previous stage against one or more Ensembl dbs to find orthologues 
*back* in the original species. It also adds a number of supplementary 
data fields, just like in (1).
==================
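
As a toy illustration of steps (1) to (3), the walk can be reduced to three lookups chained together. Everything below is invented (the ids, the hashes and the interolog_walk function); the real routines query Ensembl and Intact and carry many more data fields.

```perl
use strict;
use warnings;

# Toy data, invented for illustration only (not real Ensembl or Intact records).
my %ortholog_of = (            # fly gene => mouse orthologue(s)
    flyA => ['mouseA'],
    flyC => ['mouseC'],
);
my %interactors_of = (         # mouse protein => known mouse interactors
    mouseA => ['mouseB'],
    mouseC => ['mouseD'],
);
my %back_ortholog_of = (       # mouse gene => fly orthologue(s)
    mouseB => ['flyB'],
    # mouseD has no fly orthologue, so that branch of the walk ends there
);

# Forward orthologies, then interactors, then backward orthologies.
sub interolog_walk {
    my @start_ids = @_;
    my @putative;
    for my $id (@start_ids) {
        for my $orth (@{ $ortholog_of{$id} || [] }) {
            for my $partner (@{ $interactors_of{$orth} || [] }) {
                for my $back (@{ $back_ortholog_of{$partner} || [] }) {
                    push @putative, [$id, $back];   # putative interaction
                }
            }
        }
    }
    return @putative;
}

my @pairs = interolog_walk(qw(flyA flyC));
print join("\t", @$_), "\n" for @pairs;
```

Only flyA yields a putative interaction (with flyB); the flyC branch is dropped because its interactor has no orthologue back in the starting species.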

At the end of this procedure the user will have a TSV file where each 
row contains a binary putative interaction plus (currently) 37 
supplementary data fields.

One can then scan these results to check for duplicates, to compute 
counts, to see if we have discovered new gene ids that were not present 
in the original dataset (hopefully we have :) ).

Most importantly, one can then further process these results to do one 
or more of the following:
(B) compute a global confidence score to assess the reliability of 
each binary putative interaction
(C) extract the binary putative PPIs from the dataset and save them in a 
format compatible with Cytoscape: this helps provide a visual quality 
to the result: one can then apply network analysis tools to discover 
motifs, clusters, etc. The format I use is currently .SIF + attributes, 
as detailed in
http://cytoscape.wodaklab.org/wiki/Cytoscape_User_Manual/Network_Formats
(D) given the same initial gene list, one can also build a dataset of 
REAL, experimentally-obtained PPIs,(without mapping through orthologies 
in other species). One can then compare this dataset with the Putative 
dataset to see if/where the two overlap, what's the intersection or the 
differences, etc.
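
For (C), a SIF file is simply one line per edge in the form "source, relationship type, target". A minimal sketch, assuming tab-delimited output and "pp" as the relationship label; sif_lines and the ids are invented for the example, and attribute files are not covered.

```perl
use strict;
use warnings;

# Turn [source, target] pairs into SIF lines: source <TAB> type <TAB> target.
# Hypothetical helper for illustration only.
sub sif_lines {
    my ($type, @pairs) = @_;
    return map { join("\t", $_->[0], $type, $_->[1]) } @pairs;
}

# Invented putative interactions.
my @sif = sif_lines('pp', ['flyA', 'flyB'], ['flyA', 'flyC']);
print "$_\n" for @sif;
```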


In order to suggest ways of using the module I have written 4 sample 
scripts and I will include them in the module. Each script utilises the 
module and uses/reuses subroutines in a pipeline fashion, and does the 
following:

1)doInterologWalk.pl: runs the basic pipeline in (A)
2)doScores.pl: computes and adds confidence scores as explained in (B)
3)doNetworks.pl: computes SIF network + attributes as in (C)
4)getRealInteractions.pl: runs a pipeline to obtain real PPIs from the 
initial gene set, as in (D).

Hope I didn't make this too confusing. I would love to hear back from 
you and from anybody else that would like to provide feedback.

Cheers
Giuseppe

On 18/08/10 17:52, Dave Messina wrote:
> Hi Giuseppe,
>
> Sounds really interesting - thanks for posting this.
>
>> Bio::Orthology::InterologWalk
>
> I vote for this name, or in any case something with Bio:: as the top-level namespace since it's a biology-related package.
>
> I like that you're providing a lot of background and information about the project in the documentation. However, the USAGE section should give information about how to use the module, with example code. You can look at other modules on CPAN (or in BioPerl) to see the conventions for writing documentation.
>
> Also, from what you wrote, it sounds like this might be a pipeline or a script rather than a module per se, or perhaps a script and a set of modules. It would be helpful to clarify in your documentation (if you haven't already) how exactly things are organized (and of course example code will help with that, too).
>
>
> Hope that's helpful, and let us know when you've got it up on CPAN so we can try it out!
>
>
> Dave
>
>

NAME
     Interolog::Walk - Retrieve, score and visualize putative Protein-Protein
     Interactions through the orthology-walk method

SYNOPSIS
       use Interolog::Walk;

     First, obtain Intact Interactions for the dataset (see example in
     "getDirectInteractions.pl"):

       #get a registry from Ensembl
       my $registry = InterologMap::setup_ensembl_adaptor(connect_to_db  => $ensembl_db,
                                                          source_species => $sourceorg,
                                                          verbose        => 1
                                                          );


       #query actual interactions
       $RC = InterologMap::Direct::get_direct_interactions(registry       => $registry,
                                                           source_species => $sourceorg,
                                                           input_path     => $in_path,
                                                           output_path    => $out_path,
                                                           url            => $url,
                                                           );

     do some postprocessing (see "do_counts()" and "extract_unseen_ids()" )
     and then do the actual interolog walk on the dataset with the following
     sequence of three methods.

     get orthologues of starting set:

       $RC = InterologMap::get_forward_orthologies(registry    => $registry,
                                                   ensembl_db  => $ensembl_db,
                                                   input_path  => $in_path,
                                                   output_path => $out_path,
                                                   source_org  => $sourceorg,
                                                   dest_org    => $destorg,
                                                   );

     add interactors of orthologues found by "get_forward_orthologies()":

       $RC = InterologMap::get_interactions(input_path    => $in_path,
                                            output_path   => $out_path,
                                            url           => $url,
                                            url_global    => $url_global,
                                            );

     add orthologues of interactors found by "get_interactions()":

       $RC = InterologMap::get_backward_orthologies(registry    => $registry,
                                                    ensembl_db  => $ensembl_db,
                                                    input_path  => $in_path,
                                                    output_path => $out_path,
                                                    error_path  => $err_path,
                                                    source_org  => $sourceorg,
                                                    );

     do some postprocessing (see "remove_duplicate_rows()", "do_counts()",
     "extract_unseen_ids()") and then optionally compute a composite score
     for the putative interactions obtained:

        $RC = InterologMap::Scores::compute_scores(input_path      => $in_path,
                                                   score_path      => $score_path,
                                                   output_path     => $out_path,
                                                   term_graph      => $onto_graph,
                                                   M_IT_SCORE      => $M_IT,
                                                   M_DM_SCORE      => $M_DM,
                                                   M_ME_DM_SCORE   => $M_MDM,
                                                   M_ME_TAXA_SCORE => $M_MTAXA
                                                   );

     get some networks and network attributes which you can then visualise
     with cytoscape

        $RC = InterologMap::Networks::do_network(registry       => $registry,
                                                 db             => $ensembl_db,
                                                 input_path     => $in_path,
                                                 output_path    => $out_path,
                                                 source_org     => $sourceorg,
                                                 orthology_type => $orthtype,
                                                 );

        $RC = InterologMap::Networks::do_attributes(registry    => $registry,
                                                    input_path  => $in_path,
                                                    output_path => $out_path,
                                                    source_org  => $sourceorg,
                                                    label_type  => 'external name'
                                                    );

     *The synopsis above only lists the major methods and parameters.*

DESCRIPTION
     A common activity in computational biology is to mine protein-protein
     interactions from publicly available databases to build
     *Protein-Protein Interaction* (PPI) datasets. In many instances,
     however, the number of experimentally obtained annotated PPIs is very
     small and it would be helpful to enrich the experimental dataset with
     high-quality, computationally-inferred PPIs. Such computationally
     obtained datasets can extend, support or enrich experimental PPI
     datasets, and are of crucial importance in high-throughput gene
     prioritization studies, i.e. to drive hypotheses and restrict the
     dimensionality of functional discovery problems. This Perl module,
     Interolog::Walk, is aimed at building putative PPI datasets on the
     basis of a number of comparative biology paradigms: the module
     implements a collection of computational biology algorithms based on
     the concept of "orthology projection". If interacting proteins A and B
     in organism X have orthologs A' and B' in organism Y, under certain
     conditions one can assume that the interaction will be conserved in
     organism Y, i.e. the A-B interaction can be "projected through the
     orthologies" to obtain a putative A'-B' interaction. The pair of
     interactions (A-B) and (A'-B') are named "Interologs".

     Interolog::Walk collects, analyses and collates gene orthology data
     provided by the Ensembl Consortium as well as PPI data provided by EBI
     Intact. It provides the user with the possibility of rating the quality
     and reliability of the putative interactions collected, by means of
     confidence scores, and optionally outputs network representations of
     the datasets, compatible with the biological network representation
     standard, Cytoscape.

BASIC USAGE
   Rationale behind "Interolog::Walk".
                                   \EBI Intact API/
              .--------------.            |             .-------------.
          (2) | A(e.g. mouse)|<------------------------>|   B(mouse)  |  (3)
              `--------------'          <PPI>           `-------------'
                     ^                                         |
        /Ensembl\    | <Orthology>                 <Orthology> | \ Ensembl /
       / Compara \   |                                         |  \Compara/
      /    Api    \  |                                         |   \ Api /
                     |                                         |
              .--------------.                           .-------------.
          (1) | A'(e.g. fly) |. . . . . . . . . . . . .  |   B'(fly)   | (4)
              `--------------'     [SCORED]PUTATIVE PPI  `-------------'
                              (Output of Interolog::Walk)

     In order to carry out an interolog walk we start with a set of gene
     identifiers in one organism of interest (1). We query those ids against
     a number of comparative biology databases to retrieve a list of
     orthologues for the gene id of interest, in one or more species (2). In
     the next step we rely instead on PPI databases to retrieve the list of
     available interactors for the protein ids obtained in (2). The output
     at this stage consists of a list of interactors of the orthologues of
     the initial gene set, plus several fields of ancillary data (whose
     importance will be explained later) (3). In the last step of this
     process we will need to project the interactions in (3) - again using
     orthology data - back to the original species of interest. The output
     of the process is a list of PUTATIVE INTERACTORS of the initial gene
     set, plus several fields of ancillary data.

     "Interolog::Walk" provides three main functions to carry out the basic
     walk, "get_forward_orthologies()", "get_interactions()" and
     "get_backward_orthologies()". These functions must be called strictly
     sequentially in your script, as they process, analyse and attach data
     to the output in a pipeline-like fashion, i.e. each one processes the
     output of the preceding function.

     get_forward_orthologies
     get_interactions
     get_backward_orthologies

SCORING THE PUTATIVE INTERACTIONS
BUILDING PUTATIVE INTERACTION NETWORKS
BUGS
     Please report any you find

SUPPORT
     TODO

AUTHOR
     Giuseppe Gallone <ggallone at cpan.org>

     CPAN ID: GGALLONE

     University of Edinburgh

COPYRIGHT
     The Interolog::Walk module is Copyright (c) 2010 Giuseppe Gallone All
     rights reserved.

     You may distribute under the terms of either the GNU General Public
     License or the Artistic License, as specified in the Perl 5.10.0 README
     file.

SEE ALSO


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From G.Gallone at sms.ed.ac.uk  Thu Aug 19 12:42:28 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Thu, 19 Aug 2010 13:42:28 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <20100819002830.GA366@Macintosh-235.local>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<20100819002830.GA366@Macintosh-235.local>
Message-ID: <4C6D26B4.5090702@sms.ed.ac.uk>

Dear Siddhartha,

glad to hear this might be helpful. As for the bioperl-network package, 
thank you for mentioning it. I had a quick look at its documentation 
and it looks like a much deeper and more complex effort than what I have 
in my package. I've actually been using the Graph package, on which it 
seems to be based, a lot and have found it very helpful.

I'm not sure if the network routines in my module overlap with it 
though: all I do in my package is parse the dataset, extracting only 
what is needed to build a Cytoscape SIF file and, optionally, some 
Cytoscape NOA attribute files, as required by the Cytoscape 
specification in

http://cytoscape.wodaklab.org/wiki/Cytoscape_User_Manual/Network_Formats

Instead, it looks like bioperl-network actually builds some kind of 
internal representation of the network for further manipulation in Perl, 
if I understand it correctly?

Kind regards
Giuseppe

On 19/08/10 01:28, Siddhartha Basu wrote:

> Sounds interesting. I am currently playing around with a perl based webapp for displaying interactome
> using cytoscapeweb. Depending how your design pans out,  would be happy to
> use your module as a backend analysis layer. And on a related note,  you
> might want to have a look at bioperl-network and if there is any overlap
> might be worth contributing.
>
> -siddhartha
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From xupeng86 at gmail.com  Thu Aug 19 08:02:48 2010
From: xupeng86 at gmail.com (xupeng)
Date: Thu, 19 Aug 2010 16:02:48 +0800
Subject: [Bioperl-l] Why I can't find the perl script "load_seqdatabase.pl"
	when use biosql database?
Message-ID: <201008191602.49068.xupeng86@gmail.com>

 	I've downloaded biosql-1.0.1.tar.gz, and it works well. But I 
can't find 'load_seqdatabase.pl' when I try to import 
GenBank files into the BioSQL database. 
	Could anyone send me a copy of that file? 
Many thanks! 


From sunhanifk at gmail.com  Thu Aug 19 14:25:38 2010
From: sunhanifk at gmail.com (han sun)
Date: Thu, 19 Aug 2010 22:25:38 +0800
Subject: [Bioperl-l] Could I install BioPerl on Windows with the ActivePerl
	5.12.1?
Message-ID: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>

Hello everyone,

I have used Perl for several months, and I now want to feel the power of
BioPerl.
But the installation seems to be more difficult than I thought.

I typed the following commands:


install-shell


rep add bioperl http://bioperl.org/DIST


rep add uwinnipeg http://cpan.uwinnipeg.ca/PPMPackages/12xx/


rep add trouchelle http://trouchelle.com/ppm12/

install BioPerl

However, the installation failed:

ppm install failed:
Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
Can't find any package that provides PostScript::TextBlock for
Bundle-BioPerl-Core
Can't find any package that provides Ace:: for Bundle-BioPerl-Core
Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
Can't find any package that provides Convert::Binary::C for
Bundle-BioPerl-Core
Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
Can't find any package that provides IPC::Run for GraphViz
Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
Can't find any package that provides List-MoreUtils for Moose
Can't find any package that provides List-MoreUtils for Class-MOP


then I tried

install http://www.bribes.org/perl/ppm/GD.ppd

and tried the installation again, but it still didn't help.

Do you know what is causing the problem?

Please help me. Thanks very much.


From cjfields1 at gmail.com  Thu Aug 19 14:33:26 2010
From: cjfields1 at gmail.com (Christopher Fields)
Date: Thu, 19 Aug 2010 09:33:26 -0500
Subject: [Bioperl-l] Could I install BioPerl on Windows with the
	ActivePerl 5.12.1?
In-Reply-To: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>
References: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>
Message-ID: <78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com>

Try using ActivePerl 5.10 instead of v5.12.  It's very possible the PPM won't work for v5.12 yet.

chris

On Aug 19, 2010, at 9:25 AM, han sun wrote:

> Hello everyone,
> 
> I have used perl for several months,and I now want to feel the power of
> bioperl.
> But it seems that the installing is more difficult than I thought.
> 
> I typed the commands.
> 
> 
> 
> install-shell
> 
> 
> rep add bioperl http://bioperl.org/DIST
> 
> 
> rep add uwinnipeg
> http://cpan.uwinnipeg.ca/PPMPackages/12xx/<http://cpan.uwinnipeg.ca/PPMPackages/10xx/>
> 
> 
> rep add trouchelle http://trouchelle.com/ppm12/
> 
> install BioPerl
> 
> However,the installing failed,
> 
> ppm install failed:
> Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
> Can't find any package that provides PostScript::TextBlock for
> Bundle-BioPerl-Core
> Can't find any package that provides Ace:: for Bundle-BioPerl-Core
> Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
> Can't find any package that provides Convert::Binary::C for
> Bundle-BioPerl-Core
> Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
> Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
> Can't find any package that provides IPC::Run for GraphViz
> Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
> Can't find any package that provides List-MoreUtils for Moose
> Can't find any package that provides List-MoreUtils for Class-MOP
> 
> 
> then I tried
> 
> install http://www.bribes.org/perl/ppm/GD.ppd
> 
> and tried the installation again,but it still didn't help.
> 
> *
> *
> *
> *
> *
> *
> 
> 
> *Do you konw what's wrong with the problem?*
> *
> *
> *
> *
> *Please help me,thanks very much.*
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hlapp at drycafe.net  Thu Aug 19 14:53:22 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 10:53:22 -0400
Subject: [Bioperl-l] Why I can't find the perl script
	"load_seqdatabase.pl" when use biosql database?
In-Reply-To: <201008191602.49068.xupeng86@gmail.com>
References: <201008191602.49068.xupeng86@gmail.com>
Message-ID: <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>

The file comes with Bioperl-db, not BioSQL. That is so because it  
depends on BioPerl and on Bioperl-db, and so you will need to have  
both installed.
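
A quick way to check whether both are installed is to try loading one 
module from each distribution. This is only a rough sketch: the helper 
name module_available is made up for the example, and it assumes that 
Bio::Seq (from BioPerl) and Bio::DB::BioDB (from Bioperl-db) are 
reasonable modules to probe for.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the named module can be loaded
# from @INC, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::Seq -> Bio/Seq.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Probe one module from each distribution and report the result.
for my $module (qw(Bio::Seq Bio::DB::BioDB)) {
    printf "%-16s %s\n", $module,
        module_available($module) ? 'found' : 'MISSING';
}
```

If Bio::DB::BioDB is reported missing, then Bioperl-db is most likely 
not installed for that Perl.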

	-hilmar

On Aug 19, 2010, at 4:02 AM, xupeng wrote:

> 	I've downloaded the biosql-1.0.1.tar.gz. It works well. But I
> can't find the 'load_seqdatabase.pl' when I try to import the
> Genbank files into biosql databsase.
> 	Can anyone give me a copy of that file?
> many thanks !

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From hlapp at drycafe.net  Thu Aug 19 14:58:46 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 10:58:46 -0400
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
Message-ID: <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>

Yes, unfortunately they do. The feature for obviating them (namely  
nested transactions) is there in Pg 8.2+, but Bioperl-db doesn't use  
them yet ... I have to learn more about Class::DBIx first to decide  
whether it's better to first implement nested transactions in the  
home-grown ORM that Bioperl-db in essence is, or whether it's better to  
reimplement everything in Class::DBIx instead.

There are new datatypes in Bioperl, and relations in BioSQL that could  
hold them, and so I need to decide what's the way forward.

	-hilmar

On Aug 19, 2010, at 6:01 AM, Peter wrote:

> On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp <hlapp at drycafe.net>  
> wrote:
>> Hi Dan,
>>
>> the casting isn't an issue anymore, I think. (And even if it were,  
>> there is
>> actually a small script that brings back the casts that were built  
>> into
>> 8.2.) Have you found an example where it still is?
>>
>>        -hilmar
>
> Hi Hilmar,
>
> Do the bioperl-db bindings for BioSQL on PostgreSQL still require  
> those
> extra rules in the schema?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2839
>
> Peter

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From mmuratet at hudsonalpha.org  Thu Aug 19 15:00:52 2010
From: mmuratet at hudsonalpha.org (Michael Muratet)
Date: Thu, 19 Aug 2010 10:00:52 -0500
Subject: [Bioperl-l] Why I can't find the perl script
	"load_seqdatabase.pl" when use biosql database?
In-Reply-To: <14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
References: <201008191602.49068.xupeng86@gmail.com>
	<14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
Message-ID: <C6FECD93-E599-465B-A93A-BD1F2CDFBE9C@hudsonalpha.org>


On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote:

> The file comes with Bioperl-db, not BioSQL. That is so because it  
> depends on BioPerl and on Bioperl-db, and so you will need to have  
> both installed.

Is load_seqdatabase.pl still the best method? I vaguely remember a  
post that said that load_seqdatabase was deprecated, but I can't find  
it in the archives.

Mike

>
> 	-hilmar
>
> On Aug 19, 2010, at 4:02 AM, xupeng wrote:
>
>> 	I've downloaded the biosql-1.0.1.tar.gz. It works well. But I
>> can't find the 'load_seqdatabase.pl' when I try to import the
>> Genbank files into biosql databsase.
>> 	Can anyone give me a copy of that file?
>> many thanks !
>
> -- 
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
>
>
>
>

Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806


From hlapp at drycafe.net  Thu Aug 19 15:29:31 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 11:29:31 -0400
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
Message-ID: <5F77404A-086D-4D0C-B3A5-F5119FCF878A@drycafe.net>


On Aug 19, 2010, at 11:09 AM, Chris Fields wrote:

> DBIx::Class


Did I have this in the wrong order :-) More coffee, please.
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From hlapp at drycafe.net  Thu Aug 19 15:30:26 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 19 Aug 2010 11:30:26 -0400
Subject: [Bioperl-l] Why I can't find the perl script
	"load_seqdatabase.pl" when use biosql database?
In-Reply-To: <C6FECD93-E599-465B-A93A-BD1F2CDFBE9C@hudsonalpha.org>
References: <201008191602.49068.xupeng86@gmail.com>
	<14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
	<C6FECD93-E599-465B-A93A-BD1F2CDFBE9C@hudsonalpha.org>
Message-ID: <C5FD4B85-25B3-4D76-AA99-B3DBE42400C7@drycafe.net>

It's not deprecated. Unless I'm again mixing up something?

	-hilmar

On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote:

>
> On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote:
>
>> The file comes with Bioperl-db, not BioSQL. That is so because it  
>> depends on BioPerl and on Bioperl-db, and so you will need to have  
>> both installed.
>
> Is load_seqdatabase.pl still the best method? I vaguely remember a  
> post that said that load_seqdatabase was deprecated, but I can't  
> find it in the archives.
>
> Mike
>
>>
>> 	-hilmar
>>
>> On Aug 19, 2010, at 4:02 AM, xupeng wrote:
>>
>>> 	I've downloaded the biosql-1.0.1.tar.gz. It works well. But I
>>> can't find the 'load_seqdatabase.pl' when I try to import the
>>> Genbank files into biosql databsase.
>>> 	Can anyone give me a copy of that file?
>>> many thanks !
>>
>> -- 
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
>> ===========================================================
>>
>>
>>
>>
>
> Michael Muratet, Ph.D.
> Senior Scientist
> HudsonAlpha Institute for Biotechnology
> mmuratet at hudsonalpha.org
> (256) 327-0473 (p)
> (256) 327-0966 (f)
>
> Room 4005
> 601 Genome Way
> Huntsville, Alabama 35806
>
>
>
>
>

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From cjfields at illinois.edu  Thu Aug 19 15:09:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 19 Aug 2010 10:09:13 -0500
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
Message-ID: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>

I think it's worth exploring having a DBIx::Class-based middle-ware approach similar to what Rob Buels has done for Chado.  That would be fairly easy to get started using DBIx::Class::Schema::Loader.  

After that it would require optimization and tweaking, which is potentially more complex than Rob's setup as Chado is very Pg-specific, but maybe Rob can elaborate...

chris

On Aug 19, 2010, at 9:58 AM, Hilmar Lapp wrote:

> Yes, unfortunately they do. The feature for obviating them (namely nested transactions) is there in Pg 8.2+, but Bioperl-db doesn't use them yet ... I have to learn more about Class::DBIx first to decide whether it's better to first implement nested transactions in the home-grown ORM that Bioperl-db in essence is, or whether it's better to reimplement everything in Class::DBIx instead.
> 
> There are new datatypes in Bioperl, and relations in BioSQL that could hold them, and so I need to decide what's the way forward.
> 
> 	-hilmar
> 
> On Aug 19, 2010, at 6:01 AM, Peter wrote:
> 
>> On Thu, Aug 19, 2010 at 6:48 AM, Hilmar Lapp <hlapp at drycafe.net> wrote:
>>> Hi Dan,
>>> 
>>> the casting isn't an issue anymore, I think. (And even if it were, there is
>>> actually a small script that brings back the casts that were built into
>>> 8.2.) Have you found an example where it still is?
>>> 
>>>       -hilmar
>> 
>> Hi Hilmar,
>> 
>> Do the bioperl-db bindings for BioSQL on PostgreSQL still require those
>> extra rules in the schema?
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2839
>> 
>> Peter
> 
> -- 
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
> 
> 
> 
> 


From cjfields at illinois.edu  Thu Aug 19 15:37:39 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 19 Aug 2010 10:37:39 -0500
Subject: [Bioperl-l] Why I can't find the perl script
	"load_seqdatabase.pl" when use biosql database?
In-Reply-To: <C5FD4B85-25B3-4D76-AA99-B3DBE42400C7@drycafe.net>
References: <201008191602.49068.xupeng86@gmail.com>
	<14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
	<C6FECD93-E599-465B-A93A-BD1F2CDFBE9C@hudsonalpha.org>
	<C5FD4B85-25B3-4D76-AA99-B3DBE42400C7@drycafe.net>
Message-ID: <68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu>

I don't recall this either.  So, can't blame it on lack of coffee :)

chris

On Aug 19, 2010, at 10:30 AM, Hilmar Lapp wrote:

> It's not deprecated. Unless I'm again mixing up something?
> 
> 	-hilmar
> 
> On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote:
> 
>> 
>> On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote:
>> 
>>> The file comes with Bioperl-db, not BioSQL. That is so because it depends on BioPerl and on Bioperl-db, and so you will need to have both installed.
>> 
>> Is load_seqdatabase.pl still the best method? I vaguely remember a post that said that load_seqdatabase was deprecated, but I can't find it in the archives.
>> 
>> Mike
>> 
>>> 
>>> 	-hilmar
>>> 
>>> On Aug 19, 2010, at 4:02 AM, xupeng wrote:
>>> 
>>>> 	I've downloaded the biosql-1.0.1.tar.gz. It works well. But I
>>>> can't find the 'load_seqdatabase.pl' when I try to import the
>>>> Genbank files into biosql databsase.
>>>> 	Can anyone give me a copy of that file?
>>>> many thanks !
>>> 
>>> -- 
>>> ===========================================================
>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
>>> ===========================================================
>>> 
>>> 
>>> 
>>> 
>> 
>> Michael Muratet, Ph.D.
>> Senior Scientist
>> HudsonAlpha Institute for Biotechnology
>> mmuratet at hudsonalpha.org
>> (256) 327-0473 (p)
>> (256) 327-0966 (f)
>> 
>> Room 4005
>> 601 Genome Way
>> Huntsville, Alabama 35806
>> 
>> 
>> 
>> 
>> 
> 
> -- 
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
> 
> 
> 
> 


From mmuratet at hudsonalpha.org  Thu Aug 19 15:40:02 2010
From: mmuratet at hudsonalpha.org (Michael Muratet)
Date: Thu, 19 Aug 2010 10:40:02 -0500
Subject: [Bioperl-l] Why I can't find the perl script
	"load_seqdatabase.pl" when use biosql database?
In-Reply-To: <68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu>
References: <201008191602.49068.xupeng86@gmail.com>
	<14AD92D9-3EA7-4E74-8EF7-2B2EC70D9095@drycafe.net>
	<C6FECD93-E599-465B-A93A-BD1F2CDFBE9C@hudsonalpha.org>
	<C5FD4B85-25B3-4D76-AA99-B3DBE42400C7@drycafe.net>
	<68FB78FF-11F7-43D7-9FA3-5DFF7D391FAB@illinois.edu>
Message-ID: <A0AD0D4E-89EC-4FA0-8625-FF0A2EFB5669@hudsonalpha.org>


On Aug 19, 2010, at 10:37 AM, Chris Fields wrote:

> I don't recall this either.  So, can't blame it on lack of coffee :)

Thanks. I'll keep using it!

Mike
>
> chris
>
> On Aug 19, 2010, at 10:30 AM, Hilmar Lapp wrote:
>
>> It's not deprecated. Unless I'm again mixing up something?
>>
>> 	-hilmar
>>
>> On Aug 19, 2010, at 11:00 AM, Michael Muratet wrote:
>>
>>>
>>> On Aug 19, 2010, at 9:53 AM, Hilmar Lapp wrote:
>>>
>>>> The file comes with Bioperl-db, not BioSQL. That is so because it  
>>>> depends on BioPerl and on Bioperl-db, and so you will need to  
>>>> have both installed.
>>>
>>> Is load_seqdatabase.pl still the best method? I vaguely remember a  
>>> post that said that load_seqdatabase was deprecated, but I can't  
>>> find it in the archives.
>>>
>>> Mike
>>>
>>>>
>>>> 	-hilmar
>>>>
>>>> On Aug 19, 2010, at 4:02 AM, xupeng wrote:
>>>>
>>>>> 	I've downloaded the biosql-1.0.1.tar.gz. It works well. But I
>>>>> can't find the 'load_seqdatabase.pl' when I try to import the
>>>>> Genbank files into biosql databsase.
>>>>> 	Can anyone give me a copy of that file?
>>>>> many thanks !
>>>>
>>>> -- 
>>>> ===========================================================
>>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
>>>> ===========================================================
>>>>
>>>>
>>>>
>>>>
>>>
>>> Michael Muratet, Ph.D.
>>> Senior Scientist
>>> HudsonAlpha Institute for Biotechnology
>>> mmuratet at hudsonalpha.org
>>> (256) 327-0473 (p)
>>> (256) 327-0966 (f)
>>>
>>> Room 4005
>>> 601 Genome Way
>>> Huntsville, Alabama 35806
>>>
>>>
>>>
>>>
>>>
>>
>> -- 
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
>> ===========================================================
>>
>>
>>
>>
>

Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806


From cjfields at illinois.edu  Thu Aug 19 15:55:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 19 Aug 2010 10:55:54 -0500
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <EA0C23FB-8C2F-4C04-B0E8-4207409916DC@sbc.su.se>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
	<E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
	<83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
	<B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>
	<EA0C23FB-8C2F-4C04-B0E8-4207409916DC@sbc.su.se>
Message-ID: <5611499B-FA63-4A52-8279-99B554418374@illinois.edu>

On Aug 17, 2010, at 8:52 AM, Dave Messina wrote:

>> It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison
> 
> Yep, agreed.
> 
> And such a flag should be named for the non-default behavior, then, like: -ignore_IDs_for_overlaps
> 
> Dave

Probably would just be -ignore_ids as this behavior would have to be consistent across the various Bio::RangeI methods (overlaps, contains, etc).  The params are case-insensitive IIRC, so the _IDs would just be lc().

RangeI doesn't define a seq_id(), though, so we either use can() in RangeI (which is dirtier IMO) or define this in the appropriate class, probably LocationI or SeqFeatureI.
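
To make the proposal concrete, here is a rough, self-contained sketch of 
the behaviour being discussed. It is not BioPerl code: My::Feature is a 
made-up stand-in class, and the way -ignore_ids is parsed is only an 
assumption about how such a flag might look. By default the comparison 
checks seq_id (the genomic case); passing the flag falls back to a purely 
coordinate-based test. The can() call shows the "dirtier" variant 
mentioned above.

```perl
use strict;
use warnings;

package My::Feature;    # hypothetical stand-in, not a BioPerl class

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}
sub start  { $_[0]{start} }
sub end    { $_[0]{end} }
sub seq_id { $_[0]{seq_id} }

# Overlap test that compares seq_id by default. Passing a true
# -ignore_ids (any letter case) skips the ID comparison.
sub overlaps {
    my ($self, $other, %opts) = @_;

    # Normalise option names to lower case, so -ignore_IDs also works.
    my %lc;
    $lc{ lc $_ } = $opts{$_} for keys %opts;

    unless ( $lc{'-ignore_ids'} ) {
        # Only compare IDs when both objects can report one.
        my ($id_a, $id_b) =
            map { $_->can('seq_id') ? $_->seq_id : undef } ($self, $other);
        return 0 if defined $id_a && defined $id_b && $id_a ne $id_b;
    }

    # Closed-interval overlap on the coordinates.
    return ( $self->start <= $other->end && $other->start <= $self->end ) ? 1 : 0;
}

package main;

my $f1 = My::Feature->new( seq_id => 'chr1', start => 100, end => 200 );
my $f2 = My::Feature->new( seq_id => 'chr2', start => 150, end => 250 );

print $f1->overlaps($f2)                   ? "overlap\n" : "no overlap\n";  # no overlap
print $f1->overlaps($f2, -ignore_IDs => 1) ? "overlap\n" : "no overlap\n";  # overlap
```

Defining seq_id() in the appropriate interface instead would remove the 
need for the can() check.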

chris


From cjfields at illinois.edu  Thu Aug 19 15:56:11 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 19 Aug 2010 10:56:11 -0500
Subject: [Bioperl-l] Bug? Features with similar ranges,
	different IDs are considered overlapping
In-Reply-To: <B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>
References: <4D07AB61-3AEC-4D34-8C23-563D9A61A00C@illinois.edu>
	<A07B1A30-7708-4401-BB13-7B4463D306E7@sbc.su.se>
	<E3473ED6-2122-4B4A-8A73-E80C4136CCAC@illinois.edu>
	<83732A2C-C6E3-479A-ACAA-FE5756271991@sbc.su.se>
	<B7A8E3B4-1E7E-4768-AFF3-3D4C4A5FC3B1@scottcain.net>
Message-ID: <7CF700A0-C7A0-4BD2-9757-50B693B3B614@illinois.edu>

Makes sense.  

chris

On Aug 17, 2010, at 7:45 AM, Scott Cain wrote:

> Hi Dave and Chris,
> 
> It seems to me that the genomic comparison is the thing people would do more often, so if you're going to create a flag, the default should be for the genomic comparison and if somebody is doing the protein space comparison and not getting the expected results, they'll probably read the docs to find out why. 
> 
> Scott
> 
> --
> Scott Cain, Ph. D.
> scott at scottcain dot net
> Ontario Institute for Cancer Research
> http://gmod.org/
> 216 392 3087 
> 
> Snet from my iPhone.
> 
> On Aug 17, 2010, at 5:06 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
> 
>>> Good point; it's probably the context the methods are used that matters.  So, maybe just a document clarification?
>> 
>> That's always good, but it really doesn't solve the issue you're describing.
>> 
>> I mean, who would expect to get overlaps for features on different chromosomes?
>> 
>> To me, that's a clear violation of reasonable user expectations. You shouldn't have to read the docs about something like that.
>> 
>> So what's the solution for these duelling use cases? I haven't thought about it much, but a first approximation might be to add a -genomic boolean flag that, when true, would do the right thing and check the ID when doing overlaps or other positional comparisons.
>> 
>> (Maybe -genomic is too obscure. Maybe it should be -same_id_for_overlaps or something like that.)
>> 
>> And maybe having to know to set a flag is effectively the same thing as having to read the docs to understand SeqFeature's overlap behavior.
>> 
>> What do the rest of you out there think?
>> 
>> 
>> Dave
>> 
>> 
> 


From David.Messina at sbc.su.se  Thu Aug 19 16:54:23 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 19 Aug 2010 18:54:23 +0200
Subject: [Bioperl-l]  Bug? Features with similar ranges,
	different IDs are considered overlapping
References: <83299B71-0F73-440D-A9C5-DC1DA2AFF605@davemessina.com>
Message-ID: <1EFB951F-AEE1-4B2A-9E29-114E40B25D21@sbc.su.se>

[Ccing list for real this time]

On Aug 19, 2010, at 17:55, Chris Fields <cjfields at illinois.edu> wrote:

> Probably would just be -ignore_ids

You're right, that's the way to go. 


> define this in the appropriate class, probably LocationI or 

Yep, that's cleaner.

Thanks!


Dave


From cjfields1 at gmail.com  Thu Aug 19 17:20:32 2010
From: cjfields1 at gmail.com (Christopher Fields)
Date: Thu, 19 Aug 2010 12:20:32 -0500
Subject: [Bioperl-l] Could I install BioPerl on Windows with the
	ActivePerl 5.12.1?
In-Reply-To: <AANLkTimBPL6Sr2kmg+f0t1j8pk_9nBAoqubKzY4AJoxo@mail.gmail.com>
References: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>
	<78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com>
	<AANLkTimBPL6Sr2kmg+f0t1j8pk_9nBAoqubKzY4AJoxo@mail.gmail.com>
Message-ID: <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com>

cc'ing list.  Looks like the BioPerl PPM is possibly broken for perl 5.12.  Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling...

chris

On Aug 19, 2010, at 11:29 AM, han sun wrote:

> v5.10 works,thanks.
> 
> 2010/8/19 Christopher Fields <cjfields1 at gmail.com>
> Try using ActivePerl 5.10 instead of v5.12.  It's very possible the PPM won't work for v5.12 yet.
> 
> chris
> 
> On Aug 19, 2010, at 9:25 AM, han sun wrote:
> 
> > Hello everyone,
> >
> > I have used perl for several months,and I now want to feel the power of
> > bioperl.
> > But it seems that the installing is more difficult than I thought.
> >
> > I typed the commands.
> >
> >
> >
> > install-shell
> >
> >
> > rep add bioperl http://bioperl.org/DIST
> >
> >
> > rep add uwinnipeg
> > http://cpan.uwinnipeg.ca/PPMPackages/12xx/<http://cpan.uwinnipeg.ca/PPMPackages/10xx/>
> >
> >
> > rep add trouchelle http://trouchelle.com/ppm12/
> >
> > install BioPerl
> >
> > However,the installing failed,
> >
> > ppm install failed:
> > Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
> > Can't find any package that provides PostScript::TextBlock for
> > Bundle-BioPerl-Core
> > Can't find any package that provides Ace:: for Bundle-BioPerl-Core
> > Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
> > Can't find any package that provides Convert::Binary::C for
> > Bundle-BioPerl-Core
> > Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
> > Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
> > Can't find any package that provides IPC::Run for GraphViz
> > Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
> > Can't find any package that provides List-MoreUtils for Moose
> > Can't find any package that provides List-MoreUtils for Class-MOP
> >
> >
> > then I tried
> >
> > install http://www.bribes.org/perl/ppm/GD.ppd
> >
> > and tried the installation again,but it still didn't help.
> >
> > *
> > *
> > *
> > *
> > *
> > *
> >
> >
> > *Do you konw what's wrong with the problem?*
> > *
> > *
> > *
> > *
> > *Please help me,thanks very much.*
> 
> 


From rmb32 at cornell.edu  Thu Aug 19 17:09:45 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 19 Aug 2010 10:09:45 -0700
Subject: [Bioperl-l] reminder: Aug 25 deadline for GMOD Hackathon application
Message-ID: <4C6D6559.3080809@cornell.edu>

Hi all,

This is your one-week reminder: the deadline for open applications to 
the GMOD Evo hackathon is Wednesday, August 25th.

Rob

========================================

We are seeking participants for the GMOD Tools for Evolutionary Biology
Hackathon, held November 8-12, 2010 at the US National Evolutionary
Synthesis Center (NESCent) in Durham, NC.

This hackathon targets three critical gaps in the capabilities of the
GMOD toolbox that currently limit its utility for evolutionary research:

  1. Visualization of comparative genomics data
  2. Visualization of phylogenetic data and trees
  3. Support for population diversity and phenotype data

If you are interested in these areas and have relevant expertise, you
are strongly encouraged to apply. Relevant areas of expertise include
more than just software development: if you are a GMOD power user,
visualization guru, domain expert (comparative, phylogenetics,
population, ...), or documentation wizard, then your skills are needed!

How To Apply:

Fill out the online application form at http://bit.ly/gmodevohack.
Applications are due August 25.

About GMOD:

GMOD is an intercompatible suite of open-source software components for
storing, managing, analyzing, and visualizing genome-scale data. GMOD
includes many widely-used software components: GBrowse and JBrowse, both
genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a
generic and modular database schema; CMap, a comparative map viewer; as
well as many other components including Apollo, MAKER, BioMart,
InterMine, and Galaxy. We hope to extend the functionality of existing
GMOD components, and integrate new components as well.

About Hackathons:

A hackathon is an intense event at which a group of programmers with
different backgrounds and skills collaborate hands-on and face-to-face
to develop working code that is of utility to the community as a whole.
The mix of people will include domain experts and computer-savvy end-users.

More details about the event, its motivation, organization, procedures,
and attendees, as well as URLs to the hackathon and related websites are
included below.

Sincerely,

The GMOD EvoHack Organizing Committee (and project affiliations as
relevant):

Nicole Washington, Chair (LBNL, modENCODE, Phenote)

Robert Buels (SGN, Chado NatDiv)

Scott Cain (OICR, GMOD)

Dave Clements (NESCent, GMOD)

Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv)

Sheldon McKay (University of Arizona, iPlant, GBrowse_syn)


-----------------------------

About the GMOD Evo Hackathon

Overview

We are organizing a hackathon to fill critical gaps in the capabilities
of the Generic Model Organism Database (GMOD) toolbox that currently
limit its utility for evolutionary research. Specifically, we will focus
on tools for

   1) viewing comparative genomics data;
   2) visualizing phylogenomic data; and
   3) supporting population diversity data and phenotype annotation.

The event will be hosted at NESCent and bring together a group of 20 or
more software developers, end-user representatives, and documentation
experts who would otherwise not meet. The participants will include key
developers of GMOD components that currently lack features critical for
emerging evolutionary biology research, developers of informatics tools
in evolutionary research that lack GMOD integration, and
informatics-savvy biologists who can represent end-user requirements.

The event will provide a unique opportunity to infuse the GMOD developer
community with a heightened awareness of unmet needs in evolutionary
biology that GMOD components have the potential to fill, and for tool
developers in evolutionary biology to better understand how best to
extend or integrate with already existing GMOD components.

Before the Event

Discussion of ideas and sometimes even design actually starts well
before the hackathon, on mailing lists, wiki pages, and conference calls
set up among accepted attendees.  This advance work lays the foundation
for participants to be productive from the very first day.  This also
means that participants should be willing to contribute some time in
advance of the hackathon itself to participate in this preparatory
discussion.

During the Event

Typically, hackathon participants use the morning of the first day of
the event to organize themselves into working groups of between 3 and 6
people, each with a focused implementation objective.  Ideas and
objectives are discussed, and attendees coalesce around the projects in
which they have the most experience or interest.


Deliverables / Event Results

The meeting's attendance, working groups, and outcomes will be fully
logged and documented on the GMOD wiki (http://gmod.org). Each working
group during the event will typically have its own wiki page, linked
from the main EvoHack page, where it documents its minutes and design
notes, and provides links to the code and documentation it produces.
Also, since GMOD and NESCent are both committed to open source
principles, all code and documentation produced by participants during
the hackathon must be published under an OSI-approved open source
license. As contributions to existing GMOD tools, all hackathon products
will most likely satisfy this requirement automatically.

NESCent

This event is sponsored by the US National Evolutionary Synthesis Center
(NESCent, http://www.nescent.org) through its Informatics Whitepapers
program (http://www.nescent.org/informatics/whitepapers.php). NESCent
promotes the synthesis of information, concepts and knowledge to address
significant, emerging, or novel questions in evolutionary science and
its applications. NESCent achieves this by supporting research and
education across disciplinary, institutional, geographic, and
demographic boundaries (see http://www.nescent.org/science/proposals.php).

Links

Main GMOD EvoHack page, and full proposal:
http://gmod.org/wiki/GMOD_Evo_Hackathon

NESCent: http://www.nescent.org/
GMOD: http://gmod.org/
Similar past NESCent events: http://hackathon.nescent.org/
GMOD hackathon application:  http://bit.ly/gmodevohack

-- 
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/GMOD_Europe_2010
http://gmod.org/wiki/Help_Desk_Feedback


From David.Messina at sbc.su.se  Thu Aug 19 18:55:50 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 19 Aug 2010 20:55:50 +0200
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast.pm - bl2seq
	question
In-Reply-To: <4C6D7123.9080908@bcm.tmc.edu>
References: <4C6C3259.4060304@bcm.tmc.edu>
	<E8F0F7A7-BC33-4E37-8AAB-75A9470E82A5@sbc.su.se>
	<4C6D7123.9080908@bcm.tmc.edu>
Message-ID: <4E977318-05AC-4D8E-9A39-8C07A2419198@sbc.su.se>


Glad I could help, Caleb.

Dave


On Aug 19, 2010, at 20:00, Caleb Davis <cdavis at bcm.tmc.edu> wrote:

> Hi Dave,
> 
> Thank you so much for your detailed response! Fixing the reward parameter replicated the online result for me.  All of the other factors you brought up will help me track down any future problems. Thanks again.
> 
> --Caleb
> 


From rmb32 at cornell.edu  Thu Aug 19 22:19:11 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 19 Aug 2010 15:19:11 -0700
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
Message-ID: <4C6DADDF.1000103@cornell.edu>

Chris Fields wrote:
> I think it's worth exploring having a DBIx::Class-based middle-ware approach similar to what Rob Buels has done for Chado.  That would be fairly easy to get started using DBIx::Class::Schema::Loader.
> 
> After that it would require optimization and tweaking, which is potentially more complex than Rob's setup as Chado is very Pg-specific, but maybe Rob can elaborate...

Elaborating on how Bio::Chado::Schema is developed:

The vast majority of the code and POD in BCS is autogenerated by 
DBIx::Class::Schema::Loader.  DBICSL gives you a baseline set of 
DBIx::Class classes that covers all the tables, views, columns, unique 
constraints, and foreign key relationships.

Beyond that, you have to add on yourself.  In BCS, we have mostly done 
things like:

   * make better-named aliases for some of the autogenerated
     relationships (though DBICSL does a surprisingly good job of naming
     relationships automatically most of the time)
   * add a tiny bit of bioperl compatibility (this needs a lot more work
     by somebody, volunteers needed!)
   * add convenience methods for using some of the Chado property tables
   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
     traversing phylogenetic tree relationships

Regarding DB backend specificity, BCS isn't Pg-specific at all, because 
DBIx::Class itself goes to great lengths to be compatible (and 
performant!) with just about every relational database out there.  In 
fact, the BCS test suite deploys a Chado schema into a temporary SQLite 
database using DBIC::Schema's deploy() method, and runs all of its tests 
on that.  Very handy.

Chado's Pg-specific server-side functions can of course be called
through BCS if they are present, but it's perfectly possible to use
Chado without any of the server-side functions, and that is mostly the
way I use it.

Rob


From David.Messina at sbc.su.se  Fri Aug 20 09:19:14 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 20 Aug 2010 11:19:14 +0200
Subject: [Bioperl-l] Git for the lazy
Message-ID: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se>

Hi everyone,

If you're like me and still getting up to speed with Git, you might find this helpful:

	http://www.spheredev.org/wiki/Git_for_the_lazy


Dave


From bgs500 at york.ac.uk  Fri Aug 20 13:07:50 2010
From: bgs500 at york.ac.uk (Ben Saville)
Date: Fri, 20 Aug 2010 14:07:50 +0100
Subject: [Bioperl-l] Problem Parsing BLAST output
Message-ID: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>

Hi Everyone,

I'm very much new to the world of sequence data analysis (and this  
mailing list!), and have reached a roadblock.

I have BLASTed some contigs against a series of databases that I  
created. From this I would like to parse through the data and separate  
it before extracting the information of interest at a later point. I  
would like to separate the data by query ID. I found the following  
Bioperl script;

#!/usr/bin/perl

use Bio::Search::Result::BlastResult;
use Bio::SearchIO;

my $report = Bio::SearchIO->new( -file=>'All_BCM_results.bls', -format => blast);
my $result = $report->next_result;
my %hits_by_query;
while (my $hit = $result->next_hit) {
   push @{$hits_by_query{$hit->name}}, $hit;
}

foreach my $qid ( keys %hits_by_query ) {
   my $result = Bio::Search::Result::BlastResult->new();
   $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
   my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' );
   $blio->write_result($result);
}

running this script resulted in the following error;

BlastResult::new(): Not adding iterations.

------------- EXCEPTION: Bio::Root::NoSuchThing -------------
MSG: No such iteration number: 0. Valid range=1-0
VALUE: The number zero (0)
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Search::Result::BlastResult::iteration /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:328
STACK: Bio::Search::Result::BlastResult::add_hit /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:258
STACK: /Users/bsaville/Desktop/Parsing_BLAST_by_query.pl:15
-------------------------------------------------------------

So I added a 1 to the line shown above, as the error told me this was
within the valid range:
my $result = Bio::Search::Result::BlastResult->new(1);
This produced the following error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Must define arrayref of Iterations when initializing a Bio::Search::Result::BlastResult

STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Search::Result::BlastResult::new /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:128
STACK: /Users/bsaville/Desktop/Parsing_BLAST_by_query.pl:14
-----------------------------------------------------------

I know that it is my inexperience that is causing this problem, but I  
really can't figure this out.

Regards
Ben Saville


From David.Messina at sbc.su.se  Fri Aug 20 13:48:28 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 20 Aug 2010 15:48:28 +0200
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
Message-ID: <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>

Hi Ben,

I would not use the script you posted; I don't think it does what you want.

If you haven't already, you should take a look at the beginners' HOWTO

	http://www.bioperl.org/wiki/HOWTO:Beginners


the SearchIO HOWTO

	http://www.bioperl.org/wiki/HOWTO:SearchIO


and the example scripts included with BioPerl:

	http://www.bioperl.org/wiki/Scripts


Incidentally, it's a lot of fiddly data processing to parse blast reports for many contigs against multiple databases and then go back and collate the results by query. I'm not sure exactly what you want to do once you've separated by query; if you provide some more information, we could suggest ways to best get you where you want to go.

I will mention, though, that BLAST has the ability to search multiple separate databases in one go and collate the results for you. So that's something to consider.
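
On the collating-by-query part, here is a rough sketch rather than a finished script. It assumes your report parses with Bio::SearchIO's 'blast' format and that the report file is given on the command line; hits_by_query is a made-up helper name, not a BioPerl function. The idea is to loop over every result in the report (not just the first one) and key the hits on the query name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: walk every result in a SearchIO stream and
# group the hit objects by the name of the query that produced them.
sub hits_by_query {
    my ($searchio) = @_;
    my %by_query;
    while ( my $result = $searchio->next_result ) {
        my $qid = $result->query_name;
        $by_query{$qid} ||= [];    # remember queries that had no hits too
        while ( my $hit = $result->next_hit ) {
            push @{ $by_query{$qid} }, $hit;
        }
    }
    return \%by_query;
}

# Only touch BioPerl when a report file is actually given.
if (@ARGV) {
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new( -file => $ARGV[0], -format => 'blast' );
    my $by_query = hits_by_query($in);
    for my $qid ( sort keys %$by_query ) {
        for my $hit ( @{ $by_query->{$qid} } ) {
            # one tab-separated line per hit: query, hit name, significance
            print join( "\t", $qid, $hit->name, $hit->significance ), "\n";
        }
    }
}
```

Run it as, for example, "perl collate_hits.pl All_BCM_results.bls" (the script name is arbitrary). If the same query shows up in several reports concatenated into one file, its hits end up under the same key.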


Dave


From bernd.web at gmail.com  Fri Aug 20 15:17:05 2010
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 20 Aug 2010 17:17:05 +0200
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
In-Reply-To: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
References: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
Message-ID: <AANLkTim2MyJ1XKmvYHr+8gX-j9h9z81==e5suTW09PWs@mail.gmail.com>

Hi Yin,

I am not quite sure if the following is also related to your gapped
length issue but I found I had to adapt the calculation of
ungapped_len in   Bio::LocatableSeq. If my slices did not contain any
letters or a new gap char I used, SimpleAlign could not find the
sequences when outputting the alignment. This was due to a difference
in length calculation:

SimpleAlign: uses \W:  $slice_seq =~ s/\W//g;
Bio::LocatableSeq::ungapped_len uses  "$string =~ s/[\.\-]+//g;"

I had to include '~' (for my local sequences) in the ungapped_len;
otherwise I would run into the end issues with SimpleAlign.
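
To make the difference concrete, here is a small plain-Perl sketch that only uses the two substitutions quoted above; the sub names and the test sequences are made up for illustration and are not BioPerl code:

```perl
use strict;
use warnings;

# Length after stripping every non-word character (the SimpleAlign line above).
sub len_nonword_stripped {
    my ($seq) = @_;
    ( my $copy = $seq ) =~ s/\W//g;
    return length $copy;
}

# Length after stripping only '.' and '-' (the ungapped_len line above).
sub len_dot_dash_stripped {
    my ($seq) = @_;
    ( my $copy = $seq ) =~ s/[\.\-]+//g;
    return length $copy;
}

# Made-up sequences: the first uses only '-' and '.', the others use '~'.
for my $seq ( 'ac-gt.a', 'ac~gt-a', '~~~~' ) {
    printf "%-8s nonword-stripped=%d  dot-dash-stripped=%d\n",
        $seq, len_nonword_stripped($seq), len_dot_dash_stripped($seq);
}
```

The two lengths agree as long as the gaps are '-' or '.', and drift apart as soon as '~' is used, which is where the end check starts to complain.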


Kind regards,
Bernd


On Fri, Aug 13, 2010 at 3:36 PM, Jun Yin <jun.yin at ucd.ie> wrote:
> Hi, all,
>
>
>
> I am the google summer of code student working on Bio::Align subsystem
> refactoring. The code (Bio::SimpleAlign) I re-implemented now has passed
> nearly all the test, except a few tests on seq/start-end testing. But here
> comes a problem. This may be an old issue, that the Bio::LocatableSeq end
> assignment and checking are inconsistent.
>
>
>
> The current end checking method is based on:
>
> $end=$seq->_ungapped_len+$seq->start-1
>
> However, this checking may not fit the real world case.
>
>
>
> The inconsistency usually happens when a few columns of the sequence are
> removed.
>
>
>
> For example:
>
> my $a = Bio::LocatableSeq->new(
>     -id     => 'a',
>     -strand => 1,
>     -seq    => '-tcgatc-atcgatcg',
>     -start  => 30,
>     -end    => 43
> );
>
>
>
> If we remove the 1st, 8th and the last columns
>
>
>
> $a->seq() will be 'tcgatcatcgatc'
>
> $a->_ungapped_len==12
>
>
>
> Actually, in the real world, the first residue will still be 30 (the old
> $seq->start), and the last residue is the residue before the 43 (the old
> $seq->end), thus 42.
>
>
>
> But if you call a validation, the calculation is
> $a->_ungapped_len+$a->start-1=12+30-1=41
>
> So the reassignment of the $seq->end will not pass the validation.
>
>
>
> So unless you save the information to a new sequence object, the original
> position information will be lost anyway. But in some cases, we have to
> change the sequence in its original sequence object ..
>
>
>
> What is your suggestion on this issue?
>
> A. pass the test and lose the information      #convenient in coding but the
> start-end annotation is not right any more
>
> B. keep the information and forget the test   #the object will still
> remember where the last residue was in the original sequence. But is it
> really meaningful at all? Because all the other residues may come from
> nowhere
>
> C. Neither of above #any other suggestions?
>
>
>
> Cheers,
>
> Jun Yin
>
> Ph.D. student in U.C.D.
>
>
>
> Bioinformatics Laboratory
>
> Conway Institute
>
> University College Dublin
>
>
>


From sidd.basu at gmail.com  Fri Aug 20 15:59:59 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Fri, 20 Aug 2010 10:59:59 -0500
Subject: [Bioperl-l]  Re: bioperl-db and postgres8.3 - status query
In-Reply-To: <4C6DADDF.1000103@cornell.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
	<4C6DADDF.1000103@cornell.edu>
Message-ID: <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>

Hi, 

On Thu, 19 Aug 2010, Robert Buels wrote:

> Chris Fields wrote:
> > I think it's worth exploring having a DBIx::Class-based middle-ware 
> > approach similar to what Rob Buels has done for Chado.  That would be 
> > fairly easy to get started using DBIx::Class::Schema::Loader.
> > After that it would require optimization and tweaking, which is 
> > potentially more complex than Rob's setup as Chado is very Pg-specific, 
> > but maybe Rob can elaborate...
>
> Elaborating on how Bio::Chado::Schema is developed:
>
> The vast majority of the code and POD in BCS is autogenerated by 
> DBIx::Class::Schema::Loader.  DBICSL gives you a baseline set of 
> DBIx::Class classes that covers all the tables, views, columns, unique 
> constraints, and foreign key relationships.
>
> Beyond that, you have to add on yourself.  In BCS, we have mostly done 
> things like:
>
>   * make better-named aliases for some of the autogenerated
>     relationships (though DBICSL does a surprisingly good job of naming
>     relationships automatically most of the time)
>   * add a tiny bit of bioperl compatibility (this needs a lot more work
>     by somebody, volunteers needed!)
>   * add convenience methods for using some of the Chado property tables
>   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
>     traversing phylogenetic tree relationships
>
> Regarding DB backend specificity, BCS isn't Pg-specific at all, because 
> DBIx::Class itself goes to great lengths to be compatible (and performant!) 
> with just about every relational database out there.  
I would vouch for that, at least as far as Chado in Oracle is concerned.
So far, BCS works flawlessly with our Oracle Chado instance at
dictyBase. Quite a chunk of BCS-based code is also active in a couple of
our Mojo-based webapps. The part which I still couldn't use directly is
the 'synonym' table, as it clashes with Oracle-specific reserved keywords.
However, overall it seems to be quite cross-RDBMS compatible and is highly
recommended.

-siddhartha


>In fact, the BCS test 
> suite deploys a Chado schema into a temporary SQLite database using 
> DBIC::Schema's deploy() method, and runs all of its tests on that.  Very 
> handy.
>
> Chado's Pg-specific server-side functions can of course be called through 
> BCS if they are present, but it's perfectly possible to use Chado without 
> any of the server-side functions, and mostly the way I use it.
>
> Rob


From jun.yin at ucd.ie  Fri Aug 20 16:17:33 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Fri, 20 Aug 2010 17:17:33 +0100
Subject: [Bioperl-l] Bio::LocatableSeq end checking inconsistency
In-Reply-To: <AANLkTim2MyJ1XKmvYHr+8gX-j9h9z81==e5suTW09PWs@mail.gmail.com>
References: <004a01cb3aec$8c2ddd60$a4899820$%yin@ucd.ie>
	<AANLkTim2MyJ1XKmvYHr+8gX-j9h9z81==e5suTW09PWs@mail.gmail.com>
Message-ID: <000b01cb4083$31f98280$95ec8780$%yin@ucd.ie>

Hi, Bernd,

Thx for your input. 

Yes, this is one of the old bugs in Bio::SimpleAlign.  $aln->slice simply
uses $slice_seq =~ s/\W//g to calculate the ungapped length.
But $seq->_ungapped_len uses $string =~
s{[$GAP_SYMBOLS$FRAMESHIFT_SYMBOLS]+}{}g;
where the symbols are '\-\.=~\\\/', to calculate the ungapped length.

To solve this problem, first, I now use
$nonres = join("",$self->gap_char, $self->match_char,$self->missing_char);
which is '-\.&', to remove the non-residue chars in the alignment sequence
(though using '=', '~', '\' or '/' will also cause problems).

Secondly, I have merged slice, remove_columns and remove_gaps, using the
same internal function. Thus it is easier to debug.

These changes will be merged into the main BioPerl branch after the next
version.

But anyway, the conflict is still there, because the non-residue chars are
defined as:
In Bio::SimpleAlign, $aln->gap_char, $aln->missing_char, $aln->match_char
In Bio::LocatableSeq   
$GAP_SYMBOLS = '\-\.=~';
$FRAMESHIFT_SYMBOLS = '\\\/';

so try to use '-' or '.' for your gap char at the moment, otherwise you may
encounter end warnings in calculation.

And, if you want to keep gap only sequences, you can call the method as:
$aln2 = $aln->slice(20,30,1)
The last parameter is to keep gap only sequence.

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin



__________ Information from ESET Smart Security, version of virus signature
database 5377 (20100818) __________

The message was checked by ESET Smart Security.

http://www.eset.com



From cjfields at illinois.edu  Fri Aug 20 16:23:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 20 Aug 2010 11:23:07 -0500
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
	<4C6DADDF.1000103@cornell.edu>
	<20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
Message-ID: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>

On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote:
> Hi, 
> 
> On Thu, 19 Aug 2010, Robert Buels wrote:
> 
> > Chris Fields wrote:
> > > I think it's worth exploring having a DBIx::Class-based middle-ware 
> > > approach similar to what Rob Buels has done for Chado.  That would be 
> > > fairly easy to get started using DBIx::Class::Schema::Loader.
> > > After that it would require optimization and tweaking, which is 
> > > potentially more complex than Rob's setup as Chado is very Pg-specific, 
> > > but maybe Rob can elaborate...
> >
> > Elaborating on how Bio::Chado::Schema is developed:
> >
> > The vast majority of the code and POD in BCS is autogenerated by 
> > DBIx::Class::Schema::Loader.  DBICSL gives you a baseline set of 
> > DBIx::Class classes that covers all the tables, views, columns, unique 
> > constraints, and foreign key relationships.
> >
> > Beyond that, you have to add on yourself.  In BCS, we have mostly done 
> > things like:
> >
> >   * make better-named aliases for some of the autogenerated
> >     relationships (though DBICSL does a surprisingly good job of naming
> >     relationships automatically most of the time)
> >   * add a tiny bit of bioperl compatibility (this needs a lot more work
> >     by somebody, volunteers needed!)
> >   * add convenience methods for using some of the Chado property tables
> >   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
> >     traversing phylogenetic tree relationships
> >
> > Regarding DB backend specificity, BCS isn't Pg-specific at all, because 
> > DBIx::Class itself goes to great lengths to be compatible (and performant!) 
> > with just about every relational database out there.  
> I would vouch for that at least as far as chado in oracle is concerned.
> So,  far BCS works out flawlessly with our oracle chado instance at
> dictybase. Quite a chunk of BCS based code is also active in couple of
> our Mojo based webapps. The part which i still couldn't use directly is
> the 'synonym' table as it clashes with oracle specific reserved keywords. 
> However,  overall it seems to quite cross-RDMS compatible and highly
> recommended.
> 
> -siddhartha

Just to point out, I didn't say BCS is Pg-specific, but that Chado is
(that was the DBMS it was designed for).  Maybe that should be amended
to 'was' now :)

I recall seeing a page on this somewhere on the GMOD website along the
lines of "MySQL has problems so we chose Pg", and that Chado support
would focus on Pg.  I'm guessing that's no longer the case?  Or is only
the server-side stuff Pg-specific?

> >In fact, the BCS test 
> > suite deploys a Chado schema into a temporary SQLite database using 
> > DBIC::Schema's deploy() method, and runs all of its tests on that.  Very 
> > handy.
> >
> > Chado's Pg-specific server-side functions can of course be called through 
> > BCS if they are present, but it's perfectly possible to use Chado without 
> > any of the server-side functions, and mostly the way I use it.
> >
> > Rob

I think this opens up the possibility of starting a DBIx::Class-based
middleware solution.  Hilmar, did you want to take that on?

chris


From sidd.basu at gmail.com  Fri Aug 20 17:39:44 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Fri, 20 Aug 2010 12:39:44 -0500
Subject: [Bioperl-l]  Re: bioperl-db and postgres8.3 - status query
In-Reply-To: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
	<4C6DADDF.1000103@cornell.edu>
	<20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
	<1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
Message-ID: <20100820173942.GC400@vpn-165-124-164-118.vpn.northwestern.edu>

On Fri, 20 Aug 2010, Chris Fields wrote:

> On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote:
> > Hi, 
> > 
> > On Thu, 19 Aug 2010, Robert Buels wrote:
> > 
> > > Chris Fields wrote:
> > > > I think it's worth exploring having a DBIx::Class-based middle-ware 
> > > > approach similar to what Rob Buels has done for Chado.  That would be 
> > > > fairly easy to get started using DBIx::Class::Schema::Loader.
> > > > After that it would require optimization and tweaking, which is 
> > > > potentially more complex than Rob's setup as Chado is very Pg-specific, 
> > > > but maybe Rob can elaborate...
> > >
> > > Elaborating on how Bio::Chado::Schema is developed:
> > >
> > > The vast majority of the code and POD in BCS is autogenerated by 
> > > DBIx::Class::Schema::Loader.  DBICSL gives you a baseline set of 
> > > DBIx::Class classes that covers all the tables, views, columns, unique 
> > > constraints, and foreign key relationships.
> > >
> > > Beyond that, you have to add on yourself.  In BCS, we have mostly done 
> > > things like:
> > >
> > >   * make better-named aliases for some of the autogenerated
> > >     relationships (though DBICSL does a surprisingly good job of naming
> > >     relationships automatically most of the time)
> > >   * add a tiny bit of bioperl compatibility (this needs a lot more work
> > >     by somebody, volunteers needed!)
> > >   * add convenience methods for using some of the Chado property tables
> > >   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
> > >     traversing phylogenetic tree relationships
> > >
> > > Regarding DB backend specificity, BCS isn't Pg-specific at all, because 
> > > DBIx::Class itself goes to great lengths to be compatible (and performant!) 
> > > with just about every relational database out there.  
> > I would vouch for that at least as far as chado in oracle is concerned.
> > So,  far BCS works out flawlessly with our oracle chado instance at
> > dictybase. Quite a chunk of BCS based code is also active in couple of
> > our Mojo based webapps. The part which i still couldn't use directly is
> > the 'synonym' table as it clashes with oracle specific reserved keywords. 
> > However,  overall it seems to quite cross-RDMS compatible and highly
> > recommended.
> > 
> > -siddhartha
> 
> Just to point out, I didn't say BCS is Pg-specific, but that Chado is
> (that was the DBMS it was designed for).  Maybe that should be amended
> to 'was' now :)
> 
> I recall seeing a page on this somewhere on the GMOD website along the
> lines of "MySQL has problems so we chose Pg", and that Chado support
> would focus on Pg.  
As far as I understand, GMOD strongly recommends Pg, and it is the popular
backend for Chado. However, my point was that if anybody wants to use or
try out the Chado schema on a different backend, or has an existing setup,
tools like DBIx::Class, or BCS in particular, make it quite a bit easier to
do so. The code developed on top also becomes quite robust and portable.

-siddhartha 

>I'm guessing that's no longer the case?  Or is only
> the server-side stuff Pg-specific.
> 
> > >In fact, the BCS test 
> > > suite deploys a Chado schema into a temporary SQLite database using 
> > > DBIC::Schema's deploy() method, and runs all of its tests on that.  Very 
> > > handy.
> > >
> > > Chado's Pg-specific server-side functions can of course be called through 
> > > BCS if they are present, but it's perfectly possible to use Chado without 
> > > any of the server-side functions, and mostly the way I use it.
> > >
> > > Rob
> 
> I think this opens up the possibility of starting a DBIx::Class-based
> middleware solution.  Hilmar, did you want to take that on?
> 
> chris
> 
> 


From buiduyminh at gmail.com  Fri Aug 20 21:29:00 2010
From: buiduyminh at gmail.com (Minh Bui)
Date: Fri, 20 Aug 2010 17:29:00 -0400
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
Message-ID: <AANLkTinsyOMPJxpks_pqMwLpW8gx0VRihhJsLDnF53mu@mail.gmail.com>

Hi,
I am trying to load my GFF file into a MySQL database, but I got this error
when I ran bp_seqfeature_load.pl (BioPerl 1.6.1 on a Mac):

[BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
contains: /sw/lib/perl5 /sw/lib/perl5/darwin
/System/Library/Perl/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/5.8.6
/Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
/Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
/Network/Library/Perl/5.8.6 /Network/Library/Perl
/System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
line 3.
Perhaps the DBD::mysql perl module hasn't been fully installed,
or perhaps the capitalisation of 'mysql' isn't right.
Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
 at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212

I am using Mac OS X version 10.4.10 and MAMP. Isn't
"/Library/Perl/5.8.6" already in @INC? What am I missing?
I have been googling this error for a few hours. I also installed
BioPerl and reinstalled DBD::mysql using CPAN. It still doesn't work.

Here is my $PERL5LIB:  /sw/lib/perl5:/sw/lib/perl5/darwin/

I really need help on this.
Thank you,


From awitney at sgul.ac.uk  Sat Aug 21 10:39:10 2010
From: awitney at sgul.ac.uk (Adam Witney)
Date: Sat, 21 Aug 2010 11:39:10 +0100
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
In-Reply-To: <AANLkTinsyOMPJxpks_pqMwLpW8gx0VRihhJsLDnF53mu@mail.gmail.com>
References: <AANLkTinsyOMPJxpks_pqMwLpW8gx0VRihhJsLDnF53mu@mail.gmail.com>
Message-ID: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>


On 20 Aug 2010, at 22:29, Minh Bui wrote:

> Hi,,
> I am trying to load my GFF file to mysql database but I got this error
> when I ran the bp_seqfeature_load.pl ( bioperl 1.6.1 on  MAC)
> 
> [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
> install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
> contains: /sw/lib/perl5 /sw/lib/perl5/darwin
> /System/Library/Perl/5.8.6/darwin-thread-multi-2level
> /System/Library/Perl/5.8.6
> /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
> /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
> /Network/Library/Perl/5.8.6 /Network/Library/Perl
> /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
> /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
> line 3.
> Perhaps the DBD::mysql perl module hasn't been fully installed,
> or perhaps the capitalisation of 'mysql' isn't right.
> Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
> at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212
> 
> I am using MAC OSX version 10.4.10 and MAMP? Isnt it the
> "/Library/Perl/5.8.6" already in @INC? What am I missing?
> I have been googling this error for a few hours. I also install
> Bioperl and reinstall DBD::mysql using CPAN. It still doesnt work..
> 
> Here is my $PERL5LIB:  /sw/lib/perl5:/sw/lib/perl5/darwin/


Where did DBD::mysql get installed? Can you verify that DBD/mysql.pm is actually in one of the directories listed above?
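
One way to check this from Perl itself is sketched below. It uses only core modules; the helper name find_module is made up for this illustration and is not part of BioPerl or DBI. Run it with the same perl binary that runs bp_seqfeature_load.pl, since a different perl (for example one installed under /sw) may have its own @INC.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Return the full path of every copy of a module found under the given
# directories (normally @INC). An empty list means the running perl
# cannot see the module at all.
sub find_module {
    my ($module, @dirs) = @_;
    (my $rel = "$module.pm") =~ s{::}{/}g;    # DBD::mysql -> DBD/mysql.pm
    return grep { -f $_ }
           map  { File::Spec->catfile($_, $rel) }
           grep { !ref $_ } @dirs;            # skip code-ref hooks in @INC
}

my @hits = find_module('DBD::mysql', @INC);
if (@hits) {
    print "Found: $_\n" for @hits;
} else {
    print "DBD/mysql.pm is not under any directory in \@INC\n";
}
```

If nothing is found, the CPAN install most likely went into the library tree of a different perl than the one running the script.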


From i.hatethispart at ymail.com  Sat Aug 21 14:07:28 2010
From: i.hatethispart at ymail.com (keiko)
Date: Sat, 21 Aug 2010 07:07:28 -0700 (PDT)
Subject: [Bioperl-l] clustalw.exe
In-Reply-To: <3612399.post@talk.nabble.com>
References: <3612399.post@talk.nabble.com>
Message-ID: <29499435.post@talk.nabble.com>


Katrin wrote:
> 
> hello, I am a new Perl/Bioperl-User and first I must excuse me for my
> really bad english, but I hope everybody will understand me. I have the
> following problem: In my Perl-skript is the following system call:
> $y=exec("C:\\Programme\\xampp-win32-1.5.1\\xampp\\perl\\clustalw.exe
> C:\\Programme\\xampp-win32-1.5.1\\xampp\\htdocs\\gene\\clustal.fasta"); If
> I call this Script with the Shell (cmd.exe) everything works correctly.
> But if I call this script with PHP I get the following error message:
> Error: unknown option
> /C:\Programme\xampp-win32-1.5.1\xampp\htdocs\gene\clustal.fasta. I tried
> also system and qx. And I tested the environment variables: I wrote a
> bat-file with the definition of all environment-variables and the system
> call, but this did not work, too. The same problem is in php. The
> PHP-Scipt is called from html and I worked under WindowsXP with xampp. I
> hope, somebody can help me. greetings Katrin
> 

Hi. I also have a problem with this one. I want to call clustalw using php.
Can I ask what you included in your bat-file and where did you download your
clustal? thanks a lot!
-- 
View this message in context: http://old.nabble.com/clustalw.exe-tp3612399p29499435.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From jason at bioperl.org  Sun Aug 22 18:29:30 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 22 Aug 2010 11:29:30 -0700
Subject: [Bioperl-l] Enquiry on Bio::DB::Taxonomy
In-Reply-To: <AANLkTik9qpKSQV9dRKzxSrt_q5qq=g6X6eop8LTqkRVm@mail.gmail.com>
References: <AANLkTik9qpKSQV9dRKzxSrt_q5qq=g6X6eop8LTqkRVm@mail.gmail.com>
Message-ID: <4C716C8A.3010000@bioperl.org>

Hi Amali -

This is how I'd print out the full classification by using the Tree 
methods (with probably a different way of initializing the $db object to 
your flatfiles location).

#!/usr/bin/perl -w
use strict;
use Bio::DB::Taxonomy;
use Bio::Tree::Tree;   # loaded explicitly; Bio::Tree::Tree->new is called below

my $db= Bio::DB::Taxonomy->new(-source => 'flatfile',
                    -nodesfile => 'taxonomy/nodes.dmp',
                    -namesfile => 'taxonomy/names.dmp');

my $taxonid = $db->get_taxonid('Homo sapiens');
my $taxon = $db->get_taxon(-taxonid => $taxonid);
my $tree = Bio::Tree::Tree->new(-node => $taxon);
my @taxa = $tree->get_nodes;
print join(",", map { $_->scientific_name } @taxa), "\n";

-jason
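
To pick out specific levels such as phylum and class rather than printing every node, the nodes can be filtered by rank. The sketch below assumes the nodes offer rank() and scientific_name() accessors, as Bio::Taxon objects do. The helper names_by_rank and the stand-in class Demo::Taxon are hypothetical, and the demo values are only there so the snippet runs without the taxonomy dump files.

```perl
use strict;
use warnings;

# Map rank => scientific name for the requested ranks. Works on any
# objects that provide rank() and scientific_name().
sub names_by_rank {
    my ($wanted, @taxa) = @_;
    my %want = map { $_ => 1 } @$wanted;
    my %found;
    for my $t (@taxa) {
        my $rank = $t->rank;
        next unless defined $rank && $want{$rank};
        $found{$rank} = $t->scientific_name;
    }
    return \%found;
}

# Hypothetical stand-in for Bio::Taxon, used only for this demonstration.
package Demo::Taxon;
sub new             { my ($class, %args) = @_; return bless {%args}, $class }
sub rank            { $_[0]{rank} }
sub scientific_name { $_[0]{name} }

package main;

my @ranks = qw(phylum class family genus species);
my @demo  = map { Demo::Taxon->new(rank => $_->[0], name => $_->[1]) } (
    [ 'phylum',  'Chordata'     ],
    [ 'class',   'Mammalia'     ],
    [ 'no rank', 'unranked node' ],
    [ 'family',  'Hominidae'    ],
    [ 'genus',   'Homo'         ],
    [ 'species', 'Homo sapiens' ],
);
my $by_rank = names_by_rank(\@ranks, @demo);
for my $rank (@ranks) {
    my $name = defined $by_rank->{$rank} ? $by_rank->{$rank} : 'n/a';
    print "$rank: $name\n";
}
```

In the script above, the same helper would be called as names_by_rank(\@ranks, @taxa) right after @taxa is filled from $tree->get_nodes.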

Amali Thrimawithana wrote, On 8/18/10 3:56 PM:
> Dear Dr Stajich,
>
> I am a Masters student at Auckland university and my research is on
> identifying yeast species present in wine by the use of 454 sequencing. In
> order to carry out this research, a pipeline is being built in which at the
> final step each representative OTU need to be classified at different
> taxonomic levels (ie: at Phylum, family, class, genus and species) by using
> the results from BLAST. To identify the sequences at each taxonomic level, I
> have been trying out the Bio::DB::Taxonomy module in bioperl. Using this
> module, I am able to get the genus and species level by splitting the
> scientific name returned by the Bio::taxon object. But unfortunately I am
> uncertain on how to get the information for the other levels of the rank. I
> have tried several commands including "my @class = $node->classification;",
> but it does not work. Hence, could you please let me know how I might be
> able to get the higher levels of taxonomy such as class and phylum using
> bioperl?
>
> Look forward to hearing from you soon
>
> Thanking You
>
> Amali
>    


From cjfields at illinois.edu  Sun Aug 22 19:56:58 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 22 Aug 2010 14:56:58 -0500
Subject: [Bioperl-l] clustalw.exe
In-Reply-To: <29499435.post@talk.nabble.com>
References: <3612399.post@talk.nabble.com> <29499435.post@talk.nabble.com>
Message-ID: <E6C6AE4B-A6AB-4B90-8D81-74DE14B165BD@illinois.edu>

On Aug 21, 2010, at 9:07 AM, keiko wrote:

> Katrin wrote:
>> 
>> hello, I am a new Perl/Bioperl-User and first I must excuse me for my
>> really bad english, but I hope everybody will understand me. I have the
>> following problem: In my Perl-skript is the following system call:
>> $y=exec("C:\\Programme\\xampp-win32-1.5.1\\xampp\\perl\\clustalw.exe
>> C:\\Programme\\xampp-win32-1.5.1\\xampp\\htdocs\\gene\\clustal.fasta"); If
>> I call this Script with the Shell (cmd.exe) everything works correctly.
>> But if I call this script with PHP I get the following error message:
>> Error: unknown option
>> /C:\Programme\xampp-win32-1.5.1\xampp\htdocs\gene\clustal.fasta. I tried
>> also system and qx. And I tested the environment variables: I wrote a
>> bat-file with the definition of all environment-variables and the system
>> call, but this did not work, too. The same problem is in php. The
>> PHP-Scipt is called from html and I worked under WindowsXP with xampp. I
>> hope, somebody can help me. greetings Katrin
>> 
> 
> Hi. I also have a problem with this one. I want to call clustalw using php.
> Can I ask what you included in your bat-file and where did you download your
> clustal? thanks a lot!

Not sure, but what does this have to do with BioPerl?

chris


From jason at bioperl.org  Mon Aug 23 15:56:47 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 23 Aug 2010 08:56:47 -0700
Subject: [Bioperl-l] a problem when using the Bioperl modules
In-Reply-To: <AANLkTinZYJC6JwP776K3phzbAmtjiKMi_K_VTH=B6oeC@mail.gmail.com>
References: <AANLkTinZYJC6JwP776K3phzbAmtjiKMi_K_VTH=B6oeC@mail.gmail.com>
Message-ID: <4C729A3F.7080304@bioperl.org>

Wei -

Please ask your questions on the bioperl mailing list, I cannot answer 
questions directly for all requests.
Your problem has been answered by me on the list before so I urge you to 
use the list archives as a starting point.

The sequence lines in your FASTA file are not all the same length.

You need to run this:
bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW
mv NEW ORIGINAL

or with sreformat
sreformat fasta ORIGINAL > NEW
mv NEW ORIGINAL


Guifeng Wei wrote, On 8/23/10 4:57 AM:
> Dear professor Stajich,
> Sorry to bother you. I came across a problem when using the
> Bio::DB::Fasta module of BioPerl. My aim is to extract subsequences
> according to *.bed files, which contain the C. elegans genomic sequence
> annotation. The code I wrote is in the attached file.
> The genomic sequence file contains sequences from 6 chromosomes of
> C. elegans.
> When I run this program on the command line, I get the following
> error:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
>     Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw 
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::Fasta::calculate_offsets 
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> STACK: Bio::DB::Fasta::index_file 
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:680
> STACK: Bio::DB::Fasta::new 
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:491
> STACK: bed_to_fasta.pl:14
> -----------------------------------------------------------
> indexing was interrupted, so unlinking 
> /home/wgf/WORM_DATA/elegans.WS190.dna.fa.index at 
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053.
>
> I am therefore writing to you in the hope that you can help me solve this
> problem, and also give me some suggestions about how to learn BioPerl
> well.
> thank you very very much.
> yours sincerely
> Wei Guifeng


From jason.stajich at ucr.edu  Mon Aug 23 15:58:07 2010
From: jason.stajich at ucr.edu (Jason Stajich)
Date: Mon, 23 Aug 2010 08:58:07 -0700
Subject: [Bioperl-l] a problem when using the Bioperl modules
In-Reply-To: <AANLkTinrqwQCho_obj-_9MvQAyLEBVvaFA+HzJpFKovS@mail.gmail.com>
References: <AANLkTinZYJC6JwP776K3phzbAmtjiKMi_K_VTH=B6oeC@mail.gmail.com>
	<AANLkTinrqwQCho_obj-_9MvQAyLEBVvaFA+HzJpFKovS@mail.gmail.com>
Message-ID: <4C729A8F.1070506@ucr.edu>

You haven't defined the variable $db. Don't skip the part that
initializes the Bio::DB::Fasta object, which you asked about
previously.
Please send all your future queries to the mailing list.


Guifeng Wei wrote, On 8/23/10 8:14 AM:
> Dear professor,
> after that, i revised my scripts, which is that i divide the genomic 
> sequences into 7 single file, every file contains the sequence from a 
> chromosome.
> however, when i try to run the scripts, the following error was coming.
> Can't call method "seq" on an undefined value at bed_to_fasta.pl
> line 29, <IN> line 1.
> while(<IN>){
>         chomp $_;
>         my @bed=split(/\s+/, $_ );
>     #print length($db->seq('chrI'));
>         my $chr_id=$bed[0];
>         my $start=$bed[1];
>         my $end=$bed[2];
>         my $seq_name=$bed[3];
>         my $strand=$bed[5];
> my $segment =  $db ->seq($chr_id,$start=>$end);
>         print ">",$seq_name,"_",$chr_id,":",$start=>$end;
>         print "$segment\n";
> }
> the blue line is .
> why?

-- 
Jason E. Stajich, PhD
Assistant Professor
Department of Plant Pathology & Microbiology
University of California
Riverside, CA 92521
jason.stajich at ucr.edu
office: 951.827.2363

http://lab.stajich.org/
http://twitter.com/stajichlab
http://fungalgenomes.org/blog/

http://plantpathology.ucr.edu/
http://genomics.ucr.edu/
http://cepceb.ucr.edu/


From guifengwei at gmail.com  Tue Aug 24 02:44:57 2010
From: guifengwei at gmail.com (Guifeng Wei)
Date: Tue, 24 Aug 2010 10:44:57 +0800
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
Message-ID: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>

Hi,

I came across a problem when using the Bio::DB::Fasta module of
BioPerl. My aim is to extract subsequences according to *.bed files,
which contain the C. elegans genomic sequence annotation.

When I tried to run the script I wrote, I got the following error
message:

Can't call method "seq" on an undefined value at bed_to_fasta.pl line 28,
<IN> line 1.

So I am asking for help to solve this problem.
Here is my Perl script.

#!/usr/bin/perl -w
# Purpose: extract sequences from genomic sequences
use strict;
use Bio::DB::Fasta;
open(IN,$ARGV[0]) || die "sorry, the program cannot open the .bed file, please check it. \n";
my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' );
# The dir ...../elegans190.dna/ includes 6 files: chrI,chrII,chrIII,chrIV,chrV,chrX,
# each stands for the sequences from the corresponding chromosome.

while(<IN>){
        chomp $_;
        my @bed=split(/\s+/, $_ );

        my $chr_id=$bed[0];
        my $start=$bed[1];
        my $end=$bed[2];
        my $seq_name=$bed[3];
        my $strand=$bed[5];

        my $segment =  $db->seq( $chr_id, $start=>$end );

        print ">",$seq_name,"_",$chr_id,":",$start=>$end;
        print "$segment\n";

}

close(IN);


From florent.angly at gmail.com  Tue Aug 24 05:06:21 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 24 Aug 2010 15:06:21 +1000
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
References: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
Message-ID: <4C73534D.6080607@gmail.com>

  Hi Guifeng,

From the Bio::DB::Fasta documentation:
>        $db = Bio::DB::Fasta->new($fasta_path [,%options])
>          Create a new Bio::DB::Fasta object from the Fasta file or files
>          indicated by $fasta_path.  Indexing will be performed
>          automatically if needed.  If successful, new() will return the
>          database accessor object.  Otherwise it will return undef.

Hence, after you create the database object $db, you should check that 
it was successful, e.g.:
> my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' );
> if (not defined $db) {
>   die "There was a problem creating the database\n";
> }
A problem creating the database would explain the message you get.
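
Once $db is defined, two further details in the quoted loop are worth checking. First, the header print has no separator between start and end and no trailing newline. Second, if the file follows the standard BED convention, its start column is 0-based and its end column is exclusive, whereas $db->seq() takes 1-based inclusive positions. The sketch below shows one way to handle both; bed_to_region is a made-up helper name and the BED line is invented for the demonstration.

```perl
use strict;
use warnings;

# Turn one BED line into a FASTA header plus 1-based, inclusive coordinates.
# BED starts are 0-based and BED ends are exclusive, so only the start shifts.
sub bed_to_region {
    my ($line) = @_;
    $line =~ s/\s+$//;
    my ($chr, $start0, $end, $name) = split /\s+/, $line;
    my $start = $start0 + 1;
    return {
        chr    => $chr,
        start  => $start,
        end    => $end,
        header => ">${name}_${chr}:${start}-${end}",
    };
}

my $region = bed_to_region("chrI\t99\t200\tgeneA\t0\t+\n");   # made-up BED line
print "$region->{header}\n";
# With a defined $db, the sequence would then be fetched as:
#   my $segment = $db->seq($region->{chr}, $region->{start}, $region->{end});
#   print "$segment\n";
```

Strand handling is left out here; the Bio::DB::Fasta documentation describes how reversed coordinates are treated, which is worth reading before using the strand column.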

If the extension of the FASTA files in the directory path that you gave 
as input is not fa, fasta, fast, FA, FASTA, FAST or dna, then you should 
use the -glob option when constructing your database object. From the 
documentation:
>           -glob         Glob expression to use for searching for Fasta
>                         files in directories.
>                         (default: *.{fa,fasta,fast,FA,FASTA,FAST,dna})
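
To see which files in a directory such a pattern would pick up, Perl's built-in glob can be used directly. This is only a rough check under the assumption that Bio::DB::Fasta applies the pattern the same way; fasta_files is a hypothetical helper name.

```perl
use strict;
use warnings;

# List the regular files in $dir that match a shell-style glob pattern.
# The default pattern is the one quoted from the documentation above.
sub fasta_files {
    my ($dir, $pattern) = @_;
    $pattern = '*.{fa,fasta,fast,FA,FASTA,FAST,dna}' unless defined $pattern;
    my @files = grep { -f $_ } glob("$dir/$pattern");
    return sort @files;
}

my $dir   = '/home/wgf/elegans190.dna';
my @found = fasta_files($dir);
print scalar(@found), " file(s) match the default pattern in $dir\n";
print "  $_\n" for @found;
```

Files named chrI, chrII and so on, with no extension, would not match the default pattern; in that case something like -glob => 'chr*' could be passed to new(), as described in the documentation excerpt.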


Florent


On 24/08/10 12:44, Guifeng Wei wrote:
> Hi,
>
> i came across a problem when i use the Bio::DB::Fasta modules of
> BioPerl. The aim i want to arrive at is to extract the subsequences
> accoording to the *.bed files which are the C.elegans genomic sequnece
> annotation.
>
> when i tried to run the scripts i wrote, the error message was coming, as
> follows:
>
> Can't call method "seq" on an undefined value at bed_to_fasta.pl line 28,
> <IN>  line 1.
>
> so, ask for favor to slove this problem.
> Here is my perl scripts.
>
> #!/usr/bin/perl -w
> # Purpose: extract sequences from genomic sequences
> use strict;
> use Bio::DB::Fasta;
> open(IN,$ARGV[0]) || die "sorry, the program cannot open the .bed file, plea
> check it. \n";
> my $db = Bio::DB::Fasta->new( '/home/wgf/elegans190.dna/' );
> # The dir ...../elegans190.dna/ includes 6
> files:chrI,chrII,chrIII,chrIV,chrV,chrX,
> #each stands for the sequences from the coressponding chromosome.
>
> while(<IN>){
>          chomp $_;
>          my @bed=split(/\s+/, $_ );
>
>          my $chr_id=$bed[0];
>          my $start=$bed[1];
>          my $end=$bed[2];
>          my $seq_name=$bed[3];
>          my $strand=$bed[5];
>
>          my $segment =  $db->seq( $chr_id, $start=>$end );
>
>          print ">",$seq_name,"_",$chr_id,":",$start=>$end;
>          print "$segment\n";
>
> }
>
> close(IN);
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From guifengwei at gmail.com  Tue Aug 24 11:28:16 2010
From: guifengwei at gmail.com (Guifeng Wei)
Date: Tue, 24 Aug 2010 19:28:16 +0800
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
References: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
Message-ID: <AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>

Hi,

I have revised my script according to the previous email from Florent.
However, there are still some errors, which is very frustrating.

The errors are as follows:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Each line of the fasta entry must be the same length except the last.
    Line above #301451 '
..' is 22 != 51 chars.
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::DB::Fasta::calculate_offsets
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
STACK: Bio::DB::Fasta::index_dir
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
STACK: Bio::DB::Fasta::new
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
STACK: bed2fasta.pl:13
-----------------------------------------------------------
indexing was interrupted, so unlinking
/home/wgf/elegans190.dna//directory.index at
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053

The directory /home/wgf/elegans190.dna/ contains 6 files, each holding
the complete sequence of a single chromosome in FASTA format. The
extension of the FASTA files is .fa. Every file starts with a
">chromosomeXXX" header line followed by thousands of sequence lines.

It warns me that "Each line of the fasta entry must be the
same length except the last" and "indexing was interrupted, so unlinking
/home/wgf/elegans190.dna//directory".

I am very confused by this, so I am asking for help.

Wei Guifeng


From biopython at maubp.freeserve.co.uk  Tue Aug 24 13:28:33 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 24 Aug 2010 14:28:33 +0100
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To: <AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>
References: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
	<AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>
Message-ID: <AANLkTi=Nn7m1_6mPoiUcmJNsBoFu4eh-pO9QJaVipOU0@mail.gmail.com>

On Tue, Aug 24, 2010 at 12:28 PM, Guifeng Wei <guifengwei at gmail.com> wrote:
> Hi,
>
> i have revised my scripts according to the previous email from Florent.
> However, there were still some errors which frustrated me so much.
>
> The errors are as follows:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
>     Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::Fasta::calculate_offsets
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> STACK: Bio::DB::Fasta::index_dir
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
> STACK: Bio::DB::Fasta::new
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
> STACK: bed2fasta.pl:13
> -----------------------------------------------------------
> indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory.index at
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053
> But in the directory /home/wgf/elegans190.dna/ , it concludes 6 files,
> each contains the complete sequences from one single chromosome, the format
> is fasta. The extension of the FASTA files is .fa. Every single file is
> started as ">chromosoemeXXX" followed by the thousands of sequences.
>
> and therefore, it warn me that "Each line of the fasta entry must be the
> same length except the last". and "indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory".
>
> i was much confused about this. so for help.
>
> Wei Guifeng

Hi Wei,

It sounds like there is inconsistent line wrapping in your FASTA file.
This is often not a problem at all, but the DB indexing system (and
indeed other indexing tools like the samtools fasta index) requires
all the entries have the same wrapping.

e.g. This is a valid FASTA file but would not be suitable for indexing:

>Test
ACGTACGT
ACGTACGT
ACGTACGT
ACGT
ACGT
T

Ignoring the final line (a special case, here of length one), that example
uses a mixture of line lengths, 8 and 4. If you had used this instead, it
should be fine:

>Test
ACGTACGT
ACGTACGT
ACGTACGT
ACGTACGT
T

All the lines are now wrapped at length 8 (and the final line is
less than or equal to length 8).

Of course, in a real file wrapping at 60 or 80 characters is more
common ;)
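
The check and the fix described above can be sketched in plain Perl. This is only an illustration on in-memory strings, not how Bio::DB::Fasta does it internally; wrap_widths and rewrap are made-up helper names. For whole genomes, the bp_sreformat route suggested elsewhere in this thread is likely the more practical option.

```perl
use strict;
use warnings;

# Distinct lengths of the sequence lines in a FASTA string, ignoring the
# last line of each entry. More than one value means inconsistent wrapping.
sub wrap_widths {
    my ($fasta) = @_;
    my (%seen, $prev);
    for my $line (split /\n/, $fasta) {
        if ($line =~ /^>/) { undef $prev; next; }   # new entry: forget last line
        $seen{ length($prev) } = 1 if defined $prev;
        $prev = $line;
    }
    my @widths = sort { $a <=> $b } keys %seen;
    return @widths;
}

# Re-wrap every entry of a FASTA string at $width characters per line.
sub rewrap {
    my ($fasta, $width) = @_;
    my ($out, $seq) = ('', '');
    my $flush = sub {
        while ($seq =~ /(.{1,$width})/g) { $out .= "$1\n" }
        $seq = '';
    };
    for my $line (split /\n/, $fasta) {
        if ($line =~ /^>/) { $flush->(); $out .= "$line\n"; }
        else               { $line =~ s/\s+//g; $seq .= $line; }
    }
    $flush->();
    return $out;
}

# The first example from this message: line lengths 8 and 4 are mixed.
my $mixed = ">Test\nACGTACGT\nACGTACGT\nACGTACGT\nACGT\nACGT\nT\n";
print "widths before: @{[ wrap_widths($mixed) ]}\n";
my $fixed = rewrap($mixed, 8);
print "widths after:  @{[ wrap_widths($fixed) ]}\n";
print $fixed;
```

Re-wrapping the first example at width 8 gives exactly the second example, since both hold the same 33 bases.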

Peter


From cjfields at illinois.edu  Tue Aug 24 13:38:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 08:38:45 -0500
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To: <AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>
References: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
	<AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>
Message-ID: <995BCF30-99B2-46C2-A4E8-681F9E2A0BB5@illinois.edu>

Guifeng,

Did you follow Jason's advice yesterday about converting the FASTA over to a more consistent length?  Or checking the database itself?  These are both things reiterated by Florent and Peter.

From Jason's last response:

-------------------------
Wei -

Please ask your questions on the bioperl mailing list, I cannot answer questions directly for all requests.
Your problem has been answered by me on the list before so I urge you to use the list archives as a starting point.

The line lengths of the fasta file sequence aren't the same length.

you need to run this
bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW
mv NEW ORIGINAL

or with sreformat
sreformat fasta ORIGINAL > NEW
mv NEW ORIGINAL
-------------------------

chris


On Aug 24, 2010, at 6:28 AM, Guifeng Wei wrote:

> Hi,
> 
> i have revised my scripts according to the previous email from Florent.
> However, there were still some errors which frustrated me so much.
> 
> The errors are as follows:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
>   Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::Fasta::calculate_offsets
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> STACK: Bio::DB::Fasta::index_dir
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
> STACK: Bio::DB::Fasta::new
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
> STACK: bed2fasta.pl:13
> -----------------------------------------------------------
> indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory.index at
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053
> But in the directory /home/wgf/elegans190.dna/ , it concludes 6 files,
> each contains the complete sequences from one single chromosome, the format
> is fasta. The extension of the FASTA files is .fa. Every single file is
> started as ">chromosoemeXXX" followed by the thousands of sequences.
> 
> and therefore, it warn me that "Each line of the fasta entry must be the
> same length except the last". and "indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory".
> 
> i was much confused about this. so for help.
> 
> Wei Guifeng


From scott at scottcain.net  Tue Aug 24 15:01:47 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 24 Aug 2010 11:01:47 -0400
Subject: [Bioperl-l] bioperl-db and postgres8.3 - status query
In-Reply-To: <1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
References: <1282184103.14127.7.camel@zoidberg.mbs.adelaide.edu.au>
	<986B92AE-B71A-4128-BD1C-9CF0E10B4CD8@drycafe.net>
	<AANLkTikiTQwzKJGrAKAtCqbHiAVw_uENrJu4Y038=anD@mail.gmail.com>
	<045A8A7A-E49C-48AE-A3CC-674B8992ACDE@drycafe.net>
	<5750DE7E-AF08-4E98-861E-73E106EE72C7@illinois.edu>
	<4C6DADDF.1000103@cornell.edu>
	<20100820155956.GB400@vpn-165-124-164-118.vpn.northwestern.edu>
	<1282321387.30812.17.camel@pyrimidine.igb.uiuc.edu>
Message-ID: <AANLkTin01uf32_1G2+d8PA2YEtw3UfB5FK+CVPnLCD81@mail.gmail.com>

Hi Chris,

GMOD still only supports Chado with Postgres (for example, the GFF
loader assumes a Postgres database), but when I reengineered the GFF
loader a few years ago, I designed it with subclassing in mind, so that
a subclass could make it work with other RDBMSs.

Scott


On Fri, Aug 20, 2010 at 12:23 PM, Chris Fields <cjfields at illinois.edu> wrote:
> On Fri, 2010-08-20 at 10:59 -0500, Siddhartha Basu wrote:
>> Hi,
>>
>> On Thu, 19 Aug 2010, Robert Buels wrote:
>>
>> > Chris Fields wrote:
>> > > I think it's worth exploring having a DBIx::Class-based middle-ware
>> > > approach similar to what Rob Buels has done for Chado.  That would be
>> > > fairly easy to get started using DBIx::Class::Schema::Loader.
>> > > After that it would require optimization and tweaking, which is
>> > > potentially more complex than Rob's setup as Chado is very Pg-specific,
>> > > but maybe Rob can elaborate...
>> >
>> > Elaborating on how Bio::Chado::Schema is developed:
>> >
>> > The vast majority of the code and POD in BCS is autogenerated by
>> > DBIx::Class::Schema::Loader.  DBICSL gives you a baseline set of
>> > DBIx::Class classes that covers all the tables, views, columns, unique
>> > constraints, and foreign key relationships.
>> >
>> > Beyond that, you have to add on yourself.  In BCS, we have mostly done
>> > things like:
>> >
>> >   * make better-named aliases for some of the autogenerated
>> >     relationships (though DBICSL does a surprisingly good job of naming
>> >     relationships automatically most of the time)
>> >   * add a tiny bit of bioperl compatibility (this needs a lot more work
>> >     by somebody, volunteers needed!)
>> >   * add convenience methods for using some of the Chado property tables
>> >   * use DBIx::Class::Tree::NestedSet to add some powerful ways of
>> >     traversing phylogenetic tree relationships
>> >
>> > Regarding DB backend specificity, BCS isn't Pg-specific at all, because
>> > DBIx::Class itself goes to great lengths to be compatible (and performant!)
>> > with just about every relational database out there.
>> I would vouch for that at least as far as chado in oracle is concerned.
>> So far, BCS works flawlessly with our Oracle Chado instance at
>> dictyBase. Quite a chunk of BCS-based code is also active in a couple of
>> our Mojo-based webapps. The part which I still couldn't use directly is
>> the 'synonym' table, as it clashes with Oracle-specific reserved keywords.
>> However, overall it seems to be quite cross-RDBMS compatible and highly
>> recommended.
>>
>> -siddhartha
>
> Just to point out, I didn't say BCS is Pg-specific, but that Chado is
> (that was the DBMS it was designed for).  Maybe that should be amended
> to 'was' now :)
>
> I recall seeing a page on this somewhere on the GMOD website along the
> lines of "MySQL has problems so we chose Pg", and that Chado support
> would focus on Pg.  I'm guessing that's no longer the case?  Or is only
> the server-side stuff Pg-specific.
>
>> >In fact, the BCS test
>> > suite deploys a Chado schema into a temporary SQLite database using
>> > DBIC::Schema's deploy() method, and runs all of its tests on that.  Very
>> > handy.
>> >
>> > Chado's Pg-specific server-side functions can of course be called through
>> > BCS if they are present, but it's perfectly possible to use Chado without
>> > any of the server-side functions, and mostly the way I use it.
>> >
>> > Rob
>
> I think this opens up the possibility of starting a DBIx::Class-based
> middleware solution.  Hilmar, did you want to take that on?
>
> chris
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From bgs500 at york.ac.uk  Tue Aug 24 15:35:53 2010
From: bgs500 at york.ac.uk (Ben Saville)
Date: Tue, 24 Aug 2010 16:35:53 +0100
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
	<0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>
Message-ID: <34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk>

Sorry for the delay in replying; 454 data analysis is very time
consuming.

Please see http://seqanswers.com/forums/showthread.php?t=6484
for a discussion of this problem and how we solved the issue.

Thanks for the reply though, much appreciated!

Regards
Ben Saville


On 20 Aug 2010, at 14:48, Dave Messina wrote:

> Hi Ben,
>
> I would not use the script you posted; I don't think it does what
> you want.
>
> If you haven't already, you should take a look at the beginners' HOWTO
>
> 	http://www.bioperl.org/wiki/HOWTO:Beginners
>
>
> the SearchIO HOWTO
>
> 	http://www.bioperl.org/wiki/HOWTO:SearchIO
>
>
> and the example scripts included with BioPerl:
>
> 	http://www.bioperl.org/wiki/Scripts
>
>
>
> Incidentally, it's a lot of fiddly data processing to parse blast  
> reports for many contigs against multiple databases and then go back  
> and collate the results by query. I'm not sure exactly what you want  
> to do once you've separated by query; if you provide some more
> information, we could suggest ways to best get you where you want to  
> go.
>
> I will mention, though, that BLAST has the ability to search  
> multiple separate databases in one go and collate the results for  
> you. So that's something to consider.
>
>
>
> Dave


From cjfields at illinois.edu  Tue Aug 24 15:54:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 10:54:20 -0500
Subject: [Bioperl-l] a problem when using the Bio::DB::Fasta
In-Reply-To: <AANLkTi=7_fFU4Q53S1onRZpFaVoS6ndNNq68ZSHMDoe3@mail.gmail.com>
References: <AANLkTinnyEw4f8F5BP+CicffaVCe+pBEXc+0rj5vu1iG@mail.gmail.com>
	<AANLkTik_yFysscFwAd-8Ar4S_cM-XCk5w+C=8121MWNA@mail.gmail.com>
	<995BCF30-99B2-46C2-A4E8-681F9E2A0BB5@illinois.edu>
	<AANLkTi=7_fFU4Q53S1onRZpFaVoS6ndNNq68ZSHMDoe3@mail.gmail.com>
Message-ID: <B269BA3E-C0E7-4FEA-BA78-E164F4D2B787@illinois.edu>

Please keep all responses on-list.  

Regarding sreformat:

http://tinyurl.com/28q75rr

Judging by the stack traces below, you are also running off a UNIX-like system.  To concatenate files, use 'cat'.  So, for all files ending with .fa:

cat *.fa >> all.fa
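
Before concatenating or re-indexing, it can help to find exactly which lines break the "same length except the last" rule. The following is only a rough sketch in plain Perl (no BioPerl needed). The sub name find_ragged_lines and the sample data are made up for illustration, and it approximates the check rather than reproducing what Bio::DB::Fasta does internally:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Report sequence lines whose length differs from the first sequence line of
# their record. A differing line is only reported if another sequence line
# follows it, because the last line of a record may be shorter.
sub find_ragged_lines {
    my ($fh) = @_;
    my (@problems, $width, $pending);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;                 # strip Unix or DOS line ending
        if ($line =~ /^>/) {                  # new record: forget previous state
            ($width, $pending) = (undef, undef);
            next;
        }
        if (defined $pending) {               # the odd line was not the last one
            push @problems, $pending;
            $pending = undef;
        }
        $width = length $line unless defined $width;
        $pending = 'line ' . $. . ': ' . length($line) . " != $width chars"
            if length($line) != $width;
    }
    return @problems;
}

# Made-up sample: chrI has a short line followed by a blank line.
my $demo = ">chrI\nACGTACGT\nACGT\n\n>chrII\nACGTAC\nACG\n";
open my $fh, '<', \$demo or die "Cannot open in-memory FASTA: $!";
print "$_\n" for find_ragged_lines($fh);
close $fh;
```

A blank line in the middle of a record counts as a zero-length sequence line here, so a short line followed by a blank line is reported.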

chris

On Aug 24, 2010, at 8:54 AM, Guifeng Wei wrote:

> Hello Fields,
>  
> I have checked the FASTA files. I found that the last line is blank, and the second-to-last line is shorter than the others.
>  
> I was not able to run the command line Jason advised because I am not familiar with "sreformat".
>  
> I also have one more question: I want to merge the several single-chromosome sequence files into one. Is that OK?
>  
> thank you very much.
>  
> Wei Guifeng
> 2010/8/24 Chris Fields <cjfields at illinois.edu>
> Guifeng,
> 
> Did you follow Jason's advice yesterday about converting the FASTA over to a more consistent length?  Or checking the database itself?  These are both things reiterated by Florent and Peter.
> 
> From Jason's last response:
> 
> -------------------------
> Wei -
> 
> Please ask your questions on the bioperl mailing list, I cannot answer questions directly for all requests.
> Your problem has been answered by me on the list before so I urge you to use the list archives as a starting point.
> 
> The line lengths of the fasta file sequence aren't the same length.
> 
> you need to run this
> bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW
> mv NEW ORIGINAL
> 
> or with sreformat
> sreformat fasta ORIGINAL > NEW
> mv NEW ORIGINAL
> -------------------------
> 
> chris
> 
> 
> On Aug 24, 2010, at 6:28 AM, Guifeng Wei wrote:
> 
> > Hi,
> >
> > I have revised my script according to the previous email from Florent.
> > However, there are still some errors, which is very frustrating.
> >
> > The errors are as follows:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: Each line of the fasta entry must be the same length except the last.
> >   Line above #301451 '
> > ..' is 22 != 51 chars.
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw
> > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> > STACK: Bio::DB::Fasta::calculate_offsets
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> > STACK: Bio::DB::Fasta::index_dir
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
> > STACK: Bio::DB::Fasta::new
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
> > STACK: bed2fasta.pl:13
> > -----------------------------------------------------------
> > indexing was interrupted, so unlinking
> > /home/wgf/elegans190.dna//directory.index at
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053
> > However, the directory /home/wgf/elegans190.dna/ contains 6 files,
> > each holding the complete sequence of one single chromosome in FASTA
> > format. The extension of the FASTA files is .fa. Every file
> > starts with ">chromosoemeXXX" followed by thousands of sequence lines.
> >
> > So it warns me that "Each line of the fasta entry must be the
> > same length except the last" and "indexing was interrupted, so unlinking
> > /home/wgf/elegans190.dna//directory".
> >
> > I am quite confused by this, so I am asking for help.
> >
> > Wei Guifeng
> 
> 
> 
> 
> -- 
> Wei Guifeng
> 
> 
> 


From cjfields at illinois.edu  Tue Aug 24 16:14:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 11:14:51 -0500
Subject: [Bioperl-l] Problem Parsing BLAST output
In-Reply-To: <34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk>
References: <77D8BE69-E2D3-4C75-BED4-08F1755E414F@york.ac.uk>
	<0384052D-74D2-4789-B7FA-76EED826044F@sbc.su.se>
	<34F7412D-2BFA-4D80-AEEB-2B8A9BE415D4@york.ac.uk>
Message-ID: <69C47A74-09C7-4024-9303-A3893658A2A8@illinois.edu>

Just in case anyone needs it, there is a way to index these as well (both BLAST and the two tabular BLAST versions) for fast lookups of specific reports, if needed.  See Bio::Index::Blast and Bio::Index::BlastTable in BioPerl.

Caveat: I believe there is a bug with BLAST+ text output indexing (it chops the header off subsequent reports).  I haven't investigated it enough, though, but I'll try looking into it today.  

chris

On Aug 24, 2010, at 10:35 AM, Ben Saville wrote:

> Sorry for the Delay in replying, 454 data analysis is very time consuming.
> 
> please see http://seqanswers.com/forums/showthread.php?t=6484
> For a discussion about this problem, and how we solved the issue.
> 
> Thanks for the reply though, much appreciated!
> 
> Regards
> Ben Saville
> 
> 
> 
> 
> 


From cjfields at illinois.edu  Tue Aug 24 16:17:17 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 11:17:17 -0500
Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release
	announcement
References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov>
Message-ID: <A26B0224-CFDD-4D2B-A5B0-4275693416FD@illinois.edu>

FYI,

Very interesting additions to BLAST+ (archive format).  

chris

Begin forwarded message:

> From: mcginnis <mcginnis at ncbi.nlm.nih.gov>
> Date: August 24, 2010 10:46:50 AM CDT
> To: NLM/NCBI List blast-announce <blast-announce at ncbi.nlm.nih.gov>
> Subject: [blast-announce] Correction: BLAST 2.2.24 release announcement
> 
> A new version of the stand-alone applications is available.
>  
> Users are encouraged to use the BLAST+ applications available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
> 
> This release includes a number of bug fixes as well as new features for the BLAST+ applications:
>  
> * Introduce BLAST Archive format to permit reformatting of stand-alone BLAST searches with the blast_formatter(see BLAST+ user manual) 
> * Added the blast_formatter application (see BLAST+ user manual)
> * Added support for translated subject soft masking in the BLAST databases
> * Added support for the BLAST Trace-back operations (btop) output format
> * Added command line options to blastdbcmd for listing available BLAST databases
> * Improved performance of formatting of remote BLAST searches
> * Use a consistent exit code for out of memory conditions
> * Fixed bug in indexed megablast with multiple space-separated BLAST databases
> * Fixed bugs in legacy_blast.pl, blastdbcmd, rpsblast, and makeblastdb
> * Fixed Windows installer for 64-bit installations
>  
> BLAST+ applications, as well as the legacy C applications (e.g. blastall), may be downloaded from http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download


From David.Messina at sbc.su.se  Tue Aug 24 17:00:14 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 24 Aug 2010 19:00:14 +0200
Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release
	announcement
In-Reply-To: <A26B0224-CFDD-4D2B-A5B0-4275693416FD@illinois.edu>
References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov>
	<A26B0224-CFDD-4D2B-A5B0-4275693416FD@illinois.edu>
Message-ID: <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se>

Here's a link to the manual:
ftp://ftp.ncbi.nlm.nih.gov//blast/executables/blast%2B/2.2.24/user_manual.pdf

(Is it on the NCBI website somewhere? Strange to have only a downloadable PDF.) The section on the new archive format is on page 27.

It seems like a nice idea to have the flexibility, but I wonder about the time cost of using this format.

One of the big gains from using tab-delimited output is that BLAST doesn't have to do all the post-processing to generate the alignment views. By doing the archive format, which if I understand it correctly is ASN.1, you're always paying the full price in time (and space, for that matter).


Dave


On Aug 24, 2010, at 18:17 , Chris Fields wrote:

> FYI,
> 
> Very interesting additions to BLAST+ (archive format).  
> 
> chris
> 


From cjfields at illinois.edu  Tue Aug 24 17:26:49 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 24 Aug 2010 12:26:49 -0500
Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release
	announcement
In-Reply-To: <27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se>
References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov>
	<A26B0224-CFDD-4D2B-A5B0-4275693416FD@illinois.edu>
	<27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se>
Message-ID: <D84DD1C8-6CBE-40F1-8CF9-F9482F0E4B18@illinois.edu>

It's probably more applicable from the viewpoint of a cluster admin who would want to add the flexibility of having a single archive and allowing any format (as opposed to re-running the analysis). I'm just wondering if there is anything to glean there for possible alignment archiving purposes (ala SAM/BAM), but if it's ASN.1, likely not.

chris

On Aug 24, 2010, at 12:00 PM, Dave Messina wrote:

> Here's a link to the manual:
> ftp://ftp.ncbi.nlm.nih.gov//blast/executables/blast%2B/2.2.24/user_manual.pdf
> 
> (Is it on the NCBI website somewhere? Strange to have only a downloadable PDF.) The section on the new archive format is on page 27.
> 
> It seems like a nice idea to have the flexibility, but I wonder about the time cost of using this format.
> 
> One of the big gains from using tab-delimited output is that BLAST doesn't have to do all the post-processing to generate the alignment views. By doing the archive format, which if I understand it correctly is ASN.1, you're always paying the full price in time (and space, for that matter).
> 
> 
> 
> Dave
> 
> 
> 
> 


From David.Messina at sbc.su.se  Tue Aug 24 18:45:29 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 24 Aug 2010 20:45:29 +0200
Subject: [Bioperl-l] FYI: interesting stuff in BLAST 2.2.24 release
	announcement
In-Reply-To: <D84DD1C8-6CBE-40F1-8CF9-F9482F0E4B18@illinois.edu>
References: <73C34D2F-813E-4FCE-8819-B86BC5A974C0@ncbi.nlm.nih.gov>
	<A26B0224-CFDD-4D2B-A5B0-4275693416FD@illinois.edu>
	<27DD75E8-4452-4B2D-B5B9-A686C113E5B6@sbc.su.se>
	<D84DD1C8-6CBE-40F1-8CF9-F9482F0E4B18@illinois.edu>
Message-ID: <00C04DF9-F3C2-4574-B1E4-A3BF28EE953F@sbc.su.se>

> It's probably more applicable from the viewpoint of a cluster admin who would want to add the flexibility of having a single archive and allowing any format (as opposed to re-running the analysis).

Good point.


> I'm just wondering if there is anything to glean there for possible alignment archiving purposes (ala SAM/BAM), but if it's ASN.1, likely not.

To be honest, I didn't look that closely at it. It may be worth considering nevertheless.


Dave


From buiduyminh at gmail.com  Tue Aug 24 18:56:43 2010
From: buiduyminh at gmail.com (Minh Bui)
Date: Tue, 24 Aug 2010 14:56:43 -0400
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
In-Reply-To: <491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>
References: <AANLkTinsyOMPJxpks_pqMwLpW8gx0VRihhJsLDnF53mu@mail.gmail.com>
	<491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>
Message-ID: <AANLkTimOe=T9FrpMPqMy8yyrfz8Sf7QJ5Rr5YYFjicJb@mail.gmail.com>

How can I find out where DBD::mysql is installed on my Mac? I am very new to Macs, sorry.

I just checked, and mysql.pm is in
/Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm


On 8/21/10, Adam Witney <awitney at sgul.ac.uk> wrote:
>
>  On 20 Aug 2010, at 22:29, Minh Bui wrote:
>
>  > Hi,
>  > I am trying to load my GFF file into a MySQL database, but I got this error
>  > when I ran bp_seqfeature_load.pl (BioPerl 1.6.1 on Mac)
>  >
>  > [BioComplexity-5:/usr/local/bin] minh% perl bp_seqfeature_load.pl
>  > install_driver(mysql) failed: Can't locate DBD/mysql.pm in @INC (@INC
>  > contains: /sw/lib/perl5 /sw/lib/perl5/darwin
>  > /System/Library/Perl/5.8.6/darwin-thread-multi-2level
>  > /System/Library/Perl/5.8.6
>  > /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6
>  > /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level
>  > /Network/Library/Perl/5.8.6 /Network/Library/Perl
>  > /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
>  > /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at (eval 44)
>  > line 3.
>  > Perhaps the DBD::mysql perl module hasn't been fully installed,
>  > or perhaps the capitalisation of 'mysql' isn't right.
>  > Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
>  > at /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212
>  >
>  > I am using Mac OS X version 10.4.10 and MAMP. Isn't
>  > "/Library/Perl/5.8.6" already in @INC? What am I missing?
>  > I have been googling this error for a few hours. I also installed
>  > BioPerl and reinstalled DBD::mysql using CPAN. It still doesn't work.
>  >
>  > Here is my $PERL5LIB: /sw/lib/perl5:/sw/lib/perl5/darwin/
>
>
>
> Where did DBD::mysql get installed? Can you verify that DBD/mysql.pm is actually in one of the directories listed above?
>
>


From scott at scottcain.net  Tue Aug 24 19:04:04 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 24 Aug 2010 15:04:04 -0400
Subject: [Bioperl-l] bp_seqfeature_load.pl fails on Mac os. Please help.
In-Reply-To: <AANLkTimOe=T9FrpMPqMy8yyrfz8Sf7QJ5Rr5YYFjicJb@mail.gmail.com>
References: <AANLkTinsyOMPJxpks_pqMwLpW8gx0VRihhJsLDnF53mu@mail.gmail.com>
	<491D1B66-741F-4315-8A6B-46F465956017@sgul.ac.uk>
	<AANLkTimOe=T9FrpMPqMy8yyrfz8Sf7QJ5Rr5YYFjicJb@mail.gmail.com>
Message-ID: <AANLkTimPapxSzwVxCBMw1J0+x88K80SJ_6OH9LBkS3Jn@mail.gmail.com>

Hi Minh,

The file you found is not DBD::mysql though; it is
Bio::DB::SeqFeature::Store::DBI::mysql, which was installed along with
BioPerl.  How did you find that file?  The same method presumably
would turn up DBD::mysql if it existed.  I would use a command like
this:

  locate mysql.pm

which would locate every file named mysql.pm on your
computer.  I would expect DBD::mysql to be located in
/Library/Perl/5.8.6/darwin-thread-multi-2level/DBD/ if it was
installed in a "normal" way (that is, not involving macports or fink
or MAMP).
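
If locate is not set up on the machine, the same question can be asked of Perl itself. This is just a small sketch using core modules; find_module is a made-up helper name, and it only looks in the directories it is given (here, whatever @INC contains for the perl that runs it, which may differ from the perl that runs bp_seqfeature_load.pl):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Return the full path of every copy of a module found under the given
# directories (code references in @INC are skipped).
sub find_module {
    my ($module, @dirs) = @_;
    (my $rel = $module) =~ s{::}{/}g;     # DBD::mysql -> DBD/mysql
    $rel .= '.pm';
    return grep { -f $_ }
           map  { File::Spec->catfile($_, split(m{/}, $rel)) }
           grep { !ref $_ } @dirs;
}

my @hits = find_module('DBD::mysql', @INC);
if (@hits) {
    print "DBD::mysql found at:\n";
    print "  $_\n" for @hits;
}
else {
    print "DBD::mysql is not in any \@INC directory\n";
}
```

Running it with the same perl binary (and the same PERL5LIB) as the failing script is what makes the answer meaningful.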

Scott


On Tue, Aug 24, 2010 at 2:56 PM, Minh Bui <buiduyminh at gmail.com> wrote:
> How can I find out where DBD::mysql is installed on my Mac? I am very new to Macs, sorry.
>
> I just checked, and mysql.pm is in
> /Library/Perl/5.8.6/Bio/DB/SeqFeature/Store/DBI/mysql.pm
>
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From jason at bioperl.org  Wed Aug 25 04:33:45 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 24 Aug 2010 21:33:45 -0700
Subject: [Bioperl-l] Enquiry on gi_taxid_nucl.dmp.gz
In-Reply-To: <AANLkTi=BrV0ODdF+sEQdAmtAMxRk6y2xGgRMOgbnZz-6@mail.gmail.com>
References: <AANLkTi=BrV0ODdF+sEQdAmtAMxRk6y2xGgRMOgbnZz-6@mail.gmail.com>
Message-ID: <4C749D29.3040003@bioperl.org>

Hi - please keep questions on the list.


I think one of your problems is that your first use of $gi2taxidfile is wrong.
When you call tie, you want to specify the DB file in which to store the
index, not the dump file itself.
So call it "/tmp/gi2taxid.idx" or something like that.

In my code here
http://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/classify_hits_kingdom.PLS
you will see that on line 97 we construct the name of the index file from
the folder, plus 'idx', plus the name gi2taxid.

Also, it would be safer for the split to match whitespace, since you want
just the first two columns from the file.  Doing this would
eliminate the need for the chomp on the line above.

  my ($gi, $taxid) = split(/\s+/, $_);

instead of

  chomp;
  my ($gi, $taxid) = split(" ", $_,2);

There may be other problems but these should be fixed first -- and 
please send queries to the mailing list rather than to me directly so 
that others can answer questions.
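
To make the two points above concrete, here is a minimal sketch, not the code from classify_hits_kingdom. It assumes DB_File is installed, keeps the index in a file separate from the dump, and stores every GI unconditionally (a guard such as "if exists" on an empty index would never store anything). The file names and the three sample rows are made up, so the taxon IDs printed mean nothing:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);
my $dump  = "$dir/gi_taxid_sample.dmp";  # stand-in for the unzipped gi_taxid_nucl.dmp
my $index = "$dir/gi2taxid.idx";         # the index lives in its own file

# Write a tiny made-up dump: GI <tab> taxid, one pair per line.
open my $out, '>', $dump or die "Cannot write $dump: $!";
print {$out} "2\t9913\n3\t9913\n183397240\t4932\n";
close $out;

# Tie the hash to the index file, never to the dump itself.
tie my %taxid4gi, 'DB_File', $index, O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "Cannot tie $index: $!";

open my $in, '<', $dump or die "Cannot read $dump: $!";
while (my $line = <$in>) {
    my ($gi, $taxid) = split ' ', $line;  # whitespace split, so no chomp needed
    next unless defined $taxid;           # skip blank or malformed lines
    $taxid4gi{$gi} = $taxid;
}
close $in;

print "taxid for GI 183397240: $taxid4gi{183397240}\n";
untie %taxid4gi;
```

On the real dump the load takes a while, which is why the original script checks whether the index already exists before rebuilding it.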

-jason
Amali Thrimawithana wrote, On 8/24/10 8:13 PM:
> Dear Jason
>
> Thank you very much for the information. I manage to get the information on
> different taxonomic  levels with the help of one of your example code
> "local_taxonomydb_query". However I am having trouble with creating a local
> index file of the gi_taxid_nucl.dmp so that I am able to get the taxonomic
> id given the GI number of NCBI. At the moment I am using the tie() function
> with DB_file and then storing the detail into a hash. However when I try to
> retrieve a taxonomic ID given the GI number, it is not returning any thing
> but an error. Below is part of the code (borrowed from the example code
> classify kingdom), can you please let me know where I am going wrong?
> ...
> my $dbh2 = tie(%taxid4gi, 'DB_File', $gi2taxidfile);
>
> if( ! $done ) {
>      my $fh;
>     open(GI2TAXID, "$gi2taxidfile") or die $!; #here passing the unzipped
> gi_taxid_nucl.dmp
>     my$i=0;
>      while (<GI2TAXID>) {
>        chomp;
>         my ($gi, $taxid) = split(" ", $_, 2);
>         $taxid4gi{$gi} = $taxid
>         if exists $taxid4gi{$gi};
>         $i++;
>       unless( $DEBUG&&  $i % 100000  ) {
>          warn "$i\n";
>      }
>      }
>      $dbh2->sync;
> }
> my $gi2='183397240';
> my $taxd2=$taxid4gi{$gi2};
>   print $taxd2, " \n";
>
> Any help would be much appreciated
>
> Thanking you
> Amali
>
> On 23 August 2010 06:29, Jason Stajich<jason at bioperl.org>  wrote:
>
>    
>> Hi Amali -
>>
>> This is how I'd print out the full classification by using the Tree methods
>> (with probably a different way of initializing the $db object to your
>> flatfiles location).
>>
>> #!/usr/bin/perl -w
>> use strict;
>> use Bio::DB::Taxonomy;
>> use Bio::Tree::Tree;
>>
>> my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
>>                                 -nodesfile => 'taxonomy/nodes.dmp',
>>                                 -namesfile => 'taxonomy/names.dmp');
>>
>> # Build a tree from the taxon and list the scientific names of its nodes
>> my $taxonid = $db->get_taxonid('Homo sapiens');
>> my $taxon = $db->get_taxon(-taxonid => $taxonid);
>> my $tree = Bio::Tree::Tree->new(-node => $taxon);
>> my @taxa = $tree->get_nodes;
>> print join(",", map { $_->scientific_name } @taxa), "\n";
>>
>> -jason
>>
>> Amali Thrimawithana wrote, On 8/18/10 3:56 PM:
>>
>>> Dear Dr Stajich,
>>>
>>> I am a Masters student at Auckland university and my research is on
>>> identifying yeast species present in wine by the use of 454 sequencing. In
>>> order to carry out this research, a pipeline is being built in which at the
>>> final step each representative OTU need to be classified at different
>>> taxonomic levels (ie: at Phylum, family, class, genus and species) by using
>>> the results from BLAST. To identify the sequences at each taxonomic level, I
>>> have been trying out the Bio::DB::Taxonomy module in bioperl. Using this
>>> module, I am able to get the genus and species level by splitting the
>>> scientific name returned by the Bio::taxon object. But unfortunately I am
>>> uncertain on how to get the information for the other levels of the rank. I
>>> have tried several commands including "my @class = $node->classification;",
>>> but it does not work. Hence, could you please let me know how I might be
>>> able to get the higher levels of taxonomy such as class and phylum using
>>> bioperl?
>>>
>>> Look forward to hearing from you soon
>>>
>>> Thanking You
>>>
>>> Amali


From roy.chaudhuri at gmail.com  Wed Aug 25 11:12:15 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 25 Aug 2010 12:12:15 +0100
Subject: [Bioperl-l] Enquiry on gi_taxid_nucl.dmp.gz
In-Reply-To: <4C749D29.3040003@bioperl.org>
References: <AANLkTi=BrV0ODdF+sEQdAmtAMxRk6y2xGgRMOgbnZz-6@mail.gmail.com>
	<4C749D29.3040003@bioperl.org>
Message-ID: <4C74FA8F.3080506@gmail.com>

> Also it would be safer for the split to be whitespace matching and that
> you want the two first columns from the file.  Doing this would
> eliminate the need for the chomp on the line above.
>
>    my ($gi, $taxid) = split(/\s+/, $_);
>
> instead of
>
>    chomp;
>    my ($gi, $taxid) = split(" ", $_,2);

Sorry to be pedantic, but according to perldoc -f split: "As a special 
case, specifying a PATTERN of space (' ') will split on white space just 
as "split" with no arguments does"

The only difference between patterns of " " and /\s+/ is that the latter 
will return an initial null field if there is leading white space, which 
may or may not be what you want.

$ perl -e 'print join("-", split(" ", " 1\t2  3")), "\n"'
1-2-3
$ perl -e 'print join("-", split(/\s+/, " 1\t2  3")), "\n"'
-1-2-3

Cheers.
Roy.


From kanmaninradha at gmail.com  Thu Aug 26 08:29:08 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 01:29:08 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
Message-ID: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>

Hi All,
I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
module. I could get everything else but not the DNA seq.

Can anyone help me to find this out, Please. I appreciate your help very
much.
thanks,
Kanmani

#!/usr/bin/perl

use strict;
use warnings;
use Bio::Tools::GFF;

my $file = shift;

my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
$gffio->features_attached_to_seqs(1);

while (my $feat = $gffio->next_feature()){
    my $start = $feat->start;
    my $end= $feat->end;
    my $size = $end-$start+1;
    my $strand = $feat->strand;
    my $seqid = $feat->seq_id;
    my $score = $feat->score;
    my $frame = $feat->frame;
    my $source = $feat->source_tag;
    my $type = $feat->primary_tag;
    my $gffstr = $gffio->gff_string($feat);
    my @alltags = $feat->all_tags();
    my @ID_tag_value = $feat->each_tag_value("ID");

    my  $seq = $feat->seq();
    print "$seq\n";

     if($type eq "gene"){     #
       print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
    }
}


From David.Messina at sbc.su.se  Thu Aug 26 08:53:48 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 26 Aug 2010 10:53:48 +0200
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
Message-ID: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>

Admittedly I'm not up on the latest uses of GFF, but as far as I know, GFF is an annotation format only; it does not contain the actual sequence.

Have you looked in your GFF file to see if there are nucleotides in there?

Dave


On Aug 26, 2010, at 10:29, kanmani radha wrote:

> Hi All,
> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
> module. I could get everything else but not the DNA seq.


From biopython at maubp.freeserve.co.uk  Thu Aug 26 09:02:53 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Aug 2010 10:02:53 +0100
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
Message-ID: <AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>

On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
>
> Admittedly I'm not up on the latest uses of GFF, but as far as I know, GFF
> is an annotation format only; it does not contain the actual sequence.
>
> Have you looked in your GFF file to see if there are nucleotides in there?
>
> Dave

Actually a GFF file can optionally include a FASTA format sequence
at the end of the file, although it seems to be more common to just
supply separate GFF and FASTA files and cross reference by ID.

Peter


From David.Messina at sbc.su.se  Thu Aug 26 09:08:20 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 26 Aug 2010 11:08:20 +0200
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
	<AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>
Message-ID: <C7C28E1D-7BAC-4D06-9EC6-71EA95F06776@sbc.su.se>

Aha, great, thanks for clarifying, Peter.

And if I bothered to look at the Bio::Tools::GFF documentation before answering :), I would have seen this:

    http://doc.bioperl.org/bioperl-live/Bio/Tools/GFF.html#General

which describes how you can use

    $gffio->get_seqs()


and related methods to pull out the sequence data.


Dave




From David.Messina at sbc.su.se  Thu Aug 26 09:18:25 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 26 Aug 2010 11:18:25 +0200
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <C7C28E1D-7BAC-4D06-9EC6-71EA95F06776@sbc.su.se>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
	<AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>
	<C7C28E1D-7BAC-4D06-9EC6-71EA95F06776@sbc.su.se>
Message-ID: <984552CF-01F3-4D29-932F-DD030CCC1448@sbc.su.se>

So, just to finish the thought:

Kanmani,

Apologies for my sloppy and uninformed answer. The following is only slightly less sloppy and uninformed, but may actually answer your question.

I think you need to call 

   $gffio->get_seqs()

probably as

  my @seq_objects = $gffio->get_seqs();


and then loop through those something like:

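	foreach my $seq_object (@seq_objects) {
		# the raw sequence string, if you need it
		my $seq = $seq_object->seq();

		# the features hang off the sequence object, not the string
		foreach my $feat ($seq_object->get_SeqFeatures) {
			# do your feature processing here
		}
	}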


Note that I haven't tested the above code.


Dave


From fs5 at sanger.ac.uk  Thu Aug 26 09:19:44 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 26 Aug 2010 10:19:44 +0100
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
Message-ID: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi Kanmani,

While GFF files may contain DNA sequence data, most of them don't, so
you will have to use the location information you get from the GFF
annotation file in conjunction with, e.g., a local FASTA database of the
genomic sequence you are working with or an online resource.
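
As a rough sketch of that coordinate lookup in plain Perl: GFF coordinates
are 1-based and inclusive, and features on the minus strand need to be
reverse-complemented. The helper name feature_seq and the sample sequence
are made up for this example:

```perl
use strict;
use warnings;

# Return the sequence of a feature given GFF-style coordinates
# (1-based, inclusive) and strand. feature_seq is a hypothetical helper.
sub feature_seq {
    my ($chrom_seq, $start, $end, $strand) = @_;
    my $sub = substr($chrom_seq, $start - 1, $end - $start + 1);
    if (defined $strand && $strand eq '-') {
        $sub = reverse $sub;              # reverse the string ...
        $sub =~ tr/ACGTacgt/TGCAtgca/;    # ... then complement it
    }
    return $sub;
}

my $chr1 = 'AACCGGTTAACC';                   # made-up genomic sequence
print feature_seq($chr1, 3, 8, '+'), "\n";   # CCGGTT
print feature_seq($chr1, 1, 4, '-'), "\n";   # GGTT
```

For whole genomes you would not hold the sequence in one string like this,
but the coordinate arithmetic stays the same.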


Frank


On Thu, 2010-08-26 at 01:29 -0700, kanmani radha wrote:
> Hi All,
> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
> module. I could get everything else but not the DNA seq.
> 
> Can anyone help me to find this out, Please. I appreciate your help very
> much.
> thanks,
> Kanmani
> 
> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> use Bio::Tools::GFF;
> 
> my $file = shift;
> 
> my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
> $gffio->features_attached_to_seqs(1);
> 
> while (my $feat = $gffio->next_feature()){
>     my $start = $feat->start;
>     my $end= $feat->end;
>     my $size = $end-$start+1;
>     my $strand = $feat->strand;
>     my $seqid = $feat->seq_id;
>     my $score = $feat->score;
>     my $frame = $feat->frame;
>     my $source = $feat->source_tag;
>     my $type = $feat->primary_tag;
>     my $gffstr = $gffio->gff_string($feat);
>     my @alltags = $feat->all_tags();
>     my @ID_tag_value = $feat->each_tag_value("ID");
> 
>     my  $seq = $feat->seq();
>     print "$seq\n";
> 
>      if($type eq "gene"){     #
>        print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
>     }
> }
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From cjfields at illinois.edu  Thu Aug 26 14:20:48 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 26 Aug 2010 09:20:48 -0500
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>

Kanmani,

If you are using BioPerl, the best option currently available is to load a database with all relevant information (GFF and FASTA), then use that database for querying.  The most commonly-used ones now are Bio::DB::SeqFeature::Store and Bio::DB::GFF; the former is very GFF3-centric, but I believe it can handle GFF/GTF, and it has various database adaptors (MySQL, Pg, BDB, SQLite).

chris



From cjfields at illinois.edu  Thu Aug 26 14:31:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 26 Aug 2010 09:31:59 -0500
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<6A715DB8-C51A-4195-BB6C-73A7345CB65A@sbc.su.se>
	<AANLkTikw=9zFm5sZej0C4kTQZMnvoFNox06jCC6p9Jxy@mail.gmail.com>
Message-ID: <DD36A578-4156-4911-8432-84BD5ECB3AB8@illinois.edu>

On Aug 26, 2010, at 4:02 AM, Peter wrote:

> On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
>> 
>> Admittedly I'm not up on the latest uses of GFF, but as far as I know, GFF
>> is an annotation format only; it does not contain the actual sequence.
>> 
>> Have you looked in your GFF file to see if there are nucleotides in there?
>> 
>> Dave
> 
> Actually a GFF file can optionally include a FASTA format sequence
> at the end of the file, although it seems to be more common to just
> supply separate GFF and FASTA files and cross reference by ID.
> 
> Peter

IIRC, optionally including FASTA sequence is specified only in the GFF3 spec; use of FASTA isn't explicitly mentioned in earlier versions.  We only support it with earlier GFF due to convergence of the various GFF parsers.  

The original GFF spec proposed allowing sequence, but it's in the form of meta information and I have never seen it used in practice (as you mention, the FASTA is normally loaded separately).

chris


From kanmaninradha at gmail.com  Thu Aug 26 16:22:14 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 09:22:14 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
	<6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
Message-ID: <AANLkTinxcoKBHqU7bnfyNA6bi5qBjNAYR54c6K+Pg7rz@mail.gmail.com>

Hi Everyone,

Thanks very much for this clarification.  Thanks a ton to everyone who
spared their time to educate me.

I see your points.  Please correct me if I am wrong.

I understand that it's better to use Bio::DB::SeqFeature or Bio::DB::GFF
to load the FASTA sequences (from a separate multi-FASTA file) and
then Bio::Tools::GFF to parse the feature info from a GFF file, then query
the created database for the relevant GFF coordinates.

I will implement this.

Thanks once again.
Kanmani



From cjfields at illinois.edu  Thu Aug 26 17:08:56 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 26 Aug 2010 12:08:56 -0500
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTinxcoKBHqU7bnfyNA6bi5qBjNAYR54c6K+Pg7rz@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
	<6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
	<AANLkTinxcoKBHqU7bnfyNA6bi5qBjNAYR54c6K+Pg7rz@mail.gmail.com>
Message-ID: <EF1B137F-94A7-45E1-B8FB-0E20142F0A7F@illinois.edu>

On Aug 26, 2010, at 11:22 AM, kanmani radha wrote:

> Hi Everyone,
> 
> Thanks very much for this clarification.  Thanks a ton to everyone who
> spared their time to educate me.
> 
> I see your points.  Please correct me if I am wrong.
> 
> I understand that it's better to use Bio::DB::SeqFeature or Bio::DB::GFF
> to load the FASTA sequences (from a separate multi-FASTA file) and
> then Bio::Tools::GFF to parse the feature info from a GFF file, then query
> the created database for the relevant GFF coordinates.
> 
> I will implement this.
> 
> Thanks once again.
> Kanmani

Yes, in general.  I forgot to mention that you can have an in-memory database as well, but it's only suggested if you have a few thousand or so features and small sequences (I think bacterial chromosomes will work).  
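
To illustrate the in-memory idea in plain Perl (this is only a sketch of
the lookup principle, not how Bio::DB::SeqFeature::Store is implemented;
the record layout and the sample features are made up):

```perl
use strict;
use warnings;

# Made-up feature records; in practice these would come from a GFF parser.
my @features = (
    { id => 'gene1', type => 'gene', seqid => 'chr1', start => 1,  end => 12 },
    { id => 'exon1', type => 'exon', seqid => 'chr1', start => 3,  end => 8  },
    { id => 'exon2', type => 'exon', seqid => 'chr1', start => 10, end => 12 },
);

# Build two indexes once, then look features up in any order.
my (%by_id, %by_type);
for my $f (@features) {
    $by_id{ $f->{id} } = $f;
    push @{ $by_type{ $f->{type} } }, $f;
}

my $hit = $by_id{exon2};
print "$hit->{id}: $hit->{seqid}:$hit->{start}-$hit->{end}\n";
print scalar @{ $by_type{exon} }, " exons indexed\n";
```

Everything lives in hashes, so memory grows with the number of features,
which is why this only suits small data sets.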

chris


From Havard.Aanes at nvh.no  Wed Aug 25 15:47:12 2010
From: Havard.Aanes at nvh.no (Aanes Håvard)
Date: Wed, 25 Aug 2010 17:47:12 +0200
Subject: [Bioperl-l] bpfetch.pl
Message-ID: <897520BC3AAE754FA4E34E2FD26490A8021C61597B8D@A-EXMB1.veths.no>


Hi,

I am trying to obtain a set of mRNA sequences from a database made by the bpindex script. I thought this should be a trivial task, but it appears not to be. I get the sequences if I fetch them one by one, like this:

perl scripts/index/bpfetch.pl -dir ./ zebrafish:NM_201192 zebrafish:NM_212708

But I need hundreds of sequences, so my plan was to put the RefSeq IDs in a file and use that as an argument (or whatever it is called in perl). That does not work:

haavaaan at login2 ~/download/src/bioperl-1.2.3 $ perl scripts/index/bpfetch.pl -dir ./ zebrafish:./some_seqs

You are running bpindex.pl without installing bioperl.
You have done it from bioperl/scripts, and so we can find the necessary information
but it is much better to install bioperl

Please read the README in the bioperl distribution

Sequence %id in Database zebrafish is not present


Any suggestions on how to do this? Alternative approaches are also appreciated.
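
One possible approach, sketched here under assumptions: since bpfetch.pl
accepts several db:id arguments on one command line (as in the working
call above), a small wrapper could read the IDs from a file, one per line,
and build that argument list. The helper name build_fetch_args is made up,
and the file name some_seqs is taken from the failed attempt above:

```perl
use strict;
use warnings;

# Turn a list of IDs into the db:id arguments shown in the working call.
# build_fetch_args is a hypothetical helper name.
sub build_fetch_args {
    my ($db, @lines) = @_;
    my @args;
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;      # trim whitespace and line endings
        next unless length $line;     # skip blank lines
        push @args, "${db}:$line";
    }
    return @args;
}

# In real use the lines would be read from the ID file, e.g.
#   open my $fh, '<', 'some_seqs' or die "Cannot open some_seqs: $!";
#   my @ids = <$fh>;
my @ids  = ("NM_201192\n", "\n", "NM_212708\n");
my @args = build_fetch_args('zebrafish', @ids);
print "@args\n";

# The fetch itself could then be run as:
#   system('perl', 'scripts/index/bpfetch.pl', '-dir', './', @args) == 0
#       or die "bpfetch.pl failed: $?";
```

With a very long ID list the command line may get too long, in which case
the arguments could be passed in batches.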

I have no experience in perl, just started using linux, and for the moment there is no time to learn perl, so I would really be grateful for any help to solve this specific task.

Best regards

Håvard Aanes (M.Sc.)
Ph.D. student
Section for biochemistry and physiology
The Norwegian School of Veterinary Science
Telephone: +47 22597358


The new e-mail domain name for The Norwegian School of Veterinary Science is @nvh.no.
The former domain address @veths.no will still be in use, but it will be discontinued within 1-2 years.
Please update your e-mail records.




From kanmaninradha at gmail.com  Thu Aug 26 08:23:28 2010
From: kanmaninradha at gmail.com (kanmani)
Date: Thu, 26 Aug 2010 01:23:28 -0700 (PDT)
Subject: [Bioperl-l] Bio::Tools:GFF to get DNA sequences...
Message-ID: <9b7381d7-3596-4e60-a2ac-6c8c135d457d@s24g2000pri.googlegroups.com>

Hi, I am trying to get the DNA sequences for each exon feature. I have
the following script. Everything works except getting the sequences. Can
someone correct me? Thanks.

#!/usr/bin/perl

use strict;
use warnings;
use Bio::Tools::GFF;


my $file = shift;
my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
$gffio->features_attached_to_seqs(1);

while (my $feat = $gffio->next_feature()){
    my $start = $feat->start;
    my $end= $feat->end;
    my $size = $end-$start+1;
    my $strand = $feat->strand;
    my $seqid = $feat->seq_id;
    my $score = $feat->score;
    my $frame = $feat->frame;
    my $source = $feat->source_tag;
    my $type = $feat->primary_tag;
    my $gffstr = $gffio->gff_string($feat);
    my @alltags = $feat->all_tags();
    my @ID_tag_value = $feat->each_tag_value("ID");

   my  $seq = $feat->seq();
   print "$seq\n";

  if($type eq "gene"){
       print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
    }
}


From kanmaninradha at gmail.com  Thu Aug 26 21:24:40 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 14:24:40 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <EF1B137F-94A7-45E1-B8FB-0E20142F0A7F@illinois.edu>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
	<6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
	<AANLkTinxcoKBHqU7bnfyNA6bi5qBjNAYR54c6K+Pg7rz@mail.gmail.com>
	<EF1B137F-94A7-45E1-B8FB-0E20142F0A7F@illinois.edu>
Message-ID: <AANLkTikUxFLLAduO7M1QzSToewA_AgPPELKPVYq0+JKk@mail.gmail.com>

Hi Chris and others,

For a brief amount of time I could get away with using Bio::DB::Fasta to
index FASTA files and Bio::Tools::GFF to iterate through GFF features. But I
hit the wall again. It looks like sequential access to GFF features is not
sufficient; I want random access to them. I see the only way to do that is
by using Bio::DB::GFF as suggested by Chris.

Here is my question: is there any tutorial on configuring BioPerl, or this
module in particular, to work with MySQL/Postgres? I would really appreciate
it.

And thanks for all your help.
Kanmani



From kanmaninradha at gmail.com  Thu Aug 26 22:04:20 2010
From: kanmaninradha at gmail.com (kanmani radha)
Date: Thu, 26 Aug 2010 15:04:20 -0700
Subject: [Bioperl-l] getting DNA sequence for exon features from GFF
In-Reply-To: <AANLkTikUxFLLAduO7M1QzSToewA_AgPPELKPVYq0+JKk@mail.gmail.com>
References: <AANLkTimzuqNz+oHZt3JAq=eQaefJ4UnQK=0R2tQhKfmS@mail.gmail.com>
	<1282814384.4777.48.camel@deskpro15336.dynamic.sanger.ac.uk>
	<6CC5D44B-47C7-4A05-B834-EAB9173C74CE@illinois.edu>
	<AANLkTinxcoKBHqU7bnfyNA6bi5qBjNAYR54c6K+Pg7rz@mail.gmail.com>
	<EF1B137F-94A7-45E1-B8FB-0E20142F0A7F@illinois.edu>
	<AANLkTikUxFLLAduO7M1QzSToewA_AgPPELKPVYq0+JKk@mail.gmail.com>
Message-ID: <AANLkTimTU87G1dajASCzHm5=pjHCKx8W5X8AR9TKLmU4@mail.gmail.com>

Hi, I made some progress since then....
- Installing Bio::DB::DBI::mysql needed BioSQL.

- Downloaded and installed BioSQL following the instructions given in their
INSTALL file
- Created a biosql db in my MySQL server
- Loaded the schema using the script from BioSQL

- Installed DBI
- Now I have a problem with DBD::mysql. That reminds me that a couple of years
back I had to struggle to install this driver on another machine, so I thought
I would ask around this time.

It fails with a bunch of error messages, the first of them being:
dbdimp.h:22:49 error: mysql.h no such file or directory

But my MySQL installation has the header file in
"/usr/include/mysql3/mysql/mysql.h". Can anyone suggest how to move forward
from that?

thanks,
Kanmani



From rafalucas.unicamp at gmail.com  Thu Aug 26 22:11:07 2010
From: rafalucas.unicamp at gmail.com (Rafael Lucas)
Date: Thu, 26 Aug 2010 19:11:07 -0300
Subject: [Bioperl-l] Help in algorithm Bio::Structure::IO::pdb
Message-ID: <AANLkTi=zWPKeY1NpRA9TBSEnsbGH1W9F0y0QQ0+um7Yq@mail.gmail.com>

Hi folks,

How are you? I'm from Brazil and I am writing an algorithm that encrypts
data and then prints the result to a PDB file. So I have a .fasta file and
want to convert it to a .pdb file. If I use a program like PyMol, it takes
too much time, so I want to use Bio::Structure::IO::pdb to speed up the
process. Could you help me with this problem?

Thank you,

Rafael Lucas
Faculdade de Tecnologia em Analise e Desenvolvimento de Sistemas
FT - UNICAMP
+55 (19)9614-0533


From J.Christopher.Ellis at duke.edu  Fri Aug 27 02:06:30 2010
From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis)
Date: Thu, 26 Aug 2010 22:06:30 -0400
Subject: [Bioperl-l] standaloneblastplus blastn crash
Message-ID: <55861.1282874790@duke.edu>

 When I run the standaloneblastplus I get the following error...

 ------------- EXCEPTION -------------
 MSG: C:Program FilesNCBIblast-2.2.24+binblastn.exe call crashed: There
was a problem running C:Program FilesNCBIblast-2.2.24+binblastn.exe :? at
C:/Perl64/lib/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1001.

 STACK Bio::Tools::Run::WrapperBase::_run
C:/Perl64/lib/Bio/Tools/Run/WrapperBase/CommandExts.pm:1006
 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus.pm:1303
 STACK Bio::Tools::Run::StandAloneBlastPlus::run
C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:270
 STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
C:/Perl64/lib/Bio/Tools/Run/StandAloneBlastPlus.pm:1301
 STACK toplevel localBlast.pl:9
 -------------------------------------

 I have a sneaky suspicion that it is an easy fix but for the life of me I
can not figure it out! :)

 Thanks in advance,
 Chris
 

From indraniel at gmail.com  Fri Aug 27 01:57:54 2010
From: indraniel at gmail.com (Indraniel)
Date: Fri, 27 Aug 2010 01:57:54 +0000 (UTC)
Subject: [Bioperl-l] How to convert SFF into Fastq
References: <COL102-W14F3F0CDA966B9ECE0BE1BFABB0@phx.gbl>
	<AANLkTilN3rsgWEjvmyMq9IjC8p5MzBdGGe-Xtfd6XoZF@mail.gmail.com>
	<AANLkTikC-I0JFvWqptlA69qrKnKrWSNyNPAwHQKSLluJ@mail.gmail.com>
Message-ID: <loom.20100827T035104-821@post.gmane.org>

A fourth option is the following tool, sff2fastq (written in C), described here:

http://indraniel.wordpress.com/2010/04/23/sff2fastq/

and 

http://github.com/indraniel/sff2fastq

Indraniel


From David.Messina at sbc.su.se  Fri Aug 27 07:41:21 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 27 Aug 2010 09:41:21 +0200
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <4C6D0B50.4050902@sms.ed.ac.uk>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
	<4C6D0B50.4050902@sms.ed.ac.uk>
Message-ID: <A5AACD38-0396-4221-B6F7-5740FBBD83E0@sbc.su.se>

Hi Giuseppe,


On Aug 19, 2010, at 12:45, Giuseppe Gallone wrote:
> Bio::Orthology::InterologMap
> Bio::Orthology::Interolog::Map,

> just in case somebody else finds other interesting applications for the Interolog concept and would like to "plug in" their own contribution. Would this make any sense?

Absolutely. I think either of the above is a good option, and I agree that the second is a little more flexible.

Your POD looks great! Way better than most. Having seen the whole thing now, I think your description is fine as is. And if you have another tutorial and example scripts on top of it, that would really be terrific, above and beyond what most people would expect.

So, time to unleash it on the world! :)


Dave


From David.Messina at sbc.su.se  Fri Aug 27 07:58:12 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 27 Aug 2010 09:58:12 +0200
Subject: [Bioperl-l] standaloneblastplus blastn crash
In-Reply-To: <55861.1282874790@duke.edu>
References: <55861.1282874790@duke.edu>
Message-ID: <9275A540-AE42-47B0-BA73-A906964C451B@sbc.su.se>

Hi Chris,

If you look at the error message, it says what the problem is: it's trying to call the blastn executable with no backslashes in the path name.

> MSG: C:Program FilesNCBIblast-2.2.24+binblastn.exe call crashed: There
> was a problem running C:Program FilesNCBIblast-2.2.24+binblastn.exe


Now, that could be a problem in BioPerl or it could be a problem in your code. It's hard to diagnose where the problem lies without your code, so please post your code.


Dave


From G.Gallone at sms.ed.ac.uk  Fri Aug 27 11:07:57 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Fri, 27 Aug 2010 12:07:57 +0100
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <A5AACD38-0396-4221-B6F7-5740FBBD83E0@sbc.su.se>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
	<4C6D0B50.4050902@sms.ed.ac.uk>
	<A5AACD38-0396-4221-B6F7-5740FBBD83E0@sbc.su.se>
Message-ID: <4C779C8D.1090007@sms.ed.ac.uk>

Hi Dave,

thank you very much for your feedback :) . I will register the namespace 
right now. I think I will use 'homology' as the second level name 
though, because I plan to extend the module to work with paralogues as well.

As for the category, which one of the following do you reckon would fit a 
Bio:: package best?

http://www.cpan.org/modules/by-category/

Regards
Giuseppe


-- 

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From David.Messina at sbc.su.se  Fri Aug 27 11:25:06 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 27 Aug 2010 13:25:06 +0200
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <4C779C8D.1090007@sms.ed.ac.uk>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
	<4C6D0B50.4050902@sms.ed.ac.uk>
	<A5AACD38-0396-4221-B6F7-5740FBBD83E0@sbc.su.se>
	<4C779C8D.1090007@sms.ed.ac.uk>
Message-ID: <80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se>

Hi Giuseppe,


> I think I will use 'homology' as the second level name though, because I plan to extend the module to work with paralogues as well.

Sounds good.


> As for the category, which one of the following do you reckon would fit a Bio:: package best?
> 
> http://www.cpan.org/modules/by-category/


Bio:: is in 23 - miscellaneous modules, so probably keeping with that makes sense.

I don't know much about that stuff, though. Chris F. or other CPAN cognoscenti care to comment?


Dave


From cjfields at illinois.edu  Fri Aug 27 13:26:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 27 Aug 2010 08:26:51 -0500
Subject: [Bioperl-l] [RFC] Interolog::Walk
In-Reply-To: <80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se>
References: <4C6BF4BD.5010200@sms.ed.ac.uk>
	<8BBCD561-CA5E-412B-8CBC-34767006A74A@sbc.su.se>
	<4C6D0B50.4050902@sms.ed.ac.uk>
	<A5AACD38-0396-4221-B6F7-5740FBBD83E0@sbc.su.se>
	<4C779C8D.1090007@sms.ed.ac.uk>
	<80E5F23B-EA13-40EE-B0C5-81F2E6A69C01@sbc.su.se>
Message-ID: <88BB7813-E892-4BEC-9C49-5FD22325BBF7@illinois.edu>

On Aug 27, 2010, at 6:25 AM, Dave Messina wrote:

> Hi Giuseppe,
> 
> 
>> I think I will use 'homology' as the second level name though, because I plan to extend the module to work with paralogues as well.
> 
> Sounds good.
> 
> 
>> As for the category, which one of the following do you reckon would fit a Bio:: package best?
>> 
>> http://www.cpan.org/modules/by-category/
> 
> 
> Bio:: is in 23 - miscellaneous modules, so probably keeping with that makes sense.
> 
> I don't know much about that stuff, though. Chris F. or other CPAN cognoscenti care to comment?
> 
> 
> Dave

That's probably the best spot, as we cover a fairly broad range (mainly due to core monolithic structure).  Though it's terribly non-descript, sort of the junk drawer of CPAN.

chris


From adamkennedybackup at gmail.com  Sun Aug 29 11:35:50 2010
From: adamkennedybackup at gmail.com (Adam Kennedy)
Date: Sun, 29 Aug 2010 21:35:50 +1000
Subject: [Bioperl-l] Could I install BioPerl on Windows with the
 ActivePerl 5.12.1?
In-Reply-To: <5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com>
References: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>
	<78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com>
	<AANLkTimBPL6Sr2kmg+f0t1j8pk_9nBAoqubKzY4AJoxo@mail.gmail.com>
	<5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com>
Message-ID: <AANLkTinSp6GCOQvCFYOUk1Ad8EjKdU=dQbe5GpbLiLr1@mail.gmail.com>

http://strawberryperl.com/download/professional/strawberry-perl-professional-5.10.1.3-alpha-2.msi

You get BioPerl installed out of the box.

Adam K

On 20 August 2010 03:20, Christopher Fields <cjfields1 at gmail.com> wrote:
> cc'ing list.  Looks like the BioPerl PPM is possibly broken for perl 5.12.  Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling...
>
> chris
>
> On Aug 19, 2010, at 11:29 AM, han sun wrote:
>
>> v5.10 works,thanks.
>>
>> 2010/8/19 Christopher Fields <cjfields1 at gmail.com>
>> Try using ActivePerl 5.10 instead of v5.12.  It's very possible the PPM won't work for v5.12 yet.
>>
>> chris
>>
>> On Aug 19, 2010, at 9:25 AM, han sun wrote:
>>
>> > Hello everyone,
>> >
>> > I have used perl for several months,and I now want to feel the power of
>> > bioperl.
>> > But it seems that the installing is more difficult than I thought.
>> >
>> > I typed the commands.
>> >
>> >
>> >
>> > install-shell
>> >
>> >
>> > rep add bioperl http://bioperl.org/DIST
>> >
>> >
>> > rep add uwinnipeg
>> > http://cpan.uwinnipeg.ca/PPMPackages/12xx/<http://cpan.uwinnipeg.ca/PPMPackages/10xx/>
>> >
>> >
>> > rep add trouchelle http://trouchelle.com/ppm12/
>> >
>> > install BioPerl
>> >
>> > However,the installing failed,
>> >
>> > ppm install failed:
>> > Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
>> > Can't find any package that provides PostScript::TextBlock for
>> > Bundle-BioPerl-Core
>> > Can't find any package that provides Ace:: for Bundle-BioPerl-Core
>> > Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
>> > Can't find any package that provides Convert::Binary::C for
>> > Bundle-BioPerl-Core
>> > Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
>> > Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
>> > Can't find any package that provides IPC::Run for GraphViz
>> > Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
>> > Can't find any package that provides List-MoreUtils for Moose
>> > Can't find any package that provides List-MoreUtils for Class-MOP
>> >
>> >
>> > then I tried
>> >
>> > install http://www.bribes.org/perl/ppm/GD.ppd
>> >
>> > and tried the installation again,but it still didn't help.
>> >
>> > Do you know what's wrong?
>> >
>> > Please help me, thanks very much.
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields1 at gmail.com  Sun Aug 29 15:58:50 2010
From: cjfields1 at gmail.com (Christopher Fields)
Date: Sun, 29 Aug 2010 10:58:50 -0500
Subject: [Bioperl-l] Could I install BioPerl on Windows with the
	ActivePerl 5.12.1?
In-Reply-To: <AANLkTinSp6GCOQvCFYOUk1Ad8EjKdU=dQbe5GpbLiLr1@mail.gmail.com>
References: <AANLkTi=ycKzqWWQ-FHk=4WBxhedt7CYT-WkBZkxRjgrm@mail.gmail.com>
	<78E913D5-00E2-45F2-AA9D-7F4A7CDBFDA1@gmail.com>
	<AANLkTimBPL6Sr2kmg+f0t1j8pk_9nBAoqubKzY4AJoxo@mail.gmail.com>
	<5115F433-06AC-46F1-81AD-D15C4A8D9524@gmail.com>
	<AANLkTinSp6GCOQvCFYOUk1Ad8EjKdU=dQbe5GpbLiLr1@mail.gmail.com>
Message-ID: <A1B60C18-E144-466B-9630-21A88EF2CECB@gmail.com>

Yes, and I am thinking of pointing more and more users that direction instead.  Can't say maintaining PPM packages with ever-fluctuating specs is easy when I don't work with Windows anymore.

chris

On Aug 29, 2010, at 6:35 AM, Adam Kennedy wrote:

> http://strawberryperl.com/download/professional/strawberry-perl-professional-5.10.1.3-alpha-2.msi
> 
> You get BioPerl installed out the box.
> 
> Adam K
> 
> On 20 August 2010 03:20, Christopher Fields <cjfields1 at gmail.com> wrote:
>> cc'ing list.  Looks like the BioPerl PPM is possibly broken for perl 5.12.  Shouldn't be too hard to fix, but apparently there are a lot of missing packages. Troubling...
>> 
>> chris
>> 
>> On Aug 19, 2010, at 11:29 AM, han sun wrote:
>> 
>>> v5.10 works,thanks.
>>> 
>>> 2010/8/19 Christopher Fields <cjfields1 at gmail.com>
>>> Try using ActivePerl 5.10 instead of v5.12.  It's very possible the PPM won't work for v5.12 yet.
>>> 
>>> chris
>>> 
>>> On Aug 19, 2010, at 9:25 AM, han sun wrote:
>>> 
>>>> Hello everyone,
>>>> 
>>>> I have used perl for several months,and I now want to feel the power of
>>>> bioperl.
>>>> But it seems that the installing is more difficult than I thought.
>>>> 
>>>> I typed the commands.
>>>> 
>>>> 
>>>> 
>>>> install-shell
>>>> 
>>>> 
>>>> rep add bioperl http://bioperl.org/DIST
>>>> 
>>>> 
>>>> rep add uwinnipeg
>>>> http://cpan.uwinnipeg.ca/PPMPackages/12xx/<http://cpan.uwinnipeg.ca/PPMPackages/10xx/>
>>>> 
>>>> 
>>>> rep add trouchelle http://trouchelle.com/ppm12/
>>>> 
>>>> install BioPerl
>>>> 
>>>> However,the installing failed,
>>>> 
>>>> ppm install failed:
>>>> Can't find any package that provides List::MoreUtils for Bundle-BioPerl-Core
>>>> Can't find any package that provides PostScript::TextBlock for
>>>> Bundle-BioPerl-Core
>>>> Can't find any package that provides Ace:: for Bundle-BioPerl-Core
>>>> Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
>>>> Can't find any package that provides Convert::Binary::C for
>>>> Bundle-BioPerl-Core
>>>> Can't find any package that provides XML::Twig for Bundle-BioPerl-Core
>>>> Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
>>>> Can't find any package that provides IPC::Run for GraphViz
>>>> Can't find any package that provides XML-XPathEngine for XML-DOM-XPath
>>>> Can't find any package that provides List-MoreUtils for Moose
>>>> Can't find any package that provides List-MoreUtils for Class-MOP
>>>> 
>>>> 
>>>> then I tried
>>>> 
>>>> install http://www.bribes.org/perl/ppm/GD.ppd
>>>> 
>>>> and tried the installation again,but it still didn't help.
>>>> 
>>>> Do you know what's wrong?
>>>> 
>>>> Please help me, thanks very much.
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>>> 
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From odclerck at gmail.com  Fri Aug 27 07:44:14 2010
From: odclerck at gmail.com (odclerck)
Date: Fri, 27 Aug 2010 00:44:14 -0700 (PDT)
Subject: [Bioperl-l]  fasta header replace
Message-ID: <29550202.post@talk.nabble.com>


Hi,
Was wondering if someone had an easy script available that replaces the
headers of FASTA sequences with values stored in a separate text file.

Macrogen produces files with sequences that look more or less like this:
>100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1	1012, 1000 bases, 0 checksum.

I can filter out the position on the plate e.g. "A1" easily but would like
to replace this with the name of the strain stored in a different text file,
e.g. "A1_D1222".

Realize this sounds pretty basic to most of you, but I'm pretty new at
scripting.
Olivier

-- 
View this message in context: http://old.nabble.com/fasta-header-replace-tp29550202p29550202.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From J.Christopher.Ellis at duke.edu  Mon Aug 30 12:55:04 2010
From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis)
Date: Mon, 30 Aug 2010 08:55:04 -0400
Subject: [Bioperl-l] Taxonomy DB problem
Message-ID: <51468.1283172904@duke.edu>

 Hi All,

 I am trying to extract the entire taxonomy of an organism including the
classifications. Some thing like...

Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia

I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error.

My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?

Thanks for all your help in advance!

Chris


From rafalucas.unicamp at gmail.com  Mon Aug 30 13:24:11 2010
From: rafalucas.unicamp at gmail.com (Rafael Lucas)
Date: Mon, 30 Aug 2010 10:24:11 -0300
Subject: [Bioperl-l] help in algorithm Bio::Structure::IO::pdb
Message-ID: <AANLkTimNHcjCRqYrhH8=Q=Dqqjtj35NNqMqP+Q2P1oPU@mail.gmail.com>

Hi folks,

How are you? I'm from Brazil and I am writing an algorithm that
encrypts data and then prints the result to a PDB file. So I have a
.fasta file and want to convert it to a .pdb file. If I use a program
like PyMOL it takes too long, so I want to use
Bio::Structure::IO::pdb to speed up the process. Could you help me with
this problem?

Thank you,

Rafael Lucas
Faculdade de Tecnologia em Analise e Desenvolvimento de Sistemas
FT - UNICAMP
+55 (19)9614-0533


From cjfields at illinois.edu  Mon Aug 30 13:36:41 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 30 Aug 2010 08:36:41 -0500
Subject: [Bioperl-l] Taxonomy DB problem
In-Reply-To: <51468.1283172904@duke.edu>
References: <51468.1283172904@duke.edu>
Message-ID: <B93CF33A-0FA5-4A19-AF5A-BE203AA26E38@illinois.edu>

Chris,

Regarding a fix for that script, we would have to see your modified script and the error.  However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy.
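
A rough sketch of that route, not a tested recipe: it assumes the Entrez backend of Bio::DB::Taxonomy, network access, and that 562 is the NCBI taxonomy id for Escherichia coli. The helper names (format_lineage, lineage_for_taxid) are made up for illustration. It walks from the taxon up to the root and prints only the ranks asked for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Format a lineage given as a list of [rank, name] pairs, keeping only the
# requested ranks (illustrative helper, not part of BioPerl).
sub format_lineage {
    my ($pairs, @wanted) = @_;
    my %keep = map { $_ => 1 } @wanted;
    return join ', ',
        map  { ucfirst($_->[0]) . ':' . $_->[1] }
        grep { $keep{ $_->[0] } } @$pairs;
}

# Walk from a taxon up to the root using the Entrez taxonomy database.
# BioPerl is loaded lazily so the helper above works without it.
sub lineage_for_taxid {
    my ($taxid) = @_;
    require Bio::DB::Taxonomy;
    my $db    = Bio::DB::Taxonomy->new(-source => 'entrez');
    my $taxon = $db->get_taxon(-taxonid => $taxid)
        or die "No taxon found for id $taxid\n";
    my @pairs;
    while ($taxon) {
        # Root-first order: each ancestor goes in front of its descendants.
        unshift @pairs,
            [ $taxon->rank // 'no rank', $taxon->scientific_name // '' ];
        $taxon = $db->ancestor($taxon);
    }
    return \@pairs;
}

# Only contacts NCBI when a taxonomy id is given, e.g. "perl lineage.pl 562".
if (@ARGV) {
    my $pairs = lineage_for_taxid($ARGV[0]);
    print format_lineage($pairs, qw(phylum class order family genus)), "\n";
}
```

To go from accession numbers to taxonomy ids first, the elink step from the wiki script would still be needed.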

chris

On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:

> Hi All,
> 
> I am trying to extract the entire taxonomy of an organism including the
> classifications. Some thing like...
> 
> Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
> 
> I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error.
> 
> My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?
> 
> Thanks for all your help in advance!
> 
> Chris 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From fs5 at sanger.ac.uk  Mon Aug 30 15:11:06 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Mon, 30 Aug 2010 16:11:06 +0100
Subject: [Bioperl-l] fasta header replace
In-Reply-To: <29550202.post@talk.nabble.com>
References: <29550202.post@talk.nabble.com>
Message-ID: <4C7BCA0A.70503@sanger.ac.uk>

Hi Olivier,

Do you know how to read a file and build a hash from the contents? This 
is what you will need to do,
e.g. if your file is
A1 Strain_A
A2 Strain_A
A3 Strain_B

then you can do something like:

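# open the mapping file for reading ('<'), not writing
open(my $in, '<', $infile_path) or die "Cannot open $infile_path: $!";
my %well2strain;
while (<$in>) {
    # skip lines that do not look like "A1 Strain_A"
    my ($well, $strain) = /^([A-Z]\d+)\s+(\w+)/ or next;
    $well2strain{$well} = $strain;
}
close $in;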

You can then use the values of the hash to set the sequence ID as you 
parse the FASTA file. The BioPerl SeqIO howto gives details about how to 
read and write the FASTA file 
(http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples).
You can change the id of a sequence object with
$some_seq_object->id( 'my new ID');

See http://doc.bioperl.org/releases/bioperl-1.0/Bio/Seq.html for details.
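
Putting the pieces together, a minimal sketch might look like the following. The sub names are invented for illustration, and the pattern for finding the well assumes it is an underscore-delimited field with a row letter A-H (so "I01" in the example header is not mistaken for a well); adjust it to your real headers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read "well strain" pairs (e.g. "A1 D1222") into a hash reference.
sub read_well_map {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    my %well2strain;
    while (my $line = <$in>) {
        my ($well, $strain) = $line =~ /^([A-Z]\d+)\s+(\S+)/ or next;
        $well2strain{$well} = $strain;
    }
    close $in;
    return \%well2strain;
}

# Build the new ID, e.g. "A1_D1222", from an old ID such as
# 100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1.
# Returns the old ID unchanged if no well is found or it is not in the map.
sub new_id_for {
    my ($old_id, $well2strain) = @_;
    my ($well) = $old_id =~ /_([A-H]\d{1,2})_/ or return $old_id;
    return exists $well2strain->{$well}
        ? $well . '_' . $well2strain->{$well}
        : $old_id;
}

# Rewrite a FASTA file with the new IDs. Bio::SeqIO is loaded here so the
# two helpers above can be tried without BioPerl installed.
sub relabel_fasta {
    my ($in_path, $out_path, $well2strain) = @_;
    require Bio::SeqIO;
    my $in  = Bio::SeqIO->new(-file => $in_path,     -format => 'fasta');
    my $out = Bio::SeqIO->new(-file => ">$out_path", -format => 'fasta');
    while (my $seq = $in->next_seq) {
        $seq->id(new_id_for($seq->id, $well2strain));
        $out->write_seq($seq);
    }
}

# Usage: perl relabel.pl wells.txt in.fasta out.fasta
if (@ARGV == 3) {
    my ($map, $fasta_in, $fasta_out) = @ARGV;
    relabel_fasta($fasta_in, $fasta_out, read_well_map($map));
}
```

The rest of the original header line (the description) is left as it is; clear it with $seq->desc('') if you do not want it in the output.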

Hope that helps to get you started.

Frank

 
odclerck wrote:
> Hi,
> Was wondering if someone had an easy script available that converts the
> headers of a fasta sequences to a value stored in a separate text file.
>
> Macrogen produces files with sequences that look more or less like this:
>   
>> 100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1	1012, 1000 bases, 0 checksum.
>>     
>
> I can filter out the position on the plate e.g. "A1" easily but would like
> to replace this with the name of the strain stored in a different text file,
> e.g. "A1_D1222".
>
> Realize this sounds pretty basic to most of you, but I'm pretty new at
> scripting.
> Olivier
>
>   


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From jessica.sun at gmail.com  Mon Aug 30 15:51:39 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Mon, 30 Aug 2010 11:51:39 -0400
Subject: [Bioperl-l] Git for the lazy
In-Reply-To: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se>
References: <4A13D48C-B920-4FA5-AF18-292C764A8B79@sbc.su.se>
Message-ID: <AANLkTikzkPL-WN7XUNPcfNhqqnOYUR15br-YzrwsE5tL@mail.gmail.com>

I want to add sequence features with tags and tag values, and I want the
tags to appear in my own order. However, they seem to come out in
alphabetical order by default. Does anyone know how to fix this? Thanks a
lot in advance.


From G.Gallone at sms.ed.ac.uk  Tue Aug 31 11:52:57 2010
From: G.Gallone at sms.ed.ac.uk (Giuseppe Gallone)
Date: Tue, 31 Aug 2010 12:52:57 +0100
Subject: [Bioperl-l] New CPAN Release - Bio::Homology::InterologWalk - A
 Perl Module to retrieve putative PPIs through Interologs
Message-ID: <4C7CED19.80802@sms.ed.ac.uk>

Dear Bioperl users,

I would like to announce the release of Bio::Homology::InterologWalk, a
module that retrieves, scores and visualizes putative Protein-Protein 
Interactions through the orthology-walk method.

The project is available from the following link

http://search.cpan.org/~ggallone/

and a description of the idea behind it is here

http://search.cpan.org/~ggallone/Bio-Homology-InterologWalk-0.02/lib/Bio/Homology/InterologWalk.pm#DESCRIPTION

The project is at a very early stage (currently ver. 0.02 alpha) and has 
so far been tested only in Linux environments. It has not been tested 
on Macs, but it should work fine, and I would appreciate any feedback 
from Mac users who try it.

*Any* form of feedback will be greatly appreciated (bugs, typos,
syntactical errors, verbal abuse etc :) ).


Best,
Giuseppe


-- 

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From cjfields at illinois.edu  Tue Aug 31 15:01:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 31 Aug 2010 10:01:59 -0500
Subject: [Bioperl-l] Taxonomy DB problem
In-Reply-To: <56973.1283255847@duke.edu>
References: <56973.1283255847@duke.edu>
Message-ID: <7167CA86-857E-4E16-A3D6-BA45045CF892@illinois.edu>

Yes, I see that one.  It may be the ID hash that is being returned is empty.  I'll look into it.
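
If the warning does come from an undefined entry among the IDs handed to the second query (the script's own comments say 89318838 maps to undef), one possible guard is to drop undefined values before building the esummary request. This is only a sketch of that idea in plain Perl, not a confirmed fix; the sub name and the taxonomy IDs below are made up for illustration.

```perl
use strict;
use warnings;

# Return the taxonomy IDs for @$ids, skipping any ID with no linked taxon.
sub defined_taxa {
    my ($taxa, $ids) = @_;
    return grep { defined } @{$taxa}{@$ids};
}

# Placeholder data: these taxonomy IDs are invented, not real elink results.
my @ids  = qw(1621261 89318838 730439);
my %taxa = (1621261 => 111, 730439 => 222);    # 89318838 has no entry
my @taxa = defined_taxa(\%taxa, \@ids);
print "@taxa\n";    # prints "111 222"
```

In the wiki script this would stand in for the line "@taxa = @taxa{@ids};".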

-c 

On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote:

> Hi Chris,
> 
> The error is...
> 
> "Use of uninitialized value $id in join or string at C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363."
> 
> The script from http://bioperl.org/wiki/Species_names_from_accession_numbers is as follows....
> 
> use Bio::DB::EUtilities;
> 
> my (%taxa, @taxa);
> my (%names, %idmap);
> 
> # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
> # (probably)
> 
> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> 
> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>                                        -db => 'taxonomy',
>                                        -dbfrom => 'protein',
>                                        -correspondence => 1,
>                                        -id => \@ids);
> 
> # iterate through the LinkSet objects
> while (my $ds = $factory->next_LinkSet) {
>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> }
> 
> @taxa = @taxa{@ids};
> 
> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>                                     -db => 'taxonomy',
>                                     -id => \@taxa );
> 
> while (local $_ = $factory->next_DocSum) {
>     $names{($_->get_contents_by_name('TaxId'))[0]} =
>         ($_->get_contents_by_name('ScientificName'))[0];
> }
> 
> foreach (@ids) {
>     $idmap{$_} = $names{$taxa{$_}};
> }
> 
> # %idmap is
> # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> # 68536103 => 'Corynebacterium jeikeium K411'
> # 730439 => 'Bacillus caldolyticus'
> # 89318838 => undef (this record has been removed from the db)
> 
> 1;
> 
> Thanks,
> 
> Chris
> 
> 
> On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
> Chris,
> 
> Regarding a fix for that script, we would have to see your modified script and the error. However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy.
> 
> chris
> 
> On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:
> 
> > Hi All,
> > 
> > I am trying to extract the entire taxonomy of an organism including the
> > classifications. Some thing like...
> > 
> > Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
> > 
> > I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error.
> > 
> > My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?
> > 
> > Thanks for all your help in advance!
> > 
> > Chris 
> > 
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 


From J.Christopher.Ellis at duke.edu  Tue Aug 31 11:57:27 2010
From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis)
Date: Tue, 31 Aug 2010 07:57:27 -0400
Subject: [Bioperl-l] Taxonomy DB problem
Message-ID: <56973.1283255847@duke.edu>

 Hi Chris,

 The error is...

 "Use of uninitialized value $id in join or string at
C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363."

 The script from
http://bioperl.org/wiki/Species_names_from_accession_numbers is as
follows....

use Bio::DB::EUtilities;

my (%taxa, @taxa);
my (%names, %idmap);

# these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
# (probably)

my @ids = qw(1621261 89318838 68536103 20807972 730439);

my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
                                       -db => 'taxonomy',
                                       -dbfrom => 'protein',
                                       -correspondence => 1,
                                       -id => \@ids);

# iterate through the LinkSet objects
while (my $ds = $factory->next_LinkSet) {
    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
}

@taxa = @taxa{@ids};

$factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
                                    -db => 'taxonomy',
                                    -id => \@taxa );

while (local $_ = $factory->next_DocSum) {
    $names{($_->get_contents_by_name('TaxId'))[0]} =
        ($_->get_contents_by_name('ScientificName'))[0];
}

foreach (@ids) {
    $idmap{$_} = $names{$taxa{$_}};
}

# %idmap is
# 1621261 => 'Mycobacterium tuberculosis H37Rv'
# 20807972 => 'Thermoanaerobacter tengcongensis MB4'
# 68536103 => 'Corynebacterium jeikeium K411'
# 730439 => 'Bacillus caldolyticus'
# 89318838 => undef (this record has been removed from the db)

1;

Thanks,

Chris

 On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
 Chris,

 Regarding a fix for that script, we would have to see your modified
script and the error. However, there are modules within BioPerl to
essentially do what you want, in particular, Bio::DB::Taxonomy.

 chris

 On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:

 > Hi All,
 > 
 > I am trying to extract the entire taxonomy of an organism including the
 > classifications. Some thing like...
 > 
 > Phylum:Proteobacteria, Class:Gammaproteobacteria,
Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
 > 
 > I am not worried about format just that I get the information and the
associated level of hierarchy. The script found at
http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a
good starting point so I copied it and tried run it but got an error.
 > 
 > My first question is "Is there a known fix for this?" and my second
question is how do I get the full hierarchical information (as seen above)
with the taxonomy db?
 > 
 > Thanks for all your help in advance!
 > 
 > Chris 
 > 
 > 
 > _______________________________________________
 > Bioperl-l mailing list
 > Bioperl-l at lists.open-bio.org
 > http://lists.open-bio.org/mailman/listinfo/bioperl-l