From wrp at virginia.edu  Mon Jul  2 10:31:40 2012
From: wrp at virginia.edu (William Pearson)
Date: Mon, 2 Jul 2012 10:31:40 -0400
Subject: [Bioperl-l] Application Deadline - 2012 CSHL Computational and
	Comparative Genomics Course
Message-ID: <AAD6FEEC-F120-4B30-8208-1E48D8129FE6@virginia.edu>


Course announcement - Application deadline, July 15, 2012

Cold Spring Harbor COMPUTATIONAL & COMPARATIVE GENOMICS

Oct 31 - Nov 6, 2012
Application Deadline: July 15, 2012

INSTRUCTORS:

William Pearson, University of Virginia, Charlottesville, VA
Lisa Stubbs, University of Illinois, Urbana, IL
This course presents a comprehensive overview of the theory and practice of computational methods for the identification and characterization of functional elements from DNA sequence data. The course focuses on approaches for extracting the maximum amount of information from protein and DNA sequence similarity through sequence database searches, statistical analysis, and multiple sequence alignment. Additional topics include:

Alignment and analysis of "Next-Gen" sequencing data
The Galaxy environment for high-throughput analysis
Identification of conserved signals in aligned and unaligned sequences
Regulatory element and motif recognition
Integration of genetic and sequence information in biological databases
The ENSEMBL genome browser and BioMart
Function/phenotype prediction for sequence variants
The course combines lectures with hands-on exercises; students are encouraged to pose challenging sequence analysis problems using their own data. The course is designed for biologists seeking advanced training in biological sequence and genome analysis, computational biology core resource directors and staff, and for scientists in other disciplines, such as computer science, who wish to survey current research problems in biological sequence analysis. Advanced programming skills are not required.

The lecture/lab schedule for the 2011 course can be found at fasta.bioch.virginia.edu/cshl

Speakers in the 2011 course included:
Aaron Mackey, U. of Virginia, Next-Gen analysis pipelines
Bert Overduin, European Bioinformatics Institute, UK, ENSEMBL and BioMart
Francis Ouellette, Ontario Institute for Cancer Research, Databases for Biological Function
William Pearson, U. of Virginia, Similarity Searching, Multiple Alignment
Lisa Stubbs, U. of Illinois, Urbana, ChIP, Transcription Factors, and Comparative Genomics
James Taylor, Emory, Galaxy and genome analysis pipelines
The primary focus of the computational and comparative genomics course is the theory and practice of algorithms used in computational biology, with the goal of using current methods more effectively and evaluating new approaches. Students who wish to learn Perl programming for Bioinformatics are encouraged to apply to the Programming for Biology course. Students who would like in-depth training in the analysis of next-generation sequencing data (e.g., SNP calling and the detection of structural variants) should apply to the course on Advanced Sequencing Technologies & Applications. This Computational and Comparative Genomics course will discuss methods for phenotype prediction from variation data.

To apply to the course, fill out and send in the form at:

http://meetings.cshl.edu/course/courseapp_instr.shtml


From shalabh.sharma7 at gmail.com  Mon Jul  2 13:09:57 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Jul 2012 13:09:57 -0400
Subject: [Bioperl-l] translation frame problem in bioperl
Message-ID: <CAA7rn9cnqjsMv-ggCn_H+WavqdtCZZfh37v8pkLak1-ct8UX+A@mail.gmail.com>

Hi All,
         I am confused about translation frames. I used BioPerl to
parse a blastx report.
The report shows that the frame used is -2, but when I translate the sequence
using EMBOSS or some other program the frame is -1.
Am I doing something wrong here?

Here is the sequence:
>gi|378759230|gb|AHBJ01000169.1| SAR86 cluster bacterium SAR86D scf1120176765857, whole genome shotgun sequence 2642:3697
AGCTTCCCATGGAACCCATGCAAGTGCAATATTTGTTTCTAGCTCTGGTGACCACCAAGGAGATGTCACGTAGCCCACCTCATCTTCATCAGTATTAGTTACTATCCAAAAATCAGAAGCATAATCTGTGATTTCTTTTCCTCCAAGGGTTAAACCAACCATCTTCATTTTAAATGGTGCATTTCCTTCATCTATGATTGCTCTCTGTTTTTCAAGCTCTTCTTTACCAATGTAATCAGCTGCTTTATTTCTTGGTACCTGATAACTTAAATTAACCTGAAAGGGAGAAGTTTCATGATCCAGATCTTGTCCCCAAGACAAAATTCCAGCTGCAATGCGACGATGATGCGCAGGAGCTATGACCATTAAGCCAAATTCTTCTCCAGCCTCAAGAACAGCATTCCACATTTTTTCTGCATTATCATGTGCGTCACGAACATATATTTCATAACCTTTTTCGCCTGTAAAACCAGTTTGACTGATTACACAATCAGCTCCACCAACCTGAGTTTCTAAAATTCCATAATAAGGAACTTCTCTTAACTCTTCGCCAGCTAACTTTGCCATAAGATCTTCAGATAAAGGGCCTTGAATTTGAACAGGACAAACATCAATCTCATCAATTTCTACGTCATATTTTTTAGACACATTTACGCCTTGAAGCCAAAGTAAGAGATCGCTGTCTGATATTGAGAACCAGAATTCATCTTCTGTTAGTCTTAATAGAACAGGGTCATTTAAAACCCCTCCTTTTTCATTGCATAAAATCGCATATTTACCATTTCCGGGTTTAATTTTTGTAGCATCACGAGTTATTACATAATCTGTAAAAGCTTCTGCATCTGGACCTTTTACTCTTATCTGTCTTTCAACAGCAACATTCCACATAGTAACTCTATTAACCAAGGCTTCGTATTCAACCATGGCACCGCCATCTTCAGGTTTTACATAGCCTCGTGGATGATAAATTCGATTATATACAGTTGCTCTCCAACAGCCCGCTTCATGAGATAGATGCCAAAAAGGCGATTTTCTTACCCGGGTTGAAATTAATAA

This is a part of blast report by bioperl:
>JCVI_READ_1105499496127 /Indian_Ocean/gcvT
          Length = 352

 Score =  655 bits (1690), Expect = 0.0
 Identities = 311/352 (88%), Positives = 329/352 (93%)
 Frame = -2

Query: 3697 LLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYVKPEDGGAMVEYEALVNRVTMWNV 3518
            +LISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGY+KPEDGGAMVEY+ALVNRVTMWNV
Sbjct: 1    MLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYIKPEDGGAMVEYDALVNRVTMWNV 60
 .....
.....
Query: 2797 GLTLGGKEITDYASDFWIVTNTDEDEVGYVTSPWWSPELETNIALAWVPWEA 2642
            GLTLGGKEITDYA DFW+V + D   +     PWWSPEL TNIAL WVPW A
Sbjct: 301  GLTLGGKEITDYAPDFWLVADMDGMMLDISLPPWWSPELNTNIALGWVPWSA 352

This is EMBOSS output (from EBI):

>EMBOSS_001_4
LLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYVKPEDGGAMVEYEALVNRVTMWNV
AVERQIRVKGPDAEAFTDYVITRDATKIKPGNGKYAILCNEKGGVLNDPVLLRLTEDEFW
FSISDSDLLLWLQGVNVSKKYDVEIDEIDVCPVQIQGPLSEDLMAKLAGEELREVPYYGI
LETQVGGADCVISQTGFTGEKGYEIYVRDAHDNAEKMWNAVLEAGEEFGLMVIAPAHHRR
IAAGILSWGQDLDHETSPFQVNLSYQVPRNKAADYIGKEELEKQRAIIDEGNAPFKMKMV
GLTLGGKEITDYASDFWIVTNTDEDEVGYVTSPWWSPELETNIALAWVPWEA
>EMBOSS_001_5
INFNPGKKIAFLASIS*SGLLESNCI*SNLSSTRLCKT*RWRCHG*IRSLG**SYYVECC
C*KTDKSKRSRCRSFYRLCNNS*CYKN*TRKW*ICDFMQ*KRRGFK*PCSIKTNRR*ILV

......

You can see it's frame -1.

I would really appreciate your help.


Thanks

Shalabh

-- 
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636


From scott at scottcain.net  Mon Jul  2 14:50:45 2012
From: scott at scottcain.net (Scott Cain)
Date: Mon, 2 Jul 2012 14:50:45 -0400
Subject: [Bioperl-l] GMOD Summer School application deadline
Message-ID: <CA+JTaoym218vQ9_VvtwvMW9ZaoHfcu1hxG59L2z=ovE6qWN6Mw@mail.gmail.com>

Hello,

The deadline to apply for the GMOD Summer School is in one week, July
9th.  The application is available as a Google Form:

  https://docs.google.com/spreadsheet/embeddedform?formkey=dG5hNGFiQ3UwYTV2LUZxZW04Qm1yZXc6MQ

In the GMOD Summer School (August 24-29, 2012) we will cover the
installation, configuration and use of a variety of GMOD tools,
including Chado, GBrowse, JBrowse and Tripal.  For more information on
the course, see the course web page at

  http://gmod.org/wiki/2012_GMOD_Summer_School

The course will make heavy use of Amazon Web Services (aka the
Cloud) via a grant from Amazon.  Enrollment is limited to 24 students,
and the application process is competitive: the last few years we've
received over 75 applications for those 24 spots.

I look forward to seeing you in North Carolina in August!

Scott


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From p.j.a.cock at googlemail.com  Mon Jul  2 15:34:40 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 2 Jul 2012 20:34:40 +0100
Subject: [Bioperl-l] translation frame problem in bioperl
In-Reply-To: <CAA7rn9cnqjsMv-ggCn_H+WavqdtCZZfh37v8pkLak1-ct8UX+A@mail.gmail.com>
References: <CAA7rn9cnqjsMv-ggCn_H+WavqdtCZZfh37v8pkLak1-ct8UX+A@mail.gmail.com>
Message-ID: <CAKVJ-_5nX7f2ZejEpeVnMhd35PYxxrJX2mxu4A712NPOxvie0g@mail.gmail.com>

On Mon, Jul 2, 2012 at 6:09 PM, shalabh sharma
<shalabh.sharma7 at gmail.com> wrote:
> Hi All,
>          I am just confused about the translation frames. I used bioperl to
> parse a blastx report.
> Reports shows that the frame used is -2 but when i translate the sequence
> using EMBOSS or Some other program the frame is -1.
> Am i doing something wrong here.

Possibly there are conflicting definitions of frames -1, -2, and -3 here
(and that's leaving out the possibility of -0, -1 and -2 counting). Some
will count from the first base (start for forward strand), others the last
base (start of reverse strand). This can make comparing the output
of different tools quite confusing.

Peter
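
To make the two counting schemes concrete, here is a minimal Perl sketch of the BLAST-style convention Peter describes: forward frames counted from the first base, reverse frames counted from the last base. The helper `blast_frame` is hypothetical (it is not a BioPerl API), and the query lengths in the example calls are made up. One consequence worth noting: on the reverse strand the frame depends on the distance from the hit to the end of the *whole* query, so translating only an excerpt (such as the 2642:3697 region above) may give a different frame number than the report for the full query.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): BLAST-style frame of a hit.
#   $lo, $hi : lowest and highest query coordinates of the hit (1-based)
#   $strand  : +1 for forward, -1 for reverse complement
#   $qlen    : length of the complete query sequence
# Forward frames are counted from the first base (+1, +2, +3);
# reverse frames are counted from the last base  (-1, -2, -3).
sub blast_frame {
    my ($lo, $hi, $strand, $qlen) = @_;
    return (($lo - 1) % 3) + 1 if $strand > 0;
    return -((($qlen - $hi) % 3) + 1);
}

# Illustrative calls; the query lengths are made up.
print blast_frame(1, 1056, -1, 1056), "\n";      # 1056-bp excerpt alone: prints -1
print blast_frame(2642, 3697, -1, 3698), "\n";   # same hit in a 3698-bp query: prints -2
```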


From shalabh.sharma7 at gmail.com  Mon Jul  2 16:39:29 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Jul 2012 16:39:29 -0400
Subject: [Bioperl-l] translation frame problem in bioperl
In-Reply-To: <98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net>
References: <CAA7rn9cnqjsMv-ggCn_H+WavqdtCZZfh37v8pkLak1-ct8UX+A@mail.gmail.com>
	<CAKVJ-_5nX7f2ZejEpeVnMhd35PYxxrJX2mxu4A712NPOxvie0g@mail.gmail.com>
	<98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net>
Message-ID: <CAA7rn9eiKh6EszMsV=Vk65rJPywOUJijP0Q=rYg_vt=xoK-0=Q@mail.gmail.com>

Hi Peter and Brian,
                           Thanks a lot for your replies.
I have already taken this into account.
So if I parse the BLAST report (my previous example),
I get strand '-1' and frame '1' (according to BioPerl),
so converted to the usual BLAST convention it is -2, because BioPerl counts
frames from 0.

Also, with BioPerl the forward-frame translation works fine.


Thanks
Shalabh
On Mon, Jul 2, 2012 at 4:24 PM, Brian Osborne <bosborne11 at verizon.net>wrote:

> Shalabh,
>
> Also take a look at this:
>
> http://www.bioperl.org/wiki/HOWTO:SearchIO#frame.28.29
>
> Brian O.
>
>
> On Jul 2, 2012, at 3:34 PM, Peter Cock wrote:
>
> > On Mon, Jul 2, 2012 at 6:09 PM, shalabh sharma
> > <shalabh.sharma7 at gmail.com> wrote:
> >> Hi All,
> >>          I am just confused about the translation frames. I used
> bioperl to
> >> parse a blastx report.
> >> Reports shows that the frame used is -2 but when i translate the
> sequence
> >> using EMBOSS or Some other program the frame is -1.
> >> Am i doing something wrong here.
> >
> > Possibly there are conflicting definitions of frames -1, -2, and -3 here
> > (and that's leaving out the possibility of -0, -1 and -2 counting). Some
> > will count from the first base (start for forward strand), others the
> last
> > base (start of reverse strand). This can make comparing the output
> > of different tools quite confusing.
> >
> > Peter
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


-- 
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636
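
The conversion Shalabh describes (BioPerl reports a strand plus a 0-based frame, while the BLAST report prints a signed 1-based frame) can be written as a pair of small helpers. This is a sketch: the function names are hypothetical, and it assumes the strand and frame values come from the HSP object, for example via something like `$hsp->strand('query')` and `$hsp->frame('query')` (check the Bio::Search::HSP documentation for your BioPerl version).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# BioPerl-style (strand, 0-based frame) -> signed BLAST frame.
# e.g. strand -1, frame 1  ->  -2, as described above.
sub bioperl_to_blast_frame {
    my ($strand, $frame) = @_;
    return $strand * ($frame + 1);
}

# Signed BLAST frame -> BioPerl-style (strand, 0-based frame).
sub blast_to_bioperl_frame {
    my ($blast_frame) = @_;
    my $strand = $blast_frame < 0 ? -1 : 1;
    return ($strand, abs($blast_frame) - 1);
}

print bioperl_to_blast_frame(-1, 1), "\n";          # prints -2
my ($strand, $frame) = blast_to_bioperl_frame(-2);
print "strand $strand, frame $frame\n";             # prints "strand -1, frame 1"
```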

From bosborne11 at verizon.net  Mon Jul  2 16:24:24 2012
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 02 Jul 2012 16:24:24 -0400
Subject: [Bioperl-l] translation frame problem in bioperl
In-Reply-To: <CAKVJ-_5nX7f2ZejEpeVnMhd35PYxxrJX2mxu4A712NPOxvie0g@mail.gmail.com>
References: <CAA7rn9cnqjsMv-ggCn_H+WavqdtCZZfh37v8pkLak1-ct8UX+A@mail.gmail.com>
	<CAKVJ-_5nX7f2ZejEpeVnMhd35PYxxrJX2mxu4A712NPOxvie0g@mail.gmail.com>
Message-ID: <98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net>

Shalabh,

Also take a look at this:

http://www.bioperl.org/wiki/HOWTO:SearchIO#frame.28.29

Brian O.


On Jul 2, 2012, at 3:34 PM, Peter Cock wrote:

> On Mon, Jul 2, 2012 at 6:09 PM, shalabh sharma
> <shalabh.sharma7 at gmail.com> wrote:
>> Hi All,
>>          I am just confused about the translation frames. I used bioperl to
>> parse a blastx report.
>> Reports shows that the frame used is -2 but when i translate the sequence
>> using EMBOSS or Some other program the frame is -1.
>> Am i doing something wrong here.
> 
> Possibly there are conflicting definitions of frames -1, -2, and -3 here
> (and that's leaving out the possibility of -0, -1 and -2 counting). Some
> will count from the first base (start for forward strand), others the last
> base (start of reverse strand). This can make comparing the output
> of different tools quite confusing.
> 
> Peter


From vebaev at gmail.com  Tue Jul  3 12:35:26 2012
From: vebaev at gmail.com (vebaev at gmail.com)
Date: Tue, 3 Jul 2012 09:35:26 -0700 (PDT)
Subject: [Bioperl-l] CFP - International Conference on Bioinformatics and
 Computational Biology - BIOCOMP BG 2012
Message-ID: <7b498b4c-2b2e-4e1f-871f-513203488bf1@googlegroups.com>

International Conference on Bioinformatics and Computational Biology - 
BIOCOMP BG 2012 
September 20-21, 2012, Varna, Bulgaria
 

Dear Colleague,
It is our pleasure to circulate the 2nd announcement of the International 
Conference on Bioinformatics and Computational Biology - BIOCOMP 2012 
(http://biocomp.bio.uni-plovdiv.bg/).
 
Keynote speakers
 
Prof. Dr. Klaas Vandepoele - Ghent University, Belgium
Dr. Andreas Gisel - Institute for Biomedical Technologies, Italy
Prof. Wojciech Karlowski - Institute of Molecular Biology and Biotechnology, 
Poland
Prof. Mario A. Fares - University of Dublin, Trinity College, Ireland
Dr. Andrey Kajava - CRBM - Macromolecular Biochemistry Research Center, 
France
Dr. Gaurav Sablok - Istituto Agrario San Michele (IASMA), Italy
 
Topics
 
Topics of interest include, but are not limited to:
 
High-performance bio-computing
High-throughput sequencing data analysis (NGS)
Bio-ontologies
Molecular evolution
Comparative genomics
Molecular modeling and simulation
Computational genetics 
Computational proteomics
Data mining and visualization
Software tools and applications
Gene expression analysis
Gene networks
Structural biology
Genome analysis
Databases
Systems biology
 
Special topic: bioinformatics and miRNAs
 
Recent achievements in these fields will be presented. The conference will 
include plenary and poster sessions. Participants' proposals will be taken 
under advisement in compiling the program. 
 
Publications
 
All accepted abstracts will be published in the conference abstract book.
The best 20 abstracts will be peer-reviewed and published as full-text
manuscripts in special issues of the following Springer and Elsevier journals:
Interdisciplinary Sciences: Computational Life Sciences (ISSN: 1867-1462).
Journal of Computational Science (ISSN: 1877-7503)


Venue
The venue of the conference is the 4-star all-inclusive Sunny Day Black Sea 
resort, Bulgaria
 
Registration and abstract submission
 
All the actions related to the BIOCOMP 2012  (abstract submission, 
registration etc) may be completed via the Conference website at 
http://biocomp.bio.uni-plovdiv.bg/
 
Accommodation
 
IMPORTANT: Accommodation is included in the conference registration fee.
 
Important dates
 
Abstract Submission Deadline - 20 August 2012
Early Registration Fee Payment Deadline - 20 August 2012
Arriving, Poster set up, Registration - 19 September 2012
Plenary and Poster Sessions - 20-21 September 2012
 
 
Details of the conference are available on the conference website at 
http://biocomp.bio.uni-plovdiv.bg/
 
 
Looking forward to seeing you in Bulgaria!


------------------------------------------------


Dr. Vesselin Baev
Research Assistant Professor

University of Plovdiv
Dept. Plant Phys. and Molecular Biology
Bioinformatics SMART Group

Tzar Assen 24, Plovdiv 4000, BULGARIA
Office:+359 32 261 (560); Mobile:+359 89 43 80 945
vebaev at gmail.com; baev at uni-plovdiv.bg; CV: http://plantgene.eu/


From tarakaramji at gmail.com  Tue Jul  3 15:33:43 2012
From: tarakaramji at gmail.com (Tarakaramji Moturu)
Date: Tue, 3 Jul 2012 19:33:43 +0000 (UTC)
Subject: [Bioperl-l] Invitation to connect on LinkedIn
Message-ID: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod>

LinkedIn
------------


I'd like to add you to my professional network on LinkedIn.

- Tarakaramji

Tarakaramji Moturu
Student at GITAM University
Vishakhapatnam Area, India



From l.m.timmermans at students.uu.nl  Wed Jul  4 03:16:34 2012
From: l.m.timmermans at students.uu.nl (Leon Timmermans)
Date: Wed, 4 Jul 2012 10:16:34 +0300
Subject: [Bioperl-l] Invitation to connect on LinkedIn
In-Reply-To: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod>
References: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod>
Message-ID: <CAC1jpXBYn+3+TtN=n7NoRhnQ87-EuYRpsuBqpE5iHHHAntJ_ew@mail.gmail.com>

On Tue, Jul 3, 2012 at 10:33 PM, Tarakaramji Moturu
<tarakaramji at gmail.com> wrote:
> LinkedIn
> ------------
>
>
>
> I'd like to add you to my professional network on LinkedIn.
>
> - Tarakaramji

Sending messages like this directly to mailing lists is a rather bad
idea, if only because LinkedIn will think bioperl-l at bioperl.org is one
of the email addresses of whoever accepts the request (which is
relevant for retrieving a lost password, I think).

Leon

From ulrik.stervbo at gmail.com  Fri Jul  6 03:03:08 2012
From: ulrik.stervbo at gmail.com (Ulrik Stervbo)
Date: Fri, 6 Jul 2012 09:03:08 +0200
Subject: [Bioperl-l] problem in using protparam.pm module
In-Reply-To: <CACyyM1bpg+pOwCK26M2OgNaBD2VJsaXtc02WkGHrKg1QxVL0Tw@mail.gmail.com>
References: <CACyyM1Za0pW9R2GPxAFOATj4tPM24rbwLsUazGUFHNimZ2LO4g@mail.gmail.com>
	<CACyyM1ZUf2y7fO2Vkc5cSoqSgtW86GA+1yt_ctc9Y8P3Qbnt_A@mail.gmail.com>
	<9AA27ADA-FFE1-4735-BDE4-56C9B9A18009@illinois.edu>
	<CACyyM1bpg+pOwCK26M2OgNaBD2VJsaXtc02WkGHrKg1QxVL0Tw@mail.gmail.com>
Message-ID: <CAKVAULOTmTparqLEuREquzcrO6Hf+Z7MSGRAPBWMTqH0dJ7-xg@mail.gmail.com>

I had the same problem, and realized it is because I am behind a proxy.

This is what I did to the Protparam module:
Changed the URL to 'http://web.expasy.org/cgi-bin/protparam/protparam',
as previously found.

Added:
$browser->proxy(['http'], 'http://[my proxy]/'); after initialization
of the LWP agent.

The proxy settings are what made Perl choke. (If only one could make
Perl see the global proxy settings.)

Cheers,
Ulrik
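
A minimal sketch of the same idea outside the Protparam module, using LWP::UserAgent directly. The proxy address is a placeholder, and the variable name `$browser` only mirrors Ulrik's description; this does not reproduce the module's actual source.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Placeholder: substitute your site's proxy host and port.
my $proxy_url = 'http://proxy.example.org:3128/';

my $browser = LWP::UserAgent->new;

# Send plain-HTTP requests through the proxy (the call Ulrik added).
$browser->proxy(['http'], $proxy_url);

print 'HTTP requests will use: ', $browser->proxy('http'), "\n";

# A real request would then look like this (needs network access):
# my $res = $browser->get('http://web.expasy.org/cgi-bin/protparam/protparam');
# print $res->status_line, "\n";
```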

2011/7/28 Shachi Gahoi <shachigahoimbi at gmail.com>:
> Please help me how to run protparam using bioperl module
>
> On Wed, Jul 27, 2011 at 11:05 AM, Chris Fields <cjfields at illinois.edu>wrote:
>
>> The web service appears to have changed, but it looks as if no tests have
>> been written up for this module which would have caught this out.  We can
>> write some basic tests up to check for simple functionality.
>>
>> chris
>>
>> On Jul 26, 2011, at 10:58 PM, Shachi Gahoi wrote:
>>
>> > Dear All,
>> >
>> > i am using protparam.pm module. but when i am running this script it is
>> > printing one error message
>> >
>> > "Can't call method "throw" without a package or object reference at
>> > /usr/share/perl5/Bio/Root/Root.pm line 368, <GEN1> line 1."
>> >
>> > Kindly help me to solve this problem.
>> >
>> >
>> > Script is here----
>> >
>> ###################################################################################
>> > #!/usr/bin/perl
>> >
>> > use strict;
>> > use warnings;
>> > use Bio::SeqIO;
>> > use Bio::Tools::Protparam;
>> >
>> >
>> > my $seqfile = 'test1.fasta';
>> >
>> > my $seqio = Bio::SeqIO->new(-file => $seqfile, -format => 'Fasta');
>> >
>> >
>> >    while( my $seq = $seqio->next_seq() )
>> > {
>> >
>> >
>> >        my $pp = Bio::Tools::Protparam->new(-seq=>$seq->seq);
>> >
>> >        print
>> >        "ID : ", $seq->display_id,"\n",
>> >        "Amino acid number : ",$pp->amino_acid_number(),"\n",
>> >        "Number of negative amino acids : ",$pp->num_neg(),"\n",
>> >        "Number of positive amino acids : ",$pp->num_pos(),"\n",
>> >        "Molecular weight : ",$pp->molecular_weight(),"\n",
>> >        "Theoretical pI : ",$pp->theoretical_pI(),"\n",
>> >        "Total number of atoms : ", $pp->total_atoms(),"\n",
>> >        "Number of carbon atoms : ",$pp->num_carbon(),"\n",
>> >        "Number of hydrogen atoms : ",$pp->num_hydrogen(),"\n",
>> >        "Number of nitrogen atoms : ",$pp->num_nitro(),"\n",
>> >        "Number of oxygen atoms : ",$pp->num_oxygen(),"\n",
>> >        "Number of sulphur atoms : ",$pp->num_sulphur(),"\n",
>> >        "Half life : ", $pp->half_life(),"\n",
>> >        "Instability Index : ", $pp->instability_index(),"\n",
>> >        "Stability class : ", $pp->stability(),"\n",
>> >        "Aliphatic_index : ",$pp->aliphatic_index(),"\n",
>> >        "Gravy : ", $pp->gravy(),"\n",
>> >        "Composition of A : ", $pp->AA_comp('A'),"\n",
>> >        "Composition of R : ", $pp->AA_comp('R'),"\n",
>> >        "Composition of N : ", $pp->AA_comp('N'),"\n",
>> >        "Composition of D : ", $pp->AA_comp('D'),"\n",
>> >        "Composition of C : ", $pp->AA_comp('C'),"\n",
>> >        "Composition of Q : ", $pp->AA_comp('Q'),"\n",
>> >        "Composition of E : ", $pp->AA_comp('E'),"\n",
>> >        "Composition of G : ", $pp->AA_comp('G'),"\n",
>> >        "Composition of H : ", $pp->AA_comp('H'),"\n",
>> >        "Composition of I : ", $pp->AA_comp('I'),"\n",
>> >        "Composition of L : ", $pp->AA_comp('L'),"\n",
>> >        "Composition of K : ", $pp->AA_comp('K'),"\n",
>> >        "Composition of M : ", $pp->AA_comp('M'),"\n",
>> >        "Composition of F : ", $pp->AA_comp('F'),"\n",
>> >        "Composition of P : ", $pp->AA_comp('P'),"\n",
>> >        "Composition of S : ", $pp->AA_comp('S'),"\n",
>> >        "Composition of T : ", $pp->AA_comp('T'),"\n",
>> >        "Composition of W : ", $pp->AA_comp('W'),"\n",
>> >        "Composition of Y : ", $pp->AA_comp('Y'),"\n",
>> >        "Composition of V : ", $pp->AA_comp('V'),"\n",
>> >        "Composition of B : ", $pp->AA_comp('B'),"\n",
>> >        "Composition of Z : ", $pp->AA_comp('Z'),"\n",
>> >        "Composition of X : ", $pp->AA_comp('X'),"\n";
>> > }
>> >
>> ###################################################################################
>> >
>> >
>> >
>> >
>> > --
>> > Regards,
>> > Shachi
>>
>>
>
>
> --
> Regards,
> Shachi

From jason.stajich at gmail.com  Fri Jul  6 13:49:46 2012
From: jason.stajich at gmail.com (Jason Stajich)
Date: Fri, 6 Jul 2012 10:49:46 -0700
Subject: [Bioperl-l] problem in using protparam.pm module
In-Reply-To: <CAKVAULOTmTparqLEuREquzcrO6Hf+Z7MSGRAPBWMTqH0dJ7-xg@mail.gmail.com>
References: <CACyyM1Za0pW9R2GPxAFOATj4tPM24rbwLsUazGUFHNimZ2LO4g@mail.gmail.com>
	<CACyyM1ZUf2y7fO2Vkc5cSoqSgtW86GA+1yt_ctc9Y8P3Qbnt_A@mail.gmail.com>
	<9AA27ADA-FFE1-4735-BDE4-56C9B9A18009@illinois.edu>
	<CACyyM1bpg+pOwCK26M2OgNaBD2VJsaXtc02WkGHrKg1QxVL0Tw@mail.gmail.com>
	<CAKVAULOTmTparqLEuREquzcrO6Hf+Z7MSGRAPBWMTqH0dJ7-xg@mail.gmail.com>
Message-ID: <8C9056B6-1DA4-4BE0-B008-429C2F6C05BE@gmail.com>

You might try the PERL_LWP_ENV_PROXY and HTTP_PROXY environment variables:
http://search.cpan.org/~gaas/libwww-perl-6.04/lib/LWP/UserAgent.pm#CONSTRUCTOR_METHODS
http://search.cpan.org/~gaas/libwww-perl-6.04/lib/LWP/UserAgent.pm#Proxy_attributes

I can't test it on my end, though, without a proxy service.


Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
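
Jason's suggestion avoids editing the module: with libwww-perl 6, a true `PERL_LWP_ENV_PROXY` makes `LWP::UserAgent->new` call `env_proxy()`, which reads variables such as `HTTP_PROXY`. Normally you would export these in your shell before running the script; the sketch below sets them inside Perl only so it is self-contained, and the proxy address is a placeholder. As Jason notes, this has not been tested against a real proxy.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Usually set in the shell instead, e.g.
#   export PERL_LWP_ENV_PROXY=1
#   export HTTP_PROXY=http://proxy.example.org:3128/
$ENV{PERL_LWP_ENV_PROXY} = 1;
$ENV{HTTP_PROXY}         = 'http://proxy.example.org:3128/';   # placeholder

# An LWP::UserAgent created after this point (including, presumably, one
# created inside a BioPerl module) should pick up the proxy via env_proxy().
my $ua = LWP::UserAgent->new;
print 'HTTP proxy: ', ($ua->proxy('http') // 'none'), "\n";
```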


From bubli_thakur at rediffmail.com  Sun Jul  1 10:59:29 2012
From: bubli_thakur at rediffmail.com (subarna thakur)
Date: Sun, 01 Jul 2012 14:59:29 -0000
Subject: [Bioperl-l] Ks saturation
Message-ID: <20120617031856.16345.qmail@f4mail-235-140.rediffmail.com>


Dear all,

I am trying to calculate dN/dS values for all orthologous gene pairs between a pair of genomes using the pairwise_kaks.pl script in BioPerl, which invokes the codeml program in runmode -2. When I analyze the results, some of the genes have anomalously high dS (Ks) values, some even exceeding 100, and as a result the average Ks for the whole genome shoots up. These genes are orthologous and even share more than 50% sequence identity. Should I include these genes in the analysis or leave them out? If I leave them out, up to what Ks cutoff should I include genes? In some papers, Ks values as high as 5.6 have been included. Is there a way to determine the cutoff value for Ks?

Subarna
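
The effect Subarna describes (a few saturated pairs inflating the genome-wide average) is easy to see with a small summary. The sketch below uses made-up Ks values and a hypothetical cutoff of 5; it only shows how the mean, the median, and a cutoff-filtered mean respond to outliers, and does not suggest what cutoff is appropriate for real data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Arithmetic mean; undef for an empty list.
sub mean {
    my @v = @_;
    return @v ? sum(@v) / @v : undef;
}

# Median; undef for an empty list.
sub median {
    my @v = sort { $a <=> $b } @_;
    return undef unless @v;
    my $mid = int(@v / 2);
    return @v % 2 ? $v[$mid] : ($v[$mid - 1] + $v[$mid]) / 2;
}

# Keep only values at or below a (hypothetical) cutoff.
sub below_cutoff {
    my ($cutoff, @v) = @_;
    return grep { $_ <= $cutoff } @v;
}

# Made-up Ks values: most pairs are moderate, two look saturated.
my @ks = (0.21, 0.28, 0.30, 0.33, 0.41, 45.0, 120.0);

printf "mean, all pairs:    %.2f\n", mean(@ks);
printf "median, all pairs:  %.2f\n", median(@ks);
printf "mean, Ks <= 5 only: %.3f\n", mean(below_cutoff(5, @ks));
```

The median barely moves when saturated pairs are present, whereas the mean is dominated by them; any cutoff still has to be justified for the data at hand.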

From haywardjeremya at gmail.com  Fri Jul  6 13:56:12 2012
From: haywardjeremya at gmail.com (Jeremy Hayward)
Date: Fri, 6 Jul 2012 14:56:12 -0300
Subject: [Bioperl-l] Two 'host' tags?
Message-ID: <CAJv8UuyLgFfHe-Vfyv1p+90ypO+NFQ5E_br1YYyL=PKg++yyNA@mail.gmail.com>

Hi--  Clueless newbie here, for which apologies.

I've posted a description of my problem, inputs and outputs, at Gist
2816510; https://gist.github.com/2816510

Briefly, I'm trying to take a genbank file (.gb), and create a FASTA
file with a specific identifier line for each sequence. Specifically,
I want the "host" tag as the identifier. With the help of the Bioperl
beginner readme and the HOWTOs (which are great!) I've worked out how
to loop through my sequences and get the 'host' tag for each one. For
some reason, I get two identifier lines for each sequence. I guess the
problem is in the 'for' loop--it's running the stuff below it twice,
once with the actual 'host' tag data and once with...nothing? Not
sure.

I think I can work out how to use s/ and a regex just to delete the
second identifier line, but that feels like I'm avoiding the problem
instead of fixing it. Any help appreciated!


Many thanks,

--Jeremy Hayward

From jason.stajich at gmail.com  Fri Jul  6 15:39:52 2012
From: jason.stajich at gmail.com (Jason Stajich)
Date: Fri, 6 Jul 2012 12:39:52 -0700
Subject: [Bioperl-l] Two 'host' tags?
In-Reply-To: <CAJv8UuyLgFfHe-Vfyv1p+90ypO+NFQ5E_br1YYyL=PKg++yyNA@mail.gmail.com>
References: <CAJv8UuyLgFfHe-Vfyv1p+90ypO+NFQ5E_br1YYyL=PKg++yyNA@mail.gmail.com>
Message-ID: <A572776F-CF80-4EB2-A853-B62EE5AF5266@gmail.com>

Hi Jeremy -

You are printing once for every feature in the loop (e.g. both the source and the misc_RNA feature). You only want to loop through the features, pick out the source feature, and change or print the info when you see it. So you could put an if( $feature->primary_tag eq 'source') in there, or something similar. Alternatively, I've left your code pretty much intact and just simplified it a bit.
You should also try to use Bio::SeqIO to write the output instead of printing it yourself.

I updated the code here to be simpler. Right now it warns you that you are printing IDs with spaces (something you should think about when it comes to your output file, but I don't know your downstream plans). You could also put other info in the description field if you wanted to capture the accession number or the endophyte name too.

https://gist.github.com/3062285

Best,
Jason

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From bosborne11 at verizon.net  Fri Jul  6 15:51:11 2012
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 06 Jul 2012 15:51:11 -0400
Subject: [Bioperl-l] Two 'host' tags?
In-Reply-To: <CAJv8UuyLgFfHe-Vfyv1p+90ypO+NFQ5E_br1YYyL=PKg++yyNA@mail.gmail.com>
References: <CAJv8UuyLgFfHe-Vfyv1p+90ypO+NFQ5E_br1YYyL=PKg++yyNA@mail.gmail.com>
Message-ID: <456448FF-C413-42D1-833A-FAA74E4FEF9E@verizon.net>

Jeremy,

Looks like each of your individual sequences has 2 features, but you only care about the 'source' feature (something like if ($feat_object->primary_tag eq "source")?).

Also, try not to print out the sequence the way you're doing it. Instead, use a sequence object for each input sequence and write its contents to your FASTA file using write_seq(). You can set the ID of each sequence object using display_name().
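A minimal sketch of the approach described in this thread, assuming a GenBank input file whose records each carry a 'source' feature with a /host qualifier. The helper name host_of(), replacing spaces in the host name with underscores (Jason notes that IDs with spaces trigger a warning), and putting the accession number in the description line are illustrative assumptions, not the code from either gist.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: return the first /host value on the record's
# 'source' feature, or undef if there is none. Other features
# (misc_RNA, CDS, ...) are skipped, so each record yields at most one ID.
sub host_of {
    my ($seq) = @_;
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->primary_tag eq 'source';
        next unless $feat->has_tag('host');
        my ($host) = $feat->get_tag_values('host');
        return $host;
    }
    return;
}

# Usage (file names are placeholders): perl host2fasta.pl input.gb > output.fasta
if (@ARGV) {
    my $in  = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    my $out = Bio::SeqIO->new(-fh   => \*STDOUT, -format => 'fasta');
    while (my $seq = $in->next_seq) {
        my $host = host_of($seq);
        if (defined $host) {
            # Assumption: underscores instead of spaces keep the FASTA ID a single token.
            (my $id = $host) =~ s/\s+/_/g;
            $seq->desc($seq->accession_number);   # keep the accession in the description
            $seq->display_id($id);
        }
        $out->write_seq($seq);                    # exactly one header line per record
    }
}
```

Because the 'source' check happens inside host_of(), the loop body runs once per record rather than once per feature, which is what produced the duplicate identifier lines.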

Brian O.



From dejian.zhao at gmail.com  Wed Jul 11 13:31:37 2012
From: dejian.zhao at gmail.com (De-Jian Zhao)
Date: Thu, 12 Jul 2012 01:31:37 +0800
Subject: [Bioperl-l] Errors with Bio::Seq objects
Message-ID: <4FFDB879.1020906@gmail.com>

Hi,

I downloaded a nucleotide sequence from GenBank (file name: nt.gb) and 
tested the Bio::SeqIO module as follows. The first 3 commands succeeded; 
however, the last 2 failed.

I think $seqio->next_seq() produces a Bio::Seq object which contains the 
sequence, features and annotation (according to the DESCRIPTION of 
"perldoc Bio::Seq"), and thus the invocation of the methods 
get_SeqFeatures() and translate() should be valid. However, the results 
contradicted this idea.

Can anyone explain what happened with the last 2 commands? I have 
encountered numerous cases of failures when testing the bioperl methods. 
I want to translate the mRNA sequence and extract the sequence features. 
What are the right commands? Thanks a lot!

Best,
Dejian


PS: The commands and results

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb
NM_053056

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb
mRNA

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb
CACACG
CACACG

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb
Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18)

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb
Bio::Seq::RichSeq=HASH(0x20a3e7b0)


From jimhu at tamu.edu  Wed Jul 11 14:01:27 2012
From: jimhu at tamu.edu (Jim Hu)
Date: Wed, 11 Jul 2012 13:01:27 -0500
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <4FFDB879.1020906@gmail.com>
References: <4FFDB879.1020906@gmail.com>
Message-ID: <9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu>

Hi Dejian,

On Jul 11, 2012, at 12:31 PM, De-Jian Zhao wrote:

> Hi,
> 
> I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and tested the Bio::SeqIO module as follows. The first 3 commands succeeded; however the last 2 failed.
> 
> I think $seqio->next_seq() produces a Bio::Seq object which contains the sequence, features and annotation (according to the DESCRIPTION of "perldoc Bio::Seq") and thus the invocation of the methods get_SeqFeatures() and translate() should be valid. However, the results denied this idea.

That's correct: a Bio::Seq object is returned. More precisely, it is probably a kind of Bio::Seq object; for example, SeqIO may return a Bio::Seq::RichSeq object that inherits methods from Bio::Seq.  However, as explained below, the methods are working as they should... they are just returning objects when you are expecting something else.

> 
> Will anyone explain what happened to the last 2 commands? I have encountered numerous cases of failures when testing the bioperl methods. I want to translate the mRNA sequence and extract the sequence features. What are the right commands? Thanks a lot!
> 
> Best,
> Dejian
> 
> 
> 
> PS: The commands and results
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb
> NM_053056
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb
> mRNA
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb
> CACACG
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb
> Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18)

http://doc.bioperl.org/bioperl-live/Bio/Seq.html#POD24

$seq_obj->get_SeqFeatures() returns a list of SeqFeature objects, which are references; printing a reference just shows its class and memory address (e.g. Bio::SeqFeature::Generic=HASH(0x20a30898)).  So this worked as expected.

I usually write these as script files, so I've never done it all with perl -e.  But you need to iterate over the array and query each object for the information you want about its feature.
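A minimal sketch of iterating over the feature objects and querying each one, assuming a GenBank file such as nt.gb. The helper name feature_summaries() and the tab-separated layout are illustrative choices, not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: describe each feature as
# "primary_tag<TAB>start<TAB>end<TAB>strand", followed by one
# "<TAB>tag=value" line per qualifier (/gene, /product, ...).
sub feature_summaries {
    my ($seq) = @_;
    my @lines;
    for my $feat ($seq->get_SeqFeatures) {
        push @lines, join("\t", $feat->primary_tag, $feat->start,
                                $feat->end, $feat->strand || 0);
        my @tags = $feat->get_all_tags;
        for my $tag (sort @tags) {
            # A qualifier can hold several values; join them with commas.
            push @lines, "\t$tag=" . join(',', $feat->get_tag_values($tag));
        }
    }
    return @lines;
}

# Usage: perl features.pl nt.gb
if (@ARGV) {
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $in->next_seq) {
        print join("\n", $seq->display_id, feature_summaries($seq)), "\n";
    }
}
```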


> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb
> Bio::Seq::RichSeq=HASH(0x20a3e7b0)

->translate returns a new Seq object.  I think 

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate()->seq(); ' nt.gb

should work (haven't tried it).
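Translating the whole record from base 1, as in the one-liner above, also translates any 5' UTR of an mRNA. A minimal sketch that instead translates each CDS feature, assuming the record has CDS features and ignoring any /codon_start or /transl_table qualifiers (frame 1, standard code); the helper name cds_proteins() is hypothetical. In GenBank records the CDS feature often also carries a /translation qualifier, readable with get_tag_values('translation').

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: translate every CDS feature of a record.
# spliced_seq() returns the feature's nucleotides (joined for split
# locations, reverse-complemented on the minus strand); translate()
# returns a new sequence object, and seq() gives its string.
sub cds_proteins {
    my ($seq) = @_;
    return map  { $_->spliced_seq->translate->seq }
           grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;
}

# Usage: perl translate_cds.pl nt.gb
if (@ARGV) {
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $in->next_seq) {
        print join("\t", $seq->display_id, $_), "\n" for cds_proteins($seq);
    }
}
```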

Jim

> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

=====================================
Jim Hu
Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054


From bosborne11 at verizon.net  Wed Jul 11 13:47:25 2012
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 11 Jul 2012 13:47:25 -0400
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <4FFDB879.1020906@gmail.com>
References: <4FFDB879.1020906@gmail.com>
Message-ID: <DBC94184-BB61-47EB-BC2A-D306E94BC25A@verizon.net>

Dejian,

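These are not "failures". The get_SeqFeatures() method returns a list of feature objects (here Bio::SeqFeature::Generic objects), and translate() returns a new sequence object. Printing an object directly just shows its class and memory address.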

Start here:

www.bioperl.org/wiki/HOWTO:Beginners

Brian O.



From cjfields at illinois.edu  Wed Jul 11 15:02:46 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 11 Jul 2012 19:02:46 +0000
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
Message-ID: <FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>

Peng, 

Has this been filed as a bug yet?  

    https://redmine.open-bio.org/projects/bioperl

Seems like it would be fairly easy to fix, but I want to track it just in case.

chris


From p.j.a.cock at googlemail.com  Wed Jul 11 17:00:56 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 11 Jul 2012 22:00:56 +0100
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
	<FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
Message-ID: <CAKVJ-_4mo16NxEbZkNp3-EG9uknc4Fo4m6ajQWTknumQNUn4Yw@mail.gmail.com>

On Wed, Jul 11, 2012 at 8:02 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> Peng,
>
> Has this been filed as a bug yet?
>
>     https://redmine.open-bio.org/projects/bioperl
>
> Seems like it would be fairly easy to fix, but I want to track it just in case.
>
> chris

Hi all,

This could be the unfortunate fact that hmmscan and
hmmsearch return very similar tabular output, but
with query and hit interchanged, i.e. you need some
extra information to know which way round they are
(not possible with the current output). This was an
issue in Bow's Biopython SearchIO project - which
for the moment he solved by handling this as two
hmmer file formats. In the medium term we're hoping
hmmer3 will add some header information or something.

Peter

From zhoupenggeni at gmail.com  Wed Jul 11 13:45:00 2012
From: zhoupenggeni at gmail.com (Peng Zhou)
Date: Wed, 11 Jul 2012 10:45:00 -0700 (PDT)
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
Message-ID: <a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>

Hello guys,

Just a follow-up: it seems to me the bioperl-live version still has the 
same problem, labelling the hit as "query" and the query sequence as 
"hit". I also looked into the test script written for hmmer3 
(bioperl-live/t/SearchIO/hmmer.t), and it doesn't check the alignment 
part; I guess that's why this bug was not discovered.

To keep it simple, here's some output from hmmsearch v3.0:
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.0 (March 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm
# target sequence database:        /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa
# output directed to file:         /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt
# number of worker threads:        4
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       CRP0000  [M=75]
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                 Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                 -----------
    5.5e-25   95.0  14.4    5.7e-25   95.0  10.0    1.0  1  Chr2_540228_540404_+     

Domain annotation for each sequence (and alignments):
>> Chr2_540228_540404_+  
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !   95.0  10.0   3.6e-30   5.7e-25      20      74 ..       4      59 .]       1      59 [] 0.95

  Alignments for each domain:
  == domain 1    score: 95.0 bits;  conditional E-value: 3.6e-30
               CRP0000 20 tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74
                          ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c
  Chr2_540228_540404_+  4 GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59
                          568899***99********************************************* PP

And here is a dump of the parsed HSP object:
$VAR1 = bless( {
                 'VERBOSE' => 0,
                 'IDENTICAL' => 0,
                 'RANK' => 1,
                 'STRANDED' => 'NONE',
                 'EVALUE' => '3.6e-30',
                 'HSP_LENGTH' => 56,
                 'ALGORITHM' => 'HMMSEARCH',
                 'SCORE' => '95.0',
                 'GAP_SYMBOL' => '-',
                 'CONSERVED' => 0,
                 
                 'HIT_NAME' => 'Chr2_540228_540404_+',
                 'HIT_DESC' => '',
                 'HIT_START' => '20',
                 'HIT_END' => '74',
                 'HIT_LENGTH' => 56,
                 'HIT_SEQ' => 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc',
                 'HIT_FRAME' => 0,
                 
                 'QUERY_NAME' => 'CRP0000',
                 'QUERY_DESC' => undef,
                 'QUERY_START' => '4',
                 'QUERY_END' => '59',
                 'QUERY_LENGTH' => '75',
                 'QUERY_FRAME' => 0,
                 'QUERY_SEQ' => 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC',
                 
                 'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c',
               }, 'Bio::Search::HSP::HMMERHSP' );

Clearly, the "HIT_START", "HIT_END", "HIT_SEQ" should actually be exchanged 
with "QUERY_START", "QUERY_END" and "QUERY_SEQ" values.

Thanks,

Peng

On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote:
>
> I'll try the bioperl-live version. Thanks guys. 
>
> Scott Givan
> 541-740-4685
> Sent from an iPhone (so expect typos). 
>
> On Jul 19, 2011, at 10:34 PM, "Chris Fields" <cjfields at illinois.edu> wrote:
>
> > This might be a disconnect between the HMMER3 version in bioperl-live and the one in Kai's bioperl-hmmer3 repo.  I believe the one in bioperl-live is newer.  Scott, can you give that a try?
> > 
> > chris
> > 
> > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote:
> > 
> >> Hi Scott,
> >> 
> >> Thanks for writing. I'm on the road at the moment so I have to be briefer and less thorough than I'd like to be.
> >> 
> >> What you are observing is not the intended behavior. Oddly, it's not what I recall obtaining in my tests on this software, though I was mostly interested in hmmsearch at the time and may have been sloppier than I should have been when it came to hmmscan.
> >> 
> >> What version of HMMER3 are you using? There have been some small formatting changes in the past that might be causing a burp in the parser, though I doubt it.
> >> 
> >> Kai Blin wrote some test scripts (found here: bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate query/hit coordinates. It might be worth giving these a shot if you haven't already.
> >> 
> >> Also, if you don't mind, I'm happy to run your code on your output file on my end. It might help me diagnose the problem.
> >> 
> >> Sorry this is being a thorn in your side! I've cc'ed the list in case anyone else has insight into this matter.
> >> 
> >> Best,
> >> Thomas
> >> 
> >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote:
> >> 
> >>> Hi Thomas,
> >>> 
> >>> I'm using modules in the bioperl-hmmer3 git repository to parse hmmscan
> >>> reports. When I parse the files and walk through the HSP's like:
> >>> 
> >>> while (my $hit = $rslt->next_model) {
> >>> 
> >>>    while (my $domain = $hit->next_hsp) {
> >>> 
> >>> And retrieve the "hit" coordinates like:
> >>> 
> >>>        print "hit coords: ", $domain->start('hit'), "-", $domain->end('hit'), "\n";
> >>> 
> >>> The coordinates returned correspond to what I would call the "query",
> >>> since they are for the sequence I fed to hmmscan to search the profile
> >>> database. Likewise, when retrieving the query coordinates like
> >>> $domain->start('query'), I get what I consider the "hit" coordinates,
> >>> since they are for the domain profile. Is this the intended behavior?
> >>> 
> >>> Thanks.
> >>> 
> >>> scott
> >>> 
> >>> -- 
> >>> Scott A. Givan
> >>> Associate Director
> >>> Informatics Research Core Facility
> >>> 240e Bond Life Sciences Center
> >>> Research Assistant Professor
> >>> Molecular Microbiology and Immunology
> >>> University of Missouri, Columbia
> >>> 
> >>> TEL 573-882-2948
> >>> FAX 573-884-9676
> >>> http://ircf.rnet.missouri.edu
> >>> 
> >>> 
> >>> 
> >> 
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > 
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From zhoupenggeni at gmail.com  Wed Jul 11 13:45:00 2012
From: zhoupenggeni at gmail.com (Peng Zhou)
Date: Wed, 11 Jul 2012 10:45:00 -0700 (PDT)
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
Message-ID: <a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>

Hello guys,

Just a follow-up, it seems to me the bioperl-live version is still having 
the same problem - calling hit "query" while query sequence "hit". I also 
looked into the test script written for hmmer3 (
bioperl-live/t/SearchIO/hmmer.t), and it doesn't deal with the alignment 
part - I guess that's why this bug was not discovered.

To be simple, here's an output of hmmsearch v3.0:
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.0 (March 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm
# target sequence database:        /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa
# output directed to file:         /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt
# number of worker threads:        4
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       CRP0000  [M=75]
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence             Description
    ------- ------ -----    ------- ------ -----   ---- --  --------             -----------
    5.5e-25   95.0  14.4    5.7e-25   95.0  10.0    1.0  1  Chr2_540228_540404_+

Domain annotation for each sequence (and alignments):
>> Chr2_540228_540404_+
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !   95.0  10.0   3.6e-30   5.7e-25      20      74 ..       4      59 .]       1      59 [] 0.95

  Alignments for each domain:
  == domain 1    score: 95.0 bits;  conditional E-value: 3.6e-30
               CRP0000 20 tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74
                          ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c
  Chr2_540228_540404_+  4 GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59
                          568899***99********************************************* PP

And here is a dump of the parsed HSP object:
$VAR1 = bless( {
                 'VERBOSE' => 0,
                 'IDENTICAL' => 0,
                 'RANK' => 1,
                 'STRANDED' => 'NONE',
                 'EVALUE' => '3.6e-30',
                 'HSP_LENGTH' => 56,
                 'ALGORITHM' => 'HMMSEARCH',
                 'SCORE' => '95.0',
                 'GAP_SYMBOL' => '-',
                 'CONSERVED' => 0,

                 'HIT_NAME' => 'Chr2_540228_540404_+',
                 'HIT_DESC' => '',
                 'HIT_START' => '20',
                 'HIT_END' => '74',
                 'HIT_LENGTH' => 56,
                 'HIT_SEQ' => 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc',
                 'HIT_FRAME' => 0,

                 'QUERY_NAME' => 'CRP0000',
                 'QUERY_DESC' => undef,
                 'QUERY_START' => '4',
                 'QUERY_END' => '59',
                 'QUERY_LENGTH' => '75',
                 'QUERY_FRAME' => 0,
                 'QUERY_SEQ' => 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC',

                 'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c',
               }, 'Bio::Search::HSP::HMMERHSP' );

Clearly, the "HIT_START", "HIT_END" and "HIT_SEQ" values should be swapped
with the "QUERY_START", "QUERY_END" and "QUERY_SEQ" values.
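
For comparison, a minimal plain-Perl sketch of the mapping described above. It is not how Bio::SearchIO::hmmer3 is implemented, and the sub name hmmsearch_domain_coords is hypothetical; it assumes the 16-column HMMER 3.0 domain-table layout shown in the output. In an hmmsearch report the profile HMM is the query, so hmmfrom/hmm to should become the query coordinates and alifrom/ali to the hit coordinates. For the row above that gives query 20-74 and hit 4-59, the reverse of the dumped HSP.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): split one row of the HMMER 3.0
# "Domain annotation" table on whitespace and label its coordinates the way
# an hmmsearch report should be read: the profile HMM is the query and the
# target sequence is the hit.
sub hmmsearch_domain_coords {
    my ($row) = @_;
    my @f = split ' ', $row;    # awk-style split: ignores leading spaces
    # Columns: # flag score bias c-Evalue i-Evalue hmmfrom hmmto <bounds>
    #          alifrom alito <bounds> envfrom envto <bounds> acc (16 fields)
    die "unexpected domain row: $row\n" unless @f == 16;
    return {
        query_start => $f[6],     # hmmfrom: position in the query HMM
        query_end   => $f[7],     # hmm to
        hit_start   => $f[9],     # alifrom: position in the target sequence
        hit_end     => $f[10],    # ali to
    };
}

# Domain row copied from the hmmsearch output above.
my $row = '   1 !   95.0  10.0   3.6e-30   5.7e-25      20      74 ..       4      59 .]       1      59 [] 0.95';
my $c   = hmmsearch_domain_coords($row);
printf "query %d-%d, hit %d-%d\n", @{$c}{qw(query_start query_end hit_start hit_end)};
```

Run as-is, this prints "query 20-74, hit 4-59". For hmmscan output the two pairs would have to be assigned the other way round.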

Thanks,

Peng

On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote:
>
> I'll try the bioperl-live version. Thanks guys. 
>
> Scott Givan
> 541-740-4685
> Sent from an iPhone (so expect typos). 
>
> On Jul 19, 2011, at 10:34 PM, "Chris Fields" <cjfields at illinois.edu> wrote:
>
> > This might be a disconnect between the HMMER3 version in bioperl-live
> > and the one in Kai's bioperl-hmmer3 repo.  I believe the one in
> > bioperl-live is newer.  Scott, can you give that a try?
> >
> > chris
> >
> > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote:
> >
> >> Hi Scott,
> >>
> >> Thanks for writing. I'm on the road at the moment so I have to be
> >> briefer and less thorough than I'd like to be.
> >>
> >> What you are observing is not the intended behavior. Oddly, it's not
> >> what I recall obtaining in my tests on this software, though I was mostly
> >> interested in hmmsearch at the time and may have been sloppier than I
> >> should have been when it came to hmmscan.
> >>
> >> What version of HMMER3 are you using? There have been some small
> >> formatting changes in the past that might be causing a burp in the parser,
> >> though I doubt it.
> >>
> >> Kai Blin wrote some test scripts (found here:
> >> bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate
> >> query/hit coordinates. It might be worth giving this a shot if you haven't
> >> already.
> >>
> >> Also, if you don't mind, I'm happy to run your code on your output file
> >> on my end. It might help me diagnose the problem.
> >>
> >> Sorry this is being a thorn in your side! I've cc'ed the list in case
> >> anyone else has insight into this matter.
> >>
> >> Best,
> >> Thomas
> >>
> >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote:
> >>
> >>> Hi Thomas,
> >>>
> >>> I'm using modules in the bioperl-hmmer3 git repository to parse hmmscan
> >>> reports. When I parse the files and walk through the HSPs like:
> >>>
> >>> while (my $hit = $rslt->next_model) {
> >>>
> >>>    while (my $domain = $hit->next_hsp) {
> >>>
> >>> And retrieve the "hit" coordinates like:
> >>>
> >>>        print "hit coords: ", $domain->start('hit'), "-", $domain->end('hit'), "\n";
> >>> 
> >>> The coordinates returned correspond to what I would call the "query",
> >>> since they are for the sequence I fed to hmmscan to search the profile
> >>> database. Likewise, when retrieving the query coordinates like
> >>> $domain->start('query'), I get what I consider the "hit" coordinates,
> >>> since they are for the domain profile. Is this the intended behavior?
> >>> 
> >>> Thanks.
> >>> 
> >>> scott
> >>> 
> >>> -- 
> >>> Scott A. Givan
> >>> Associate Director
> >>> Informatics Research Core Facility
> >>> 240e Bond Life Sciences Center
> >>> Research Assistant Professor
> >>> Molecular Microbiology and Immunology
> >>> University of Missouri, Columbia
> >>> 
> >>> TEL 573-882-2948
> >>> FAX 573-884-9676
> >>> http://ircf.rnet.missouri.edu
> >>> 
> >>> 
> >>> 

From zhoupenggeni at gmail.com  Wed Jul 11 14:03:17 2012
From: zhoupenggeni at gmail.com (Peng Zhou)
Date: Wed, 11 Jul 2012 11:03:17 -0700 (PDT)
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <4FFDB879.1020906@gmail.com>
References: <4FFDB879.1020906@gmail.com>
Message-ID: <25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com>

Hi, 

I think the commands are doing what they are supposed to do: the
get_SeqFeatures() method returns a list of feature objects (here
Bio::SeqFeature::Generic), and the translate() method returns a Bio::Seq
object (here a Bio::Seq::RichSeq). You can't simply "print" an object in
Perl, because that only prints its class and memory address; you can
"dump" it, though:

$ perl -e 'use Bio::SeqIO; use Data::Dumper; my $seqio=Bio::SeqIO->new(-file=>shift); print Dumper($seqio->next_seq()->get_SeqFeatures());' nt.gb

$ perl -e 'use Bio::SeqIO; use Data::Dumper; my $seqio=Bio::SeqIO->new(-file=>shift); print Dumper($seqio->next_seq()->translate());' nt.gb
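
Printing an object only shows the reference (class name plus memory address), which is exactly what the last two commands returned; to get at the data you call an accessor method on the object. In BioPerl that would be, for example, $seqio->next_seq()->translate()->seq() for the protein string, or primary_tag(), start() and end() on each feature returned by get_SeqFeatures(). Note that translate() on a whole mRNA record starts at position 1, so the CDS feature's "translation" tag (via get_tag_values('translation')) may be closer to what you want. The plain-Perl sketch below uses a hypothetical ToySeq class, not part of BioPerl, only to show the difference between printing a reference and calling an accessor:

```perl
use strict;
use warnings;

# ToySeq is a hypothetical stand-in, not a BioPerl class. Like many Perl
# objects (including BioPerl's), it is a blessed hash reference.
package ToySeq;

sub new {
    my ($class, $seq) = @_;
    return bless { seq => $seq }, $class;
}

# Accessor returning the stored sequence string.
sub seq {
    my ($self) = @_;
    return $self->{seq};
}

package main;

my $obj = ToySeq->new('MAKV');
print "$obj\n";           # the reference itself, e.g. ToySeq=HASH(0x...)
print $obj->seq, "\n";    # the data: MAKV
```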

On Wednesday, July 11, 2012 12:31:37 PM UTC-5, De-Jian Zhao wrote:
>
> Hi, 
>
> I downloaded a nucleotide sequence from GenBank (file name: nt.gb) and 
> tested the Bio::SeqIO module as follows. The first 3 commands succeeded; 
> however, the last 2 failed. 
>
> I think $seqio->next_seq() produces a Bio::Seq object which contains the 
> sequence, features and annotation (according to the DESCRIPTION of 
> "perldoc Bio::Seq"), and thus invoking the methods 
> get_SeqFeatures() and translate() should be valid. However, the results 
> contradicted this idea. 
>
> Can anyone explain what happened with the last 2 commands? I have 
> encountered numerous failures when testing the bioperl methods. 
> I want to translate the mRNA sequence and extract the sequence features. 
> What are the right commands? Thanks a lot! 
>
> Best, 
> Dejian 
>
>
>
> PS: The commands and results 
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
> print $seqio->next_seq()->display_id(); ' nt.gb 
> NM_053056 
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
> print $seqio->next_seq()->molecule(); ' nt.gb 
> mRNA 
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
> print $seqio->next_seq()->subseq(1,6); ' nt.gb 
> CACACG 
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
> print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb 
> Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) 
>
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
> print $seqio->next_seq()->translate(); ' nt.gb 
> Bio::Seq::RichSeq=HASH(0x20a3e7b0) 
>
>
>
>

From zhoupenggeni at gmail.com  Wed Jul 11 16:05:56 2012
From: zhoupenggeni at gmail.com (Peng Zhou)
Date: Wed, 11 Jul 2012 13:05:56 -0700 (PDT)
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
	<FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
Message-ID: <ee477075-8d1a-4172-a4f2-7c51f45d9224@googlegroups.com>

Thanks Chris, here is the link to the bug I filed:
https://redmine.open-bio.org/issues/3369

On Wednesday, July 11, 2012 2:02:46 PM UTC-5, Christopher Fields wrote:
>
> Peng, 
>
> Has this been filed as a bug yet?   
>
>     https://redmine.open-bio.org/projects/bioperl 
>
> Seems like it would be fairly easy to fix, but I want to track it just in case. 
>
> chris 
>

From w.arindrarto at gmail.com  Wed Jul 11 17:25:44 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 11 Jul 2012 23:25:44 +0200
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <CAKVJ-_4mo16NxEbZkNp3-EG9uknc4Fo4m6ajQWTknumQNUn4Yw@mail.gmail.com>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
	<FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
	<CAKVJ-_4mo16NxEbZkNp3-EG9uknc4Fo4m6ajQWTknumQNUn4Yw@mail.gmail.com>
Message-ID: <CADEGkF5mE=1Kgy5X4BX9VH0TuChU4OyKuf3e-rzH_SeB=xxapA@mail.gmail.com>

Hi everyone,

Just as an additional info that might be useful:

The current Biopython parser for the plain text format parses the very
first line to find out which HMMER flavor produced the result. Both 'hmm
from' and 'hmmto' are query coordinates if the flavor is hmmsearch
or phmmer, and they're hit coordinates if the flavor is hmmscan.
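A minimal Perl sketch of that approach, for anyone who wants to work around the swapped coordinates in the meantime. It assumes the first line of a HMMER3 plain-text report looks like `# hmmscan :: search sequence(s) against a profile database`. The helper names (`hmmer_flavor`, `hmm_coord_role`, `assign_coords`) are hypothetical and are not part of BioPerl's hmmer3 parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl): pull the program name out of the first
# line of a HMMER3 plain-text report, assumed to look like
#   "# hmmscan :: search sequence(s) against a profile database"
sub hmmer_flavor {
    my ($first_line) = @_;
    return $first_line =~ /^#\s+(\S+)\s+::/ ? $1 : undef;
}

# Which side do the HMM coordinates ("hmm from"/"hmm to") belong to?
# hmmscan: the HMMs are the hits; hmmsearch/phmmer: the HMM is the query.
sub hmm_coord_role {
    my ($flavor) = @_;
    return 'hit'   if $flavor eq 'hmmscan';
    return 'query' if $flavor eq 'hmmsearch' || $flavor eq 'phmmer';
    die "Unhandled HMMER flavor: $flavor\n";
}

# Assign one domain's HMM and alignment coordinates to query/hit.
sub assign_coords {
    my ($flavor, %c) = @_;    # keys: hmm_from, hmm_to, ali_from, ali_to
    my @hmm = @c{qw(hmm_from hmm_to)};
    my @ali = @c{qw(ali_from ali_to)};
    return hmm_coord_role($flavor) eq 'hit'
        ? (query => \@ali, hit => \@hmm)
        : (query => \@hmm, hit => \@ali);
}

my $flavor = hmmer_flavor('# hmmscan :: search sequence(s) against a profile database');
my %coords = assign_coords($flavor,
    hmm_from => 3, hmm_to => 120, ali_from => 45, ali_to => 160);
print "query: @{$coords{query}}  hit: @{$coords{hit}}\n";   # query: 45 160  hit: 3 120
```

The flavor is detected once per report, and each domain's coordinates are then routed through `assign_coords`. Coordinates are never hard-wired to 'query' or 'hit'.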

This information is not available in other HMMER command line output
formats (tblout and domtblout), which as Peter has mentioned, required
us to treat different flavors of the table output as different formats
for the time being.
Fortunately, after contacting the HMMER developers they mentioned that this
is not the case anymore in their development branch (and their future
planned release).
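The tabular formats carry no flavor information, so a parser has to be told which program produced them. Below is a rough Perl sketch of that idea. It assumes the usual `--domtblout` layout: 22 whitespace-separated columns, followed by a free-text description. Counting from 1, "hmm from/to" sits in columns 16-17 and "ali from/to" in columns 18-19. `parse_domtbl_line` is a hypothetical name, the example line is made up, and you should check the column positions against your HMMER version.

```perl
use strict;
use warnings;

# Hypothetical parser for one data line of HMMER3 --domtblout output.
# The table does not record which program wrote it, so the caller passes
# the flavor ('hmmscan', 'hmmsearch' or 'phmmer') explicitly.
sub parse_domtbl_line {
    my ($line, $flavor) = @_;
    return if $line =~ /^#/ || $line !~ /\S/;    # skip comment and blank lines
    my @f = split ' ', $line, 23;                # 22 columns + description
    die "Too few columns in domtblout line\n" if @f < 22;
    my %d = (
        target   => $f[0],
        query    => $f[3],
        i_evalue => $f[12],
        hmm      => [ @f[15, 16] ],   # profile coordinates
        ali      => [ @f[17, 18] ],   # sequence coordinates
    );
    if ($flavor eq 'hmmscan') {       # query = sequence, target = profile
        @d{qw(query_coords hit_coords)} = @d{qw(ali hmm)};
    }
    elsif ($flavor eq 'hmmsearch' || $flavor eq 'phmmer') {   # query = profile
        @d{qw(query_coords hit_coords)} = @d{qw(hmm ali)};
    }
    else {
        die "Unhandled HMMER flavor: $flavor\n";
    }
    return \%d;
}

# Made-up example line, for illustration only.
my $line = join ' ', qw(PF00389 PF00389.25 133 seq1 - 400 1e-30 105.2 0.1
    1 1 2e-33 3e-30 104.0 0.1 2 130 15 150 10 155 0.95 example description);
my $dom = parse_domtbl_line($line, 'hmmscan');
print "query: @{$dom->{query_coords}}  hit: @{$dom->{hit_coords}}\n";   # query: 15 150  hit: 2 130
```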

Hope that helps :),
Bow


On Wed, Jul 11, 2012 at 11:00 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Wed, Jul 11, 2012 at 8:02 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> Peng,
>>
>> Has this been filed as a bug yet?
>>
>>     https://redmine.open-bio.org/projects/bioperl
>>
>> Seems like it would be fairly easy to fix, but I want to track it just in case.
>>
>> chris
>
> Hi all,
>
> This could be the unfortunate fact that hmmscan and
> hmmsearch return very similar tabular output, but
> with query and hit interchanged. i.e. You need some
> extra information to know which way round they are
> (not possible with the current output). This was an
> issue in Bow's Biopython SearchIO project - which
> for the moment he solved by handling this as two
> hmmer file formats. In the medium term we're hoping
> hmmer3 will add some header information or something.
>
> Peter

From dejian.zhao at gmail.com  Thu Jul 12 01:04:54 2012
From: dejian.zhao at gmail.com (De-Jian Zhao)
Date: Thu, 12 Jul 2012 13:04:54 +0800
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com>
References: <4FFDB879.1020906@gmail.com>
	<25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com>
Message-ID: <4FFE5AF6.1020300@gmail.com>

Thank you, Peng. That's great! I have actually been wondering recently how to see the
whole content of an object; dumping it is a good solution.

On 2012-7-12 2:03, Peng Zhou wrote:
> Hi,
>
> I guess that's what the commands are supposed to do: the get_SeqFeatures()
> method returns an array of Bio::SeqFeature objects, and the translate()
> method returns a Bio::Seq object. And you can't simply "print" an object in
> Perl - you can "dump" it, though:
>
> $ perl -e ' use Bio::SeqIO; use Data::Dumper; my
> $seqio=Bio::SeqIO->new(-file=>shift);
> print Dumper($seqio->next_seq()->get_SeqFeatures()); ' nt.gb
>
> $ perl -e ' use Bio::SeqIO; use Data::Dumper; my
> $seqio=Bio::SeqIO->new(-file=>shift);
> print Dumper($seqio->next_seq()->translate()); ' nt.gb
>
> On Wednesday, July 11, 2012 12:31:37 PM UTC-5, De-Jian Zhao wrote:
>> Hi,
>>
>> I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and
>> tested the Bio::SeqIO module as follows. The first 3 commands succeeded;
>> however the last 2 failed.
>>
>> I think $seqio->next_seq() produces a Bio::Seq object which contains the
>> sequence, features and annotation (according to the DESCRIPTION of
>> "perldoc Bio::Seq") and thus the invocation of the methods
>> get_SeqFeatures() and translate() should be valid. However, the results
>> contradicted this idea.
>>
>> Could anyone explain what happened with the last 2 commands? I have
>> encountered numerous failures when testing the BioPerl methods.
>> I want to translate the mRNA sequence and extract the sequence features.
>> What are the right commands? Thanks a lot!
>>
>> Best,
>> Dejian
>>
>>
>>
>> PS: The commands and results
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift);
>> print $seqio->next_seq()->display_id(); ' nt.gb
>> NM_053056
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift);
>> print $seqio->next_seq()->molecule(); ' nt.gb
>> mRNA
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift);
>> print $seqio->next_seq()->subseq(1,6); ' nt.gb
>> CACACG
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift);
>> print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb
>> Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18)
>>
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift);
>> print $seqio->next_seq()->translate(); ' nt.gb
>> Bio::Seq::RichSeq=HASH(0x20a3e7b0)
>>
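The small Perl sketch below shows why the last two commands print things like `Bio::SeqFeature::Generic=HASH(0x20a30898)`. It uses a hypothetical stand-in class, `My::Feature`, rather than BioPerl itself. Printing a blessed reference shows only its class and memory address. Data::Dumper, as Peng suggests, shows the data stored inside the object. Calling accessor methods gives the values you usually want. BioPerl feature objects provide methods named `primary_tag`, `start` and `end` in the same spirit.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in class (NOT BioPerl) for a sequence feature.
package My::Feature;
sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}
sub primary_tag { $_[0]{primary_tag} }
sub start       { $_[0]{start} }
sub end         { $_[0]{end} }

package main;

my @features = (
    My::Feature->new(primary_tag => 'source', start => 1,   end => 2000),
    My::Feature->new(primary_tag => 'CDS',    start => 120, end => 950),
);

# Printing objects only stringifies the references: My::Feature=HASH(0x...)
print @features, "\n";

# Data::Dumper shows the data stored inside one object.
$Data::Dumper::Sortkeys = 1;
print Dumper($features[1]);

# Usually more useful: call accessor methods on each object.
print join("\t", $_->primary_tag, $_->start, $_->end), "\n" for @features;
```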


From dejian.zhao at gmail.com  Thu Jul 12 01:14:33 2012
From: dejian.zhao at gmail.com (De-Jian Zhao)
Date: Thu, 12 Jul 2012 13:14:33 +0800
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu>
References: <4FFDB879.1020906@gmail.com>
	<9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu>
Message-ID: <4FFE5D39.6010406@gmail.com>

Thank you, Jim. You are right. It works. This example deepens my 
understanding of OOP.

On 2012-7-12 2:01, Jim Hu wrote:
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb
>> Bio::Seq::RichSeq=HASH(0x20a3e7b0)
> ->translate returns a new Seq object.  I think
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate()->seq(); ' nt.gb
>
> should work (haven't tried it).
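Jim's point is that `translate()` hands back a new sequence object rather than a string. The sketch below illustrates this with another hypothetical stand-in class, `My::Seq`. Its codon table is deliberately tiny (unknown codons become `X`) and exists only to make the example runnable. It is not how BioPerl translates.

```perl
use strict;
use warnings;

# Hypothetical stand-in class (NOT BioPerl) whose translate() returns a new
# object, mirroring what Jim describes for Bio::Seq.
package My::Seq;

# Deliberately tiny codon table, for illustration only; other codons become 'X'.
my %CODON = (ATG => 'M', GCC => 'A', TGG => 'W', AAA => 'K', TAA => '*');

sub new {
    my ($class, %args) = @_;
    my $self = { seq => uc $args{seq} };
    return bless $self, $class;
}

sub seq { $_[0]{seq} }

sub translate {
    my ($self) = @_;
    my @codons = $self->seq =~ /(...)/g;    # split into triplets
    my $aa = join '', map { $CODON{$_} // 'X' } @codons;
    return My::Seq->new(seq => $aa);        # a new object, not a string
}

package main;

my $mrna = My::Seq->new(seq => 'atggcctggaaataa');
my $prot = $mrna->translate;

print $prot, "\n";                    # the object itself: My::Seq=HASH(0x...)
print $prot->seq, "\n";               # MAWK*
print $mrna->translate->seq, "\n";    # chained, as in Jim's one-liner: MAWK*
```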


From kai.blin at biotech.uni-tuebingen.de  Thu Jul 12 09:43:19 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 12 Jul 2012 15:43:19 +0200
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <CADEGkF5mE=1Kgy5X4BX9VH0TuChU4OyKuf3e-rzH_SeB=xxapA@mail.gmail.com>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
	<FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
	<CAKVJ-_4mo16NxEbZkNp3-EG9uknc4Fo4m6ajQWTknumQNUn4Yw@mail.gmail.com>
	<CADEGkF5mE=1Kgy5X4BX9VH0TuChU4OyKuf3e-rzH_SeB=xxapA@mail.gmail.com>
Message-ID: <4FFED477.3090907@biotech.uni-tuebingen.de>


On 2012-07-11 23:25, Wibowo Arindrarto wrote:

Hi,

> The current Biopython parser for the plain text format parses the
> very first line to find out which HMMER flavor produces the result.
> Both 'hmm from' and 'hmmto' are query coordinates if the flavor is
> hmmsearch or phmmer; and they're hit coordinates if the flavor is
> hmmscan.

Whoops. I mostly looked at hmmscan when writing the parser, because
that's the file format I needed for my code. The code clearly should
follow the way the hmmer2 parser works, and differentiate between
hmmsearch and hmmscan type output.

As I said on the bug report, I'm happy to look at code fixing this.

> This information is not available in other HMMER command line
> output formats (tblout and domtblout), which as Peter has
> mentioned, required us to treat different flavors of the table
> output as different formats for the time being.

As far as I'm aware, BioPerl currently doesn't parse the table output
format.

Seeing how much repeated pain we run into with all these parsers in
the different Bio* projects, I wonder if there is a smarter way to
deal with parsing. Perhaps at least a shared grammar file that we
could use for testing, to make sure we have the same
expectations about file formats in the different language
implementations. Ideally we'd auto-generate the parsers from the
grammar specification, but I guess that will remain wishful thinking for
quite a while.

> Fortunately, after contacting the HMMER developers they mentioned
> that this is not the case anymore in their development branch (and
> their future planned release).

That's certainly good news. :)

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben



From cjfields at illinois.edu  Thu Jul 12 11:24:13 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 12 Jul 2012 15:24:13 +0000
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <4FFED477.3090907@biotech.uni-tuebingen.de>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
	<FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
	<CAKVJ-_4mo16NxEbZkNp3-EG9uknc4Fo4m6ajQWTknumQNUn4Yw@mail.gmail.com>
	<CADEGkF5mE=1Kgy5X4BX9VH0TuChU4OyKuf3e-rzH_SeB=xxapA@mail.gmail.com>
	<4FFED477.3090907@biotech.uni-tuebingen.de>
Message-ID: <1C3A31F9-9717-49F3-A880-FA725D0F3CDB@illinois.edu>

On Jul 12, 2012, at 8:43 AM, Kai Blin wrote:

> On 2012-07-11 23:25, Wibowo Arindrarto wrote:
> 
> Hi,
> 
>> The current Biopython parser for the plain text format parses the
>> very first line to find out which HMMER flavor produces the result.
>> Both 'hmm from' and 'hmmto' are query coordinates if the flavor is
>> hmmsearch or phmmer; and they're hit coordinates if the flavor is
>> hmmscan.
> 
> Whoops. I mostly looked at hmmscan when writing the parser, because
> that's the file format I needed for my code. The code clearly should
> follow the way the hmmer2 parser works, and differentiate between
> hmmsearch and hmmscan type output.
> 
> As I said on the bug report, I'm happy to look at code fixing this.

Seems like it should be easy enough to address if there is something in the output that indicates the report type.

>> This information is not available in other HMMER command line
>> output formats (tblout and domtblout), which as Peter has
>> mentioned, required us to treat different flavors of the table
>> output as different formats for the time being.
> 
> As far as I'm aware, BioPerl currently doesn't parse the table output
> format.

The only reason to do so is if the table provides additional information the actual hits don't (this can be the case with BLAST reports).

> Seeing how much repeated pain we run into with all these parsers in
> the different Bio* projects, I wonder if there was a smarter way to
> deal with parsing. Maybe at least some shared grammar file that we
> could use for testing, to make sure we at least have the same
> expectations about file formats in the different language
> implementations. Ideally we'd auto-generate the parsers from the
> grammar specification, but I guess that'll stay wishful thinking for
> quite a bit.

I would fully support something like this. I've been thinking about this with Marpa::XS (which now has a compiled library, libmarpa, to make it less Perl-centric), and there has been talk of using a similar toolkit with the BioRuby folks. We could always have a plain Perl/Python/Ruby/etc. fallback for the most common formats.

chris

From buschj at hhu.de  Sun Jul 15 15:46:42 2012
From: buschj at hhu.de (jobu)
Date: Sun, 15 Jul 2012 21:46:42 +0200
Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences of
	Blast Matches
Message-ID: <50031E22.3060902@hhu.de>

Dear All.

Still being a beginner in Perl and having just started to look into
BioPerl, I hope I am asking my question in the right place.

I ran a local standalone blastn search of many short query sequences
against a set of target FASTA sequences consisting of whole-chromosome
sequence data.

What I need to do now is extract, say, 100 nt upstream and 100 nt
downstream of each BLAST match from my target sequences.

At this point I can only assume that BioPerl might be helpful for
this task, though I haven't yet found a module that does this
locally on my hard drive.

Thus I would be thankful for even the slightest hint on where to begin.

Sincerely
Jochen

From Russell.Smithies at agresearch.co.nz  Sun Jul 15 17:19:14 2012
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 16 Jul 2012 09:19:14 +1200
Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences
 of	Blast Matches
In-Reply-To: <50031E22.3060902@hhu.de>
References: <50031E22.3060902@hhu.de>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCF2A4CAA@exchsth.agresearch.co.nz>

Hi Jochen,
I don't think BioPerl can directly manipulate blast databases so I'd probably do it with fastacmd to extract the sequence from the original blast database.
eg.
fastacmd -s X51494.1 -d /dataset/blastdata/active/nt -L 100,200
>gi|20090|emb|X51494.1|:100-200 Rice prolamin gene (strain NE4)
ATGATGCAAACGTTGGGCATGGGTAGCTCCACAGCCATGTTCATGTCGCAGCCAATGGCGCTCCTGCAGCAGCAATGTTG
CATGCAGCTACAAGGCATGAT

Or if you're using blast+, use the blastdbcmd command:
eg.
blastdbcmd -entry  X51494.1 -db /dataset/blastdata/active/nt -range 100-200
>gi|20090|emb|X51494.1|:100-200 Rice prolamin gene (strain NE4)
ATGATGCAAACGTTGGGCATGGGTAGCTCCACAGCCATGTTCATGTCGCAGCCAATGGCGCTCCTGCAGCAGCAATGTTG
CATGCAGCTACAAGGCATGAT

So, to put it all together: use BioPerl to parse your existing BLAST results and pull out each hit's coordinates, then use a system call to run fastacmd or blastdbcmd to extract the sequence from the BLAST database, and finally write the sequences to a file.
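Below is a minimal Perl sketch of the coordinate arithmetic. It assumes you already have each hit's subject start and end, plus the subject sequence length, from the parsed BLAST report. The helper names and the entry and database names (`chr1`, `/path/to/targets`) are hypothetical. The command line reuses only the `-entry`, `-db` and `-range` options shown above. In this sketch, 'left' and 'right' mean lower and higher plus-strand coordinates, so for a minus-strand hit the biological upstream region is on the right. Looking up entries by ID with blastdbcmd also generally requires the database to have been built with `makeblastdb -parse_seqids`.

```perl
use strict;
use warnings;

# Hypothetical helper: flanking ranges (1-based, inclusive) on either side of
# a hit on a subject sequence of length $seq_len, clamped to the sequence ends.
# 'left'/'right' mean lower/higher plus-strand coordinates.
sub flanking_ranges {
    my ($start, $end, $flank, $seq_len) = @_;
    ($start, $end) = ($end, $start) if $start > $end;    # minus-strand hits
    my %r;
    if ($start > 1) {
        my $from = $start - $flank;
        $r{left} = [ $from < 1 ? 1 : $from, $start - 1 ];
    }
    if ($end < $seq_len) {
        my $to = $end + $flank;
        $r{right} = [ $end + 1, $to > $seq_len ? $seq_len : $to ];
    }
    return \%r;
}

# Build (but do not run) a blastdbcmd argument list for one range, using
# only the -entry, -db and -range options shown above.
sub blastdbcmd_args {
    my ($entry, $db, $range) = @_;
    return ('blastdbcmd', '-entry', $entry, '-db', $db,
            '-range', join('-', @$range));
}

# Example: a minus-strand hit at 5000..4900 on a 1 Mb chromosome, 100 nt flanks.
my $ranges = flanking_ranges(5000, 4900, 100, 1_000_000);
for my $side (qw(left right)) {
    next unless $ranges->{$side};
    my @cmd = blastdbcmd_args('chr1', '/path/to/targets', $ranges->{$side});
    print "$side: @cmd\n";
    # system(@cmd) == 0 or warn "blastdbcmd failed: $?\n";   # uncomment to run
}
```

Passing a list to `system` avoids shell quoting issues with IDs or paths. Clamping means hits near the chromosome ends simply get shorter flanks.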

These might be useful:
http://www.bioperl.org/wiki/HOWTO:SearchIO
http://www.bioperl.org/wiki/HOWTO:SearchIO#Speed_improvements_with_lightweight_objects 
http://www.bioperl.org/wiki/HOWTO:BlastPlus
http://www.bioperl.org/wiki/HOWTO:StandAloneBlast


--Russell



From dcmertens.perl at gmail.com  Tue Jul 17 08:57:55 2012
From: dcmertens.perl at gmail.com (David Mertens)
Date: Tue, 17 Jul 2012 07:57:55 -0500
Subject: [Bioperl-l] Announcing The Quantified Onion Google Group and
	perl4science.github.com
Message-ID: <CA+4ieYXGs=ywH97gEdTOakVLp22n0gPuwZJK-C75C7ijUzPgwg@mail.gmail.com>

Hello everybody -

I returned from YAPC::NA this year intending to build up the scientific
Perl community. One outgrowth of this has been Joel Berger's creation of
perl4science.github.com and gizmomathboy's creation of The Quantified Onion
Google Group (https://groups.google.com/forum/#%21forum/the-quantified-onion).

perl4science is meant to be a landing page for anybody looking to combine
Perl and science. Since it is a github repository, it makes it about as
easy as possible for others to contribute content or fixes. If you have a
project that scientists would find useful, you should fork the project, add
your content, and issue a pull request. It's that easy.

The Quantified Onion is meant to be a space for scientists to discuss how
we use Perl in our science and to work together to grow adoption of Perl
among scientists. It will undoubtedly attract newcomers to Perl asking
beginner questions, at which point we will gently refer them to the
appropriate manual pages. Interesting discussions thus far (to my mind)
include one about teaching test-driven design and one
about submitting an article to Computing in Science and Engineering for
their November issue, which is supposed to be about Modern Programming
Languages. I would like to begin putting on workshops on Perl for
Scientists and Engineers (and encourage others to do the same), and I will
begin that discussion on The Quantified Onion.

If you know of other Perl science resources, please feel free to add them
to perl4science or post them on The Quantified Onion, and please join The
Quantified Onion. Together, we can grow Perl's adoption among scientists!

David Mertens

-- 
 "Debugging is twice as hard as writing the code in the first place.
  Therefore, if you write the code as cleverly as possible, you are,
  by definition, not smart enough to debug it." -- Brian Kernighan

From cjfields at illinois.edu  Wed Jul 18 10:29:02 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 18 Jul 2012 14:29:02 +0000
Subject: [Bioperl-l] [blast-announce] OLD_BLAST parameter to be
 discontinued. Alternative NCBI BLAST parsable formats are available
References: <CC2C3DA1.A01D6%mcginnis@ncbi.nlm.nih.gov>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF2B63F6C5@CHIMBX5.ad.uillinois.edu>

Not sure if anyone is using this as a means of getting their reports (I don't), but I'm posting this here just in case.

-c

Begin forwarded message:

> From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]" 
> Subject: [blast-announce] OLD_BLAST parameter to be discontinued. Alternative NCBI BLAST parsable formats are available
> Date: July 18, 2012 9:17:05 AM CDT
> To: NLM/NCBI List blast-announce
> 
> Beginning Sept. 10, 2012, the BLAST service will ignore the OLD_BLAST parameter in posted URLs. We are removing this old and little used option to prepare for upcoming enhancements to the BLAST service later this year. Setting OLD_BLAST=true produces an older version of the BLAST HTML results that a few people have used for automated processing (parsing) of results. NCBI BLAST supports a number of different and more stable formats for parsing. These include XML, tabular reports and ASN.1. For more information, please see BLAST Developer Information (http://1.usa.gov/O8AocI) and links on that page.
> 


From dejian.zhao at gmail.com  Wed Jul 18 11:36:14 2012
From: dejian.zhao at gmail.com (De-Jian Zhao)
Date: Wed, 18 Jul 2012 23:36:14 +0800
Subject: [Bioperl-l] Which graphic module should I learn?
Message-ID: <5006D7EE.1020205@gmail.com>

Hi, all.

Currently I am working on a genome, and I need to draw some figures based on
the sequencing data. Since I will keep using the module in future
projects, I want to learn a popular one so I can get good support from
the community. I searched CPAN with the command "i /SVG/" and got 234
items. Which one is popular in bioinformatics? Which module should I
start with? Thanks for any suggestions.

Best,
De-Jian

From scott at scottcain.net  Wed Jul 18 11:46:01 2012
From: scott at scottcain.net (Scott Cain)
Date: Wed, 18 Jul 2012 11:46:01 -0400
Subject: [Bioperl-l] Which graphic module should I learn?
In-Reply-To: <5006D7EE.1020205@gmail.com>
References: <5006D7EE.1020205@gmail.com>
Message-ID: <CA+JTaow0sJ-MTq47r1vvvu+H2zPu=U8VuEi++s-YDxcTo1+Ybg@mail.gmail.com>

Hi De-Jian,

Of course, it depends on what you want to do, but if you're referring
to the genome feature/annotation type graphics, Bio::Graphics already
supports SVG pretty well, via GD::SVG.

Scott


On Wed, Jul 18, 2012 at 11:36 AM, De-Jian Zhao <dejian.zhao at gmail.com> wrote:
> Hi, all.
>
> Currently I am working on a genome. I will draw some pictures based on the
> sequencing data. In the long run, I will use the module in my future
> projects, so I want to learn a popular module to get better support from the
> community. I searched in cpan with the command "i /SVG/" and got 234 items.
> Which one is popular in bioinformatics? Which module should I start with?
> Thanks for any suggestions.
>
> Best,
> De-Jian


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu  Tue Jul 24 23:08:05 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 25 Jul 2012 03:08:05 +0000
Subject: [Bioperl-l] BioPerl and Travis-CI
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>

Peter Cock has graciously helped start up a branch for bioperl-live that is using Travis-CI (a nice continuous integration tool).  Results from Peter's fork are found here:

    http://travis-ci.org/#!/peterjc/bioperl-live

As this is now pulled into the main bioperl repo, results will be here:

    http://travis-ci.org/#!/bioperl/bioperl-live

I'll be working on this and expect this will be added to master in the next few days.  

chris

From p.j.a.cock at googlemail.com  Wed Jul 25 06:31:13 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 25 Jul 2012 11:31:13 +0100
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
Message-ID: <CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>

On Wed, Jul 25, 2012 at 4:08 AM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> Peter Cock has graciously helped start up a branch for bioperl-live
> that is using Travis-CI (a nice continuous integration tool).  Results
> from Peter's fork are found here:
>
>     http://travis-ci.org/#!/peterjc/bioperl-live
>
> As this is now pulled into the main bioperl repo, results will be here:
>
>     http://travis-ci.org/#!/bioperl/bioperl-live
>
> I'll be working on this and expect this will be added to master in
> the next few days.
>
> chris

We've had this running for Biopython for a month now, and it has
been a useful complement to the BuildBot (which covers other
operating systems). This was following BioRuby's lead:
http://biopython.org/pipermail/biopython-dev/2012-June/009742.html

The current BioPerl Travis configuration is probably usable right
now (after changing the branch whitelist to either master, or simply
all branches).

Other remaining issues include sorting out which dependencies
should be installed, and streamlining their verbose output (e.g.
using tail).

TravisCI can send out emails (e.g. on test failures), and perhaps
bioperl-guts-l might be a sensible place to send these. Initially
we'd disabled the emails for Biopython. I'd like to use an RSS
feed... there is a JSON API which BioRuby are using for
http://www.biogems.info/ which tracks their plugins.

Peter

From p.j.a.cock at googlemail.com  Fri Jul 27 11:03:05 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 27 Jul 2012 16:03:05 +0100
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
	<CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>
	<CAKVJ-_45r1Hza-ga95oyMRbYGzupsKtmAZ=ac29g2n-U9SegWA@mail.gmail.com>
	<118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu>
Message-ID: <CAKVJ-_4iYB0N7A6eOEivrKTkeQZzw=9jiS7kqVCKR2aeEiZZaQ@mail.gmail.com>

On Fri, Jul 27, 2012 at 3:58 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> On Jul 27, 2012, at 9:47 AM, Peter Cock wrote:
>
>> On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>>
>>> That's done now - except for the circular dependencies, and GD,
>>> which might be easy to solve if anyone knows what the error
>>> means - see commit message here:
>>> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a
>>
>> Re: https://twitter.com/cjfields/status/228861370454638592
>> Not sure why you got GD to work when something very similar
>> had failed for me. Oh well - job done :)
>
> It was the lack of gdlib-config in the libgd2-xpm package, you need
> libgd2-xpm-dev.  One of the fun things about Debian packaging.

Ah - I should have guessed that.

>>> Would a single clean commit of the (current) .travis.yml file be
>>> preferable to the current series of commits? And do you want
>>> a pull request, or would you just merge/cherry-pick manually?
>>
>> Given all the churn between our revisions, personally I'd opt for
>> a single clean commit to bioperl/master - but your call.
>>
>> Peter
>
> Yep, about to merge it over.  It's working now, just need to
> whitelist master instead of travis after the merge.

I'd removed the whitelist altogether here:
https://github.com/peterjc/bioperl-live/commit/96dc5866f4406179353909c72d812623341c8fbd

My thinking was BioPerl seems to have multiple feature branches
under the official repo, so they should get tested too. You'd be
in a better position than me to judge what would work best for
BioPerl here.

Peter

From cjfields at illinois.edu  Fri Jul 27 10:58:21 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 27 Jul 2012 14:58:21 +0000
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <CAKVJ-_45r1Hza-ga95oyMRbYGzupsKtmAZ=ac29g2n-U9SegWA@mail.gmail.com>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
	<CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>
	<CAKVJ-_45r1Hza-ga95oyMRbYGzupsKtmAZ=ac29g2n-U9SegWA@mail.gmail.com>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu>

On Jul 27, 2012, at 9:47 AM, Peter Cock wrote:

> On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> 
>> That's done now - except for the circular dependencies, and GD,
>> which might be easy to solve if anyone knows what the error
>> means - see commit message here:
>> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a
> 
> Re: https://twitter.com/cjfields/status/228861370454638592
> Not sure why you got GD to work when something very similar
> had failed for me. Oh well - job done :)

It was the lack of gdlib-config in the libgd2-xpm package, you need libgd2-xpm-dev.  One of the fun things about Debian packaging.

>> Would a single clean commit of the (current) .travis.yml file be
>> preferable to the current series of commits? And do you want
>> a pull request, or would you just merge/cherry-pick manually?
> 
> Given all the churn between our revisions, personally I'd opt for
> a single clean commit to bioperl/master - but your call.
> 
> Peter

Yep, about to merge it over.  It's working now, just need to whitelist master instead of travis after the merge.

chris


From cjfields at illinois.edu  Fri Jul 27 12:26:34 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 27 Jul 2012 16:26:34 +0000
Subject: [Bioperl-l] BioPerl Travis-CI now live
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D54D@CITESMBX5.ad.uillinois.edu>

All commits to bioperl-live master branch on github are now being tracked:

   http://travis-ci.org/#!/bioperl/bioperl-live

The .travis.yml file has a whitelist of branches to be tested; if anyone wants to test additional branches, feel free to add them to the list!

chris

From cjfields at illinois.edu  Fri Jul 27 11:15:19 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 27 Jul 2012 15:15:19 +0000
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <CAKVJ-_4iYB0N7A6eOEivrKTkeQZzw=9jiS7kqVCKR2aeEiZZaQ@mail.gmail.com>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
	<CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>
	<CAKVJ-_45r1Hza-ga95oyMRbYGzupsKtmAZ=ac29g2n-U9SegWA@mail.gmail.com>
	<118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4iYB0N7A6eOEivrKTkeQZzw=9jiS7kqVCKR2aeEiZZaQ@mail.gmail.com>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D2D6@CITESMBX5.ad.uillinois.edu>


On Jul 27, 2012, at 10:03 AM, Peter Cock wrote:

> On Fri, Jul 27, 2012 at 3:58 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> On Jul 27, 2012, at 9:47 AM, Peter Cock wrote:
>> 
>>> On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>>> 
>>>> That's done now - except for the circular dependencies, and GD,
>>>> which might be easy to solve if anyone knows what the error
>>>> means - see commit message here:
>>>> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a
>>> 
>>> Re: https://twitter.com/cjfields/status/228861370454638592
>>> Not sure why you got GD to work when something very similar
>>> had failed for me. Oh well - job done :)
>> 
>> It was the lack of gdlib-config in the libgd2-xpm package, you need
>> libgd2-xpm-dev.  One of the fun things about Debian packaging.
> 
> Ah - I should have guessed that.
> 
>>>> Would a single clean commit of the (current) .travis.yml file be
>>>> preferable to the current series of commits? And do you want
>>>> a pull request, or would you just merge/cherry-pick manually?
>>> 
>>> Given all the churn between our revisions, personally I'd opt for
>>> a single clean commit to bioperl/master - but your call.
>>> 
>>> Peter
>> 
>> Yep, about to merge it over.  It's working now, just need to
>> whitelist master instead of travis after the merge.
> 
> I'd removed the whitelist altogether here:
> https://github.com/peterjc/bioperl-live/commit/96dc5866f4406179353909c72d812623341c8fbd
> 
> My thinking was BioPerl seems to have multiple feature branches
> under the official repo, so they should get tested too. You'd be
> in a better position than me to judge what would work best for
> BioPerl here.
> 
> Peter

We'll keep it to master for now.  It's pretty easy to add branches as needed, and I didn't want to expand to all the potentially stale branches unless explicitly set (we need to triage all those at some point).

chris

From p.j.a.cock at googlemail.com  Fri Jul 27 10:47:18 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 27 Jul 2012 15:47:18 +0100
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
	<CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>
Message-ID: <CAKVJ-_45r1Hza-ga95oyMRbYGzupsKtmAZ=ac29g2n-U9SegWA@mail.gmail.com>

On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> That's done now - except for the circular dependencies, and GD,
> which might be easy to solve if anyone knows what the error
> means - see commit message here:
> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a

Re: https://twitter.com/cjfields/status/228861370454638592
Not sure why you got GD to work when something very similar
had failed for me. Oh well - job done :)

> Would a single clean commit of the (current) .travis.yml file be
> preferable to the current series of commits? And do you want
> a pull request, or would you just merge/cherry-pick manually?

Given all the churn between our revisions, personally I'd opt for
a single clean commit to bioperl/master - but your call.

Peter

From robfsouza at gmail.com  Fri Jul 27 18:29:22 2012
From: robfsouza at gmail.com (Robson de Souza)
Date: Fri, 27 Jul 2012 15:29:22 -0700 (PDT)
Subject: [Bioperl-l] obf sites offline?
Message-ID: <9bef8a3b-08ca-4868-be7a-193e7596290d@googlegroups.com>

I can't access any of the OBF sites, either from work (USA) or from my phone.
Is there something going on?
Robson

From p.j.a.cock at googlemail.com  Thu Jul 26 11:22:26 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 26 Jul 2012 16:22:26 +0100
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
Message-ID: <CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>

On Wed, Jul 25, 2012 at 11:31 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Wed, Jul 25, 2012 at 4:08 AM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> Peter Cock has graciously helped start up a branch for bioperl-live
>> that is using Travis-CI (a nice continuous integration tool).  Results
>> from Peter's fork are found here:
>>
>>     http://travis-ci.org/#!/peterjc/bioperl-live
>>
>> As this is now pulled into the main bioperl repo, results will be here:
>>
>>     http://travis-ci.org/#!/bioperl/bioperl-live
>>
>> I'll be working on this and expect this will be added to master in
>> the next few days.
>>
>> chris
>
> We've had this running for Biopython for a month now, and it has
> been a useful complement to the BuildBot (which covers other
> operating systems). This was following BioRuby's lead:
> http://biopython.org/pipermail/biopython-dev/2012-June/009742.html
>
> The current BioPerl Travis configuration is probably usable right
> now (after changing the branch whitelist to either master, or simply
> all branches).
>
> Other remaining issues include sorting out which dependencies
> should be installed, and streamlining their verbose output (e.g.
> using tail).

That's done now - except for the circular dependencies, and GD,
which might be easy to solve if anyone knows what the error
means - see commit message here:
https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a

Would a single clean commit of the (current) .travis.yml file be
preferable to the current series of commits? And do you want
a pull request, or would you just merge/cherry-pick manually?

> TravisCI can send out emails (e.g. on test failures), and perhaps
> bioperl-guts-l might be a sensible place to send these. Initially
> we'd disabled the emails for Biopython. I'd like to use an RSS
> feed... there is a JSON API which BioRuby are using for
> http://www.biogems.info/ which tracks their plugins.

I've filed an issue for news feed support in TravisCI,
https://github.com/travis-ci/travis-core/issues/82

Regards,

Peter

From p.j.a.cock at googlemail.com  Tue Jul 31 06:37:35 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 31 Jul 2012 11:37:35 +0100
Subject: [Bioperl-l] Travis Continuous Integration testing & pull requests
Message-ID: <CAKVJ-_5Yt8O21Wo0o6p1VgjheaaPYBDQEbF6geaABb1M++GfRA@mail.gmail.com>

Hi all,

I'm cross-posting as this is an announcement. Please keep any
follow-up discussion to the relevant project-specific mailing list,
or, if it is general, to open-bio-l.

Those following the OBF blog or the OBF or Bio* Twitter accounts
will have already seen this, which I posted yesterday:
http://news.open-bio.org/news/2012/07/travis-ci-for-testing/

In summary, since earlier this year BioRuby and then Biopython and
BioPerl have been using Travis-CI.org (a hosted continuous
integration service for the open source community) to run their unit
tests automatically whenever their GitHub repositories are updated.

In addition we now have TravisCI automatically running our tests
on any new GitHub pull requests - supported by an OBF donation
to Travis-CI, see:
http://about.travis-ci.org/blog/announcing-pull-request-support/

Currently BioJava only uses GitHub as an SVN mirror - but this
should still let you start using TravisCI for automated testing:
http://about.travis-ci.org/docs/user/languages/java/

For EMBOSS, this is another incentive to convert from CVS to
github - TravisCI recently announced support for C/C++ projects:
http://about.travis-ci.org/blog/support_for_go_c_and_cpp/
http://about.travis-ci.org/docs/user/languages/c/

Potentially there are other OBF projects where this would be
useful too.

Regards,

Peter

From shalabh.sharma7 at gmail.com  Mon Jul  2 13:09:57 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Jul 2012 13:09:57 -0400
Subject: [Bioperl-l] translation frame problem in bioperl
Message-ID: <CAA7rn9cnqjsMv-ggCn_H+WavqdtCZZfh37v8pkLak1-ct8UX+A@mail.gmail.com>

Hi All,
I am just confused about the translation frames. I used BioPerl to
parse a BLASTX report.
The report shows that the frame used is -2, but when I translate the sequence
using EMBOSS or some other program the frame is -1.
Am I doing something wrong here?

Here is the sequence:
>gi|378759230|gb|AHBJ01000169.1| SAR86 cluster bacterium SAR86D scf1120176765857, whole genome shotgun sequence 2642:3697
AGCTTCCCATGGAACCCATGCAAGTGCAATATTTGTTTCTAGCTCTGGTGACCACCAAGGAGATGTCACGTAGCCCACCTCATCTTCATCAGTATTAGTTACTATCCAAAAATCAGAAGCATAATCTGTGATTTCTTTTCCTCCAAGGGTTAAACCAACCATCTTCATTTTAAATGGTGCATTTCCTTCATCTATGATTGCTCTCTGTTTTTCAAGCTCTTCTTTACCAATGTAATCAGCTGCTTTATTTCTTGGTACCTGATAACTTAAATTAACCTGAAAGGGAGAAGTTTCATGATCCAGATCTTGTCCCCAAGACAAAATTCCAGCTGCAATGCGACGATGATGCGCAGGAGCTATGACCATTAAGCCAAATTCTTCTCCAGCCTCAAGAACAGCATTCCACATTTTTTCTGCATTATCATGTGCGTCACGAACATATATTTCATAACCTTTTTCGCCTGTAAAACCAGTTTGACTGATTACACAATCAGCTCCACCAACCTGAGTTTCTAAAATTCCATAATAAGGAACTTCTCTTAACTCTTCGCCAGCTAACTTTGCCATAAGATCTTCAGATAAAGGGCCTTGAATTTGAACAGGACAAACATCAATCTCATCAATTTCTACGTCATATTTTTTAGACACATTTACGCCTTGAAGCCAAAGTAAGAGATCGCTGTCTGATATTGAGAACCAGAATTCATCTTCTGTTAGTCTTAATAGAACAGGGTCATTTAAAACCCCTCCTTTTTCATTGCATAAAATCGCATATTTACCATTTCCGGGTTTAATTTTTGTAGCATCACGAGTTATTACATAATCTGTAAAAGCTTCTGCATCTGGACCTTTTACTCTTATCTGTCTTTCAACAGCAACATTCCACATAGTAACTCTATTAACCAAGGCTTCGTATTCAACCATGGCACCGCCATCTTCAGGTTTTACATAGCCTCGTGGATGATAAATTCGATTATATACAGTTGCTCTCCAACAGCCCGCTTCATGAGATAGATGCCAAAAAGGCGATTTTCTTACCCGGGTTGAAATTAATAA

This is a part of blast report by bioperl:
>JCVI_READ_1105499496127 /Indian_Ocean/gcvT
          Length = 352

 Score =  655 bits (1690), Expect = 0.0
 Identities = 311/352 (88%), Positives = 329/352 (93%)
 Frame = -2

Query: 3697 LLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYVKPEDGGAMVEYEALVNRVTMWNV 3518
            +LISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGY+KPEDGGAMVEY+ALVNRVTMWNV
Sbjct: 1    MLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYIKPEDGGAMVEYDALVNRVTMWNV 60
 .....
.....
Query: 2797 GLTLGGKEITDYASDFWIVTNTDEDEVGYVTSPWWSPELETNIALAWVPWEA 2642
            GLTLGGKEITDYA DFW+V + D   +     PWWSPEL TNIAL WVPW A
Sbjct: 301  GLTLGGKEITDYAPDFWLVADMDGMMLDISLPPWWSPELNTNIALGWVPWSA 352

This is EMBOSS output (from EBI):

>EMBOSS_001_4
LLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYVKPEDGGAMVEYEALVNRVTMWNV
AVERQIRVKGPDAEAFTDYVITRDATKIKPGNGKYAILCNEKGGVLNDPVLLRLTEDEFW
FSISDSDLLLWLQGVNVSKKYDVEIDEIDVCPVQIQGPLSEDLMAKLAGEELREVPYYGI
LETQVGGADCVISQTGFTGEKGYEIYVRDAHDNAEKMWNAVLEAGEEFGLMVIAPAHHRR
IAAGILSWGQDLDHETSPFQVNLSYQVPRNKAADYIGKEELEKQRAIIDEGNAPFKMKMV
GLTLGGKEITDYASDFWIVTNTDEDEVGYVTSPWWSPELETNIALAWVPWEA
>EMBOSS_001_5
INFNPGKKIAFLASIS*SGLLESNCI*SNLSSTRLCKT*RWRCHG*IRSLG**SYYVECC
C*KTDKSKRSRCRSFYRLCNNS*CYKN*TRKW*ICDFMQ*KRRGFK*PCSIKTNRR*ILV

......

You can see it's frame -1.

I would really appreciate your help.


Thanks

Shalabh

-- 
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636


From scott at scottcain.net  Mon Jul  2 14:50:45 2012
From: scott at scottcain.net (Scott Cain)
Date: Mon, 2 Jul 2012 14:50:45 -0400
Subject: [Bioperl-l] GMOD Summer School application deadline
Message-ID: <CA+JTaoym218vQ9_VvtwvMW9ZaoHfcu1hxG59L2z=ovE6qWN6Mw@mail.gmail.com>

Hello,

The deadline to apply for the GMOD Summer School is in one week, July
9th.  The application is available as a Google Form:

  https://docs.google.com/spreadsheet/embeddedform?formkey=dG5hNGFiQ3UwYTV2LUZxZW04Qm1yZXc6MQ

In the GMOD Summer School (August 24-29, 2012) we will cover the
installation, configuration and use of a variety of GMOD tools,
including Chado, GBrowse, JBrowse and Tripal.  For more information on
the course, see the course web page at

  http://gmod.org/wiki/2012_GMOD_Summer_School

The course will make heavy use of the Amazon Web Service (aka, the
Cloud) via a grant from Amazon.  Enrollment is limited to 24 students,
and the application process is competitive: the last few years we've
received over 75 applications for those 24 spots.

I look forward to seeing you in North Carolina in August!

Scott


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From p.j.a.cock at googlemail.com  Mon Jul  2 15:34:40 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 2 Jul 2012 20:34:40 +0100
Subject: [Bioperl-l] translation frame problem in bioperl
In-Reply-To: <CAA7rn9cnqjsMv-ggCn_H+WavqdtCZZfh37v8pkLak1-ct8UX+A@mail.gmail.com>
References: <CAA7rn9cnqjsMv-ggCn_H+WavqdtCZZfh37v8pkLak1-ct8UX+A@mail.gmail.com>
Message-ID: <CAKVJ-_5nX7f2ZejEpeVnMhd35PYxxrJX2mxu4A712NPOxvie0g@mail.gmail.com>

On Mon, Jul 2, 2012 at 6:09 PM, shalabh sharma
<shalabh.sharma7 at gmail.com> wrote:
> Hi All,
> I am just confused about the translation frames. I used BioPerl to
> parse a BLASTX report.
> The report shows that the frame used is -2, but when I translate the sequence
> using EMBOSS or some other program the frame is -1.
> Am I doing something wrong here?

Possibly there are conflicting definitions of frames -1, -2, and -3 here
(and that's leaving out the possibility of -0, -1 and -2 counting). Some
will count from the first base (start of the forward strand), others from
the last base (start of the reverse strand). This can make comparing the output
of different tools quite confusing.
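Here is a small Perl sketch that makes the two conventions concrete. The helper `reverse_frame_start` and the labels 'last' and 'first' are made up for illustration; they are not BioPerl or EMBOSS code. For reverse frame -n, the helper returns the forward-strand position (counted from 1) of the first base that gets translated. Counting from the last base puts frame -1 at the final base. The other convention reuses the codons of forward frame +n. If I read the description of EMBOSS transeq's `-alternative` option correctly, transeq uses the second convention by default.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
# Forward-strand position (1-based) of the first base read when translating
# reverse frame -$n (n = 1, 2 or 3) of a sequence of length $len.
#   'last'  : reverse frames are counted from the last base
#   'first' : reverse frame -n reuses the codon set of forward frame +n
sub reverse_frame_start {
    my ($len, $n, $convention) = @_;
    die "frame must be 1, 2 or 3\n" unless defined $n && $n =~ /^[123]\z/;
    return $len - $n + 1 if $convention eq 'last';
    if ($convention eq 'first') {
        my $codons = int(($len - $n + 1) / 3);   # complete codons in frame +n
        return $n + 3 * $codons - 1;             # last base of the last one
    }
    die "unknown convention '$convention'\n";
}

# Compare the two conventions for three lengths (one per value of length mod 3).
for my $len (1056, 1057, 1058) {
    for my $n (1 .. 3) {
        printf "length %d, frame -%d: last-base start %d, first-base start %d\n",
            $len, $n,
            reverse_frame_start($len, $n, 'last'),
            reverse_frame_start($len, $n, 'first');
    }
}
```

For any length, exactly one of the three reverse-frame labels agrees between the two conventions, and the other two are swapped. For example, when the length is divisible by 3, -1 agrees and -2/-3 swap. So the same reading frame can carry different names in different tools.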

Peter


From shalabh.sharma7 at gmail.com  Mon Jul  2 16:39:29 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Jul 2012 16:39:29 -0400
Subject: [Bioperl-l] translation frame problem in bioperl
In-Reply-To: <98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net>
References: <CAA7rn9cnqjsMv-ggCn_H+WavqdtCZZfh37v8pkLak1-ct8UX+A@mail.gmail.com>
	<CAKVJ-_5nX7f2ZejEpeVnMhd35PYxxrJX2mxu4A712NPOxvie0g@mail.gmail.com>
	<98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net>
Message-ID: <CAA7rn9eiKh6EszMsV=Vk65rJPywOUJijP0Q=rYg_vt=xoK-0=Q@mail.gmail.com>

Hi Peter and Brian,
                           Thanks a lot for your reply.
I have already taken this into account.
So if I parse the BLAST report (my previous example),
I get strand '-1' and frame '1' (according to BioPerl),
so if we convert it to the general convention it's -2, because BioPerl counts
frames from 0.
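The conversion described here takes one line of Perl. BioPerl reports a strand of 1 or -1 plus a 0-based frame, while BLAST prints a signed, 1-based frame. This is only a sketch: `blast_frame` is a hypothetical helper, not part of BioPerl. With Bio::SearchIO, the inputs would presumably come from something like `$hsp->query->strand` and `$hsp->query->frame`.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a BioPerl-style (strand, 0-based frame) pair
# into the signed, 1-based frame printed in BLAST reports (+1..+3, -1..-3).
sub blast_frame {
    my ($strand, $frame) = @_;
    die "strand must be 1 or -1\n" unless $strand == 1 || $strand == -1;
    die "frame must be 0, 1 or 2\n" unless grep { $frame == $_ } 0 .. 2;
    return $strand * ($frame + 1);
}

# The case from this thread: strand -1, frame 1 in BioPerl terms.
print blast_frame(-1, 1), "\n";   # prints -2
```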

Also, for BioPerl, forward-frame translation is working fine.


Thanks
Shalabh
On Mon, Jul 2, 2012 at 4:24 PM, Brian Osborne <bosborne11 at verizon.net> wrote:

> Shalabh,
>
> Also take a look at this:
>
> http://www.bioperl.org/wiki/HOWTO:SearchIO#frame.28.29
>
> Brian O.
>
>
> On Jul 2, 2012, at 3:34 PM, Peter Cock wrote:
>
> > On Mon, Jul 2, 2012 at 6:09 PM, shalabh sharma
> > <shalabh.sharma7 at gmail.com> wrote:
> >> Hi All,
> >> I am just confused about the translation frames. I used BioPerl to
> >> parse a BLASTX report.
> >> The report shows that the frame used is -2, but when I translate the sequence
> >> using EMBOSS or some other program the frame is -1.
> >> Am I doing something wrong here?
> >
> > Possibly there are conflicting definitions of frames -1, -2, and -3 here
> > (and that's leaving out the possibility of -0, -1 and -2 counting). Some
> > will count from the first base (start of the forward strand), others from
> > the last base (start of the reverse strand). This can make comparing the output
> > of different tools quite confusing.
> >
> > Peter
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


-- 
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636


From bosborne11 at verizon.net  Mon Jul  2 16:24:24 2012
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 02 Jul 2012 16:24:24 -0400
Subject: [Bioperl-l] translation frame problem in bioperl
In-Reply-To: <CAKVJ-_5nX7f2ZejEpeVnMhd35PYxxrJX2mxu4A712NPOxvie0g@mail.gmail.com>
References: <CAA7rn9cnqjsMv-ggCn_H+WavqdtCZZfh37v8pkLak1-ct8UX+A@mail.gmail.com>
	<CAKVJ-_5nX7f2ZejEpeVnMhd35PYxxrJX2mxu4A712NPOxvie0g@mail.gmail.com>
Message-ID: <98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net>

Shalabh,

Also take a look at this:

http://www.bioperl.org/wiki/HOWTO:SearchIO#frame.28.29

Brian O.


On Jul 2, 2012, at 3:34 PM, Peter Cock wrote:

> On Mon, Jul 2, 2012 at 6:09 PM, shalabh sharma
> <shalabh.sharma7 at gmail.com> wrote:
>> Hi All,
>> I am just confused about the translation frames. I used BioPerl to
>> parse a BLASTX report.
>> The report shows that the frame used is -2, but when I translate the sequence
>> using EMBOSS or some other program the frame is -1.
>> Am I doing something wrong here?
> 
> Possibly there are conflicting definitions of frames -1, -2, and -3 here
> (and that's leaving out the possibility of -0, -1 and -2 counting). Some
> will count from the first base (start of the forward strand), others from
> the last base (start of the reverse strand). This can make comparing the output
> of different tools quite confusing.
> 
> Peter
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From vebaev at gmail.com  Tue Jul  3 12:35:26 2012
From: vebaev at gmail.com (vebaev at gmail.com)
Date: Tue, 3 Jul 2012 09:35:26 -0700 (PDT)
Subject: [Bioperl-l] CFP - International Conference on Bioinformatics and
 Computational Biology - BIOCOMP BG 2012
Message-ID: <7b498b4c-2b2e-4e1f-871f-513203488bf1@googlegroups.com>

International Conference on Bioinformatics and Computational Biology - 
BIOCOMP BG 2012 
September 20-21, 2012, Varna, Bulgaria
 

Dear Colleague,
It is our pleasure to circulate the 2nd announcement of the International 
Conference on Bioinformatics and Computational Biology - BIOCOMP 2012 
(http://biocomp.bio.uni-plovdiv.bg/).
 
Keynote speakers
 
Prof. Dr. Klaas Vandepoele - Ghent University, Belgium
Dr. Andreas Gisel - Institute for Biomedical Technologies, Italy
Prof. Wojciech Karlowski - Institute of Molecular Biology and Biotechnology, Poland
Prof. Mario A. Fares - University of Dublin, Trinity College, Ireland
Dr. Andrey Kajava - CRBM - Macromolecular Biochemistry Research Center, France
Dr. Gaurav Sablok - Istituto Agrario San Michele (IASMA), Italy
 
Topics
 
Topics of interest include, but are not limited to:
 
High-performance bio-computing
High-throughput sequencing data analysis (NGS)
Bio-ontologies
Molecular evolution
Comparative genomics
Molecular modeling and simulation
Computational genetics 
Computational proteomics
Data mining and visualization
Software tools and applications
Gene expression analysis
Gene networks
Structural biology
Genome analysis
Databases
Systems biology
 
Special topic: bioinformatics and miRNAs
 
Recent achievements in these fields will be presented. The conference will
include plenary and poster sessions. Participants' proposals will be taken
under advisement in compiling the program.
 
Publications
 
All accepted abstracts will be published in the conference abstract book.
The best 20 abstracts will be peer-reviewed and published as full-text
manuscripts in a Special Issue of Springer and Elsevier journals:
Interdisciplinary Sciences: Computational Life Sciences (ISSN: 1867-1462).
Journal of Computational Science (ISSN: 1877-7503)


Venue
The venue of the conference is the 4-star all-inclusive Sunny Day Black Sea
resort, Bulgaria.
 
Registration and abstract submission
 
All actions related to BIOCOMP 2012 (abstract submission,
registration, etc.) may be completed via the Conference website at
http://biocomp.bio.uni-plovdiv.bg/
 
Accommodation
 
IMPORTANT: Accommodation is included in the conference registration fee.
 
Important dates
 
Abstract Submission Deadline - 20 August 2012
Early Registration Fee Payment Deadline - 20 August 2012
Arrival, Poster Set-up, Registration - 19 September 2012
Plenary and Poster Sessions - 20-21 September 2012
 
 
You can find details of the Conference on the Conference website at
http://biocomp.bio.uni-plovdiv.bg/
 
 
Looking forward to seeing you in Bulgaria!


------------------------------------------------


Dr. Vesselin Baev
Research Assistant Professor

University of Plovdiv
Dept. Plant Phys. and Molecular Biology
Bioinformatics SMART Group

Tzar Assen 24,Plovdiv 4000, BULGARIA
Office:+359 32 261 (560); Mobile:+359 89 43 80 945
vebaev at gmail.com; baev at uni-plovdiv.bg; CV: http://plantgene.eu/


From tarakaramji at gmail.com  Tue Jul  3 15:33:43 2012
From: tarakaramji at gmail.com (Tarakaramji Moturu)
Date: Tue, 3 Jul 2012 19:33:43 +0000 (UTC)
Subject: [Bioperl-l] Invitation to connect on LinkedIn
Message-ID: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod>

LinkedIn
------------


I'd like to add you to my professional network on LinkedIn.

- Tarakaramji

Tarakaramji Moturu
Student at GITAM University
Vishakhapatnam Area, India

Confirm that you know Tarakaramji Moturu:
https://www.linkedin.com/e/1505z7-h47dlkop-69/isd/7726719493/9xC087NO/?hs=false&tok=2UuxBwCCkl7Rk1



From l.m.timmermans at students.uu.nl  Wed Jul  4 03:16:34 2012
From: l.m.timmermans at students.uu.nl (Leon Timmermans)
Date: Wed, 4 Jul 2012 10:16:34 +0300
Subject: [Bioperl-l] Invitation to connect on LinkedIn
In-Reply-To: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod>
References: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod>
Message-ID: <CAC1jpXBYn+3+TtN=n7NoRhnQ87-EuYRpsuBqpE5iHHHAntJ_ew@mail.gmail.com>

On Tue, Jul 3, 2012 at 10:33 PM, Tarakaramji Moturu
<tarakaramji at gmail.com> wrote:
> LinkedIn
> ------------
>
>
>
> I'd like to add you to my professional network on LinkedIn.
>
> - Tarakaramji

Sending messages like this directly to mailing lists is a rather bad
idea, if only because LinkedIn will think bioperl-l at bioperl.org is one
of the email addresses of whoever accepts the request (which is
relevant for retrieving a lost password, I think).

Leon


From ulrik.stervbo at gmail.com  Fri Jul  6 03:03:08 2012
From: ulrik.stervbo at gmail.com (Ulrik Stervbo)
Date: Fri, 6 Jul 2012 09:03:08 +0200
Subject: [Bioperl-l] problem in using protparam.pm module
In-Reply-To: <CACyyM1bpg+pOwCK26M2OgNaBD2VJsaXtc02WkGHrKg1QxVL0Tw@mail.gmail.com>
References: <CACyyM1Za0pW9R2GPxAFOATj4tPM24rbwLsUazGUFHNimZ2LO4g@mail.gmail.com>
	<CACyyM1ZUf2y7fO2Vkc5cSoqSgtW86GA+1yt_ctc9Y8P3Qbnt_A@mail.gmail.com>
	<9AA27ADA-FFE1-4735-BDE4-56C9B9A18009@illinois.edu>
	<CACyyM1bpg+pOwCK26M2OgNaBD2VJsaXtc02WkGHrKg1QxVL0Tw@mail.gmail.com>
Message-ID: <CAKVAULOTmTparqLEuREquzcrO6Hf+Z7MSGRAPBWMTqH0dJ7-xg@mail.gmail.com>

I had the same problem, and realized it is because I am behind a proxy.

This is what I did to the Protparam module:
Changed the URL to 'http://web.expasy.org/cgi-bin/protparam/protparam',
as previously found.

Added the following line after initialization of the LWP agent:
$browser->proxy(['http'], 'http://[my proxy]/');

The proxy settings are what made Perl choke. (If only one could make
Perl see global proxy settings.)

Cheers,
Ulrik

2011/7/28 Shachi Gahoi <shachigahoimbi at gmail.com>:
> Please help me how to run protparam using bioperl module
>
> On Wed, Jul 27, 2011 at 11:05 AM, Chris Fields <cjfields at illinois.edu> wrote:
>
>> The web service appears to have changed, but it looks as if no tests have
>> been written up for this module which would have caught this out.  We can
>> write some basic tests up to check for simple functionality.
>>
>> chris
>>
>> On Jul 26, 2011, at 10:58 PM, Shachi Gahoi wrote:
>>
>> > Dear All,
>> >
>> > i am using protparam.pm module. but when i am running this script it is
>> > printing one error message
>> >
>> > "Can't call method "throw" without a package or object reference at
>> > /usr/share/perl5/Bio/Root/Root.pm line 368, <GEN1> line 1."
>> >
>> > Kindly help me to solve this problem.
>> >
>> >
>> > Script is here----
>> >
>> ###################################################################################
>> > #!/usr/bin/perl
>> >
>> > use warnings;
>> > use Bio::SeqIO;
>> > use Bio::Tools::Protparam;
>> >
>> >
>> > $seqfile='test1.fasta';
>> >
>> > $seqio = Bio::SeqIO->new(-file => "$seqfile", -format => 'Fasta');
>> >
>> >
>> >    while(  $seq = $seqio->next_seq() )
>> > {
>> >
>> >
>> >        my $pp = Bio::Tools::Protparam->new(-seq=>$seq->seq);
>> >
>> >        print
>> >        "ID : ", $seq->display_id,"\n",
>> >        "Amino acid number : ",$pp->amino_acid_number(),"\n",
>> >        "Number of negative amino acids : ",$pp->num_neg(),"\n",
>> >        "Number of positive amino acids : ",$pp->num_pos(),"\n",
>> >        "Molecular weight : ",$pp->molecular_weight(),"\n",
>> >        "Theoretical pI : ",$pp->theoretical_pI(),"\n",
>> >        "Total number of atoms : ", $pp->total_atoms(),"\n",
>> >        "Number of carbon atoms : ",$pp->num_carbon(),"\n",
>> >        "Number of hydrogen atoms : ",$pp->num_hydrogen(),"\n",
>> >        "Number of nitrogen atoms : ",$pp->num_nitro(),"\n",
>> >        "Number of oxygen atoms : ",$pp->num_oxygen(),"\n",
>> >        "Number of sulphur atoms : ",$pp->num_sulphur(),"\n",
>> >        "Half life : ", $pp->half_life(),"\n",
>> >        "Instability Index : ", $pp->instability_index(),"\n",
>> >        "Stability class : ", $pp->stability(),"\n",
>> >        "Aliphatic_index : ",$pp->aliphatic_index(),"\n",
>> >        "Gravy : ", $pp->gravy(),"\n",
>> >        "Composition of A : ", $pp->AA_comp('A'),"\n",
>> >        "Composition of R : ", $pp->AA_comp('R'),"\n",
>> >        "Composition of N : ", $pp->AA_comp('N'),"\n",
>> >        "Composition of D : ", $pp->AA_comp('D'),"\n",
>> >        "Composition of C : ", $pp->AA_comp('C'),"\n",
>> >        "Composition of Q : ", $pp->AA_comp('Q'),"\n",
>> >        "Composition of E : ", $pp->AA_comp('E'),"\n",
>> >        "Composition of G : ", $pp->AA_comp('G'),"\n",
>> >        "Composition of H : ", $pp->AA_comp('H'),"\n",
>> >        "Composition of I : ", $pp->AA_comp('I'),"\n",
>> >        "Composition of L : ", $pp->AA_comp('L'),"\n",
>> >        "Composition of K : ", $pp->AA_comp('K'),"\n",
>> >        "Composition of M : ", $pp->AA_comp('M'),"\n",
>> >        "Composition of F : ", $pp->AA_comp('F'),"\n",
>> >        "Composition of P : ", $pp->AA_comp('P'),"\n",
>> >        "Composition of S : ", $pp->AA_comp('S'),"\n",
>> >        "Composition of T : ", $pp->AA_comp('T'),"\n",
>> >        "Composition of W : ", $pp->AA_comp('W'),"\n",
>> >        "Composition of Y : ", $pp->AA_comp('Y'),"\n",
>> >        "Composition of V : ", $pp->AA_comp('V'),"\n",
>> >        "Composition of B : ", $pp->AA_comp('B'),"\n",
>> >        "Composition of Z : ", $pp->AA_comp('Z'),"\n",
>> >        "Composition of X : ", $pp->AA_comp('X'),"\n";
>> > }
>> >
>> ###################################################################################
>> >
>> >
>> >
>> >
>> > --
>> > Regards,
>> > Shachi
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
>
> --
> Regards,
> Shachi
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason.stajich at gmail.com  Fri Jul  6 13:49:46 2012
From: jason.stajich at gmail.com (Jason Stajich)
Date: Fri, 6 Jul 2012 10:49:46 -0700
Subject: [Bioperl-l] problem in using protparam.pm module
In-Reply-To: <CAKVAULOTmTparqLEuREquzcrO6Hf+Z7MSGRAPBWMTqH0dJ7-xg@mail.gmail.com>
References: <CACyyM1Za0pW9R2GPxAFOATj4tPM24rbwLsUazGUFHNimZ2LO4g@mail.gmail.com>
	<CACyyM1ZUf2y7fO2Vkc5cSoqSgtW86GA+1yt_ctc9Y8P3Qbnt_A@mail.gmail.com>
	<9AA27ADA-FFE1-4735-BDE4-56C9B9A18009@illinois.edu>
	<CACyyM1bpg+pOwCK26M2OgNaBD2VJsaXtc02WkGHrKg1QxVL0Tw@mail.gmail.com>
	<CAKVAULOTmTparqLEuREquzcrO6Hf+Z7MSGRAPBWMTqH0dJ7-xg@mail.gmail.com>
Message-ID: <8C9056B6-1DA4-4BE0-B008-429C2F6C05BE@gmail.com>

You might try the PERL_LWP_ENV_PROXY and HTTP_PROXY environment variables:
http://search.cpan.org/~gaas/libwww-perl-6.04/lib/LWP/UserAgent.pm#CONSTRUCTOR_METHODS
http://search.cpan.org/~gaas/libwww-perl-6.04/lib/LWP/UserAgent.pm#Proxy_attributes

I can't test it on my end, though, without a proxy service.
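Here is a minimal sketch of both approaches with LWP::UserAgent. The first reads proxy settings from the environment, which is what PERL_LWP_ENV_PROXY switches on for agents created without an explicit `env_proxy` argument. The second sets a proxy explicitly, the way Ulrik patched Protparam. The proxy URL below is a placeholder, and this snippet does not change Bio::Tools::Protparam itself.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Read proxy settings from the *_proxy environment variables
# (http_proxy / HTTP_PROXY, no_proxy, ...). Setting PERL_LWP_ENV_PROXY=1
# in the shell has the same effect for agents created without this option.
my $ua = LWP::UserAgent->new(env_proxy => 1);

# Or set the proxy explicitly; 'proxy.example.org:8080' is a placeholder.
$ua->proxy(['http'], 'http://proxy.example.org:8080/');

print 'HTTP proxy: ', $ua->proxy('http'), "\n";
```

Without touching the module, the environment route would look something like `export http_proxy=http://[my proxy]/` plus `export PERL_LWP_ENV_PROXY=1` before running the script. That assumes a libwww-perl 6.x install and that Bio::Tools::Protparam builds its agent with a plain `LWP::UserAgent->new`. This has not been tested behind a real proxy.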


On Jul 6, 2012, at 12:03 AM, Ulrik Stervbo wrote:

> I had the same problem, and realized it is because I am behind a proxy.
> 
> This is what I did to the Protparam module:
> Changed the url to 'http://web.expasy.org/cgi-bin/protparam/protparam'
> as previously found
> 
> Added:
> $browser->proxy(['http'], 'http://[my proxy]/'); after initialization
> of the LWP agent.
> 
> The proxy settings are what made Perl choke. (If only one could make
> Perl see global proxy settings.)
> 
> Cheers,
> Ulrik
> 
> 2011/7/28 Shachi Gahoi <shachigahoimbi at gmail.com>:
>> Please help me: how can I run Protparam using the BioPerl module?
>> 
>> On Wed, Jul 27, 2011 at 11:05 AM, Chris Fields <cjfields at illinois.edu>wrote:
>> 
>>> The web service appears to have changed, but it looks as if no tests have
>>> been written up for this module which would have caught this out.  We can
>>> write some basic tests up to check for simple functionality.
>>> 
>>> chris
>>> 
>>> On Jul 26, 2011, at 10:58 PM, Shachi Gahoi wrote:
>>> 
>>>> Dear All,
>>>> 
>>>> I am using the Protparam.pm module, but when I run this script it
>>>> prints an error message:
>>>> 
>>>> "Can't call method "throw" without a package or object reference at
>>>> /usr/share/perl5/Bio/Root/Root.pm line 368, <GEN1> line 1."
>>>> 
>>>> Kindly help me to solve this problem.
>>>> 
>>>> 
>>>> Script is here----
>>>> 
>>> ###################################################################################
>>>> #!/usr/bin/perl
>>>> 
>>>> use strict;
>>>> use warnings;
>>>> use Bio::SeqIO;
>>>> use Bio::Tools::Protparam;
>>>> 
>>>> 
>>>> my $seqfile = 'test1.fasta';
>>>> 
>>>> my $seqio = Bio::SeqIO->new(-file => $seqfile, -format => 'Fasta');
>>>> 
>>>> 
>>>>   while ( my $seq = $seqio->next_seq() )
>>>> {
>>>> 
>>>> 
>>>>       my $pp = Bio::Tools::Protparam->new(-seq=>$seq->seq);
>>>> 
>>>>       print
>>>>       "ID : ", $seq->display_id,"\n",
>>>>       "Amino acid number : ",$pp->amino_acid_number(),"\n",
>>>>       "Number of negative amino acids : ",$pp->num_neg(),"\n",
>>>>       "Number of positive amino acids : ",$pp->num_pos(),"\n",
>>>>       "Molecular weight : ",$pp->molecular_weight(),"\n",
>>>>       "Theoretical pI : ",$pp->theoretical_pI(),"\n",
>>>>       "Total number of atoms : ", $pp->total_atoms(),"\n",
>>>>       "Number of carbon atoms : ",$pp->num_carbon(),"\n",
>>>>       "Number of hydrogen atoms : ",$pp->num_hydrogen(),"\n",
>>>>       "Number of nitrogen atoms : ",$pp->num_nitro(),"\n",
>>>>       "Number of oxygen atoms : ",$pp->num_oxygen(),"\n",
>>>>       "Number of sulphur atoms : ",$pp->num_sulphur(),"\n",
>>>>       "Half life : ", $pp->half_life(),"\n",
>>>>       "Instability Index : ", $pp->instability_index(),"\n",
>>>>       "Stability class : ", $pp->stability(),"\n",
>>>>       "Aliphatic_index : ",$pp->aliphatic_index(),"\n",
>>>>       "Gravy : ", $pp->gravy(),"\n",
>>>>       "Composition of A : ", $pp->AA_comp('A'),"\n",
>>>>       "Composition of R : ", $pp->AA_comp('R'),"\n",
>>>>       "Composition of N : ", $pp->AA_comp('N'),"\n",
>>>>       "Composition of D : ", $pp->AA_comp('D'),"\n",
>>>>       "Composition of C : ", $pp->AA_comp('C'),"\n",
>>>>       "Composition of Q : ", $pp->AA_comp('Q'),"\n",
>>>>       "Composition of E : ", $pp->AA_comp('E'),"\n",
>>>>       "Composition of G : ", $pp->AA_comp('G'),"\n",
>>>>       "Composition of H : ", $pp->AA_comp('H'),"\n",
>>>>       "Composition of I : ", $pp->AA_comp('I'),"\n",
>>>>       "Composition of L : ", $pp->AA_comp('L'),"\n",
>>>>       "Composition of K : ", $pp->AA_comp('K'),"\n",
>>>>       "Composition of M : ", $pp->AA_comp('M'),"\n",
>>>>       "Composition of F : ", $pp->AA_comp('F'),"\n",
>>>>       "Composition of P : ", $pp->AA_comp('P'),"\n",
>>>>       "Composition of S : ", $pp->AA_comp('S'),"\n",
>>>>       "Composition of T : ", $pp->AA_comp('T'),"\n",
>>>>       "Composition of W : ", $pp->AA_comp('W'),"\n",
>>>>       "Composition of Y : ", $pp->AA_comp('Y'),"\n",
>>>>       "Composition of V : ", $pp->AA_comp('V'),"\n",
>>>>       "Composition of B : ", $pp->AA_comp('B'),"\n",
>>>>       "Composition of Z : ", $pp->AA_comp('Z'),"\n",
>>>>       "Composition of X : ", $pp->AA_comp('X'),"\n";
>>>> }
>>>> 
>>> ###################################################################################
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Regards,
>>>> Shachi
>>> 
>>> 
>> 
>> 
>> --
>> Regards,
>> Shachi

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From bubli_thakur at rediffmail.com  Sun Jul  1 10:59:29 2012
From: bubli_thakur at rediffmail.com (subarna thakur)
Date: Sun, 01 Jul 2012 14:59:29 -0000
Subject: [Bioperl-l] Ks saturation
Message-ID: <20120617031856.16345.qmail@f4mail-235-140.rediffmail.com>


Dear all,

I am trying to calculate dN/dS values for all orthologous gene pairs between a pair of genomes using the pairwise_kaks.pl script in BioPerl, which invokes the codeml program in runmode -2. When analyzing the results, I found that some genes have anomalously high dS (Ks) values, some even exceeding 100. As a result, the average Ks for the whole genome shoots up. These genes are orthologs and share more than 50% sequence identity. Should I include these genes in the analysis or leave them out? If I leave them out, what Ks cutoff should I use? In some papers I have seen Ks values as high as 5.6 being included. Is there a way to determine the cutoff value for Ks?

Subarna


From haywardjeremya at gmail.com  Fri Jul  6 13:56:12 2012
From: haywardjeremya at gmail.com (Jeremy Hayward)
Date: Fri, 6 Jul 2012 14:56:12 -0300
Subject: [Bioperl-l] Two 'host' tags?
Message-ID: <CAJv8UuyLgFfHe-Vfyv1p+90ypO+NFQ5E_br1YYyL=PKg++yyNA@mail.gmail.com>

Hi--  Clueless newbie here, for which apologies.

I've posted a description of my problem, inputs and outputs, at Gist
2816510; https://gist.github.com/2816510

Briefly, I'm trying to take a GenBank file (.gb) and create a FASTA
file with a specific identifier line for each sequence. Specifically,
I want the "host" tag as the identifier. With the help of the Bioperl
beginner readme and the HOWTOs (which are great!) I've worked out how
to loop through my sequences and get the 'host' tag for each one. For
some reason, I get two identifier lines for each sequence. I guess the
problem is in the 'for' loop--it's running the stuff below it twice,
once with the actual 'host' tag data and once with...nothing? Not
sure.

I think I can work out how to use s/ and a regex just to delete the
second identifier line, but that feels like I'm avoiding the problem
instead of fixing it. Any help appreciated!


Many thanks,

--Jeremy Hayward


From jason.stajich at gmail.com  Fri Jul  6 15:39:52 2012
From: jason.stajich at gmail.com (Jason Stajich)
Date: Fri, 6 Jul 2012 12:39:52 -0700
Subject: [Bioperl-l] Two 'host' tags?
In-Reply-To: <CAJv8UuyLgFfHe-Vfyv1p+90ypO+NFQ5E_br1YYyL=PKg++yyNA@mail.gmail.com>
References: <CAJv8UuyLgFfHe-Vfyv1p+90ypO+NFQ5E_br1YYyL=PKg++yyNA@mail.gmail.com>
Message-ID: <A572776F-CF80-4EB2-A853-B62EE5AF5266@gmail.com>

Hi Jeremy -

You are printing for every feature in the loop (e.g. the source and the misc_RNA). You only want to loop through the features, grab the one whose primary tag is 'source', and then change or print the info when you see it. So you could have an if( $feature->primary_tag eq 'source') in there or something as well. Alternatively, I've left it pretty much intact and just simplified it a bit.
You should also try to use Bio::SeqIO to write the output instead of printing it yourself.

I updated the code here to be simpler - right now it warns you that you are printing IDs with spaces (which is something you should think about when it comes to your output file, but I don't know your downstream plans). Also you could put other info in the description field if you wanted to capture accession number or the endophyte name too.

https://gist.github.com/3062285
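Here is a minimal sketch of the approach described above, using only standard Bio::SeqIO and Bio::SeqFeatureI calls. It picks out the 'source' feature, reads its /host value, sets that as the display ID and lets Bio::SeqIO write the FASTA. Several details are assumptions: the helper name, the command-line usage and the choice to replace spaces with underscores. Records without a /host qualifier are simply skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Write one FASTA record per GenBank record, using the /host qualifier of
# the 'source' feature as the identifier (write_host_fasta is a
# hypothetical helper name).
sub write_host_fasta {
    my ( $in_file, $out_file ) = @_;
    my $in  = Bio::SeqIO->new( -file => $in_file,     -format => 'genbank' );
    my $out = Bio::SeqIO->new( -file => ">$out_file", -format => 'fasta' );

    while ( my $seq = $in->next_seq ) {
        # Look only at the 'source' feature; other features such as
        # misc_RNA are ignored, so each record is written at most once.
        my ($source) = grep { $_->primary_tag eq 'source' } $seq->get_SeqFeatures;
        next unless $source && $source->has_tag('host');

        my ($host) = $source->get_tag_values('host');
        ( my $id = $host ) =~ tr/ /_/;    # assumption: no spaces wanted in IDs
        $seq->display_id($id);
        $out->write_seq($seq);
    }
    $out->close;
}

# Usage: perl host2fasta.pl input.gb output.fasta
write_host_fasta(@ARGV) if @ARGV == 2;
```

The script sets the ID with display_id() and writes with write_seq(). As a result, each GenBank record yields exactly one FASTA header, however many features it carries.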

Best,
Jason
On Jul 6, 2012, at 10:56 AM, Jeremy Hayward wrote:

> Hi--  Clueless newbie here, for which apologies.
> 
> I've posted a description of my problem, inputs and outputs, at Gist
> 2816510; https://gist.github.com/2816510
> 
> Briefly, I'm trying to take a genbank file (.gb), and create a FASTA
> file with a specific identifier line for each sequence. Specifically,
> I want the "host" tag as the identifier. With the help of the Bioperl
> beginner readme and the HOWTO's (which are great!) I've worked out how
> to loop through my sequences and get the 'host' tag for each one. For
> some reason, I get two identifier lines for each sequence. I guess the
> problem is in the 'for' loop--it's running the stuff below it twice,
> once with the actual 'host' tag data and once with...nothing? Not
> sure.
> 
> I think I can work out how to use s/ and a regex just to delete the
> second identifier line, but that feels like I'm avoiding the problem
> instead of fixing it. Any help appreciated!
> 
> 
> Many thanks,
> 
> --Jeremy Hayward

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From bosborne11 at verizon.net  Fri Jul  6 15:51:11 2012
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 06 Jul 2012 15:51:11 -0400
Subject: [Bioperl-l] Two 'host' tags?
In-Reply-To: <CAJv8UuyLgFfHe-Vfyv1p+90ypO+NFQ5E_br1YYyL=PKg++yyNA@mail.gmail.com>
References: <CAJv8UuyLgFfHe-Vfyv1p+90ypO+NFQ5E_br1YYyL=PKg++yyNA@mail.gmail.com>
Message-ID: <456448FF-C413-42D1-833A-FAA74E4FEF9E@verizon.net>

Jeremy,

Looks like each of your individual sequences has 2 features, but you only care about the 'source' feature (if ($feat_object->primary_tag eq "source")?).

Also, rather than printing out the sequence yourself as you're doing, try to build a Sequence object for each input sequence and then write its contents to your FASTA file using write_seq(). You can set the id for your Sequence object using display_name().

Brian O.


On Jul 6, 2012, at 1:56 PM, Jeremy Hayward wrote:

> Hi--  Clueless newbie here, for which apologies.
> 
> I've posted a description of my problem, inputs and outputs, at Gist
> 2816510; https://gist.github.com/2816510
> 
> Briefly, I'm trying to take a genbank file (.gb), and create a FASTA
> file with a specific identifier line for each sequence. Specifically,
> I want the "host" tag as the identifier. With the help of the Bioperl
> beginner readme and the HOWTO's (which are great!) I've worked out how
> to loop through my sequences and get the 'host' tag for each one. For
> some reason, I get two identifier lines for each sequence. I guess the
> problem is in the 'for' loop--it's running the stuff below it twice,
> once with the actual 'host' tag data and once with...nothing? Not
> sure.
> 
> I think I can work out how to use s/ and a regex just to delete the
> second identifier line, but that feels like I'm avoiding the problem
> instead of fixing it. Any help appreciated!
> 
> 
> Many thanks,
> 
> --Jeremy Hayward


From dejian.zhao at gmail.com  Wed Jul 11 13:31:37 2012
From: dejian.zhao at gmail.com (De-Jian Zhao)
Date: Thu, 12 Jul 2012 01:31:37 +0800
Subject: [Bioperl-l] Errors with Bio::Seq objects
Message-ID: <4FFDB879.1020906@gmail.com>

Hi,

I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and 
tested the Bio::SeqIO module as follows. The first 3 commands succeeded; 
however the last 2 failed.

I think $seqio->next_seq() produces a Bio::Seq object which contains the 
sequence, features and annotation (according to the DESCRIPTION of 
"perldoc Bio::Seq") and thus the invocation of the methods 
get_SeqFeatures() and translate() should be valid. However, the results 
contradicted this idea.

Can anyone explain what happened with the last 2 commands? I have 
encountered numerous failures when testing the bioperl methods. 
I want to translate the mRNA sequence and extract the sequence features. 
What are the right commands? Thanks a lot!

Best,
Dejian


PS: The commands and results

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb
NM_053056

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb
mRNA

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb
CACACG

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb
Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18)

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb
Bio::Seq::RichSeq=HASH(0x20a3e7b0)


From jimhu at tamu.edu  Wed Jul 11 14:01:27 2012
From: jimhu at tamu.edu (Jim Hu)
Date: Wed, 11 Jul 2012 13:01:27 -0500
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <4FFDB879.1020906@gmail.com>
References: <4FFDB879.1020906@gmail.com>
Message-ID: <9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu>

Hi Dejian,

On Jul 11, 2012, at 12:31 PM, De-Jian Zhao wrote:

> Hi,
> 
> I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and tested the Bio::SeqIO module as follows. The first 3 commands succeeded; however the last 2 failed.
> 
> I think $seqio->next_seq() produces a Bio::Seq object which contains the sequence, features and annotation (according to the DESCRIPTION of "perldoc Bio::Seq") and thus the invocation of the methods get_SeqFeatures() and translate() should be valid. However, the results denied this idea.

That's correct about Bio::Seq objects being returned. Actually, it is probably a kind of Bio::Seq object. For example, SeqIO may return a Bio::Seq::RichSeq object that inherits methods from Bio::Seq objects.  However, as explained below, the methods are working as they should... they are just returning objects when you are expecting something else.

> 
> Will anyone explain what happened to the last 2 commands? I have encountered numerous cases of failures when testing the bioperl methods. I want to translate the mRNA sequence and extract the sequence features. What are the right commands? Thanks a lot!
> 
> Best,
> Dejian
> 
> 
> 
> PS: The commands and results
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb
> NM_053056
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb
> mRNA
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb
> CACACG
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb
> Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18)

http://doc.bioperl.org/bioperl-live/Bio/Seq.html#POD24

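$seq_obj->get_SeqFeatures() returns a list of SeqFeature objects, which are references. Printing a reference just shows its class and memory address, e.g. Bio::SeqFeature::Generic=HASH(0x20a30898). So this worked as expected.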

I usually write this as script files, so I've never done it all with perl -e.  But you need to iterate over the array and query the objects for the information you want about the features.
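For example, a short script along these lines iterates over the list instead of printing the references themselves. It asks each Bio::SeqFeatureI object for its primary tag, coordinates and qualifier names. This is a sketch, and `feature_summary` is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Summarise the features of the first record in a GenBank file, one
# tab-separated line per feature (feature_summary is a hypothetical name).
sub feature_summary {
    my ($file) = @_;
    my $seq = Bio::SeqIO->new( -file => $file, -format => 'genbank' )->next_seq;
    my @lines;
    for my $feat ( $seq->get_SeqFeatures ) {
        # Each element is a feature object; query it for the fields we want.
        my @tags = sort { $a cmp $b } $feat->get_all_tags;
        push @lines, join "\t", $feat->primary_tag, $feat->start, $feat->end,
            join( ',', @tags );
    }
    return @lines;
}

# Usage: perl features.pl nt.gb
if (@ARGV) {
    print "$_\n" for feature_summary( $ARGV[0] );
}
```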


> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb
> Bio::Seq::RichSeq=HASH(0x20a3e7b0)

->translate returns a new Seq object.  I think 

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate()->seq(); ' nt.gb

should work (haven't tried it).
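Translating the whole record from base 1 also translates the 5' UTR, so for an mRNA entry it is usually the CDS feature you want. The sketch below assumes the CDS is one that `spliced_seq()` can assemble, whether single-segment or split. `cds_translations` is a hypothetical helper name, and codon_start values other than 1 are not handled. The result can be checked against the CDS feature's own /translation qualifier.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Translate every CDS feature of the first record in a GenBank file
# (cds_translations is a hypothetical helper name).
sub cds_translations {
    my ($file) = @_;
    my $seq = Bio::SeqIO->new( -file => $file, -format => 'genbank' )->next_seq;
    my @proteins;
    for my $cds ( grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures ) {
        # spliced_seq() returns the CDS nucleotides as a sequence object,
        # translate() returns a new sequence object, and seq() returns
        # its amino-acid string (a stop codon is shown as '*').
        push @proteins, $cds->spliced_seq->translate->seq;
    }
    return @proteins;
}

# Usage: perl translate_cds.pl nt.gb
if (@ARGV) {
    print "$_\n" for cds_translations( $ARGV[0] );
}
```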

Jim

> 
> 
> 
> 

=====================================
Jim Hu
Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054


From bosborne11 at verizon.net  Wed Jul 11 13:47:25 2012
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 11 Jul 2012 13:47:25 -0400
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <4FFDB879.1020906@gmail.com>
References: <4FFDB879.1020906@gmail.com>
Message-ID: <DBC94184-BB61-47EB-BC2A-D306E94BC25A@verizon.net>

Dejian,

These are not "failures". The get_SeqFeatures() method returns a list of feature objects (Bio::SeqFeature::Generic here), and translate() returns a new Bio::Seq object. Printing an object just shows its reference.

Start here:

www.bioperl.org/wiki/HOWTO:Beginners

Brian O.


On Jul 11, 2012, at 1:31 PM, De-Jian Zhao wrote:

> Hi,
> 
> I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and tested the Bio::SeqIO module as follows. The first 3 commands succeeded; however the last 2 failed.
> 
> I think $seqio->next_seq() produces a Bio::Seq object which contains the sequence, features and annotation (according to the DESCRIPTION of "perldoc Bio::Seq") and thus the invocation of the methods get_SeqFeatures() and translate() should be valid. However, the results denied this idea.
> 
> Will anyone explain what happened to the last 2 commands? I have encountered numerous cases of failures when testing the bioperl methods. I want to translate the mRNA sequence and extract the sequence features. What are the right commands? Thanks a lot!
> 
> Best,
> Dejian
> 
> 
> 
> PS: The commands and results
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb
> NM_053056
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb
> mRNA
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb
> CACACG
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb
> Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18)
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb
> Bio::Seq::RichSeq=HASH(0x20a3e7b0)
> 
> 
> 
> 


From cjfields at illinois.edu  Wed Jul 11 15:02:46 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 11 Jul 2012 19:02:46 +0000
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
Message-ID: <FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>

Peng, 

Has this been filed as a bug yet?  

    https://redmine.open-bio.org/projects/bioperl

Seems like it would be fairly easy to fix, but I want to track it just in case.

chris

On Jul 11, 2012, at 12:45 PM, Peng Zhou wrote:

> Hello guys,
> 
> Just a follow-up, it seems to me the bioperl-live version is still having the same problem - calling hit "query" while query sequence "hit". I also looked into the test script written for hmmer3 (bioperl-live/t/SearchIO/hmmer.t), and it doesn't deal with the alignment part - I guess that's why this bug was not discovered.
> 
> To be simple, here's an output of hmmsearch v3.0:
> # hmmsearch :: search profile(s) against a sequence database
> # HMMER 3.0 (March 2010); http://hmmer.org/
> # Copyright (C) 2010 Howard Hughes Medical Institute.
> # Freely distributed under the GNU General Public License (GPLv3).
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> # query HMM file:                  /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm
> # target sequence database:        /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa
> # output directed to file:         /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt
> # number of worker threads:        4
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> 
> Query:       CRP0000  [M=75]
> Scores for complete sequences (score includes all domains):
>    --- full sequence ---   --- best 1 domain ---    -#dom-
>     E-value  score  bias    E-value  score  bias    exp  N  Sequence                 Description
>     ------- ------ -----    ------- ------ -----   ---- --  --------                 -----------
>     5.5e-25   95.0  14.4    5.7e-25   95.0  10.0    1.0  1  Chr2_540228_540404_+     
> 
> Domain annotation for each sequence (and alignments):
> >> Chr2_540228_540404_+  
>    #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
>  ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
>    1 !   95.0  10.0   3.6e-30   5.7e-25      20      74 ..       4      59 .]       1      59 [] 0.95
> 
>   Alignments for each domain:
>   == domain 1    score: 95.0 bits;  conditional E-value: 3.6e-30
>                CRP0000 20 tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74
>                           ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c
>   Chr2_540228_540404_+  4 GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59
>                           568899***99********************************************* PP
> 
> And here is a dump of the parsed HSP object:
> $VAR1 = bless( {
>                  'VERBOSE' => 0,
>                  'IDENTICAL' => 0,
>                  'RANK' => 1,
>                  'STRANDED' => 'NONE',
>                  'EVALUE' => '3.6e-30',
>                  'HSP_LENGTH' => 56,
>                  'ALGORITHM' => 'HMMSEARCH',
>                  'SCORE' => '95.0',
>                  'GAP_SYMBOL' => '-',
>                  'CONSERVED' => 0,
>                  
>                  'HIT_NAME' => 'Chr2_540228_540404_+',
>                  'HIT_DESC' => '',
>                  'HIT_START' => '20',
>                  'HIT_END' => '74',
>                  'HIT_LENGTH' => 56,
>                  'HIT_SEQ' => 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc',
>                  'HIT_FRAME' => 0,
>                  
>                  'QUERY_NAME' => 'CRP0000',
>                  'QUERY_DESC' => undef,
>                  'QUERY_START' => '4',
>                  'QUERY_END' => '59',
>                  'QUERY_LENGTH' => '75',
>                  'QUERY_FRAME' => 0,
>                  'QUERY_SEQ' => 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC',
>                  
>                  'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c',
>                }, 'Bio::Search::HSP::HMMERHSP' );
> 
> Clearly, the "HIT_START", "HIT_END", "HIT_SEQ" should actually be exchanged with "QUERY_START", "QUERY_END" and "QUERY_SEQ" values.
> 
> Thanks,
> 
> Peng,
> 
> On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote:
> I'll try the bioperl-live version. Thanks guys.
> Scott Givan
> 541-740-4685
> Sent from an iPhone (so expect typos).
> 
> On Jul 19, 2011, at 10:34 PM, "Chris Fields" <cjfields at illinois.edu> wrote:
> 
> > This might be a disconnect between the HMMER3 version in bioperl-live and the one in Kai's bioperl-hmmer3 repo.  I believe the one in bioperl-live is newer.  Scott, can you give that a try?
> > 
> > chris
> > 
> > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote:
> > 
> >> Hi Scott,
> >> 
> >> Thanks for writing. I'm on the road at the moment so I have to be briefer and less thorough than I'd like to be.
> >> 
> >> What you are observing is not the intended behavior. Oddly, it's not what I recall obtaining in my tests on this software, though I was mostly interested in hmmsearch at the time and may have been sloppier than I should have been when it came to hmmscan.
> >> 
> >> What version of HMMER3 are you using? There have been some small formatting changes in the past that might be causing a burp in the parser, though I'm doubting it.
> >> 
> >> Kai Blin wrote some test scripts (found here: bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate query/hit coordinates. It might be worth giving this a shot if you haven't already.
> >> 
> >> Also, if you don't mind, I'm happy to run your code on your output file on my end. It might help me diagnose the problem.
> >> 
> >> Sorry this is being a thorn in your side! I've cc'ed the list in case anyone else has insight into this matter.
> >> 
> >> Best,
> >> Thomas
> >> 
> >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote:
> >> 
> >>> Hi Thomas,
> >>> 
> >>> I'm using modules in the bioperl-hmmer3 git repository to parse hmmscan
> >>> reports. When I parse the files and walk through the HSP's like:
> >>> 
> >>> while (my $hit = $rslt->next_model) {
> >>> 
> >>>    while (my $domain = $hit->next_hsp) {
> >>> 
> >>> And retrieve the "hit" coordinates like:
> >>> 
> >>>        print "hit coords: ", $domain->start('hit'), "-", $domain->end('hit'), "\n";
> >>> 
> >>> The coordinates returned correspond to what I would call the "query",
> >>> since they are for the sequence I fed to hmmscan to search the profile
> >>> database. Likewise, when retrieving the query coordinates like
> >>> $domain->start('query'), I get what I consider the "hit" coordinates,
> >>> since they are for the domain profile. Is this the intended behavior?
> >>> 
> >>> Thanks.
> >>> 
> >>> scott
> >>> 
> >>> -- 
> >>> Scott A. Givan
> >>> Associate Director
> >>> Informatics Research Core Facility
> >>> 240e Bond Life Sciences Center
> >>> Research Assistant Professor
> >>> Molecular Microbiology and Immunology
> >>> University of Missouri, Columbia
> >>> 
> >>> TEL 573-882-2948
> >>> FAX 573-884-9676
> >>> http://ircf.rnet.missouri.edu
> >>> 
> >>> 
> >>> 
> >> 
> >
> 
> 
> 
> 
> 


From p.j.a.cock at googlemail.com  Wed Jul 11 17:00:56 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 11 Jul 2012 22:00:56 +0100
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
	<FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
Message-ID: <CAKVJ-_4mo16NxEbZkNp3-EG9uknc4Fo4m6ajQWTknumQNUn4Yw@mail.gmail.com>

On Wed, Jul 11, 2012 at 8:02 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> Peng,
>
> Has this been filed as a bug yet?
>
>     https://redmine.open-bio.org/projects/bioperl
>
> Seems like it would be fairly easy to fix, but I want to track it just in case.
>
> chris

Hi all,

This could be the unfortunate fact that hmmscan and
hmmsearch return very similar tabular output, but
with query and hit interchanged. i.e. You need some
extra information to know which way round they are
(not possible with the current output). This was an
issue in Bow's Biopython SearchIO project - which
for the moment he solved by handling this as two
hmmer file formats. In the medium term we're hoping
hmmer3 will add some header information or something.
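Here is a sketch of the kind of extra information meant here. The plain-text (non-tabular) HMMER3 report begins with a line naming the program, as in Peng's `# hmmsearch :: search profile(s) against a sequence database` header. A caller could check that line before deciding which side is the query: for hmmsearch the profile HMM is the query, and for hmmscan it is the hit. As noted above, the tabular output offers no such reliable marker. `hmmer3_program` is a hypothetical helper, not part of Bio::SearchIO.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report which HMMER3 program wrote a plain-text
# report by scanning its header block for a line such as
#   "# hmmsearch :: search profile(s) against a sequence database".
# Returns 'hmmscan', 'hmmsearch', or undef if no such line is found.
sub hmmer3_program {
    my ($fh) = @_;
    while ( my $line = <$fh> ) {
        return $1 if $line =~ /^#\s+(hmmscan|hmmsearch)\s+::/;
        last if $line =~ /^Query:/;    # header block is over
    }
    return;
}

# Usage: perl which_hmmer.pl report.txt
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $prog = hmmer3_program($fh);
    print defined $prog ? "$prog\n" : "unknown\n";
}
```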

Peter


From zhoupenggeni at gmail.com  Wed Jul 11 13:45:00 2012
From: zhoupenggeni at gmail.com (Peng Zhou)
Date: Wed, 11 Jul 2012 10:45:00 -0700 (PDT)
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
Message-ID: <a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>

Hello guys,

Just a follow-up: it seems to me the bioperl-live version still has
the same problem, calling the hit "query" and the query sequence "hit". I also
looked into the test script written for hmmer3 (bioperl-live/t/SearchIO/hmmer.t),
and it doesn't deal with the alignment part. I guess that's why this bug was not discovered.

To keep it simple, here is the output of an hmmsearch v3.0 run:
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.0 (March 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm
# target sequence database:        /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa
# output directed to file:         /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt
# number of worker threads:        4
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       CRP0000  [M=75]
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence              Description
    ------- ------ -----    ------- ------ -----   ---- --  --------              -----------
    5.5e-25   95.0  14.4    5.7e-25   95.0  10.0    1.0  1  Chr2_540228_540404_+

Domain annotation for each sequence (and alignments):
>> Chr2_540228_540404_+
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !   95.0  10.0   3.6e-30   5.7e-25      20      74 ..       4      59 .]       1      59 [] 0.95

  Alignments for each domain:
  == domain 1    score: 95.0 bits;  conditional E-value: 3.6e-30
               CRP0000 20 tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74
                          ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c
  Chr2_540228_540404_+  4 GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59
                          568899***99********************************************* PP
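To make the column semantics concrete, here is a minimal, hypothetical Perl sketch (not BioPerl's parser) that splits one row of the "Domain annotation" table into named fields. The column order is taken from the table header in the report above and is an assumption for other HMMER versions:

```perl
use strict;
use warnings;

# Column names for one data row of the HMMER3 "Domain annotation" table,
# in the order given by the report's header line.
my @DOMAIN_COLUMNS = qw(
    rank flag score bias c_evalue i_evalue
    hmm_from hmm_to hmm_bounds
    ali_from ali_to ali_bounds
    env_from env_to env_bounds acc
);

# Hypothetical helper: return a hashref of named fields, or undef if the
# line does not have exactly one value per column (e.g. a header line).
sub parse_domain_row {
    my ($line) = @_;
    my @fields = split ' ', $line;    # awk-style split: ignores leading spaces
    return unless @fields == @DOMAIN_COLUMNS;
    my %domain;
    @domain{@DOMAIN_COLUMNS} = @fields;
    return \%domain;
}
```

For this hmmsearch row, hmm_from/hmm_to (20-74) are positions on the query profile CRP0000 and ali_from/ali_to (4-59) are positions on the hit sequence Chr2_540228_540404_+. A parser that stores them the other way round produces the swap shown in the HSP dump.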

And here is a dump of the parsed HSP object:
$VAR1 = bless( {
                 'VERBOSE' => 0,
                 'IDENTICAL' => 0,
                 'RANK' => 1,
                 'STRANDED' => 'NONE',
                 'EVALUE' => '3.6e-30',
                 'HSP_LENGTH' => 56,
                 'ALGORITHM' => 'HMMSEARCH',
                 'SCORE' => '95.0',
                 'GAP_SYMBOL' => '-',
                 'CONSERVED' => 0,

                 'HIT_NAME' => 'Chr2_540228_540404_+',
                 'HIT_DESC' => '',
                 'HIT_START' => '20',
                 'HIT_END' => '74',
                 'HIT_LENGTH' => 56,
                 'HIT_SEQ' => 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc',
                 'HIT_FRAME' => 0,

                 'QUERY_NAME' => 'CRP0000',
                 'QUERY_DESC' => undef,
                 'QUERY_START' => '4',
                 'QUERY_END' => '59',
                 'QUERY_LENGTH' => '75',
                 'QUERY_FRAME' => 0,
                 'QUERY_SEQ' => 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC',

                 'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c',
               }, 'Bio::Search::HSP::HMMERHSP' );

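Clearly, the "HIT_START", "HIT_END" and "HIT_SEQ" values should be exchanged
with the "QUERY_START", "QUERY_END" and "QUERY_SEQ" values: in hmmsearch output
the query is the profile CRP0000, aligned at HMM positions 20-74, while the hit
is the sequence Chr2_540228_540404_+, aligned at positions 4-59.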
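Until the parser is fixed, one possible stopgap is sketched below. It is a hypothetical helper, not part of BioPerl; it only relies on the standard HSP accessors start('query'), end('query'), start('hit') and end('hit'). Whether to swap is an assumption you must verify by comparing a few HSPs against the raw HMMER report for the BioPerl version you actually have installed:

```perl
use strict;
use warnings;

# Hypothetical workaround helper (not part of BioPerl) for the query/hit
# coordinate swap reported in this thread. Returns a hashref:
#   { query => [start, end], hit => [start, end] }
# Set $swapped only after confirming the swap against the raw report.
sub hsp_coords {
    my ($hsp, $swapped) = @_;
    my %c = (
        query => [ $hsp->start('query'), $hsp->end('query') ],
        hit   => [ $hsp->start('hit'),   $hsp->end('hit') ],
    );
    # Exchange the two sides in one hash-slice assignment.
    @c{qw(query hit)} = @c{qw(hit query)} if $swapped;
    return \%c;
}
```

With the HSP dumped above, hsp_coords($hsp, 1) would report the query (CRP0000) at 20-74 and the hit at 4-59, matching the hmmfrom/hmm to and alifrom/ali to columns of the domain table.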

Thanks,

Peng

On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote:
>
> I'll try the bioperl-live version. Thanks guys. 
>
> Scott Givan
> 541-740-4685
> Sent from an iPhone (so expect typos). 
>
> On Jul 19, 2011, at 10:34 PM, "Chris Fields" <cjfields at illinois.edu> 
> wrote:
>
> > This might be a disconnect between the HMMER3 version in bioperl-live 
> and the one in Kai's bioperl-hmmer3 repo.  I believe the one in 
> bioperl-live is newer.  Scott, can you give that a try?
> > 
> > chris
> > 
> > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote:
> > 
> >> Hi Scott,
> >> 
> >> Thanks for writing. I'm on the road at the moment so I have to be 
> briefer and less thorough than I'd like to be.
> >> 
> >> What you are observing is not the intended behavior. Oddly, it's not 
> what I recall obtaining in my tests on this software, though I was mostly 
> interested in hmmsearch at the time and may have been sloppier than I 
> should have been when it came to hmmscan.
> >> 
> >> What version of HMMER3 you're using? There have been some small 
> formatting changes in the past that might be causing a burp in the parser, 
> though I'm doubting it.
> >> 
> >> Kai Blin wrote some test scripts (found here: 
> bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate 
> query/hit coordinates. It might be worth giving this a shot if you haven't 
> already.
> >> 
> >> Also, if you don't mind, I'm happy to run your code on your output file 
> on my end. It might help me diagnose the problem.
> >> 
> >> Sorry this is being a thorn in your side! I've cc'ed the list in case 
> anyone else has insight into this matter.
> >> 
> >> Best,
> >> Thomas
> >> 
> >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote:
> >> 
> >>> Hi Thomas,
> >>> 
> >>> I'm using modules in the bipoerl-hmmer3 git repository to parse hmmscan
> >>> reports. When I parse the files and walk through the HSP's like:
> >>> 
> >>> while (my $hit = $rslt->next_model) {
> >>> 
> >>>    while (my $domain = $hit->next_hsp) {
> >>> 
> >>> And retrieve the "hit" coordinates like:
> >>> 
> >>>        print "hit coords: ", $domain->start('hit'), "-", 
> $domain->end('hit'),
> >>> "\n";
> >>> 
> >>> The coordinates returned correspond to what I would call the "query",
> >>> since they are for the sequence I fed to hmmscan to search the profile
> >>> database. Likewise, when retrieving the query coordinates like
> >>> $domain->start('query'), I get what I consider the "hit" coordinates,
> >>> since they are for the domain profile. Is this the intended behavior?
> >>> 
> >>> Thanks.
> >>> 
> >>> scott
> >>> 
> >>> -- 
> >>> Scott A. Givan
> >>> Associate Director
> >>> Informatics Research Core Facility
> >>> 240e Bond Life Sciences Center
> >>> Research Assistant Professor
> >>> Molecular Microbiology and Immunology
> >>> University of Missouri, Columbia
> >>> 
> >>> TEL 573-882-2948
> >>> FAX 573-884-9676
> >>> http://ircf.rnet.missouri.edu
> >>> 
> >>> 
> >>> 
> >> 
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > 
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From zhoupenggeni at gmail.com  Wed Jul 11 14:03:17 2012
From: zhoupenggeni at gmail.com (Peng Zhou)
Date: Wed, 11 Jul 2012 11:03:17 -0700 (PDT)
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <4FFDB879.1020906@gmail.com>
References: <4FFDB879.1020906@gmail.com>
Message-ID: <25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com>

Hi, 

I guess that's what the commands are supposed to do: the get_SeqFeatures()
method returns a list of Bio::SeqFeature objects, and the translate()
method returns a Bio::Seq object. You can't simply "print" an object in
Perl (you only get its class name and memory address), but you can "dump" it:

$ perl -e 'use Bio::SeqIO; use Data::Dumper; my $seqio = Bio::SeqIO->new(-file => shift); print Dumper($seqio->next_seq()->get_SeqFeatures());' nt.gb

$ perl -e 'use Bio::SeqIO; use Data::Dumper; my $seqio = Bio::SeqIO->new(-file => shift); print Dumper($seqio->next_seq()->translate());' nt.gb
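Dumping shows everything, but usually you want specific fields. A minimal sketch, assuming $seq behaves like a Bio::Seq whose features provide primary_tag(), start(), end() and strand(); feature_lines() is a hypothetical helper name, not a BioPerl method:

```perl
use strict;
use warnings;

# Turn a sequence's features into printable text lines. Printing the
# feature objects directly only shows "Bio::SeqFeature::Generic=HASH(0x...)",
# because Perl stringifies an object reference; call its accessors instead.
sub feature_lines {
    my ($seq) = @_;
    return map {
        join("\t", $_->primary_tag, $_->start, $_->end, $_->strand)
    } $seq->get_SeqFeatures;
}

# Usage with BioPerl installed (file name nt.gb as in the question):
#   use Bio::SeqIO;
#   my $seq = Bio::SeqIO->new(-file => 'nt.gb', -format => 'genbank')->next_seq;
#   print "$_\n" for feature_lines($seq);
#   # translate() returns a sequence object; seq() gives the residues as a string.
#   # For an mRNA, translating the CDS feature avoids translating the UTRs:
#   my ($cds) = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;
#   print $cds->spliced_seq->translate->seq, "\n" if $cds;
```

The same pattern applies to translate(): print $seq->translate->seq rather than the returned object itself.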

On Wednesday, July 11, 2012 12:31:37 PM UTC-5, De-Jian Zhao wrote:
>
> Hi, 
>
> I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and 
> tested the Bio::SeqIO module as follows. The first 3 commands succeeded; 
> however the last 2 failed. 
>
> I think $seqio->next_seq() produces a Bio::Seq object which contains the 
> sequence, features and annotation (according to the DESCRIPTION of 
> "perldoc Bio::Seq"), and thus invoking the methods 
> get_SeqFeatures() and translate() should be valid. However, the results 
> contradicted this idea. 
>
> Will anyone explain what happened to the last 2 commands? I have 
> encountered numerous cases of failures when testing the bioperl methods. 
> I want to translate the mRNA sequence and extract the sequence features. 
> What are the right commands? Thanks a lot! 
>
> Best, 
> Dejian 
>
>
>
> PS: The commands and results 
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb 
> NM_053056 
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb 
> mRNA 
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb 
> CACACG 
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb 
> Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) 
>
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb 
> Bio::Seq::RichSeq=HASH(0x20a3e7b0) 
>
>
>
>
> _______________________________________________ 
> Bioperl-l mailing list 
> Bioperl-l at lists.open-bio.org 
> http://lists.open-bio.org/mailman/listinfo/bioperl-l 
>


From zhoupenggeni at gmail.com  Wed Jul 11 16:05:56 2012
From: zhoupenggeni at gmail.com (Peng Zhou)
Date: Wed, 11 Jul 2012 13:05:56 -0700 (PDT)
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
	<FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
Message-ID: <ee477075-8d1a-4172-a4f2-7c51f45d9224@googlegroups.com>

Thanks Chris, here is the link to the filed bug:
https://redmine.open-bio.org/issues/3369

On Wednesday, July 11, 2012 2:02:46 PM UTC-5, Christopher Fields wrote:
>
> Peng, 
>
> Has this been filed as a bug yet?   
>
>     https://redmine.open-bio.org/projects/bioperl 
>
> Seems like it would be fairly easy to fix, but I want to track it just in case. 
>
> chris 
>
>


From zhoupenggeni at gmail.com  Wed Jul 11 16:05:56 2012
From: zhoupenggeni at gmail.com (Peng Zhou)
Date: Wed, 11 Jul 2012 13:05:56 -0700 (PDT)
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
	<FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
Message-ID: <ee477075-8d1a-4172-a4f2-7c51f45d9224@googlegroups.com>

Thanks Chris, here is the link of the filed 
bug: https://redmine.open-bio.org/issues/3369

On Wednesday, July 11, 2012 2:02:46 PM UTC-5, Christopher Fields wrote:
>
> Peng, 
>
> Has this been filed as a bug yet?   
>
>     https://redmine.open-bio.org/projects/bioperl 
>
> Seems like it would be fairly easy to fix, but I want to track it just in 
> case. 
>
> chris 
>
> On Jul 11, 2012, at 12:45 PM, Peng Zhou wrote: 
>
> > Hello guys, 
> > 
> > Just a follow-up: it seems to me the bioperl-live version still has the
> > same problem, calling the hit "query" and the query sequence "hit". I also
> > looked into the test script written for hmmer3
> > (bioperl-live/t/SearchIO/hmmer.t), and it doesn't deal with the alignment
> > part; I guess that's why this bug was not discovered.
> > 
> > To keep it simple, here is the output of hmmsearch v3.0:
> > # hmmsearch :: search profile(s) against a sequence database
> > # HMMER 3.0 (March 2010); http://hmmer.org/
> > # Copyright (C) 2010 Howard Hughes Medical Institute.
> > # Freely distributed under the GNU General Public License (GPLv3).
> > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> > # query HMM file:                  /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm
> > # target sequence database:        /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa
> > # output directed to file:         /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt
> > # number of worker threads:        4
> > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> > 
> > Query:       CRP0000  [M=75]
> > Scores for complete sequences (score includes all domains):
> >    --- full sequence ---   --- best 1 domain ---    -#dom-
> >     E-value  score  bias    E-value  score  bias    exp  N  Sequence             Description
> >     ------- ------ -----    ------- ------ -----   ---- --  --------             -----------
> >     5.5e-25   95.0  14.4    5.7e-25   95.0  10.0    1.0  1  Chr2_540228_540404_+
> > 
> > Domain annotation for each sequence (and alignments):
> > >> Chr2_540228_540404_+
> >    #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
> >  ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
> >    1 !   95.0  10.0   3.6e-30   5.7e-25      20      74 ..       4      59 .]       1      59 [] 0.95
> > 
> >   Alignments for each domain:
> >   == domain 1    score: 95.0 bits;  conditional E-value: 3.6e-30
> >                CRP0000 20 tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74
> >                           ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c
> >   Chr2_540228_540404_+  4 GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59
> >                           568899***99********************************************* PP
> > 
> > And here is a dump of the parsed HSP object: 
> > $VAR1 = bless( { 
> >                  'VERBOSE' => 0, 
> >                  'IDENTICAL' => 0, 
> >                  'RANK' => 1, 
> >                  'STRANDED' => 'NONE', 
> >                  'EVALUE' => '3.6e-30', 
> >                  'HSP_LENGTH' => 56, 
> >                  'ALGORITHM' => 'HMMSEARCH',
> >                  'SCORE' => '95.0', 
> >                  'GAP_SYMBOL' => '-', 
> >                  'CONSERVED' => 0, 
> >                   
> >                  'HIT_NAME' => 'Chr2_540228_540404_+', 
> >                  'HIT_DESC' => '', 
> >                  'HIT_START' => '20', 
> >                  'HIT_END' => '74', 
> >                  'HIT_LENGTH' => 56, 
> >                  'HIT_SEQ' => 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc',
> >                  'HIT_FRAME' => 0, 
> >                   
> >                  'QUERY_NAME' => 'CRP0000', 
> >                  'QUERY_DESC' => undef, 
> >                  'QUERY_START' => '4', 
> >                  'QUERY_END' => '59', 
> >                  'QUERY_LENGTH' => '75', 
> >                  'QUERY_FRAME' => 0, 
> >                  'QUERY_SEQ' => 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC',
> >                   
> >                  'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c',
> >                }, 'Bio::Search::HSP::HMMERHSP' ); 
> > 
> > Clearly, the "HIT_START", "HIT_END" and "HIT_SEQ" values should be
> > exchanged with the "QUERY_START", "QUERY_END" and "QUERY_SEQ" values.
> > 
> > Thanks, 
> > 
> > Peng, 
> > 
> > On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote: 
> > I'll try the bioperl-live version. Thanks guys. 
> > Scott Givan 
> > 541-740-4685 
> > Sent from an iPhone (so expect typos). 
> > 
> > On Jul 19, 2011, at 10:34 PM, "Chris Fields" <cjfields at illinois.edu> wrote:
> > 
> > > This might be a disconnect between the HMMER3 version in bioperl-live
> > > and the one in Kai's bioperl-hmmer3 repo.  I believe the one in
> > > bioperl-live is newer.  Scott, can you give that a try?
> > > 
> > > chris 
> > > 
> > > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote: 
> > > 
> > >> Hi Scott, 
> > >> 
> > >> Thanks for writing. I'm on the road at the moment, so I have to be
> > >> briefer and less thorough than I'd like to be.
> > >> 
> > >> What you are observing is not the intended behavior. Oddly, it's not
> > >> what I recall obtaining in my tests on this software, though I was
> > >> mostly interested in hmmsearch at the time and may have been sloppier
> > >> than I should have been when it came to hmmscan.
> > >> 
> > >> What version of HMMER3 are you using? There have been some small
> > >> formatting changes in the past that might be causing a burp in the
> > >> parser, though I doubt it.
> > >> 
> > >> Kai Blin wrote some test scripts (found here:
> > >> bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate
> > >> query/hit coordinates. It might be worth giving this a shot if you
> > >> haven't already.
> > >> 
> > >> Also, if you don't mind, I'm happy to run your code on your output
> > >> file on my end. It might help me diagnose the problem.
> > >> 
> > >> Sorry this is being a thorn in your side! I've cc'ed the list in case
> > >> anyone else has insight into this matter.
> > >> 
> > >> Best, 
> > >> Thomas 
> > >> 
> > >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote: 
> > >> 
> > >>> Hi Thomas, 
> > >>> 
> > >>> I'm using modules in the bioperl-hmmer3 git repository to parse
> > >>> hmmscan reports. When I parse the files and walk through the HSPs like:
> > >>> 
> > >>> while (my $hit = $rslt->next_model) { 
> > >>> 
> > >>>    while (my $domain = $hit->next_hsp) { 
> > >>> 
> > >>> And retrieve the "hit" coordinates like: 
> > >>> 
> > >>>        print "hit coords: ", $domain->start('hit'), "-",
> > >>>            $domain->end('hit'), "\n";
> > >>> 
> > >>> The coordinates returned correspond to what I would call the "query",
> > >>> since they are for the sequence I fed to hmmscan to search the profile
> > >>> database. Likewise, when retrieving the query coordinates like
> > >>> $domain->start('query'), I get what I consider the "hit" coordinates,
> > >>> since they are for the domain profile. Is this the intended behavior?
> > >>> 
> > >>> Thanks. 
> > >>> 
> > >>> scott 
> > >>> 
> > >>> -- 
> > >>> Scott A. Givan 
> > >>> Associate Director 
> > >>> Informatics Research Core Facility 
> > >>> 240e Bond Life Sciences Center 
> > >>> Research Assistant Professor 
> > >>> Molecular Microbiology and Immunology 
> > >>> University of Missouri, Columbia 
> > >>> 
> > >>> TEL 573-882-2948 
> > >>> FAX 573-884-9676 
> > >>> http://ircf.rnet.missouri.edu 
> > >>> 
> > >>> 
> > >>> 
> > >> 
>


From w.arindrarto at gmail.com  Wed Jul 11 17:25:44 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 11 Jul 2012 23:25:44 +0200
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <CAKVJ-_4mo16NxEbZkNp3-EG9uknc4Fo4m6ajQWTknumQNUn4Yw@mail.gmail.com>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
	<FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
	<CAKVJ-_4mo16NxEbZkNp3-EG9uknc4Fo4m6ajQWTknumQNUn4Yw@mail.gmail.com>
Message-ID: <CADEGkF5mE=1Kgy5X4BX9VH0TuChU4OyKuf3e-rzH_SeB=xxapA@mail.gmail.com>

Hi everyone,

Just some additional information that might be useful:

The current Biopython parser for the plain text format parses the very
first line to find out which HMMER flavor produced the result. Both
'hmmfrom' and 'hmm to' are query coordinates if the flavor is hmmsearch
or phmmer; they're hit coordinates if the flavor is hmmscan.
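A minimal Perl sketch of that first-line check, for anyone wanting the same behaviour on the BioPerl side. The sub names (hmmer_flavor, hmm_coords_belong_to) are hypothetical and not part of BioPerl or Biopython, and only the three flavors named in this thread are recognised.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the HMMER program named on the first line of a plain-text report,
# e.g. "# hmmsearch :: search profile(s) against a sequence database".
# Only hmmsearch, hmmscan and phmmer are recognised (assumption).
sub hmmer_flavor {
    my ($first_line) = @_;
    return unless defined $first_line;
    return $first_line =~ /^#\s*(hmmsearch|hmmscan|phmmer)\b/ ? $1 : undef;
}

# The hmmfrom/hmm to columns hold query coordinates for hmmsearch and
# phmmer, and hit coordinates for hmmscan.
sub hmm_coords_belong_to {
    my ($flavor) = @_;
    return unless defined $flavor;
    return $flavor eq 'hmmscan' ? 'hit' : 'query';
}

# Usage: perl hmmer_flavor.pl report.txt
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $first = <$fh>;
    close $fh;
    my $flavor = hmmer_flavor($first);
    if (defined $flavor) {
        print "$flavor: hmmfrom/hmm to are ",
              hmm_coords_belong_to($flavor), " coordinates\n";
    }
    else {
        print "Not a recognised HMMER3 plain-text report\n";
    }
}
```

The same idea works inside a parser: decide the role once per report, then assign the HMM-side columns to query or hit accordingly.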

This information is not available in HMMER's other command-line output
formats (tblout and domtblout), which, as Peter mentioned, forced us to
treat the different flavors of the table output as different formats for
the time being. Fortunately, when we contacted the HMMER developers, they
said this is no longer the case in their development branch (and in their
planned future release).

Hope that helps :),
Bow


On Wed, Jul 11, 2012 at 11:00 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Wed, Jul 11, 2012 at 8:02 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> Peng,
>>
>> Has this been filed as a bug yet?
>>
>>     https://redmine.open-bio.org/projects/bioperl
>>
>> Seems like it would be fairly easy to fix, but I want to track it just in case.
>>
>> chris
>
> Hi all,
>
> This could be the unfortunate fact that hmmscan and
> hmmsearch return very similar tabular output, but
> with query and hit interchanged. i.e. You need some
> extra information to know which way round they are
> (not possible with the current output). This was an
> issue in Bow's Biopython SearchIO project - which
> for the moment he solved by handling this as two
> hmmer file formats. In the medium term we're hoping
> hmmer3 will add some header information or something.
>
> Peter


From dejian.zhao at gmail.com  Thu Jul 12 01:04:54 2012
From: dejian.zhao at gmail.com (De-Jian Zhao)
Date: Thu, 12 Jul 2012 13:04:54 +0800
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com>
References: <4FFDB879.1020906@gmail.com>
	<25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com>
Message-ID: <4FFE5AF6.1020300@gmail.com>

Thank you, Peng. That's great! I have been wondering recently how to see
the whole content of an object; "dumping" it is a good solution.

On 2012-7-12 2:03, Peng Zhou wrote:
> Hi,
>
> I guess that's what the commands are supposed to do: the get_SeqFeatures()
> method returns a list of Bio::SeqFeature objects, and the translate()
> method returns a Bio::Seq object. You can't simply "print" an object in
> Perl, but you can "dump" it:
>
> $ perl -e ' use Bio::SeqIO; use Data::Dumper; my
> $seqio=Bio::SeqIO->new(-file=>shift);
> print Dumper($seqio->next_seq()->get_SeqFeatures()); ' nt.gb
>
> $ perl -e ' use Bio::SeqIO; use Data::Dumper; my
> $seqio=Bio::SeqIO->new(-file=>shift);
> print Dumper($seqio->next_seq()->translate()); ' nt.gb
>
> On Wednesday, July 11, 2012 12:31:37 PM UTC-5, De-Jian Zhao wrote:
>> Hi,
>>
>> I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and
>> tested the Bio::SeqIO module as follows. The first 3 commands succeeded;
>> however the last 2 failed.
>>
>> I think $seqio->next_seq() produces a Bio::Seq object which contains the
>> sequence, features and annotation (according to the DESCRIPTION of
>> "perldoc Bio::Seq") and thus the invocation of the methods
>> get_SeqFeatures() and translate() should be valid. However, the results
>> denied this idea.
>>
>> Will anyone explain what happened to the last 2 commands? I have
>> encountered numerous cases of failures when testing the bioperl methods.
>> I want to translate the mRNA sequence and extract the sequence features.
>> What are the right commands? Thanks a lot!
>>
>> Best,
>> Dejian
>>
>>
>>
>> PS: The commands and results
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift);
>> print $seqio->next_seq()->display_id(); ' nt.gb
>> NM_053056
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift);
>> print $seqio->next_seq()->molecule(); ' nt.gb
>> mRNA
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift);
>> print $seqio->next_seq()->subseq(1,6); ' nt.gb
>> CACACG
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift);
>> print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb
>> Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18)
>>
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift);
>> print $seqio->next_seq()->translate(); ' nt.gb
>> Bio::Seq::RichSeq=HASH(0x20a3e7b0)
>>
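For readable output instead of a full dump, the feature accessors can be printed directly. A minimal sketch, assuming BioPerl is installed and the input is a GenBank file such as nt.gb; the helper feature_summary is hypothetical, while primary_tag, start, end, strand, get_all_tags and get_tag_values are Bio::SeqFeatureI methods. BioPerl is only loaded when a file name is given.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One fixed-width line per feature: primary tag, start..end, strand sign.
sub feature_summary {
    my ($tag, $start, $end, $strand) = @_;
    my $sign = !defined $strand || $strand == 0 ? '.'
             : $strand > 0                     ? '+'
             :                                   '-';
    return sprintf '%-12s %d..%d (%s)', $tag, $start, $end, $sign;
}

# Usage: perl features.pl nt.gb
if (@ARGV) {
    require Bio::SeqIO;
    my $seqio = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    my $seq   = $seqio->next_seq;
    for my $feat ($seq->get_SeqFeatures) {
        print feature_summary($feat->primary_tag, $feat->start,
                              $feat->end, $feat->strand), "\n";
        # Print the qualifiers (/gene, /product, ...) attached to the feature
        for my $tag ($feat->get_all_tags) {
            print "    /$tag=", join(',', $feat->get_tag_values($tag)), "\n";
        }
    }
}
```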


From dejian.zhao at gmail.com  Thu Jul 12 01:14:33 2012
From: dejian.zhao at gmail.com (De-Jian Zhao)
Date: Thu, 12 Jul 2012 13:14:33 +0800
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu>
References: <4FFDB879.1020906@gmail.com>
	<9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu>
Message-ID: <4FFE5D39.6010406@gmail.com>

Thank you, Jim. You are right. It works. This example deepens my 
understanding of OOP.

On 2012-7-12 2:01, Jim Hu wrote:
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb
>> Bio::Seq::RichSeq=HASH(0x20a3e7b0)
> ->translate returns a new Seq object.  I think
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate()->seq(); ' nt.gb
>
> should work (haven't tried it).
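Note that calling translate() on the whole mRNA starts at position 1, which for a RefSeq mRNA with a 5' UTR is usually not the start of the coding sequence. A sketch that translates only the CDS feature instead, assuming BioPerl is installed, the record has a single CDS, and the standard genetic code with codon_start=1 applies; strip_terminal_stop is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# translate() writes the stop codon as '*', whereas the GenBank
# /translation qualifier omits it; drop one trailing '*' before comparing.
sub strip_terminal_stop {
    my ($protein) = @_;
    $protein =~ s/\*\z//;
    return $protein;
}

# Usage: perl cds_translate.pl nt.gb
if (@ARGV) {
    require Bio::SeqIO;
    my $seq = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank')->next_seq;
    # Take the first CDS feature (assumes the record has exactly one).
    my ($cds) = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;
    die "No CDS feature in $ARGV[0]\n" unless $cds;
    # spliced_seq() returns the nucleotide sequence covered by the CDS location.
    my $protein = strip_terminal_stop($cds->spliced_seq->translate->seq);
    print "$protein\n";
    if ($cds->has_tag('translation')) {
        my ($annotated) = $cds->get_tag_values('translation');
        my $verdict = $protein eq $annotated ? 'Matches' : 'Differs from';
        print "$verdict /translation\n";
    }
}
```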


From kai.blin at biotech.uni-tuebingen.de  Thu Jul 12 09:43:19 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 12 Jul 2012 15:43:19 +0200
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <CADEGkF5mE=1Kgy5X4BX9VH0TuChU4OyKuf3e-rzH_SeB=xxapA@mail.gmail.com>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
	<FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
	<CAKVJ-_4mo16NxEbZkNp3-EG9uknc4Fo4m6ajQWTknumQNUn4Yw@mail.gmail.com>
	<CADEGkF5mE=1Kgy5X4BX9VH0TuChU4OyKuf3e-rzH_SeB=xxapA@mail.gmail.com>
Message-ID: <4FFED477.3090907@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-07-11 23:25, Wibowo Arindrarto wrote:

Hi,

> The current Biopython parser for the plain text format parses the
> very first line to find out which HMMER flavor produces the result.
> Both 'hmm from' and 'hmmto' are query coordinates if the flavor is
> hmmsearch or phmmer; and they're hit coordinates if the flavor is
> hmmscan.

Whoops. I mostly looked at hmmscan when writing the parser, because
that's the file format I needed for my code. The code clearly should
follow the way the hmmer2 parser works, and differentiate between
hmmsearch and hmmscan type output.

As I said on the bug report, I'm happy to look at code fixing this.

> This information is not available in other HMMER command line
> output formats (tblout and domtblout), which as Peter has
> mentioned, required us to treat different flavors of the table
> output as different formats for the time being.

As far as I'm aware, BioPerl currently doesn't parse the table output
format.

Seeing how much repeated pain we run into with all these parsers in
the different Bio* projects, I wonder whether there is a smarter way to
deal with parsing: maybe at least a shared grammar file that we could
use for testing, to make sure we have the same expectations about file
formats across the different language implementations. Ideally we'd
auto-generate the parsers from the grammar specification, but I guess
that will stay wishful thinking for quite a while.

> Fortunately, after contacting the HMMER developers they mentioned
> that this is not the case anymore in their development branch (and
> their future planned release).

That's certainly good news. :)

Cheers,
Kai

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJP/tR3AAoJEKM5lwBiwTTP6OoIAM3J9chdyfmTuQTp4KMxVIk7
PCkJy+aLcnfa3d7s8BVPG0GWQTPrfHLX6a7zWfoSLzL9RBShFWCQIxGpu+Tq3yR8
Hu/TpoFIg8bB1iAroAWLdsX8nio3Idlcl5JN38LBsFEUirFrGAsvfdN/+fYrP5Ni
y0ULP18uihiN07sVG88nZXNyEB7fIscVYdO90GsGq03/KOTRsRD4kugapiQJIy4D
lrqnYznLa4p30lBDCEHbTaHYbfIs7/8tryfHJsfjimjg8IoSMHMJfIkI7/z0qlL+
bxt/HuGMsm1Ak08xEAoT7T00t5tcAp1gclgZsO/CrviOicmhUgd6iri/kIpzg0c=
=acWd
-----END PGP SIGNATURE-----


From cjfields at illinois.edu  Thu Jul 12 11:24:13 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 12 Jul 2012 15:24:13 +0000
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <4FFED477.3090907@biotech.uni-tuebingen.de>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
	<FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
	<CAKVJ-_4mo16NxEbZkNp3-EG9uknc4Fo4m6ajQWTknumQNUn4Yw@mail.gmail.com>
	<CADEGkF5mE=1Kgy5X4BX9VH0TuChU4OyKuf3e-rzH_SeB=xxapA@mail.gmail.com>
	<4FFED477.3090907@biotech.uni-tuebingen.de>
Message-ID: <1C3A31F9-9717-49F3-A880-FA725D0F3CDB@illinois.edu>

On Jul 12, 2012, at 8:43 AM, Kai Blin wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 2012-07-11 23:25, Wibowo Arindrarto wrote:
> 
> Hi,
> 
>> The current Biopython parser for the plain text format parses the
>> very first line to find out which HMMER flavor produces the result.
>> Both 'hmm from' and 'hmmto' are query coordinates if the flavor is
>> hmmsearch or phmmer; and they're hit coordinates if the flavor is
>> hmmscan.
> 
> Whoops. I mostly looked at hmmscan when writing the parser, because
> that's the file format I needed for my code. The code clearly should
> follow the way the hmmer2 parser works, and differentiate between
> hmmsearch and hmmscan type output.
> 
> As I said on the bug report, I'm happy to look at code fixing this.

Seems like it should be easy enough to address if there is something in the output that indicates the report type.
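Until the parser is fixed, a caller-side workaround is to choose which side to ask start()/end() for based on the report type. This is a sketch only. It assumes, as Peng's dump suggests, that the bug affects hmmsearch reports and that $result->algorithm returns 'HMMSEARCH' for them. The -format value and the true_sides helper are also assumptions, and the swap should be dropped once bug 3369 is closed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the arguments to pass to $hsp->start()/end() for the real query
# and the real hit. Assumption: the current parser swaps the two sides for
# HMMSEARCH reports only (see redmine issue 3369).
sub true_sides {
    my ($algorithm) = @_;
    my $alg = uc(defined $algorithm ? $algorithm : '');
    return $alg eq 'HMMSEARCH' ? ('hit', 'query') : ('query', 'hit');
}

# Usage: perl hmm_coords.pl hmmsearch_output.txt
if (@ARGV) {
    require Bio::SearchIO;
    # The format name is an assumption; use whatever your BioPerl expects.
    my $in = Bio::SearchIO->new(-format => 'hmmer3', -file => $ARGV[0]);
    while (my $result = $in->next_result) {
        my ($q, $h) = true_sides($result->algorithm);
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                printf "%s\t%d-%d\t%s\t%d-%d\n",
                    $result->query_name, $hsp->start($q), $hsp->end($q),
                    $hit->name,          $hsp->start($h), $hsp->end($h);
            }
        }
    }
}
```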

>> This information is not available in other HMMER command line
>> output formats (tblout and domtblout), which as Peter has
>> mentioned, required us to treat different flavors of the table
>> output as different formats for the time being.
> 
> As far as I'm aware, BioPerl currently doesn't parse the table output
> format.

The only reason to do so is if the table provides additional information the actual hits don't (this can be the case with BLAST reports).

> Seeing how much repeated pain we run into with all these parsers in
> the different Bio* projects, I wonder if there was a smarter way to
> deal with parsing. Maybe at least some shared grammar file that we
> could use for testing, to make sure we at least have the same
> expectations about file formats in the different language
> implementations. Ideally we'd auto-generate the parsers from the
> grammar specification, but I guess that'll stay wishful thinking for
> quite a bit.

I would fully support something like this. I've been thinking about this with Marpa::XS (which now has a compiled library, libmarpa, to make it less Perl-centric), and there have been talks of using a similar toolkit with the BioRuby folks.  We could always have a plain Perl/Python/Ruby/etc. fallback for the most common formats.

chris


From buschj at hhu.de  Sun Jul 15 15:46:42 2012
From: buschj at hhu.de (jobu)
Date: Sun, 15 Jul 2012 21:46:42 +0200
Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences of
	Blast Matches
Message-ID: <50031E22.3060902@hhu.de>

Dear All.

I am still a beginner in Perl and have just started to look into
BioPerl, so I hope this is the right place to ask my question.

I locally ran a standalone blastn search of many short query-sequences
against a set of target-fasta-sequences consisting of whole chromosomal
sequence data.

What I need to do now is to extract, say, 100 nt upstream and 100 nt
downstream of each BLAST match from my target sequences.

At this point I can only assume that BioPerl might be helpful for
this task, though I haven't yet found a module that will do this
locally on my hard drive.

Thus I would be thankful for the slightest hint where to begin.

Sincerely
Jochen


From Russell.Smithies at agresearch.co.nz  Sun Jul 15 17:19:14 2012
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 16 Jul 2012 09:19:14 +1200
Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences
 of	Blast Matches
In-Reply-To: <50031E22.3060902@hhu.de>
References: <50031E22.3060902@hhu.de>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCF2A4CAA@exchsth.agresearch.co.nz>

Hi Jochen,
I don't think BioPerl can directly manipulate BLAST databases, so I'd probably use fastacmd to extract the sequence from the original BLAST database.
eg.
fastacmd -s X51494.1 -d /dataset/blastdata/active/nt -L 100,200
>gi|20090|emb|X51494.1|:100-200 Rice prolamin gene (strain NE4)
ATGATGCAAACGTTGGGCATGGGTAGCTCCACAGCCATGTTCATGTCGCAGCCAATGGCGCTCCTGCAGCAGCAATGTTG
CATGCAGCTACAAGGCATGAT

Or if you're using blast+, use the blastdbcmd command:
eg.
blastdbcmd -entry  X51494.1 -db /dataset/blastdata/active/nt -range 100-200
>gi|20090|emb|X51494.1|:100-200 Rice prolamin gene (strain NE4)
ATGATGCAAACGTTGGGCATGGGTAGCTCCACAGCCATGTTCATGTCGCAGCCAATGGCGCTCCTGCAGCAGCAATGTTG
CATGCAGCTACAAGGCATGAT

So to put it all together: use BioPerl to parse your existing BLAST results and pull out each hit's coordinates, then use a system call to run fastacmd or blastdbcmd to extract the sequence from the BLAST database, and finally write the sequences to a file.
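A sketch of those steps that widens each HSP by 100 nt on both sides and calls blastdbcmd. Assumptions: BLAST+ and BioPerl are installed, the database was built with -parse_seqids so that -entry can find the hit IDs, and the hit name in the report matches the database ID. The -strand option is not in the examples above, and flank_range is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $FLANK = 100;    # nt to add on each side of the hit

# Widen [start, end] by $flank on both sides, clamped to 1..$len
# ($len may be undef if the subject length is unknown).
sub flank_range {
    my ($start, $end, $flank, $len) = @_;
    ($start, $end) = ($end, $start) if $start > $end;
    my $from = $start - $flank;
    my $to   = $end + $flank;
    $from = 1    if $from < 1;
    $to   = $len if defined $len && $to > $len;
    return ($from, $to);
}

# Usage: perl flanks.pl blast_report.txt my_blast_db > flanks.fa
if (@ARGV == 2) {
    my ($report, $db) = @ARGV;
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $report);
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                my ($from, $to) = flank_range($hsp->start('hit'), $hsp->end('hit'),
                                              $FLANK, $hit->length);
                # For minus-strand hits, request the reverse complement so the
                # output reads 5'->3' relative to the match.
                my $strand = $hsp->strand('hit') < 0 ? 'minus' : 'plus';
                # List form of system() avoids shell quoting of IDs like gi|...|
                system('blastdbcmd', '-db', $db, '-entry', $hit->name,
                       '-range', "$from-$to", '-strand', $strand) == 0
                    or warn "blastdbcmd failed for ", $hit->name, "\n";
            }
        }
    }
}
```

Clamping to the subject length keeps -range valid when a match sits near either end of a chromosome.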

These might be useful:
http://www.bioperl.org/wiki/HOWTO:SearchIO
http://www.bioperl.org/wiki/HOWTO:SearchIO#Speed_improvements_with_lightweight_objects 
http://www.bioperl.org/wiki/HOWTO:BlastPlus
http://www.bioperl.org/wiki/HOWTO:StandAloneBlast


--Russell

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From dcmertens.perl at gmail.com  Tue Jul 17 08:57:55 2012
From: dcmertens.perl at gmail.com (David Mertens)
Date: Tue, 17 Jul 2012 07:57:55 -0500
Subject: [Bioperl-l] Announcing The Quantified Onion Google Group and
	perl4science.github.com
Message-ID: <CA+4ieYXGs=ywH97gEdTOakVLp22n0gPuwZJK-C75C7ijUzPgwg@mail.gmail.com>

Hello everybody -

I returned from YAPC::NA this year intending to build up the scientific
Perl community. One outgrowth of this has been Joel Berger's creation of
perl4science.github.com and gizmomathboy's creation of The Quantified Onion
Google Group (https://groups.google.com/forum/#%21forum/the-quantified-onion).

perl4science is meant to be a landing page for anybody looking to combine
Perl and science. Since it is a github repository, it makes it about as
easy as possible for others to contribute content or fixes. If you have a
project that scientists would find useful, you should fork the project, add
your content, and issue a pull request. It's that easy.

The Quantified Onion is meant to be a space for scientists to discuss how
we use Perl in our science and to work together to grow adoption of Perl
among scientists. It will undoubtedly attract newcomers to Perl asking
beginner questions, at which point we will gently refer them to the
appropriate manual pages. Interesting discussions thus far (in my mind)
include a discussion about teaching test-driven design and a discussion
about submitting an article to Computing in Science and Engineering for
their November issue, which is supposed to be about Modern Programming
Languages. I would like to begin putting on workshops on Perl for
Scientists and Engineers (and encourage others to do the same), and I will
begin the discussion on The Quantified Onion.

If you know of other Perl science resources, please feel free to add them
to perl4science or post them on The Quantified Onion, and please join The
Quantified Onion. Together, we can grow Perl's adoption among scientists!

David Mertens

-- 
 "Debugging is twice as hard as writing the code in the first place.
  Therefore, if you write the code as cleverly as possible, you are,
  by definition, not smart enough to debug it." -- Brian Kernighan


From cjfields at illinois.edu  Wed Jul 18 10:29:02 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 18 Jul 2012 14:29:02 +0000
Subject: [Bioperl-l] [blast-announce] OLD_BLAST parameter to be
 discontinued. Alternative NCBI BLAST parsable formats are available
References: <CC2C3DA1.A01D6%mcginnis@ncbi.nlm.nih.gov>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF2B63F6C5@CHIMBX5.ad.uillinois.edu>

Not sure if anyone is using this as a means of getting their reports (I don't), but I'm posting this here just in case.

-c

Begin forwarded message:

> From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]" 
> Subject: [blast-announce] OLD_BLAST parameter to be discontinued. Alternative NCBI BLAST parsable formats are available
> Date: July 18, 2012 9:17:05 AM CDT
> To: NLM/NCBI List blast-announce
> 
> Beginning Sept. 10, 2012, the BLAST service will ignore the OLD_BLAST parameter in posted URLs. We are removing this old and little used option to prepare for upcoming enhancements to the BLAST service later this year. Setting OLD_BLAST=true produces an older version of the BLAST HTML results that a few people have used for automated processing (parsing) of results. NCBI BLAST supports a number of different and more stable formats for parsing. These include XML, tabular reports and ASN.1. For more information, please see BLAST Developer Information (http://1.usa.gov/O8AocI) and links on that page.
> 
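For anyone moving off the OLD_BLAST HTML, the tabular report (BLAST+ -outfmt 6) is about the simplest of the formats mentioned to parse; BioPerl's Bio::SearchIO can also read it with -format => 'blasttable'. A dependency-free sketch, assuming the default 12 columns; parse_tabular_line is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Default column order of BLAST+ -outfmt 6 (and the data lines of -outfmt 7).
my @COLS = qw(qseqid sseqid pident length mismatch gapopen
              qstart qend sstart send evalue bitscore);

# Turn one tab-separated line into a hash ref; skip comments, blanks and
# lines with too few columns.
sub parse_tabular_line {
    my ($line) = @_;
    chomp $line;
    return if $line !~ /\S/ || $line =~ /^\s*#/;
    my @fields = split /\t/, $line;
    return if @fields < @COLS;
    my %rec;
    @rec{@COLS} = @fields[0 .. $#COLS];
    return \%rec;
}

# Usage: perl blast_tab.pl results.tsv
if (@ARGV) {
    while (my $line = <>) {
        my $hit = parse_tabular_line($line) or next;
        print "$hit->{qseqid} -> $hit->{sseqid} ($hit->{pident}%, E=$hit->{evalue})\n";
    }
}
```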


From dejian.zhao at gmail.com  Wed Jul 18 11:36:14 2012
From: dejian.zhao at gmail.com (De-Jian Zhao)
Date: Wed, 18 Jul 2012 23:36:14 +0800
Subject: [Bioperl-l] Which graphic module should I learn?
Message-ID: <5006D7EE.1020205@gmail.com>

Hi, all.

Currently I am working on a genome. I will draw some pictures based on 
the sequencing data. In the long run, I will use the module in my future 
projects, so I want to learn a popular module to get better support from 
the community. I searched in cpan with the command "i /SVG/" and got 234 
items. Which one is popular in bioinformatics? Which module should I 
start with?  Thanks for any suggestions.

Best,
De-Jian


From scott at scottcain.net  Wed Jul 18 11:46:01 2012
From: scott at scottcain.net (Scott Cain)
Date: Wed, 18 Jul 2012 11:46:01 -0400
Subject: [Bioperl-l] Which graphic module should I learn?
In-Reply-To: <5006D7EE.1020205@gmail.com>
References: <5006D7EE.1020205@gmail.com>
Message-ID: <CA+JTaow0sJ-MTq47r1vvvu+H2zPu=U8VuEi++s-YDxcTo1+Ybg@mail.gmail.com>

Hi De-Jian,

Of course, it depends on what you want to do, but if you're referring
to the genome feature/annotation type graphics, Bio::Graphics already
supports SVG pretty well, via GD::SVG.

Scott




-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Tue Jul 24 23:08:05 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 25 Jul 2012 03:08:05 +0000
Subject: [Bioperl-l] BioPerl and Travis-CI
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>

Peter Cock has graciously helped start up a branch for bioperl-live that is using Travis-CI (a nice continuous integration tool).  Results from Peter's fork are found here:

    http://travis-ci.org/#!/peterjc/bioperl-live

As this is now pulled into the main bioperl repo, results will be here:

    http://travis-ci.org/#!/bioperl/bioperl-live

I'll be working on this and expect this will be added to master in the next few days.  

chris


From p.j.a.cock at googlemail.com  Wed Jul 25 06:31:13 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 25 Jul 2012 11:31:13 +0100
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
Message-ID: <CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>

On Wed, Jul 25, 2012 at 4:08 AM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> Peter Cock has graciously helped start up a branch for bioperl-live
> that is using Travis-CI (a nice continuous integration tool).  Results
> from Peter's fork are found here:
>
>     http://travis-ci.org/#!/peterjc/bioperl-live
>
> As this is now pulled into the main bioperl repo, results will be here:
>
>     http://travis-ci.org/#!/bioperl/bioperl-live
>
> I'll be working on this and expect this will be added to master in
> the next few days.
>
> chris

We've had this running for Biopython for a month now, and it has
been a useful complement to the BuildBot (which covers other
operating systems). This was following BioRuby's lead:
http://biopython.org/pipermail/biopython-dev/2012-June/009742.html

The current BioPerl Travis configuration is probably usable right
now (after changing the branch whitelist to either master or simply
all branches).

Other remaining issues include sorting out which dependencies
should be installed, and streamlining their verbose output (e.g.
using tail).

TravisCI can send out emails (e.g. on test failures), and perhaps
bioperl-guts-l might be a sensible place to send these. Initially
we'd disabled the emails for Biopython. I'd like to use an RSS
feed... there is a JSON API which BioRuby are using for
http://www.biogems.info/ which tracks their plugins.

Peter


From p.j.a.cock at googlemail.com  Fri Jul 27 11:03:05 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 27 Jul 2012 16:03:05 +0100
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
	<CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>
	<CAKVJ-_45r1Hza-ga95oyMRbYGzupsKtmAZ=ac29g2n-U9SegWA@mail.gmail.com>
	<118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu>
Message-ID: <CAKVJ-_4iYB0N7A6eOEivrKTkeQZzw=9jiS7kqVCKR2aeEiZZaQ@mail.gmail.com>

On Fri, Jul 27, 2012 at 3:58 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> On Jul 27, 2012, at 9:47 AM, Peter Cock wrote:
>
>> On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>>
>>> That's done now - except for the circular dependencies, and GD,
>>> which might be easy to solve if anyone knows what the error
>>> means - see commit message here:
>>> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a
>>
>> Re: https://twitter.com/cjfields/status/228861370454638592
>> Not sure why you got GD to work when something very similar
>> had failed for me. Oh well - job done :)
>
> It was the lack of gdlib-config in the libgd2-xpm package, you need
> libgd2-xpm-dev.  One of the fun things about Debian packaging.

Ah - I should have guessed that.

>>> Would a single clean commit of the (current) .travis.yml file be
>>> preferable to the current series of commits? And do you want
>>> a pull request, or would you just merge/cherry-pick manually?
>>
>> Given all the churn between our revisions, personally I'd opt for
>> a single clean commit to bioperl/master - but your call.
>>
>> Peter
>
> Yep, about to merge it over.  It's working now, just need to
> whitelist master instead of travis after the merge.

I'd removed the whitelist altogether here:
https://github.com/peterjc/bioperl-live/commit/96dc5866f4406179353909c72d812623341c8fbd

My thinking was BioPerl seems to have multiple feature branches
under the official repo, so they should get tested too. You'd be
in a better position than me to judge what would work best for
BioPerl here.

Peter


From cjfields at illinois.edu  Fri Jul 27 10:58:21 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 27 Jul 2012 14:58:21 +0000
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <CAKVJ-_45r1Hza-ga95oyMRbYGzupsKtmAZ=ac29g2n-U9SegWA@mail.gmail.com>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
	<CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>
	<CAKVJ-_45r1Hza-ga95oyMRbYGzupsKtmAZ=ac29g2n-U9SegWA@mail.gmail.com>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu>

On Jul 27, 2012, at 9:47 AM, Peter Cock wrote:

> On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> 
>> That's done now - except for the circular dependencies, and GD,
>> which might be easy to solve if anyone knows what the error
>> means - see commit message here:
>> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a
> 
> Re: https://twitter.com/cjfields/status/228861370454638592
> Not sure why you got GD to work when something very similar
> had failed for me. Oh well - job done :)

It was the lack of gdlib-config in the libgd2-xpm package, you need libgd2-xpm-dev.  One of the fun things about Debian packaging.

>> Would a single clean commit of the (current) .travis.yml file be
>> preferable to the current series of commits? And do you want
>> a pull request, or would you just merge/cherry-pick manually?
> 
> Given all the churn between our revisions, personally I'd opt for
> a single clean commit to bioperl/master - but your call.
> 
> Peter

Yep, about to merge it over.  It's working now, just need to whitelist master instead of travis after the merge.

chris


From cjfields at illinois.edu  Fri Jul 27 12:26:34 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 27 Jul 2012 16:26:34 +0000
Subject: [Bioperl-l] BioPerl Travis-CI now live
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D54D@CITESMBX5.ad.uillinois.edu>

All commits to bioperl-live master branch on github are now being tracked:

   http://travis-ci.org/#!/bioperl/bioperl-live

The .travis.yml file has a whitelist for branches to be tested; if anyone wants to test additional branches feel free to add them to the list!

chris


From cjfields at illinois.edu  Fri Jul 27 11:15:19 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 27 Jul 2012 15:15:19 +0000
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <CAKVJ-_4iYB0N7A6eOEivrKTkeQZzw=9jiS7kqVCKR2aeEiZZaQ@mail.gmail.com>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
	<CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>
	<CAKVJ-_45r1Hza-ga95oyMRbYGzupsKtmAZ=ac29g2n-U9SegWA@mail.gmail.com>
	<118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4iYB0N7A6eOEivrKTkeQZzw=9jiS7kqVCKR2aeEiZZaQ@mail.gmail.com>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D2D6@CITESMBX5.ad.uillinois.edu>


On Jul 27, 2012, at 10:03 AM, Peter Cock wrote:

> On Fri, Jul 27, 2012 at 3:58 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> On Jul 27, 2012, at 9:47 AM, Peter Cock wrote:
>> 
>>> On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>>> 
>>>> That's done now - except for the circular dependencies, and GD,
>>>> which might be easy to solve if anyone knows what the error
>>>> means - see commit message here:
>>>> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a
>>> 
>>> Re: https://twitter.com/cjfields/status/228861370454638592
>>> Not sure why you got GD to work when something very similar
>>> had failed for me. Oh well - job done :)
>> 
>> It was the lack of gdlib-config in the libgd2-xpm package, you need
>> libgd2-xpm-dev.  One of the fun things about Debian packaging.
> 
> Ah - I should have guessed that.
> 
>>>> Would a single clean commit of the (current) .travis.yml file be
>>>> preferable to the current series of commits? And do you want
>>>> a pull request, or would you just merge/cherry-pick manually?
>>> 
>>> Given all the churn between our revisions, personally I'd opt for
>>> a single clean commit to bioperl/master - but your call.
>>> 
>>> Peter
>> 
>> Yep, about to merge it over.  It's working now, just need to
>> whitelist master instead of travis after the merge.
> 
> I'd removed the whitelist altogether here:
> https://github.com/peterjc/bioperl-live/commit/96dc5866f4406179353909c72d812623341c8fbd
> 
> My thinking was BioPerl seems to have multiple feature branches
> under the official repo, so they should get tested too. You'd be
> in a better position than me to judge what would work best for
> BioPerl here.
> 
> Peter

We'll keep it to master for now.  It's pretty easy to add branches as needed, and I didn't want to expand to all the potentially stale branches unless explicitly set (we need to triage all those at some point).

chris


From p.j.a.cock at googlemail.com  Fri Jul 27 10:47:18 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 27 Jul 2012 15:47:18 +0100
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
	<CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>
Message-ID: <CAKVJ-_45r1Hza-ga95oyMRbYGzupsKtmAZ=ac29g2n-U9SegWA@mail.gmail.com>

On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> That's done now - except for the circular dependencies, and GD,
> which might be easy to solve if anyone knows what the error
> means - see commit message here:
> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a

Re: https://twitter.com/cjfields/status/228861370454638592
Not sure why you got GD to work when something very similar
had failed for me. Oh well - job done :)

> Would a single clean commit of the (current) .travis.yml file be
> preferable to the current series of commits? And do you want
> a pull request, or would you just merge/cherry-pick manually?

Given all the churn between our revisions, personally I'd opt for
a single clean commit to bioperl/master - but your call.

Peter


From robfsouza at gmail.com  Fri Jul 27 18:29:22 2012
From: robfsouza at gmail.com (Robson de Souza)
Date: Fri, 27 Jul 2012 15:29:22 -0700 (PDT)
Subject: [Bioperl-l] obf sites offline?
Message-ID: <9bef8a3b-08ca-4868-be7a-193e7596290d@googlegroups.com>

I can't access any of the OBF sites, either from work (USA) or my phone... 
is there something going on?
Robson


From p.j.a.cock at googlemail.com  Thu Jul 26 11:22:26 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 26 Jul 2012 16:22:26 +0100
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
Message-ID: <CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>

On Wed, Jul 25, 2012 at 11:31 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Wed, Jul 25, 2012 at 4:08 AM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> Peter Cock has graciously helped start up a branch for bioperl-live
>> that is using Travis-CI (a nice continuous integration tool).  Results
>> from Peter's fork are found here:
>>
>>     http://travis-ci.org/#!/peterjc/bioperl-live
>>
>> As this is now pulled into the main bioperl repo, results will be here:
>>
>>     http://travis-ci.org/#!/bioperl/bioperl-live
>>
>> I'll be working on this and expect this will be added to master in
>> the next few days.
>>
>> chris
>
> We've had this running for Biopython for a month now, and it has
> been a useful complement to the BuildBot (which covers other
> operating systems). This was following BioRuby's lead:
> http://biopython.org/pipermail/biopython-dev/2012-June/009742.html
>
> The current BioPerl Travis configuration is probably usable right
> now (after changing the branch whitelist to either master or simply
> all branches).
>
> Other remaining issues include sorting out which dependencies
> should be installed, and streamlining their verbose output (e.g.
> using tail).

That's done now - except for the circular dependencies, and GD,
which might be easy to solve if anyone knows what the error
means - see commit message here:
https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a

Would a single clean commit of the (current) .travis.yml file be
preferable to the current series of commits? And do you want
a pull request, or would you just merge/cherry-pick manually?

> TravisCI can send out emails (e.g. on test failures), and perhaps
> bioperl-guts-l might be a sensible place to send these. Initially
> we'd disabled the emails for Biopython. I'd like to use an RSS
> feed... there is a JSON API which BioRuby are using for
> http://www.biogems.info/ which tracks their plugins.

I've filed an issue for news feed support in TravisCI,
https://github.com/travis-ci/travis-core/issues/82

Regards,

Peter


From p.j.a.cock at googlemail.com  Tue Jul 31 06:37:35 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 31 Jul 2012 11:37:35 +0100
Subject: [Bioperl-l] Travis Continuous Integration testing & pull requests
Message-ID: <CAKVJ-_5Yt8O21Wo0o6p1VgjheaaPYBDQEbF6geaABb1M++GfRA@mail.gmail.com>

Hi all,

I'm cross-posting as this is an announcement. Please keep any
follow-up discussion to the relevant project-specific mailing list,
or to open-bio-l if it is general.

Those following the OBF blog or the OBF or Bio* Twitter accounts
will have already seen this, which I posted yesterday:
http://news.open-bio.org/news/2012/07/travis-ci-for-testing/

In summary, since earlier this year BioRuby and then Biopython and
BioPerl have been using Travis-CI.org (a hosted continuous
integration service for the open source community) to run their unit
tests automatically whenever their GitHub repositories are updated.

In addition we now have TravisCI automatically running our tests
on any new GitHub pull requests - supported by an OBF donation
to Travis-CI, see:
http://about.travis-ci.org/blog/announcing-pull-request-support/

Currently BioJava only uses GitHub as an SVN mirror - but this
should still let you start using TravisCI for automated testing:
http://about.travis-ci.org/docs/user/languages/java/

For EMBOSS, this is another incentive to convert from CVS to
GitHub - TravisCI recently announced support for C/C++ projects:
http://about.travis-ci.org/blog/support_for_go_c_and_cpp/
http://about.travis-ci.org/docs/user/languages/c/

Potentially there are other OBF projects where this would be
useful too.

Regards,

Peter


From wrp at virginia.edu  Mon Jul  2 10:31:40 2012
From: wrp at virginia.edu (William Pearson)
Date: Mon, 2 Jul 2012 10:31:40 -0400
Subject: [Bioperl-l] Application Deadline - 2012 CSHL Computational and
	Comparative Genomics Course
Message-ID: <AAD6FEEC-F120-4B30-8208-1E48D8129FE6@virginia.edu>


Course announcement - Application deadline, July 15, 2012

Cold Spring Harbor COMPUTATIONAL & COMPARATIVE GENOMICS

Oct 31 - Nov 6, 2012
Application Deadline: July 15, 2012

INSTRUCTORS:

William Pearson, University of Virginia, Charlottesville, VA
Lisa Stubbs, University of Illinois, Urbana, IL
This course presents a comprehensive overview of the theory and practice of computational methods for the identification and characterization of functional elements from DNA sequence data. The course focuses on approaches for extracting the maximum amount of information from protein and DNA sequence similarity through sequence database searches, statistical analysis, and multiple sequence alignment. Additional topics include:

Alignment and analysis of "Next-Gen" sequencing data
The Galaxy environment for high-throughput analysis
Identification of conserved signals in aligned and unaligned sequences
Regulatory element and motif recognition
Integration of genetic and sequence information in biological databases
The ENSEMBL genome browser and BioMart
Function/phenotype prediction for sequence variants
The course combines lectures with hands-on exercises; students are encouraged to pose challenging sequence analysis problems using their own data. The course is designed for biologists seeking advanced training in biological sequence and genome analysis, computational biology core resource directors and staff, and for scientists in other disciplines, such as computer science, who wish to survey current research problems in biological sequence analysis. Advanced programming skills are not required.

The lecture/lab schedule for the 2011 course can be found at fasta.bioch.virginia.edu/cshl

Speakers in the 2011 course included:
Aaron Mackey, U. of Virginia, Next-Gen analysis pipelines
Bert Overduin, European Bioinformatics Institute, UK, ENSEMBL and BioMart
Francis Ouellette, Ontario Institute for Cancer Research, Databases for Biological Function
William Pearson, U. of Virginia, Similarity Searching, Multiple Alignment
Lisa Stubbs, U. of Illinois, Urbana, ChIP, Transcription Factors, and Comparative Genomics
James Taylor, Emory, Galaxy and genome analysis pipelines
The primary focus of the computational and comparative genomics course is the theory and practice of algorithms used in computational biology, with the goal of using current methods more effectively and evaluating new approaches. Students who wish to learn Perl programming for Bioinformatics are encouraged to apply to the Programming for Biology course. Students who would like in-depth training in the analysis of next-generation sequencing data (e.g., SNP calling and the detection of structural variants) should apply to the course on Advanced Sequencing Technologies & Applications. This Computational and Comparative Genomics course will discuss methods for phenotype prediction from variation data.

To apply to the course, fill out and send in the form at:

http://meetings.cshl.edu/course/courseapp_instr.shtml


From shalabh.sharma7 at gmail.com  Mon Jul  2 13:09:57 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Jul 2012 13:09:57 -0400
Subject: [Bioperl-l] translation frame problem in bioperl
Message-ID: <CAA7rn9cnqjsMv-ggCn_H+WavqdtCZZfh37v8pkLak1-ct8UX+A@mail.gmail.com>

Hi All,
I am just confused about the translation frames. I used BioPerl to
parse a BLASTX report.
The report shows that the frame used is -2, but when I translate the sequence
using EMBOSS or some other program the frame is -1.
Am I doing something wrong here?

Here is the sequence:
>gi|378759230|gb|AHBJ01000169.1| SAR86 cluster bacterium SAR86D
scf1120176765857, whole genome shotgun sequence 2642:3697
AGCTTCCCATGGAACCCATGCAAGTGCAATATTTGTTTCTAGCTCTGGTGACCACCAAGGAGATGTCACGTAGCCCACCTCATCTTCATCAGTATTAGTTACTATCCAAAAATCAGAAGCATAATCTGTGATTTCTTTTCCTCCAAGGGTTAAACCAACCATCTTCATTTTAAATGGTGCATTTCCTTCATCTATGATTGCTCTCTGTTTTTCAAGCTCTTCTTTACCAATGTAATCAGCTGCTTTATTTCTTGGTACCTGATAACTTAAATTAACCTGAAAGGGAGAAGTTTCATGATCCAGATCTTGTCCCCAAGACAAAATTCCAGCTGCAATGCGACGATGATGCGCAGGAGCTATGACCATTAAGCCAAATTCTTCTCCAGCCTCAAGAACAGCATTCCACATTTTTTCTGCATTATCATGTGCGTCACGAACATATATTTCATAACCTTTTTCGCCTGTAAAACCAGTTTGACTGATTACACAATCAGCTCCACCAACCTGAGTTTCTAAAATTCCATAATAAGGAACTTCTCTTAACTCTTCGCCAGCTAACTTTGCCATAAGATCTTCAGATAAAGGGCCTTGAATTTGAACAGGACAAACATCAATCTCATCAATTTCTACGTCATATTTTTTAGACACATTTACGCCTTGAAGCCAAAGTAAGAGATCGCTGTCTGATATTGAGAACCAGAATTCATCTTCTGTTAGTCTTAATAGAACAGGGTCATTTAAAACCCCTCCTTTTTCATTGCATAAAATCGCATATTTACCATTTCCGGGTTTAATTTTTGTAGCATCACGAGTTATTACATAATCTGTAAAAGCTTCTGCATCTGGACCTTTTACTCTTATCTGTCTTTCAACAGCAACATTCCACATAGTAACTCTATTAACCAAGGCTTCGTATTCAACCATGGCACCGCCATCTTCAGGTTTTACATAGCCTCGTGGATGATAAATTCGATTATATACAGTTGCTCTCCAACAGCCCGCTTCATGAGATAGATGCCAAAAAGGCGATTTTCTTACCCGGGTTGAAATTAATAA

This is a part of blast report by bioperl:
>JCVI_READ_1105499496127 /Indian_Ocean/gcvT
          Length = 352

 Score =  655 bits (1690), Expect = 0.0
 Identities = 311/352 (88%), Positives = 329/352 (93%)
 Frame = -2

Query: 3697 LLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYVKPEDGGAMVEYEALVNRVTMWNV 3518
            +LISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGY+KPEDGGAMVEY+ALVNRVTMWNV
Sbjct: 1    MLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYIKPEDGGAMVEYDALVNRVTMWNV 60
 .....
.....
Query: 2797 GLTLGGKEITDYASDFWIVTNTDEDEVGYVTSPWWSPELETNIALAWVPWEA 2642
            GLTLGGKEITDYA DFW+V + D   +     PWWSPEL TNIAL WVPW A
Sbjct: 301  GLTLGGKEITDYAPDFWLVADMDGMMLDISLPPWWSPELNTNIALGWVPWSA 352

This is EMBOSS output (from EBI):

>EMBOSS_001_4
LLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYVKPEDGGAMVEYEALVNRVTMWNV
AVERQIRVKGPDAEAFTDYVITRDATKIKPGNGKYAILCNEKGGVLNDPVLLRLTEDEFW
FSISDSDLLLWLQGVNVSKKYDVEIDEIDVCPVQIQGPLSEDLMAKLAGEELREVPYYGI
LETQVGGADCVISQTGFTGEKGYEIYVRDAHDNAEKMWNAVLEAGEEFGLMVIAPAHHRR
IAAGILSWGQDLDHETSPFQVNLSYQVPRNKAADYIGKEELEKQRAIIDEGNAPFKMKMV
GLTLGGKEITDYASDFWIVTNTDEDEVGYVTSPWWSPELETNIALAWVPWEA
>EMBOSS_001_5
INFNPGKKIAFLASIS*SGLLESNCI*SNLSSTRLCKT*RWRCHG*IRSLG**SYYVECC
C*KTDKSKRSRCRSFYRLCNNS*CYKN*TRKW*ICDFMQ*KRRGFK*PCSIKTNRR*ILV

......

As you can see, it's frame -1.

I would really appreciate your help.


Thanks

Shalabh

-- 
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636


From scott at scottcain.net  Mon Jul  2 14:50:45 2012
From: scott at scottcain.net (Scott Cain)
Date: Mon, 2 Jul 2012 14:50:45 -0400
Subject: [Bioperl-l] GMOD Summer School application deadline
Message-ID: <CA+JTaoym218vQ9_VvtwvMW9ZaoHfcu1hxG59L2z=ovE6qWN6Mw@mail.gmail.com>

Hello,

The deadline to apply for the GMOD Summer School is in one week, July
9th.  The application is available as a Google Form:

  https://docs.google.com/spreadsheet/embeddedform?formkey=dG5hNGFiQ3UwYTV2LUZxZW04Qm1yZXc6MQ

In the GMOD Summer School (August 24-29, 2012) we will cover the
installation, configuration and use of a variety of GMOD tools,
including Chado, GBrowse, JBrowse and Tripal.  For more information on
the course, see the course web page at

  http://gmod.org/wiki/2012_GMOD_Summer_School

The course will make heavy use of the Amazon Web Service (aka, the
Cloud) via a grant from Amazon.  Enrollment is limited to 24 students,
and the application process is competitive: in the last few years we've
received over 75 applications for those 24 spots.

I look forward to seeing you in North Carolina in August!

Scott


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From p.j.a.cock at googlemail.com  Mon Jul  2 15:34:40 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 2 Jul 2012 20:34:40 +0100
Subject: [Bioperl-l] translation frame problem in bioperl
In-Reply-To: <CAA7rn9cnqjsMv-ggCn_H+WavqdtCZZfh37v8pkLak1-ct8UX+A@mail.gmail.com>
References: <CAA7rn9cnqjsMv-ggCn_H+WavqdtCZZfh37v8pkLak1-ct8UX+A@mail.gmail.com>
Message-ID: <CAKVJ-_5nX7f2ZejEpeVnMhd35PYxxrJX2mxu4A712NPOxvie0g@mail.gmail.com>

On Mon, Jul 2, 2012 at 6:09 PM, shalabh sharma
<shalabh.sharma7 at gmail.com> wrote:
> Hi All,
>          I am just confused about the translation frames. I used bioperl to
> parse a blastx report.
> Reports shows that the frame used is -2 but when i translate the sequence
> using EMBOSS or Some other program the frame is -1.
> Am i doing something wrong here.

Possibly there are conflicting definitions of frames -1, -2, and -3 here
(and that's leaving out the possibility of -0, -1 and -2 counting). Some
will count from the first base (start for forward strand), others the last
base (start of reverse strand). This can make comparing the output
of different tools quite confusing.

Peter
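To make the "first base vs. last base" point concrete, here is a minimal Python sketch that compares two ways of numbering reverse-strand frames: counting from the last base of the forward strand (the start of the reverse strand), and letting frame -k reuse the codon boundaries of forward frame +k, which anchors the count at the first base. The helper names are hypothetical and only illustrate the offset arithmetic. As far as I know, BLAST counts reverse frames from the last base, while EMBOSS transeq by default reuses the forward codons (its -alternative option switches this), but check each tool's documentation before relying on that.

```python
"""Compare two conventions for numbering reverse-strand reading frames.

Hypothetical helper names; this only illustrates the offset arithmetic.
  * "last base" convention: frame -k starts translating at offset k-1 of
    the reverse complement (counting from the start of the reverse strand).
  * "first base" convention: frame -k uses the same codons as forward
    frame +k, so codon boundaries are anchored at the first forward base.
"""

VALID_K = (1, 2, 3)


def rc_offset_last_base(k: int) -> int:
    """0-based start offset on the reverse complement for frame -k
    under the 'last base' convention."""
    if k not in VALID_K:
        raise ValueError("k must be 1, 2 or 3")
    return k - 1


def rc_offset_first_base(k: int, seq_len: int) -> int:
    """0-based start offset on the reverse complement for frame -k
    under the 'first base' convention.

    A forward codon starting at 0-based position p covers p..p+2, so on the
    reverse complement it starts at seq_len - 3 - p. Forward frame +k has
    p = k - 1 (mod 3), giving (seq_len - k - 2) mod 3, i.e.
    (seq_len - k + 1) mod 3.
    """
    if k not in VALID_K:
        raise ValueError("k must be 1, 2 or 3")
    return (seq_len - k + 1) % 3


def first_to_last_base_frame(k: int, seq_len: int) -> int:
    """Frame number under the 'last base' convention that covers the same
    codons as frame -k under the 'first base' convention."""
    offset = rc_offset_first_base(k, seq_len)
    for j in VALID_K:
        if rc_offset_last_base(j) == offset:
            return -j
    raise AssertionError("unreachable: offsets are always 0, 1 or 2")


if __name__ == "__main__":
    # Show how the mapping shifts with the sequence length modulo 3.
    for seq_len in (9, 10, 11):
        mapping = {-k: first_to_last_base_frame(k, seq_len) for k in VALID_K}
        print(f"length {seq_len} (length mod 3 = {seq_len % 3}): {mapping}")
```

For a 10-base sequence, for instance, frame -1 under the first-base convention names the same codons as frame -2 under the last-base convention; the two conventions only agree on frame -1 when the sequence length is a multiple of 3.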


From shalabh.sharma7 at gmail.com  Mon Jul  2 16:39:29 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Jul 2012 16:39:29 -0400
Subject: [Bioperl-l] translation frame problem in bioperl
In-Reply-To: <98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net>
References: <CAA7rn9cnqjsMv-ggCn_H+WavqdtCZZfh37v8pkLak1-ct8UX+A@mail.gmail.com>
	<CAKVJ-_5nX7f2ZejEpeVnMhd35PYxxrJX2mxu4A712NPOxvie0g@mail.gmail.com>
	<98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net>
Message-ID: <CAA7rn9eiKh6EszMsV=Vk65rJPywOUJijP0Q=rYg_vt=xoK-0=Q@mail.gmail.com>

Hi Peter and Brian,
Thanks a lot for your replies.
I have already taken this into account.
If I parse the BLAST report from my previous example with BioPerl,
I get strand '-1' and frame '1', so in the usual signed notation that
is frame -2, because BioPerl counts frames from 0.

Also, forward-frame translation works fine with BioPerl.
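A minimal Python sketch of the conversion described above, assuming (as stated in this message) that BioPerl reports the strand as 1 or -1 and the frame as 0, 1 or 2 counted from 0, while the BLAST text report prints a signed frame in +1..+3 / -1..-3. The function names are hypothetical and are not part of BioPerl.

```python
from typing import Tuple


def to_signed_frame(strand: int, frame0: int) -> int:
    """Combine a strand (1 or -1) and a 0-based frame (0, 1 or 2) into a
    signed 1-based frame as printed in BLAST reports (+1..+3, -1..-3)."""
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    if frame0 not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    return strand * (frame0 + 1)


def from_signed_frame(signed_frame: int) -> Tuple[int, int]:
    """Inverse of to_signed_frame: split a signed frame into
    (strand, 0-based frame)."""
    if signed_frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("signed frame must be one of +/-1, +/-2, +/-3")
    strand = 1 if signed_frame > 0 else -1
    return strand, abs(signed_frame) - 1


if __name__ == "__main__":
    # The values quoted above: strand -1, frame 1.
    print(to_signed_frame(-1, 1))  # prints -2
```

Keeping strand and a 0-based frame separate, as BioPerl does, makes the frame usable as a plain offset; the signed form is only needed when comparing against the numbers printed in the BLAST report.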


Thanks
Shalabh
On Mon, Jul 2, 2012 at 4:24 PM, Brian Osborne <bosborne11 at verizon.net>wrote:

> Shalabh,
>
> Also take a look at this:
>
> http://www.bioperl.org/wiki/HOWTO:SearchIO#frame.28.29
>
> Brian O.
>
>
> On Jul 2, 2012, at 3:34 PM, Peter Cock wrote:
>
> > On Mon, Jul 2, 2012 at 6:09 PM, shalabh sharma
> > <shalabh.sharma7 at gmail.com> wrote:
> >> Hi All,
> >>          I am just confused about the translation frames. I used bioperl to
> >> parse a blastx report.
> >> Reports shows that the frame used is -2 but when i translate the sequence
> >> using EMBOSS or Some other program the frame is -1.
> >> Am i doing something wrong here.
> >
> > Possibly there are conflicting definitions of frames -1, -2, and -3 here
> > (and that's leaving out the possibility of -0, -1 and -2 counting). Some
> > will count from the first base (start for forward strand), others the last
> > base (start of reverse strand). This can make comparing the output
> > of different tools quite confusing.
> >
> > Peter
>
>


-- 
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636


From bosborne11 at verizon.net  Mon Jul  2 16:24:24 2012
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 02 Jul 2012 16:24:24 -0400
Subject: [Bioperl-l] translation frame problem in bioperl
In-Reply-To: <CAKVJ-_5nX7f2ZejEpeVnMhd35PYxxrJX2mxu4A712NPOxvie0g@mail.gmail.com>
References: <CAA7rn9cnqjsMv-ggCn_H+WavqdtCZZfh37v8pkLak1-ct8UX+A@mail.gmail.com>
	<CAKVJ-_5nX7f2ZejEpeVnMhd35PYxxrJX2mxu4A712NPOxvie0g@mail.gmail.com>
Message-ID: <98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net>

Shalabh,

Also take a look at this:

http://www.bioperl.org/wiki/HOWTO:SearchIO#frame.28.29

Brian O.


On Jul 2, 2012, at 3:34 PM, Peter Cock wrote:

> On Mon, Jul 2, 2012 at 6:09 PM, shalabh sharma
> <shalabh.sharma7 at gmail.com> wrote:
>> Hi All,
>>          I am just confused about the translation frames. I used bioperl to
>> parse a blastx report.
>> Reports shows that the frame used is -2 but when i translate the sequence
>> using EMBOSS or Some other program the frame is -1.
>> Am i doing something wrong here.
> 
> Possibly there are conflicting definitions of frames -1, -2, and -3 here
> (and that's leaving out the possibility of -0, -1 and -2 counting). Some
> will count from the first base (start for forward strand), others the last
> base (start of reverse strand). This can make comparing the output
> of different tools quite confusing.
> 
> Peter


From vebaev at gmail.com  Tue Jul  3 12:35:26 2012
From: vebaev at gmail.com (vebaev at gmail.com)
Date: Tue, 3 Jul 2012 09:35:26 -0700 (PDT)
Subject: [Bioperl-l] CFP - International Conference on Bioinformatics and
 Computational Biology - BIOCOMP BG 2012
Message-ID: <7b498b4c-2b2e-4e1f-871f-513203488bf1@googlegroups.com>

International Conference on Bioinformatics and Computational Biology - 
BIOCOMP BG 2012 
September 20-21, 2012, Varna, Bulgaria
 

Dear Colleague,
It is our pleasure to circulate the 2nd announcement of the International 
Conference on Bioinformatics and Computational Biology - BIOCOMP 2012 
(http://biocomp.bio.uni-plovdiv.bg/).
 
Keynote speakers
 
Prof. Dr. Klaas Vandepoele - Ghent University, Belgium
Dr. Andreas Gisel - Institute for Biomedical Technologies, Italy
Prof. Wojciech Karlowski - Institute of Molecular Biology and Biotechnology, Poland
Prof. Mario A. Fares - University of Dublin, Trinity College, Ireland
Dr. Andrey Kajava - CRBM - Macromolecular Biochemistry Research Center, France
Dr. Gaurav Sablok - Istituto Agrario San Michele (IASMA), Italy
 
Topics
 
Topics of interest include, but are not limited to:
 
High-performance bio-computing
High-throughput sequencing data analysis (NGS)
Bio-ontologies
Molecular evolution
Comparative genomics
Molecular modeling and simulation
Computational genetics 
Computational proteomics
Data mining and visualization
Software tools and applications
Gene expression analysis
Gene networks
Structural biology
Genome analysis
Databases
Systems biology
 
Special topic: bioinformatics and miRNAs
 
Recent achievements in these fields will be presented. The conference will 
include plenary and poster sessions. Participants' proposals will be taken
under advisement in compiling the program. 
 
Publications
 
All accepted abstracts will be published in the conference abstract book.
The 20 best abstracts will be peer-reviewed and published as full-text
manuscripts in a Special Issue of Springer and Elsevier journals:
Interdisciplinary Sciences: Computational Life Sciences (ISSN: 1867-1462).
Journal of Computational Science (ISSN: 1877-7503)


Venue
The conference venue is the 4-star all-inclusive Sunny Day Black Sea resort, Bulgaria.
 
Registration and abstract submission
 
All the actions related to the BIOCOMP 2012  (abstract submission, 
registration etc) may be completed via the Conference website at 
http://biocomp.bio.uni-plovdiv.bg/
 
Accommodation
 
IMPORTANT: Accommodation is included in the conference registration fee.
 
Important dates
 
Abstract Submission Deadline - 20 August 2012
Early Registration Fee Payment Deadline - 20 August 2012
Arrival, poster setup and registration - 19 September 2012
Plenary and poster sessions - 20-21 September 2012
 
 
You may find details of the Conference visiting the Conference website at 
http://biocomp.bio.uni-plovdiv.bg/
 
 
We look forward to seeing you in Bulgaria!


------------------------------------------------


Dr. Vesselin Baev
Research Assistant Professor

University of Plovdiv
Dept. Plant Phys. and Molecular Biology
Bioinformatics SMART Group

Tzar Assen 24,Plovdiv 4000, BULGARIA
Office:+359 32 261 (560); Mobile:+359 89 43 80 945
vebaev at gmail.com; baev at uni-plovdiv.bg; CV: http://plantgene.eu/


From tarakaramji at gmail.com  Tue Jul  3 15:33:43 2012
From: tarakaramji at gmail.com (Tarakaramji Moturu)
Date: Tue, 3 Jul 2012 19:33:43 +0000 (UTC)
Subject: [Bioperl-l] Invitation to connect on LinkedIn
Message-ID: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod>

LinkedIn
------------


I'd like to add you to my professional network on LinkedIn.

- Tarakaramji

Tarakaramji Moturu
Student at GITAM University
Vishakhapatnam Area, India

Confirm that you know Tarakaramji Moturu:
https://www.linkedin.com/e/1505z7-h47dlkop-69/isd/7726719493/9xC087NO/?hs=false&tok=2UuxBwCCkl7Rk1



From l.m.timmermans at students.uu.nl  Wed Jul  4 03:16:34 2012
From: l.m.timmermans at students.uu.nl (Leon Timmermans)
Date: Wed, 4 Jul 2012 10:16:34 +0300
Subject: [Bioperl-l] Invitation to connect on LinkedIn
In-Reply-To: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod>
References: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod>
Message-ID: <CAC1jpXBYn+3+TtN=n7NoRhnQ87-EuYRpsuBqpE5iHHHAntJ_ew@mail.gmail.com>

On Tue, Jul 3, 2012 at 10:33 PM, Tarakaramji Moturu
<tarakaramji at gmail.com> wrote:
> LinkedIn
> ------------
>
>
>
> I'd like to add you to my professional network on LinkedIn.
>
> - Tarakaramji

Sending messages like this directly to mailing lists is a rather bad
idea, if only because LinkedIn will think bioperl-l at bioperl.org is one
of the email addresses of whoever accepts the request (which is
relevant for retrieving a lost password, I think).

Leon


From ulrik.stervbo at gmail.com  Fri Jul  6 03:03:08 2012
From: ulrik.stervbo at gmail.com (Ulrik Stervbo)
Date: Fri, 6 Jul 2012 09:03:08 +0200
Subject: [Bioperl-l] problem in using protparam.pm module
In-Reply-To: <CACyyM1bpg+pOwCK26M2OgNaBD2VJsaXtc02WkGHrKg1QxVL0Tw@mail.gmail.com>
References: <CACyyM1Za0pW9R2GPxAFOATj4tPM24rbwLsUazGUFHNimZ2LO4g@mail.gmail.com>
	<CACyyM1ZUf2y7fO2Vkc5cSoqSgtW86GA+1yt_ctc9Y8P3Qbnt_A@mail.gmail.com>
	<9AA27ADA-FFE1-4735-BDE4-56C9B9A18009@illinois.edu>
	<CACyyM1bpg+pOwCK26M2OgNaBD2VJsaXtc02WkGHrKg1QxVL0Tw@mail.gmail.com>
Message-ID: <CAKVAULOTmTparqLEuREquzcrO6Hf+Z7MSGRAPBWMTqH0dJ7-xg@mail.gmail.com>

I had the same problem, and realized it is because I am behind a proxy.

This is what I changed in the Protparam module:
I changed the URL to 'http://web.expasy.org/cgi-bin/protparam/protparam',
as found previously.

I added:
$browser->proxy(['http'], 'http://[my proxy]/'); after the LWP user
agent is initialized.

The proxy settings are what made Perl choke. (If only one could make
Perl see the global proxy settings.)

Cheers,
Ulrik

2011/7/28 Shachi Gahoi <shachigahoimbi at gmail.com>:
> Please help me run Protparam using the BioPerl module.
>
> On Wed, Jul 27, 2011 at 11:05 AM, Chris Fields <cjfields at illinois.edu> wrote:
>
>> The web service appears to have changed, but it looks as if no tests were
>> written for this module that would have caught this.  We can
>> write some basic tests to check for simple functionality.
>>
>> chris
>>
>> On Jul 26, 2011, at 10:58 PM, Shachi Gahoi wrote:
>>
>> > Dear All,
>> >
>> > I am using the Protparam.pm module, but when I run this script it
>> > prints the following error message:
>> >
>> > "Can't call method "throw" without a package or object reference at
>> > /usr/share/perl5/Bio/Root/Root.pm line 368, <GEN1> line 1."
>> >
>> > Kindly help me to solve this problem.
>> >
>> >
>> > Script is here----
>> >
>> > ###################################################################################
>> > #!/usr/bin/perl
>> >
>> > use strict;
>> > use warnings;
>> > use Bio::SeqIO;
>> > use Bio::Tools::Protparam;
>> >
>> >
>> > my $seqfile = 'test1.fasta';
>> >
>> > my $seqio = Bio::SeqIO->new(-file => $seqfile, -format => 'Fasta');
>> >
>> >
>> > while (my $seq = $seqio->next_seq())
>> > {
>> >
>> >
>> >        my $pp = Bio::Tools::Protparam->new(-seq=>$seq->seq);
>> >
>> >        print
>> >        "ID : ", $seq->display_id,"\n",
>> >        "Amino acid number : ",$pp->amino_acid_number(),"\n",
>> >        "Number of negative amino acids : ",$pp->num_neg(),"\n",
>> >        "Number of positive amino acids : ",$pp->num_pos(),"\n",
>> >        "Molecular weight : ",$pp->molecular_weight(),"\n",
>> >        "Theoretical pI : ",$pp->theoretical_pI(),"\n",
>> >        "Total number of atoms : ", $pp->total_atoms(),"\n",
>> >        "Number of carbon atoms : ",$pp->num_carbon(),"\n",
>> >        "Number of hydrogen atoms : ",$pp->num_hydrogen(),"\n",
>> >        "Number of nitrogen atoms : ",$pp->num_nitro(),"\n",
>> >        "Number of oxygen atoms : ",$pp->num_oxygen(),"\n",
>> >        "Number of sulphur atoms : ",$pp->num_sulphur(),"\n",
>> >        "Half life : ", $pp->half_life(),"\n",
>> >        "Instability Index : ", $pp->instability_index(),"\n",
>> >        "Stability class : ", $pp->stability(),"\n",
>> >        "Aliphatic_index : ",$pp->aliphatic_index(),"\n",
>> >        "Gravy : ", $pp->gravy(),"\n",
>> >        "Composition of A : ", $pp->AA_comp('A'),"\n",
>> >        "Composition of R : ", $pp->AA_comp('R'),"\n",
>> >        "Composition of N : ", $pp->AA_comp('N'),"\n",
>> >        "Composition of D : ", $pp->AA_comp('D'),"\n",
>> >        "Composition of C : ", $pp->AA_comp('C'),"\n",
>> >        "Composition of Q : ", $pp->AA_comp('Q'),"\n",
>> >        "Composition of E : ", $pp->AA_comp('E'),"\n",
>> >        "Composition of G : ", $pp->AA_comp('G'),"\n",
>> >        "Composition of H : ", $pp->AA_comp('H'),"\n",
>> >        "Composition of I : ", $pp->AA_comp('I'),"\n",
>> >        "Composition of L : ", $pp->AA_comp('L'),"\n",
>> >        "Composition of K : ", $pp->AA_comp('K'),"\n",
>> >        "Composition of M : ", $pp->AA_comp('M'),"\n",
>> >        "Composition of F : ", $pp->AA_comp('F'),"\n",
>> >        "Composition of P : ", $pp->AA_comp('P'),"\n",
>> >        "Composition of S : ", $pp->AA_comp('S'),"\n",
>> >        "Composition of T : ", $pp->AA_comp('T'),"\n",
>> >        "Composition of W : ", $pp->AA_comp('W'),"\n",
>> >        "Composition of Y : ", $pp->AA_comp('Y'),"\n",
>> >        "Composition of V : ", $pp->AA_comp('V'),"\n",
>> >        "Composition of B : ", $pp->AA_comp('B'),"\n",
>> >        "Composition of Z : ", $pp->AA_comp('Z'),"\n",
>> >        "Composition of X : ", $pp->AA_comp('X'),"\n";
>> > }
>> >
>> > ###################################################################################
>> >
>> >
>> >
>> >
>> > --
>> > Regards,
>> > Shachi
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
>
> --
> Regards,
> Shachi


From jason.stajich at gmail.com  Fri Jul  6 13:49:46 2012
From: jason.stajich at gmail.com (Jason Stajich)
Date: Fri, 6 Jul 2012 10:49:46 -0700
Subject: [Bioperl-l] problem in using protparam.pm module
In-Reply-To: <CAKVAULOTmTparqLEuREquzcrO6Hf+Z7MSGRAPBWMTqH0dJ7-xg@mail.gmail.com>
References: <CACyyM1Za0pW9R2GPxAFOATj4tPM24rbwLsUazGUFHNimZ2LO4g@mail.gmail.com>
	<CACyyM1ZUf2y7fO2Vkc5cSoqSgtW86GA+1yt_ctc9Y8P3Qbnt_A@mail.gmail.com>
	<9AA27ADA-FFE1-4735-BDE4-56C9B9A18009@illinois.edu>
	<CACyyM1bpg+pOwCK26M2OgNaBD2VJsaXtc02WkGHrKg1QxVL0Tw@mail.gmail.com>
	<CAKVAULOTmTparqLEuREquzcrO6Hf+Z7MSGRAPBWMTqH0dJ7-xg@mail.gmail.com>
Message-ID: <8C9056B6-1DA4-4BE0-B008-429C2F6C05BE@gmail.com>

You might try the PERL_LWP_ENV_PROXY and HTTP_PROXY environment variables:
http://search.cpan.org/~gaas/libwww-perl-6.04/lib/LWP/UserAgent.pm#CONSTRUCTOR_METHODS
http://search.cpan.org/~gaas/libwww-perl-6.04/lib/LWP/UserAgent.pm#Proxy_attributes

I can't test it on my end, though, without a proxy service.
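A minimal sketch of the environment-variable route, assuming libwww-perl 6.x as in the linked documentation: with `env_proxy => 1`, LWP::UserAgent reads the standard `http_proxy`/`HTTP_PROXY` variables when it is constructed, and exporting `PERL_LWP_ENV_PROXY=1` is documented to have the same effect when `new()` is called without an `env_proxy` option. Whether that is enough for Bio::Tools::Protparam depends on how the module builds its own user agent, which is not confirmed here. The proxy URL `http://proxy.example.org:8080/` is a hypothetical placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical placeholder proxy. Normally you would export it in the shell,
# e.g. export http_proxy=http://proxy.example.org:8080/
# It is set here only so this sketch runs on its own.
$ENV{http_proxy} ||= 'http://proxy.example.org:8080/';

# env_proxy => 1 reads the *_proxy environment variables at construction.
# Exporting PERL_LWP_ENV_PROXY=1 has the same effect for code that calls
# LWP::UserAgent->new without an env_proxy option (e.g. inside a module).
my $ua = LWP::UserAgent->new(env_proxy => 1);

# proxy('http') returns the proxy currently configured for http:// URLs.
print 'http proxy: ', ($ua->proxy('http') // 'none'), "\n";

# Ulrik's explicit alternative, applied to an existing user agent object:
# $ua->proxy(['http'], 'http://proxy.example.org:8080/');
```

In practice you would usually drop the `$ENV{...}` line and set both variables in the shell before running the script, so no module source needs editing.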



Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From bubli_thakur at rediffmail.com  Sun Jul  1 10:59:29 2012
From: bubli_thakur at rediffmail.com (subarna thakur)
Date: Sun, 01 Jul 2012 14:59:29 -0000
Subject: [Bioperl-l] Ks saturation
Message-ID: <20120617031856.16345.qmail@f4mail-235-140.rediffmail.com>


Dear all,

I am trying to calculate dN/dS values for all orthologous gene pairs between a pair of genomes using the pairwise_kaks.pl script in BioPerl, which invokes the codeml program in runmode -2. When I analyze the results, some of the genes have anomalously high dS (Ks) values, some even exceeding 100, and as a result the average Ks for the whole genome shoots up. These genes are orthologs and share more than 50% sequence identity. Should I include these genes in the analysis or leave them out? If I leave them out, up to what Ks cutoff should I include genes in the analysis? In some papers, I have found that Ks values as high as 5.6 were considered. Is there a way to determine the cutoff value for Ks?

Subarna


From haywardjeremya at gmail.com  Fri Jul  6 13:56:12 2012
From: haywardjeremya at gmail.com (Jeremy Hayward)
Date: Fri, 6 Jul 2012 14:56:12 -0300
Subject: [Bioperl-l] Two 'host' tags?
Message-ID: <CAJv8UuyLgFfHe-Vfyv1p+90ypO+NFQ5E_br1YYyL=PKg++yyNA@mail.gmail.com>

Hi--  Clueless newbie here, for which apologies.

I've posted a description of my problem, inputs and outputs, at Gist
2816510; https://gist.github.com/2816510

Briefly, I'm trying to take a GenBank file (.gb) and create a FASTA
file with a specific identifier line for each sequence. Specifically,
I want the "host" tag as the identifier. With the help of the BioPerl
beginner readme and the HOWTOs (which are great!) I've worked out how
to loop through my sequences and get the 'host' tag for each one. For
some reason, I get two identifier lines for each sequence. I guess the
problem is in the 'for' loop--it's running the stuff below it twice,
once with the actual 'host' tag data and once with...nothing? Not
sure.

I think I could use s/// and a regex just to delete the
second identifier line, but that feels like I'm avoiding the problem
instead of fixing it. Any help appreciated!


Many thanks,

--Jeremy Hayward


From jason.stajich at gmail.com  Fri Jul  6 15:39:52 2012
From: jason.stajich at gmail.com (Jason Stajich)
Date: Fri, 6 Jul 2012 12:39:52 -0700
Subject: [Bioperl-l] Two 'host' tags?
In-Reply-To: <CAJv8UuyLgFfHe-Vfyv1p+90ypO+NFQ5E_br1YYyL=PKg++yyNA@mail.gmail.com>
References: <CAJv8UuyLgFfHe-Vfyv1p+90ypO+NFQ5E_br1YYyL=PKg++yyNA@mail.gmail.com>
Message-ID: <A572776F-CF80-4EB2-A853-B62EE5AF5266@gmail.com>

Hi Jeremy -

You are printing once for every feature in the loop (e.g. both the source and the misc_RNA features). You only want to loop through the features, pick out the one that is the source feature, and change or print the info when you see it. So you could have an if( $feature->primary_tag eq 'source') test in there, or something similar. Alternatively, I've left your code pretty much intact and just simplified it a bit.
You should also try to use Bio::SeqIO to write the output instead of printing it yourself.

I updated the code here to be simpler. Right now it warns you that you are printing IDs containing spaces (something you should think about when it comes to your output file, but I don't know your downstream plans). You could also put other info in the description field if you wanted to capture the accession number or the endophyte name too.

https://gist.github.com/3062285

Best,
Jason

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From bosborne11 at verizon.net  Fri Jul  6 15:51:11 2012
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 06 Jul 2012 15:51:11 -0400
Subject: [Bioperl-l] Two 'host' tags?
In-Reply-To: <CAJv8UuyLgFfHe-Vfyv1p+90ypO+NFQ5E_br1YYyL=PKg++yyNA@mail.gmail.com>
References: <CAJv8UuyLgFfHe-Vfyv1p+90ypO+NFQ5E_br1YYyL=PKg++yyNA@mail.gmail.com>
Message-ID: <456448FF-C413-42D1-833A-FAA74E4FEF9E@verizon.net>

Jeremy,

It looks like each of your individual sequences has 2 features, but you only care about the 'source' feature (if ($feat_object->primary_tag eq "source") ?).

Also, rather than printing out the sequence the way you're doing, try to build a sequence object for each input sequence and then write it to your FASTA file using write_seq(). You set the ID of the sequence object with display_name().
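A minimal sketch of the approach described in the replies: keep only the 'source' feature, use its /host value as the display ID, and let Bio::SeqIO write the FASTA record. The record below is built in memory as a hypothetical stand-in for one entry of the .gb file, and the helper name `host_as_id` is made up for illustration. Replacing spaces with underscores is one possible response to the warning about IDs with spaces, not the only one.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::SeqIO;

# Hypothetical helper: set the record's ID from the /host qualifier of its
# 'source' feature. Other features (misc_RNA, etc.) are skipped, so each
# record ends up with exactly one header line.
sub host_as_id {
    my ($seq) = @_;
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->primary_tag eq 'source' && $feat->has_tag('host');
        my ($host) = $feat->get_tag_values('host');
        $host =~ s/\s+/_/g;          # FASTA IDs conventionally end at the first space
        $seq->display_id($host);     # display_name() is an equivalent accessor
        return 1;
    }
    return 0;                        # no source/host found; ID left unchanged
}

# Hypothetical stand-in for one GenBank record with two features.
# For a real file:
#   my $in = Bio::SeqIO->new(-file => 'input.gb', -format => 'genbank');
#   while (my $seq = $in->next_seq) { host_as_id($seq); $out->write_seq($seq) }
my $seq = Bio::Seq->new(-display_id => 'record_1', -seq => 'ACGTACGTAC');
$seq->add_SeqFeature(
    Bio::SeqFeature::Generic->new(-start => 1, -end => 10,
        -primary_tag => 'source', -tag => { host => 'example host' }),
    Bio::SeqFeature::Generic->new(-start => 2, -end => 9,
        -primary_tag => 'misc_RNA'),
);

my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fasta');
host_as_id($seq);
$out->write_seq($seq);   # one FASTA record, formatted by SeqIO rather than print
```

Because the header is produced by `write_seq()`, a record can no longer get a second identifier line from a stray `print` inside the feature loop.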

Brian O.




From dejian.zhao at gmail.com  Wed Jul 11 13:31:37 2012
From: dejian.zhao at gmail.com (De-Jian Zhao)
Date: Thu, 12 Jul 2012 01:31:37 +0800
Subject: [Bioperl-l] Errors with Bio::Seq objects
Message-ID: <4FFDB879.1020906@gmail.com>

Hi,

I downloaded a nucleotide sequence from GenBank (file name: nt.gb) and
tested the Bio::SeqIO module as follows. The first 3 commands succeeded;
however, the last 2 failed.

I think $seqio->next_seq() produces a Bio::Seq object which contains the
sequence, features and annotation (according to the DESCRIPTION of
"perldoc Bio::Seq"), and thus calling the methods get_SeqFeatures() and
translate() on it should be valid. However, the results seemed to
contradict this.

Could anyone explain what happened with the last 2 commands? I have
run into numerous such failures when testing BioPerl methods.
I want to translate the mRNA sequence and extract the sequence features.
What are the right commands? Thanks a lot!

Best,
Dejian


PS: The commands and results

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
print $seqio->next_seq()->display_id(); ' nt.gb
NM_053056

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
print $seqio->next_seq()->molecule(); ' nt.gb
mRNA

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
print $seqio->next_seq()->subseq(1,6); ' nt.gb
CACACG

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb
Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18)

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
print $seqio->next_seq()->translate(); ' nt.gb
Bio::Seq::RichSeq=HASH(0x20a3e7b0)


From jimhu at tamu.edu  Wed Jul 11 14:01:27 2012
From: jimhu at tamu.edu (Jim Hu)
Date: Wed, 11 Jul 2012 13:01:27 -0500
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <4FFDB879.1020906@gmail.com>
References: <4FFDB879.1020906@gmail.com>
Message-ID: <9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu>

Hi Dejian,

On Jul 11, 2012, at 12:31 PM, De-Jian Zhao wrote:

> Hi,
> 
> I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and tested the Bio::SeqIO module as follows. The first 3 commands succeeded; however the last 2 failed.
> 
> I think $seqio->next_seq() produces a Bio::Seq object which contains the sequence, features and annotation (according to the DESCRIPTION of "perldoc Bio::Seq") and thus the invocation of the methods get_SeqFeatures() and translate() should be valid. However, the results denied this idea.

You're correct that Bio::Seq objects are returned. More precisely, it is probably a kind of Bio::Seq object: for example, SeqIO may return a Bio::Seq::RichSeq object, which inherits methods from Bio::Seq. However, as explained below, the methods are working as they should; they are just returning objects when you were expecting something else.

> 
> Will anyone explain what happened to the last 2 commands? I have encountered numerous cases of failures when testing the bioperl methods. I want to translate the mRNA sequence and extract the sequence features. What are the right commands? Thanks a lot!
> 
> Best,
> Dejian
> 
> 
> 
> PS: The commands and results
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb
> NM_053056
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb
> mRNA
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb
> CACACG
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb
> Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18)

http://doc.bioperl.org/bioperl-live/Bio/Seq.html#POD24

$seq_obj->get_SeqFeatures() returns an array of SeqFeature objects, which are references, and printing a reference just shows its class and memory address.  So this worked as expected.

I usually write these as script files, so I've never done it all with perl -e.  But you need to iterate over the array and query each object for the information you want about that feature.


> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb
> Bio::Seq::RichSeq=HASH(0x20a3e7b0)

->translate returns a new Seq object.  I think 

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate()->seq(); ' nt.gb

should work (haven't tried it).
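A minimal sketch of both points, built on an in-memory toy record rather than nt.gb (the sequence, coordinates and ID below are hypothetical). It also illustrates something worth checking for an mRNA record: translating the whole sequence starts at position 1, which for a RefSeq mRNA such as NM_053056 may mean starting in the 5' UTR, so translating the CDS feature is likely closer to what is wanted. Not tested against the BioPerl version discussed in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;

# Hypothetical toy mRNA: 3 nt of 5' UTR (GGC), then the CDS ATG GCC AAA TAA.
my $seq = Bio::Seq->new(-display_id => 'toy_mRNA',
                        -seq        => 'GGCATGGCCAAATAA',
                        -alphabet   => 'dna');
$seq->add_SeqFeature(
    Bio::SeqFeature::Generic->new(-start => 4, -end => 15, -strand => 1,
                                  -primary_tag => 'CDS'));

# get_SeqFeatures() returns a list of feature objects. Printing them directly
# only shows each object's class and address, so ask each one for details.
for my $feat ($seq->get_SeqFeatures) {
    printf "%s\t%d..%d\n", $feat->primary_tag, $feat->start, $feat->end;
}

# translate() returns a new sequence object; call seq() on it for the residues.
# On the whole mRNA this reads from position 1, i.e. starting in the UTR.
print 'whole mRNA: ', $seq->translate->seq, "\n";

# Translating just the CDS feature gives the encoded protein.
# spliced_seq() also joins the pieces of a split join(...) location.
my ($cds) = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;
my $protein = $cds->spliced_seq->translate;
print 'CDS protein: ', $protein->seq, "\n";
```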

Jim


=====================================
Jim Hu
Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054


From bosborne11 at verizon.net  Wed Jul 11 13:47:25 2012
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 11 Jul 2012 13:47:25 -0400
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <4FFDB879.1020906@gmail.com>
References: <4FFDB879.1020906@gmail.com>
Message-ID: <DBC94184-BB61-47EB-BC2A-D306E94BC25A@verizon.net>

Dejian,

These are not "failures". The get_SeqFeatures() method returns a list of feature objects (Bio::SeqFeature::Generic in your output), and translate() returns a new sequence object (Bio::Seq::RichSeq in your output).

Start here:

www.bioperl.org/wiki/HOWTO:Beginners

Brian O.




From cjfields at illinois.edu  Wed Jul 11 15:02:46 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 11 Jul 2012 19:02:46 +0000
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
Message-ID: <FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>

Peng, 

Has this been filed as a bug yet?  

    https://redmine.open-bio.org/projects/bioperl

Seems like it would be fairly easy to fix, but I want to track it just in case.

chris

On Jul 11, 2012, at 12:45 PM, Peng Zhou wrote:

> Hello guys,
> 
> Just a follow-up: it seems to me that the bioperl-live version still has the same problem, calling the hit "query" and the query sequence "hit". I also looked at the test script written for hmmer3 (bioperl-live/t/SearchIO/hmmer.t), and it doesn't cover the alignment part, which I guess is why this bug was not discovered.
> 
> For simplicity, here is some output from hmmsearch v3.0:
> # hmmsearch :: search profile(s) against a sequence database
> # HMMER 3.0 (March 2010); http://hmmer.org/
> # Copyright (C) 2010 Howard Hughes Medical Institute.
> # Freely distributed under the GNU General Public License (GPLv3).
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> # query HMM file:                  /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm
> # target sequence database:        /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa
> # output directed to file:         /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt
> # number of worker threads:        4
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> 
> Query:       CRP0000  [M=75]
> Scores for complete sequences (score includes all domains):
>    --- full sequence ---   --- best 1 domain ---    -#dom-
>     E-value  score  bias    E-value  score  bias    exp  N  Sequence                 Description
>     ------- ------ -----    ------- ------ -----   ---- --  --------                 -----------
>     5.5e-25   95.0  14.4    5.7e-25   95.0  10.0    1.0  1  Chr2_540228_540404_+     
> 
> Domain annotation for each sequence (and alignments):
> >> Chr2_540228_540404_+  
>    #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
>  ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
>    1 !   95.0  10.0   3.6e-30   5.7e-25      20      74 ..       4      59 .]       1      59 [] 0.95
> 
>   Alignments for each domain:
>   == domain 1    score: 95.0 bits;  conditional E-value: 3.6e-30
>                CRP0000 20 tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74
>                           ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c
>   Chr2_540228_540404_+  4 GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59
>                           568899***99********************************************* PP
> 
> And here is a dump of the parsed HSP object:
> $VAR1 = bless( {
>                  'VERBOSE' => 0,
>                  'IDENTICAL' => 0,
>                  'RANK' => 1,
>                  'STRANDED' => 'NONE',
>                  'EVALUE' => '3.6e-30',
>                  'HSP_LENGTH' => 56,
>                  'ALGORITHM' => 'HMMSEARCH',
>                  'SCORE' => '95.0',
>                  'GAP_SYMBOL' => '-',
>                  'CONSERVED' => 0,
>                  
>                  'HIT_NAME' => 'Chr2_540228_540404_+',
>                  'HIT_DESC' => '',
>                  'HIT_START' => '20',
>                  'HIT_END' => '74',
>                  'HIT_LENGTH' => 56,
>                  'HIT_SEQ' => 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc',
>                  'HIT_FRAME' => 0,
>                  
>                  'QUERY_NAME' => 'CRP0000',
>                  'QUERY_DESC' => undef,
>                  'QUERY_START' => '4',
>                  'QUERY_END' => '59',
>                  'QUERY_LENGTH' => '75',
>                  'QUERY_FRAME' => 0,
>                  'QUERY_SEQ' => 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC',
>                  
>                  'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c',
>                }, 'Bio::Search::HSP::HMMERHSP' );
> 
> Clearly, the "HIT_START", "HIT_END", "HIT_SEQ" should actually be exchanged with "QUERY_START", "QUERY_END" and "QUERY_SEQ" values.
> 
> Thanks,
> 
> Peng,
> 
> On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote:
> I'll try the bioperl-live version. Thanks guys.
> Scott Givan
> 541-740-4685
> Sent from an iPhone (so expect typos).
> 
> On Jul 19, 2011, at 10:34 PM, "Chris Fields" <cjfields at illinois.edu> wrote:
> 
> > This might be a disconnect between the HMMER3 version in bioperl-live and the one in Kai's bioperl-hmmer3 repo.  I believe the one in bioperl-live is newer.  Scott, can you give that a try?
> > 
> > chris
> > 
> > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote:
> > 
> >> Hi Scott,
> >> 
> >> Thanks for writing. I'm on the road at the moment so I have to be briefer and less thorough than I'd like to be.
> >> 
> >> What you are observing is not the intended behavior. Oddly, it's not what I recall obtaining in my tests on this software, though I was mostly interested in hmmsearch at the time and may have been sloppier than I should have been when it came to hmmscan.
> >> 
> >> What version of HMMER3 are you using? There have been some small formatting changes in the past that might be causing a burp in the parser, though I'm doubting it.
> >> 
> >> Kai Blin wrote some test scripts (found here: bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate query/hit coordinates. It might be worth giving this a shot if you haven't already.
> >> 
> >> Also, if you don't mind, I'm happy to run your code on your output file on my end. It might help me diagnose the problem.
> >> 
> >> Sorry this is being a thorn in your side! I've cc'ed the list in case anyone else has insight into this matter.
> >> 
> >> Best,
> >> Thomas
> >> 
> >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote:
> >> 
> >>> Hi Thomas,
> >>> 
> >>> I'm using modules in the bioperl-hmmer3 git repository to parse hmmscan
> >>> reports. When I parse the files and walk through the HSPs like:
> >>> 
> >>> while (my $hit = $rslt->next_model) {
> >>> 
> >>>    while (my $domain = $hit->next_hsp) {
> >>> 
> >>> And retrieve the "hit" coordinates like:
> >>> 
> >>>        print "hit coords: ", $domain->start('hit'), "-", $domain->end('hit'),
> >>> "\n";
> >>> 
> >>> The coordinates returned correspond to what I would call the "query",
> >>> since they are for the sequence I fed to hmmscan to search the profile
> >>> database. Likewise, when retrieving the query coordinates like
> >>> $domain->start('query'), I get what I consider the "hit" coordinates,
> >>> since they are for the domain profile. Is this the intended behavior?
> >>> 
> >>> Thanks.
> >>> 
> >>> scott
> >>> 
> >>> -- 
> >>> Scott A. Givan
> >>> Associate Director
> >>> Informatics Research Core Facility
> >>> 240e Bond Life Sciences Center
> >>> Research Assistant Professor
> >>> Molecular Microbiology and Immunology
> >>> University of Missouri, Columbia
> >>> 
> >>> TEL 573-882-2948
> >>> FAX 573-884-9676
> >>> http://ircf.rnet.missouri.edu
> >>> 
> >>> 
> >>> 
> >> 
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >


From p.j.a.cock at googlemail.com  Wed Jul 11 17:00:56 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 11 Jul 2012 22:00:56 +0100
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
	<FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
Message-ID: <CAKVJ-_4mo16NxEbZkNp3-EG9uknc4Fo4m6ajQWTknumQNUn4Yw@mail.gmail.com>

On Wed, Jul 11, 2012 at 8:02 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> Peng,
>
> Has this been filed as a bug yet?
>
>     https://redmine.open-bio.org/projects/bioperl
>
> Seems like it would be fairly easy to fix, but I want to track it just in case.
>
> chris

Hi all,

This could be the unfortunate fact that hmmscan and
hmmsearch return very similar tabular output, but
with query and hit interchanged, i.e. you need some
extra information to know which way round they are
(not possible with the current output). This was an
issue in Bow's Biopython SearchIO project - which
for the moment he solved by handling this as two
hmmer file formats. In the medium term we're hoping
hmmer3 will add some header information or something.
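As a minimal sketch of the "extra information" point, assuming the standard (non-tabular) HMMER3 text report, whose first line names the program as in the hmmsearch report quoted in this thread (`# hmmsearch :: ...`), and assuming hmmscan reports begin analogously with `# hmmscan :: ...`. The helper `hmmer_orientation` is hypothetical, not a BioPerl or Biopython API, and it cannot help with the tabular output, which per the above lacks this information.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): inspect the first line of a
# standard HMMER3 text report and say which side is the profile HMM.
sub hmmer_orientation {
    my ($first_line) = @_;
    if ($first_line =~ /^#\s+(hmmsearch|hmmscan)\s+::/) {
        my $program = $1;
        # hmmsearch: query = profile HMM, hits = sequences
        # hmmscan:   query = sequence,    hits = profile HMMs
        my $hmm_side = $program eq 'hmmsearch' ? 'query' : 'hit';
        return ($program, $hmm_side);
    }
    return;    # unrecognised first line: orientation cannot be inferred
}

my ($program, $hmm_side) =
    hmmer_orientation('# hmmsearch :: search profile(s) against a sequence database');
print "$program: the profile HMM is the $hmm_side\n";
```

Keying the decision on the program name, rather than on the shape of the rows, mirrors the idea of treating hmmscan and hmmsearch output as two distinct formats.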

Peter


From zhoupenggeni at gmail.com  Wed Jul 11 13:45:00 2012
From: zhoupenggeni at gmail.com (Peng Zhou)
Date: Wed, 11 Jul 2012 10:45:00 -0700 (PDT)
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
Message-ID: <a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>

Hello guys,

Just a follow-up: it seems to me the bioperl-live version still has the
same problem, calling the hit "query" and the query "hit". I also looked
into the test script written for hmmer3 (bioperl-live/t/SearchIO/hmmer.t),
and it doesn't check the alignment part; I guess that's why this bug was
not discovered.

To keep it simple, here's an output of hmmsearch v3.0:
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.0 (March 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm
# target sequence database:        /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa
# output directed to file:         /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt
# number of worker threads:        4
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       CRP0000  [M=75]
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                 Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                 -----------
    5.5e-25   95.0  14.4    5.7e-25   95.0  10.0    1.0  1  Chr2_540228_540404_+

Domain annotation for each sequence (and alignments):
>> Chr2_540228_540404_+
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !   95.0  10.0   3.6e-30   5.7e-25      20      74 ..       4      59 .]       1      59 [] 0.95

  Alignments for each domain:
  == domain 1    score: 95.0 bits;  conditional E-value: 3.6e-30
               CRP0000 20 tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74
                          ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c
  Chr2_540228_540404_+  4 GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59
                          568899***99********************************************* PP
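For reference, in the per-domain table above `hmmfrom`/`hmm to` are positions on the profile HMM (CRP0000) and `alifrom`/`ali to` are positions on the target sequence. A minimal sketch that splits that row into named fields, assuming the 16 whitespace-separated columns of the HMMER 3.0 layout shown here (the helper `parse_domain_row` and its field names are hypothetical, not BioPerl's parser):

```perl
use strict;
use warnings;

# Field names follow the domain-table header printed above. The "!"/"?"
# significance flag and the ".."/".]"/"[]" end markers each occupy
# their own whitespace-separated column.
my @columns = qw(
    num flag score bias c_evalue i_evalue
    hmm_from hmm_to hmm_ends ali_from ali_to ali_ends
    env_from env_to env_ends acc
);

# Hypothetical helper: parse one domain row into a hash reference.
sub parse_domain_row {
    my ($line) = @_;
    $line =~ s/^\s+//;                  # drop leading indentation
    my @fields = split /\s+/, $line;
    return unless @fields == @columns;  # not a well-formed domain row
    my %row;
    @row{@columns} = @fields;
    return \%row;
}

my $row = parse_domain_row(
    '   1 !   95.0  10.0   3.6e-30   5.7e-25      20      74 ..       4      59 .]       1      59 [] 0.95'
);
print "HMM coords: $row->{hmm_from}-$row->{hmm_to}\n";        # profile CRP0000
print "sequence coords: $row->{ali_from}-$row->{ali_to}\n";   # Chr2_540228_540404_+
```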

And here is a dump of the parsed HSP object:
$VAR1 = bless( {
                 'VERBOSE' => 0,
                 'IDENTICAL' => 0,
                 'RANK' => 1,
                 'STRANDED' => 'NONE',
                 'EVALUE' => '3.6e-30',
                 'HSP_LENGTH' => 56,
                 'ALGORITHM' => 'HMMSEARCH',
                 'SCORE' => '95.0',
                 'GAP_SYMBOL' => '-',
                 'CONSERVED' => 0,

                 'HIT_NAME' => 'Chr2_540228_540404_+',
                 'HIT_DESC' => '',
                 'HIT_START' => '20',
                 'HIT_END' => '74',
                 'HIT_LENGTH' => 56,
                 'HIT_SEQ' => 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc',
                 'HIT_FRAME' => 0,

                 'QUERY_NAME' => 'CRP0000',
                 'QUERY_DESC' => undef,
                 'QUERY_START' => '4',
                 'QUERY_END' => '59',
                 'QUERY_LENGTH' => '75',
                 'QUERY_FRAME' => 0,
                 'QUERY_SEQ' => 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC',

                 'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c',
               }, 'Bio::Search::HSP::HMMERHSP' );

Clearly, the "HIT_START", "HIT_END", "HIT_SEQ" should actually be exchanged 
with "QUERY_START", "QUERY_END" and "QUERY_SEQ" values.
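Concretely, which coordinate pair belongs to the query depends on the program: in hmmsearch the query is the profile HMM and the hit is a sequence; in hmmscan it is the other way round. A minimal sketch of the expected assignment, assuming the hmm/ali coordinates have already been read from the domain table (the helper `assign_coords` and its keys are hypothetical and do not reflect Bio::Search::HSP::HMMERHSP internals):

```perl
use strict;
use warnings;

# Hypothetical helper: assign HMMER3 domain-table coordinates to the
# query and hit sides. hmmsearch: query = profile HMM, hit = sequence.
# hmmscan: query = sequence, hit = profile HMM.
sub assign_coords {
    my ($program, $d) = @_;
    my %hmm = (start => $d->{hmm_from}, end => $d->{hmm_to});
    my %ali = (start => $d->{ali_from}, end => $d->{ali_to});
    if ($program eq 'hmmsearch') {
        return { query => \%hmm, hit => \%ali };
    }
    elsif ($program eq 'hmmscan') {
        return { query => \%ali, hit => \%hmm };
    }
    die "unsupported program: $program\n";
}

# Values from the hmmsearch domain row quoted above.
my $domain = { hmm_from => 20, hmm_to => 74, ali_from => 4, ali_to => 59 };
my $c = assign_coords('hmmsearch', $domain);
printf "QUERY %d-%d, HIT %d-%d\n",
    $c->{query}{start}, $c->{query}{end}, $c->{hit}{start}, $c->{hit}{end};
```

For the hmmsearch report above this gives 20-74 for the query (CRP0000) and 4-59 for the hit (Chr2_540228_540404_+), which is the reverse of what the dumped HSP contains.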

Thanks,

Peng,

On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote:
>
> I'll try the bioperl-live version. Thanks guys. 
>
> Scott Givan
> 541-740-4685
> Sent from an iPhone (so expect typos). 
>
> On Jul 19, 2011, at 10:34 PM, "Chris Fields" <cjfields at illinois.edu> wrote:
>
> > This might be a disconnect between the HMMER3 version in bioperl-live and the one in Kai's bioperl-hmmer3 repo.  I believe the one in bioperl-live is newer.  Scott, can you give that a try?
> > 
> > chris
> > 
> > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote:
> > 
> >> Hi Scott,
> >> 
> >> Thanks for writing. I'm on the road at the moment so I have to be briefer and less thorough than I'd like to be.
> >> 
> >> What you are observing is not the intended behavior. Oddly, it's not what I recall obtaining in my tests on this software, though I was mostly interested in hmmsearch at the time and may have been sloppier than I should have been when it came to hmmscan.
> >> 
> >> What version of HMMER3 are you using? There have been some small formatting changes in the past that might be causing a burp in the parser, though I'm doubting it.
> >> 
> >> Kai Blin wrote some test scripts (found here: bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate query/hit coordinates. It might be worth giving this a shot if you haven't already.
> >> 
> >> Also, if you don't mind, I'm happy to run your code on your output file on my end. It might help me diagnose the problem.
> >> 
> >> Sorry this is being a thorn in your side! I've cc'ed the list in case anyone else has insight into this matter.
> >> 
> >> Best,
> >> Thomas
> >> 
> >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote:
> >> 
> >>> Hi Thomas,
> >>> 
> >>> I'm using modules in the bioperl-hmmer3 git repository to parse hmmscan
> >>> reports. When I parse the files and walk through the HSPs like:
> >>> 
> >>> while (my $hit = $rslt->next_model) {
> >>> 
> >>>    while (my $domain = $hit->next_hsp) {
> >>> 
> >>> And retrieve the "hit" coordinates like:
> >>> 
> >>>        print "hit coords: ", $domain->start('hit'), "-", $domain->end('hit'),
> >>> "\n";
> >>> 
> >>> The coordinates returned correspond to what I would call the "query",
> >>> since they are for the sequence I fed to hmmscan to search the profile
> >>> database. Likewise, when retrieving the query coordinates like
> >>> $domain->start('query'), I get what I consider the "hit" coordinates,
> >>> since they are for the domain profile. Is this the intended behavior?
> >>> 
> >>> Thanks.
> >>> 
> >>> scott
> >>> 
> >>> -- 
> >>> Scott A. Givan
> >>> Associate Director
> >>> Informatics Research Core Facility
> >>> 240e Bond Life Sciences Center
> >>> Research Assistant Professor
> >>> Molecular Microbiology and Immunology
> >>> University of Missouri, Columbia
> >>> 
> >>> TEL 573-882-2948
> >>> FAX 573-884-9676
> >>> http://ircf.rnet.missouri.edu
> >>> 
> >>> 
> >>> 


From zhoupenggeni at gmail.com  Wed Jul 11 14:03:17 2012
From: zhoupenggeni at gmail.com (Peng Zhou)
Date: Wed, 11 Jul 2012 11:03:17 -0700 (PDT)
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <4FFDB879.1020906@gmail.com>
References: <4FFDB879.1020906@gmail.com>
Message-ID: <25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com>

Hi, 

I guess that's what the commands are supposed to do: the get_SeqFeatures()
method returns a list of Bio::SeqFeature objects, and the translate()
method returns a Bio::Seq object. You can't simply "print" an object in
Perl, but you can "dump" it:

$ perl -e ' use Bio::SeqIO; use Data::Dumper; my $seqio=Bio::SeqIO->new(-file=>shift); print Dumper($seqio->next_seq()->get_SeqFeatures()); ' nt.gb

$ perl -e ' use Bio::SeqIO; use Data::Dumper; my $seqio=Bio::SeqIO->new(-file=>shift); print Dumper($seqio->next_seq()->translate()); ' nt.gb
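Why the raw HASH(0x...) strings appear: a Perl object is a blessed reference, and printing one stringifies the reference as Class=HASH(0xADDRESS) unless the class overloads stringification. To get readable data you call an accessor that returns a plain string. A self-contained sketch with a hypothetical Toy::Seq class standing in for Bio::Seq (it is not BioPerl code):

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence class such as Bio::Seq.
package Toy::Seq;

sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{-seq} }, $class;
}

# Accessor returning the plain sequence string.
sub seq { return $_[0]->{seq} }

package main;

my $obj = Toy::Seq->new(-seq => 'MKV');

# Printing the object itself shows only its class, type and address,
# e.g. "Toy::Seq=HASH(0x55d0c8a1e2f8)" (the address varies per run).
print "$obj\n";

# Calling the accessor prints the data you actually want.
print $obj->seq, "\n";    # prints "MKV"
```

With BioPerl the same idea applies: call seq() on the object returned by translate(), or primary_tag(), start() and end() on each feature returned by get_SeqFeatures().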

On Wednesday, July 11, 2012 12:31:37 PM UTC-5, De-Jian Zhao wrote:
>
> Hi, 
>
> I downloaded a nucleotide sequence from GenBank (file name: nt.gb) and 
> tested the Bio::SeqIO module as follows. The first 3 commands succeeded; 
> however the last 2 failed. 
>
> I think $seqio->next_seq() produces a Bio::Seq object which contains the 
> sequence, features and annotation (according to the DESCRIPTION of 
> "perldoc Bio::Seq") and thus the invocation of the methods 
> get_SeqFeatures() and translate() should be valid. However, the results 
> contradicted this. 
>
> Will anyone explain what happened to the last 2 commands? I have 
> encountered numerous cases of failures when testing the bioperl methods. 
> I want to translate the mRNA sequence and extract the sequence features. 
> What are the right commands? Thanks a lot! 
>
> Best, 
> Dejian 
>
>
>
> PS: The commands and results 
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
> print $seqio->next_seq()->display_id(); ' nt.gb 
> NM_053056 
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
> print $seqio->next_seq()->molecule(); ' nt.gb 
> mRNA 
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
> print $seqio->next_seq()->subseq(1,6); ' nt.gb 
> CACACG 
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
> print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb 
> Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) 
>
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
> print $seqio->next_seq()->translate(); ' nt.gb 
> Bio::Seq::RichSeq=HASH(0x20a3e7b0) 


From zhoupenggeni at gmail.com  Wed Jul 11 16:05:56 2012
From: zhoupenggeni at gmail.com (Peng Zhou)
Date: Wed, 11 Jul 2012 13:05:56 -0700 (PDT)
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
	<FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
Message-ID: <ee477075-8d1a-4172-a4f2-7c51f45d9224@googlegroups.com>

Thanks Chris, here is the link to the filed bug:
https://redmine.open-bio.org/issues/3369

On Wednesday, July 11, 2012 2:02:46 PM UTC-5, Christopher Fields wrote:
>
> Peng, 
>
> Has this been filed as a bug yet?   
>
>     https://redmine.open-bio.org/projects/bioperl 
>
> Seems like it would be fairly easy to fix, but I want to track it just in 
> case. 
>
> chris 
>
> On Jul 11, 2012, at 12:45 PM, Peng Zhou wrote: 
>
> > Hello guys, 
> > 
> > Just a follow-up, it seems to me the bioperl-live version is still 
> having the same problem - calling hit "query" while query sequence "hit". I 
> also looked into the test script written for hmmer3 
> (bioperl-live/t/SearchIO/hmmer.t), and it doesn't deal with the alignment 
> part - I guess that's why this bug was not discovered. 
> > 
> > To be simple, here's an output of hmmsearch v3.0: 
> > # hmmsearch :: search profile(s) against a sequence database
> > # HMMER 3.0 (March 2010); http://hmmer.org/
> > # Copyright (C) 2010 Howard Hughes Medical Institute.
> > # Freely distributed under the GNU General Public License (GPLv3).
> > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> > # query HMM file:                  /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm
> > # target sequence database:        /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa
> > # output directed to file:         /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt
> > # number of worker threads:        4
> > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> > 
> > Query:       CRP0000  [M=75]
> > Scores for complete sequences (score includes all domains):
> >    --- full sequence ---   --- best 1 domain ---    -#dom-
> >     E-value  score  bias    E-value  score  bias    exp  N  Sequence             Description
> >     ------- ------ -----    ------- ------ -----   ---- --  --------             -----------
> >     5.5e-25   95.0  14.4    5.7e-25   95.0  10.0    1.0  1  Chr2_540228_540404_+
> > 
> > Domain annotation for each sequence (and alignments):
> > >> Chr2_540228_540404_+
> >    #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
> >  ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
> >    1 !   95.0  10.0   3.6e-30   5.7e-25      20      74 ..       4      59 .]       1      59 [] 0.95
> > 
> >   Alignments for each domain:
> >   == domain 1    score: 95.0 bits;  conditional E-value: 3.6e-30
> >                CRP0000 20 tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74
> >                           ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c
> >   Chr2_540228_540404_+  4 GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59
> >                           568899***99********************************************* PP
> > 
> > And here is a dump of the parsed HSP object: 
> > $VAR1 = bless( {
> >                  'VERBOSE' => 0,
> >                  'IDENTICAL' => 0,
> >                  'RANK' => 1,
> >                  'STRANDED' => 'NONE',
> >                  'EVALUE' => '3.6e-30',
> >                  'HSP_LENGTH' => 56,
> >                  'ALGORITHM' => 'HMMSEARCH',
> >                  'SCORE' => '95.0',
> >                  'GAP_SYMBOL' => '-',
> >                  'CONSERVED' => 0,
> >
> >                  'HIT_NAME' => 'Chr2_540228_540404_+',
> >                  'HIT_DESC' => '',
> >                  'HIT_START' => '20',
> >                  'HIT_END' => '74',
> >                  'HIT_LENGTH' => 56,
> >                  'HIT_SEQ' => 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc',
> >                  'HIT_FRAME' => 0,
> >
> >                  'QUERY_NAME' => 'CRP0000',
> >                  'QUERY_DESC' => undef,
> >                  'QUERY_START' => '4',
> >                  'QUERY_END' => '59',
> >                  'QUERY_LENGTH' => '75',
> >                  'QUERY_FRAME' => 0,
> >                  'QUERY_SEQ' => 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC',
> >
> >                  'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c',
> >                }, 'Bio::Search::HSP::HMMERHSP' );
> > 
> > Clearly, the "HIT_START", "HIT_END" and "HIT_SEQ" values should be
> > exchanged with the "QUERY_START", "QUERY_END" and "QUERY_SEQ" values.
> > 
> > Thanks, 
> > 
> > Peng
> > 
> > On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote: 
> > I'll try the bioperl-live version. Thanks guys. 
> > Scott Givan 
> > 541-740-4685 
> > Sent from an iPhone (so expect typos). 
> > 
> > On Jul 19, 2011, at 10:34 PM, "Chris Fields" <cjfields at illinois.edu> wrote:
> > 
> > > This might be a disconnect between the HMMER3 version in bioperl-live
> > > and the one in Kai's bioperl-hmmer3 repo.  I believe the one in
> > > bioperl-live is newer.  Scott, can you give that a try?
> > > 
> > > chris 
> > > 
> > > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote: 
> > > 
> > >> Hi Scott, 
> > >> 
> > >> Thanks for writing. I'm on the road at the moment, so I have to be
> > >> briefer and less thorough than I'd like to be.
> > >> 
> > >> What you are observing is not the intended behavior. Oddly, it's not
> > >> what I recall obtaining in my tests on this software, though I was mostly
> > >> interested in hmmsearch at the time and may have been sloppier than I
> > >> should have been when it came to hmmscan.
> > >> 
> > >> What version of HMMER3 are you using? There have been some small
> > >> formatting changes in the past that might be causing a burp in the parser,
> > >> though I doubt it.
> > >> 
> > >> Kai Blin wrote some test scripts (found here:
> > >> bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate
> > >> query/hit coordinates. It might be worth giving this a shot if you haven't
> > >> already.
> > >> 
> > >> Also, if you don't mind, I'm happy to run your code on your output
> > >> file on my end. It might help me diagnose the problem.
> > >> 
> > >> Sorry this is being a thorn in your side! I've cc'ed the list in case
> > >> anyone else has insight into this matter.
> > >> 
> > >> Best, 
> > >> Thomas 
> > >> 
> > >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote: 
> > >> 
> > >>> Hi Thomas, 
> > >>> 
> > >>> I'm using modules in the bioperl-hmmer3 git repository to parse hmmscan
> > >>> reports. When I parse the files and walk through the HSPs like this:
> > >>> 
> > >>> while (my $hit = $rslt->next_model) {
> > >>> 
> > >>>    while (my $domain = $hit->next_hsp) {
> > >>> 
> > >>> and retrieve the "hit" coordinates like this:
> > >>> 
> > >>>        print "hit coords: ", $domain->start('hit'), "-", $domain->end('hit'), "\n";
> > >>> 
> > >>> the coordinates returned correspond to what I would call the "query",
> > >>> since they are for the sequence I fed to hmmscan to search the profile
> > >>> database. Likewise, when retrieving the query coordinates with
> > >>> $domain->start('query'), I get what I consider the "hit" coordinates,
> > >>> since they are for the domain profile. Is this the intended behavior?
> > >>> 
> > >>> Thanks. 
> > >>> 
> > >>> scott 
> > >>> 
> > >>> -- 
> > >>> Scott A. Givan 
> > >>> Associate Director 
> > >>> Informatics Research Core Facility 
> > >>> 240e Bond Life Sciences Center 
> > >>> Research Assistant Professor 
> > >>> Molecular Microbiology and Immunology 
> > >>> University of Missouri, Columbia 
> > >>> 
> > >>> TEL 573-882-2948 
> > >>> FAX 573-884-9676 
> > >>> http://ircf.rnet.missouri.edu 
> > >>> 
> > >>> 
> > >>> 
> > >> 
> > >> _______________________________________________ 
> > >> Bioperl-l mailing list 
> > >> Bioperl-l at lists.open-bio.org 
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l 
> > > 
> > 
>
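Until the parser is fixed (bug 3369 above), one stopgap is to swap the affected fields yourself when the report comes from hmmsearch. The sketch below is plain Perl, not part of BioPerl: the subroutine name and hash keys are hypothetical, and it assumes you have already copied the values out of each HSP (for example with `$hsp->start('query')`, `$hsp->end('hit')`, `$hsp->query_string` and `$hsp->hit_string`) and taken the program name from `$result->algorithm`. It only swaps for HMMSEARCH, the case shown in the dump above; check your own output before relying on it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stopgap for the hmmsearch swap described in this thread.
# $fields holds values copied from one HSP, keyed query_start, query_end,
# query_seq, hit_start, hit_end, hit_seq. A corrected copy is returned;
# the input hash is left untouched.
sub fix_swapped_fields {
    my ($algorithm, $fields) = @_;
    my %fixed = %$fields;
    # Only hmmsearch output is reported above as having swapped fields.
    return \%fixed unless defined $algorithm && uc($algorithm) eq 'HMMSEARCH';
    for my $part (qw(start end seq)) {
        # Swap the query and hit values using a hash slice.
        @fixed{ "query_$part", "hit_$part" } = @fixed{ "hit_$part", "query_$part" };
    }
    return \%fixed;
}

# Example with the coordinates from the dumped HSP above.
my $fixed = fix_swapped_fields('HMMSEARCH',
    { query_start => 4, query_end => 59, hit_start => 20, hit_end => 74 });
print "query $fixed->{query_start}-$fixed->{query_end}, ",
      "hit $fixed->{hit_start}-$fixed->{hit_end}\n";
```

With the values from the dump this prints `query 20-74, hit 4-59`, so the HMM coordinates end up on the query (CRP0000) and the sequence coordinates on the hit (Chr2_540228_540404_+).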


From w.arindrarto at gmail.com  Wed Jul 11 17:25:44 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 11 Jul 2012 23:25:44 +0200
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <CAKVJ-_4mo16NxEbZkNp3-EG9uknc4Fo4m6ajQWTknumQNUn4Yw@mail.gmail.com>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
	<FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
	<CAKVJ-_4mo16NxEbZkNp3-EG9uknc4Fo4m6ajQWTknumQNUn4Yw@mail.gmail.com>
Message-ID: <CADEGkF5mE=1Kgy5X4BX9VH0TuChU4OyKuf3e-rzH_SeB=xxapA@mail.gmail.com>

Hi everyone,

Just as an additional info that might be useful:

The current Biopython parser for the plain-text format parses the very
first line to find out which HMMER flavor produced the result. Both 'hmmfrom'
and 'hmm to' are query coordinates if the flavor is hmmsearch or phmmer,
and they are hit coordinates if the flavor is hmmscan.
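The same rule can be applied on the Perl side. This is a minimal sketch, not the Biopython or BioPerl implementation: the subroutine names are hypothetical, and it assumes the first line of a plain-text HMMER3 report has the `# hmmsearch :: ...` form shown in Peng's output earlier in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract the program name from the first line of a plain-text HMMER3
# report, e.g. "# hmmsearch :: search profile(s) against a sequence database".
sub hmmer_flavor {
    my ($first_line) = @_;
    return $first_line =~ /^#\s*(\w+)\s+::/ ? $1 : undef;
}

# Read only the first line of a report file and return its flavor.
sub hmmer_flavor_of_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $first_line = <$fh>;
    close $fh;
    return defined $first_line ? hmmer_flavor($first_line) : undef;
}

# Which side the 'hmmfrom'/'hmm to' columns describe, per the rule above:
# the query for hmmsearch and phmmer, the hit for hmmscan.
sub hmm_coords_belong_to {
    my ($flavor) = @_;
    return undef unless defined $flavor;
    return 'query' if $flavor eq 'hmmsearch' || $flavor eq 'phmmer';
    return 'hit'   if $flavor eq 'hmmscan';
    return undef;    # other programs are not covered by the rule above
}

# Usage: perl flavor.pl report.txt
if (@ARGV) {
    my $flavor = hmmer_flavor_of_file($ARGV[0]);
    printf "%s: HMM coordinates are %s coordinates\n",
        $flavor // 'unknown', hmm_coords_belong_to($flavor) // 'unknown';
}
```

Reading only the header line keeps the check cheap; as noted below, the same trick does not work for tblout/domtblout files, which lack this line.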

This information is not available in the other HMMER command-line output
formats (tblout and domtblout), which, as Peter mentioned, required us
to treat the different flavors of the table output as different formats
for the time being. Fortunately, when we contacted the HMMER developers,
they mentioned that this is no longer the case in their development branch
(and their planned future release).

Hope that helps :),
Bow


On Wed, Jul 11, 2012 at 11:00 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Wed, Jul 11, 2012 at 8:02 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> Peng,
>>
>> Has this been filed as a bug yet?
>>
>>     https://redmine.open-bio.org/projects/bioperl
>>
>> Seems like it would be fairly easy to fix, but I want to track it just in case.
>>
>> chris
>
> Hi all,
>
> This could be the unfortunate fact that hmmscan and
> hmmsearch return very similar tabular output, but
> with query and hit interchanged. i.e. You need some
> extra information to know which way round they are
> (not possible with the current output). This was an
> issue in Bow's Biopython SearchIO project - which
> for the moment he solved by handling this as two
> hmmer file formats. In the medium term we're hoping
> hmmer3 will add some header information or something.
>
> Peter


From dejian.zhao at gmail.com  Thu Jul 12 01:04:54 2012
From: dejian.zhao at gmail.com (De-Jian Zhao)
Date: Thu, 12 Jul 2012 13:04:54 +0800
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com>
References: <4FFDB879.1020906@gmail.com>
	<25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com>
Message-ID: <4FFE5AF6.1020300@gmail.com>

Thank you, Peng. That's great! I have been wondering how to see the
whole content of an object; dumping it is a good solution.

On 2012-7-12 2:03, Peng Zhou wrote:
> Hi,
>
> I guess that's what the commands are supposed to do: the get_SeqFeatures()
> method returns a list of Bio::SeqFeature objects, and the translate()
> method returns a new sequence object. You can't usefully "print" an object
> in Perl (you only get its class name and address), but you can "dump" it:
>
> $ perl -e ' use Bio::SeqIO; use Data::Dumper; my
> $seqio=Bio::SeqIO->new(-file=>shift);
> print Dumper($seqio->next_seq()->get_SeqFeatures()); ' nt.gb
>
> $ perl -e ' use Bio::SeqIO; use Data::Dumper; my
> $seqio=Bio::SeqIO->new(-file=>shift);
> print Dumper($seqio->next_seq()->translate()); ' nt.gb
>
> On Wednesday, July 11, 2012 12:31:37 PM UTC-5, De-Jian Zhao wrote:
>> Hi,
>>
>> I downloaded a nucleotide sequence from GenBank (file name: nt.gb) and
>> tested the Bio::SeqIO module as follows. The first 3 commands succeeded;
>> however, the last 2 failed.
>>
>> I think $seqio->next_seq() produces a Bio::Seq object which contains the
>> sequence, features and annotation (according to the DESCRIPTION in
>> "perldoc Bio::Seq"), and thus calling the methods
>> get_SeqFeatures() and translate() should be valid. However, the results
>> contradicted this idea.
>>
>> Could anyone explain what happened with the last 2 commands? I have
>> run into numerous failures like this when testing BioPerl methods.
>> I want to translate the mRNA sequence and extract the sequence features.
>> What are the right commands? Thanks a lot!
>>
>> Best,
>> Dejian
>>
>>
>>
>> PS: The commands and results
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift);
>> print $seqio->next_seq()->display_id(); ' nt.gb
>> NM_053056
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift);
>> print $seqio->next_seq()->molecule(); ' nt.gb
>> mRNA
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift);
>> print $seqio->next_seq()->subseq(1,6); ' nt.gb
>> CACACG
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift);
>> print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb
>> Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18)
>>
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift);
>> print $seqio->next_seq()->translate(); ' nt.gb
>> Bio::Seq::RichSeq=HASH(0x20a3e7b0)
>>


From dejian.zhao at gmail.com  Thu Jul 12 01:14:33 2012
From: dejian.zhao at gmail.com (De-Jian Zhao)
Date: Thu, 12 Jul 2012 13:14:33 +0800
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu>
References: <4FFDB879.1020906@gmail.com>
	<9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu>
Message-ID: <4FFE5D39.6010406@gmail.com>

Thank you, Jim. You are right. It works. This example deepens my 
understanding of OOP.

On 2012-7-12 2:01, Jim Hu wrote:
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb
>> Bio::Seq::RichSeq=HASH(0x20a3e7b0)
> ->translate returns a new Seq object.  I think
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate()->seq(); ' nt.gb
>
> should work (haven't tried it).
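Jim's `->seq()` fix prints the protein string. Note, though, that calling translate() on the whole record reads the mRNA from its first base in frame 0, so any 5' UTR is translated too. A minimal sketch that instead lists every feature and translates only the CDS, assuming a GenBank file whose CDS features start in frame 1 (codon_start=1); the script and subroutine names are illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Return one tab-separated line per feature: type, start, end and, for CDS
# features, the translated protein string.
sub describe_features {
    my ($seq) = @_;
    my @lines;
    for my $feat ($seq->get_SeqFeatures) {
        my $line = join "\t", $feat->primary_tag, $feat->start, $feat->end;
        if ($feat->primary_tag eq 'CDS') {
            # spliced_seq() returns the feature's own sequence (joining the
            # parts of a split location); translate() returns a new sequence
            # object, so ->seq is needed to get the protein string.
            $line .= "\t" . $feat->spliced_seq->translate->seq;
        }
        push @lines, $line;
    }
    return @lines;
}

# Usage: perl features.pl nt.gb
if (@ARGV) {
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $in->next_seq) {
        print "$_\n" for describe_features($seq);
    }
}
```

Alternatively, GenBank CDS features usually carry a /translation qualifier, which can be read with `$feat->get_tag_values('translation')` after checking `$feat->has_tag('translation')`.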


From kai.blin at biotech.uni-tuebingen.de  Thu Jul 12 09:43:19 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 12 Jul 2012 15:43:19 +0200
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <CADEGkF5mE=1Kgy5X4BX9VH0TuChU4OyKuf3e-rzH_SeB=xxapA@mail.gmail.com>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
	<FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
	<CAKVJ-_4mo16NxEbZkNp3-EG9uknc4Fo4m6ajQWTknumQNUn4Yw@mail.gmail.com>
	<CADEGkF5mE=1Kgy5X4BX9VH0TuChU4OyKuf3e-rzH_SeB=xxapA@mail.gmail.com>
Message-ID: <4FFED477.3090907@biotech.uni-tuebingen.de>

On 2012-07-11 23:25, Wibowo Arindrarto wrote:

Hi,

> The current Biopython parser for the plain-text format parses the
> very first line to find out which HMMER flavor produced the result.
> Both 'hmmfrom' and 'hmm to' are query coordinates if the flavor is
> hmmsearch or phmmer, and they're hit coordinates if the flavor is
> hmmscan.

Whoops. I mostly looked at hmmscan when writing the parser, because
that's the file format I needed for my code. The code clearly should
follow the way the hmmer2 parser works, and differentiate between
hmmsearch and hmmscan type output.

As I said on the bug report, I'm happy to look at code fixing this.

> This information is not available in other HMMER command line
> output formats (tblout and domtblout), which as Peter has
> mentioned, required us to treat different flavors of the table
> output as different formats for the time being.

As far as I'm aware, BioPerl currently doesn't parse the table output
format.

Seeing how much repeated pain we run into with all these parsers in
the different Bio* projects, I wonder if there is a smarter way to
deal with parsing: at least a shared grammar file that we could use
for testing, to make sure we have the same expectations about file
formats in the different language implementations. Ideally we'd
auto-generate the parsers from the grammar specification, but I guess
that will stay wishful thinking for quite a while.

> Fortunately, after contacting the HMMER developers they mentioned
> that this is not the case anymore in their development branch (and
> their future planned release).

That's certainly good news. :)

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From cjfields at illinois.edu  Thu Jul 12 11:24:13 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 12 Jul 2012 15:24:13 +0000
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <4FFED477.3090907@biotech.uni-tuebingen.de>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
	<FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
	<CAKVJ-_4mo16NxEbZkNp3-EG9uknc4Fo4m6ajQWTknumQNUn4Yw@mail.gmail.com>
	<CADEGkF5mE=1Kgy5X4BX9VH0TuChU4OyKuf3e-rzH_SeB=xxapA@mail.gmail.com>
	<4FFED477.3090907@biotech.uni-tuebingen.de>
Message-ID: <1C3A31F9-9717-49F3-A880-FA725D0F3CDB@illinois.edu>

On Jul 12, 2012, at 8:43 AM, Kai Blin wrote:

> On 2012-07-11 23:25, Wibowo Arindrarto wrote:
> 
> Hi,
> 
>> The current Biopython parser for the plain-text format parses the
>> very first line to find out which HMMER flavor produced the result.
>> Both 'hmmfrom' and 'hmm to' are query coordinates if the flavor is
>> hmmsearch or phmmer, and they're hit coordinates if the flavor is
>> hmmscan.
> 
> Whoops. I mostly looked at hmmscan when writing the parser, because
> that's the file format I needed for my code. The code clearly should
> follow the way the hmmer2 parser works, and differentiate between
> hmmsearch and hmmscan type output.
> 
> As I said on the bug report, I'm happy to look at code fixing this.

Seems like it should be easy enough to address if there is something in the output that indicates the report type.

>> This information is not available in other HMMER command line
>> output formats (tblout and domtblout), which as Peter has
>> mentioned, required us to treat different flavors of the table
>> output as different formats for the time being.
> 
> As far as I'm aware, BioPerl currently doesn't parse the table output
> format.

The only reason to do so is if the table provides additional information the actual hits don't (this can be the case with BLAST reports).

> Seeing how much repeated pain we run into with all these parsers in
> the different Bio* projects, I wonder if there was a smarter way to
> deal with parsing. Maybe at least some shared grammar file that we
> could use for testing, to make sure we at least have the same
> expectations about file formats in the different language
> implementations. Ideally we'd auto-generate the parsers from the
> grammar specification, but I guess that'll stay wishful thinking for
> quite a bit.

I would fully support something like this. I've been thinking about this with Marpa::XS (which now has a compiled library, libmarpa, to make it less Perl-centric), and there has been talk of using a similar toolkit with the BioRuby folks. We could always keep a plain Perl/Python/Ruby/etc. fallback for the most common formats.

chris


From buschj at hhu.de  Sun Jul 15 15:46:42 2012
From: buschj at hhu.de (jobu)
Date: Sun, 15 Jul 2012 21:46:42 +0200
Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences of
	Blast Matches
Message-ID: <50031E22.3060902@hhu.de>

Dear All.

I am still a beginner in Perl and have just started to look into
BioPerl, so I hope this is the right place to ask my question.

I ran a local standalone blastn search of many short query sequences
against a set of target FASTA sequences consisting of whole-chromosome
sequence data.

What I need to do now is to extract, say, 100 nt upstream and 100 nt
downstream of each BLAST match from my target sequences.

At this point I can only assume that BioPerl might be helpful for
this task, though I haven't yet found a module that does this
locally on my hard drive.

I would be thankful for even the slightest hint about where to begin.

Sincerely
Jochen


From Russell.Smithies at agresearch.co.nz  Sun Jul 15 17:19:14 2012
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 16 Jul 2012 09:19:14 +1200
Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences
 of	Blast Matches
In-Reply-To: <50031E22.3060902@hhu.de>
References: <50031E22.3060902@hhu.de>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCF2A4CAA@exchsth.agresearch.co.nz>

Hi Jochen,
I don't think BioPerl can directly manipulate BLAST databases, so I'd probably use fastacmd to extract the sequence from the original BLAST database,
e.g.
fastacmd -s X51494.1 -d /dataset/blastdata/active/nt -L 100,200
>gi|20090|emb|X51494.1|:100-200 Rice prolamin gene (strain NE4)
ATGATGCAAACGTTGGGCATGGGTAGCTCCACAGCCATGTTCATGTCGCAGCCAATGGCGCTCCTGCAGCAGCAATGTTG
CATGCAGCTACAAGGCATGAT

Or, if you're using BLAST+, use the blastdbcmd command,
e.g.
blastdbcmd -entry  X51494.1 -db /dataset/blastdata/active/nt -range 100-200
>gi|20090|emb|X51494.1|:100-200 Rice prolamin gene (strain NE4)
ATGATGCAAACGTTGGGCATGGGTAGCTCCACAGCCATGTTCATGTCGCAGCCAATGGCGCTCCTGCAGCAGCAATGTTG
CATGCAGCTACAAGGCATGAT

So, putting it all together: use BioPerl to parse your existing BLAST results and pull out each hit's coordinates, then use a system call to run fastacmd or blastdbcmd to extract the flanking sequence from the BLAST database, and finally write the sequences to a file.
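The coordinate step in that recipe can be sketched in plain Perl. The helpers below are hypothetical, not part of BioPerl: they assume 1-based inclusive hit coordinates (as returned by `$hsp->start('hit')` and `$hsp->end('hit')`) and the subject length from `$hit->length`, and they build a blastdbcmd argument list using only the `-entry`, `-db` and `-range` options shown above. They also assume the database was built with `makeblastdb -parse_seqids`, so that `-entry` can look sequences up by ID. "Upstream" here means lower coordinates on the plus strand; for minus-strand hits the two windows swap roles biologically. The entry `chr1` and database `targets` are placeholder names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: windows of $flank bases on either side of a hit that
# spans $start..$end (1-based, inclusive) on a subject of length $len.
# Windows are clipped at the sequence ends; an empty window is undef.
sub flank_ranges {
    my ($start, $end, $flank, $len) = @_;
    ($start, $end) = ($end, $start) if $start > $end;
    my $up_start = $start - $flank < 1  ? 1    : $start - $flank;
    my $down_end = $end + $flank > $len ? $len : $end + $flank;
    my $up   = $start > 1  ? [ $up_start, $start - 1 ] : undef;
    my $down = $end < $len ? [ $end + 1, $down_end ]   : undef;
    return ($up, $down);
}

# Argument list for one blastdbcmd call, using the options shown above.
# Passing a list to system() avoids going through the shell.
sub blastdbcmd_args {
    my ($entry, $db, $range) = @_;
    return ('blastdbcmd', '-entry', $entry, '-db', $db,
            '-range', "$range->[0]-$range->[1]");
}

# Example: a hit at 150..250 on a 1000 nt subject, 100 nt flanks.
my ($up, $down) = flank_ranges(150, 250, 100, 1000);
for my $window (grep { defined } $up, $down) {
    print join(' ', blastdbcmd_args('chr1', 'targets', $window)), "\n";
}
```

This prints one command for the 50-149 window and one for 251-350. In a full script each list could be run with `system(@args) == 0 or warn "blastdbcmd failed\n";`, with the entry name taken from `$hit->name`; that name may need adjusting to match what blastdbcmd expects for your database.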

These might be useful:
http://www.bioperl.org/wiki/HOWTO:SearchIO
http://www.bioperl.org/wiki/HOWTO:SearchIO#Speed_improvements_with_lightweight_objects 
http://www.bioperl.org/wiki/HOWTO:BlastPlus
http://www.bioperl.org/wiki/HOWTO:StandAloneBlast


--Russell

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of jobu
Sent: Monday, 16 July 2012 7:47 a.m.
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences of Blast Matches

Dear All.

Still being a beginner in Perl, and having only just started to look into BioPerl, I hope this is the right place to ask my question.

I ran a local standalone blastn search of many short query sequences against a set of target FASTA sequences consisting of whole-chromosome sequence data.

What I need to do now is extract, say, 100 nt upstream and downstream of each BLAST match from my target sequences.

At this point I can only assume that BioPerl might be helpful for this task, though I haven't yet found a module that does this locally on my hard drive.

So I would be grateful for even the slightest hint on where to begin.

Sincerely
Jochen


From dcmertens.perl at gmail.com  Tue Jul 17 08:57:55 2012
From: dcmertens.perl at gmail.com (David Mertens)
Date: Tue, 17 Jul 2012 07:57:55 -0500
Subject: [Bioperl-l] Announcing The Quantified Onion Google Group and
	perl4science.github.com
Message-ID: <CA+4ieYXGs=ywH97gEdTOakVLp22n0gPuwZJK-C75C7ijUzPgwg@mail.gmail.com>

Hello everybody -

I returned from YAPC::NA this year intending to build up the scientific
Perl community. One outgrowth of this has been Joel Berger's creation of
perl4science.github.com and gizmomathboy's creation of The Quantified Onion
Google Group <https://groups.google.com/forum/#%21forum/the-quantified-onion>.

perl4science is meant to be a landing page for anybody looking to combine
Perl and science. Since it is a github repository, it makes it about as
easy as possible for others to contribute content or fixes. If you have a
project that scientists would find useful, you should fork the project, add
your content, and issue a pull request. It's that easy.

The Quantified Onion is meant to be a space for scientists to discuss how
we use Perl in our science and to work together to grow adoption of Perl
among scientists. It will undoubtedly attract newcomers to Perl asking
beginner questions, at which point we will gently refer them to the
appropriate manual pages. Interesting discussions thus far (in my mind)
include a discussion about teaching test-driven design and a discussion
about submitting an article to Computing in Science and Engineering for
their November issue, which is supposed to be about Modern Programming
Languages. I would like to begin putting on workshops on Perl for
Scientists and Engineers (and encourage others to do the same), and I will
begin the discussion on The Quantified Onion.

If you know of other Perl science resources, please feel free to add them
to perl4science or post them on The Quantified Onion, and please join The
Quantified Onion. Together, we can grow Perl's adoption among scientists!

David Mertens

-- 
 "Debugging is twice as hard as writing the code in the first place.
  Therefore, if you write the code as cleverly as possible, you are,
  by definition, not smart enough to debug it." -- Brian Kernighan


From cjfields at illinois.edu  Wed Jul 18 10:29:02 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 18 Jul 2012 14:29:02 +0000
Subject: [Bioperl-l] [blast-announce] OLD_BLAST parameter to be
 discontinued. Alternative NCBI BLAST parsable formats are available
References: <CC2C3DA1.A01D6%mcginnis@ncbi.nlm.nih.gov>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF2B63F6C5@CHIMBX5.ad.uillinois.edu>

Not sure if anyone is using this as a means of getting their reports (I don't), but I'm posting this here just in case.

-c

Begin forwarded message:

> From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]" 
> Subject: [blast-announce] OLD_BLAST parameter to be discontinued. Alternative NCBI BLAST parsable formats are available
> Date: July 18, 2012 9:17:05 AM CDT
> To: NLM/NCBI List blast-announce
> 
> Beginning Sept. 10, 2012, the BLAST service will ignore the OLD_BLAST parameter in posted URLs. We are removing this old and little used option to prepare for upcoming enhancements to the BLAST service later this year. Setting OLD_BLAST=true produces an older version of the BLAST HTML results that a few people have used for automated processing (parsing) of results. NCBI BLAST supports a number of different and more stable formats for parsing. These include XML, tabular reports and ASN.1. For more information, please see BLAST Developer Information (http://1.usa.gov/O8AocI) and links on that page.
> 
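As a rough illustration of the tabular alternative mentioned above, here is a small Python parser for the default 12-column tabular layout (BLAST+ `-outfmt 6`; legacy blastall `-m 8` uses the same columns). It is a sketch under stated assumptions: tab-separated lines in the default column order, with `#` comment lines (such as those written by `-outfmt 7`) skipped. If you request custom columns, `FIELDS` would need to change to match. The function name `parse_tabular` is hypothetical.

```python
from typing import Dict, Iterable, Iterator, Union

# Default column order for tabular BLAST output.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}

Record = Dict[str, Union[str, int, float]]


def parse_tabular(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one dict per hit line, converting the numeric columns."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        values = line.split("\t")
        if len(values) != len(FIELDS):
            raise ValueError(
                f"expected {len(FIELDS)} columns, got {len(values)}: {line!r}")
        record: Record = {}
        for name, value in zip(FIELDS, values):
            if name in INT_FIELDS:
                record[name] = int(value)
            elif name in FLOAT_FIELDS:
                record[name] = float(value)
            else:
                record[name] = value
        yield record
```

Typical use would be `for hit in parse_tabular(open("hits.tsv")): ...` (the file name is hypothetical); each record's `sstart`/`send` values are the subject coordinates a downstream extraction step would need.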


From dejian.zhao at gmail.com  Wed Jul 18 11:36:14 2012
From: dejian.zhao at gmail.com (De-Jian Zhao)
Date: Wed, 18 Jul 2012 23:36:14 +0800
Subject: [Bioperl-l] Which graphic module should I learn?
Message-ID: <5006D7EE.1020205@gmail.com>

Hi, all.

Currently I am working on a genome, and I will draw some figures based on
the sequencing data. Since I will keep using the module in future
projects, I want to learn a popular one so that I can get better support
from the community. I searched CPAN with the command "i /SVG/" and got 234
items. Which one is popular in bioinformatics? Which module should I
start with? Thanks for any suggestions.

Best,
De-Jian


From scott at scottcain.net  Wed Jul 18 11:46:01 2012
From: scott at scottcain.net (Scott Cain)
Date: Wed, 18 Jul 2012 11:46:01 -0400
Subject: [Bioperl-l] Which graphic module should I learn?
In-Reply-To: <5006D7EE.1020205@gmail.com>
References: <5006D7EE.1020205@gmail.com>
Message-ID: <CA+JTaow0sJ-MTq47r1vvvu+H2zPu=U8VuEi++s-YDxcTo1+Ybg@mail.gmail.com>

Hi De-Jian,

Of course, it depends on what you want to do, but if you're referring
to the genome feature/annotation type graphics, Bio::Graphics already
supports SVG pretty well, via GD::SVG.

Scott


On Wed, Jul 18, 2012 at 11:36 AM, De-Jian Zhao <dejian.zhao at gmail.com> wrote:
> Hi, all.
>
> Currently I am working on a genome. I will draw some pictures based on the
> sequencing data. In the long run, I will use the module in my future
> projects, so I want to learn a popular module to get better support from the
> community. I searched in cpan with the command "i /SVG/" and got 234 items.
> Which one is popular in bioinformatics? Which module should I start with?
> Thanks for any suggestions.
>
> Best,
> De-Jian
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Tue Jul 24 23:08:05 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 25 Jul 2012 03:08:05 +0000
Subject: [Bioperl-l] BioPerl and Travis-CI
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>

Peter Cock has graciously helped start up a branch for bioperl-live that is using Travis-CI (a nice continuous integration tool).  Results from Peter's fork are found here:

    http://travis-ci.org/#!/peterjc/bioperl-live

As this is now pulled into the main bioperl repo, results will be here:

    http://travis-ci.org/#!/bioperl/bioperl-live

I'll be working on this and expect this will be added to master in the next few days.  

chris


From p.j.a.cock at googlemail.com  Wed Jul 25 06:31:13 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 25 Jul 2012 11:31:13 +0100
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
Message-ID: <CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>

On Wed, Jul 25, 2012 at 4:08 AM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> Peter Cock has graciously helped start up a branch for bioperl-live
> that is using Travis-CI (a nice continuous integration tool).  Results
> from Peter's fork are found here:
>
>     http://travis-ci.org/#!/peterjc/bioperl-live
>
> As this is now pulled into the main bioperl repo, results will be here:
>
>     http://travis-ci.org/#!/bioperl/bioperl-live
>
> I'll be working on this and expect this will be added to master in
> the next few days.
>
> chris

We've had this running for Biopython for a month now, and it has
been a useful complement to the BuildBot (which covers other
operating systems). This was following BioRuby's lead:
http://biopython.org/pipermail/biopython-dev/2012-June/009742.html

The current BioPerl Travis configuration is probably usable right
now (after changing the branch whitelist to either master, or simply
all branches).

Other remaining issues include sorting out which dependencies
should be installed, and streamlining their verbose output (e.g.
using tail).

TravisCI can send out emails (e.g. on test failures), and perhaps
bioperl-guts-l might be a sensible place to send these. Initially
we'd disabled the emails for Biopython. I'd like to use an RSS
feed... there is a JSON API which BioRuby are using for
http://www.biogems.info/ which tracks their plugins.

Peter


From p.j.a.cock at googlemail.com  Fri Jul 27 11:03:05 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 27 Jul 2012 16:03:05 +0100
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
	<CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>
	<CAKVJ-_45r1Hza-ga95oyMRbYGzupsKtmAZ=ac29g2n-U9SegWA@mail.gmail.com>
	<118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu>
Message-ID: <CAKVJ-_4iYB0N7A6eOEivrKTkeQZzw=9jiS7kqVCKR2aeEiZZaQ@mail.gmail.com>

On Fri, Jul 27, 2012 at 3:58 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> On Jul 27, 2012, at 9:47 AM, Peter Cock wrote:
>
>> On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>>
>>> That's done now - except for the circular dependencies, and GD,
>>> which might be easy to solve if anyone knows what the error
>>> means - see commit message here:
>>> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a
>>
>> Re: https://twitter.com/cjfields/status/228861370454638592
>> Not sure why you got GD to work when something very similar
>> had failed for me. Oh well - job done :)
>
> It was the lack of gdlib-config in the libgd2-xpm package, you need
> libgd2-xpm-dev.  One of the fun things about Debian packaging.

Ah - I should have guessed that.

>>> Would a single clean commit of the (current) .travis.yml file be
>>> preferable to the current series of commits? And do you want
>>> a pull request, or would you just merge/cherry-pick manually?
>>
>> Given all the churn between our revisions, personally I'd opt for
>> a single clean commit to bioperl/master - but your call.
>>
>> Peter
>
> Yep, about to merge it over.  It's working now, just need to
> whitelist master instead of travis after the merge.

I'd removed the whitelist altogether here:
https://github.com/peterjc/bioperl-live/commit/96dc5866f4406179353909c72d812623341c8fbd

My thinking was BioPerl seems to have multiple feature branches
under the official repo, so they should get tested too. You'd be
in a better position than me to judge what would work best for
BioPerl here.

Peter


From cjfields at illinois.edu  Fri Jul 27 10:58:21 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 27 Jul 2012 14:58:21 +0000
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <CAKVJ-_45r1Hza-ga95oyMRbYGzupsKtmAZ=ac29g2n-U9SegWA@mail.gmail.com>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
	<CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>
	<CAKVJ-_45r1Hza-ga95oyMRbYGzupsKtmAZ=ac29g2n-U9SegWA@mail.gmail.com>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu>

On Jul 27, 2012, at 9:47 AM, Peter Cock wrote:

> On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> 
>> That's done now - except for the circular dependencies, and GD,
>> which might be easy to solve if anyone knows what the error
>> means - see commit message here:
>> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a
> 
> Re: https://twitter.com/cjfields/status/228861370454638592
> Not sure why you got GD to work when something very similar
> had failed for me. Oh well - job done :)

It was the lack of gdlib-config in the libgd2-xpm package, you need libgd2-xpm-dev.  One of the fun things about Debian packaging.

>> Would a single clean commit of the (current) .travis.yml file be
>> preferable to the current series of commits? And do you want
>> a pull request, or would you just merge/cherry-pick manually?
> 
> Given all the churn between our revisions, personally I'd opt for
> a single clean commit to bioperl/master - but your call.
> 
> Peter

Yep, about to merge it over.  It's working now, just need to whitelist master instead of travis after the merge.

chris


From cjfields at illinois.edu  Fri Jul 27 12:26:34 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 27 Jul 2012 16:26:34 +0000
Subject: [Bioperl-l] BioPerl Travis-CI now live
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D54D@CITESMBX5.ad.uillinois.edu>

All commits to bioperl-live master branch on github are now being tracked:

   http://travis-ci.org/#!/bioperl/bioperl-live

The .travis.yml file has a whitelist for branches to be tested; if anyone wants to test additional branches feel free to add them to the list!

chris


From cjfields at illinois.edu  Fri Jul 27 11:15:19 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 27 Jul 2012 15:15:19 +0000
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <CAKVJ-_4iYB0N7A6eOEivrKTkeQZzw=9jiS7kqVCKR2aeEiZZaQ@mail.gmail.com>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
	<CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>
	<CAKVJ-_45r1Hza-ga95oyMRbYGzupsKtmAZ=ac29g2n-U9SegWA@mail.gmail.com>
	<118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4iYB0N7A6eOEivrKTkeQZzw=9jiS7kqVCKR2aeEiZZaQ@mail.gmail.com>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D2D6@CITESMBX5.ad.uillinois.edu>


On Jul 27, 2012, at 10:03 AM, Peter Cock wrote:

> On Fri, Jul 27, 2012 at 3:58 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> On Jul 27, 2012, at 9:47 AM, Peter Cock wrote:
>> 
>>> On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>>> 
>>>> That's done now - except for the circular dependencies, and GD,
>>>> which might be easy to solve if anyone knows what the error
>>>> means - see commit message here:
>>>> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a
>>> 
>>> Re: https://twitter.com/cjfields/status/228861370454638592
>>> Not sure why you got GD to work when something very similar
>>> had failed for me. Oh well - job done :)
>> 
>> It was the lack of gdlib-config in the libgd2-xpm package, you need
>> libgd2-xpm-dev.  One of the fun things about Debian packaging.
> 
> Ah - I should have guessed that.
> 
>>>> Would a single clean commit of the (current) .travis.yml file be
>>>> preferable to the current series of commits? And do you want
>>>> a pull request, or would you just merge/cherry-pick manually?
>>> 
>>> Given all the churn between our revisions, personally I'd opt for
>>> a single clean commit to bioperl/master - but your call.
>>> 
>>> Peter
>> 
>> Yep, about to merge it over.  It's working now, just need to
>> whitelist master instead of travis after the merge.
> 
> I'd removed the whitelist altogether here:
> https://github.com/peterjc/bioperl-live/commit/96dc5866f4406179353909c72d812623341c8fbd
> 
> My thinking was BioPerl seems to have multiple feature branches
> under the official repo, so they should get tested too. You'd be
> in a better position than me to judge what would work best for
> BioPerl here.
> 
> Peter

We'll keep it to master for now.  It's pretty easy to add branches as needed, and I didn't want to expand to all the potentially stale branches unless explicitly set (we need to triage all those at some point).

chris


From p.j.a.cock at googlemail.com  Fri Jul 27 10:47:18 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 27 Jul 2012 15:47:18 +0100
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
	<CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>
Message-ID: <CAKVJ-_45r1Hza-ga95oyMRbYGzupsKtmAZ=ac29g2n-U9SegWA@mail.gmail.com>

On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> That's done now - except for the circular dependencies, and GD,
> which might be easy to solve if anyone knows what the error
> means - see commit message here:
> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a

Re: https://twitter.com/cjfields/status/228861370454638592
Not sure why you got GD to work when something very similar
had failed for me. Oh well - job done :)

> Would a single clean commit of the (current) .travis.yml file be
> preferable to the current series of commits? And do you want
> a pull request, or would you just merge/cherry-pick manually?

Given all the churn between our revisions, personally I'd opt for
a single clean commit to bioperl/master - but your call.

Peter


From robfsouza at gmail.com  Fri Jul 27 18:29:22 2012
From: robfsouza at gmail.com (Robson de Souza)
Date: Fri, 27 Jul 2012 15:29:22 -0700 (PDT)
Subject: [Bioperl-l] obf sites offline?
Message-ID: <9bef8a3b-08ca-4868-be7a-193e7596290d@googlegroups.com>

I can't access any of the OBF sites, either from work (USA) or my phone... 
is there something going on?
Robson


From p.j.a.cock at googlemail.com  Thu Jul 26 11:22:26 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 26 Jul 2012 16:22:26 +0100
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
Message-ID: <CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>

On Wed, Jul 25, 2012 at 11:31 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Wed, Jul 25, 2012 at 4:08 AM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> Peter Cock has graciously helped start up a branch for bioperl-live
>> that is using Travis-CI (a nice continuous integration tool).  Results
>> from Peter's fork are found here:
>>
>>     http://travis-ci.org/#!/peterjc/bioperl-live
>>
>> As this is now pulled into the main bioperl repo, results will be here:
>>
>>     http://travis-ci.org/#!/bioperl/bioperl-live
>>
>> I'll be working on this and expect this will be added to master in
>> the next few days.
>>
>> chris
>
> We've had this running for Biopython for a month now, and it has
> been a useful complement to the BuildBot (which covers other
> operating systems). This was following BioRuby's lead:
> http://biopython.org/pipermail/biopython-dev/2012-June/009742.html
>
> The current BioPerl Travis configuration is probably usable right
> now (after changing the branch whitelist to either master, or simply
> all branches).
>
> Other remaining issues include sorting out which dependencies
> should be installed, and streamlining their verbose output (e.g.
> using tail).

That's done now - except for the circular dependencies, and GD,
which might be easy to solve if anyone knows what the error
means - see commit message here:
https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a

Would a single clean commit of the (current) .travis.yml file be
preferable to the current series of commits? And do you want
a pull request, or would you just merge/cherry-pick manually?

> TravisCI can send out emails (e.g. on test failures), and perhaps
> bioperl-guts-l might be a sensible place to send these. Initially
> we'd disabled the emails for Biopython. I'd like to use an RSS
> feed... there is a JSON API which BioRuby are using for
> http://www.biogems.info/ which tracks their plugins.

I've filed an issue for news feed support in TravisCI,
https://github.com/travis-ci/travis-core/issues/82

Regards,

Peter


From p.j.a.cock at googlemail.com  Tue Jul 31 06:37:35 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 31 Jul 2012 11:37:35 +0100
Subject: [Bioperl-l] Travis Continuous Integration testing & pull requests
Message-ID: <CAKVJ-_5Yt8O21Wo0o6p1VgjheaaPYBDQEbF6geaABb1M++GfRA@mail.gmail.com>

Hi all,

I'm cross-posting as this is an announcement. Please keep any
follow-up discussion on the relevant project-specific mailing list,
or on open-bio-l if it is general.

Those following the OBF blog or the OBF or Bio* Twitter accounts
will have already seen this, which I posted yesterday:
http://news.open-bio.org/news/2012/07/travis-ci-for-testing/

In summary, since earlier this year BioRuby and then Biopython and
BioPerl have been using Travis-CI.org (a hosted continuous
integration service for the open source community) to run their unit
tests automatically whenever their GitHub repositories are updated.

In addition we now have TravisCI automatically running our tests
on any new GitHub pull requests - supported by an OBF donation
to Travis-CI, see:
http://about.travis-ci.org/blog/announcing-pull-request-support/

Currently BioJava only uses GitHub as an SVN mirror - but this
should still let you start using TravisCI for automated testing:
http://about.travis-ci.org/docs/user/languages/java/

For EMBOSS, this is another incentive to convert from CVS to
GitHub - TravisCI recently announced support for C/C++ projects:
http://about.travis-ci.org/blog/support_for_go_c_and_cpp/
http://about.travis-ci.org/docs/user/languages/c/

Potentially there are other OBF projects where this would be
useful too.

Regards,

Peter


From shalabh.sharma7 at gmail.com  Mon Jul  2 17:09:57 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Jul 2012 13:09:57 -0400
Subject: [Bioperl-l] translation frame problem in bioperl
Message-ID: <CAA7rn9cnqjsMv-ggCn_H+WavqdtCZZfh37v8pkLak1-ct8UX+A@mail.gmail.com>

Hi All,
I am just confused about the translation frames. I used BioPerl to
parse a blastx report.
The report shows that the frame used is -2, but when I translate the sequence
using EMBOSS or some other program the frame is -1.
Am I doing something wrong here?

Here is the sequence:
>gi|378759230|gb|AHBJ01000169.1| SAR86 cluster bacterium SAR86D scf1120176765857, whole genome shotgun sequence 2642:3697
AGCTTCCCATGGAACCCATGCAAGTGCAATATTTGTTTCTAGCTCTGGTGACCACCAAGGAGATGTCACGTAGCCCACCTCATCTTCATCAGTATTAGTTACTATCCAAAAATCAGAAGCATAATCTGTGATTTCTTTTCCTCCAAGGGTTAAACCAACCATCTTCATTTTAAATGGTGCATTTCCTTCATCTATGATTGCTCTCTGTTTTTCAAGCTCTTCTTTACCAATGTAATCAGCTGCTTTATTTCTTGGTACCTGATAACTTAAATTAACCTGAAAGGGAGAAGTTTCATGATCCAGATCTTGTCCCCAAGACAAAATTCCAGCTGCAATGCGACGATGATGCGCAGGAGCTATGACCATTAAGCCAAATTCTTCTCCAGCCTCAAGAACAGCATTCCACATTTTTTCTGCATTATCATGTGCGTCACGAACATATATTTCATAACCTTTTTCGCCTGTAAAACCAGTTTGACTGATTACACAATCAGCTCCACCAACCTGAGTTTCTAAAATTCCATAATAAGGAACTTCTCTTAACTCTTCGCCAGCTAACTTTGCCATAAGATCTTCAGATAAAGGGCCTTGAATTTGAACAGGACAAACATCAATCTCATCAATTTCTACGTCATATTTTTTAGACACATTTACGCCTTGAAGCCAAAGTAAGAGATCGCTGTCTGATATTGAGAACCAGAATTCATCTTCTGTTAGTCTTAATAGAACAGGGTCATTTAAAACCCCTCCTTTTTCATTGCATAAAATCGCATATTTACCATTTCCGGGTTTAATTTTTGTAGCATCACGAGTTATTACATAATCTGTAAAAGCTTCTGCATCTGGACCTTTTACTCTTATCTGTCTTTCAACAGCAACATTCCACATAGTAACTCTATTAACCAAGGCTTCGTATTCAACCATGGCACCGCCATCTTCAGGTTTTACATAGCCTCGTGGATGATAAATTCGATTATATACAGTTGCTCTCCAACAGCCCGCTTCATGAGATAGATGCCAAAAAGGCGATTTTCTTACCCGGGTTGAAATTAATAA

This is a part of blast report by bioperl:
>JCVI_READ_1105499496127 /Indian_Ocean/gcvT
          Length = 352

 Score =  655 bits (1690), Expect = 0.0
 Identities = 311/352 (88%), Positives = 329/352 (93%)
 Frame = -2

Query: 3697 LLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYVKPEDGGAMVEYEALVNRVTMWNV 3518
            +LISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGY+KPEDGGAMVEY+ALVNRVTMWNV
Sbjct: 1    MLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYIKPEDGGAMVEYDALVNRVTMWNV 60
 .....
.....
Query: 2797 GLTLGGKEITDYASDFWIVTNTDEDEVGYVTSPWWSPELETNIALAWVPWEA 2642
            GLTLGGKEITDYA DFW+V + D   +     PWWSPEL TNIAL WVPW A
Sbjct: 301  GLTLGGKEITDYAPDFWLVADMDGMMLDISLPPWWSPELNTNIALGWVPWSA 352

This is EMBOSS output (from EBI):

>EMBOSS_001_4
LLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYVKPEDGGAMVEYEALVNRVTMWNV
AVERQIRVKGPDAEAFTDYVITRDATKIKPGNGKYAILCNEKGGVLNDPVLLRLTEDEFW
FSISDSDLLLWLQGVNVSKKYDVEIDEIDVCPVQIQGPLSEDLMAKLAGEELREVPYYGI
LETQVGGADCVISQTGFTGEKGYEIYVRDAHDNAEKMWNAVLEAGEEFGLMVIAPAHHRR
IAAGILSWGQDLDHETSPFQVNLSYQVPRNKAADYIGKEELEKQRAIIDEGNAPFKMKMV
GLTLGGKEITDYASDFWIVTNTDEDEVGYVTSPWWSPELETNIALAWVPWEA
>EMBOSS_001_5
INFNPGKKIAFLASIS*SGLLESNCI*SNLSSTRLCKT*RWRCHG*IRSLG**SYYVECC
C*KTDKSKRSRCRSFYRLCNNS*CYKN*TRKW*ICDFMQ*KRRGFK*PCSIKTNRR*ILV

......

You can see it's frame -1.

I would really appreciate your help.


Thanks

Shalabh

-- 
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636


From scott at scottcain.net  Mon Jul  2 18:50:45 2012
From: scott at scottcain.net (Scott Cain)
Date: Mon, 2 Jul 2012 14:50:45 -0400
Subject: [Bioperl-l] GMOD Summer School application deadline
Message-ID: <CA+JTaoym218vQ9_VvtwvMW9ZaoHfcu1hxG59L2z=ovE6qWN6Mw@mail.gmail.com>

Hello,

The deadline to apply for the GMOD Summer School is in one week, July
9th.  The application is available as a Google Form:

  https://docs.google.com/spreadsheet/embeddedform?formkey=dG5hNGFiQ3UwYTV2LUZxZW04Qm1yZXc6MQ

In the GMOD Summer School (August 24-29, 2012) we will cover the
installation, configuration and use of a variety of GMOD tools,
including Chado, GBrowse, JBrowse and Tripal.  For more information on
the course, see the course web page at

  http://gmod.org/wiki/2012_GMOD_Summer_School

The course will make heavy use of Amazon Web Services (aka the
Cloud) via a grant from Amazon.  Enrollment is limited to 24 students,
and the application process is competitive: in the last few years we've
received over 75 applications for those 24 spots.

I look forward to seeing you in North Carolina in August!

Scott


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From p.j.a.cock at googlemail.com  Mon Jul  2 19:34:40 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 2 Jul 2012 20:34:40 +0100
Subject: [Bioperl-l] translation frame problem in bioperl
In-Reply-To: <CAA7rn9cnqjsMv-ggCn_H+WavqdtCZZfh37v8pkLak1-ct8UX+A@mail.gmail.com>
References: <CAA7rn9cnqjsMv-ggCn_H+WavqdtCZZfh37v8pkLak1-ct8UX+A@mail.gmail.com>
Message-ID: <CAKVJ-_5nX7f2ZejEpeVnMhd35PYxxrJX2mxu4A712NPOxvie0g@mail.gmail.com>

On Mon, Jul 2, 2012 at 6:09 PM, shalabh sharma
<shalabh.sharma7 at gmail.com> wrote:
> Hi All,
> I am just confused about the translation frames. I used BioPerl to
> parse a blastx report.
> The report shows that the frame used is -2, but when I translate the sequence
> using EMBOSS or some other program the frame is -1.
> Am I doing something wrong here?

Possibly there are conflicting definitions of frames -1, -2, and -3 here
(and that's leaving out the possibility of -0, -1 and -2 counting). Some
will count from the first base (the start of the forward strand), others from
the last base (the start of the reverse strand). This can make comparing the output
of different tools quite confusing.

Peter


From shalabh.sharma7 at gmail.com  Mon Jul  2 20:39:29 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Jul 2012 16:39:29 -0400
Subject: [Bioperl-l] translation frame problem in bioperl
In-Reply-To: <98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net>
References: <CAA7rn9cnqjsMv-ggCn_H+WavqdtCZZfh37v8pkLak1-ct8UX+A@mail.gmail.com>
	<CAKVJ-_5nX7f2ZejEpeVnMhd35PYxxrJX2mxu4A712NPOxvie0g@mail.gmail.com>
	<98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net>
Message-ID: <CAA7rn9eiKh6EszMsV=Vk65rJPywOUJijP0Q=rYg_vt=xoK-0=Q@mail.gmail.com>

Hi Peter and Brian,
Thanks a lot for your replies.
I have already taken this into account.
So if I parse the BLAST report (my previous example),
I get strand '-1' and frame '1' (according to BioPerl).
Converted to the usual convention, that is frame -2, because BioPerl counts
frames from 0.
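A minimal sketch of the conversion described above, assuming BioPerl's SearchIO convention as stated in this thread: strand is +1/-1, frame is 0, 1 or 2, and the BLAST-style signed frame is strand * (frame + 1). The subroutine names are hypothetical, and other tools (EMBOSS included) may number reverse-strand frames differently, so check each tool's own documentation before comparing outputs.

```perl
use strict;
use warnings;

# Convert BioPerl HSP strand (+1/-1) and 0-based frame (0, 1, 2) into the
# signed 1-based frame printed in BLAST reports (+1..+3, -1..-3).
sub bioperl_to_blast_frame {
    my ($strand, $frame) = @_;
    die "frame must be 0, 1 or 2\n"
        unless defined $frame && $frame =~ /^[012]$/;
    die "strand must be 1 or -1\n"
        unless defined $strand && ($strand == 1 || $strand == -1);
    return $strand * ($frame + 1);
}

# Inverse: BLAST-style frame -> (strand, 0-based frame).
sub blast_to_bioperl_frame {
    my ($blast_frame) = @_;
    die "BLAST frame must be +/-1, 2 or 3\n"
        unless defined $blast_frame && $blast_frame =~ /^[+-]?[123]$/;
    my $strand = $blast_frame < 0 ? -1 : 1;
    return ($strand, abs($blast_frame) - 1);
}

# Strand -1, frame 1 (as reported by BioPerl) is BLAST frame -2.
printf "%+d\n", bioperl_to_blast_frame(-1, 1);   # prints -2
```

Keeping strand and frame as separate values, as BioPerl does, means the same 0..2 frame logic serves both strands; the sign only needs to be reattached when comparing with BLAST's own report.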

Also, BioPerl forward-frame translation is working fine.


Thanks
Shalabh


-- 
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636


From bosborne11 at verizon.net  Mon Jul  2 20:24:24 2012
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 02 Jul 2012 16:24:24 -0400
Subject: [Bioperl-l] translation frame problem in bioperl
In-Reply-To: <CAKVJ-_5nX7f2ZejEpeVnMhd35PYxxrJX2mxu4A712NPOxvie0g@mail.gmail.com>
References: <CAA7rn9cnqjsMv-ggCn_H+WavqdtCZZfh37v8pkLak1-ct8UX+A@mail.gmail.com>
	<CAKVJ-_5nX7f2ZejEpeVnMhd35PYxxrJX2mxu4A712NPOxvie0g@mail.gmail.com>
Message-ID: <98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net>

Shalabh,

Also take a look at this:

http://www.bioperl.org/wiki/HOWTO:SearchIO#frame.28.29

Brian O.




From vebaev at gmail.com  Tue Jul  3 16:35:26 2012
From: vebaev at gmail.com (vebaev at gmail.com)
Date: Tue, 3 Jul 2012 09:35:26 -0700 (PDT)
Subject: [Bioperl-l] CFP - International Conference on Bioinformatics and
 Computational Biology - BIOCOMP BG 2012
Message-ID: <7b498b4c-2b2e-4e1f-871f-513203488bf1@googlegroups.com>

International Conference on Bioinformatics and Computational Biology - 
BIOCOMP BG 2012 
September 20-21, 2012, Varna, Bulgaria
 

Dear Colleague,
It is our pleasure to circulate the 2nd announcement of the International 
Conference on Bioinformatics and Computational Biology - BIOCOMP 2012 
(http://biocomp.bio.uni-plovdiv.bg/).
 
Keynote speakers
 
Prof. Dr. Klaas Vandepoele - Ghent University, Belgium
Dr. Andreas Gisel - Institute for Biomedical Technologies, Italy
Prof. Wojciech Karlowski - Institute of Molecular Biology and Biotechnology, 
Poland
Prof. Mario A. Fares - University of Dublin, Trinity College, Ireland
Dr. Andrey Kajava - CRBM - Macromolecular Biochemistry Research Center, 
France
Dr. Gaurav Sablok - Istituto Agrario San Michele (IASMA), Italy
 
Topics
 
Topics of interest include, but are not limited to:
 
High-performance bio-computing
High-throughput sequencing data analysis (NGS)
Bio-ontologies
Molecular evolution
Comparative genomics
Molecular modeling and simulation
Computational genetics 
Computational proteomics
Data mining and visualization
Software tools and applications
Gene expression analysis
Gene networks
Structural biology
Genome analysis
Databases
Systems biology
 
Special topic: bioinformatics and miRNAs
 
Recent achievements in these fields will be presented. The conference will 
include plenary and poster sessions. Participants' proposals will be taken 
into account in compiling the program. 
 
Publications
 
All accepted abstracts will be published in the conference abstract book.
The best 20 abstracts will be peer-reviewed and published as full-text 
manuscripts in special issues of the following Springer and Elsevier journals:
Interdisciplinary Sciences: Computational Life Sciences (ISSN: 1867-1462)
Journal of Computational Science (ISSN: 1877-7503)


Venue
The conference venue is the 4-star, all-inclusive Sunny Day resort on the 
Black Sea, Bulgaria.
 
Registration and abstract submission
 
All actions related to BIOCOMP 2012 (abstract submission, registration, 
etc.) can be completed via the conference website at 
http://biocomp.bio.uni-plovdiv.bg/
 
Accommodation
 
IMPORTANT: Accommodation is included in the conference registration fee.
 
Important dates
 
Abstract Submission Deadline - 20 August 2012
Early Registration Fee Payment Deadline - 20 August 2012
Arrival, Poster Set-up, Registration - 19 September 2012
Plenary and Poster Sessions - 20-21 September 2012
 
 
Further details are available on the conference website at 
http://biocomp.bio.uni-plovdiv.bg/
 
 
Looking forward to seeing you in Bulgaria!


------------------------------------------------


Dr. Vesselin Baev
Research Assistant Professor

University of Plovdiv
Dept. Plant Phys. and Molecular Biology
Bioinformatics SMART Group

Tzar Assen 24,Plovdiv 4000, BULGARIA
Office:+359 32 261 (560); Mobile:+359 89 43 80 945
vebaev at gmail.com; baev at uni-plovdiv.bg; CV: http://plantgene.eu/


From tarakaramji at gmail.com  Tue Jul  3 19:33:43 2012
From: tarakaramji at gmail.com (Tarakaramji Moturu)
Date: Tue, 3 Jul 2012 19:33:43 +0000 (UTC)
Subject: [Bioperl-l] Invitation to connect on LinkedIn
Message-ID: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod>

LinkedIn
------------


I'd like to add you to my professional network on LinkedIn.

- Tarakaramji

Tarakaramji Moturu
Student at GITAM University
Vishakhapatnam Area, India

Confirm that you know Tarakaramji Moturu:
https://www.linkedin.com/e/1505z7-h47dlkop-69/isd/7726719493/9xC087NO/?hs=false&tok=2UuxBwCCkl7Rk1



From l.m.timmermans at students.uu.nl  Wed Jul  4 07:16:34 2012
From: l.m.timmermans at students.uu.nl (Leon Timmermans)
Date: Wed, 4 Jul 2012 10:16:34 +0300
Subject: [Bioperl-l] Invitation to connect on LinkedIn
In-Reply-To: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod>
References: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod>
Message-ID: <CAC1jpXBYn+3+TtN=n7NoRhnQ87-EuYRpsuBqpE5iHHHAntJ_ew@mail.gmail.com>

On Tue, Jul 3, 2012 at 10:33 PM, Tarakaramji Moturu
<tarakaramji at gmail.com> wrote:
> LinkedIn
> ------------
>
>
>
> I'd like to add you to my professional network on LinkedIn.
>
> - Tarakaramji

Sending messages like this directly to mailing lists is a rather bad
idea, if only because LinkedIn will think bioperl-l at bioperl.org is one
of the email addresses of whoever accepts the request (which is
relevant for retrieving a lost password, I think).

Leon


From ulrik.stervbo at gmail.com  Fri Jul  6 07:03:08 2012
From: ulrik.stervbo at gmail.com (Ulrik Stervbo)
Date: Fri, 6 Jul 2012 09:03:08 +0200
Subject: [Bioperl-l] problem in using protparam.pm module
In-Reply-To: <CACyyM1bpg+pOwCK26M2OgNaBD2VJsaXtc02WkGHrKg1QxVL0Tw@mail.gmail.com>
References: <CACyyM1Za0pW9R2GPxAFOATj4tPM24rbwLsUazGUFHNimZ2LO4g@mail.gmail.com>
	<CACyyM1ZUf2y7fO2Vkc5cSoqSgtW86GA+1yt_ctc9Y8P3Qbnt_A@mail.gmail.com>
	<9AA27ADA-FFE1-4735-BDE4-56C9B9A18009@illinois.edu>
	<CACyyM1bpg+pOwCK26M2OgNaBD2VJsaXtc02WkGHrKg1QxVL0Tw@mail.gmail.com>
Message-ID: <CAKVAULOTmTparqLEuREquzcrO6Hf+Z7MSGRAPBWMTqH0dJ7-xg@mail.gmail.com>

I had the same problem, and realized it is because I am behind a proxy.

This is what I changed in the Protparam module:
I changed the URL to 'http://web.expasy.org/cgi-bin/protparam/protparam',
as found previously, and added

$browser->proxy(['http'], 'http://[my proxy]/');

after the LWP agent is initialized.

The proxy settings are what made Perl choke. (If only Perl could be made
to see the global proxy settings.)
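A minimal sketch of the same idea outside the module, using LWP::UserAgent directly. The proxy address is a hypothetical placeholder and the `make_agent` helper is illustrative only; Ulrik's actual change was an edit inside Bio::Tools::Protparam, whose internals may differ between BioPerl releases.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical proxy address; replace with your site's proxy.
my $proxy_url = 'http://proxy.example.org:8080/';

# Build a user agent and route plain-http requests through a proxy.
# With no explicit proxy, fall back to the *_proxy environment variables.
sub make_agent {
    my ($proxy) = @_;
    my $ua = LWP::UserAgent->new(timeout => 30);
    if (defined $proxy) {
        $ua->proxy(['http'], $proxy);   # same call as in the patch above
    }
    else {
        $ua->env_proxy;                 # reads http_proxy / HTTP_PROXY etc.
    }
    return $ua;
}

my $browser = make_agent($proxy_url);
print "http proxy: ", ($browser->proxy('http') // 'none'), "\n";
```

Calling `env_proxy` in the fallback branch is one way to get closer to "global proxy settings" without hard-coding an address in the module.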

Cheers,
Ulrik

2011/7/28 Shachi Gahoi <shachigahoimbi at gmail.com>:
> Please help me run Protparam using the BioPerl module.
>
> On Wed, Jul 27, 2011 at 11:05 AM, Chris Fields <cjfields at illinois.edu> wrote:
>
>> The web service appears to have changed, but it looks as if no tests were
>> written for this module that would have caught this.  We can
>> write some basic tests to check for simple functionality.
>>
>> chris
>>
>> On Jul 26, 2011, at 10:58 PM, Shachi Gahoi wrote:
>>
>> > Dear All,
>> >
>> > I am using the protparam.pm module, but when I run this script it
>> > prints an error message:
>> >
>> > "Can't call method "throw" without a package or object reference at
>> > /usr/share/perl5/Bio/Root/Root.pm line 368, <GEN1> line 1."
>> >
>> > Kindly help me to solve this problem.
>> >
>> >
>> > The script is here:
>> >
>> > ###################################################################################
>> > #!/usr/bin/perl
>> >
>> > use strict;
>> > use warnings;
>> > use Bio::SeqIO;
>> > use Bio::Tools::Protparam;
>> >
>> >
>> > my $seqfile = 'test1.fasta';
>> >
>> > my $seqio = Bio::SeqIO->new(-file => $seqfile, -format => 'Fasta');
>> >
>> >
>> > while (my $seq = $seqio->next_seq())
>> > {
>> >
>> >
>> >        my $pp = Bio::Tools::Protparam->new(-seq=>$seq->seq);
>> >
>> >        print
>> >        "ID : ", $seq->display_id,"\n",
>> >        "Amino acid number : ",$pp->amino_acid_number(),"\n",
>> >        "Number of negative amino acids : ",$pp->num_neg(),"\n",
>> >        "Number of positive amino acids : ",$pp->num_pos(),"\n",
>> >        "Molecular weight : ",$pp->molecular_weight(),"\n",
>> >        "Theoretical pI : ",$pp->theoretical_pI(),"\n",
>> >        "Total number of atoms : ", $pp->total_atoms(),"\n",
>> >        "Number of carbon atoms : ",$pp->num_carbon(),"\n",
>> >        "Number of hydrogen atoms : ",$pp->num_hydrogen(),"\n",
>> >        "Number of nitrogen atoms : ",$pp->num_nitro(),"\n",
>> >        "Number of oxygen atoms : ",$pp->num_oxygen(),"\n",
>> >        "Number of sulphur atoms : ",$pp->num_sulphur(),"\n",
>> >        "Half life : ", $pp->half_life(),"\n",
>> >        "Instability Index : ", $pp->instability_index(),"\n",
>> >        "Stability class : ", $pp->stability(),"\n",
>> >        "Aliphatic_index : ",$pp->aliphatic_index(),"\n",
>> >        "Gravy : ", $pp->gravy(),"\n",
>> >        "Composition of A : ", $pp->AA_comp('A'),"\n",
>> >        "Composition of R : ", $pp->AA_comp('R'),"\n",
>> >        "Composition of N : ", $pp->AA_comp('N'),"\n",
>> >        "Composition of D : ", $pp->AA_comp('D'),"\n",
>> >        "Composition of C : ", $pp->AA_comp('C'),"\n",
>> >        "Composition of Q : ", $pp->AA_comp('Q'),"\n",
>> >        "Composition of E : ", $pp->AA_comp('E'),"\n",
>> >        "Composition of G : ", $pp->AA_comp('G'),"\n",
>> >        "Composition of H : ", $pp->AA_comp('H'),"\n",
>> >        "Composition of I : ", $pp->AA_comp('I'),"\n",
>> >        "Composition of L : ", $pp->AA_comp('L'),"\n",
>> >        "Composition of K : ", $pp->AA_comp('K'),"\n",
>> >        "Composition of M : ", $pp->AA_comp('M'),"\n",
>> >        "Composition of F : ", $pp->AA_comp('F'),"\n",
>> >        "Composition of P : ", $pp->AA_comp('P'),"\n",
>> >        "Composition of S : ", $pp->AA_comp('S'),"\n",
>> >        "Composition of T : ", $pp->AA_comp('T'),"\n",
>> >        "Composition of W : ", $pp->AA_comp('W'),"\n",
>> >        "Composition of Y : ", $pp->AA_comp('Y'),"\n",
>> >        "Composition of V : ", $pp->AA_comp('V'),"\n",
>> >        "Composition of B : ", $pp->AA_comp('B'),"\n",
>> >        "Composition of Z : ", $pp->AA_comp('Z'),"\n",
>> >        "Composition of X : ", $pp->AA_comp('X'),"\n";
>> > }
>> >
>> > ###################################################################################
>> >
>> >
>> >
>> >
>> > --
>> > Regards,
>> > Shachi
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
>
> --
> Regards,
> Shachi


From jason.stajich at gmail.com  Fri Jul  6 17:49:46 2012
From: jason.stajich at gmail.com (Jason Stajich)
Date: Fri, 6 Jul 2012 10:49:46 -0700
Subject: [Bioperl-l] problem in using protparam.pm module
In-Reply-To: <CAKVAULOTmTparqLEuREquzcrO6Hf+Z7MSGRAPBWMTqH0dJ7-xg@mail.gmail.com>
References: <CACyyM1Za0pW9R2GPxAFOATj4tPM24rbwLsUazGUFHNimZ2LO4g@mail.gmail.com>
	<CACyyM1ZUf2y7fO2Vkc5cSoqSgtW86GA+1yt_ctc9Y8P3Qbnt_A@mail.gmail.com>
	<9AA27ADA-FFE1-4735-BDE4-56C9B9A18009@illinois.edu>
	<CACyyM1bpg+pOwCK26M2OgNaBD2VJsaXtc02WkGHrKg1QxVL0Tw@mail.gmail.com>
	<CAKVAULOTmTparqLEuREquzcrO6Hf+Z7MSGRAPBWMTqH0dJ7-xg@mail.gmail.com>
Message-ID: <8C9056B6-1DA4-4BE0-B008-429C2F6C05BE@gmail.com>

You might try the PERL_LWP_ENV_PROXY and HTTP_PROXY environment variables:
http://search.cpan.org/~gaas/libwww-perl-6.04/lib/LWP/UserAgent.pm#CONSTRUCTOR_METHODS
http://search.cpan.org/~gaas/libwww-perl-6.04/lib/LWP/UserAgent.pm#Proxy_attributes

I can't test it on my end, though, without a proxy service.
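A minimal sketch of Jason's suggestion, assuming libwww-perl 6.x: per the linked LWP::UserAgent documentation, the constructor applies `env_proxy` when PERL_LWP_ENV_PROXY is true, so a module that creates its agent with a plain `LWP::UserAgent->new` would pick up the proxy without being patched. That Bio::Tools::Protparam builds its agent this way is an assumption about its internals, and the proxy address is a hypothetical placeholder. In practice you would usually export these variables in the shell rather than set them in the script.

```perl
use strict;
use warnings;

# Set these before any LWP::UserAgent object is constructed.
# The proxy address is hypothetical.
$ENV{PERL_LWP_ENV_PROXY} = 1;
$ENV{HTTP_PROXY}         = 'http://proxy.example.org:8080/';

use LWP::UserAgent;
# use Bio::Tools::Protparam;   # an LWP-based module would now see the proxy

# No explicit proxy call: new() runs env_proxy because PERL_LWP_ENV_PROXY is set.
my $ua = LWP::UserAgent->new;
print "http proxy: ", ($ua->proxy('http') // 'none'), "\n";
```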



Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From bubli_thakur at rediffmail.com  Sun Jul  1 14:59:29 2012
From: bubli_thakur at rediffmail.com (subarna thakur)
Date: Sun, 01 Jul 2012 14:59:29 -0000
Subject: [Bioperl-l] Ks saturation
Message-ID: <20120617031856.16345.qmail@f4mail-235-140.rediffmail.com>


Dear all,

I am trying to calculate dN/dS values for all orthologous gene pairs between
two genomes using the pairwise_kaks.pl script in BioPerl, which invokes the
codeml program in runmode -2. When I analyze the results, some genes have
anomalously high dS (Ks) values, some even exceeding 100, and as a result the
average Ks for the whole genome shoots up. These genes are orthologs and even
share more than 50% sequence identity. Should I include these genes in the
analysis or leave them out? If I leave them out, what Ks cutoff should I use?
Some papers I have found consider Ks values as high as 5.6. Is there a way to
determine the cutoff value for Ks?

Subarna
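The effect described above can be illustrated with a small plain-Perl sketch, assuming the Ks values have already been extracted from the codeml/pairwise_kaks.pl output into a list; the values and the cutoff below are hypothetical. It does not settle which cutoff is appropriate, which is a judgement about saturation that this thread leaves open; it only shows how one saturated estimate inflates the mean, while the median or an explicit cutoff does not.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Summarise a set of pairwise Ks values three ways: the plain mean, the
# median (robust to a handful of saturated values) and the mean after
# dropping values above a cutoff. The cutoff is the analyst's choice.
sub ks_summary {
    my ($cutoff, @ks) = @_;
    die "no Ks values given\n" unless @ks;

    my @sorted = sort { $a <=> $b } @ks;
    my $mid    = int(@sorted / 2);
    my $median = @sorted % 2
        ? $sorted[$mid]
        : ($sorted[$mid - 1] + $sorted[$mid]) / 2;

    my @kept = grep { $_ <= $cutoff } @ks;
    return {
        mean_all   => sum(@ks) / @ks,
        median_all => $median,
        n_dropped  => @ks - @kept,
        mean_kept  => @kept ? sum(@kept) / @kept : undef,
    };
}

# Hypothetical values: four ordinary pairs and one saturated estimate.
# The cutoff of 5 is a placeholder, not a recommendation.
my $s = ks_summary(5, 0.4, 0.6, 0.8, 1.2, 150);
printf "mean %.2f, median %.2f, mean after cutoff %.2f (%d dropped)\n",
    @{$s}{qw(mean_all median_all mean_kept n_dropped)};
```

Reporting the median alongside any cutoff-filtered mean makes it easier to see how much a chosen threshold is driving the genome-wide figure.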


From haywardjeremya at gmail.com  Fri Jul  6 17:56:12 2012
From: haywardjeremya at gmail.com (Jeremy Hayward)
Date: Fri, 6 Jul 2012 14:56:12 -0300
Subject: [Bioperl-l] Two 'host' tags?
Message-ID: <CAJv8UuyLgFfHe-Vfyv1p+90ypO+NFQ5E_br1YYyL=PKg++yyNA@mail.gmail.com>

Hi--  Clueless newbie here, for which apologies.

I've posted a description of my problem, inputs and outputs, at Gist
2816510; https://gist.github.com/2816510

Briefly, I'm trying to take a genbank file (.gb), and create a FASTA
file with a specific identifier line for each sequence. Specifically,
I want the "host" tag as the identifier. With the help of the Bioperl
beginner readme and the HOWTOs (which are great!) I've worked out how
to loop through my sequences and get the 'host' tag for each one. For
some reason, I get two identifier lines for each sequence. I guess the
problem is in the 'for' loop--it's running the stuff below it twice,
once with the actual 'host' tag data and once with...nothing? Not
sure.

I think I can work out how to use s/ and a regex just to delete the
second identifier line, but that feels like I'm avoiding the problem
instead of fixing it. Any help appreciated!


Many thanks,

--Jeremy Hayward


From jason.stajich at gmail.com  Fri Jul  6 19:39:52 2012
From: jason.stajich at gmail.com (Jason Stajich)
Date: Fri, 6 Jul 2012 12:39:52 -0700
Subject: [Bioperl-l] Two 'host' tags?
In-Reply-To: <CAJv8UuyLgFfHe-Vfyv1p+90ypO+NFQ5E_br1YYyL=PKg++yyNA@mail.gmail.com>
References: <CAJv8UuyLgFfHe-Vfyv1p+90ypO+NFQ5E_br1YYyL=PKg++yyNA@mail.gmail.com>
Message-ID: <A572776F-CF80-4EB2-A853-B62EE5AF5266@gmail.com>

Hi Jeremy -

You are printing for every feature in the loop (e.g. both the source and the misc_RNA features). You only want to loop through the features, pick out the one that is the source, and change or print the info when you see it. So you could add an if( $feature->primary_tag eq 'source') test in there. Alternatively, I've left your code mostly intact and just simplified it a bit.
You should also try to use Bio::SeqIO to write the output instead of printing it yourself.

I updated the code here to be simpler. Right now it warns you that you are printing IDs with spaces (which is something you should think about when it comes to your output file, but I don't know your downstream plans). You could also put other info in the description field if you wanted to capture the accession number or the endophyte name too.

https://gist.github.com/3062285
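A minimal sketch of what Jason and Brian describe, assuming BioPerl 1.6-era methods (get_SeqFeatures, primary_tag, has_tag, get_tag_values, display_id, write_seq). It looks only at the 'source' feature, so each record yields exactly one FASTA header. This is not the code in Jason's gist; the subroutine names and command-line usage are hypothetical, and replacing spaces with underscores is just one possible answer to the "IDs with spaces" warning.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Return the /host value of the 'source' feature, or undef if absent.
sub host_of {
    my ($seq) = @_;
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->primary_tag eq 'source';  # skip misc_RNA etc.
        next unless $feat->has_tag('host');
        my ($host) = $feat->get_tag_values('host');
        return $host;
    }
    return;
}

# Read GenBank records from $in_fh and write one FASTA record per input
# sequence that has a host, using the host as the identifier.
sub genbank_to_host_fasta {
    my ($in_fh, $out_fh) = @_;
    my $in  = Bio::SeqIO->new(-fh => $in_fh,  -format => 'genbank');
    my $out = Bio::SeqIO->new(-fh => $out_fh, -format => 'fasta');
    my $written = 0;
    while (my $seq = $in->next_seq) {
        my $host = host_of($seq);
        next unless defined $host;
        (my $id = $host) =~ s/\s+/_/g;   # FASTA IDs end at the first space
        $seq->display_id($id);
        $out->write_seq($seq);
        $written++;
    }
    return $written;
}

# Usage (hypothetical file names): perl host2fasta.pl input.gb output.fasta
if (@ARGV == 2) {
    open my $in,  '<', $ARGV[0] or die "Cannot read $ARGV[0]: $!\n";
    open my $out, '>', $ARGV[1] or die "Cannot write $ARGV[1]: $!\n";
    my $n = genbank_to_host_fasta($in, $out);
    print STDERR "Wrote $n sequences\n";
}
```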

Best,
Jason

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From bosborne11 at verizon.net  Fri Jul  6 19:51:11 2012
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 06 Jul 2012 15:51:11 -0400
Subject: [Bioperl-l] Two 'host' tags?
In-Reply-To: <CAJv8UuyLgFfHe-Vfyv1p+90ypO+NFQ5E_br1YYyL=PKg++yyNA@mail.gmail.com>
References: <CAJv8UuyLgFfHe-Vfyv1p+90ypO+NFQ5E_br1YYyL=PKg++yyNA@mail.gmail.com>
Message-ID: <456448FF-C413-42D1-833A-FAA74E4FEF9E@verizon.net>

Jeremy,

It looks like each of your sequences has 2 features, but you only care about the 'source' feature (if ($feat_object->primary_tag eq "source") ?).

Also, rather than printing out the sequence the way you're doing, try to build a sequence object for each input sequence and then write its contents to your FASTA file using write_seq(). You will set the ID for your sequence object using display_name().

Brian O.




From dejian.zhao at gmail.com  Wed Jul 11 17:31:37 2012
From: dejian.zhao at gmail.com (De-Jian Zhao)
Date: Thu, 12 Jul 2012 01:31:37 +0800
Subject: [Bioperl-l] Errors with Bio::Seq objects
Message-ID: <4FFDB879.1020906@gmail.com>

Hi,

I downloaded a nucleotide sequence from GenBank (file name: nt.gb) and 
tested the Bio::SeqIO module as follows. The first 3 commands succeeded; 
however, the last 2 failed.

I think $seqio->next_seq() produces a Bio::Seq object which contains the 
sequence, features and annotation (according to the DESCRIPTION of 
"perldoc Bio::Seq"), and thus the invocation of the methods 
get_SeqFeatures() and translate() should be valid. However, the results 
suggest otherwise.

Could anyone explain what happened with the last 2 commands? I have 
run into numerous failures when testing BioPerl methods. 
I want to translate the mRNA sequence and extract the sequence features. 
What are the right commands? Thanks a lot!

Best,
Dejian


PS: The commands and results

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
print $seqio->next_seq()->display_id(); ' nt.gb
NM_053056

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
print $seqio->next_seq()->molecule(); ' nt.gb
mRNA

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
print $seqio->next_seq()->subseq(1,6); ' nt.gb
CACACG

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb
Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18)

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
print $seqio->next_seq()->translate(); ' nt.gb
Bio::Seq::RichSeq=HASH(0x20a3e7b0)


From jimhu at tamu.edu  Wed Jul 11 18:01:27 2012
From: jimhu at tamu.edu (Jim Hu)
Date: Wed, 11 Jul 2012 13:01:27 -0500
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <4FFDB879.1020906@gmail.com>
References: <4FFDB879.1020906@gmail.com>
Message-ID: <9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu>

Hi Dejian,

On Jul 11, 2012, at 12:31 PM, De-Jian Zhao wrote:

> Hi,
> 
> I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and tested the Bio::SeqIO module as follows. The first 3 commands succeeded; however the last 2 failed.
> 
> I think $seqio->next_seq() produces a Bio::Seq object which contains the sequence, features and annotation (according to the DESCRIPTION of "perldoc Bio::Seq") and thus the invocation of the methods get_SeqFeatures() and translate() should be valid. However, the results denied this idea.

That's correct about Bio::Seq objects being returned. Actually, it is probably a kind of Bio::Seq object. For example, SeqIO may return a Bio::Seq::RichSeq object that inherits methods from Bio::Seq objects.  However, as explained below, the methods are working as they should... they are just returning objects when you are expecting something else.

> 
> Will anyone explain what happened to the last 2 commands? I have encountered numerous cases of failures when testing the bioperl methods. I want to translate the mRNA sequence and extract the sequence features. What are the right commands? Thanks a lot!
> 
> Best,
> Dejian
> 
> 
> 
> PS: The commands and results
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb
> NM_053056
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb
> mRNA
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb
> CACACG
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb
> Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18)

http://doc.bioperl.org/bioperl-live/Bio/Seq.html#POD24

$seq_obj->get_SeqFeatures() returns a list of SeqFeature objects, which are references; printing a reference only shows its class and memory address.  So this worked as expected.

I usually write this as script files, so I've never done it all with perl -e.  But you need to iterate over the array and query the objects for the information you want about the features.
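As a script, that iteration might look like the sketch below. It assumes BioPerl is installed and the input is a GenBank file such as nt.gb; the helper name feature_summaries and the output layout are illustrative choices, not part of BioPerl. Each element returned by get_SeqFeatures() is a feature object, so the sketch calls primary_tag(), start() and end() on it instead of printing the reference itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: one "primary_tag start..end" line per feature.
# get_SeqFeatures() returns a list of Bio::SeqFeatureI objects, so we
# call accessor methods on each object instead of printing the reference.
sub feature_summaries {
    my ($seq) = @_;
    my @lines;
    for my $feat ($seq->get_SeqFeatures) {
        push @lines, sprintf('%s %d..%d',
            $feat->primary_tag, $feat->start, $feat->end);
    }
    return @lines;
}

# Usage: perl features.pl nt.gb
if (@ARGV) {
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $in->next_seq) {
        print "$_\n" for feature_summaries($seq);
    }
}
```

Each feature (source, gene, CDS, and so on) then prints on its own line. Qualifiers such as /gene or /product can be read from a feature with get_tag_values().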


> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb
> Bio::Seq::RichSeq=HASH(0x20a3e7b0)

->translate returns a new Seq object.  I think 

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate()->seq(); ' nt.gb

should work (haven't tried it).
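One caveat: translate() on the whole record starts at the first base in frame 0, so for an mRNA with a 5' UTR the result is usually not the encoded protein. A minimal sketch that translates each CDS feature instead, assuming BioPerl is installed and the record carries a CDS feature (GenBank mRNA records usually do); cds_translations is a hypothetical helper name. spliced_seq() returns the feature's nucleotide sequence (joining exons for split locations), translate() returns a new sequence object, and seq() turns that into a plain string.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: translate each CDS feature rather than the whole
# record. spliced_seq() gives the feature's nucleotide sequence as a
# sequence object; translate() returns another sequence object, and
# seq() extracts its residues as a plain string.
sub cds_translations {
    my ($seq) = @_;
    return map  { $_->spliced_seq->translate->seq }
           grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;
}

# Usage: perl cds.pl nt.gb
if (@ARGV) {
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $in->next_seq) {
        print $seq->display_id, "\t$_\n" for cds_translations($seq);
    }
}
```

The stop codon shows up as a trailing '*'. GenBank CDS features normally also carry the protein in a /translation qualifier, readable with $feat->get_tag_values('translation'), which is a handy cross-check.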

Jim

> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

=====================================
Jim Hu
Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054


From bosborne11 at verizon.net  Wed Jul 11 17:47:25 2012
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 11 Jul 2012 13:47:25 -0400
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <4FFDB879.1020906@gmail.com>
References: <4FFDB879.1020906@gmail.com>
Message-ID: <DBC94184-BB61-47EB-BC2A-D306E94BC25A@verizon.net>

Dejian,

These are not "failures". The get_SeqFeatures() method returns a list of Bio::SeqFeature objects, and translate() returns a new Bio::Seq object.

Start here:

www.bioperl.org/wiki/HOWTO:Beginners

Brian O.


On Jul 11, 2012, at 1:31 PM, De-Jian Zhao wrote:

> Hi,
> 
> I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and tested the Bio::SeqIO module as follows. The first 3 commands succeeded; however the last 2 failed.
> 
> I think $seqio->next_seq() produces a Bio::Seq object which contains the sequence, features and annotation (according to the DESCRIPTION of "perldoc Bio::Seq") and thus the invocation of the methods get_SeqFeatures() and translate() should be valid. However, the results denied this idea.
> 
> Will anyone explain what happened to the last 2 commands? I have encountered numerous cases of failures when testing the bioperl methods. I want to translate the mRNA sequence and extract the sequence features. What are the right commands? Thanks a lot!
> 
> Best,
> Dejian
> 
> 
> 
> PS: The commands and results
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb
> NM_053056
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb
> mRNA
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb
> CACACG
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb
> Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18)
> 
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb
> Bio::Seq::RichSeq=HASH(0x20a3e7b0)
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Wed Jul 11 19:02:46 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 11 Jul 2012 19:02:46 +0000
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
Message-ID: <FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>

Peng, 

Has this been filed as a bug yet?  

    https://redmine.open-bio.org/projects/bioperl

Seems like it would be fairly easy to fix, but I want to track it just in case.

chris

On Jul 11, 2012, at 12:45 PM, Peng Zhou wrote:

> Hello guys,
> 
> Just a follow-up: it seems to me the bioperl-live version still has the same problem, labeling the hit as the "query" and the query sequence as the "hit". I also looked into the test script written for hmmer3 (bioperl-live/t/SearchIO/hmmer.t), and it doesn't deal with the alignment part; I guess that's why this bug was not discovered.
> 
> To keep it simple, here is the output of hmmsearch v3.0:
> # hmmsearch :: search profile(s) against a sequence database
> # HMMER 3.0 (March 2010); http://hmmer.org/
> # Copyright (C) 2010 Howard Hughes Medical Institute.
> # Freely distributed under the GNU General Public License (GPLv3).
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> # query HMM file:                  /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm
> # target sequence database:        /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa
> # output directed to file:         /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt
> # number of worker threads:        4
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> 
> Query:       CRP0000  [M=75]
> Scores for complete sequences (score includes all domains):
>    --- full sequence ---   --- best 1 domain ---    -#dom-
>     E-value  score  bias    E-value  score  bias    exp  N  Sequence                 Description
>     ------- ------ -----    ------- ------ -----   ---- --  --------                 -----------
>     5.5e-25   95.0  14.4    5.7e-25   95.0  10.0    1.0  1  Chr2_540228_540404_+     
> 
> Domain annotation for each sequence (and alignments):
> >> Chr2_540228_540404_+  
>    #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
>  ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
>    1 !   95.0  10.0   3.6e-30   5.7e-25      20      74 ..       4      59 .]       1      59 [] 0.95
> 
>   Alignments for each domain:
>   == domain 1    score: 95.0 bits;  conditional E-value: 3.6e-30
>                CRP0000 20 tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74
>                           ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c
>   Chr2_540228_540404_+  4 GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59
>                           568899***99********************************************* PP
> 
> And here is a dump of the parsed HSP object:
> $VAR1 = bless( {
>                  'VERBOSE' => 0,
>                  'IDENTICAL' => 0,
>                  'RANK' => 1,
>                  'STRANDED' => 'NONE',
>                  'EVALUE' => '3.6e-30',
>                  'HSP_LENGTH' => 56,
>                  'ALGORITHM' => 'HMMSEARCH',
>                  'SCORE' => '95.0',
>                  'GAP_SYMBOL' => '-',
>                  'CONSERVED' => 0,
>                  
>                  'HIT_NAME' => 'Chr2_540228_540404_+',
>                  'HIT_DESC' => '',
>                  'HIT_START' => '20',
>                  'HIT_END' => '74',
>                  'HIT_LENGTH' => 56,
>                  'HIT_SEQ' => 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc',
>                  'HIT_FRAME' => 0,
>                  
>                  'QUERY_NAME' => 'CRP0000',
>                  'QUERY_DESC' => undef,
>                  'QUERY_START' => '4',
>                  'QUERY_END' => '59',
>                  'QUERY_LENGTH' => '75',
>                  'QUERY_FRAME' => 0,
>                  'QUERY_SEQ' => 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC',
>                  
>                  'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c',
>                }, 'Bio::Search::HSP::HMMERHSP' );
> 
> Clearly, the "HIT_START", "HIT_END", "HIT_SEQ" should actually be exchanged with "QUERY_START", "QUERY_END" and "QUERY_SEQ" values.
> 
> Thanks,
> 
> Peng,
> 
> On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote:
> I'll try the bioperl-live version. Thanks guys.
> Scott Givan
> 541-740-4685
> Sent from an iPhone (so expect typos).
> 
> On Jul 19, 2011, at 10:34 PM, "Chris Fields" <cjfields at illinois.edu> wrote:
> 
> > This might be a disconnect between the HMMER3 version in bioperl-live and the one in Kai's bioperl-hmmer3 repo.  I believe the one in bioperl-live is newer.  Scott, can you give that a try?
> > 
> > chris
> > 
> > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote:
> > 
> >> Hi Scott,
> >> 
> >> Thanks for writing. I'm on the road at the moment so I have to be briefer and less thorough than I'd like to be.
> >> 
> >> What you are observing is not the intended behavior. Oddly, it's not what I recall obtaining in my tests on this software, though I was mostly interested in hmmsearch at the time and may have been sloppier than I should have been when it came to hmmscan.
> >> 
> >> What version of HMMER3 are you using? There have been some small formatting changes in the past that might be causing a burp in the parser, though I'm doubting it.
> >> 
> >> Kai Blin wrote some test scripts (found here: bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate query/hit coordinates. It might be worth giving this a shot if you haven't already.
> >> 
> >> Also, if you don't mind, I'm happy to run your code on your output file on my end. It might help me diagnose the problem.
> >> 
> >> Sorry this is being a thorn in your side! I've cc'ed the list in case anyone else has insight into this matter.
> >> 
> >> Best,
> >> Thomas
> >> 
> >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote:
> >> 
> >>> Hi Thomas,
> >>> 
> >>> I'm using modules in the bioperl-hmmer3 git repository to parse hmmscan
> >>> reports. When I parse the files and walk through the HSPs like:
> >>> 
> >>> while (my $hit = $rslt->next_model) {
> >>> 
> >>>    while (my $domain = $hit->next_hsp) {
> >>> 
> >>> And retrieve the "hit" coordinates like:
> >>> 
> >>>        print "hit coords: ", $domain->start('hit'), "-", $domain->end('hit'),
> >>> "\n";
> >>> 
> >>> The coordinates returned correspond to what I would call the "query",
> >>> since they are for the sequence I fed to hmmscan to search the profile
> >>> database. Likewise, when retrieving the query coordinates like
> >>> $domain->start('query'), I get what I consider the "hit" coordinates,
> >>> since they are for the domain profile. Is this the intended behavior?
> >>> 
> >>> Thanks.
> >>> 
> >>> scott
> >>> 
> >>> -- 
> >>> Scott A. Givan
> >>> Associate Director
> >>> Informatics Research Core Facility
> >>> 240e Bond Life Sciences Center
> >>> Research Assistant Professor
> >>> Molecular Microbiology and Immunology
> >>> University of Missouri, Columbia
> >>> 
> >>> TEL 573-882-2948
> >>> FAX 573-884-9676
> >>> http://ircf.rnet.missouri.edu
> >>> 
> >>> 
> >>> 
> >> 
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> 
> 


From p.j.a.cock at googlemail.com  Wed Jul 11 21:00:56 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 11 Jul 2012 22:00:56 +0100
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
	<FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
Message-ID: <CAKVJ-_4mo16NxEbZkNp3-EG9uknc4Fo4m6ajQWTknumQNUn4Yw@mail.gmail.com>

On Wed, Jul 11, 2012 at 8:02 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> Peng,
>
> Has this been filed as a bug yet?
>
>     https://redmine.open-bio.org/projects/bioperl
>
> Seems like it would be fairly easy to fix, but I want to track it just in case.
>
> chris

Hi all,

This could be due to the unfortunate fact that hmmscan
and hmmsearch return very similar tabular output, but
with query and hit interchanged, i.e. you need some
extra information to know which way round they are
(not possible with the current output). This was an
issue in Bow's Biopython SearchIO project, which
for the moment he solved by handling this as two
hmmer file formats. In the medium term we're hoping
hmmer3 will add some header information or something.

Peter


From zhoupenggeni at gmail.com  Wed Jul 11 17:45:00 2012
From: zhoupenggeni at gmail.com (Peng Zhou)
Date: Wed, 11 Jul 2012 10:45:00 -0700 (PDT)
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
Message-ID: <a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>

Hello guys,

Just a follow-up: it seems to me the bioperl-live version still has the
same problem, labeling the hit as the "query" and the query sequence as
the "hit". I also looked into the test script written for hmmer3
(bioperl-live/t/SearchIO/hmmer.t), and it doesn't deal with the alignment
part; I guess that's why this bug was not discovered.

To keep it simple, here is the output of hmmsearch v3.0:
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.0 (March 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm
# target sequence database:        /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa
# output directed to file:         /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt
# number of worker threads:        4
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       CRP0000  [M=75]
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                 Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                 -----------
    5.5e-25   95.0  14.4    5.7e-25   95.0  10.0    1.0  1  Chr2_540228_540404_+

Domain annotation for each sequence (and alignments):
>> Chr2_540228_540404_+  
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !   95.0  10.0   3.6e-30   5.7e-25      20      74 ..       4      59 .]       1      59 [] 0.95

  Alignments for each domain:
  == domain 1    score: 95.0 bits;  conditional E-value: 3.6e-30
               CRP0000 20 tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74
                          ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c
  Chr2_540228_540404_+  4 GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59
                          568899***99********************************************* PP

And here is a dump of the parsed HSP object:
$VAR1 = bless( {
                 'VERBOSE' => 0,
                 'IDENTICAL' => 0,
                 'RANK' => 1,
                 'STRANDED' => 'NONE',
                 'EVALUE' => '3.6e-30',
                 'HSP_LENGTH' => 56,
                 'ALGORITHM' => 'HMMSEARCH',
                 'SCORE' => '95.0',
                 'GAP_SYMBOL' => '-',
                 'CONSERVED' => 0,
                 
                 'HIT_NAME' => 'Chr2_540228_540404_+',
                 'HIT_DESC' => '',
                 'HIT_START' => '20',
                 'HIT_END' => '74',
                 'HIT_LENGTH' => 56,
                 'HIT_SEQ' => 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc',
                 'HIT_FRAME' => 0,
                 
                 'QUERY_NAME' => 'CRP0000',
                 'QUERY_DESC' => undef,
                 'QUERY_START' => '4',
                 'QUERY_END' => '59',
                 'QUERY_LENGTH' => '75',
                 'QUERY_FRAME' => 0,
                 'QUERY_SEQ' => 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC',
                 
                 'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c',
               }, 'Bio::Search::HSP::HMMERHSP' );

Clearly, the "HIT_START", "HIT_END", "HIT_SEQ" should actually be exchanged 
with "QUERY_START", "QUERY_END" and "QUERY_SEQ" values.

Thanks,

Peng,

On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote:
>
> I'll try the bioperl-live version. Thanks guys. 
>
> Scott Givan
> 541-740-4685
> Sent from an iPhone (so expect typos). 
>
> On Jul 19, 2011, at 10:34 PM, "Chris Fields" <cjfields at illinois.edu> wrote:
>
> > This might be a disconnect between the HMMER3 version in bioperl-live and the one in Kai's bioperl-hmmer3 repo.  I believe the one in bioperl-live is newer.  Scott, can you give that a try?
> > 
> > chris
> > 
> > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote:
> > 
> >> Hi Scott,
> >> 
> >> Thanks for writing. I'm on the road at the moment so I have to be briefer and less thorough than I'd like to be.
> >> 
> >> What you are observing is not the intended behavior. Oddly, it's not what I recall obtaining in my tests on this software, though I was mostly interested in hmmsearch at the time and may have been sloppier than I should have been when it came to hmmscan.
> >> 
> >> What version of HMMER3 are you using? There have been some small formatting changes in the past that might be causing a burp in the parser, though I'm doubting it.
> >> 
> >> Kai Blin wrote some test scripts (found here: bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate query/hit coordinates. It might be worth giving this a shot if you haven't already.
> >> 
> >> Also, if you don't mind, I'm happy to run your code on your output file on my end. It might help me diagnose the problem.
> >> 
> >> Sorry this is being a thorn in your side! I've cc'ed the list in case anyone else has insight into this matter.
> >> 
> >> Best,
> >> Thomas
> >> 
> >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote:
> >> 
> >>> Hi Thomas,
> >>> 
> >>> I'm using modules in the bioperl-hmmer3 git repository to parse hmmscan
> >>> reports. When I parse the files and walk through the HSPs like:
> >>> 
> >>> while (my $hit = $rslt->next_model) {
> >>> 
> >>>    while (my $domain = $hit->next_hsp) {
> >>> 
> >>> And retrieve the "hit" coordinates like:
> >>> 
> >>>        print "hit coords: ", $domain->start('hit'), "-", $domain->end('hit'),
> >>> "\n";
> >>> 
> >>> The coordinates returned correspond to what I would call the "query",
> >>> since they are for the sequence I fed to hmmscan to search the profile
> >>> database. Likewise, when retrieving the query coordinates like
> >>> $domain->start('query'), I get what I consider the "hit" coordinates,
> >>> since they are for the domain profile. Is this the intended behavior?
> >>> 
> >>> Thanks.
> >>> 
> >>> scott
> >>> 
> >>> -- 
> >>> Scott A. Givan
> >>> Associate Director
> >>> Informatics Research Core Facility
> >>> 240e Bond Life Sciences Center
> >>> Research Assistant Professor
> >>> Molecular Microbiology and Immunology
> >>> University of Missouri, Columbia
> >>> 
> >>> TEL 573-882-2948
> >>> FAX 573-884-9676
> >>> http://ircf.rnet.missouri.edu
> >>> 
> >>> 
> >>> 
> >> 
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > 
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From zhoupenggeni at gmail.com  Wed Jul 11 18:03:17 2012
From: zhoupenggeni at gmail.com (Peng Zhou)
Date: Wed, 11 Jul 2012 11:03:17 -0700 (PDT)
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <4FFDB879.1020906@gmail.com>
References: <4FFDB879.1020906@gmail.com>
Message-ID: <25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com>

Hi, 

I guess that's what the commands are supposed to do: the get_SeqFeatures()
method returns a list of Bio::SeqFeature objects, and the translate()
method returns a Bio::Seq object. You can't usefully "print" an object in
Perl, but you can "dump" it:

$ perl -e ' use Bio::SeqIO; use Data::Dumper; my $seqio=Bio::SeqIO->new(-file=>shift); print Dumper($seqio->next_seq()->get_SeqFeatures()); ' nt.gb

$ perl -e ' use Bio::SeqIO; use Data::Dumper; my $seqio=Bio::SeqIO->new(-file=>shift); print Dumper($seqio->next_seq()->translate()); ' nt.gb

On Wednesday, July 11, 2012 12:31:37 PM UTC-5, De-Jian Zhao wrote:
>
> Hi, 
>
> I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and 
> tested the Bio::SeqIO module as follows. The first 3 commands succeeded; 
> however the last 2 failed. 
>
> I think $seqio->next_seq() produces a Bio::Seq object which contains the 
> sequence, features and annotation (according to the DESCRIPTION of 
> "perldoc Bio::Seq") and thus the invocation of the methods 
> get_SeqFeatures() and translate() should be valid. However, the results 
> denied this idea. 
>
> Will anyone explain what happened to the last 2 commands? I have 
> encountered numerous cases of failures when testing the bioperl methods. 
> I want to translate the mRNA sequence and extract the sequence features. 
> What are the right commands? Thanks a lot! 
>
> Best, 
> Dejian 
>
>
>
> PS: The commands and results 
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb
> NM_053056 
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
> print $seqio->next_seq()->molecule(); ' nt.gb 
> mRNA 
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
> print $seqio->next_seq()->subseq(1,6); ' nt.gb 
> CACACG 
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
> print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb 
> Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) 
>
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); 
> print $seqio->next_seq()->translate(); ' nt.gb 
> Bio::Seq::RichSeq=HASH(0x20a3e7b0) 


From zhoupenggeni at gmail.com  Wed Jul 11 20:05:56 2012
From: zhoupenggeni at gmail.com (Peng Zhou)
Date: Wed, 11 Jul 2012 13:05:56 -0700 (PDT)
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
	<FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
Message-ID: <ee477075-8d1a-4172-a4f2-7c51f45d9224@googlegroups.com>

Thanks Chris, here is the link to the bug I filed: 
https://redmine.open-bio.org/issues/3369

On Wednesday, July 11, 2012 2:02:46 PM UTC-5, Christopher Fields wrote:
>
> Peng, 
>
> Has this been filed as a bug yet?   
>
>     https://redmine.open-bio.org/projects/bioperl 
>
> Seems like it would be fairly easy to fix, but I want to track it just in 
> case. 
>
> chris 
>
> On Jul 11, 2012, at 12:45 PM, Peng Zhou wrote: 
>
> > Hello guys, 
> > 
> > Just a follow-up: it seems to me that the bioperl-live version still 
> > has the same problem, calling the hit "query" and the query "hit". I 
> > also looked at the test script written for hmmer3 
> > (bioperl-live/t/SearchIO/hmmer.t), and it doesn't cover the alignment 
> > part, which I guess is why this bug was not discovered. 
> > 
> > To keep it simple, here's some output from hmmsearch v3.0: 
> > # hmmsearch :: search profile(s) against a sequence database
> > # HMMER 3.0 (March 2010); http://hmmer.org/
> > # Copyright (C) 2010 Howard Hughes Medical Institute.
> > # Freely distributed under the GNU General Public License (GPLv3).
> > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> > # query HMM file:                  /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm
> > # target sequence database:        /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa
> > # output directed to file:         /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt
> > # number of worker threads:        4
> > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> >
> > Query:       CRP0000  [M=75]
> > Scores for complete sequences (score includes all domains):
> >    --- full sequence ---   --- best 1 domain ---    -#dom-
> >     E-value  score  bias    E-value  score  bias    exp  N  Sequence              Description
> >     ------- ------ -----    ------- ------ -----   ---- --  --------              -----------
> >     5.5e-25   95.0  14.4    5.7e-25   95.0  10.0    1.0  1  Chr2_540228_540404_+
> >
> > Domain annotation for each sequence (and alignments):
> > >> Chr2_540228_540404_+
> >    #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
> >  ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
> >    1 !   95.0  10.0   3.6e-30   5.7e-25      20      74 ..       4      59 .]       1      59 [] 0.95
> >
> >   Alignments for each domain:
> >   == domain 1    score: 95.0 bits;  conditional E-value: 3.6e-30
> >                CRP0000 20 tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74
> >                           ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c
> >   Chr2_540228_540404_+  4 GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59
> >                           568899***99********************************************* PP
> > 
> > And here is a dump of the parsed HSP object:
> > $VAR1 = bless( {
> >                  'VERBOSE' => 0,
> >                  'IDENTICAL' => 0,
> >                  'RANK' => 1,
> >                  'STRANDED' => 'NONE',
> >                  'EVALUE' => '3.6e-30',
> >                  'HSP_LENGTH' => 56,
> >                  'ALGORITHM' => 'HMMSEARCH',
> >                  'SCORE' => '95.0',
> >                  'GAP_SYMBOL' => '-',
> >                  'CONSERVED' => 0,
> >
> >                  'HIT_NAME' => 'Chr2_540228_540404_+',
> >                  'HIT_DESC' => '',
> >                  'HIT_START' => '20',
> >                  'HIT_END' => '74',
> >                  'HIT_LENGTH' => 56,
> >                  'HIT_SEQ' => 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc',
> >                  'HIT_FRAME' => 0,
> >
> >                  'QUERY_NAME' => 'CRP0000',
> >                  'QUERY_DESC' => undef,
> >                  'QUERY_START' => '4',
> >                  'QUERY_END' => '59',
> >                  'QUERY_LENGTH' => '75',
> >                  'QUERY_FRAME' => 0,
> >                  'QUERY_SEQ' => 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC',
> >
> >                  'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c',
> >                }, 'Bio::Search::HSP::HMMERHSP' );
> >
> > Clearly, the "HIT_START", "HIT_END" and "HIT_SEQ" values should be
> > swapped with the "QUERY_START", "QUERY_END" and "QUERY_SEQ" values.
> > 
> > Thanks, 
> > 
> > Peng 
> > 
> > On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote: 
> > I'll try the bioperl-live version. Thanks guys. 
> > Scott Givan 
> > 541-740-4685 
> > Sent from an iPhone (so expect typos). 
> > 
> > On Jul 19, 2011, at 10:34 PM, "Chris Fields" <cjfields at illinois.edu> wrote:
> >
> > > This might be a disconnect between the HMMER3 version in bioperl-live
> > > and the one in Kai's bioperl-hmmer3 repo.  I believe the one in
> > > bioperl-live is newer.  Scott, can you give that a try?
> > > 
> > > chris 
> > > 
> > > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote: 
> > > 
> > >> Hi Scott, 
> > >> 
> > >> Thanks for writing. I'm on the road at the moment, so I have to be
> > >> briefer and less thorough than I'd like to be.
> > >>
> > >> What you are observing is not the intended behavior. Oddly, it's not
> > >> what I recall obtaining in my tests on this software, though I was mostly
> > >> interested in hmmsearch at the time and may have been sloppier than I
> > >> should have been when it came to hmmscan.
> > >>
> > >> What version of HMMER3 are you using? There have been some small
> > >> formatting changes in the past that might be causing a burp in the parser,
> > >> though I doubt it.
> > >>
> > >> Kai Blin wrote some test scripts (found here:
> > >> bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate
> > >> query/hit coordinates. It might be worth giving this a shot if you haven't
> > >> already.
> > >>
> > >> Also, if you don't mind, I'm happy to run your code on your output
> > >> file on my end. It might help me diagnose the problem.
> > >>
> > >> Sorry this is being a thorn in your side! I've cc'ed the list in case
> > >> anyone else has insight into this matter.
> > >> 
> > >> Best, 
> > >> Thomas 
> > >> 
> > >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote: 
> > >> 
> > >>> Hi Thomas, 
> > >>> 
> > >>> I'm using modules in the bioperl-hmmer3 git repository to parse hmmscan
> > >>> reports. When I parse the files and walk through the HSPs like:
> > >>>
> > >>> while (my $hit = $rslt->next_model) {
> > >>>
> > >>>    while (my $domain = $hit->next_hsp) {
> > >>>
> > >>> And retrieve the "hit" coordinates like:
> > >>>
> > >>>        print "hit coords: ", $domain->start('hit'), "-", $domain->end('hit'), "\n";
> > >>>
> > >>> The coordinates returned correspond to what I would call the "query",
> > >>> since they are for the sequence I fed to hmmscan to search the profile
> > >>> database. Likewise, when retrieving the query coordinates like
> > >>> $domain->start('query'), I get what I consider the "hit" coordinates,
> > >>> since they are for the domain profile. Is this the intended behavior?
> > >>> 
> > >>> Thanks. 
> > >>> 
> > >>> scott 
> > >>> 
> > >>> -- 
> > >>> Scott A. Givan 
> > >>> Associate Director 
> > >>> Informatics Research Core Facility 
> > >>> 240e Bond Life Sciences Center 
> > >>> Research Assistant Professor 
> > >>> Molecular Microbiology and Immunology 
> > >>> University of Missouri, Columbia 
> > >>> 
> > >>> TEL 573-882-2948 
> > >>> FAX 573-884-9676 
> > >>> http://ircf.rnet.missouri.edu 


From w.arindrarto at gmail.com  Wed Jul 11 21:25:44 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 11 Jul 2012 23:25:44 +0200
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <CAKVJ-_4mo16NxEbZkNp3-EG9uknc4Fo4m6ajQWTknumQNUn4Yw@mail.gmail.com>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
	<FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
	<CAKVJ-_4mo16NxEbZkNp3-EG9uknc4Fo4m6ajQWTknumQNUn4Yw@mail.gmail.com>
Message-ID: <CADEGkF5mE=1Kgy5X4BX9VH0TuChU4OyKuf3e-rzH_SeB=xxapA@mail.gmail.com>

Hi everyone,

Just some additional information that might be useful:

The current Biopython parser for the plain text format parses the very
first line to find out which HMMER flavor produced the result. Both
'hmmfrom' and 'hmm to' are query coordinates if the flavor is hmmsearch
or phmmer, and they're hit coordinates if the flavor is hmmscan.
This information is not available in other HMMER command line output
formats (tblout and domtblout), which as Peter has mentioned, required
us to treat different flavors of the table output as different formats
for the time being.
Fortunately, after contacting the HMMER developers they mentioned that this
is not the case anymore in their development branch (and their future
planned release).

Hope that helps :),
Bow


On Wed, Jul 11, 2012 at 11:00 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Wed, Jul 11, 2012 at 8:02 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> Peng,
>>
>> Has this been filed as a bug yet?
>>
>>     https://redmine.open-bio.org/projects/bioperl
>>
>> Seems like it would be fairly easy to fix, but I want to track it just in case.
>>
>> chris
>
> Hi all,
>
> This could be due to the unfortunate fact that hmmscan
> and hmmsearch return very similar tabular output, but
> with query and hit interchanged, i.e. you need some
> extra information to know which way round they are
> (not possible with the current output). This was an
> issue in Bow's Biopython SearchIO project, which
> for the moment he solved by handling this as two
> hmmer file formats. In the medium term we're hoping
> hmmer3 will add some header information or something.
> Peter


From dejian.zhao at gmail.com  Thu Jul 12 05:04:54 2012
From: dejian.zhao at gmail.com (De-Jian Zhao)
Date: Thu, 12 Jul 2012 13:04:54 +0800
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com>
References: <4FFDB879.1020906@gmail.com>
	<25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com>
Message-ID: <4FFE5AF6.1020300@gmail.com>

Thank you, Peng. That's great! I have been wondering recently how to see
the whole content of an object; dumping it is a good solution.

On 2012-7-12 2:03, Peng Zhou wrote:
> Hi,
>
> I guess that's what the commands are supposed to do: the get_SeqFeatures()
> method returns an array of Bio::SeqFeature objects, and the translate()
> method returns a Bio::Seq object. You can't simply "print" an object in
> Perl, but you can "dump" it:
>
> $ perl -e ' use Bio::SeqIO; use Data::Dumper; my
> $seqio=Bio::SeqIO->new(-file=>shift);
> print Dumper($seqio->next_seq()->get_SeqFeatures()); ' nt.gb
>
> $ perl -e ' use Bio::SeqIO; use Data::Dumper; my
> $seqio=Bio::SeqIO->new(-file=>shift);
> print Dumper($seqio->next_seq()->translate()); ' nt.gb
>
> On Wednesday, July 11, 2012 12:31:37 PM UTC-5, De-Jian Zhao wrote:
>> Hi,
>>
>> I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and
>> tested the Bio::SeqIO module as follows. The first 3 commands succeeded;
>> however the last 2 failed.
>>
>> I think $seqio->next_seq() produces a Bio::Seq object which contains the
>> sequence, features and annotation (according to the DESCRIPTION of
>> "perldoc Bio::Seq"), and thus invoking the methods
>> get_SeqFeatures() and translate() should be valid. However, the results
>> contradicted this idea.
>>
>> Could anyone explain what happened with the last 2 commands? I have
>> encountered numerous failures when testing the BioPerl methods.
>> I want to translate the mRNA sequence and extract the sequence features.
>> What are the right commands? Thanks a lot!
>>
>> Best,
>> Dejian
>>
>>
>>
>> PS: The commands and results
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift);
>> print $seqio->next_seq()->display_id(); ' nt.gb
>> NM_053056
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift);
>> print $seqio->next_seq()->molecule(); ' nt.gb
>> mRNA
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift);
>> print $seqio->next_seq()->subseq(1,6); ' nt.gb
>> CACACG
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift);
>> print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb
>> Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18)
>>
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift);
>> print $seqio->next_seq()->translate(); ' nt.gb
>> Bio::Seq::RichSeq=HASH(0x20a3e7b0)
>>


From dejian.zhao at gmail.com  Thu Jul 12 05:14:33 2012
From: dejian.zhao at gmail.com (De-Jian Zhao)
Date: Thu, 12 Jul 2012 13:14:33 +0800
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu>
References: <4FFDB879.1020906@gmail.com>
	<9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu>
Message-ID: <4FFE5D39.6010406@gmail.com>

Thank you, Jim. You are right. It works. This example deepens my 
understanding of OOP.

On 2012-7-12 2:01, Jim Hu wrote:
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb
>> Bio::Seq::RichSeq=HASH(0x20a3e7b0)
> ->translate returns a new Seq object.  I think
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate()->seq(); ' nt.gb
>
> should work (haven't tried it).
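Jim's point, that translate() hands back a new sequence object rather than a string, can be sketched in core Perl. ToySeq and its translate() are hypothetical stand-ins, not BioPerl code: the sketch reads whole codons from the first base using the standard genetic code (NCBI table 1) and ignores the frame and codon-table options that BioPerl's translate() offers. Chaining ->seq() onto the returned object is what turns it into printable residues.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy class (NOT BioPerl's Bio::Seq): it only illustrates that
# translate() returns a *new object*, so an accessor such as seq() is still
# needed to get a printable string.
package ToySeq;

# Standard genetic code (NCBI table 1); codons are indexed in T, C, A, G order.
my $BASES = 'TCAG';
my $AAS   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

sub new {
    my ($class, %args) = @_;
    return bless { seq => uc $args{-seq} }, $class;
}

sub seq { return $_[0]{seq} }

sub translate {
    my ($self) = @_;
    (my $dna = $self->{seq}) =~ tr/U/T/;    # read mRNA (U) as DNA (T)
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        my @idx = map { index($BASES, $_) } split //, substr($dna, $i, 3);
        $protein .= (grep { $_ < 0 } @idx)
            ? 'X'    # codon containing an ambiguous base such as N
            : substr($AAS, 16 * $idx[0] + 4 * $idx[1] + $idx[2], 1);
    }
    return ToySeq->new(-seq => $protein);    # a new object, not a string
}

package main;

my $mrna    = ToySeq->new(-seq => 'ATGTTTTAA');
my $protein = $mrna->translate();
print "$protein\n";    # something like ToySeq=HASH(0x...), as in the thread
my $aa = $protein->seq();
print "$aa\n";
```

Because a translate() like this starts at the first base, applying it to a whole mRNA record would also translate the 5' UTR; for a GenBank mRNA record, the CDS feature is likely a better starting point.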


From kai.blin at biotech.uni-tuebingen.de  Thu Jul 12 13:43:19 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 12 Jul 2012 15:43:19 +0200
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <CADEGkF5mE=1Kgy5X4BX9VH0TuChU4OyKuf3e-rzH_SeB=xxapA@mail.gmail.com>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
	<FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
	<CAKVJ-_4mo16NxEbZkNp3-EG9uknc4Fo4m6ajQWTknumQNUn4Yw@mail.gmail.com>
	<CADEGkF5mE=1Kgy5X4BX9VH0TuChU4OyKuf3e-rzH_SeB=xxapA@mail.gmail.com>
Message-ID: <4FFED477.3090907@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-07-11 23:25, Wibowo Arindrarto wrote:

Hi,

> The current Biopython parser for the plain text format parses the
> very first line to find out which HMMER flavor produced the result.
> Both 'hmmfrom' and 'hmm to' are query coordinates if the flavor is
> hmmsearch or phmmer, and they're hit coordinates if the flavor is
> hmmscan.

Whoops. I mostly looked at hmmscan when writing the parser, because
that's the file format I needed for my code. The code clearly should
follow the way the hmmer2 parser works, and differentiate between
hmmsearch and hmmscan type output.

As I said on the bug report, I'm happy to look at code fixing this.

> This information is not available in other HMMER command line
> output formats (tblout and domtblout), which as Peter has
> mentioned, required us to treat different flavors of the table
> output as different formats for the time being.

As far as I'm aware, BioPerl currently doesn't parse the table output
format.
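The table formats do not record which program produced them, so a parser has to be told the flavor. Below is a rough plain-Perl sketch. It assumes the 23-column `--domtblout` layout from the HMMER3 user guide: target name in column 1, query name in column 4, i-Evalue in column 13, hmm from/to in columns 16-17, ali from/to in columns 18-19, and a free-text description last. `parse_domtblout_line` is a hypothetical helper, not an existing BioPerl API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one data line of HMMER3 --domtblout output. The table does not say
# which program wrote it, so the caller must pass the flavor explicitly.
sub parse_domtblout_line {
    my ($line, $flavor) = @_;
    return if $line =~ /^\s*(?:#|$)/;     # skip comment and blank lines
    chomp $line;
    my @f = split ' ', $line, 23;         # column 23 is a free-text description
    die "Expected at least 22 columns\n" if @f < 22;

    # hmmscan: the profiles are the targets (hits), so hmm coords are hit coords.
    # hmmsearch/phmmer: the query is the profile, so hmm coords are query coords.
    my $hmm_is_hit = $flavor eq 'hmmscan'                ? 1
                   : $flavor =~ /^(?:hmmsearch|phmmer)$/ ? 0
                   : die "Unknown HMMER flavor '$flavor'\n";

    my @hmm = @f[15, 16];                 # hmm from, hmm to (0-based indices)
    my @ali = @f[17, 18];                 # ali from, ali to
    return {
        hit_name     => $f[0],            # "target name" is always the database entry
        query_name   => $f[3],
        i_evalue     => $f[12],
        hit_coords   => $hmm_is_hit ? \@hmm : \@ali,
        query_coords => $hmm_is_hit ? \@ali : \@hmm,
    };
}
```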

Seeing how much repeated pain we run into with all these parsers in
the different Bio* projects, I wonder if there is a smarter way to
deal with parsing. Maybe we could at least share a grammar file to
use for testing, to make sure the different language implementations
have the same expectations about file formats. Ideally we'd
auto-generate the parsers from the grammar specification, but I guess
that'll stay wishful thinking for quite a while.

> Fortunately, after contacting the HMMER developers they mentioned
> that this is not the case anymore in their development branch (and
> their future planned release).

That's certainly good news. :)

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben



From cjfields at illinois.edu  Thu Jul 12 15:24:13 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 12 Jul 2012 15:24:13 +0000
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <4FFED477.3090907@biotech.uni-tuebingen.de>
References: <CA4B2C62.4B0E%givans@missouri.edu>
	<7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
	<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
	<EBB9878A-F77D-4A0C-B543-7E9FA0C25ABE@missouri.edu>
	<a73f4de8-8a2e-4dec-9073-62ab2e4da668@googlegroups.com>
	<FFCC2B49-3582-4CEE-9658-3710B213E754@illinois.edu>
	<CAKVJ-_4mo16NxEbZkNp3-EG9uknc4Fo4m6ajQWTknumQNUn4Yw@mail.gmail.com>
	<CADEGkF5mE=1Kgy5X4BX9VH0TuChU4OyKuf3e-rzH_SeB=xxapA@mail.gmail.com>
	<4FFED477.3090907@biotech.uni-tuebingen.de>
Message-ID: <1C3A31F9-9717-49F3-A880-FA725D0F3CDB@illinois.edu>

On Jul 12, 2012, at 8:43 AM, Kai Blin wrote:

> On 2012-07-11 23:25, Wibowo Arindrarto wrote:
> 
> Hi,
> 
>> The current Biopython parser for the plain text format parses the
>> very first line to find out which HMMER flavor produces the result.
>> Both 'hmm from' and 'hmmto' are query coordinates if the flavor is
>> hmmsearch or phmmer; and they're hit coordinates if the flavor is
>> hmmscan.
> 
> Whoops. I mostly looked at hmmscan when writing the parser, because
> that's the file format I needed for my code. The code clearly should
> follow the way the hmmer2 parser works, and differentiate between
> hmmsearch and hmmscan type output.
> 
> As I said on the bug report, I'm happy to look at code fixing this.

Seems like it should be easy enough to address if there is something in the output that indicates the report type.
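Here is one way to do that, sketched in plain Perl. It assumes that the first line of a HMMER3 plain-text report names the program, as in `# hmmscan :: ...`. The program-to-side mapping follows the description quoted above. `hmmer_flavor` and `assign_coords` are hypothetical helper names, not part of Bio::SearchIO::hmmer3.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Which side the "hmm from"/"hmm to" columns describe for each HMMER3 program,
# per this thread: query for hmmsearch/phmmer, hit for hmmscan.
my %HMM_COORDS_SIDE = (
    hmmsearch => 'query',
    phmmer    => 'query',
    hmmscan   => 'hit',
);

# Return the program name from the first line of a HMMER3 plain-text report,
# e.g. "# hmmscan :: search sequence(s) against a profile database";
# undef if the line is not recognised.
sub hmmer_flavor {
    my ($first_line) = @_;
    return unless defined $first_line && $first_line =~ /^#\s*(\S+)\s+::/;
    my $prog = $1;
    return exists $HMM_COORDS_SIDE{$prog} ? $prog : undef;
}

# Given the flavor and one domain's hmm and ali coordinates,
# return them keyed by 'query' and 'hit'.
sub assign_coords {
    my ($flavor, $hmm_from, $hmm_to, $ali_from, $ali_to) = @_;
    my $hmm_side = $HMM_COORDS_SIDE{$flavor}
        or die "Unknown HMMER flavor '$flavor'\n";
    my $ali_side = $hmm_side eq 'query' ? 'hit' : 'query';
    my %coords;
    $coords{$hmm_side} = [ $hmm_from, $hmm_to ];
    $coords{$ali_side} = [ $ali_from, $ali_to ];
    return \%coords;
}
```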

>> This information is not available in other HMMER command line
>> output formats (tblout and domtblout), which as Peter has
>> mentioned, required us to treat different flavors of the table
>> output as different formats for the time being.
> 
> As far as I'm aware, BioPerl currently doesn't parse the table output
> format.

The only reason to do so is if the table provides additional information that the actual hit listings don't (this can be the case with BLAST reports).

> Seeing how much repeated pain we run into with all these parsers in
> the different Bio* projects, I wonder if there was a smarter way to
> deal with parsing. Maybe at least some shared grammar file that we
> could use for testing, to make sure we at least have the same
> expectations about file formats in the different language
> implementations. Ideally we'd auto-generate the parsers from the
> grammar specification, but I guess that'll stay wishful thinking for
> quite a bit.

I would fully support something like this. I've been thinking about it with Marpa::XS (which now has a compiled library, libmarpa, to make it less Perl-centric), and there has been talk of using a similar toolkit with the BioRuby folks. We could always have a plain Perl/Python/Ruby/etc. fallback for the most common formats.

chris


From buschj at hhu.de  Sun Jul 15 19:46:42 2012
From: buschj at hhu.de (jobu)
Date: Sun, 15 Jul 2012 21:46:42 +0200
Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences of
	Blast Matches
Message-ID: <50031E22.3060902@hhu.de>

Dear All.

Still being a beginner in Perl and having just started to look into
BioPerl, I hope this is the right place to ask my question.

I ran a local standalone blastn search of many short query sequences
against a set of target FASTA sequences consisting of whole-chromosome
sequence data.

What I need to do now is get, say, 100 nt upstream and downstream of
each BLAST match out of my target sequences.

At this point I can only assume that BioPerl might be helpful for
this task, though I haven't yet found a module that will do this
locally on my hard drive.

I would be thankful for even the slightest hint about where to begin.

Sincerely
Jochen


From Russell.Smithies at agresearch.co.nz  Sun Jul 15 21:19:14 2012
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 16 Jul 2012 09:19:14 +1200
Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences
 of	Blast Matches
In-Reply-To: <50031E22.3060902@hhu.de>
References: <50031E22.3060902@hhu.de>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCF2A4CAA@exchsth.agresearch.co.nz>

Hi Jochen,
I don't think BioPerl can directly manipulate blast databases so I'd probably do it with fastacmd to extract the sequence from the original blast database.
eg.
fastacmd -s X51494.1 -d /dataset/blastdata/active/nt -L 100,200
>gi|20090|emb|X51494.1|:100-200 Rice prolamin gene (strain NE4)
ATGATGCAAACGTTGGGCATGGGTAGCTCCACAGCCATGTTCATGTCGCAGCCAATGGCGCTCCTGCAGCAGCAATGTTG
CATGCAGCTACAAGGCATGAT

Or if you're using blast+, use the blastdbcmd command:
eg.
blastdbcmd -entry  X51494.1 -db /dataset/blastdata/active/nt -range 100-200
>gi|20090|emb|X51494.1|:100-200 Rice prolamin gene (strain NE4)
ATGATGCAAACGTTGGGCATGGGTAGCTCCACAGCCATGTTCATGTCGCAGCCAATGGCGCTCCTGCAGCAGCAATGTTG
CATGCAGCTACAAGGCATGAT

To put it all together: use BioPerl to parse your existing BLAST results and pull out each hit's coordinates, then use a system call to run fastacmd or blastdbcmd to extract the flanking sequence from the BLAST database, and write the sequences to a file.
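Here is a rough sketch that puts those steps together. It assumes a plain-text BLAST report, 100 nt flanks, and a BLAST+ database whose entries can be looked up by ID (for example, one built with `makeblastdb -parse_seqids`). `flank_ranges` and `blastdbcmd_cmd` are hypothetical helpers. BioPerl is only loaded when the script is given a report and a database path. Depending on how the database was built, `$hit->name` may need trimming to the bare accession before `blastdbcmd` accepts it. For minus-strand hits, the extracted flanks are still plus-strand sequence and would need reverse-complementing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $FLANK = 100;    # nucleotides to take on each side of a hit (example value)

# Return up/downstream ranges (1-based, inclusive) around a hit on the subject,
# clamped to 1 and, if known, to the subject length. For minus-strand hits,
# "upstream" lies at higher plus-strand coordinates.
sub flank_ranges {
    my ($start, $end, $flank, $seq_len, $strand) = @_;
    ($start, $end) = ($end, $start) if $start > $end;
    my $left_from  = $start - $flank < 1 ? 1 : $start - $flank;
    my $left_to    = $start - 1;
    my $right_from = $end + 1;
    my $right_to   = $end + $flank;
    $right_to = $seq_len if defined $seq_len && $right_to > $seq_len;
    my @left  = $left_to  >= $left_from  ? ($left_from,  $left_to)  : ();
    my @right = $right_to >= $right_from ? ($right_from, $right_to) : ();
    my $minus = defined $strand && $strand < 0;
    return $minus ? { up => \@right, down => \@left }
                  : { up => \@left,  down => \@right };
}

# Build a BLAST+ blastdbcmd command line using -db, -entry and -range
sub blastdbcmd_cmd {
    my ($db, $id, $from, $to) = @_;
    return ('blastdbcmd', '-db', $db, '-entry', $id, '-range', "$from-$to");
}

# Usage (hypothetical script name): perl flanks.pl report.txt /path/to/db > flanks.fa
if (@ARGV == 2) {
    require Bio::SearchIO;
    my ($report, $db) = @ARGV;
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $report);
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                my $r = flank_ranges($hsp->start('hit'), $hsp->end('hit'),
                                     $FLANK, $hit->length, $hsp->strand('hit'));
                for my $side (qw(up down)) {
                    next unless @{ $r->{$side} };
                    my @cmd = blastdbcmd_cmd($db, $hit->name, @{ $r->{$side} });
                    system(@cmd) == 0 or warn "Command failed: @cmd\n";
                }
            }
        }
    }
}
```

Swapping `blastdbcmd_cmd` for an equivalent that builds `fastacmd -d ... -s ... -L from,to` should work the same way with the legacy tools.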

These might be useful:
http://www.bioperl.org/wiki/HOWTO:SearchIO
http://www.bioperl.org/wiki/HOWTO:SearchIO#Speed_improvements_with_lightweight_objects 
http://www.bioperl.org/wiki/HOWTO:BlastPlus
http://www.bioperl.org/wiki/HOWTO:StandAloneBlast


--Russell



From dcmertens.perl at gmail.com  Tue Jul 17 12:57:55 2012
From: dcmertens.perl at gmail.com (David Mertens)
Date: Tue, 17 Jul 2012 07:57:55 -0500
Subject: [Bioperl-l] Announcing The Quantified Onion Google Group and
	perl4science.github.com
Message-ID: <CA+4ieYXGs=ywH97gEdTOakVLp22n0gPuwZJK-C75C7ijUzPgwg@mail.gmail.com>

Hello everybody -

I returned from YAPC::NA this year intending to build up the scientific
Perl community. One outgrowth of this has been Joel Berger's creation of
perl4science.github.com and gizmomathboy's creation of The Quantified Onion
Google Group <https://groups.google.com/forum/#%21forum/the-quantified-onion>.

perl4science is meant to be a landing page for anybody looking to combine
Perl and science. Since it is a github repository, it makes it about as
easy as possible for others to contribute content or fixes. If you have a
project that scientists would find useful, you should fork the project, add
your content, and issue a pull request. It's that easy.

The Quantified Onion is meant to be a space for scientists to discuss how
we use Perl in our science and to work together to grow adoption of Perl
among scientists. It will undoubtedly attract newcomers to Perl asking
beginner questions, at which point we will gently refer them to the
appropriate manual pages. Interesting discussions so far (in my view)
include one about teaching test-driven design and one about submitting
an article to Computing in Science and Engineering for its November
issue, which is supposed to be about Modern Programming Languages. I
would like to begin putting on workshops on Perl for Scientists and
Engineers (and encourage others to do the same), and I will begin that
discussion on The Quantified Onion.

If you know of other Perl science resources, please feel free to add them
to perl4science or post them on The Quantified Onion, and please join The
Quantified Onion. Together, we can grow Perl's adoption among scientists!

David Mertens

-- 
 "Debugging is twice as hard as writing the code in the first place.
  Therefore, if you write the code as cleverly as possible, you are,
  by definition, not smart enough to debug it." -- Brian Kernighan


From cjfields at illinois.edu  Wed Jul 18 14:29:02 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 18 Jul 2012 14:29:02 +0000
Subject: [Bioperl-l] [blast-announce] OLD_BLAST parameter to be
 discontinued. Alternative NCBI BLAST parsable formats are available
References: <CC2C3DA1.A01D6%mcginnis@ncbi.nlm.nih.gov>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF2B63F6C5@CHIMBX5.ad.uillinois.edu>

Not sure if anyone is using this as a means of getting their reports (I don't), but I'm posting this here just in case.

-c

Begin forwarded message:

> From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]" 
> Subject: [blast-announce] OLD_BLAST parameter to be discontinued. Alternative NCBI BLAST parsable formats are available
> Date: July 18, 2012 9:17:05 AM CDT
> To: NLM/NCBI List blast-announce
> 
> Beginning Sept. 10, 2012, the BLAST service will ignore the OLD_BLAST parameter in posted URLs. We are removing this old and little used option to prepare for upcoming enhancements to the BLAST service later this year. Setting OLD_BLAST=true produces an older version of the BLAST HTML results that a few people have used for automated processing (parsing) of results. NCBI BLAST supports a number of different and more stable formats for parsing. These include XML, tabular reports and ASN.1. For more information, please see BLAST Developer Information (http://1.usa.gov/O8AocI) and links on that page.
> 
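For anyone switching to one of those formats, here is a minimal plain-Perl sketch of reading tabular output. It assumes the 12 default columns of BLAST+ `-outfmt 6`, and it skips comment lines like those produced by `-outfmt 7`. Within BioPerl, Bio::SearchIO also offers `blasttable` and `blastxml` readers for these formats. `parse_blast_tab_line` below is only a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Default column order of BLAST+ tabular output (-outfmt 6)
my @COLS = qw(qseqid sseqid pident length mismatch gapopen
              qstart qend sstart send evalue bitscore);

# Parse one tab-separated line into a hash ref keyed by column name;
# return nothing for comment or blank lines.
sub parse_blast_tab_line {
    my ($line) = @_;
    return if $line =~ /^#/ || $line !~ /\S/;
    chomp $line;
    my @f = split /\t/, $line;
    die "Expected ", scalar(@COLS), " columns, got ", scalar(@f), "\n"
        unless @f == @COLS;
    my %row;
    @row{@COLS} = @f;    # hash slice: pair column names with values
    return \%row;
}
```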


From dejian.zhao at gmail.com  Wed Jul 18 15:36:14 2012
From: dejian.zhao at gmail.com (De-Jian Zhao)
Date: Wed, 18 Jul 2012 23:36:14 +0800
Subject: [Bioperl-l] Which graphic module should I learn?
Message-ID: <5006D7EE.1020205@gmail.com>

Hi, all.

Currently I am working on a genome and need to draw some figures based
on the sequencing data. Since I will keep using the module in future
projects, I want to learn a popular one so that I can get better support
from the community. I searched CPAN with the command "i /SVG/" and got 234
items. Which one is popular in bioinformatics? Which module should I
start with? Thanks for any suggestions.

Best,
De-Jian


From scott at scottcain.net  Wed Jul 18 15:46:01 2012
From: scott at scottcain.net (Scott Cain)
Date: Wed, 18 Jul 2012 11:46:01 -0400
Subject: [Bioperl-l] Which graphic module should I learn?
In-Reply-To: <5006D7EE.1020205@gmail.com>
References: <5006D7EE.1020205@gmail.com>
Message-ID: <CA+JTaow0sJ-MTq47r1vvvu+H2zPu=U8VuEi++s-YDxcTo1+Ybg@mail.gmail.com>

Hi De-Jian,

Of course, it depends on what you want to do, but if you're referring
to the genome feature/annotation type graphics, Bio::Graphics already
supports SVG pretty well, via GD::SVG.
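Here is a minimal sketch of that route, assuming Bio::Graphics and GD::SVG are installed. The region and the two features are made up for illustration. Passing `-image_class => 'GD::SVG'` to the panel makes it draw with GD::SVG, and the resulting image object's `svg` method returns the SVG text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Graphics;
use Bio::SeqFeature::Generic;

# A made-up 1 kb region with two made-up features, purely for illustration
my $region = Bio::SeqFeature::Generic->new(-start => 1, -end => 1000,
                                           -display_name => 'region');
my @features = map {
    Bio::SeqFeature::Generic->new(-start => $_->[0], -end => $_->[1],
                                  -display_name => $_->[2])
} ([100, 300, 'geneA'], [550, 900, 'geneB']);

# -image_class => 'GD::SVG' makes the panel draw with GD::SVG instead of GD
my $panel = Bio::Graphics::Panel->new(
    -length      => 1000,
    -width       => 600,
    -pad_left    => 10,
    -pad_right   => 10,
    -image_class => 'GD::SVG',
);
$panel->add_track($region, -glyph => 'arrow', -tick => 2, -double => 1);
$panel->add_track(\@features, -glyph => 'generic', -bgcolor => 'blue', -label => 1);

# $panel->gd returns the underlying image object; GD::SVG images provide svg()
my $svg = $panel->gd->svg;
print $svg;
```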

Scott


On Wed, Jul 18, 2012 at 11:36 AM, De-Jian Zhao <dejian.zhao at gmail.com> wrote:
> Hi, all.
>
> Currently I am working on a genome. I will draw some pictures based on the
> sequencing data. In the long run, I will use the module in my future
> projects, so I want to learn a popular module to get better support from the
> community. I searched in cpan with the command "i /SVG/" and got 234 items.
> Which one is popular in bioinformatics? Which module should I start with?
> Thanks for any suggestions.
>
> Best,
> De-Jian


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Wed Jul 25 03:08:05 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 25 Jul 2012 03:08:05 +0000
Subject: [Bioperl-l] BioPerl and Travis-CI
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>

Peter Cock has graciously helped start up a branch for bioperl-live that is using Travis-CI (a nice continuous integration tool).  Results from Peter's fork are found here:

    http://travis-ci.org/#!/peterjc/bioperl-live

As this is now pulled into the main bioperl repo, results will be here:

    http://travis-ci.org/#!/bioperl/bioperl-live

I'll be working on this and expect this will be added to master in the next few days.  

chris


From p.j.a.cock at googlemail.com  Wed Jul 25 10:31:13 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 25 Jul 2012 11:31:13 +0100
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
Message-ID: <CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>

On Wed, Jul 25, 2012 at 4:08 AM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> Peter Cock has graciously helped start up a branch for bioperl-live
> that is using Travis-CI (a nice continuous integration tool).  Results
> from Peter's fork are found here:
>
>     http://travis-ci.org/#!/peterjc/bioperl-live
>
> As this is now pulled into the main bioperl repo, results will be here:
>
>     http://travis-ci.org/#!/bioperl/bioperl-live
>
> I'll be working on this and expect this will be added to master in
> the next few days.
>
> chris

We've had this running for Biopython for a month now, and it has
been a useful complement to the BuildBot (which covers other
operating systems). This was following BioRuby's lead:
http://biopython.org/pipermail/biopython-dev/2012-June/009742.html

The current BioPerl Travis configuration is probably usable right
now (after changing the branch whitelist to either master or simply
all branches).

Other remaining issues include sorting out which dependencies
should be installed, and streamlining their verbose output (e.g.
using tail).

TravisCI can send out emails (e.g. on test failures), and perhaps
bioperl-guts-l might be a sensible place to send these. Initially
we'd disabled the emails for Biopython. I'd like to use an RSS
feed... there is a JSON API which BioRuby are using for
http://www.biogems.info/ which tracks their plugins.

Peter


From p.j.a.cock at googlemail.com  Fri Jul 27 15:03:05 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 27 Jul 2012 16:03:05 +0100
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
	<CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>
	<CAKVJ-_45r1Hza-ga95oyMRbYGzupsKtmAZ=ac29g2n-U9SegWA@mail.gmail.com>
	<118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu>
Message-ID: <CAKVJ-_4iYB0N7A6eOEivrKTkeQZzw=9jiS7kqVCKR2aeEiZZaQ@mail.gmail.com>

On Fri, Jul 27, 2012 at 3:58 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> On Jul 27, 2012, at 9:47 AM, Peter Cock wrote:
>
>> On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>>
>>> That's done now - except for the circular dependencies, and GD,
>>> which might be easy to solve if anyone knows what the error
>>> means - see commit message here:
>>> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a
>>
>> Re: https://twitter.com/cjfields/status/228861370454638592
>> Not sure why you got GD to work when something very similar
>> had failed for me. Oh well - job done :)
>
> It was the lack of gdlib-config in the libgd2-xpm package, you need
> libgd2-xpm-dev.  One of the fun things about Debian packaging.

Ah - I should have guessed that.

>>> Would a single clean commit of the (current) .travis.yml file be
>>> preferable to the current series of commits? And you you want
>>> a pull request, or would you just merge/cherry-pick manually?
>>
>> Given all the churn between our revisions, personally I'd opt for
>> a single clean commit to bioperl/master - but your call.
>>
>> Peter
>
> Yep, about to merge it over.  It's working now, just need to
> whitelist master instead of travis after the merge.

I'd removed the whitelist altogether here:
https://github.com/peterjc/bioperl-live/commit/96dc5866f4406179353909c72d812623341c8fbd

My thinking was BioPerl seems to have multiple feature branches
under the official repo, so they should get tested too. You'd be
in a better position than me to judge what would work best for
BioPerl here.

Peter


From cjfields at illinois.edu  Fri Jul 27 14:58:21 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 27 Jul 2012 14:58:21 +0000
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <CAKVJ-_45r1Hza-ga95oyMRbYGzupsKtmAZ=ac29g2n-U9SegWA@mail.gmail.com>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
	<CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>
	<CAKVJ-_45r1Hza-ga95oyMRbYGzupsKtmAZ=ac29g2n-U9SegWA@mail.gmail.com>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu>

On Jul 27, 2012, at 9:47 AM, Peter Cock wrote:

> On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> 
>> That's done now - except for the circular dependencies, and GD,
>> which might be easy to solve if anyone knows what the error
>> means - see commit message here:
>> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a
> 
> Re: https://twitter.com/cjfields/status/228861370454638592
> Not sure why you got GD to work when something very similar
> had failed for me. Oh well - job done :)

It was the lack of gdlib-config in the libgd2-xpm package, you need libgd2-xpm-dev.  One of the fun things about Debian packaging.

>> Would a single clean commit of the (current) .travis.yml file be
>> preferable to the current series of commits? And you you want
>> a pull request, or would you just merge/cherry-pick manually?
> 
> Given all the churn between our revisions, personally I'd opt for
> a single clean commit to bioperl/master - but your call.
> 
> Peter

Yep, about to merge it over.  It's working now, just need to whitelist master instead of travis after the merge.

chris


From cjfields at illinois.edu  Fri Jul 27 16:26:34 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 27 Jul 2012 16:26:34 +0000
Subject: [Bioperl-l] BioPerl Travis-CI now live
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D54D@CITESMBX5.ad.uillinois.edu>

All commits to bioperl-live master branch on github are now being tracked:

   http://travis-ci.org/#!/bioperl/bioperl-live

The .travis.yml file has a whitelist for branches to be tested; if anyone wants to test additional branches feel free to add them to the list!
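For anyone editing that whitelist, here is a minimal sketch of the relevant `.travis.yml` stanzas. It assumes Travis-CI's standard Perl settings and is not a copy of BioPerl's actual file. The Perl version is a placeholder. The `branches: only:` list is the whitelist mentioned above. The `before_install` step reflects the GD fix discussed in this thread: installing `libgd2-xpm-dev` so that `gdlib-config` is available.

```yaml
language: perl
perl:
  - "5.14"            # placeholder version
before_install:
  # GD.pm needs gdlib-config, which ships in the -dev package
  - sudo apt-get update -qq
  - sudo apt-get install -qq libgd2-xpm-dev
branches:
  only:
    - master
    # add further branch names here to have them tested
```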

chris


From cjfields at illinois.edu  Fri Jul 27 15:15:19 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 27 Jul 2012 15:15:19 +0000
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <CAKVJ-_4iYB0N7A6eOEivrKTkeQZzw=9jiS7kqVCKR2aeEiZZaQ@mail.gmail.com>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
	<CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>
	<CAKVJ-_45r1Hza-ga95oyMRbYGzupsKtmAZ=ac29g2n-U9SegWA@mail.gmail.com>
	<118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4iYB0N7A6eOEivrKTkeQZzw=9jiS7kqVCKR2aeEiZZaQ@mail.gmail.com>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D2D6@CITESMBX5.ad.uillinois.edu>


On Jul 27, 2012, at 10:03 AM, Peter Cock wrote:

> On Fri, Jul 27, 2012 at 3:58 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> On Jul 27, 2012, at 9:47 AM, Peter Cock wrote:
>> 
>>> On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>>> 
>>>> That's done now - except for the circular dependencies, and GD,
>>>> which might be easy to solve if anyone knows what the error
>>>> means - see commit message here:
>>>> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a
>>> 
>>> Re: https://twitter.com/cjfields/status/228861370454638592
>>> Not sure why you got GD to work when something very similar
>>> had failed for me. Oh well - job done :)
>> 
>> It was the lack of gdlib-config in the libgd2-xpm package, you need
>> libgd2-xpm-dev.  One of the fun things about Debian packaging.
> 
> Ah - I should have guessed that.
> 
>>>> Would a single clean commit of the (current) .travis.yml file be
>>>> preferable to the current series of commits? And you you want
>>>> a pull request, or would you just merge/cherry-pick manually?
>>> 
>>> Given all the churn between our revisions, personally I'd opt for
>>> a single clean commit to bioperl/master - but your call.
>>> 
>>> Peter
>> 
>> Yep, about to merge it over.  It's working now, just need to
>> whitelist master instead of travis after the merge.
> 
> I'd removed the whitelist altogether here:
> https://github.com/peterjc/bioperl-live/commit/96dc5866f4406179353909c72d812623341c8fbd
> 
> My thinking was BioPerl seems to have multiple feature branches
> under the official repo, so they should get tested too. You'd be
> in a better position than me to judge what would work best for
> BioPerl here.
> 
> Peter

We'll keep it to master for now.  It's pretty easy to add branches as needed, and I didn't want to expand to all the potentially stale branches unless explicitly set (we need to triage all those at some point).

chris


From p.j.a.cock at googlemail.com  Fri Jul 27 14:47:18 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 27 Jul 2012 15:47:18 +0100
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
	<CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>
Message-ID: <CAKVJ-_45r1Hza-ga95oyMRbYGzupsKtmAZ=ac29g2n-U9SegWA@mail.gmail.com>

On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> That's done now - except for the circular dependencies, and GD,
> which might be easy to solve if anyone knows what the error
> means - see commit message here:
> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a

Re: https://twitter.com/cjfields/status/228861370454638592
Not sure why you got GD to work when something very similar
had failed for me. Oh well - job done :)

> Would a single clean commit of the (current) .travis.yml file be
> preferable to the current series of commits? And you you want
> a pull request, or would you just merge/cherry-pick manually?

Given all the churn between our revisions, personally I'd opt for
a single clean commit to bioperl/master - but your call.

Peter


From robfsouza at gmail.com  Fri Jul 27 22:29:22 2012
From: robfsouza at gmail.com (Robson de Souza)
Date: Fri, 27 Jul 2012 15:29:22 -0700 (PDT)
Subject: [Bioperl-l] obf sites offline?
Message-ID: <9bef8a3b-08ca-4868-be7a-193e7596290d@googlegroups.com>

I can't access any of the OBF sites, either from work (USA) or my phone... 
is there something going on?
Robson


From p.j.a.cock at googlemail.com  Thu Jul 26 15:22:26 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 26 Jul 2012 16:22:26 +0100
Subject: [Bioperl-l] BioPerl and Travis-CI
In-Reply-To: <CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu>
	<CAKVJ-_4TN_OUGLGrcwp-cNkQGhXGn2BiO9F1NEzAiua49HY3Lg@mail.gmail.com>
Message-ID: <CAKVJ-_7g-rVQ37Q0Svg4LGXN7C=3i7Mab-kjUc2XpuuX5uEXsQ@mail.gmail.com>

On Wed, Jul 25, 2012 at 11:31 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Wed, Jul 25, 2012 at 4:08 AM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> Peter Cock has graciously helped start up a branch for bioperl-live
>> that is using Travis-CI (a nice continuous integration tool).  Results
>> from Peter's fork are found here:
>>
>>     http://travis-ci.org/#!/peterjc/bioperl-live
>>
>> As this is now pulled into the main bioperl repo, results will be here:
>>
>>     http://travis-ci.org/#!/bioperl/bioperl-live
>>
>> I'll be working on this and expect this will be added to master in
>> the next few days.
>>
>> chris
>
> We've had this running for Biopython for a month now, and it has
> been a useful complement to the BuildBot (which covers other
> operating systems). This was following BioRuby's lead:
> http://biopython.org/pipermail/biopython-dev/2012-June/009742.html
>
> The current BioPerl Travis configuration is probably usable right
> now (after changing the branch whitelist to either master, or simple
> all branches).
>
> Other remaining issues include sorting out which dependencies
> should be installed, and streamlining their verbose output (e.g.
> using tail).

That's done now - except for the circular dependencies, and GD,
which might be easy to solve if anyone knows what the error
means - see commit message here:
https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a

Would a single clean commit of the (current) .travis.yml file be
preferable to the current series of commits? And do you want
a pull request, or would you just merge/cherry-pick manually?

> TravisCI can send out emails (e.g. on test failures), and perhaps
> bioperl-guts-l might be a sensible place to send these. Initially
> we'd disabled the emails for Biopython. I'd like to use an RSS
> feed... there is a JSON API which BioRuby are using for
> http://www.biogems.info/ which tracks their plugins.

I've filed an issue for news feed support in TravisCI,
https://github.com/travis-ci/travis-core/issues/82

Regards,

Peter


From p.j.a.cock at googlemail.com  Tue Jul 31 10:37:35 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 31 Jul 2012 11:37:35 +0100
Subject: [Bioperl-l] Travis Continuous Integration testing & pull requests
Message-ID: <CAKVJ-_5Yt8O21Wo0o6p1VgjheaaPYBDQEbF6geaABb1M++GfRA@mail.gmail.com>

Hi all,

I'm cross-posting as this is an announcement. Please keep any
follow-up discussion on the relevant project-specific mailing list,
or on open-bio-l if it is general.

Those following the OBF blog or the OBF or Bio* Twitter accounts
will have already seen this, which I posted yesterday:
http://news.open-bio.org/news/2012/07/travis-ci-for-testing/

In summary, since earlier this year BioRuby and then Biopython and
BioPerl have been using Travis-CI.org (a hosted continuous
integration service for the open source community) to run their unit
tests automatically whenever their GitHub repositories are updated.

In addition we now have TravisCI automatically running our tests
on any new GitHub pull requests - supported by an OBF donation
to Travis-CI, see:
http://about.travis-ci.org/blog/announcing-pull-request-support/

Currently BioJava only uses GitHub as an SVN mirror - but this
should still let you start using TravisCI for automated testing:
http://about.travis-ci.org/docs/user/languages/java/

For EMBOSS, this is another incentive to convert from CVS to
GitHub - TravisCI recently announced support for C/C++ projects:
http://about.travis-ci.org/blog/support_for_go_c_and_cpp/
http://about.travis-ci.org/docs/user/languages/c/

Potentially there are other OBF projects where this would be
useful too.

Regards,

Peter